Re: utf8 odd behavior with Gtk2

From: Torsten Schoenfeld <kaffeetisch gmx de>
To: gtk-perl-list gnome org
Subject: Re: utf8 odd behavior with Gtk2
Date: Sun, 01 Jul 2012 19:24:28 +0200

On 01.07.2012 14:32, zentara wrote:

What you will notice, or I do with perl 5.14.1, is that the placement
of the "use utf8::all" changes what is decoded properly.  If that use line
comes before the Gtk2 modules, it doesn't decode input. If placed after, it
works fine.  Furthermore, if you comment out the Gtk2 modules, it works
right.

This is due to Gtk2's treatment of @ARGV. When you call Gtk2::init (for example via 'use Gtk2 -init'), it copies @ARGV into a C array and passes it on to gtk_init, which might remove entries from it. To make these changes visible to the Perl programmer, Gtk2::init then clears @ARGV and copies the contents of the C array back into it. The problem you found occurs because all this copying does not take the UTF8 flag into account (it simply uses SvPV and newSVpv).

So when you use utf8::all before Gtk2, @ARGV contains strings whose internal representation is in UTF8. When Gtk2::init then reconstructs @ARGV from the C array, it creates Perl strings from UTF8 encoded byte sequences but does not mark the strings as such (i.e. it does not set the UTF8 flag). When you print these strings, perl sees no UTF8 flag and so assumes they contain Latin1-encoded byte sequences and tries to convert them to UTF8. This leads to the doubly-encoded output that you see.
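
The effect can be reproduced without Gtk2 at all. The following is a minimal sketch in plain Perl, not the actual Gtk2 code path: utf8::encode stands in for the SvPV/newSVpv copy (it leaves UTF-8 bytes in a scalar with the flag off), and Encode::encode stands in for what an output layer does when the string is printed. The sample string is an assumption chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A decoded character string, as utf8::all would leave it in @ARGV:
# "e-acute, t, e-acute", 3 characters.
my $decoded = "\x{e9}t\x{e9}";
utf8::upgrade($decoded);    # force the UTF-8 internal representation

# Stand-in for the SvPV/newSVpv round trip: the same UTF-8 bytes end up
# in a new scalar, but the UTF8 flag is lost.
my $copied = $decoded;
utf8::encode($copied);      # now 5 bytes, flag off

# Output encoding treats each of those bytes as a Latin-1 character.
my $correct = encode('UTF-8', $decoded);   # 5 bytes
my $doubled = encode('UTF-8', $copied);    # 9 bytes: doubly encoded

printf "decoded: %d chars, flag %s\n", length($decoded),
    utf8::is_utf8($decoded) ? 'on' : 'off';
printf "copied:  %d chars, flag %s\n", length($copied),
    utf8::is_utf8($copied) ? 'on' : 'off';
printf "correct output: %d bytes, doubled output: %d bytes\n",
    length($correct), length($doubled);
```

Each of the four high bytes in the copied string is expanded to two bytes on output, which is where the extra length comes from.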

So the diagnosis is easy enough. I'm not so certain about the correct fix, though.

- Do we continue to use SvPV/newSVpv but also store the UTF8 flag, and if it was set, restore it?

- Do we switch to always using SvPVutf8/newSVpvn_utf8, assuming that @ARGV always contains UTF-8-encoded data?

- Do we switch to always using SvPVbyte/newSVpv, assuming that @ARGV always contains Latin1-encoded data?

I'm leaning towards the first option, but I'm not sure. I don't have a firm grasp on the Perl/UTF-8/XS complex yet, and I've yet to see clear documentation for XS authors.
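
To make the first option concrete, here is a rough model of it in plain Perl rather than XS. It is only a sketch under assumptions: to_c_string and from_c_string are hypothetical helpers that imitate what SvPV and newSVpv would see, and utf8::decode plays the role of turning the flag back on. The real change would live in Gtk2's C code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for SvPV: return the bytes of the internal
# buffer together with the state of the UTF8 flag.
sub to_c_string {
    my ($sv) = @_;
    my $was_utf8 = utf8::is_utf8($sv) ? 1 : 0;
    my $bytes = $sv;
    utf8::encode($bytes) if $was_utf8;
    return ($bytes, $was_utf8);
}

# Hypothetical stand-in for newSVpv plus restoring the flag: rebuild the
# scalar and reinterpret the bytes as UTF-8 only if the flag was set.
sub from_c_string {
    my ($bytes, $was_utf8) = @_;
    my $sv = $bytes;
    utf8::decode($sv) if $was_utf8;
    return $sv;
}

# One flagged string, one Latin-1 byte string, one plain ASCII string.
my @argv = ("\x{263a}", "\xe9", "plain");
my @round_trip = map { from_c_string(to_c_string($_)) } @argv;

for my $i (0 .. $#argv) {
    printf "arg %d: %s\n", $i,
        $round_trip[$i] eq $argv[$i] ? 'unchanged' : 'CHANGED';
}
```

In this model both flagged and unflagged arguments survive the round trip unchanged, without assuming anything about the encoding of @ARGV as a whole.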

Follow-Ups:
- Re: utf8 odd behavior with Gtk2
  - From: zentara
- Re: utf8 odd behavior with Gtk2
  - From: Kevin Ryde

References:
- utf8 odd behavior with Gtk2
  - From: zentara
