Re: utf8 odd behavior with Gtk2
- From: Torsten Schoenfeld <kaffeetisch gmx de>
- To: gtk-perl-list gnome org
- Subject: Re: utf8 odd behavior with Gtk2
- Date: Sun, 01 Jul 2012 19:24:28 +0200
On 01.07.2012 14:32, zentara wrote:
> What you will notice, or I do with perl 5.14.1, is that the placement
> of the "use utf8::all" changes what is decoded properly. If that use line
> comes before the Gtk2 modules, it doesn't decode input. If placed after, it
> works fine. Furthermore, if you comment out the Gtk2 modules, it works
> right.
This is due to Gtk2's treatment of @ARGV. When you call Gtk2::init (for
example via 'use Gtk2 -init'), it copies @ARGV into a C array and passes
it on to gtk_init, which might remove entries from it. To make these
changes visible to the Perl programmer, Gtk2::init then clears @ARGV and
copies the contents of the C array back into it. The problem you found
occurs because all this copying does not take the UTF8 flag into account
(it simply uses SvPV and newSVpv).
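To make the mechanism concrete, here is a small Python model of the round trip. It is only a simulation, not Gtk2's XS code: the hypothetical `PerlString` stands in for an SV (a byte buffer plus the UTF8 flag), `svpv` and `newsvpv` mimic what SvPV and newSVpv do with respect to the flag, and `fake_gtk_init` is a made-up stand-in for gtk_init that simply drops `--g-fatal-warnings`.

```python
from dataclasses import dataclass


@dataclass
class PerlString:
    """Toy model of a Perl scalar: raw buffer bytes plus the UTF8 flag."""
    buf: bytes
    utf8: bool


def svpv(sv: PerlString) -> bytes:
    # Like SvPV: hands back the internal buffer and ignores the flag.
    return sv.buf


def newsvpv(buf: bytes) -> PerlString:
    # Like newSVpv: builds a new scalar with the UTF8 flag off.
    return PerlString(buf, False)


def fake_gtk_init(c_argv: list[bytes]) -> list[bytes]:
    # Hypothetical stand-in for gtk_init: removes an option GTK+ consumes.
    return [arg for arg in c_argv if arg != b"--g-fatal-warnings"]


def gtk2_init(argv: list[PerlString]) -> list[PerlString]:
    c_argv = [svpv(sv) for sv in argv]       # copy @ARGV into a C array
    c_argv = fake_gtk_init(c_argv)           # gtk_init may remove entries
    return [newsvpv(arg) for arg in c_argv]  # clear @ARGV and rebuild it


# With utf8::all loaded first, "é" arrives decoded: UTF-8 bytes, flag on.
argv = [PerlString("é".encode("utf-8"), True),
        PerlString(b"--g-fatal-warnings", False)]
print(gtk2_init(argv))
# [PerlString(buf=b'\xc3\xa9', utf8=False)]
```

The bytes survive the trip intact; only the flag is lost.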
So when you use utf8::all before Gtk2, @ARGV contains strings whose
internal representation is in UTF8. When Gtk2::init then reconstructs
@ARGV from the C array, it creates Perl strings from UTF8 encoded byte
sequences but does not mark the strings as such (i.e. it does not set
the UTF8 flag). When you print these strings, perl sees no UTF8 flag
and so assumes they contain Latin1-encoded byte sequences and tries to
convert them to UTF8. This leads to the doubly-encoded output that you see.
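The printing side can be modelled the same way. The hypothetical `print_via_utf8_layer` below mimics printing a scalar to a handle with a UTF-8 output layer: a flagged string's buffer is already UTF-8 and passes through, while an unflagged buffer is read as Latin-1 and encoded again. This is a sketch of the behaviour described above, not perl's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class PerlString:
    buf: bytes   # internal buffer
    utf8: bool   # the SvUTF8 flag


def print_via_utf8_layer(sv: PerlString) -> bytes:
    """Bytes written when printing sv to a handle with a UTF-8 output layer."""
    if sv.utf8:
        # Buffer is already UTF-8: written out unchanged.
        return sv.buf
    # No flag: each byte is taken as a Latin-1 character and encoded to UTF-8.
    return sv.buf.decode("latin-1").encode("utf-8")


correct = PerlString("é".encode("utf-8"), True)   # before Gtk2::init
damaged = PerlString("é".encode("utf-8"), False)  # after the round trip

print(print_via_utf8_layer(correct))  # b'\xc3\xa9'         -> "é"
print(print_via_utf8_layer(damaged))  # b'\xc3\x83\xc2\xa9' -> "Ã©"
```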
So the diagnosis is easy enough. I'm not so certain about the correct
fix, though.
• Do we continue to use SvPV/newSVpv but also store the UTF8 flag, and
if it was set, restore it?
• Do we switch to always using SvPVutf8/newSVpvn_utf8, assuming that
@ARGV always contains UTF-8-encoded data?
• Do we switch to always using SvPVbyte/newSVpv, assuming that @ARGV
always contains Latin1-encoded data?
I'm leaning towards the first option, but I'm not sure. I don't have a
firm grasp on the Perl/UTF-8/XS complex yet, and I've yet to see clear
documentation for XS authors.
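A rough model of the first option (store the UTF8 flag and restore it), in the same hypothetical Python terms: remember each argument's flag before handing the buffers to gtk_init, then reattach it to the survivors. Because gtk_init only removes entries, a survivor can be matched to its original by identity; `bytearray` objects keep every C string a distinct object, standing in for comparing `char*` pointers in C. `PerlString` and `fake_gtk_init` are made-up stand-ins, and none of this is Gtk2's actual code.

```python
from dataclasses import dataclass


@dataclass
class PerlString:
    buf: bytes   # internal buffer
    utf8: bool   # the SvUTF8 flag


def fake_gtk_init(c_argv: list[bytearray]) -> list[bytearray]:
    # Hypothetical stand-in for gtk_init: drops one GTK+ option, keeps order.
    return [arg for arg in c_argv if arg != b"--g-fatal-warnings"]


def gtk2_init_keep_flags(argv: list[PerlString]) -> list[PerlString]:
    # Copy each buffer into its own object (like a fresh char*) and
    # remember the UTF8 flag that belongs to it.
    c_argv = [bytearray(sv.buf) for sv in argv]
    flags = [sv.utf8 for sv in argv]

    survivors = fake_gtk_init(c_argv)

    # gtk_init only removes entries, so walk the original array in order
    # and match survivors by identity (the analogue of pointer equality).
    result = []
    i = 0
    for arg in survivors:
        while c_argv[i] is not arg:
            i += 1
        result.append(PerlString(bytes(arg), flags[i]))
        i += 1
    return result


argv = [PerlString("é".encode("utf-8"), True),
        PerlString(b"--g-fatal-warnings", False),
        PerlString(b"plain", False)]
for sv in gtk2_init_keep_flags(argv):
    print(sv)
```

The matching step relies on the assumption, built into the model, that gtk_init never reorders the entries it leaves behind.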