Re: String Encoding in Gtk-Perl: Hoping for Clarification




On 2010-05-09 08:41, Emmanuel Rodriguez wrote:
Gtk2-Perl also returns all strings in UTF-8. If you want to see if a
string is in UTF-8 or not I suggest that you use Devel::Peek [1]
 

    my $button = Gtk2::Button->new( "print" );
    my $entry  = Gtk2::Entry->new;
    $button->signal_connect( "clicked", sub { print $entry->get_text, "\n" } );

I'm certain that $entry->get_text returns a UTF-8 string. I'm guessing
that the problem lies in the encoding of STDOUT.

I am not concerned with the Perl-internal representation of the string
(looking with Devel::Peek as you suggested, it seems that internally it
is indeed stored as utf8 data). When I access the data, I get 8-bit
Latin-1 data, long before any file I/O is involved. In the example above,
when I print the data using

 print map( {sprintf("%X ", $_) } unpack("C*", $entry->get_text )), "\n"

and enter "öä" (LATIN SMALL LETTER O WITH DIAERESIS,
LATIN SMALL LETTER A WITH DIAERESIS) I get "F6 E4".

I just managed to get an old system up and running again (Perl 5.8.5,
Gtk2-Perl 1.144) and there the same little program will print
"C3 B6 C3 A4".
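
For reference, a difference like this may come from Perl itself rather
than from Gtk2-Perl: perl5100delta documents that pack() and unpack()
process UTF-8-flagged strings character by character since Perl 5.10,
where 5.8 worked on the underlying bytes. Whether that is the cause here
is not confirmed. A minimal sketch of a dump that does not depend on
this, assuming the Encode module is available (it ships with Perl 5.8);
hex_octets is a hypothetical helper name and $text is a stand-in for
$entry->get_text:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode the text explicitly, then dump the octets.
# encode() returns a plain byte string, so unpack("C*") sees exactly
# these octets on old and new Perl versions alike.
sub hex_octets {
    my ($text, $encoding) = @_;
    my $octets = encode($encoding, $text);
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $octets;
}

# Stand-in for $entry->get_text: o with diaeresis, a with diaeresis.
my $text = "\x{F6}\x{E4}";

print hex_octets($text, 'UTF-8'), "\n";       # C3 B6 C3 A4
print hex_octets($text, 'ISO-8859-1'), "\n";  # F6 E4
```

Choosing the encoding explicitly makes the output a property of the
text, not of the Perl version that runs the program.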

It's not like I prefer one over the other, but I am pretty sure that
this would be the case on all older systems (my original program that
brought up this issue was many years old; in this program there is a
comparison between a string that is known to be ISO-8859-1 and a string
from GTK; to get correct results, I always had to explicitly convert the
latter).

So obviously there was a change somewhere, and I would like to know when
and where this change occurred, so that I can adjust my program to run
equally well on old and new systems (my current solution, checking
whether the data I get looks like utf8, is quite insane, because the
result does not depend on user input but on the system the program is
running on ...).
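
One possible way to avoid guessing, sketched under the assumption that
the known string really holds raw ISO-8859-1 octets and that the GTK
string is a Perl character string (as the Devel::Peek output suggests):
decode the Latin-1 octets once and compare characters with eq. The name
same_text is hypothetical, and $from_gtk is a stand-in for
$entry->get_text.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: compare raw ISO-8859-1 octets with a character
# string. After decoding, eq compares characters, so the internal
# representation of either string does not matter.
sub same_text {
    my ($latin1_octets, $chars) = @_;
    my $decoded = decode('ISO-8859-1', $latin1_octets);
    return $decoded eq $chars ? 1 : 0;
}

# Stand-in for $entry->get_text: decoded from the UTF-8 octets C3 B6 C3 A4.
my $utf8_octets = "\xC3\xB6\xC3\xA4";
my $from_gtk    = decode('UTF-8', $utf8_octets);

print same_text("\xF6\xE4", $from_gtk) ? "equal\n" : "different\n";
```

The comparison then depends only on the characters entered, not on the
system the program runs on.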

Regards,
                              Peter



