On Mon, 2006-05-15 at 12:02 +0300, Guido Flohr wrote:
[snip]
> Why all the hassle? Why does libintl-perl not "respect" that utf-8 flag? The answer is a little off-topic, and therefore I only summarize the problem: The gettext API does not allow you to portably find out the character set of a string returned by gettext() (or ngettext, etc.). It doesn't even tell you whether a string has actually been translated or not. On the other hand, the API allows you to enforce a certain output character set by the use of bind_textdomain_codeset(), a relatively new function.
>
> Therefore, libintl-perl does the right thing(tm): Since the character set of the output of gettext() and friends is unknown, the library turns the utf-8 flag unconditionally off on these strings. However, if you have enforced a certain character set, you can override the library by unconditionally turning the flag on (or use an even smarter filter).
>
> A lot of hassle, but honestly, I don't understand why Gtk2 uses this flag at all in the first place. We can perfectly make do without in the C version, why make a difference in Perl?
gtk+ requires all strings to be utf8; widgets will croak if strings are not valid. All strings returned from gtk+ will be utf8 as well. String operators in perl will not work correctly on utf8-encoded strings if the utf8 flag is not set. Therefore we need to set the flag on output. And seeing that we set the flag on output, we might as well let perl handle the "upgrading" of strings to utf8 on input.

One might argue that the dual encoding setup in perl is a bad idea, but that really doesn't matter. Bad idea or not, it's there. And perl will break if you don't flag strings correctly. Obviously, this kind of problem doesn't exist in C, as C doesn't have any string operators...

I hope my answer makes sense. Feel free to catch me on irc if it does not.

./borup
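
To illustrate the point about string operators: a minimal sketch, using only core modules, of how length() behaves on the same utf8 bytes with and without the flag. The sample string and variable names are made up for illustration and are not taken from Gtk2 or libintl-perl.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: "caf" followed by the two-byte utf8
# encoding of U+00E9 (0xC3 0xA9). The utf8 flag is off.
my $bytes = "caf\xC3\xA9";

# Without the flag, perl counts every byte as one character.
my $byte_len = length $bytes;

# decode() returns a character string with the utf8 flag set,
# so length() now counts characters instead of bytes.
my $chars    = decode('UTF-8', $bytes);
my $char_len = length $chars;

printf "bytes: %d, chars: %d, flagged: %s\n",
    $byte_len, $char_len, utf8::is_utf8($chars) ? "yes" : "no";
# prints "bytes: 5, chars: 4, flagged: yes"
```

The same mismatch shows up in substr(), reverse(), regex character classes and so on, which is why strings coming out of gtk+ need to be flagged.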