Re: UTF characters from gettext not displaying properly




On Nov 11, 2004, at 9:46 PM, Erik Meitner wrote:

I am in need of assistance with creating an i18n enabled GTK2/Perl application.

i know next to nothing about gettext and internationalization and all that (except that "i18n" stands for :-), but thought i'd have a go at it.


Below is a simple test case that exhibits this behavior:

--- SNIP ---
[snip]
--- /SNIP ---

I run the program like this:
LANG=bg_BG ./i18ntest2.pl

All textual aspects of the MessageDialog are in Bulgarian Cyrillic,
but the message text is not. It displays:
Message: Òàçè å ïðåâîä

i didn't even get that, at first, until i figured out that you have to make a .mo from the .po. ;-)


This thread leads me to believe that there may be a bug in gettext
causing this:
http://mail.gnome.org/archives/gtk-perl-list/2004-June/msg00121.html
In the thread it is said that gettext is not marking returned UTF-8 as
such. I tried doing as the author did: using Unicode::MapUTF8 to
change the text back to ISO-8859-1. The results seem to indicate this
is not the problem.

appears to be the problem, indeed. i had a copy of the source for Locale::gettext 1.03 lying around (probably for podbrowser), and hacked at it for a second to make it spit out text marked as utf8...

---------------
--- gettext.xs.old      2004-11-13 14:51:38.450544824 -0500
+++ gettext.xs  2004-11-13 14:52:23.017769576 -0500
@@ -36,6 +36,10 @@
 char *
 gettext(msgid)
        char *          msgid
+    CLEANUP:
+       /* HACK: force utf-8.  this is not always correct.
+        * do not try this at home.  post no bills.  move along. */
+       SvUTF8_on (ST (0));

 char *
 dcgettext(domainname, msgid, category)
---------------

then when i ran the program as you showed, i got this:

---------------
homie:~$ LANG=bg_BG perl -I src/gettext-1.03/blib/lib/ -I src/gettext-1.03/blib/arch/ i18ntest.pl

WARNING **: Invalid UTF8 string passed to pango_layout_set_text() at i18ntest.pl line 27.
----------------

erm, looks like my hack messed things up, because the text returned by gettext is not actually utf8.
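
to see why, here is a rough sketch using nothing but the core Encode module. the byte string is an assumption: i rebuilt it from the garbled text in the dialog, guessing that a plain bg_BG locale uses CP1251 as its codeset (that is the usual glibc setup, but check yours with "locale charmap"). it tries the same bytes as strict UTF-8, as Latin-1 and as CP1251.

```perl
use strict;
use warnings;
use Encode qw(decode);

# ASSUMPTION: the bytes gettext returned under LANG=bg_BG, reconstructed
# from the garbled dialog text; they are not taken from the real .mo file.
my $raw = "\xD2\xE0\xE7\xE8 \xE5 \xEF\xF0\xE5\xE2\xEE\xE4";

# Strict UTF-8 decoding croaks on malformed input. FB_CROAK may modify
# its source string, so decode a copy and keep $raw intact.
my $copy = $raw;
my $ok = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };

# The same bytes read as Latin-1 (what perl assumes for an unmarked
# byte string) and as CP1251 (the guessed locale codeset).
my $as_latin1 = decode('iso-8859-1', $raw);
my $as_cp1251 = decode('cp1251', $raw);

binmode STDOUT, ':encoding(UTF-8)';
print "valid UTF-8: ", ($ok ? "yes" : "no"), "\n";
print "as Latin-1:  $as_latin1\n";
print "as CP1251:   $as_cp1251\n";
```

if the guess about CP1251 is right, the Latin-1 line is the garbage from the dialog and the CP1251 line is readable Bulgarian. the bytes are not UTF-8 at all, so switching the flag on with SvUTF8_on just hands pango an invalid string.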

then i thought, can't i force it to be?  what about LANG=bg_BG.UTF-8?

voila --- valid text in the message box.


so, yes, it looks like it's a gettext() bug. i have no idea how else you could work around it.
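
one thing that might be worth trying from the perl side, without patching the XS: decode whatever gettext returns using the codeset of the current locale. this is only a sketch and i have not run it against the test program. decode_message is a made-up helper name, and it assumes an unpatched Locale::gettext that returns raw bytes in the locale's codeset. Encode, POSIX and I18N::Langinfo are core modules.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_ALL);
use I18N::Langinfo qw(langinfo CODESET);
use Encode qw(decode find_encoding);

# Hypothetical helper: turn the raw bytes from gettext into a perl
# character string, which Gtk2 can then hand to pango safely.
sub decode_message {
    my ($bytes, $codeset) = @_;

    # No codeset given: ask the C library what the current locale uses.
    $codeset = langinfo(CODESET) unless defined $codeset;

    # Encode does not know this codeset name: return the bytes untouched.
    return $bytes unless find_encoding($codeset);

    return decode($codeset, $bytes);
}

# Pick up LANG / LC_* from the environment, as the test program would.
setlocale(LC_ALL, '');

# In the real program this would be something like
#   my $msg = decode_message(gettext("..."));
# Here the codeset is passed explicitly so the sketch runs anywhere.
my $msg = decode_message("\xD2\xE0\xE7\xE8 \xE5 \xEF\xF0\xE5\xE2\xEE\xE4", 'cp1251');

binmode STDOUT, ':encoding(UTF-8)';
print "Message: $msg\n";
```

the same helper should cope with LANG=bg_BG.UTF-8 too, since langinfo(CODESET) then reports UTF-8 and the bytes get decoded as such.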

--
Our enemies are innovative and resourceful, and so are we. They never stop thinking about new ways to harm our country and our people, and neither do we.
  -- President George W. Bush



