Re: glade localization woes



On Wed, Jun 16, 2004 at 10:52:00 +0200, Thierry Vignaud wrote:
Jan Hudec <bulb ucw cz> writes:

It seems that there is a bug in gettext. Perl has two types of
strings -- octet streams and unicode. It's a mess,
sometimes. Read the perlunicode manpage to find more about
it. Now what really should be done is forcing gettext to mark
the string as a unicode one if it is one.

Is it anything I'm doing wrong, or is it a Perl Glade bug (which
is what I'm inclined to believe)?

No. It's a gettext bug. It returns a unicode string, but does not mark
it as such. And all the glib-perl/gtk2-perl/gtk2-glade-perl stuff
honors that mark.

gettext doesn't have to tag a perl string as utf8, since it's not aware
of perl internals and you cannot expect it to do so.

The binding is aware of the internals and should properly recode and
mark the string. The C interface to gettext function returns just
character arrays.

nice but there's no gettext binding (at least not in perl-Gtk2).

I hoped you were talking about the interface there is --
Locale::gettext. I just looked at the source, though, and it does not
seem to do anything about the unicode mess. So it may well not work
right. Well, then just use the utf8:: utility functions. No need to
invent all the CRAP.

you need to call some_module::bind_textdomain_codeset("mydomain", 'UTF8');
in order to get utf8 strings

That's a bug in the bindings. The binding of gettext SHOULD
*convert* the string to utf-8 from whatever gettext returns and mark
it as utf-8 string. After all, in other languages that only have
utf-8 strings it has to do the same.

no, read the gettext API: the program has to tell gettext in which
encoding it expects strings, otherwise gettext assumes the program will
use the encoding specified by the current locale.
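
For reference, the C-level calls under discussion look roughly like the
following. This is only a minimal sketch, assuming a libintl such as the
one in glibc; the helper name translate_utf8 and its arguments are made up
for illustration and come from neither common.pm nor any existing binding.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper (not part of any binding discussed here): look up
 * msgid in `domain`, asking gettext to hand back UTF-8 regardless of the
 * encoding of the current locale. */
const char *translate_utf8(const char *domain, const char *localedir,
                           const char *msgid)
{
    assert(domain != NULL && msgid != NULL);

    /* Pick up LANG/LC_* from the environment so a language is selected. */
    setlocale(LC_ALL, "");

    /* Tell gettext where the .mo catalogs for this domain are installed. */
    if (localedir != NULL)
        bindtextdomain(domain, localedir);

    /* Without this call, gettext converts to the locale's own encoding. */
    bind_textdomain_codeset(domain, "UTF-8");

    /* Returns the translation, or msgid itself if none is found. */
    return dgettext(domain, msgid);
}
```

Called as translate_utf8("mydomain", "/usr/share/locale", "Hello"), where
the directory is just an assumed install path, it returns a plain char
array. Nothing at this level can mark the result as a perl unicode string;
that part is left to whatever sits on the perl side.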

Yes. And since perl knows about unicode strings, it should always tell
gettext it wants unicode. Though, well, the I18N support in perl is
lacking, unfinished and broken. It's a horror to get perl to properly
honor locales.

Well, actually, we should have a Glib::I18N module taking care of it,
because of how broken and inconsistent the support in perl is.

then, if needed, gettext will do the conversion between the encoding
used by the translator and the one expected by the program

with this function defined in an xs file as:

==================================================>
char *                                           >
bind_textdomain_codeset(domainname, codeset)     >
   char * domainname                             >
   char * codeset                                >
==================================================>

That's nice. But:
    1) Undocumented

just look at gettext doc

Sorry, found it now.

    2) Shouldn't be -- the bindings should set utf-8 behind the
scenes.

a) about common.pm, it had to cover both gtk+-1 and gtk+-2 since until
   a few months ago, we still had a few gtk+1 apps using these
   modules.

That explains a lot.

b) the gtk+2 perl binding does not include any gettext support. it just
   expects you to pass it unicode strings.

   BUT gettext will give it strings encoded in the *locale encoding*
   (the value of `nl_langinfo (CODESET)', which depends on
   `LC_CTYPE')

   just RTFM !
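
To see which encoding that is on a given system, something along these
lines could be used. It is a sketch only; locale_codeset is a made-up
helper name, and the only real calls are setlocale and nl_langinfo.

```c
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: report the encoding gettext falls back to when
 * bind_textdomain_codeset() was never called for a domain. */
const char *locale_codeset(void)
{
    /* LC_CTYPE is taken from the environment (LC_ALL, LC_CTYPE, LANG). */
    setlocale(LC_CTYPE, "");

    /* Name of the character encoding of the LC_CTYPE locale. */
    return nl_langinfo(CODESET);
}
```

The exact name returned depends on the C library and on the locale in
effect, so a program that needs UTF-8 should not rely on it and should
request the codeset explicitly.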

That's OK -- for the C version. But not for the bindings. The bindings
can simply:

    * Use bind_textdomain_codeset to set utf-8 in the binding of
      bindtextdomain -- there is no point in not using unicode.
      (note: bind_textdomain_codeset is broken -- it should take an
      empty domain as "override for each and every domain", but it
      does not)
    * Mark the return value of the gettext function as a utf-8 string
      (it was forced to be one)

Perhaps we should have our own bindings in Glib-perl, that will do just
this...


and then you've to tag strings as utf8

see N() implementation in common.pm from drakx installer and tools:

The c:: package does not exist. That's pretty useless.
The code is damn complicated.

the c package is available in our cvs at
http://cvs.mandrakesoft.com/cgi-bin/cvsweb.cgi/gi/perl-install/

You didn't mention it before...

-------------------------------------------------------------------------------
                                                 Jan 'Bulb' Hudec <bulb ucw cz>

