Re: zh_TW.Big5.po: illegal control sequence

From: Pablo Saratxaga <pablo mandrakesoft com>
To: Pavel Roskin <proski gnu org>
Cc: Pablo Saratxaga <pablo mandrakesoft com>, Martin Bialasinski <martin internet-treff uni-koeln de>, mc-devel gnome org, Jing-Jong Shyue <shyue sonoma com tw>
Subject: Re: zh_TW.Big5.po: illegal control sequence
Date: Tue, 22 May 2001 23:38:41 +0200

Kaixo!

On Tue, May 22, 2001 at 01:05:59PM -0400, Pavel Roskin wrote:

> > > I also tested the charset translation with 0.10.37. MC works fine with
> > > LANG=ru_RU.ISO8859-5 on rxvt with an iso-8859-5 font. But the hints are
> > > displayed incorrectly, because they are not using gettext.
> >
> > what hints, the icon ones? That is done in the *.desktop files, and
> > needs a different handling.
> 
> No, MC is a text program running on a terminal and displaying nice blue
> panels :-)
> 
> Hints are text messages that appear above the command line by default.
> They are loaded from files with names like mc.hint.es and mc.hint.ru,
> dependent on the locale. The charset is not specified in those files.

Ah ok.
So, my solution for that problem: use utf-8.
When a charset isn't specified through a special keyword or header, then
it must be utf-8. Then mc can easily convert the hints to the user's encoding
(with a fallback to simply output the text as is if the conversion fails;
that still allows hints in non-utf-8 encodings to be displayed, so translators
can convert to utf-8 over a transitional period rather than rushing to it).
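
A minimal sketch of that fallback, in Python rather than in mc's own C
code. The function name decode_hint and its parameters are hypothetical,
they are not taken from the mc sources:

```python
def decode_hint(raw: bytes, user_encoding: str) -> bytes:
    """Return the bytes to write to the terminal for one hint.

    Assumption: hint files are meant to be UTF-8. Anything that cannot be
    converted is treated as a legacy file and passed through unchanged.
    """
    try:
        # Interpret the file content as UTF-8, then re-encode it for the user.
        return raw.decode("utf-8").encode(user_encoding)
    except (UnicodeDecodeError, UnicodeEncodeError):
        # Conversion failed: output the hint as is.
        return raw
```

With this approach a file that is still in KOI8-R, say, is shown unchanged
instead of being rejected, which is what makes a transitional period possible.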

If you tell me where in the sources the code that displays the hints is,
I can make the needed changes.
I can also convert the files to utf-8 (and check that the conversion is done
right) and add a README file explaining why those files must
be in utf-8.

> > Imho it should be done like this (I already successfully used that method
> > with other programs):
> >
> > - change all the files to use utf-8 encoding
> 
> Perhaps it's an overkill for a text-only program. Maybe not, I don't know.

No, it's not overkill, the problem concerns text displaying only (be it on
an xterm, a virtual console or in a gtk widget, it is just text being
displayed).

> > > Unfortunately, gmc doesn't work correctly with LANG=ru_RU.ISO8859-5. Maybe
> > > it's fixed in the head branch of Gtk+.
> 
> Actually, it doesn't work on one machine (RedHat 7.1), but it does
> work on the other (RedHat 7.0). It looks great!

> > what is the contents of /etc/gtk/gtkrc.ru* and /usr/share/gtk/gtkrc.ru* ?
> 
> The contents is pretty standard. But I found /etc/gtk/gtkrc.ru_RU.iso88595
> from gtk+-1.2.10-ximian.8. That must be the reason why it works.

Indeed.

For Gtk three things are needed:

- the libc locale must exist (libc does canonicalization and smart substitution,
  e.g., if no /usr/share/locale/ru_RU.ISO8859-5 exists, it will look
  at /usr/share/locale/ru_RU then /usr/share/locale/ru)
- libX11 must know which XLC_LOCALE file to read, and, sadly, XFree86
  is very lame at it (ru_RU.ISO8859-5 won't be found if ru_RU.ISO-8859-5
  is given, for example, or even if ru_RU.iso8859-5 is given... it could easily
  be fixed on systems where it is possible to ask the libc for the user's
  encoding; so probably a future XFree86 will make this easier and more reliable)
- there must be a gtkrc with the proper fontset.
  gtk looks first at gtkrc.$LANG (well, first LC_ALL, then LC_CTYPE, etc),
  then at gtkrc.ll_CC.xxxxx with ll=language, CC=country, and xxxxx being
  the encoding canonicalized to all lowercase and only letters and numbers
  (so ru_RU.ISO8859-5 --> ru_RU.iso88595), then gtkrc.ll_CC, then gtkrc.ll

Of course, all three have to relate to the same encoding; that is sometimes
the difficult part (gtk 1.2.* is at the end of its life, but maybe a small
patch would be to first ask the libc for the encoding and then look for
gtkrc.encoding, and only if that fails look for gtkrc.$LANG etc.
If XFree86 is patched the same way, then the same encoding will be matched
in all three parts, and a big problem will be solved).
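
To make the gtkrc lookup order concrete, here is a rough sketch in Python.
It is not gtk's actual code; the function name gtkrc_candidates is
hypothetical, and it assumes a locale name of the form ll_CC.encoding
without an @modifier:

```python
import re

def gtkrc_candidates(locale_name):
    """List gtkrc file names in the lookup order described in this mail."""
    candidates = ["gtkrc." + locale_name]  # gtkrc.$LANG comes first
    m = re.fullmatch(r"([a-z]+)(?:_([A-Z]+))?(?:\.(.+))?", locale_name)
    if m is None:
        return candidates
    lang, country, encoding = m.groups()
    ll_cc = lang + "_" + country if country else lang
    if encoding:
        # Encoding reduced to lowercase letters and digits only.
        canon = re.sub(r"[^a-z0-9]", "", encoding.lower())
        candidates.append("gtkrc." + ll_cc + "." + canon)
    if country:
        candidates.append("gtkrc." + ll_cc)
    candidates.append("gtkrc." + lang)
    # Drop duplicates while keeping the order.
    return list(dict.fromkeys(candidates))
```

For ru_RU.ISO8859-5 this gives gtkrc.ru_RU.ISO8859-5, gtkrc.ru_RU.iso88595,
gtkrc.ru_RU and gtkrc.ru, in that order.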

> > So, what is better you think, use big5 encoding, and requiring new gettext;
> > or using utf-8, and requiring on some systems to use internal gettext
> > (but on those systems, don't they also need GNU gettext to compile the po
> > files?).
> 
> Changing the charset is a political issue. Upgrading gettext is not. Thus
> I prefer the latter.
> 
> As much as I would like to see all people in the world using the same
> encoding, I realize that we here cannot do it by applying iconv to the
> files that we didn't write.

It depends. If the file has a way to tell the encoding it uses (which is the
case for *.po files, html, email and news, etc) then anyone can use whatever
encoding he wants, and just declare it in the appropriate header.
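
For *.po files that declaration is the charset= parameter of the
Content-Type line in the header entry. A small sketch of reading it, in
Python; the function name po_charset is hypothetical, and the utf-8 default
is my proposal from above, not something gettext itself does:

```python
import re

def po_charset(header, default="utf-8"):
    """Return the charset declared in a PO header, or a default if none is."""
    # The header entry contains a line such as:
    #   "Content-Type: text/plain; charset=Big5\n"
    m = re.search(r"charset=([A-Za-z0-9_.:-]+)", header, re.IGNORECASE)
    return m.group(1) if m else default
```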

If the file format has no way to tell that info, then you have only three
possibilities:
- force the translator to use utf-8, in order to let the user choose the
  encoding he wants
- let the translator choose the encoding, which means the user is forced to 
  use the same encoding (and first to figure out what encoding it is, as
  it is not announced)
- force both the translator and the users to switch to another file format
  that is able to tell the charset info.

I prefer the first solution; it involves the least pain (only one person
involved).

> 
> -- 
> Regards,
> Pavel Roskin

-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://www.srtxg.easynet.be/		PGP Key available, key ID: 0x8F0E4975

