Re: Character normalization ?



Joel Becker <jlbec evilplan org> writes:

> On Mon, Mar 25, 2002 at 04:03:37PM -0500, Daniel Veillard wrote:
> >   Hum, by the way, now that we have a decent internationalized
> > framework, one of the annoyances of Unicode is character normalization,
> 
> 	Ok, this isn't quite character normalization, but it is
> normalization nonetheless.  It is a problem I noticed while trying to
> run en_US.UTF-8 and en_US.ISO-8859-1.
> 	Here's the issue.  glibc normalizes the encoding name by
> stripping all '-' characters and lowercasing all alphabetic characters.
> So, UTF-8 becomes utf8 and ISO-8859-1 becomes iso88591 (see
> glibc/intl/l10nflist.c:_nl_normalize_codeset()).  However, X does not.
> X expects specific encoding names.  You can see these in
> /usr/X11R6/lib/X11/locale/.  X expects UTF-8 to be spelled UTF-8 and
> ISO-8859-1 to be spelled iso8859-1.
> 	As it currently stands, GDM for "English" sets en_US.ISO-8859-1
> (IIRC, it's been a month).  This spelling normalizes properly for glibc,
> but does not work at all under X.  All apps in X give the usual "falling
> back to C" error.  I was wondering if anyone had given any thought to
> this issue, either making X normalize names or having gdm and/or glib
> think about name normalization.  The value GDM sets may, of course, not
> come from GDM directly.
> 	Someone in the past (I think it was Owen) guaranteed that Red
> Hat tested all combinations and made sure they worked.  My machine is
> Debian, so I cannot speak to that.  However, I do see this issue and I
> expect it to be an issue we will see later.  Thoughts?

This is purely an X configuration issue. There is a standard for what
should be used for codeset names on Linux:

 http://www.li18nux.org/subgroups/sa/locnameguide/index.html

If your X doesn't support these names, it needs to be fixed :-)

(Various people have had plans to make X flexible for character set
names the same way glibc is, but nobody has ever gotten around to
doing it, so for now it's a matter of locale.alias munging. The libc
normalization to iso88591 is not meant to mean that iso88591 is the
real name, it's just a strategy for matching.)
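
To make the "strategy for matching" point concrete, here is a rough
sketch in Python. It is not glibc's code; it only follows the rule as
described in the quoted message (drop '-' and lowercase), and the
function names are made up for illustration. The normalized string is
used only as a comparison key, never as the name handed on to anyone
else.

```python
def normalize_codeset(name: str) -> str:
    """Build a comparison key: drop '-' and lowercase the rest.

    Assumption: this mirrors the rule as described in the quoted
    message, not the actual _nl_normalize_codeset() source.
    """
    return name.replace("-", "").lower()


def same_codeset(a: str, b: str) -> bool:
    """Two spellings match if their comparison keys are equal."""
    return normalize_codeset(a) == normalize_codeset(b)


if __name__ == "__main__":
    for spelling in ("UTF-8", "ISO-8859-1", "iso8859-1"):
        print(spelling, "->", normalize_codeset(spelling))
    # Different spellings of the same codeset compare equal.
    print(same_codeset("ISO-8859-1", "iso8859-1"))
```

With matching done this way, en_US.ISO-8859-1 and en_US.iso8859-1 would
select the same locale data, which is the flexibility X currently lacks.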

To the extent GLib has standard encoding names, they are the ones
libiconv/libcharset use, and are more or less the same, though
glancing at the table on li18nux.org, there are a few discrepancies,
like GB2312, instead of GB-2312.

Regards,
                                        Owen


