Re: Character normalization ?
- From: Owen Taylor <otaylor redhat com>
- To: Joel Becker <jlbec evilplan org>
- Cc: Daniel Veillard <veillard redhat com>, gnome-hackers gnome org
- Subject: Re: Character normalization ?
- Date: Mon, 25 Mar 2002 17:01:31 -0500 (EST)
Joel Becker <jlbec evilplan org> writes:
> On Mon, Mar 25, 2002 at 04:03:37PM -0500, Daniel Veillard wrote:
> > Hum, by the way, now that we have a decent internationalized
> > framework, one of the annoyances of Unicode is character normalization,
>
> Ok, this isn't quite character normalization, but it is
> normalization nonetheless. A problem I noticed while trying to run
> en_US.UTF-8 and en_US.ISO-8859-1.
> Here's the issue. glibc normalizes the encoding name by
> stripping all '-' characters and lowercasing all alphabetic characters.
> So, UTF-8 becomes utf8 and ISO-8859-1 becomes iso88591 (see
> glibc/intl/l10nflist.c:_nl_normalize_codeset()). However, X does not.
> X expects specific encoding names. You can see these in
> /usr/X11R6/lib/X11/locale/. X expects UTF-8 to be spelled UTF-8 and
> ISO-8859-1 to be spelled iso8859-1.
> As it currently stands, GDM for "English" sets en_US.ISO-8859-1
> (IIRC, it's been a month). This spelling normalizes properly for glibc,
> but does not work at all under X. All apps in X give the usual "falling
> back to C" error. I was wondering if anyone had given any thought to
> this issue, either making X normalize names or having gdm and/or glib
> think about name normalization. The value GDM sets may, of course, not
> come from GDM directly.
> Someone in the past (I think it was Owen) guaranteed that Red
> Hat tested all combinations and made sure they worked. My machine is
> Debian, so I cannot speak to that. However, I do see this issue and I
> expect it to be an issue we will see later. Thoughts?
This is purely an X configuration issue. There is a standard for what
should be used for codeset names on Linux:
http://www.li18nux.org/subgroups/sa/locnameguide/index.html
If your X doesn't support these names, it needs to be fixed :-)
(Various people have had plans to make X flexible for character set
names the same way glibc is, but nobody has ever gotten around to
doing it, so for now it's a matter of locale.alias munging. The libc
normalization to iso88591 is not meant to mean that iso88591 is the
real name, it's just a strategy for matching.)
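A minimal sketch of that matching strategy, assuming only the two rules described in the quoted message (drop '-' characters, lowercase the letters). This is not glibc's actual code, and the function names normalize_codeset and same_codeset are made up for illustration. The point is that the normalized string is only a comparison key, never a name to hand to X:

```python
def normalize_codeset(name: str) -> str:
    """Build a comparison key: strip '-' and lowercase (simplified rule)."""
    return name.replace("-", "").lower()


def same_codeset(a: str, b: str) -> bool:
    """Two spellings match if their comparison keys are equal."""
    return normalize_codeset(a) == normalize_codeset(b)


if __name__ == "__main__":
    # The glibc-style spelling and the X-style spelling compare equal
    # under this rule, even though neither key is a "real" codeset name.
    print(normalize_codeset("UTF-8"))
    print(same_codeset("ISO-8859-1", "iso8859-1"))
```

A matcher like this accepts either spelling on input, which is why the same locale setting can work for glibc and still fail in an X that compares names literally.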
To the extent GLib has standard encoding names, they are the ones
libiconv/libcharset use, and are more or less the same, though
glancing at the table on li18nux.org, there are a few discrepancies,
like GB2312, instead of GB-2312.
Regards,
Owen