Re: Character normalization ?



On Mon, Mar 25, 2002 at 04:03:37PM -0500, Daniel Veillard wrote:
>   Hum, by the way, now that we have a decent internationalized
> framework, one of the annoyances of Unicode is character normalization,

	Ok, this isn't quite character normalization, but it is
normalization nonetheless.  It is a problem I noticed while trying to
run en_US.UTF-8 and en_US.ISO-8859-1.
	Here's the issue.  glibc normalizes the encoding name by
stripping all '-' characters and lowercasing all alphabetic characters.
So, UTF-8 becomes utf8 and ISO-8859-1 becomes iso88591 (see
glibc/intl/l10nflist.c:_nl_normalize_codeset()).  However, X does not.
X expects specific encoding names.  You can see these in
/usr/X11R6/lib/X11/locale/.  X expects UTF-8 to be spelled UTF-8 and
ISO-8859-1 to be spelled iso8859-1.
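	A rough sketch of the rule as described above, in Python.  This
is not the glibc source: the name normalize_codeset is hypothetical, and
the function implements only the strip-'-'-and-lowercase behaviour
stated here.  The real _nl_normalize_codeset() may differ in detail.

```python
def normalize_codeset(name: str) -> str:
    """Drop every '-' and lowercase the rest, per the description above.

    Hypothetical helper for illustration; the real logic lives in
    glibc's _nl_normalize_codeset() and may differ in detail.
    """
    return name.replace("-", "").lower()


# Differently spelled names collapse to the same key after normalization.
for codeset in ("UTF-8", "utf8", "ISO-8859-1"):
    print(codeset, "->", normalize_codeset(codeset))
```

Because several spellings collapse to one key, glibc accepts names that a
literal string comparison, as X appears to do, would reject.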
	As it currently stands, GDM for "English" sets en_US.ISO-8859-1
(IIRC, it's been a month).  This spelling normalizes properly for glibc,
but does not work at all under X.  All apps in X give the usual "falling
back to C" error.  I was wondering if anyone had given any thought to
this issue, either making X normalize names or having gdm and/or glib
think about name normalization.  The value GDM sets may, of course, not
come from GDM directly.
	Someone in the past (I think it was Owen) guaranteed that Red
Hat tested all combinations and made sure they worked.  My machine is
Debian, so I cannot speak to that.  However, I do see this issue and I
expect it to be an issue we will see later.  Thoughts?

Joel

-- 

"You don't make the poor richer by making the rich poorer."
	- Sir Winston Churchill

			http://www.jlbec.org/
			jlbec evilplan org


