Re: Unicode and C++



Nathan Myers wrote various stuff, then got to the interesting bit:

> > The C standard makes no guarantees about any of these, and
> > my copy of GCC (CVS snapshot from a few weeks ago, I think)
> > certainly doesn't do what I would consider the right thing.
> >
> >  L"utf8-text"
> >
> > Gives you a string where each 4-byte wide character contains
> > one byte of the UTF-8 string.
>
> This is exactly what a reasonable person expects.  If the characters
> are ASCII, that's also what UCS32 specifies, and exactly what you want.
>
> Non-ASCII string literals are almost always a mistake; they belong in
> a gettext archive.

A reasonable mono-lingual English reader may expect this. Most of the
population of planet Earth expects something different: that a UTF-8 string in
the (ASCII-compatible) source code ends up as a UCS32 string in memory. They
might also hope for the odd #pragma that lets the source encoding or target
encoding be selected. What they don't expect is what they actually get -
"English readers only. Tough luck to the rest." In my experience people spend
hours trying to figure out how to get strings in their own language into
source code before they finally discover that they can't. GCC is nearly 8-bit
clean (at least I have never found problems): you can put UTF-8, Big-5,
GB2312, and other encodings directly into ordinary byte-oriented strings, but only
plain ASCII into wide character strings. Am I the only one who thinks that's
dumb? For many people it's a good reason to avoid UCS32 and stick with a byte
stream encoding. The documentation needs a large:

A N G L O - P H I L E S      O N L Y

sign on the front!
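
In the meantime, the only portable way out I know of is to keep the non-ASCII
text in an ordinary byte-oriented (UTF-8) literal, which GCC passes through
untouched, and widen it yourself at run time. A minimal sketch, assuming a
4-byte wchar_t (as on GCC) and well-formed UTF-8 input; utf8_to_ucs4 is a
hypothetical helper, not a standard or library function:

/* Widen a UTF-8 byte string into a UCS-4 wchar_t buffer.
 * Assumes well-formed UTF-8 and a 4-byte wchar_t; returns the
 * number of characters written, not counting the terminator. */
#include <stddef.h>
#include <wchar.h>

static size_t utf8_to_ucs4(const char *src, wchar_t *dst, size_t dst_len)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    while (*s && n + 1 < dst_len) {
        wchar_t c;
        int extra;

        /* Classify the lead byte and pick up its payload bits. */
        if (*s < 0x80)      { c = *s;        extra = 0; }
        else if (*s < 0xE0) { c = *s & 0x1F; extra = 1; }
        else if (*s < 0xF0) { c = *s & 0x0F; extra = 2; }
        else                { c = *s & 0x07; extra = 3; }
        s++;

        /* Fold in the continuation bytes. */
        while (extra-- > 0 && (*s & 0xC0) == 0x80)
            c = (c << 6) | (*s++ & 0x3F);

        dst[n++] = c;
    }
    dst[n] = L'\0';
    return n;
}

int main(void)
{
    /* The bytes of this literal pass through GCC untouched (8-bit
     * clean), so it can carry UTF-8 even though L"..." cannot.
     * These six bytes are "ni hao" (U+4F60 U+597D). */
    static const char hello_utf8[] = "\xE4\xBD\xA0\xE5\xA5\xBD";
    wchar_t wide[16];

    utf8_to_ucs4(hello_utf8, wide, 16);
    return 0;
}

Clumsy, but at least it puts the conversion under your control instead of
relying on what the compiler decides to do with L"...".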

Steve
