Re: char representation



On Mon, Feb 25, 2002 at 11:49:18AM -0500, Owen Taylor wrote:
> 
> Miroslaw Dobrzanski-Neumann <mne mosaic-ag com> writes:
> 
> > Hi all,
> > 
> > visiting some sources I saw code fragments that rely on an ASCII-like internal
> > character representation:
> > 
> > here just one I've picked up
> > gtype.c:537:  name_valid = (p[0] >= 'A' && p[0] <= 'Z') || (p[0] >= 'a' && p[0] <= 'z') || p[0] == '_';
> > 
> > the same in the pango package.
> > 
> > on systems such as IBM S/390 hosts running OS/390 (MVS) the internal character
> > representation is EBCDIC, for which p[0] >= 'A' && p[0] <= 'Z' is not a reliable
> > test for an uppercase letter.
> > 
> > Generally there is no guarantee that a character range (digits, letters,
> > ...) has consecutive values. There is also no guarantee for 'A' < 'Z'.
> > 
> > The correct way to deal with this is "isupper (c)" or something like this.
> > I guess g_ascii_isupper() would do this job perfectly, because it is based on
> > character attributes and not on their internal representation.
> 
> There are only two forms of 8-bit character representations we support
> now:
> 
>  ASCII with uninterpreted chars > 127. [E.g. g_ascii_*]
>  UTF-8                                 [g_utf8_*]
> 
> It's an important property that the valid strings for the second 
> are a subset of the first.
> 
> The g_ascii_* functions are meant to have a standard meaning
> for each given byte and not to magically change into 
> g_ebcdic_...; EBCDIC is not UTF-8 compatible.
> 
> I'm afraid that EBCDIC systems will just be shut out from using
> GTK+. I really don't think that is going to be a major limitation
> in GTK+'s success.

It is fine to say that GTK+'s contract is to use and accept only ASCII and UTF-8.
But if you want to stay portable under that contract, you must avoid string
literals such as "abc" and character constants such as 'A', because the compiler
or another tool may translate them into an unwanted internal representation.
For the example above, the correct way to do this could be
enum {
	...
	ASCII_A = 0x41,
	...
};
name_valid = (p[0] >= ASCII_A && p[0] <= ASCII_Z ...)
or even better
name_valid = (g_ascii_isalpha (p[0]) || p[0] == ASCII__)

For the string literal "abc"
gchar const literal[] = { ASCII_a, ASCII_b, ASCII_c, ASCII_NULL };
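
To make the idea concrete, here is a minimal sketch of the same check written
only with numeric ASCII code points, so the result does not depend on the
compiler's execution character set. All names in it (the ASCII_* constants,
ascii_name_start_valid, abc_ascii) are made up for illustration and are not
part of GLib.

```c
#include <stdbool.h>

/* Hypothetical constants: ASCII code points written as numbers, so an
 * EBCDIC compiler cannot translate them the way it would translate 'A'. */
enum {
  ASCII_NUL        = 0x00,
  ASCII_UPPER_A    = 0x41,
  ASCII_UPPER_Z    = 0x5A,
  ASCII_UNDERSCORE = 0x5F,
  ASCII_LOWER_A    = 0x61,
  ASCII_LOWER_B    = 0x62,
  ASCII_LOWER_C    = 0x63,
  ASCII_LOWER_Z    = 0x7A
};

/* True if the byte c, interpreted as ASCII, is a letter or an underscore.
 * The range tests are safe here because the ASCII letter ranges are
 * contiguous by definition of the encoding, whatever the host uses. */
static bool
ascii_name_start_valid (unsigned char c)
{
  return (c >= ASCII_UPPER_A && c <= ASCII_UPPER_Z)
      || (c >= ASCII_LOWER_A && c <= ASCII_LOWER_Z)
      || c == ASCII_UNDERSCORE;
}

/* The string "abc" as ASCII bytes, independent of how the compiler
 * would encode the literal "abc". */
static const unsigned char abc_ascii[] = {
  ASCII_LOWER_A, ASCII_LOWER_B, ASCII_LOWER_C, ASCII_NUL
};
```

With this, ascii_name_start_valid (0x41) is true on any host, while the byte
0xC1 (the EBCDIC code for 'A') is rejected, which is the behaviour you want
if the contract is ASCII/UTF-8 only.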

Regards
-- 
Miroslaw Dobrzanski-Neumann

MOSAIC SOFTWARE AG
Abteilung Basisentwicklung und Forschung
Tel +49-2225-882-291
Fax +49-2225-882-201
E-mail: mne mosaic-ag com
