Re: UTF-8 Functions

From: Pablo Saratxaga <pablo mandrakesoft com>
To: gtk-i18n-list gnome org
Subject: Re: UTF-8 Functions
Date: Mon, 2 Jul 2001 10:16:31 +0200

Kaixo!

On Sun, Jul 01, 2001 at 09:16:41PM -0400, Owen Taylor wrote:

> I've now added the following functions to GLib. I'm pretty happy with
> them as encapsulating the basic operations of this nature in a simple
> manner.

BTW, this is not really related to GLib (IMHO it belongs at a lower level,
in libc), but for some days now I have been facing the problem that grep and
similar tools treat [x-y] style ranges as case insensitive in locales other
than 'C', whereas previously [A-Z] was distinct from [a-z].

The change in behaviour came about because previously strcmp() was used to
check whether a char was in the range, but now strcoll() is used. IMHO that
is wrong; the right function would have been an mbstrcmp(), which doesn't exist.
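
To make the difference concrete, here is a minimal sketch of a range check
parameterised on the comparison function. The helper in_range() and the
cmp_fn typedef are made up for illustration; this is not the actual grep or
libc code, only my understanding of the idea.

```c
#include <string.h>

/* Hypothetical helper type: any strcmp()-like comparison function. */
typedef int (*cmp_fn)(const char *, const char *);

/* Hypothetical helper: is the single-byte character c inside [lo-hi]
 * according to the comparison function cmp?
 * With strcmp() the test is on byte values; with strcoll() it follows
 * the collation order of the current LC_COLLATE locale. */
static int in_range(cmp_fn cmp, char lo, char hi, char c)
{
    char l[2] = { lo, '\0' };
    char h[2] = { hi, '\0' };
    char s[2] = { c,  '\0' };

    return cmp(l, s) <= 0 && cmp(s, h) <= 0;
}
```

With strcmp(), in_range(strcmp, 'A', 'Z', 'b') is false in every locale.
With strcoll() the answer depends on the locale chosen with setlocale(): in
the 'C' locale it matches strcmp(), but in a locale whose collation
interleaves upper and lower case, 'b' can land inside [A-Z].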

So, I wonder, would it be worth having such a function in a standard?
And should it be a multibyte version of strcmp(), that is, one that only
compares the numeric values of character codepoints (and in that case, should
the values be invariant (e.g. their Unicode values) or encoding specific?),
or should it do some locale-specific things (e.g. does [b-d] include
c-cedilla in French?)?
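
As a rough sketch of the first option, a codepoint-comparing mbstrcmp() could
be built on mbrtowc(). The name mbstrcmp() and the fallback to strcmp() on
undecodable input are my own assumptions, not an existing libc interface.
Note that the wchar_t values compared here are only guaranteed to be Unicode
values on systems that define __STDC_ISO_10646__; elsewhere they are
implementation specific.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical mbstrcmp(): compare two multibyte strings, in the encoding
 * of the current LC_CTYPE locale, character by character using the decoded
 * wchar_t values. Returns <0, 0 or >0 like strcmp(). No collation rules
 * are applied. */
static int mbstrcmp(const char *a, const char *b)
{
    const char *pa = a, *pb = b;
    size_t la = strlen(a) + 1;      /* bytes left, terminator included */
    size_t lb = strlen(b) + 1;
    mbstate_t sa, sb;

    memset(&sa, 0, sizeof sa);      /* start in the initial shift state */
    memset(&sb, 0, sizeof sb);

    for (;;) {
        wchar_t wa, wb;
        size_t na = mbrtowc(&wa, pa, la, &sa);
        size_t nb = mbrtowc(&wb, pb, lb, &sb);

        /* Assumption: on invalid or incomplete sequences, fall back to a
         * plain byte comparison of the whole strings. */
        if (na == (size_t)-1 || na == (size_t)-2 ||
            nb == (size_t)-1 || nb == (size_t)-2)
            return strcmp(a, b);

        if (wa != wb)
            return wa < wb ? -1 : 1;
        if (wa == L'\0')            /* both strings ended together */
            return 0;

        pa += na; la -= na;
        pb += nb; lb -= nb;
    }
}
```

A range test such as [A-Z] would then check mbstrcmp("A", s) <= 0 and
mbstrcmp(s, "Z") <= 0, which keeps upper and lower case apart regardless of
the collation order of the locale.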

And are [x-y] style ranges defined somewhere in a standard?

Do you have some ideas about that, or know where it can be better discussed?

Thanks

-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://www.srtxg.easynet.be/		PGP Key available, key ID: 0x8F0E4975

References:
- UTF-8 Functions
  - From: Owen Taylor
