Re: strcasecmp/tolower/toupper breakage



George <jirka 5z com> writes: 
> I suggest we get ascii only versions into glib 2.  In fact I suggest
> g_strcasecmp and g_strncasecmp work as ascii only, since there doesn't seem
> to be any legitimate reason for use of a locale specific strcasecmp (again,
> strcoll should be used).
> 
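
To make the proposal above concrete, here is a minimal sketch of what
ASCII-only comparison functions could look like. The names
ascii_strcasecmp() and ascii_strncasecmp() are made up for this
illustration and are not claimed to be glib API; the only assumption is
that bytes outside 'A'..'Z' must never be touched.

```c
#include <stddef.h>

/* Lowercase a byte only if it is ASCII 'A'..'Z'. Every other byte,
 * including the high bytes of a UTF-8 sequence, passes through unchanged. */
static unsigned char ascii_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Hypothetical name: locale-independent, ASCII-only strcasecmp(). */
int ascii_strcasecmp(const char *s1, const char *s2)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    while (*a && ascii_lower(*a) == ascii_lower(*b)) {
        a++;
        b++;
    }
    return (int)ascii_lower(*a) - (int)ascii_lower(*b);
}

/* Hypothetical name: same comparison, limited to the first n bytes. */
int ascii_strncasecmp(const char *s1, const char *s2, size_t n)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    for (; n > 0; a++, b++, n--) {
        unsigned char ca = ascii_lower(*a);
        unsigned char cb = ascii_lower(*b);

        if (ca != cb)
            return (int)ca - (int)cb;
        if (ca == 0)
            break;
    }
    return 0;
}
```

Because the result never depends on setlocale(), this kind of function
suits protocol keywords, header names, and file-format tokens. It
deliberately treats accented letters as distinct, so it is not a
substitute for user-visible sorting.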

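It's far worse than you think: strcoll() compares strings in the
locale's encoding, so it doesn't work on UTF-8 strings unless the
locale's encoding happens to be UTF-8. What's needed is a UTF-8
strcoll() implementation.

We punted this out of glib 2 because it's really hard to implement. :-(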

The cheesy way is to setlocale() to the current locale, convert the
strings to the locale encoding, compare them with strcoll(), and
restore the previous locale. But that's not thread safe and it's butt
slow, so not really acceptable.
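
For reference, the save/switch/compare/restore dance could be sketched
roughly as below. This is only an illustration under assumptions: the
function name cheesy_collate() is invented, and the step that converts
UTF-8 to the locale encoding is left to the caller and not shown.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Compare two strings that the caller has ALREADY converted to the
 * encoding of the user's locale (conversion step not shown).
 * Not thread safe: setlocale() changes process-wide state. */
int cheesy_collate(const char *a, const char *b)
{
    int result;
    char *saved = NULL;
    const char *current = setlocale(LC_COLLATE, NULL);

    /* A later setlocale() call may overwrite the returned string,
     * so keep a private copy of the current locale name. */
    if (current != NULL) {
        saved = malloc(strlen(current) + 1);
        if (saved != NULL)
            strcpy(saved, current);
    }

    /* Switch collation to the locale named by the environment. If this
     * fails, the previously active locale simply stays in effect. */
    setlocale(LC_COLLATE, "");
    result = strcoll(a, b);

    /* Put things back the way we found them. */
    if (saved != NULL) {
        setlocale(LC_COLLATE, saved);
        free(saved);
    }
    return result;
}
```

Every call pays for two locale switches plus an allocation, and any
other thread that calls a locale-sensitive function in between sees the
temporary locale, which is exactly the problem described above.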

I believe toupper(), tolower(), etc. just corrupt the hell out of
UTF-8 strings, so all code using them is flat-out broken, as in
"causes segfaults," with GTK 2 unless you know the text is ASCII
only. g_strup(), g_strdown(), etc. are also broken on Unicode text
since they use toupper() and tolower().

There are g_unichar_toupper(), etc. in glib 2 which should be used
instead.

utf8_strcasecmp() is pretty easy to implement using g_unichar_tolower(),
if you don't change its behavior according to locale.
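
A rough sketch of that idea follows: decode one code point at a time
from each string, lowercase it, and compare. To keep the example
self-contained it does not link against glib; utf8_next() and
sketch_tolower() are hypothetical helpers written for this
illustration, and sketch_tolower() only knows ASCII and Latin-1. In
real code the call to sketch_tolower() would presumably be replaced by
g_unichar_tolower().

```c
#include <stdint.h>

/* Decode the UTF-8 sequence at *p and advance *p past it.
 * Assumes NUL-terminated input; a malformed lead byte is returned
 * as-is, and a truncated sequence stops at the first non-continuation
 * byte, so the terminator is never skipped. */
static uint32_t utf8_next(const unsigned char **p)
{
    const unsigned char *s = *p;
    uint32_t c = s[0];
    int extra = 0;

    if (c == 0)
        return 0;                       /* stay on the terminator */

    if ((c & 0xE0) == 0xC0)      { c &= 0x1F; extra = 1; }
    else if ((c & 0xF0) == 0xE0) { c &= 0x0F; extra = 2; }
    else if ((c & 0xF8) == 0xF0) { c &= 0x07; extra = 3; }

    s++;
    while (extra-- > 0 && (*s & 0xC0) == 0x80)
        c = (c << 6) | (uint32_t)(*s++ & 0x3F);

    *p = s;
    return c;
}

/* Stand-in for g_unichar_tolower(): handles ASCII and Latin-1 only. */
static uint32_t sketch_tolower(uint32_t c)
{
    if (c >= 'A' && c <= 'Z')
        return c + 0x20;
    if (c >= 0xC0 && c <= 0xDE && c != 0xD7)   /* U+00D7 is the times sign */
        return c + 0x20;
    return c;
}

/* Locale-independent, case-insensitive comparison by code point. */
int utf8_strcasecmp(const char *s1, const char *s2)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    for (;;) {
        uint32_t ca = sketch_tolower(utf8_next(&a));
        uint32_t cb = sketch_tolower(utf8_next(&b));

        if (ca != cb)
            return ca < cb ? -1 : 1;
        if (ca == 0)
            return 0;
    }
}
```

Comparing lowercased code points gives an equality test and a stable
ordering, but not a linguistically correct one; that still needs the
UTF-8 strcoll() discussed earlier.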

It might be useful to write a source code scanner or an nm-based
script to find suspicious locale-dependent code, either by looking
for suspect functions in the source code or by looking for
dependencies on locale-specific C library symbols in the binary. Sort
of an "i18n-lint." It could also find uses of GdkFont, etc.

Havoc

_______________________________________________
gnome-hackers mailing list
gnome-hackers gnome org
http://mail.gnome.org/mailman/listinfo/gnome-hackers



