Re: g_utf8_collate case sensitivity
- From: Owen Taylor <otaylor redhat com>
- To: Darin Adler <darin bentspoon com>
- Cc: Dan Winship <danw ximian com>, Gtk Developers <gtk-devel-list gnome org>
- Subject: Re: g_utf8_collate case sensitivity
- Date: 05 Sep 2001 22:44:33 -0400
Darin Adler <darin bentspoon com> writes:
> on 9/5/01 10:52 AM, Owen Taylor at otaylor redhat com wrote:
>
> >> g_utf8_collate sorts ASCIIbetically case-sensitively (eg, 'Z' < 'a').
> >> That's a bug, right? (The docs say "Compares two strings for ordering
> >> using the linguistically correct rules for the current locale". I think
> >> the rules for my locale say that "bar" sorts before "Foo".)
> >
> > The rules for the C locale generally have strcmp() ordering, I think.
> >
> > g_utf8_collate() is just implemented in terms of strcoll() currently.
>
> For this very reason, eel_strcoll uses strcoll outside "C" and "POSIX"
> locales, but uses eel_strcmp_case_breaks_ties in the "C" and "POSIX"
> locales.
>
> It looks like we're still going to need an eel_strcoll (although we'll
> switch to g_utf8_collate and a UTF-8 version of
> eel_strcmp_case_breaks_ties).
>
> Frankly, I'd strongly suggest providing a function with these kinds of
> semantics in glib -- it was my mistake not to bring this up when my remarks
> suggested g_utf8_collate.
Having a:

  g_utf8_collate_and_fallback_for_c_locale()

is clearly wrong. g_utf8_collate() should always do what a
user would expect for a human-readable collation.

So, if the right thing to do for the C locale is to not use
strcoll() because strcoll() is broken, well, then perhaps
that should be in the implementation.
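
A minimal sketch of what such an in-implementation fallback could
look like, using only standard C. This is not the glib or eel code:
the names ascii_cmp_case_breaks_ties() and
collate_with_c_locale_fallback() are hypothetical, and the fallback
handles ASCII only, whereas a real version would have to work on
UTF-8.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Lowercase an ASCII letter without consulting the locale. */
static int
ascii_lower (unsigned char c)
{
  return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Hypothetical helper: compare ignoring ASCII case; case is
 * consulted only when the strings are otherwise equal. */
int
ascii_cmp_case_breaks_ties (const char *a, const char *b)
{
  const unsigned char *p = (const unsigned char *) a;
  const unsigned char *q = (const unsigned char *) b;

  assert (a != NULL && b != NULL);

  for (; *p != '\0' && *q != '\0'; p++, q++)
    {
      int cp = ascii_lower (*p);
      int cq = ascii_lower (*q);

      if (cp != cq)
        return cp < cq ? -1 : 1;
    }

  /* One string is a case-insensitive prefix of the other:
   * the shorter one sorts first. */
  if (*p != '\0')
    return 1;
  if (*q != '\0')
    return -1;

  /* Equal ignoring case: plain byte order breaks the tie. */
  return strcmp (a, b);
}

/* Hypothetical wrapper: trust strcoll() in a real locale, use
 * the fallback in the "C" and "POSIX" locales. */
int
collate_with_c_locale_fallback (const char *a, const char *b)
{
  const char *loc = setlocale (LC_COLLATE, NULL);

  if (loc == NULL || strcmp (loc, "C") == 0 || strcmp (loc, "POSIX") == 0)
    return ascii_cmp_case_breaks_ties (a, b);

  return strcoll (a, b);
}
```

A program that never calls setlocale() runs in the "C" locale, so
there the wrapper sorts "bar" before "Foo", while strcmp() sorts
"Foo" first.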
There is no guarantee at all that g_utf8_collate() produces
the same sort order as strcoll(); the implementation in terms
of strcoll() is just what we are doing currently.

I feel a bit uncomfortable second-guessing strcoll() because:

 - maybe strcoll() in the C locale is implemented to do
   something smarter than strcmp().
 - g_utf8_casefold() isn't exactly speedy.

But this is certainly an issue with the implementation of
g_utf8_collate(), not an issue of missing additional interfaces.
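
The first concern above can be probed on a given system with a
small check. This is only a sketch in standard C;
strcoll_agrees_with_strcmp() is a hypothetical name. Agreement on
a few pairs proves nothing, but a single disagreement shows that
strcoll() is doing something other than byte ordering in the
current locale.

```c
#include <assert.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1. */
static int
sign (int v)
{
  return (v > 0) - (v < 0);
}

/* Hypothetical probe: returns 1 if strcoll() and strcmp() order
 * the pair the same way under the current LC_COLLATE locale,
 * 0 otherwise. */
int
strcoll_agrees_with_strcmp (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);

  return sign (strcoll (a, b)) == sign (strcmp (a, b));
}
```

Without a prior setlocale() call the probe runs in the "C" locale;
calling setlocale (LC_COLLATE, "") first would test the locale
taken from the environment instead.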
Regards,
Owen
[ And yes, if you run in the C locale, you are almost certainly
  running a broken system. Most systems have ASCII as the
  character set for the C locale. You might question whether you
  want bytes >= 128 to be UTF-8 or ISO-8859-1, but you most
  likely _don't_ want them to be invalid.

  Red Hat switched over to always installing a system default
  of some real locale sometime in the 6.x series ... maybe
  6.1. ]