Re: g_utf8_collate case sensitivity



on 9/5/01 7:44 PM, Owen Taylor at otaylor redhat com wrote:

> So, if the right thing to do for the C locale is to not use
> strcoll() because strcoll() is broken, well, then perhaps
> that should be in the implementation.

I think you're right.

For me, the issue is a concrete one. Many people will end up running
Nautilus in the C locale, whether intentionally or because of a broken
installation. So the question is what behavior they should get. A complaint
that the locale is not set? Case-sensitive sorting? A simple ASCII
case-folding sort? A fancy UTF-8 case-folding sort?

The primary issue in my mind is what behavior we want for the end user. The
issue of how to achieve this behavior using the underlying libraries is
secondary. For Nautilus 1.0, we decided that even in the C locale it was
important to do a case-folding sort, but that it was OK to simply case-fold
the ASCII characters and sort the non-ASCII ones by value, since most file
names are ASCII. For locales other than C, we found that strcoll gave
suitable results.
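
To make the C-locale behavior described above concrete, here is a minimal
sketch of that kind of comparison: ASCII letters are case-folded and every
other byte is compared by value. The names fold_ascii and ascii_casefold_cmp
are made up for illustration; this is not the actual Nautilus or eel code.

```c
#include <stddef.h>

/* Hypothetical helper: map ASCII 'A'..'Z' to 'a'..'z' and leave every
 * other byte (including the bytes of multi-byte UTF-8 sequences) alone. */
unsigned char fold_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c - 'A' + 'a') : c;
}

/* Hypothetical comparison: case-insensitive for ASCII letters, plain byte
 * order for everything else. Returns <0, 0 or >0 like strcmp(). */
int ascii_casefold_cmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    /* Advance while both strings agree after folding. */
    while (*p != '\0' && fold_ascii(*p) == fold_ascii(*q)) {
        p++;
        q++;
    }
    return (int)fold_ascii(*p) - (int)fold_ascii(*q);
}
```

With this, "apple" sorts before "Banana", which plain strcmp would not do.
For valid UTF-8 input, comparing the non-ASCII bytes by value also happens
to order those characters by code point.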

> There is no guarantee at all that g_utf8_strcoll() produces
> the same sort order as strcoll() - the implementation in terms
> of strcoll() is just what we are doing currently.

Now I have even you calling it g_utf8_strcoll :-)

> I feel a bit uncomfortable second-guessing strcoll() because:
> 
> - maybe strcoll() in the C locale is implemented to do
>  something smarter than strcmp().

That's a good point. For GNU's glibc, strcoll in the C locale is the same as
strcmp. If it were up to me, I'd address this by making the glibc strcoll
do something nice (and as locale-neutral as possible) in the C locale. But
if we don't do that, then some level above the glibc strcoll will have to
work around this problem if I want results that are different from strcmp's
ordering.
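
A workaround at "some level above" strcoll could look roughly like the
sketch below. It assumes that setlocale(LC_COLLATE, NULL) reports the C
locale as "C" or "POSIX", which is what I would expect from glibc but is
not guaranteed everywhere. The function names are hypothetical; this is not
how g_utf8_collate or eel_strcoll is actually written.

```c
#include <locale.h>
#include <string.h>

/* Hypothetical check: is collation currently using the C/POSIX locale?
 * Passing NULL only queries the locale, it does not change it. */
int collation_locale_is_c(void)
{
    const char *name = setlocale(LC_COLLATE, NULL);
    return name == NULL
        || strcmp(name, "C") == 0
        || strcmp(name, "POSIX") == 0;
}

/* Fold ASCII upper case to lower case; other bytes are left unchanged. */
static int fold(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c;
}

/* Hypothetical wrapper: in the C locale, use an ASCII case-folding
 * comparison; in any other locale, trust strcoll(). */
int collate_with_c_fallback(const char *a, const char *b)
{
    if (!collation_locale_is_c())
        return strcoll(a, b);

    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    while (*p != '\0' && fold(*p) == fold(*q)) {
        p++;
        q++;
    }
    return fold(*p) - fold(*q);
}
```

Note that this sketch has exactly the drawback mentioned elsewhere in this
thread: it overrides strcoll in the C locale on every C library, including
one whose strcoll already does something smarter than strcmp there.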

> - g_utf8_casefold() isn't exactly speedy.

I do want g_utf8_collate to be as fast as possible, but it's more important
to me what the sorting order is.

> But it's certainly an implementation of g_utf8_collate() issue
> not an issue of missing additional interfaces.

I agree. In retrospect, I believe that eel_strcoll was created to work
around an issue with the implementation of strcoll in GNU's glibc. Because
it doesn't check for glibc specifically, it did so at the expense of
possibly ignoring a higher-quality C library's implementation of strcoll
for the C locale. That said, I think it's likely that other C libraries use
a similar implementation for strcoll in the C locale.

    -- Darin




