Re: UTF-8 Functions



On Monday, July 2, 2001, at 04:18  PM, Owen Taylor wrote:

The question, I guess, is whether it is worth adding:

g_utf8_collate_key_casefold (), which is currently

 g_utf8_collate_key (g_utf8_casefold (string));

But might eventually be implemented as:

 g_utf8_collate_key_extended (string,
                              G_COLLATE_SECONDARY,
                              G_NORMALIZE_ALL_COMPOSE);

[ There are issues of correctness here as well as efficiency ]

It's certainly easy enough to do ... just a few lines of code.  My
main hesitation is whether we know yet whether that is the right part
of the parameter space to give a special name.

Clear analysis as usual.

I think perhaps I want to retract my previous comment/request. If I understand correctly, g_utf8_collate_key (without g_utf8_casefold) will still typically sort strings in a way that is not unduly sensitive to case. In other words, we get this kind of order:

    A, a, B, b

not this kind:

    A, B, a, b

If that's so, then I think it's not particularly important to add the case-folding version. It's only needed when you want to "partly collate" things and put strings that differ only in case into the same bucket. I don't think that's the usual case.
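
To make the two orderings above concrete, here is a minimal sketch in plain C. It is a toy for ASCII strings only and does not show how GLib or any locale actually collates; the names toy_collate, cmp_toy and cmp_bytes are made up for illustration. The idea is that letters are compared ignoring case first, and case is consulted only to break ties, whereas a plain byte comparison puts all uppercase letters ahead of all lowercase ones.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Toy two-level comparison (hypothetical, ASCII only, default "C" locale):
 * the first level ignores case; case only breaks ties, uppercase first. */
int
toy_collate (const char *a, const char *b)
{
  const unsigned char *p = (const unsigned char *) a;
  const unsigned char *q = (const unsigned char *) b;
  int tie = 0;

  for (; *p && *q; p++, q++)
    {
      int lp = tolower (*p);
      int lq = tolower (*q);

      if (lp != lq)                 /* first-level difference decides */
        return lp < lq ? -1 : 1;
      if (tie == 0 && *p != *q)     /* remember the first case difference */
        tie = isupper (*p) ? -1 : 1;
    }

  if (*p || *q)                     /* a proper prefix sorts first */
    return *p ? 1 : -1;
  return tie;
}

/* qsort adapter for the toy comparison; elements are const char *. */
int
cmp_toy (const void *x, const void *y)
{
  return toy_collate (*(const char *const *) x, *(const char *const *) y);
}

/* qsort adapter for plain byte order, for contrast. */
int
cmp_bytes (const void *x, const void *y)
{
  return strcmp (*(const char *const *) x, *(const char *const *) y);
}
```

Sorting the array { "b", "a", "B", "A" } with qsort and cmp_toy gives A, a, B, b; sorting the same array with cmp_bytes gives A, B, a, b.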

It might be common to case fold and normalize and then use the resulting string as a key. But I can't think of a case where you'd want to case fold and normalize and then still want to collate in a locale-specific way.
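
As a rough sketch of the "fold and use the result as a key" idea, the following plain C lowercases ASCII only and skips normalization entirely; real code would need proper Unicode case folding and normalization. The helper names toy_fold_key and toy_same_bucket are hypothetical. Two strings land in the same bucket exactly when their folded keys are byte-for-byte equal, so no locale-specific collation is involved.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated, lowercased copy of s
 * (ASCII only). The caller frees the result; NULL on allocation failure. */
char *
toy_fold_key (const char *s)
{
  size_t n = strlen (s);
  char *key = malloc (n + 1);
  size_t i;

  if (key == NULL)
    return NULL;
  for (i = 0; i <= n; i++)          /* i == n copies the terminating NUL */
    key[i] = (char) tolower ((unsigned char) s[i]);
  return key;
}

/* Return 1 if a and b fold to the same key (same bucket), 0 otherwise. */
int
toy_same_bucket (const char *a, const char *b)
{
  char *ka = toy_fold_key (a);
  char *kb = toy_fold_key (b);
  int same = ka != NULL && kb != NULL && strcmp (ka, kb) == 0;

  free (ka);
  free (kb);
  return same;
}
```

With this, toy_same_bucket ("README", "readme") is true while toy_same_bucket ("a", "b") is not. The folded key is only compared for equality (or hashed), never ordered.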

    -- Darin



