Re: UTF-8 Functions

From: Darin Adler <darin bentspoon com>
To: Owen Taylor <otaylor redhat com>
Cc: gtk-devel-list gnome org, gtk-i18n-list gnome org, trow ximian com
Subject: Re: UTF-8 Functions
Date: Mon, 2 Jul 2001 16:39:30 -0700

On Monday, July 2, 2001, at 04:18  PM, Owen Taylor wrote:

The question, I guess is whether it is worth adding:

g_utf8_collate_key_casefold (), which is currently

 g_utf8_collate_key (g_utf8_casefold (string));

But might eventually be implemented as:

 g_utf8_collate_key_extended (string,
                              G_COLLATE_SECONDARY,
                              G_NORMALIZE_ALL_COMPOSE);

[ There are issues of correctness here as well as efficiency ]

It's certainly easy enough to do ... just a few lines of code.  My
main hesitation is whether we know yet whether that is the right part
of the parameter space to give a special name.


Clear analysis as usual.

I think perhaps I want to retract my previous comment/request. If I understand correctly, g_utf8_collate_key (without g_utf8_casefold) will still typically sort strings in a way that is not unduly sensitive to case.

In other words, we get this kind of order:

    A, a, B, b

not this kind:

    A, B, a, b
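
To make the two orderings concrete, here is a minimal sketch in plain C. It is ASCII-only and does not call GLib at all: compare_two_level is a hypothetical stand-in for what a locale-aware collation key is expected to give you (letters compared first, case used only to break ties), and compare_bytes is ordinary strcmp order. The function names are made up for illustration.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Plain byte order: in ASCII every uppercase letter sorts before
 * every lowercase letter, which gives A, B, a, b. */
static int compare_bytes(const void *a, const void *b)
{
    const char *s = *(const char *const *)a;
    const char *t = *(const char *const *)b;

    return strcmp(s, t);
}

/* Hypothetical two-level comparison (an assumption, not GLib code):
 * compare letters ignoring case first, and look at case only when
 * the strings are otherwise equal. This gives A, a, B, b. */
static int compare_two_level(const void *a, const void *b)
{
    const char *s = *(const char *const *)a;
    const char *t = *(const char *const *)b;
    size_t i;

    /* Primary level: letters, ignoring case. */
    for (i = 0; s[i] != '\0' && t[i] != '\0'; i++) {
        int cs = tolower((unsigned char)s[i]);
        int ct = tolower((unsigned char)t[i]);

        if (cs != ct)
            return cs - ct;
    }

    /* One string is a prefix of the other: the shorter one sorts first. */
    if (s[i] != '\0' || t[i] != '\0')
        return s[i] != '\0' ? 1 : -1;

    /* Tie-break: equal ignoring case, so fall back to byte order. */
    return strcmp(s, t);
}

/* Sort an array of strings in place with one of the two orderings. */
void sort_words(const char **words, size_t n, int two_level)
{
    qsort(words, n, sizeof words[0],
          two_level ? compare_two_level : compare_bytes);
}
```

Sorting the four strings "b", "A", "a", "B" with two_level set to 0 should give the second order above, and with two_level set to 1 the first.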

If that's so, then I think it's not particularly important to add the casefolding version. It's only needed when you want to "partly collate" things and put a bunch of identical items into the same bucket. That's not the usual case, I don't think.

It might be common to case fold and normalize and then use the resulting string as a key. But I can't think of a case where you'd want to case fold and normalize and then still want to collate in a locale-specific way.
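
As a rough illustration of the "folded string as a key" case, here is a small sketch in plain C. It only lowercases ASCII, so ascii_fold_key is a hypothetical stand-in for real case folding plus normalization (in GLib that would presumably be g_utf8_casefold and a normalization call), and same_bucket is likewise a made-up helper. No locale-specific collation is involved; the folded strings are just compared byte for byte.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for case folding: lowercases ASCII letters only.
 * Returns a newly allocated key that the caller must free, or NULL if
 * the allocation fails. */
char *ascii_fold_key(const char *s)
{
    size_t n = strlen(s);
    char *key = malloc(n + 1);
    size_t i;

    if (key == NULL)
        return NULL;

    for (i = 0; i < n; i++)
        key[i] = (char)tolower((unsigned char)s[i]);
    key[n] = '\0';

    return key;
}

/* Returns 1 if both strings fold to the same key (same bucket),
 * 0 if they do not, and -1 if a key could not be allocated. */
int same_bucket(const char *a, const char *b)
{
    char *ka = ascii_fold_key(a);
    char *kb = ascii_fold_key(b);
    int result = -1;

    if (ka != NULL && kb != NULL)
        result = (strcmp(ka, kb) == 0);

    free(ka);
    free(kb);

    return result;
}
```

With this, "README" and "readme" land in the same bucket while "apple" and "apply" do not. Nothing here needs a collation key, which is the point: the folded string is only used to test for equality, not to order things for display.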


    -- Darin
