Re: unicode sorting algorithm?



Petr Tomasek <tomasek etf cuni cz> writes:

> Hello!
> 
> I'm just curious whether there exists a Unicode, fully international,
> sorting algorithm. I need to make a database of my bibliography, which
> includes books in Czech, English, German, modern Hebrew and Arabic. Up to
> now I haven't stored them on a computer due to lack of international support.
> 
> Is there any standard way to do sorting on multilingual Unicode text?
 
 See http://www.unicode.org/unicode/reports/tr10/

The algorithm there, when applied using the default properties without
tailoring, provides _a_ sorting order for all Unicode strings. Since
different languages have different, and incompatible, conventions for
ordering, such an order is, at best, a compromise.

Also, for East-Asian text, conventional sorting is often by
pronunciation, which this algorithm makes no attempt to do. (For
Japanese, determining pronunciation from the written form of a word is
not always easy even for a native-speaking human.)
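
To illustrate why a dedicated ordering is needed at all, here is a rough
Python sketch. It is NOT the tr10 algorithm: it only imitates the idea of
comparing on several levels (base letters first, then accents, then case)
using the standard unicodedata module. The name toy_sort_key and the word
list are made up for the example.

```python
import unicodedata

def toy_sort_key(text):
    """Hypothetical multi-level key; a toy stand-in, not the tr10 algorithm."""
    decomposed = unicodedata.normalize("NFD", text)
    # Level 1: base letters only (combining marks dropped, case folded).
    primary = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    ).casefold()
    # Level 2: accents still present, case folded. Level 3: original case.
    return (primary, decomposed.casefold(), decomposed)

# "\u00e9dition" is "édition", "\u00c4pfel" is "Äpfel".
words = ["zebra", "Zebra", "\u00e9dition", "edition", "Apfel", "\u00c4pfel"]

# Plain code point order: capitals first, accented letters after "z".
by_code_point = sorted(words)

# Toy multi-level order: accented words stay next to their base letters.
by_toy_key = sorted(words, key=toy_sort_key)
```

Even this toy key bakes in one convention (accented letters sort right
after the plain letter), which is exactly the sort of choice that differs
between languages and that tailoring is meant to handle.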
 
> More specific: how will sorting be solved in the gtk-2/gnome-2 platform?
> Or should each  developer write his own sorting routines?

We'll probably have some simple hack that will work well if:

 a) your platform has good Unicode sorting support
 b) you are running in a UTF-8 locale

and more or less well otherwise. See:

 http://bugzilla.gnome.org/show_bug.cgi?id=55836
 http://bugzilla.gnome.org/show_bug.cgi?id=55852
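
As a sketch of what "relying on the platform" can look like (in Python
rather than C, and not the GLib code, which isn't written yet), the
following defers to the C library's collation for whatever locale the
environment selects. The function name platform_sorted is invented for
the example; the quality of the result depends entirely on conditions
a) and b) above.

```python
import locale

def platform_sorted(strings):
    """Sort with the platform's collation rules for the current locale."""
    try:
        # "" means: take the locale from the environment (LANG, LC_ALL, ...).
        locale.setlocale(locale.LC_COLLATE, "")
    except locale.Error:
        # Unusable locale settings: stay with the default "C" locale.
        pass
    # strxfrm turns each string into a key that compares in collation order.
    return sorted(strings, key=locale.strxfrm)

titles = platform_sorted(["pear", "apple", "fig"])
```

In a locale such as cs_CZ.UTF-8 on a system with good locale data this
would follow Czech conventions; in the "C" locale it degrades to plain
code point order.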

It's definitely going to be an area for future improvement past GLib-2.0.

Regards,
                                        Owen



