Re: Unicode and C++



>> For *all* languages that is the case.
>> Or do you think that sorting English in ASCII and in EBCDIC is the same?

>> That is why the definition of LC_COLLATE (the locale category
>> defining the sorting order) is done using symbolic names for each
>> char, and not hardcoded values that are only valid for a given
>> charset encoding.

Steve> And just what is collate supposed to do with my above example?
Steve> Both code points are the same damned character. Of course, what
Steve> you have to do is merge both characters so that they are treated
Steve> as the same thing for sorting purposes. Dumb, huh?

Steve> The whole point is you can't simply compare character
Steve> codes. You have to do tortuous manipulations to make a
Steve> comparison really work.

The point is that you have to do these manipulations regardless of
which encoding you use, at least if that encoding is complete.  There
might be some limited cases where you don't, but that doesn't matter:
you still need the machinery to solve the general case.
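To make the encoding dependence concrete: even for plain English
characters, raw code-value order differs between ASCII and EBCDIC.
A minimal C++ sketch, assuming an ASCII-compatible host for the
`ascii` helper and using a deliberately tiny excerpt of EBCDIC code
page 037 (a-i, A-I, 0-9 only). The helper names `ascii`, `ebcdic037`
and `sortByCode` are hypothetical, chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Code value of c in ASCII. Assumes the host character set is
// ASCII-compatible, so the char value already is the ASCII code.
unsigned ascii(char c) { return static_cast<unsigned char>(c); }

// Code value of c in EBCDIC code page 037, for a tiny subset only
// (a-i, A-I, 0-9); anything else is outside this sketch.
unsigned ebcdic037(char c) {
    if (c >= 'a' && c <= 'i') return 0x81 + (c - 'a');
    if (c >= 'A' && c <= 'I') return 0xC1 + (c - 'A');
    if (c >= '0' && c <= '9') return 0xF0 + (c - '0');
    assert(false && "character not covered by this sketch");
    return 0;
}

// Sorts the characters of s by their raw code value in one encoding.
std::string sortByCode(std::string s, unsigned (*code)(char)) {
    std::sort(s.begin(), s.end(),
              [code](char x, char y) { return code(x) < code(y); });
    return s;
}
```

With this, sortByCode("aB1", ascii) gives "1Ba" while
sortByCode("aB1", ebcdic037) gives "aB1": digits come first in ASCII
but last in EBCDIC, and lowercase follows uppercase in ASCII but
precedes it in EBCDIC. Neither order is an English collation order,
which is why the collation rules have to be defined independently of
the code values.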

For instance, in English the sorting order ignores accents.  Take the
word "cooperate", which can be written with an umlaut (strictly, a
diaeresis) over the second "o" (I know this is an archaic form; it is
just an example).  Both forms of "cooperate" sort the same in English.
Depending on your encoding, you might have to deal with a precomposed
o-umlaut (by mapping it to "o") or with a separate umlaut accent (by
ignoring it).  English sort order also ignores case, which is a
problem even with 7-bit ASCII.
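A rough sketch of that machinery in C++, assuming the text has already
been decoded to UTF-32 (`std::u32string`). It builds a simplified
"primary" sort key covering only the cases above: U+0308 COMBINING
DIAERESIS is dropped, precomposed U+00F6/U+00D6 map to "o", and ASCII
letters are folded to lowercase. The names `primaryKey` and
`collateLess` are hypothetical; a real collator (e.g. one implementing
the Unicode Collation Algorithm) handles far more.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: simplified English primary key for the
// "cooperate" example only, not a general collation key.
std::u32string primaryKey(const std::u32string& s) {
    std::u32string key;
    for (char32_t c : s) {
        if (c == U'\u0308') continue;          // ignore the separate accent
        if (c == U'\u00F6' || c == U'\u00D6')
            c = U'o';                          // map precomposed o-umlaut to "o"
        else if (c >= U'A' && c <= U'Z')
            c = static_cast<char32_t>(c - U'A' + U'a');  // fold case
        key.push_back(c);
    }
    return key;
}

// True if a sorts before b under the simplified key.
bool collateLess(const std::u32string& a, const std::u32string& b) {
    return primaryKey(a) < primaryKey(b);
}
```

Both U"co\u00F6perate" (precomposed) and U"coo\u0308perate" (separate
accent) reduce to the same key as U"cooperate".  The case problem shows
up even without Unicode: comparing raw code values puts "Zebra" before
"apple" because 'Z' (0x5A) is less than 'a' (0x61), whereas
collateLess(U"apple", U"Zebra") is true.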

So this isn't a problem with Unicode per se but rather a problem with
doing any kind of locale-sensitive collation.

Tom
