Re: UTF-8: Case mapping


On Thu, 28 Jun 2001, Pablo Saratxaga wrote:
> > I agree with most of what you said, but I can see a practical reason why
> > your last point is a poor solution. Most people only know one or two
> > languages. We lack the skills needed to build any generally meaningful
> I wonder... the GNU libc has very complex and comprehensive per language
> (per locale even) sorting rules; and, from the source files at least, it seems
> it is possible to find the base letter of a given char in case it has
> accents, or know if it is an upper or lower case, or which script it is
> from. So, I don't understand very much the reason of this thread; what is
> exactly the problem?

	I don't know GNU libc very well, so I'm just wondering: can libc
handle characters that are encoded with a variable number of bytes?  For
example, I presume Japanese and Chinese C libraries sort on the assumption
that the input is made of two-byte characters (Shift_JIS for Japanese and
Big5 for Chinese).  [I honestly don't know what happens if you mix ASCII
characters into a Japanese / Chinese text file, though I do know there is
both a 2-byte and a 1-byte version of the letter "a".]
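
	To make the variable-length point concrete, here is a small Python
sketch (my own illustration, nothing taken from glibc) that counts the
bytes used for the ASCII "a" and the full-width "a" (U+FF41) in Shift_JIS
and in UTF-8.  The helper name byte_len is made up for this example.

```python
import unicodedata

def byte_len(ch: str, encoding: str) -> int:
    """Return how many bytes one character occupies in the given encoding."""
    return len(ch.encode(encoding))

half = "a"        # ASCII letter, U+0061
full = "\uff41"   # full-width Latin small letter a, U+FF41

# Print the byte counts of both forms in each encoding.
for enc in ("shift_jis", "utf-8"):
    print(enc, byte_len(half, enc), byte_len(full, enc))

# NFKC compatibility normalization maps the full-width form to the ASCII one.
folded = unicodedata.normalize("NFKC", full)
```

	So both encodings already mix character widths within one string,
and any sorting code has to walk characters rather than fixed-size units.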

	I partly agree with your message, and like you I don't fully
understand the problem at hand.  Since UTF-8 is not yet widely used but
should become more common over the next few years, it's hard to predict
what a typical user's needs will be.

	I think providing some basic sorting for 2.0 (i.e., primitives for
users to build on), then waiting until UTF-8 catches on to see what is
popular, sounds like a reasonable approach to me.
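
	As a rough idea of what one such primitive might look like, here
is a sketch in Python.  The name base_key is hypothetical and this is not
a proposal for any actual API: it builds a sort key that ignores case and
accents by decomposing each character and dropping the combining marks.
It is deliberately language-neutral, so it does not replace per-locale
collation rules.

```python
import unicodedata

def base_key(s: str) -> str:
    """Sort key that ignores accents and case (language-neutral sketch)."""
    # NFD splits an accented letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", s)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # casefold() is a caseless form intended for comparisons.
    return stripped.casefold()

words = ["zebra", "\u00c9mile", "apple", "eclair"]

by_code_point = sorted(words)             # raw code point order
by_base_key = sorted(words, key=base_key) # accent- and case-insensitive
```

	With raw code point order the accented name lands after "zebra";
with the key it sorts among the other "e" words, which is closer to what
most users would expect from a default.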

