Re: Unicode and C++



Per Hedbor <per@idonex.se> writes:

> > My point is that, from all possible encodings of unicode, utf-8 is the one
> > that the less needs conversions.
> 
> Not really. You always have to convert from the 'native'
> representation to utf8/whatever.
> 
> And I really do doubt that UTF-8 will be the generic charset any time
> soon. It's much more convenient to just continue using whatever you
> are already using. This is especially true in Japan and other regions
> that use iso-2022-*, since those charsets already covers most of the
> commonly used unicode.

iso-2022 is not a charset, it is an encoding that encodes multiple
charsets via escape sequences.

Programs that read in files do need to worry about conversions
while reading in. This is a much, much smaller problem than
having to convert throughout the code.
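
To make "convert while reading in" concrete, here is a minimal sketch
of a conversion done once at the input boundary. It assumes the input
happens to be ISO-8859-1, which is only an illustration and not
something from this thread; the name latin1_to_utf8 is hypothetical.
Other input charsets would need a real conversion table or library.

```cpp
#include <string>

// Hypothetical helper: convert ISO-8859-1 (Latin-1) bytes to UTF-8.
// Assumption: every input byte is one Latin-1 character, so its
// Unicode code point equals the byte value (0x00..0xFF).
inline std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        if (c < 0x80) {
            // ASCII passes through unchanged.
            out.push_back(static_cast<char>(c));
        } else {
            // Code points 0x80..0xFF take two bytes: 110xxxxx 10xxxxxx.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

After this one call at the file-reading layer, the rest of the program
sees only UTF-8.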

> UTF-8 is also the by far least useful charset from a programming
> perspective.  Try changing character 20 in a generic UTF-8 string,
> as an example.

I guess you've never tried using iso-2022 :-). As multi-byte
encodings go, UTF-8 is a very nice encoding. It preserves ASCII,
it is stateless, you can iterate backwards in a string, etc.
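
A rough sketch of the backwards iteration, assuming the string is
valid UTF-8 held in a std::string. The helper names (is_continuation,
prev_char, count_backwards) are made up for this example. It relies
only on the fact that continuation bytes have the form 10xxxxxx, so
the start of a character can be found without any state from earlier
in the string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes, which have the bit pattern 10xxxxxx.
inline bool is_continuation(unsigned char b) {
    return (b & 0xC0) == 0x80;
}

// Given a byte offset pos on a character boundary (0 < pos <= s.size()),
// return the byte offset where the previous character starts.
// Assumes s is valid UTF-8; no validation is done here.
inline std::size_t prev_char(const std::string& s, std::size_t pos) {
    assert(pos > 0 && pos <= s.size());
    do {
        --pos;
    } while (pos > 0 && is_continuation(static_cast<unsigned char>(s[pos])));
    return pos;
}

// Count characters by walking from the end of the string to the start.
inline std::size_t count_backwards(const std::string& s) {
    std::size_t n = 0;
    for (std::size_t pos = s.size(); pos > 0; pos = prev_char(s, pos)) {
        ++n;
    }
    return n;
}
```

Reaching "character 20" still takes a scan, but that is true of any
multi-byte encoding, and here the scan can start from either end.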

Regards,
                                        Owen



