Re: unicode string



Havoc Pennington <hp@redhat.com> wrote:
> Nathan Myers <ncm@nospam.cantrip.org> writes: 
> > Note that in any case no function should take one of these monsters 
> > as an argument.
> 
> Where would you envision using this string? In my interfaces, or only
> in user code?
> 
> (If my functions take a wstring as argument, then we're back to
> converting all strings to UTF8 on their way in to my functions, so
> this Unicode_string isn't getting me any benefits, right?)

I would use it in implementation only.  Your interfaces should take 
a wstring, or be overloaded to take either a string or wstring.
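
For instance (just a sketch; "Label" and "set_text" are invented
names for illustration, not from any real toolkit):

    #include <string>

    class Label {
    public:
        void set_text(const std::string& s);   // narrow: assumed UTF-8
        void set_text(const std::wstring& s);  // wide: converted inside
    private:
        std::string text;  // the implementation holds UTF-8 throughout
    };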

The more I think about it, the less I like the implicit conversion 
operators I had suggested.  In particular, adding an overload as
suggested above would break code that passed a utf8_string: with
conversions to both string and wstring available, the call becomes
ambiguous.  Better to use named conversion functions.  I also like
operator[] less and less; better to support mutation only by
assignment and (perhaps) via iterators.
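
To see the breakage, suppose utf8_string kept the conversions I had
suggested:

    #include <string>

    class utf8_string {
    public:
        operator std::string() const;   // the implicit conversions
        operator std::wstring() const;  // I had suggested
    };

    void f(const std::string&);
    void f(const std::wstring&);  // the overload added above

    void g(const utf8_string& s) {
        f(s);  // compiled before the wstring overload existed; now
               // both user-defined conversions rank equally, so the
               // compiler must reject the call as ambiguous.
    }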

Shiv asked:

> ... you'll come up thinking that 16-bit Unicode is a good mix between
> speed and space wastage, as most of normal Unicode can be done within
> 16 bits.  So is there something obvious that I am missing about 32-bit
> Unicode?

16-bit Unicode has turned into a multi-"byte" encoding that has no
advantages over UTF-8: characters outside the first 64K must be
written as a surrogate pair of two 16-bit units, so its "bytes" just
happen to be 16 bits long.  I'm disappointed to learn that IBM's
library relies on this broken version of Unicode.
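
The surrogate arithmetic is simple enough to show inline:

    unsigned long  c  = 0x10400;  // some character beyond U+FFFF
    unsigned short hi = 0xD800 + ((c - 0x10000) >> 10);    // 0xD801
    unsigned short lo = 0xDC00 + ((c - 0x10000) & 0x3FF);  // 0xDC00
    // One character, two 16-bit "bytes" -- a multi-byte encoding.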

Havoc's observation that random access into strings is only rarely
useful is interesting.  The STL already recognizes containers that
offer only bidirectional iterative access.  Probably utf8_string
should support only bidirectional const_iterators, so that users can
walk around in the UTF-8 representation without forcing a conversion.
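
A sketch of what such an iterator might look like (assuming
well-formed UTF-8; the decoding in operator* is left out):

    class utf8_const_iterator {
        const unsigned char* p;
    public:
        explicit utf8_const_iterator(const unsigned char* q) : p(q) {}
        unsigned long operator*() const;  // decode code point at p
        utf8_const_iterator& operator++() {
            // The lead byte encodes the sequence length:
            // 0xxxxxxx = 1 byte, 110xxxxx = 2, 1110xxxx = 3, 11110xxx = 4.
            if      (*p < 0x80) p += 1;
            else if (*p < 0xE0) p += 2;
            else if (*p < 0xF0) p += 3;
            else                p += 4;
            return *this;
        }
        utf8_const_iterator& operator--() {
            // Continuation bytes all match 10xxxxxx; back up past them.
            do --p; while ((*p & 0xC0) == 0x80);
            return *this;
        }
        bool operator==(const utf8_const_iterator& o) const { return p == o.p; }
        bool operator!=(const utf8_const_iterator& o) const { return p != o.p; }
    };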

Nathan Myers
ncm at cantrip dot org




