Re: [Re: [Re: [libxml++] UTF8 support]]
- From: Murray Cumming <murrayc usa net>
- To: <libxmlplusplus-general lists sourceforge net>
- Subject: Re: [Re: [Re: [libxml++] UTF8 support]]
- Date: Tue, 25 Feb 2003 16:25:06 -0000
Stefan Seefeld <seefeld sympatico ca> wrote:
> I take it that with 'standard C++' you mean you want to be able to
> access characters with the '[]' operator. That is a requirement for
> the specific encoding you use. With utf8 characters don't have a fixed
> size, so you don't have random access. Instead you have to iterate
> over the string to find the nth character.
glib provides functions to get specific characters, calculate the length, etc.
That is not as efficient as 1-byte-per-character pointer arithmetic, but it
works. Glib::ustring uses these functions to implement operator[], size(),
etc., so you still get to work by character as you do with std::string.
However, operator[] is read-only with Glib::ustring.
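
To illustrate why by-character access on UTF-8 means a linear scan, here is a
minimal sketch in plain C++. It is not how glib or Glib::ustring implement it;
the names utf8_offset_of() and utf8_length() are made up for this example, and
the input is assumed to be well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the byte offset at which the n-th (0-based)
// character starts, or std::string::npos if the string is too short.
// Assumes well-formed UTF-8; no validation is done.
std::size_t utf8_offset_of(const std::string& s, std::size_t n)
{
    std::size_t seen = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Continuation bytes look like 10xxxxxx; any other byte starts a character.
        const bool starts_character =
            (static_cast<unsigned char>(s[i]) & 0xC0) != 0x80;
        if (starts_character) {
            if (seen == n)
                return i;
            ++seen;
        }
    }
    return std::string::npos;
}

// Hypothetical helper: number of characters, as opposed to s.size(),
// which counts bytes.
std::size_t utf8_length(const std::string& s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

Both functions walk the string from the start, so finding the n-th character
costs time proportional to its position rather than being constant as with
std::string::operator[].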
> So, depending on what you want to do with the string, one encoding
> may be better than another.
>
> Please note that there is no way for unicode to fit into std::wstring,
> as that has >16 bit, while unicode needs 21 bits per character. Some
> 'planes' fit into these 16 bit, but for lots of characters you need
> more, so the encoding becomes variably sized (meaning, as explained
> above, there is no random access).
Thanks for the clarification. Do you know any particular languages that a
wstring couldn't cope with? This question is asked often and I would like a
simple "wstring can't do ..." answer for people.
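
For reference, the variable-size effect Stefan describes can be sketched as
follows, assuming a platform where wchar_t is 16 bits wide (it is 32 bits on
others) and that the wide encoding is UTF-16. The function name to_utf16() is
invented for this example, and the input is assumed to be a valid code point.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: encodes one Unicode code point as UTF-16 code units.
// Assumes a valid code point (not a surrogate, not above U+10FFFF).
std::vector<std::uint16_t> to_utf16(std::uint32_t code_point)
{
    std::vector<std::uint16_t> units;
    if (code_point < 0x10000) {
        // Fits in a single 16-bit unit.
        units.push_back(static_cast<std::uint16_t>(code_point));
    } else {
        // Needs a surrogate pair: split the remaining 20 bits into two halves.
        const std::uint32_t v = code_point - 0x10000;
        units.push_back(static_cast<std::uint16_t>(0xD800 + (v >> 10)));
        units.push_back(static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)));
    }
    return units;
}
```

With this, to_utf16(0x20AC) yields one unit, while to_utf16(0x1D11E) yields
two, so indexing by 16-bit unit no longer means indexing by character.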
Murray Cumming
murrayc usa net
www.murrayc.com