Re: [Re: [Re: [libxml++] UTF8 support]]

From: Murray Cumming <murrayc usa net>
To: <libxmlplusplus-general lists sourceforge net>
Subject: Re: [Re: [Re: [libxml++] UTF8 support]]
Date: Tue, 25 Feb 2003 16:25:06 -0000

Stefan Seefeld <seefeld sympatico ca> wrote:
> I take it that with 'standard C++' you mean you want to be able to
> access characters with the '[]' operator. That is a requirement for
> the specific encoding you use. With utf8, characters don't have a fixed
> size, so you don't have random access. Instead you have to iterate
> over the string to find the nth character.

glib provides functions to get specific characters, calculate length etc. It's
not as efficient as with 1-byte-per-character pointer arithmetic but it works.
Glib::ustring uses them to implement operator[], size(), etc. So you
still get to work by-character like you do with std::string. However
operator[] is read-only with Glib::ustring.

> So, depending on what you want to do with the string, one encoding
> may be better than another.
> 
> Please note that there is no way for unicode to fit into std::wstring,
> as that has >16 bit, while unicode needs 21 bits per character. Some
> 'planes' fit into these 16 bit, but for lots of characters you need
> more, so the encoding becomes variably sized (meaning, as explained
> above, there is no random access).

Thanks for the clarification. Do you know of any particular languages that a
wstring couldn't cope with? This question is asked often and I would like a
simple "wstring can't do ..." answer for people.

Murray Cumming
murrayc usa net
www.murrayc.com

Follow-Ups:
- Re: [Re: [Re: [libxml++] UTF8 support]]
  - From: Stefan Seefeld
