RE: Glib::ustring tradeoffs?
- From: "Foster, Gareth" <gareth foster siemens com>
- To: Chris Vine <chris cvine freeserve co uk>, gtkmm-list gnome org
- Cc: Matthias Kaeppler <matthias finitestate org>
- Subject: RE: Glib::ustring tradeoffs?
- Date: Mon, 31 Oct 2005 09:17:52 +0000
> UTF-8 represents Unicode characters by a series of bytes, of between 1
> and 4 bytes in length (the original definition allowed up to 6, but
> RFC 3629 restricts it to 4) - true ASCII characters (of value less than
> 128) are also valid UTF-8 and represented by 1 byte, and all other
> characters are represented by more than one byte. You can put any char
> value you want (including null characters and UTF-8 byte sequences)
> into a std::string object. UTF-8 is just another series of bytes as far
> as a std::string object is concerned, as is any other byte-based
> encoding such as ISO-8859-1.
>
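To make the byte/character distinction concrete, here is a minimal sketch in
plain standard C++ (no glibmm needed to compile it). The helper name
utf8_length() is made up for illustration and is not part of any library; it
assumes its input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count the Unicode characters (code points) in a
// UTF-8 byte sequence. Every character starts with exactly one byte that is
// NOT a continuation byte (continuation bytes have the form 10xxxxxx), so
// counting those bytes counts the characters.
// Assumption: the input is valid UTF-8.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the string "caf\xC3\xA9" (the word "café" with the last letter encoded as
two bytes), std::string::size() reports 5 bytes while utf8_length() reports 4
characters.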
> A Glib::ustring object stores its UTF-8 contents as a series of bytes
> in the same way that a std::string object does (in fact, it contains a
> std::string object for that purpose). The main difference between a
> std::string object and a Glib::ustring object is that the Glib::ustring
> object counts its size, iterates and indexes itself with operator[]()
> by reference to whole Unicode characters rather than bytes -
> operator[]() will return an entire Unicode (gunichar) character for the
> index rather than a byte, as will dereferencing a Glib::ustring
> iterator. It can also search by reference to a Unicode (gunichar)
> character, and a Unicode (gunichar) character can be inserted into it
> (for that purpose the character will be converted into the equivalent
> UTF-8 byte representation and then inserted in the underlying
> std::string object).
>
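As a rough illustration of what indexing by whole characters involves, the
sketch below finds the n-th character in a UTF-8 std::string and decodes it.
This is not glibmm's implementation; sequence_length() and code_point_at() are
hypothetical names, the input is assumed to be valid UTF-8, and char32_t
stands in for gunichar.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: number of bytes in the UTF-8 sequence that starts
// with `lead`. Assumption: `lead` is a valid lead byte.
std::size_t sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx (ASCII)
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    return 4;                             // 11110xxx
}

// Hypothetical helper: return the code point at character position `index`.
// The walk starts at the first byte, so the cost grows with `index`.
char32_t code_point_at(const std::string& s, std::size_t index) {
    std::size_t pos = 0;
    while (pos < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[pos]);
        const std::size_t len = sequence_length(lead);
        if (index == 0) {
            if (pos + len > s.size()) {
                throw std::out_of_range("truncated UTF-8 sequence");
            }
            // Keep only the payload bits of the lead byte, then append the
            // low 6 bits of each continuation byte.
            char32_t cp = (len == 1) ? lead : (lead & (0xFF >> (len + 1)));
            for (std::size_t i = 1; i < len; ++i) {
                cp = (cp << 6) |
                     (static_cast<unsigned char>(s[pos + i]) & 0x3F);
            }
            return cp;
        }
        --index;
        pos += len;
    }
    throw std::out_of_range("character index past end of string");
}
```

With the bytes "a\xC3\xA9\xE2\x82\xAC", character 1 decodes to U+00E9 and
character 2 to U+20AC, whereas plain operator[] on the std::string would hand
back the single bytes 0xC3 and 0xA9 at those positions.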
> In many applications this extra functionality is irrelevant and using
> a std::string object for storing and manipulating UTF-8 byte sequences
> will be fine and have less overhead. In addition, if you try to
> manipulate a Glib::ustring object after putting an invalid UTF-8 byte
> sequence into it the Glib::ustring object will be in an undefined
> state, so you need to know that what you are putting into it is valid.
> (You can check this before manipulating it with
> Glib::ustring::validate().)
>
> You can check whether a std::string object contains valid UTF-8 with
> g_utf8_validate(), and extract a Unicode character from the byte
> stream it contains with Glib::get_unichar_from_std_iterator(), so you
> can take your choice between using std::string or Glib::ustring
> depending on your needs.
>
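For anyone wondering what such a validity check has to look at, here is a
simplified standard-C++ sketch. It is not the GLib code, and is_valid_utf8()
is a hypothetical name; in a real gtkmm program you would call
g_utf8_validate() or Glib::ustring::validate() as described above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: simplified check that `s` holds well-formed UTF-8.
// It rejects bad lead bytes, truncated sequences, bad continuation bytes,
// overlong encodings, surrogates and values above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};

    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        char32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte

        if (i + len > n) return false;  // sequence runs off the end

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not 10xxxxxx
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < min_for_len[len]) return false;            // overlong form
        if (cp > 0x10FFFF) return false;                    // beyond Unicode
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;     // surrogate
        i += len;
    }
    return true;
}
```

The bytes "caf\xC3\xA9" pass this check, while a lone "\xC3" (a lead byte with
nothing after it) and the overlong pair "\xC0\x80" are both rejected.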
That was very informative, Chris, thanks. In fact, it would make a nice
introduction to Glib::ustring in the gtkmm book, methinks (assuming there
isn't a better one already).
Gaz