Re: Glib::ustring tradeoffs?
- From: Chris Vine <chris cvine freeserve co uk>
- To: gtkmm-list gnome org
- Cc: Matthias Kaeppler <matthias finitestate org>
- Subject: Re: Glib::ustring tradeoffs?
- Date: Sat, 29 Oct 2005 00:59:55 +0100
On Friday 28 October 2005 13:00, Matthias Kaeppler wrote:
> Let's say I have a filename named "übung1.txt" (Note the umlaut--if your
> newsreader can display it hehe).
> Will this filename make trouble with std::string, or be lost/replaced
> when converting to Unicode?
UTF-8 represents Unicode characters by a series of bytes, of between 1 and 4
bytes in length (the original design allowed up to 6, but RFC 3629 restricts
it to 4) - true ASCII characters (of value less than 128) are also valid
UTF-8 and represented by 1 byte, and all other characters are represented by
more than one byte. You can put any char value you want
(including null characters and UTF-8 byte sequences) into a std::string
object. UTF-8 is just another series of bytes as far as a std::string object
is concerned, as is any other byte-based encoding such as ISO8859-1.
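To make that concrete for the filename in question, here is a minimal sketch using only the standard library. The helper names count_code_points() and example_filename() are made up for illustration, and the counting helper assumes the string already holds valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode characters in a string that is
// assumed to hold valid UTF-8. Continuation bytes look like 10xxxxxx, so
// every byte that does not match that pattern starts a new character.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) ++n;
    }
    return n;
}

// "übung1.txt" with the u-umlaut (U+00FC) written as its two UTF-8 bytes,
// so the example does not depend on the encoding of the source file. The
// literal is split because 'b' would otherwise be read as a hex digit.
std::string example_filename() {
    return std::string("\xC3\xBC") + "bung1.txt";
}
```

std::string::size() on that filename reports 11 (bytes), whereas count_code_points() reports 10 (characters). Nothing is lost or replaced, the umlaut simply occupies two chars.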
A Glib::ustring object stores its UTF-8 contents as a series of bytes in the
same way that a std::string object does (in fact, it contains a std::string
object for that purpose). The main difference between a std::string object
and a Glib::ustring object is that the Glib::ustring object counts its size,
iterates and indexes itself with operator[]() by reference to whole Unicode
characters rather than bytes - operator[]() will return an entire Unicode
(gunichar) character for the index rather than a byte, as will dereferencing
a Glib::ustring iterator. It can also search for a Unicode (gunichar)
character, and a Unicode (gunichar) character can be inserted into it (for
that purpose the character will be converted into the equivalent UTF-8 byte
representation and then inserted in the underlying std::string object).
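The conversion step mentioned in the parentheses can be sketched roughly as follows. This is not glibmm's actual code: append_utf8() is a hypothetical helper, std::uint32_t stands in for gunichar, and the sketch does not reject surrogates or values above U+10FFFF as a real implementation should.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper, not part of glibmm: appends the UTF-8 byte
// representation of one Unicode character to a std::string.
void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        // ASCII: one byte, unchanged.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

Appending U+00FC (the u-umlaut) this way adds the two bytes 0xC3 0xBC, and appending U+20AC (the euro sign) adds the three bytes 0xE2 0x82 0xAC.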
In many applications this extra functionality is irrelevant and using a
std::string object for storing and manipulating UTF-8 byte sequences will be
fine and have less overhead. In addition, if you try to manipulate a
Glib::ustring object after putting an invalid UTF-8 byte sequence into it,
the Glib::ustring object will be in an undefined state, so you need to know
that what you are putting into it is valid. (You can check this with
Glib::ustring::validate() before manipulating it.)
You can check whether a std::string object contains valid UTF-8 with
g_utf8_validate(), and extract a Unicode character from the byte stream it
contains with Glib::get_unichar_from_std_iterator(), so you can take your
choice between using std::string or Glib::ustring depending on your needs.
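If you stay with std::string, pulling one Unicode character out of the byte stream amounts to something like the sketch below. It is a hand-rolled stand-in for illustration only, not the glibmm function: decode_code_point() is a hypothetical name, std::uint32_t stands in for gunichar, and the code assumes the input has already been validated (for example with g_utf8_validate()), so it does no bounds or validity checking.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in: returns the Unicode character whose first byte is
// at `pos`. The string must already be known to hold valid UTF-8, and `pos`
// must point at the first byte of a character.
std::uint32_t decode_code_point(std::string::const_iterator pos) {
    const unsigned char lead = static_cast<unsigned char>(*pos);
    if (lead < 0x80) return lead;  // plain ASCII, one byte

    int extra = 0;          // number of continuation bytes that follow
    std::uint32_t cp = 0;   // payload bits collected so far
    if ((lead & 0xE0) == 0xC0) {
        extra = 1; cp = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {
        extra = 2; cp = lead & 0x0F;
    } else {
        extra = 3; cp = lead & 0x07;
    }
    for (int i = 0; i < extra; ++i) {
        ++pos;
        // Each continuation byte contributes its low six bits.
        cp = (cp << 6) | (static_cast<unsigned char>(*pos) & 0x3F);
    }
    return cp;
}
```

Called on the first byte of the bytes 0xC3 0xBC, it yields 0xFC, the u-umlaut from the filename above.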
Chris