Re: Glib::ustring tradeoffs?



On Saturday 29 October 2005 17:13, Matthias Kaeppler wrote:
> Chris Vine wrote:
> > ...  As in your example you are hardwiring the text into the
> > source code, then the best thing is not to convert the text, but to write
> > the hardwired string in UTF-8 in the first place.  (As it happens,
> > "BUTTON" is valid ASCII and therefore valid UTF-8.)
>
> What does that mean? How do I "write in UTF-8"? ^^

See http://www.cl.cam.ac.uk/~mgk25/unicode.html
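In practice it mostly means making sure the bytes of the literal in your source file are UTF-8 bytes. A rough sketch in plain C++ (no glibmm involved; the helper names is_ascii and utf8_char_count are made up for illustration, and the second helper assumes its input is already well-formed UTF-8):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if every byte is 7-bit ASCII, in which case
// the same bytes are also valid UTF-8 and nothing needs converting.
bool is_ascii(const std::string& s) {
    for (unsigned char c : s) {
        if (c > 0x7F) return false;
    }
    return true;
}

// Hypothetical helper: counts characters (code points) in a string that is
// ASSUMED to be well-formed UTF-8, by skipping continuation bytes (10xxxxxx).
std::size_t utf8_char_count(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// "BUTTON" is plain ASCII, so it is already UTF-8.
const std::string button = "BUTTON";

// "cafe" with an acute accent on the e, spelled out as explicit UTF-8 bytes:
// U+00E9 is encoded as the two bytes 0xC3 0xA9.
const std::string cafe = "caf\xC3\xA9";
```

Here button holds 6 bytes and 6 characters, while cafe holds 5 bytes but only 4 characters. If your editor saves the source file as UTF-8 you can usually type the accented character directly instead of using the escapes, but that depends on your editor and compiler settings.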

> And since the string "text" is both valid ASCII and UTF-8, why do I have
> to call locale_from_utf8() to convert it to an std::string:

You don't - see my earlier post.  (Aside from which you have a conceptual 
problem, because you don't convert anything "to a std::string": you convert 
between different codesets.  std::string is codeset agnostic, as I explained 
in my earlier post.)

> Glib::ustring str1 = "text"; // is both valid ASCII and UTF8
> std::string str2 = str1; // doesn't work

Because std::string doesn't have a conversion constructor for Glib::ustring 
(but Glib::ustring has a conversion constructor for std::string, so the 
reverse will work).

This would work:
std::string str2 = str1.raw();

[snip]

> By the way: I noticed that operator[] only works in one way for
> Glib::ustring. How come I can only read characters with it, but not
> write them?
>
> Example:
>
> Glib::ustring str = "text";
> assert (str[0] == 't'); // works
> str[0] = 'n'; // fails

Because, unlike std::string::operator[](), Glib::ustring::operator[]() returns 
by value and not by reference, and C/C++ does not allow a value of built-in 
type returned by value to be assigned to.  That matters here because 
Glib::ustring::operator[]() returns a Unicode character (a 32-bit value) and 
not a byte (char).  If the value at an index could be modified, the character 
written there might have a different length in UTF-8 than the character 
currently at that position, so the modification could require the text held 
by the Glib::ustring to change its byte size.  You have to use other methods, 
such as Glib::ustring::replace(), to modify the text.  (See 
http://www.gtkmm.org/docs/glibmm-2.4/docs/reference/html/classGlib_1_1ustring.html ).
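To see why assigning through an index cannot simply overwrite bytes in place, here is a rough sketch in plain C++ using std::string as a UTF-8 byte container. It is not how Glib::ustring is implemented; encode_utf8, utf8_seq_len and replace_first_char are hypothetical helpers, and they assume valid code points and non-empty, well-formed UTF-8 input:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8.
// ASSUMES cp is a valid scalar value (<= 0x10FFFF, not a surrogate).
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: byte length of the UTF-8 sequence that starts with
// the given lead byte (ASSUMES the byte really is a valid lead byte).
std::size_t utf8_seq_len(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    return 4;
}

// Replaces the first character of s with cp and returns the new string.
// ASSUMES s is non-empty, well-formed UTF-8.
std::string replace_first_char(const std::string& s, char32_t cp) {
    std::size_t old_len = utf8_seq_len(static_cast<unsigned char>(s[0]));
    return encode_utf8(cp) + s.substr(old_len);
}
```

Replacing the 't' in "text" with 'n' keeps the string at 4 bytes, but replacing it with the euro sign (U+20AC, three bytes in UTF-8) grows it to 6 bytes, which is the kind of resizing a writable operator[]() would have to hide.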

You are making this more difficult than it really is.  If you read up on 
Unicode and UTF-8 all will become clear.

Chris