Re: utf8 and Glib::ustring



On 22 March 2017 at 08:52, John Emmas <john creativepost co uk> wrote:
Forgive my ignorance - this'll probably be obvious to some of you...

Suppose I've got a simple character string, like this:-

      const char* my_str = "Hello World";

I can assign it to a Glib::ustring very easily:-

      Glib::ustring ustr = my_str;

BUT... instead of pointing to a 'normal' string (simple ASCII characters), let's suppose that 'my_str' was already pointing to a string in utf8 format.  Will the same assignment still work - or is there some better way of assigning a utf8 string to a Glib::ustring?  Thanks,

John


UTF-8 is backwards compatible with ASCII. If bit 7 of any given byte in a string is 0, then that byte is a single ASCII character. Only if bit 7 is 1 is the byte part of a multi-byte sequence, and only then do UTF-8-aware tools decode it together with its neighbouring bytes.
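To make the bit-7 rule concrete, here is a minimal sketch in plain C++ (no glibmm needed). The helper names is_ascii_byte, is_continuation_byte and is_all_ascii are made up for illustration; they are not part of Glib or any library.

```cpp
#include <string>

// Hypothetical helper: true if bit 7 is clear, i.e. the byte is plain ASCII.
bool is_ascii_byte(unsigned char b)
{
    return (b & 0x80) == 0;
}

// Hypothetical helper: true if the byte has the form 10xxxxxx, i.e. it
// continues a multi-byte UTF-8 sequence started by an earlier byte.
bool is_continuation_byte(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

// Hypothetical helper: true if every byte in the string is plain ASCII,
// in which case the ASCII and UTF-8 readings of the string are identical.
bool is_all_ascii(const std::string& s)
{
    for (unsigned char b : s) {
        if (!is_ascii_byte(b)) {
            return false;
        }
    }
    return true;
}
```

For "Hello World" every byte passes is_ascii_byte(). For "caf\xC3\xA9" (the UTF-8 encoding of "café") the last two bytes both have bit 7 set: 0xC3 starts a two-byte sequence and 0xA9 continues it.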

In the same way, to Glib::ustring, any char* is just a block of bytes that it interprets as UTF-8, which covers both plain ASCII and the extended set of characters. (The difference typically shows up when getting the string length, indexing, etc.: there is no longer a 1:1 correspondence between size in bytes and length in characters once multi-byte characters are present.)
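The byte/character mismatch can be seen without glibmm at all. The sketch below counts characters (code points) in a std::string by skipping continuation bytes. It assumes the input is already valid UTF-8, and utf8_length is a made-up name for illustration. As far as I know, the counterparts on Glib::ustring are bytes() for the size in bytes and length() for the number of characters.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts UTF-8 code points by counting every byte
// that is NOT a continuation byte (10xxxxxx). Assumes valid UTF-8 input;
// it does no validation of its own.
std::size_t utf8_length(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char b : s) {
        if ((b & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

With std::string("Hello World"), size() and utf8_length() both give 11. With std::string("caf\xC3\xA9"), size() gives 5 bytes but utf8_length() gives 4 characters.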

In other words, the answer to the question is yes, the same assignment will work, and no, there is no better way: construct the Glib::ustring from the char* and let it handle the rest.


