Re: Glib::ustring::iterator not iterating over chinese character
- From: Daniel Elstner <daniel kitta googlemail com>
- To: Weimin Xie <panyuweimin hotmail com>
- Cc: gtkmm-list gnome org
- Subject: Re: Glib::ustring::iterator not iterating over chinese character
- Date: Fri, 01 May 2009 02:06:47 +0200
On Monday, 13.04.2009, at 18:47 +0800, Weimin Xie wrote:
> I'm learning how to use Glib::ustring. My goal is to split a ustring
> of Unicode characters into a vector container. In a simple case, my
> program has read a Chinese character, for example, "你". When I tried
> to use the Glib::ustring::iterator to go over the ustring, it shows
> there is more than one entry.
>
> If description = "你", then
[...]
> Gives me
> size <5> bytes <8> char <228> char <189> char <160> char <10> char
> <10>
To me, this looks suspiciously like something that would happen if a
string gets encoded twice. That is, I suspect you already had a UTF-8
encoded string, which subsequently got interpreted as a string of
ISO-8859-1 bytes and then translated a second time to UTF-8.
With just one code point (你, U+4F60) plus the two trailing newline
characters, the output for size should have been 3 instead of 5. And the
number of bytes should have been 5 rather than 8: three bytes for 你 and
one for each newline. The interpretation of a UTF-8 string as ISO-8859-1
would also explain why you see exactly the numbers you would see if you
were iterating over the bytes of the correctly encoded original string
(228, 189 and 160 are the UTF-8 bytes of 你) -- that's because up to
code point 255, Unicode is identical to ISO-8859-1.
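To make the arithmetic concrete, here is a minimal sketch in plain C++,
deliberately without glibmm, of what I suspect happened. The helper names
latin1_to_utf8 and utf8_length are made up for this illustration and are
not part of any library; the sketch also assumes its input is well-formed
UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: re-encode a byte string as UTF-8 while treating
// every input byte as one ISO-8859-1 character (code points 0..255).
// Feeding it text that is already UTF-8 yields a doubly encoded string.
std::string latin1_to_utf8(const std::string& bytes)
{
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            out += static_cast<char>(c);                  // ASCII stays one byte
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte (0xC2 or 0xC3)
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}

// Hypothetical helper: count the code points in well-formed UTF-8 by
// counting every byte that is not a continuation byte (10xxxxxx).
std::size_t utf8_length(const std::string& s)
{
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

Starting from the correctly encoded std::string "\xE4\xBD\xA0\n\n",
utf8_length() gives 3 and size() gives 5. After one pass through
latin1_to_utf8() the same calls give 5 and 8, which are the numbers in
your output.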
> Can someone please explain why the iterator doesn't go over the
> unicode characters as expected?
It probably does. It's just that your string doesn't contain what you
think it does.
> Thanks a lot in advance!
You're welcome. If you still think it's a problem of glibmm, please
file a bug and attach a test case, so we can reproduce the problem.
--Daniel