Re: [evolution-patches] Fix for Evolution 1.2 shortcut migration
- From: Not Zed <notzed ximian com>
- To: Frederic Crozat <fcrozat mandrakesoft com>
- Cc: evolution-patches <evolution-patches lists ximian com>
- Subject: Re: [evolution-patches] Fix for Evolution 1.2 shortcut migration
- Date: Wed, 10 Sep 2003 10:48:53 -0500
>
> > <item name="Résumé"...
>
> Right. The problem is that libxml1 wrote out the UTF8 wrong (storing
> each *byte* of the UTF8-encoded string as a separate entity instead of
> storing each *character* as its own entity). So when you read it into
> libxml2, each byte of UTF8 encoding becomes a separate character and you
> end up with "RÃ©sumÃ©".
>
> Converting it to locale encoding isn't the right fix though; you
> essentially want to convert to iso-8859-1 regardless of what the locale
> encoding is (because that reverses the translation above: the "Ã"s
> become 0xC3, and the "©"s become 0xA9, and then when you hand the data
> back to libxml, it sees "0x52 0xC3 0xA9 0x73 0x75 0x6D 0xC3 0xA9", which
> is the UTF-8 encoding of "Résumé").
>
> But it would be less confusing to just do the transformation by hand,
> since you don't really mean "convert from utf-8 to iso-8859-1", you just
> mean "replace each multibyte utf-8 character with the corresponding
> single-byte value".
Hmm, I'm not so sure of that: it will work for iso-8859-1 strings that
libxml1 encoded badly (i.e. French), but I'm not sure it will work for
non-ISO8859-1 strings (like Chinese ...)
Naah, what Dan says is that the content is 8-bit utf8 converted to xml entities byte-by-byte rather than as unicode characters.
So what you do is take the input stream, read it as utf8, but then treat each unicode character you read as a single utf8 byte, rather than as a gunichar.
Chinese, for example, will have its multi-byte chars encoded the same way. E.g. a 4 byte sequence
ABCD
will be written out byte by byte, and any byte that is > 7 bits will be encoded as if it were an iso-8859-1 character, in 2 utf8 bytes,
e.g. AaBCcD, which is how libxml2 will read it back.
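The byte-per-character reversal described above can be sketched roughly as follows. This is only an illustration in Python, not the actual Evolution patch: the function names `undo_bytewise_entities` and `mangle` are made up for the example, and falling back to the unchanged string when the input does not look mangled is an assumption of the sketch.

```python
def undo_bytewise_entities(mangled: str) -> str:
    """Treat each character read back by the parser as one raw byte,
    then decode the resulting byte string as UTF-8."""
    # Assumption: if any code point is above 0xFF, the string was not
    # mangled byte-by-byte, so it is returned untouched.
    if any(ord(ch) > 0xFF for ch in mangled):
        return mangled
    raw = bytes(ord(ch) for ch in mangled)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8 after the reversal: assume it was fine already.
        return mangled


def mangle(text: str) -> str:
    """Simulate the damage: one character per byte of the UTF-8 encoding."""
    return "".join(chr(b) for b in text.encode("utf-8"))


if __name__ == "__main__":
    for original in ("Résumé", "中文"):
        damaged = mangle(original)
        # Show the damaged code points and the repaired string.
        print([hex(ord(c)) for c in damaged], undo_bytewise_entities(damaged))
```

Nothing here depends on iso-8859-1 being the original locale: the Chinese string round-trips the same way as the French one, because each damaged character only ever stands for one byte.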
Does that make any more sense?
Michael