Re: [evolution-patches] Fix for Evolution 1.2 shortcut migration



> > But it would be less confusing to just do the transformation by hand,
> > since you don't really mean "convert from utf-8 to iso-8859-1", you just
> > mean "replace each multibyte utf-8 character with the corresponding
> > single-byte value".
> 
> Hmm, I'm not so sure of that: it will work for ISO-8859-1 strings
> that libxml1 encoded badly (e.g. French), but I'm not sure it will work
> for non-ISO-8859-1 strings (like Chinese...)

No, really, it would. Say you had a shortcut named "[U+65E5][U+672C]"
("日本", "Japan"). In UTF-8, that's 0xE6 0x97 0xA5 0xE6 0x9C 0xAC, so
libxml1 would have incorrectly written
"&#xE6;&#x97;&#xA5;&#xE6;&#x9C;&#xAC;" (well, it would write them in
decimal instead of hex, but that doesn't matter). So when you read that
into libxml2, you get
"[U+00E6][U+0097][U+00A5][U+00E6][U+009C][U+00AC]", or "æ¥æ¬" with two
invisible C1 control characters mixed in, which is junk. But if you just
convert each of the six Unicode characters to the equivalent single
byte, you get 0xE6 0x97 0xA5 0xE6 0x9C 0xAC again, which is the correct
UTF-8 representation of your shortcut name, so when you write it back
out with libxml2, it will write it as "&#x65E5;&#x672C;".
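
For illustration, here is a minimal Python sketch of that byte-for-byte
conversion, run on the example above. The helper name fix_libxml1_name
is made up for this message and is not part of the Evolution patch; it
assumes the input really is a string that libxml1 mangled this way.

```python
def fix_libxml1_name(garbled: str) -> str:
    """Hypothetical helper: undo libxml1's per-byte escaping of UTF-8."""
    # Each character of the garbled string stands for exactly one byte
    # of the original UTF-8 data, so every code point must be <= 0xFF.
    if any(ord(ch) > 0xFF for ch in garbled):
        raise ValueError("string was not mangled byte-by-byte")
    # Map each code point straight to the byte with the same value.
    raw = bytes(ord(ch) for ch in garbled)
    # The resulting bytes are the original UTF-8 encoding; decode them.
    return raw.decode("utf-8")


# What libxml2 hands back after reading the libxml1-written file.
garbled = "\u00e6\u0097\u00a5\u00e6\u009c\u00ac"
print(fix_libxml1_name(garbled))  # prints 日本
```

A string that was never mangled (say, a lone U+00E9) will usually fail
the final decode step with UnicodeDecodeError, which gives the caller a
way to tell that the name should be left alone.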

-- Dan

PS - Special thanks to Noah Levitt for writing gucharmap, without which
this message would have been much harder to write. :-)



