Re: [evolution-patches] Fix for Evolution 1.2 shortcut migration



> > > Strangely, you need to feed libxml2 with locale-encoded strings (and not
> > > UTF-8 ones) for shortcuts.xml migration from 1.2 (otherwise you'll end
> > > up with very ugly strings, which is quite obvious in French).
> > 
> > Hmm, I don't understand why this patch works at all...  The current
> > shortcut code always sets the data in the libxml tree as UTF-8.  So why
> > would getting old UTF-8 data and converting it to locale generate a
> > valid tree?  (It's also the opposite of what the rest of the XML fixing
> > code in that file does...)
> 
> I don't understand either :)) Maybe libxml2 switched to locale encoding
> when reading shortcuts.xml and then expected all strings to be in locale
> encoding.

> <item name="R&#195;&#169;sum&#195;&#169;"...

Right. The problem is that libxml1 wrote out the UTF-8 incorrectly (storing
each *byte* of the UTF-8-encoded string as a separate entity instead of
storing each *character* as its own entity). So when you read it into
libxml2, each byte of the UTF-8 encoding becomes a separate character and you
end up with "RÃ©sumÃ©".
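To make the double encoding concrete, here is a minimal C sketch that models both halves of the problem. The helpers `write_bytewise_entities` and `decode_entities_to_utf8` are hypothetical names; they imitate the behaviour described above and are not the actual libxml1 or libxml2 code paths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the libxml1 bug: every byte >= 0x80 of a UTF-8
 * string is written as its own numeric reference (&#N;) instead of one
 * reference per character. Returns 0 on success, -1 if out is too small. */
static int write_bytewise_entities(const char *utf8, char *out, size_t out_size)
{
    size_t len = 0;

    assert(utf8 != NULL && out != NULL);
    if (out_size == 0)
        return -1;
    out[0] = '\0';
    for (const unsigned char *p = (const unsigned char *) utf8; *p; p++) {
        int n = (*p < 0x80)
            ? snprintf(out + len, out_size - len, "%c", *p)
            : snprintf(out + len, out_size - len, "&#%u;", (unsigned) *p);
        if (n < 0 || (size_t) n >= out_size - len)
            return -1; /* would truncate */
        len += (size_t) n;
    }
    return 0;
}

/* Hypothetical model of libxml2 reading that file back: each &#N; becomes
 * code point N (always < 256 here), stored internally as UTF-8.
 * Output is never longer than input, so out needs strlen(in) + 1 bytes. */
static void decode_entities_to_utf8(const char *in, unsigned char *out)
{
    assert(in != NULL && out != NULL);
    while (*in) {
        if (in[0] == '&' && in[1] == '#') {
            char *end;
            unsigned long cp = strtoul(in + 2, &end, 10);
            if (end != in + 2 && *end == ';' && cp < 0x100) {
                if (cp < 0x80) {
                    *out++ = (unsigned char) cp;
                } else {
                    /* Two-byte UTF-8: 110xxxxx 10xxxxxx */
                    *out++ = (unsigned char) (0xC0 | (cp >> 6));
                    *out++ = (unsigned char) (0x80 | (cp & 0x3F));
                }
                in = end + 1;
                continue;
            }
        }
        *out++ = (unsigned char) *in++;
    }
    *out = '\0';
}
```

Writing "Résumé" with the first helper gives exactly the attribute quoted above, `R&#195;&#169;sum&#195;&#169;`, and reading that back with the second gives the bytes `52 C3 83 C2 A9 73 75 6D C3 83 C2 A9`, i.e. "RÃ©sumÃ©" (U+00C3 followed by U+00A9 where each "é" should be).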

Converting it to the locale encoding isn't the right fix, though; you
essentially want to convert to ISO-8859-1 regardless of what the locale
encoding is (because that reverses the translation above: the "Ã"s
become 0xC3 and the "©"s become 0xA9, and then when you hand the data
back to libxml, it sees "0x52 0xC3 0xA9 0x73 0x75 0x6D 0xC3 0xA9", which
is the UTF-8 encoding of "Résumé").

But it would be less confusing to just do the transformation by hand,
since you don't really mean "convert from UTF-8 to ISO-8859-1"; you just
mean "replace each two-byte UTF-8 character (U+0080 through U+00FF) with
the corresponding single-byte value".
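A minimal sketch of that by-hand transformation in C might look like this. `collapse_bytewise_utf8` is a hypothetical name, not a function from Evolution or libxml2, and it assumes its input is the double-encoded string libxml2 returns for a libxml1-written attribute, so every non-ASCII character should lie in U+0080 to U+00FF.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: undo libxml1's byte-per-entity damage in place.
 * Each character U+0080..U+00FF, which libxml2 hands back as a lead byte
 * 0xC2/0xC3 plus one continuation byte, is collapsed to the single byte
 * it stands for; ASCII is copied unchanged.
 * Returns 0 on success, -1 if the string holds anything the libxml1 bug
 * could not have produced (a character above U+00FF or malformed UTF-8).
 * On failure the string may already be partially rewritten. */
static int collapse_bytewise_utf8(char *s)
{
    unsigned char *in, *out;

    assert(s != NULL);
    in = out = (unsigned char *) s;
    while (*in) {
        if (*in < 0x80) {
            *out++ = *in++;
        } else if ((in[0] == 0xC2 || in[0] == 0xC3) && (in[1] & 0xC0) == 0x80) {
            /* 110000xx 10yyyyyy -> xxyyyyyy */
            *out++ = (unsigned char) (((in[0] & 0x1F) << 6) | (in[1] & 0x3F));
            in += 2;
        } else {
            return -1;
        }
    }
    *out = '\0';
    return 0;
}
```

Applied to the damaged "Résumé", this turns `52 C3 83 C2 A9 73 75 6D C3 83 C2 A9` back into `52 C3 A9 73 75 6D C3 A9`, the original UTF-8, which can then be handed back to libxml2 unchanged. On this input it does the same job as a UTF-8 to ISO-8859-1 conversion, but it states the intent directly and needs no iconv converter; callers that need the original string on failure should work on a copy.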

-- Dan


