Re: [evolution-patches] Fix for Evolution 1.2 shortcut migration



> > > 
> > > > <item name="R&#195;&#169;sum&#195;&#169;"...
> > > 
> > > Right. The problem is that libxml1 wrote out the UTF8 wrong (storing
> > > each *byte* of the UTF8-encoded string as a separate entity instead of
> > > storing each *character* as its own entity). So when you read it into
> > > libxml2, each byte of UTF8 encoding becomes a separate character and you
> > > end up with "RÃ©sumÃ©".
> > > 
> > > Converting it to locale encoding isn't the right fix though; you
> > > essentially want to convert to iso-8859-1 regardless of what the locale
> > > encoding is (because that reverses the translation above: the "Ã"s
> > > become 0xC3, and the "©"s become 0xA9, and then when you hand the data
> > > back to libxml, it sees "0x52 0xC3 0xA9 0x73 0x75 0x6D 0xC3 0xA9", which
> > > is the UTF-8 encoding of "Résumé").
> > > 
> > > But it would be less confusing to just do the transformation by hand,
> > > since you don't really mean "convert from utf-8 to iso-8859-1", you just
> > > mean "replace each multibyte utf-8 character with the corresponding
> > > single-byte value".
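A minimal Python sketch of the round trip described above, for illustration only. This is not libxml or Evolution code: `libxml1_style_escape`, `parse_char_refs` and `repair` are hypothetical helpers, and the regex is a simplified stand-in for a real XML parser that only handles decimal `&#N;` references. It reproduces the bug on "Résumé" and then undoes it by turning each character back into one byte, which is exactly what an ISO-8859-1 encode does:

```python
import re

def libxml1_style_escape(text: str) -> str:
    # Hypothetical model of the libxml1 bug: every non-ASCII *byte* of the
    # UTF-8 encoding is written out as its own numeric entity.
    return "".join(
        chr(b) if b < 0x80 else f"&#{b};" for b in text.encode("utf-8")
    )

def parse_char_refs(escaped: str) -> str:
    # Simplified stand-in for a conforming parser such as libxml2:
    # each decimal &#N; becomes the single character U+N.
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), escaped)

def repair(misread: str) -> str:
    # Each character of the mis-read string is really one byte of the
    # original UTF-8 data, and ord(ch) is that byte. bytes() raises
    # ValueError if a code point is above 0xFF, i.e. the string was not
    # mangled this way.
    raw = bytes(ord(ch) for ch in misread)  # same as misread.encode("latin-1")
    return raw.decode("utf-8")

escaped = libxml1_style_escape("Résumé")
print(escaped)          # R&#195;&#169;sum&#195;&#169;
misread = parse_char_refs(escaped)
print(misread)          # RÃ©sumÃ©
print(repair(misread))  # Résumé
```

Because the byte-wise mapping never consults the locale, it gives the same result whatever the user's locale encoding is, which is why converting to the locale encoding is the wrong fix.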
> > 
> > Hmm, I'm not so sure of that: it will work for strings that libxml1
> > badly encoded from ISO-8859-1 text (i.e. French), but I'm not sure it
> > will work for non-ISO-8859-1 strings (like Chinese...).
> Nah, what Dan says is that the content is 8-bit UTF-8 converted to XML
> entities byte by byte rather than as Unicode characters.
> 
> So what you do is take the input stream and read it as UTF-8, but then
> treat each Unicode character you read as a single UTF-8 byte, rather
> than as a gunichar.
> 
> Chinese, for example, will have its multi-byte characters encoded the
> same way. A 4-byte sequence would normally be encoded like
>
> ABCD
>
> but if any of A, B, C or D needs more than 7 bits (i.e. is above 0x7F),
> it'll be encoded as if it were an ISO-8859-1 character, i.e. as 2 UTF-8
> bytes.
>
> E.g. AaBCcD, which is how libxml2 will read it back.
> 
> Does that make any more sense?
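To check the concern about non-ISO-8859-1 text, here is a second Python sketch working on raw buffers, roughly the way C code walking a libxml2 UTF-8 string would. `repair_utf8_buffer` is a hypothetical name, not an Evolution or GLib function. It decodes the buffer one character at a time and emits each character's value as a single byte instead of keeping it as a full Unicode character. "中" is E4 B8 AD in UTF-8, so libxml1 would have written `&#228;&#184;&#173;`, libxml2 reads that back as three Latin-1-range characters, and the loop turns them back into the original three bytes:

```python
def repair_utf8_buffer(buf: bytes) -> bytes:
    # buf: UTF-8 text as libxml2 returns it after reading libxml1 output.
    # Every decoded character stands for one byte of the real UTF-8 data.
    out = bytearray()
    for ch in buf.decode("utf-8"):
        cp = ord(ch)
        if cp > 0xFF:
            # A genuine character above U+00FF cannot come from a single
            # byte, so this buffer was not mangled the way described above.
            raise ValueError(f"U+{cp:04X} is not a mis-encoded byte")
        out.append(cp)
    return bytes(out)

# Simulate the damage: original UTF-8 bytes -> one character per byte
# (what libxml2 builds from the per-byte entities) -> libxml2's UTF-8 buffer.
original = "中文".encode("utf-8")                     # E4 B8 AD E6 96 87
mangled = original.decode("latin-1").encode("utf-8")  # each byte becomes 2
fixed = repair_utf8_buffer(mangled)
print(len(mangled), fixed == original, fixed.decode("utf-8"))  # 12 True 中文
```

One caveat of this sketch: a correctly encoded character in the Latin-1 range, such as "é", also passes the check (it becomes the lone byte 0xE9), so the repair should only be applied to data known to have been written by libxml1.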

Yes, it does, with your explanation and Dan's latest one :)

I'll try to do a new version of my patch tomorrow (probably not in
time for 1.4.5).

-- 
Frederic Crozat <fcrozat mandrakesoft com>
Mandrakesoft



