Re: [xslt] utf8 encoding and conversion

On 26.05.2004 13:23, Thomas Bopp wrote:

Hi there!

Sorry, if this is not the correct list for this question:

I have an HTML document which I am parsing with the libxml2 SAX parser.
I am converting the document from ISO-8859-1 to UTF-8, and then the parser works fine.
Unfortunately libxml fails to convert the UTF-8 output back to ISO-8859-1. Shouldn't this work,
since the original file is ISO-8859-1? It works for most files ....
The buggy file contains some entities like ”. Maybe that is the problem?

I believe that is the problem. Whatever the entity is, it can be represented as a sequence of UTF-8 code units. The parser most likely resolves the entity to the UTF-8 character it stands for, which means it no longer remains an entity. Conversion back to ISO-8859-1 then fails because that character does not exist in the target encoding.
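
To illustrate the point, here is a minimal sketch in Python. It uses only Python's built-in codecs, not libxml2, so it shows the encoding limitation itself rather than libxml's actual behaviour. The sample string and the helper names to_latin1_strict and to_latin1_with_charrefs are made up for this example, and U+201D (right double quotation mark) is assumed to be the kind of character the entity resolves to.

```python
# Illustration only: plain Python codecs, not libxml2.
# U+00E9 (e acute) exists in ISO-8859-1; U+201D (right double quote) does not.
text = "caf\u00e9 \u201dquoted\u201d"

# Encoding to UTF-8 always succeeds, since UTF-8 covers all of Unicode.
utf8_bytes = text.encode("utf-8")


def to_latin1_strict(s):
    """Return ISO-8859-1 bytes, or None if a character has no mapping."""
    try:
        return s.encode("iso-8859-1")
    except UnicodeEncodeError:
        return None


def to_latin1_with_charrefs(s):
    """Encode to ISO-8859-1, writing unmappable characters as &#NNNN;."""
    return s.encode("iso-8859-1", errors="xmlcharrefreplace")


strict_result = to_latin1_strict(text)          # fails because of U+201D
charref_result = to_latin1_with_charrefs(text)  # keeps it as a character reference
```

The second helper shows one possible way out: characters missing from the target encoding are written back as numeric character references, which is roughly what the entity was before parsing.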


