Re: [xml] xmlTextReader and character encoding



* Shane Dempsey (shdempse) wrote:
I am using libxml2 and the xmlTextReader to parse the xml content below.

Libxml somehow interprets the content contained in the xml node and uses 
that information to encode the parsed content resulting in the insertion 
of the Â character. Is there a way to stop libxml2 from interpreting 
this, i.e. charset=iso-8859-15?

XML to process :
==============
<SPAN style="FONT-STYLE: normal; FONT-FAMILY: Segoe UI; COLOR: #1a1a1a; 
FONT-SIZE: 10pt; FONT-WEIGHT: normal; TEXT-DECORATION: none">&nbsp;meta 
http-equiv="content-type" content="text/html; charset=iso-8859-15" 
/</SPAN>

Processed XML
=============
<span>Â meta http-equiv=&quot;content-type&quot; 
content=&quot;text/html; charset=iso-8859-15&quot; /</span>

Your XML document is not well-formed: `&nbsp;` is not one of the
predefined named entities, and there is no document type declaration.
So you are probably not showing us the whole input, or at least not
telling us exactly how you are processing it. Anyway, `nbsp` is usually
defined as U+00A0, a non-breaking space, and the UTF-8 encoding of that
character (the bytes 0xC2 0xA0), when incorrectly interpreted as
ISO-8859-x, will look similar to the string you say is being inserted.
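
To illustrate the effect described above, here is a minimal Python sketch
(not libxml2 code; the variable names are made up for this example). It
assumes the parser hands back UTF-8 and that something downstream reads
those bytes as ISO-8859-15, which may or may not be what happens in your
setup:

```python
# Hypothetical illustration: U+00A0 encoded as UTF-8, then misread
# as ISO-8859-15 by whatever displays or stores the output.
nbsp = "\u00a0"                      # the character &nbsp; usually stands for

utf8_bytes = nbsp.encode("utf-8")    # two bytes: 0xC2 0xA0
misread = utf8_bytes.decode("iso-8859-15")

# 0xC2 maps to U+00C2 (A with circumflex), 0xA0 maps to U+00A0 again,
# so one character turns into two.
print([hex(b) for b in utf8_bytes])
print([f"U+{ord(c):04X}" for c in misread])
```

If that assumption holds, the fix is on the output side: either treat the
result as UTF-8 or convert it to the encoding you actually want before
writing it out.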
-- 
Björn Höhrmann · mailto:bjoern hoehrmann de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 