Re: [xml] xmlTextReader and character encoding
- From: Bjoern Hoehrmann <derhoermi gmx net>
- To: "Shane Dempsey (shdempse)" <shdempse cisco com>
- Cc: "xml gnome org" <xml gnome org>
- Subject: Re: [xml] xmlTextReader and character encoding
- Date: Thu, 30 Jan 2014 17:08:57 +0100
* Shane Dempsey (shdempse) wrote:
I am using libxml2 and the xmlTextReader to parse the xml content below.
Libxml somehow interprets the content contained in the xml node and uses
that information to encode the parsed content resulting in the insertion
of the Â character. Is there a way to stop the libxml2 from interpreting
this i.e. charset=iso-8859-15?
XML to process :
==============
<SPAN style="FONT-STYLE: normal; FONT-FAMILY: Segoe UI; COLOR: #1a1a1a;
FONT-SIZE: 10pt; FONT-WEIGHT: normal; TEXT-DECORATION: none">&nbsp;meta
http-equiv="content-type" content="text/html; charset=iso-8859-15"
/</SPAN>
Processed XML
=============
<span>Â meta http-equiv="content-type"
content="text/html; charset=iso-8859-15" /</span>
Your XML document is not well-formed: `&nbsp;` is not one of the
pre-defined named entities and there is no document type declaration.
So you are probably not showing us the whole input, or at least not
telling us exactly how you are processing it. Anyway, `nbsp` is
usually defined as U+00A0, a non-breaking space, and the UTF-8 encoding
of that character (the bytes 0xC2 0xA0), when incorrectly interpreted
as ISO-8859-x, will look similar to the string you say is being
inserted.
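
To make both points concrete, here is a small sketch in Python rather
than libxml2, using only the standard library. The helper names
`misread_utf8` and `is_well_formed` are made up for this illustration,
and the standard-library parser is only a stand-in: it is not meant to
reproduce what xmlTextReader does with your input.

```python
import xml.etree.ElementTree as ET

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE, the usual definition of nbsp


def misread_utf8(text: str, wrong_encoding: str = "iso-8859-15") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong charset."""
    return text.encode("utf-8").decode(wrong_encoding)


def is_well_formed(xml_text: str) -> bool:
    """Return True if the standard-library XML parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# U+00A0 takes two bytes in UTF-8.
print(NBSP.encode("utf-8"))        # b'\xc2\xa0'

# Read as ISO-8859-15, 0xC2 becomes "Â" and 0xA0 stays a no-break space.
print(repr(misread_utf8(NBSP)))    # 'Â\xa0'

# Without a DTD that declares it, the nbsp entity is undefined, so the
# parser rejects the document. A numeric character reference needs no
# declaration and is accepted.
print(is_well_formed("<span>&nbsp;</span>"))
print(is_well_formed("<span>&#160;</span>"))
```

If the "Â" only shows up when you look at the output, it may be worth
checking which encoding the viewer or terminal assumes before changing
anything in the parsing code.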
--
Björn Höhrmann · mailto:bjoern hoehrmann de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/