[xml] encoding and htmlCreatingPushParser
- From: Petr Pajas <pajas ufal mff cuni cz>
- To: libxml <xml gnome org>
- Subject: [xml] encoding and htmlCreatingPushParser
- Date: Fri, 10 Nov 2006 15:15:40 +0100
Hi Daniel, All,
I'm using the xmlCharEncoding argument of htmlCreatePushParserCtxt in order to
force the parser to expect a certain input encoding. It works fine but only
as long as the HTML document contains no header like
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">
where charset differs from the encoding which I'm trying to enforce. Imagine,
if you like, that I receive the data from a web server which already sends
everything as UTF-8 and correctly declares the Content-Type charset as UTF-8
in the HTTP header, but somehow the document still contains a (forgotten)
<meta> header like the one above.
I should mention that both htmlParseDoc() and htmlParseFile(), under the same
scenario, do obey the encoding I specify.
Since these functions are more high-level, I'm not sure whether it's a bug or
a feature that htmlCreatePushParserCtxt() favors <meta> over the encoding
passed at creation time while htmlParse* do otherwise. In either case,
I'd be interested to know whether there is currently some way to force the
encoding with the push parser (so that the <meta> won't override it).
P.S. this was also tested with CVS libxml2.