[xml] encoding and htmlCreatePushParserCtxt



Hi Daniel, All,

I'm using the xmlCharEncoding argument of htmlCreatePushParserCtxt() to force 
the parser to expect a certain input encoding. It works fine, but only as 
long as the HTML document contains no header like

   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">

where charset differs from the encoding I'm trying to enforce. For example, 
suppose I receive the data from a web server that already sends everything 
as UTF-8 and correctly declares charset=UTF-8 in the HTTP Content-Type 
header, but the document itself still contains a (forgotten) 
<meta ...charset=iso-8859-2>.
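
To make the mismatch concrete, here is a minimal sketch in plain C (no 
libxml2 calls) that pulls the charset out of the first "charset=" found in a 
buffer and compares it with the encoding announced by the transport. The 
helper names find_ci, meta_charset and charset_conflicts are hypothetical, 
and the scan is deliberately naive: it does not parse HTML, it only looks 
for the literal text "charset=".

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive substring search; returns a pointer into hay or NULL. */
static const char *find_ci(const char *hay, const char *needle)
{
    size_t n = strlen(needle);
    for (; *hay != '\0'; hay++) {
        size_t i = 0;
        while (i < n && hay[i] != '\0' &&
               tolower((unsigned char)hay[i]) ==
               tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return hay;
    }
    return NULL;
}

/* Copy the value following the first "charset=" into out.
 * Returns 1 if a non-empty value was found, 0 otherwise. */
static int meta_charset(const char *html, char *out, size_t outlen)
{
    const char *p = find_ci(html, "charset=");
    size_t i = 0;

    if (p == NULL || outlen == 0)
        return 0;
    p += strlen("charset=");
    if (*p == '"' || *p == '\'')      /* skip an optional opening quote */
        p++;
    while (*p != '\0' && i + 1 < outlen &&
           (isalnum((unsigned char)*p) || *p == '-' || *p == '_' ||
            *p == '.' || *p == ':'))
        out[i++] = *p++;
    out[i] = '\0';
    return i > 0;
}

/* Returns 1 if the buffer declares a charset that differs (ignoring case)
 * from the one announced by the transport, 0 otherwise. */
static int charset_conflicts(const char *html, const char *transport)
{
    char declared[64];
    size_t i;

    if (!meta_charset(html, declared, sizeof declared))
        return 0;
    for (i = 0; declared[i] != '\0' && transport[i] != '\0'; i++)
        if (tolower((unsigned char)declared[i]) !=
            tolower((unsigned char)transport[i]))
            return 1;
    return declared[i] != transport[i];
}
```

With the <meta> line quoted above and a transport charset of "UTF-8", 
charset_conflicts() reports a conflict; with "ISO-8859-2" it does not.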

I should mention that both htmlParseDoc() and htmlParseFile(), under the same 
scenario, do obey the encoding I specify.

Since those functions are higher-level, I'm not sure whether it's a bug or a 
feature that htmlCreatePushParserCtxt() favors <meta> over the encoding 
passed when the context is created, while htmlParse* do the opposite. Either 
way, I'd like to know whether there is currently some way to force the 
encoding with the push parser, so that the <meta> won't override it.

Thanks,
  Petr

P.S. this was also tested with CVS libxml2.
-- 


