Re: [xml] encoding and htmlCreatingPushParser



On Friday 10 November 2006 15:37, Daniel Veillard wrote:
On Fri, Nov 10, 2006 at 03:15:40PM +0100, Petr Pajas wrote:
Hi Daniel, All,

I'm using the xmlCharEncoding argument of htmlCreatePushParserCtxt in
order to force the parser to expect a certain input encoding. It works
fine but only as long as the HTML document contains no header like

   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">

where charset differs from the encoding which I'm trying to enforce.
Imagine, if you like, that I receive the data from a web server which
already sends everything as UTF-8 and correctly declares the Content-Type
charset as UTF-8 in the HTTP header, but somehow the document still
contains a (forgotten) <meta ...charset=iso-8859-2>.

I should mention that both htmlParseDoc() and htmlParseFile(), under the
same scenario, do obey the encoding I specify.

  htmlRead... should be preferred nowadays.

This actually concerned the old, dusty XML::LibXML bindings, so my first
intention was to do only minimal surgery, leaving the dust safely where
it lay. But switching to htmlReadIO proved to be a far better choice than that.
It does exactly what I needed with just a few lines of code.

Cheers,
-- Petr


