Re: [xml] encoding and htmlCreatingPushParser
- From: Petr Pajas <pajas ufal mff cuni cz>
- To: veillard redhat com
- Cc: libxml <xml gnome org>
- Subject: Re: [xml] encoding and htmlCreatingPushParser
- Date: Fri, 10 Nov 2006 21:58:29 +0100
On Friday 10 November 2006 15:37, Daniel Veillard wrote:
> On Fri, Nov 10, 2006 at 03:15:40PM +0100, Petr Pajas wrote:
> > Hi Daniel, All,
> >
> > I'm using the xmlCharEncoding argument of htmlCreatePushParserCtxt in
> > order to force the parser to expect a certain input encoding. It works
> > fine but only as long as the HTML document contains no header like
> >
> > <meta http-equiv="Content-Type" content="text/html;
> > charset=iso-8859-2">
> >
> > where charset differs from the encoding which I'm trying to enforce.
> > Imagine, if you like, that I receive the data from a web server, which
> > already sends all as UTF-8, declares correctly Content-Type charset as
> > UTF-8 in the HTTP header, but somehow the document still contains a
> > (forgotten)
> > <meta ...charset=iso-8859-2>.
> >
> > I should mention that both htmlParseDoc() and htmlParseFile(), under the
> > same scenario, do obey the encoding I specify.
>
> htmlRead... should be preferred nowadays.
This actually concerned the old, dusty XML::LibXML bindings, so my first
intention was to do only minimal surgery, leaving the dust safely where
it lay. But switching to htmlReadIO proved to be a far better choice.
It does exactly what I needed with just a few lines of code.
Cheers,
-- Petr