Re: [xml] encoding and htmlCreatingPushParser
- From: Daniel Veillard <veillard redhat com>
- To: Petr Pajas <pajas ufal ms mff cuni cz>
- Cc: libxml <xml gnome org>
- Subject: Re: [xml] encoding and htmlCreatingPushParser
- Date: Fri, 10 Nov 2006 09:37:40 -0500
On Fri, Nov 10, 2006 at 03:15:40PM +0100, Petr Pajas wrote:
> Hi Daniel, All,
>
> I'm using the xmlCharEncoding argument of htmlCreatePushParserCtxt in order to
> force the parser to expect a certain input encoding. It works fine but only
> as long as the HTML document contains no header like
>
> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">
>
> where charset differs from the encoding which I'm trying to enforce. Imagine,
> if you like, that I receive the data from a web server, which already sends
> all as UTF-8, declares correctly Content-Type charset as UTF-8 in the HTTP
> header, but somehow the document still contains a (forgotten)
> <meta ...charset=iso-8859-2>.
>
> I should mention that both htmlParseDoc() and htmlParseFile(), under the same
> scenario, do obey the encoding I specify.
The htmlRead*() functions (htmlReadDoc(), htmlReadFile(), htmlReadMemory(), ...) should be preferred nowadays.
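For the scenario described above, where the charset in the HTTP Content-Type header is the one to trust, the following is a minimal sketch of pulling that charset out of the header value. The helper name charset_from_content_type() is hypothetical (it is not part of libxml2), and the parsing is deliberately simplified rather than a full implementation of the HTTP media-type grammar.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp() (POSIX) */

/*
 * Hypothetical helper, not a libxml2 function.
 * Copies the value of the "charset" parameter of a Content-Type header
 * value (e.g. "text/html; charset=UTF-8") into out, truncating it to at
 * most outsz - 1 bytes. Returns 1 if a non-empty charset was found,
 * 0 otherwise.
 */
int charset_from_content_type(const char *value, char *out, size_t outsz)
{
    const char *p = value;
    size_t n = 0;

    if (value == NULL || out == NULL || outsz == 0)
        return 0;
    out[0] = '\0';

    /* Parameters follow the media type, each introduced by ';'. */
    while ((p = strchr(p, ';')) != NULL) {
        p++;
        while (*p == ' ' || *p == '\t')
            p++;
        if (strncasecmp(p, "charset=", 8) != 0)
            continue;

        p += 8;
        if (*p == '"')          /* tolerate a quoted value */
            p++;
        while (*p != '\0' && *p != '"' && *p != ';' &&
               !isspace((unsigned char)*p) && n + 1 < outsz)
            out[n++] = *p++;
        out[n] = '\0';
        return n > 0;
    }
    return 0;
}
```

The resulting string could then be passed as the encoding argument of one of the htmlRead*() functions, for example htmlReadMemory(). Whether that argument takes precedence over a conflicting <meta> declaration in a given libxml2 version is something to verify rather than assume.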
> Since these methods are more high-level I'm not sure whether it's a bug or
> feature that htmlCreatePushParserCtxt() favors <meta> over the encoding
> specified in the constructor while htmlParse* do otherwise. In either case,
> I'd be interested if there is currently some way to force encoding with the
> push parser (so that the <meta> won't override it)?
I don't have time right now to check and potentially make a patch. The best
option is to file a bug in Bugzilla, but don't hold your breath!
Daniel
--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/