Re: [xml] Push-parsing Unicode with LibXML2
- From: Eric Seidel <eseidel apple com>
- To: veillard redhat com
- Cc: xml gnome org
- Subject: Re: [xml] Push-parsing Unicode with LibXML2
- Date: Tue, 14 Feb 2006 01:38:45 -0800
Daniel-
On Feb 14, 2006, at 12:59 AM, Daniel Veillard wrote:
On Tue, Feb 14, 2006 at 12:45:14AM -0800, Eric Seidel wrote:
I'm now looking for a way to make libxml ignore the
encoding="iso-8859-1" attribute, and instead rely on the utf-16 it
autodetected (or which I can manually specify).
xmlCreatePushParserCtxt() doesn't have an encoding option, but
calling xmlCtxtResetPush() after its creation with the parameters
might help.
xmlParserCtxtPtr parser = xmlCreatePushParserCtxt(handlers, 0, 0, 0, 0);
xmlCtxtResetPush(parser, 0, 0, 0, "UTF-16BE");
That has no effect. Looking at the code for xmlParseChunk (and more
specifically xmlParseEncodingDecl), I can see why. The code seems
written such that an encoding="<name here>" attribute will always
trump any previously detected encoding, at least if the parser is in
XML_PARSER_START mode (see parser.c line 8786, in
xmlParseEncodingDecl).
Just for grins, I tried forcing the parser to start in
XML_PARSER_MISC after manually specifying the encoding, but that only
resulted in an "XML declaration allowed only at the start of the
document" error.
As I see it, my only options are:
1. Find (with your help) some way to hack around libxml's encoding-
overrides-everything behavior. (This might mean detecting and
stripping <?xml... lines or encoding="" attributes from the input
stream.)
2. Ask you nicely to add an API for disabling this behavior (or
otherwise manually overriding the encoding.)
3. Hack some such manual-encoding-override behavior into the Mac OS
X system version of libxml2 for our next release. (My least favorite
option.)
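For option 1, the stripping step could be sketched roughly as follows. This is only an illustration, not anything libxml2 provides: strip_encoding_decl() is a hypothetical helper, and it assumes the start of the document is available in one buffer of ASCII-compatible bytes (for example, after the caller has already transcoded the UTF-16 input to UTF-8). If the buffer does not yet contain the closing "?>", it is left untouched, so a push-style caller would need to buffer until the declaration is complete.

```c
#include <stddef.h>
#include <string.h>

/* Whitespace as allowed inside an XML declaration. */
static int is_xml_space(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Find the first occurrence of needle within buf[0..len); NULL if absent. */
static char *find_bytes(char *buf, size_t len, const char *needle)
{
    size_t n = strlen(needle);
    if (n == 0 || len < n)
        return NULL;
    for (size_t i = 0; i + n <= len; i++)
        if (memcmp(buf + i, needle, n) == 0)
            return buf + i;
    return NULL;
}

/*
 * Hypothetical helper: remove the encoding="..." pseudo-attribute from a
 * leading XML declaration, in place. Returns the new length of the data.
 * The buffer is returned unchanged if it has no complete XML declaration
 * or no encoding pseudo-attribute.
 */
size_t strip_encoding_decl(char *buf, size_t len)
{
    if (len < 6 || memcmp(buf, "<?xml", 5) != 0 || !is_xml_space(buf[5]))
        return len;

    char *end = find_bytes(buf, len, "?>");
    if (end == NULL)
        return len;                     /* declaration not complete yet */

    char *p = find_bytes(buf + 5, (size_t)(end - (buf + 5)), "encoding");
    if (p == NULL || !is_xml_space(p[-1]))
        return len;

    char *q = p + 8;                    /* just past "encoding" */
    while (q < end && is_xml_space(*q)) q++;
    if (q >= end || *q != '=')
        return len;
    q++;
    while (q < end && is_xml_space(*q)) q++;
    if (q >= end || (*q != '"' && *q != '\''))
        return len;

    char quote = *q++;
    while (q < end && *q != quote) q++;
    if (q >= end)
        return len;                     /* unterminated value */

    /* Drop one whitespace character plus encoding="value". */
    char *start = p - 1;
    char *stop = q + 1;
    size_t tail = (size_t)((buf + len) - stop);
    memmove(start, stop, tail);
    return (size_t)(start - buf) + tail;
}
```

The result is not NUL-terminated; the caller is expected to use the returned length when pushing the data to the parser.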
Any suggestions are most welcome...
Note that you really should try to pass all parameters
and not NULLs/0; things like the filename, which sets the base URI, are
important for further processing of URI references.
I will certainly consider adding the URI.
And please don't push one byte at a time; after that people may
claim that libxml2 is a poor performer!
:) Of course not. This is just for testing. I'm pushing one byte
at a time to make this easier to debug.
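Outside of debugging, feeding the parser in larger blocks could look something like the sketch below. The names push_in_chunks, chunk_sink, push_stats and counting_sink are made up for illustration; the sink callback merely mirrors the (chunk, size, terminate) arguments of xmlParseChunk() so that the loop can be read and tested without libxml2. In real use the sink would presumably wrap a call to xmlParseChunk() on the parser context.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical sink; stands in for a wrapper around xmlParseChunk().
 * Returns 0 on success, nonzero on error. */
typedef int (*chunk_sink)(void *user, const char *data, int size, int terminate);

/*
 * Feed data[0..len) to the sink in blocks of at most chunk_size bytes.
 * The last call carries whatever remains and sets terminate to 1.
 * Returns 0 on success, -1 on a bad chunk size, or the sink's error code.
 */
int push_in_chunks(const char *data, size_t len, size_t chunk_size,
                   chunk_sink sink, void *user)
{
    if (chunk_size == 0 || chunk_size > (size_t)INT_MAX)
        return -1;

    size_t off = 0;
    while (len - off > chunk_size) {
        int rc = sink(user, data + off, (int)chunk_size, 0);
        if (rc != 0)
            return rc;
        off += chunk_size;
    }
    /* Final (possibly empty) block, flagged as the end of the document. */
    return sink(user, data + off, (int)(len - off), 1);
}

/* Example sink that only counts what it receives. */
struct push_stats {
    size_t calls;
    size_t bytes;
    int terminated;
};

int counting_sink(void *user, const char *data, int size, int terminate)
{
    struct push_stats *st = user;
    (void)data;
    st->calls++;
    st->bytes += (size_t)size;
    if (terminate)
        st->terminated = 1;
    return 0;
}
```

A block size of a few kilobytes is an arbitrary choice here; the point is only that the per-call overhead is paid once per block rather than once per byte.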
Thanks again for all your help.
-eric
Daniel
--
Daniel Veillard | Red Hat http://redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/