Re: [xml] Push-parsing Unicode with LibXML2
- From: Daniel Veillard <veillard redhat com>
- To: Eric Seidel <eseidel apple com>
- Cc: xml gnome org
- Subject: Re: [xml] Push-parsing Unicode with LibXML2
- Date: Tue, 14 Feb 2006 03:33:33 -0500
On Mon, Feb 13, 2006 at 03:40:48PM -0800, Eric Seidel wrote:
> We convert everything to UTF16, and pass around only UTF16 strings
> internally in WebKit (http://www.webkit.org). If that means we have
> to also remove the encoding information from the string before
> passing it into libxml (or better yet, tell libxml to ignore it) we
> can do that.
> In our case, we don't want the parser to autodetect. We do all that
> already in WebKit, we'd just like to pass an already properly decoded
> UTF16 string off to libxml and let it do its magic.
> In my example it still seems that libxml falls over well before
> actually reaching any xml encoding declaration. The first byte
> passed seems to put the parser context into an error state. Any
> thoughts on what might be causing this? Again, removing my bogus
> xmlSwitchEncoding call does not change the behavior.
The first thing I notice is that you pass one byte at a time. At best
this is just massively inefficient; at worst you're hitting a bug.
The source from parse4.c does not do this.
Also, if you have converted to a memory string, why do you need to use
progressive parsing? If the conversion is progressive, I still doubt
it delivers data byte by byte; just pass the blocks as they are converted.
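
For illustration only, here is a minimal sketch of feeding a buffer in
blocks rather than one byte at a time. It is not the code from parse4.c.
The names chunk_sink, feed_in_blocks, feed_stats and counting_sink are
hypothetical; the sink's argument order (context, chunk, size, terminate)
is chosen to mirror xmlParseChunk, so with libxml2 the sink would
presumably be a thin wrapper around that call on a push parser context.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sink type: same argument order as libxml2's xmlParseChunk
 * (context, chunk, size, terminate). Returns 0 on success. */
typedef int (*chunk_sink)(void *ctx, const char *chunk, int size, int terminate);

/* Feed `len` bytes from `data` to `sink` in blocks of at most `block_size`
 * bytes, then signal end of input with one empty terminating call.
 * Returns 0 on success, or the first non-zero result from the sink. */
static int feed_in_blocks(chunk_sink sink, void *ctx,
                          const char *data, size_t len, size_t block_size)
{
    assert(sink != NULL);
    assert(block_size > 0 && block_size <= (size_t)INT_MAX);

    size_t off = 0;
    while (off < len) {
        size_t n = len - off;
        if (n > block_size)
            n = block_size;
        int rc = sink(ctx, data + off, (int)n, 0);
        if (rc != 0)
            return rc;          /* stop at the first reported error */
        off += n;
    }
    /* No more data: an empty chunk with terminate set ends the input. */
    return sink(ctx, NULL, 0, 1);
}

/* Illustrative sink that only records what it was given. */
struct feed_stats {
    int calls;        /* number of sink invocations, including the last */
    size_t bytes;     /* total bytes received */
    int terminated;   /* set once the terminating call arrives */
};

static int counting_sink(void *ctx, const char *chunk, int size, int terminate)
{
    struct feed_stats *s = ctx;
    (void)chunk;
    s->calls++;
    s->bytes += (size_t)size;
    if (terminate)
        s->terminated = 1;
    return 0;
}
```

If the conversion itself is progressive, the same idea applies without the
loop: hand each converted block to the sink as it becomes available, and
make the terminating call once the converter reports end of input.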
Daniel
--
Daniel Veillard | Red Hat http://redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/