Re: [xml] Push-parsing Unicode with LibXML2
- From: Daniel Veillard <veillard redhat com>
- To: Eric Seidel <eseidel apple com>
- Cc: xml gnome org
- Subject: Re: [xml] Push-parsing Unicode with LibXML2
- Date: Tue, 14 Feb 2006 03:33:33 -0500
On Mon, Feb 13, 2006 at 03:40:48PM -0800, Eric Seidel wrote:
> We convert everything to UTF16, and pass around only UTF16 strings
> internally in WebKit (http://www.webkit.org). If that means we have
> to also remove the encoding information from the string before
> passing it into libxml (or better yet, tell libxml to ignore it) we
> can do that.
>
> In our case, we don't want the parser to autodetect. We do all that
> already in WebKit, we'd just like to pass an already properly decoded
> utf16 string off to libxml and let it do its magic.
>
> In my example it still seems that libxml falls over well before
> actually reaching any xml encoding declaration. The first byte
> passed seems to put the parser context into an error state. Any
> thoughts on what might be causing this? Again, removing my bogus
> xmlSwitchEncoding call does not change the behavior.
First thing I notice is that you pass one byte at a time. At best
this is just massively inefficient; at worst you're hitting a bug.
The source from parse4.c does not do this.
Also, if you have converted to a memory string, why do you need to use
progressive parsing? If the conversion is progressive, I still doubt
it delivers data byte by byte; just pass the blocks as they are converted.
Daniel
--
Daniel Veillard | Red Hat http://redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/