Re: [xml] Push-parsing Unicode with LibXML2



On Mon, Feb 13, 2006 at 03:40:48PM -0800, Eric Seidel wrote:
> We convert everything to UTF16, and pass around only UTF16 strings
> internally in WebKit (http://www.webkit.org). If that means we also have
> to remove the encoding information from the string before passing it
> into libxml (or better yet, tell libxml to ignore it), we can do that.
>
> In our case, we don't want the parser to autodetect. We already do all
> that in WebKit; we'd just like to pass an already properly decoded
> UTF16 string off to libxml and let it do its magic.
>
> In my example it still seems that libxml falls over well before
> actually reaching any XML encoding declaration. The first byte
> passed seems to put the parser context into an error state. Any
> thoughts on what might be causing this? Again, removing my bogus
> xmlSwitchEncoding call does not change the behavior.

  First thing I notice is that you pass one byte at a time. At best
this is just massively inefficient; at worst you're hitting a bug.
The source from parse4.c does not do this.
  Also, if you have already converted to a memory string, why do you need
progressive parsing? If the conversion is progressive, I still doubt
it delivers data byte by byte; just pass the blocks as they are converted.
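To illustrate the "pass blocks, not bytes" advice, here is a minimal sketch of feeding an already-converted buffer to a push parser in fixed-size blocks. It is not code from this thread or from parse4.c. The names `feed_in_blocks`, `chunk_sink`, `chunk_log` and `log_sink` are hypothetical; the sink mirrors the arguments of libxml2's `xmlParseChunk(ctxt, chunk, size, terminate)`, and in real code it would be a small wrapper forwarding to that call on a context from `xmlCreatePushParserCtxt`. The recording sink below only stands in for the parser so the sketch runs without libxml2. The block size is an assumption to tune.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Mirrors the arguments of libxml2's
 * xmlParseChunk(xmlParserCtxtPtr ctxt, const char *chunk, int size, int terminate),
 * with the context passed as void *. Returns 0 on success, non-zero on error. */
typedef int (*chunk_sink)(void *ctx, const char *chunk, int size, int terminate);

/* Hypothetical helper: hand an already-decoded buffer to a push parser in
 * blocks of at most block_size bytes, setting terminate only on the last
 * call. Returns 0 on success, -1 for a bad block size, or the first
 * non-zero value returned by the sink. */
int feed_in_blocks(const char *buf, size_t len, size_t block_size,
                   chunk_sink sink, void *ctx)
{
    size_t off = 0;

    assert(sink != NULL);
    /* 0 would never make progress; sizes above INT_MAX do not fit in int. */
    if (block_size == 0 || block_size > (size_t) INT_MAX)
        return -1;

    /* Empty input: still signal the end of the document once. */
    if (len == 0)
        return sink(ctx, buf, 0, 1);

    while (off < len) {
        size_t n = (len - off < block_size) ? len - off : block_size;
        int last = (off + n == len);
        int rc = sink(ctx, buf + off, (int) n, last);

        if (rc != 0)
            return rc;
        off += n;
    }
    return 0;
}

/* Stand-in sink that only records what it is given. A real sink would
 * forward to xmlParseChunk((xmlParserCtxtPtr) ctx, chunk, size, terminate). */
struct chunk_log {
    int calls;            /* number of sink invocations */
    size_t total;         /* bytes received across all calls */
    int terminate_calls;  /* calls made with terminate != 0 */
};

int log_sink(void *ctx, const char *chunk, int size, int terminate)
{
    struct chunk_log *rec = ctx;

    (void) chunk; /* contents are not inspected here */
    rec->calls++;
    rec->total += (size_t) size;
    if (terminate)
        rec->terminate_calls++;
    return 0;
}
```

With a 17-byte document and a block size of 4, the sink is called five times (4, 4, 4, 4, 1 bytes) and only the last call carries terminate. For UTF-16 input, an even block size keeps 16-bit code units from straddling calls; that is a conservative choice here, not something the thread says libxml2 requires.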

Daniel

-- 
Daniel Veillard      | Red Hat http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


