Re: [xml] Push-parsing Unicode with LibXML2



On Tue, Feb 14, 2006 at 12:45:14AM -0800, Eric Seidel wrote:
> I'm now looking for a way to make libxml ignore the
> encoding="iso-8859-1" attribute, and instead rely on the utf-16 it
> autodetected (or which I can manually specify).

  xmlCreatePushParserCtxt() doesn't have an encoding option, but
calling xmlCtxtResetPush() on the context after its creation, with the
encoding parameter set, might help. Note that you really should try to
pass all parameters and not NULLs/0; things like the filename, which sets
the base URI, are important for further processing of URI references.
  And please don't push one byte at a time, or people may later
claim that libxml2 is a poor performer!

> I also saw at:
> http://xmlsoft.org/encoding.html#extend
> you mention it might be possible to make libxml use all utf-16
> internally.  Do you know if anyone has tried?

  Not possible, that was written like 5 years ago. It won't work without
massive changes diverging from the normal code base, so I'm removing that
paragraph. In general UTF-16 is a really bad choice: it makes data larger
in most cases, which hurts cache performance, and you don't even have the
nice property of a constant bytes-per-character ratio.
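
  The size argument can be checked with a few lines of C. This is only an
illustrative sketch: utf8_len(), utf16_len() and encoded_size() are
hypothetical helpers written for this example, not libxml2 functions. They
return the number of bytes one Unicode code point occupies in each
encoding, following the standard UTF-8 and UTF-16 encoding rules.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to encode one code point in UTF-8 (0 if not encodable). */
size_t utf8_len(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* surrogates are not characters */
    if (cp < 0x80)      return 1;                /* ASCII, i.e. most XML markup */
    if (cp < 0x800)     return 2;
    if (cp < 0x10000)   return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;                                    /* beyond the Unicode range */
}

/* Bytes needed to encode one code point in UTF-16 (0 if not encodable). */
size_t utf16_len(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp < 0x10000)   return 2;                /* one 16-bit unit */
    if (cp <= 0x10FFFF) return 4;                /* surrogate pair: two units */
    return 0;
}

/* Total encoded size of n code points, using either length function. */
size_t encoded_size(const uint32_t *cps, size_t n, size_t (*len_of)(uint32_t))
{
    size_t total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += len_of(cps[i]);
    return total;
}
```

  ASCII markup such as <a>hi</a> (9 characters) takes 9 bytes in UTF-8 and
18 bytes in UTF-16, while a character outside the Basic Multilingual Plane
such as U+1D11E takes 4 bytes in UTF-16, so UTF-16 is not a fixed-width
encoding either. Text dominated by characters in the U+0800..U+FFFF range
is the case where UTF-16 comes out smaller.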

Daniel

-- 
Daniel Veillard      | Red Hat http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


