Re: [xml] Push-parsing Unicode with LibXML2



On Tue, Feb 14, 2006 at 01:38:45AM -0800, Eric Seidel wrote:
> As I see it, my only options are:
>
> 1.  Find (with your help) some way to hack around libxml's
> encoding-overrides-everything behavior.  (This might mean detecting and
> stripping <?xml... lines or encoding="" attributes from the input
> stream.)
> 2.  Ask you nicely to add an API for disabling this behavior (or
> otherwise manually overriding the encoding.)
> 3.  Hack some such manual-encoding-override behavior into the Mac OS
> X system version of libxml2 for our next release.  (My least favorite
> option.)
>
> Any suggestions are most welcome...

  To me the most logical approach would be to do surgery on your input
stream: since you are modifying it by changing its encoding, you should
also change or remove the encoding declaration in the XMLDecl, if present.
  However, per Appendix F.2 of the XML spec, the user-provided encoding
should override the detected one, so this could be considered a libxml2
bug; I'm just really worried about breaking existing code by changing it.
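
  A rough sketch of that surgery, for illustration only: the helper
below (strip_xmldecl_encoding is a hypothetical name, not a libxml2 API)
removes the encoding="..." pseudo-attribute from a leading XML
declaration, in place. It assumes a NUL-terminated, ASCII-compatible
8-bit buffer (e.g. the raw bytes, or UTF-8); a UTF-16 stream would need
the same scan over 16-bit code units.

```c
#include <stddef.h>
#include <string.h>

/* XML's S production: space, tab, CR, LF only. */
static int is_xml_space(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Hypothetical helper: strip encoding="..." (or '...') from a leading
 * <?xml ...?> declaration in buf, in place. Returns the new length.
 * Leaves buf untouched if there is no well-formed declaration. */
size_t strip_xmldecl_encoding(char *buf)
{
    size_t len = strlen(buf);
    if (strncmp(buf, "<?xml", 5) != 0 || !is_xml_space(buf[5]))
        return len;                          /* no XMLDecl at all */
    char *decl_end = strstr(buf, "?>");
    if (decl_end == NULL)
        return len;                          /* incomplete XMLDecl */

    for (char *p = buf + 5; p + 8 <= decl_end; p++) {
        /* "encoding" must be preceded by whitespace */
        if (!is_xml_space(p[-1]) || strncmp(p, "encoding", 8) != 0)
            continue;
        char *q = p + 8;
        while (q < decl_end && is_xml_space(*q)) q++;   /* S? */
        if (q >= decl_end || *q != '=') return len;
        q++;
        while (q < decl_end && is_xml_space(*q)) q++;   /* S? */
        if (q >= decl_end || (*q != '"' && *q != '\'')) return len;
        char quote = *q++;
        char *close = memchr(q, quote, (size_t)(decl_end - q));
        if (close == NULL) return len;

        /* Cut from the whitespace before "encoding" through the closing
         * quote; whitespace before a following standalone="" remains. */
        char *cut_start = p - 1;
        char *cut_end = close + 1;
        memmove(cut_start, cut_end, strlen(cut_end) + 1);
        return len - (size_t)(cut_end - cut_start);
    }
    return len;                              /* no encoding declaration */
}
```

  For instance, <?xml version="1.0" encoding="ISO-8859-1"?> becomes
<?xml version="1.0"?>, after which the declaration no longer competes
with the encoding you pass to the parser.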

  Another suggestion: don't bother with the LE- or BE-specific names for
UTF-16; just use "UTF-16", since the parser adjusts the byte order
automatically anyway.
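
  To show why plain "UTF-16" is enough: the byte order can be recovered
from the first bytes of the document, using the byte order mark or,
failing that, the leading '<' as in Appendix F.1 of the XML spec. The
function below is a simplified sketch of that idea (detect_utf16_order
is a hypothetical name; this is not libxml2's actual code), and it
ignores UTF-32 and other encodings.

```c
#include <stddef.h>

typedef enum { BO_UNKNOWN, BO_UTF16LE, BO_UTF16BE } byte_order;

/* Hypothetical helper: guess UTF-16 byte order from the first bytes.
 * Note: FF FE 00 00 would really be a UTF-32LE BOM; not handled here. */
byte_order detect_utf16_order(const unsigned char *p, size_t n)
{
    if (n < 2)
        return BO_UNKNOWN;
    if (p[0] == 0xFF && p[1] == 0xFE)
        return BO_UTF16LE;               /* BOM U+FEFF, little endian */
    if (p[0] == 0xFE && p[1] == 0xFF)
        return BO_UTF16BE;               /* BOM U+FEFF, big endian */
    if (p[0] == 0x3C && p[1] == 0x00)
        return BO_UTF16LE;               /* no BOM, '<' little endian */
    if (p[0] == 0x00 && p[1] == 0x3C)
        return BO_UTF16BE;               /* no BOM, '<' big endian */
    return BO_UNKNOWN;
}
```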

Daniel

-- 
Daniel Veillard      | Red Hat http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


