Re: [xml] Push-parsing Unicode with LibXML2



On Tue, Feb 14, 2006 at 01:14:21PM +0100, Kasimier Buchcik wrote:
  To me the most logical thing would be to do surgery on your input stream:
you are modifying it by changing its encoding, so you should then also
change or remove the encoding declaration in the XMLDecl, if present.

We are doing this in our Delphi DOM-wrapper and lxml does it as well.
I guess PHP does something similar.

  Okay, this really means libxml2 needs to be fixed to follow Appendix F.2
of the spec for the APIs where the encoding is passed to the parser
before any parsing has been done.
  Please file it in Bugzilla and I will get to it; it may actually be relatively
simple ...

  However, to follow Appendix F.2, the user-provided encoding should
override the detected one, so this could be considered a libxml2 bug;
I'm just really worried about breaking existing code by changing this.

Fooling the parser in order to eat the user's encoding works, but
it's not nice.
I wonder if we could have an additional xmlParserOption,
e.g. XML_PARSE_OVERRIDEENCDECL, to explicitly instruct the parser to
parse the encoding declaration, but not to use it; this wouldn't break
existing code.

  I don't think it should be a parser option. Clearly, if you pass a non-NULL
encoding string when preparing the parse, it has to override the XMLDecl
(I fought that decision; I really think it's one of the broken aspects of
the spec and leads to major incompatibilities, but, well, that's how it is).

Daniel

-- 
Daniel Veillard      | Red Hat http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


