Re: [xml] Push-parsing Unicode with LibXML2



After reading this thread and the comments in the bug report I have a few questions/comments.

Kasimier Buchcik wrote:
  To me the most logical would be to do surgery on your input stream:
if you are modifying it by changing its encoding, you should then also change or remove the encoding declaration of the xmlDecl if present.
We are doing this in our Delphi DOM-wrapper and lxml does it as well.
I guess PHP does something similar.
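
As a rough illustration of the "surgery" described above, here is a minimal Python sketch, not the Delphi wrapper's or lxml's actual code. The function names strip_encoding_declaration and transcode are hypothetical, and the regular expressions only handle a well-formed XML declaration at the very start of the text. The idea is simply: decode with the known source encoding, drop the now-stale encoding pseudo-attribute, then re-encode.

```python
import re

# An XML declaration at the very start of the text. Requiring whitespace
# after "<?xml" keeps processing instructions such as <?xml-stylesheet ...?>
# from matching.
_XML_DECL = re.compile(r"""<\?xml(?P<attrs>\s[^?]*)\?>""")

# The encoding pseudo-attribute inside the declaration, with either quote style.
_ENCODING_ATTR = re.compile(
    r"""\s+encoding\s*=\s*(["'])[A-Za-z][A-Za-z0-9._\-]*\1"""
)


def strip_encoding_declaration(text: str) -> str:
    """Remove the encoding pseudo-attribute from the XML declaration, if any."""
    match = _XML_DECL.match(text)
    if match is None:
        return text  # no XML declaration, nothing to change
    attrs = _ENCODING_ATTR.sub("", match.group("attrs"), count=1)
    return "<?xml" + attrs + "?>" + text[match.end():]


def transcode(raw: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Decode raw bytes, drop the stale declaration, and re-encode."""
    text = raw.decode(source_encoding)
    return strip_encoding_declaration(text).encode(target_encoding)
```

For example, transcode(data, "iso-8859-1", "utf-16-le") would hand back little-endian bytes without a BOM and without a declaration that contradicts them. The caller would then still have to tell the parser which encoding it is getting, since nothing in the stream says so any more.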

Since in Delphi we defined the DOMString to be little-endian with
no BOM, we currently do the following if parsing a DOMString:

PHP doesn't play around with encoding, or even implement a DOMString, in the DOM extension. If a string needs any special encoding handling, it's up to the user to encode it as needed. The encoding specified in the document, or its BOM, is what determines the encoding; I really don't agree with overriding it, and I haven't heard any complaints yet.
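
To make "the specified document encoding or BOM" concrete, here is a simplified Python sketch of that kind of detection. It is not what PHP or libxml2 actually runs: the name sniff_xml_encoding is hypothetical, the declaration check assumes an ASCII-compatible encoding, and the UTF-8 fallback is an assumption based on the XML default.

```python
import codecs
import re

# BOMs to check, longest first: the UTF-32 LE BOM begins with the same
# two bytes as the UTF-16 LE BOM, so it has to be tested before it.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)

# The encoding pseudo-attribute of an XML declaration at the start of the data.
_DECL_ENCODING = re.compile(
    rb"""<\?xml(?=\s)[^?]*?\sencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._\-]*)\1"""
)


def sniff_xml_encoding(data: bytes, default: str = "utf-8") -> str:
    """Guess the encoding from a BOM, else from the XML declaration."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    match = _DECL_ENCODING.match(data)
    if match is not None:
        return match.group(2).decode("ascii")
    return default  # no BOM and no declared encoding
```

The point of the sketch is that nothing outside the document is consulted: no caller-supplied override and no transport header, only the bytes themselves.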

I do have a question about Kasimier's latest comment in the bug report on keeping any encoding specified in the document. If this value is not kept, what encoding is used when the document is serialized and no encoding is explicitly passed to the save functions? Would it use the overriding value rather than the original one specified in the XMLDecl?

In any event, I doubt that whatever change is made here will cause any breakage on my side, since I don't muck around with encoding while parsing and I use different I/O routines, in case any changes are made here for some sort of encoding detection (e.g. HTTP headers).

Rob


