Re: [xml] Push-parsing Unicode with LibXML2
- From: Rob Richards <rrichards ctindustries net>
- To: Kasimier Buchcik <K Buchcik 4commerce de>
- Cc: ML-libxml2 <xml gnome org>, Daniel Veillard <veillard redhat com>
- Subject: Re: [xml] Push-parsing Unicode with LibXML2
- Date: Wed, 15 Feb 2006 08:50:00 -0500
After reading this thread and the comments in the bug report I have a
few questions/comments.
Kasimier Buchcik wrote:
> To me the most logical would be to do surgery on your input stream;
> you are modifying it by changing its encoding, you should then also
> change or remove the encoding declaration of the xmlDecl if present.
> We are doing this in our Delphi DOM-wrapper and lxml does it as well.
> I guess PHP does something similar.
> Since in Delphi we defined the DOMString to be little-endian with
> no BOM, we currently do the following if parsing a DOMString:
PHP doesn't play around with encoding or even implement a DOMString in
the DOM extension. If a string needs any special encoding, it's up to
the user to encode it as needed. The encoding specified in the document,
or the BOM, is what determines the encoding; I really don't agree with
overriding the encoding, and I haven't heard any complaints yet.
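To illustrate what BOM-based detection means here, the following is a minimal sketch in C. It is not libxml2's or PHP's actual code: `sniff_bom` is a hypothetical helper name, it only recognizes the UTF-8 and UTF-16 byte order marks, and it ignores UTF-32 for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libxml2: guess the encoding of a
 * buffer from its byte order mark (BOM).
 * Returns an encoding name, or NULL when no known BOM is present, in
 * which case a parser would fall back to the XML declaration or to its
 * default encoding. UTF-32 BOMs are deliberately not handled. */
static const char *sniff_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    /* UTF-8 BOM: EF BB BF */
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return "UTF-8";
    /* UTF-16 little-endian BOM: FF FE */
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return "UTF-16LE";
    /* UTF-16 big-endian BOM: FE FF */
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return "UTF-16BE";
    return NULL;
}
```

A caller would pass the first few bytes of the input and, on a NULL result, leave the decision to the encoding declaration in the document itself.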
I do have a question on Kasimier's latest comment in the bug report
about keeping any specified encoding of the document. If this value is
not kept, then what encoding is used when the document is serialized and
no encoding is explicitly passed to the save functions? Would it use the
overriding value rather than the original one specified in the XMLDecl?
In any event, whatever change is made here, I doubt it will cause any
breakage on my side, since I don't muck around with encoding while
parsing, and I use different I/O routines in case any changes are made
here for some sort of encoding detection (e.g. HTTP headers, etc.).
Rob