Hi all,

I’ve found 3 problems with xmlTextReader, used from Python. I provide my code
and a test example, so that you can reproduce them, or discard them, since I may
have misused the API.

Some brief context: I’m interested in processing an XML file in “semi-streaming”
mode: the input XML is copied unchanged to the output, except for a series of
sub-trees (identified, for instance, by their node name, e.g. the <PAGE> nodes),
which I want to process in DOM using the expand method of the xmlTextReader
API. Sounds nice, but copying isn’t so easy in fact.

The (little) problems:

Pb 1 - How to process the XML declaration, e.g. <?xml version="1.0"?>

Pb 2 - The QuoteChar() method seems to always return " even if a ' was used to
enclose an attribute, e.g. a='123'

Pb 3 - In text nodes and attribute values, entities are strangely dealt with by
the Value() method: for instance, an &amp; becomes a & in the returned string.
Actually, rdr.CurrentDoc().encodeEntitiesReentrant(rdr.Value()) gives a correct
output, so it’s even more strange to me.

Those problems are visible using the attached xmldump.py
code below, which simply copies its input to its output. A test file is also
attached.

Thanks for your help/comments,
JL
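
P.S. To make Pb 2 and Pb 3 concrete, here is a minimal sketch of the kind of
workaround I could apply on the output side when copying. It is only an
illustration: it uses Python's standard xml.sax.saxutils module rather than
libxml2, and the helper names serialize_text and serialize_attribute are made
up for this example. The assumption is that the reader hands back already
decoded values, so they have to be escaped again before being written out, and
that the original quote character cannot be recovered, so a safe one is chosen
instead.

```python
from xml.sax.saxutils import escape, quoteattr


def serialize_text(value):
    """Re-escape a decoded text-node value (&, <, >) before writing it out."""
    return escape(value)


def serialize_attribute(name, value):
    """Return name=value with the value escaped and quoted.

    quoteattr() picks the quote character itself: double quotes by default,
    single quotes when the value contains a double quote.
    """
    return "%s=%s" % (name, quoteattr(value))


if __name__ == "__main__":
    print(serialize_text("Tom & Jerry"))
    print(serialize_attribute("a", "123"))
```

This does not preserve the original quote character (so a='123' is written back
as a="123"), but the output stays well-formed.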
Attachment: xmldump.py
Attachment: test_simple.xml