[xml] 3 problems with the xmlTextReader API from Python



Hi all,

 

I’ve found 3 problems with xmlTextReader when used from Python. My code and a test example are attached so that you can reproduce the problems, or dismiss them if I have simply misused the API.

 

Some brief context: I’m interested in processing an XML file in “semi-streaming” mode. The input XML is copied unchanged to the output, except for a series of sub-trees (identified, for instance, by their node name, e.g. the <PAGE> nodes), which I want to process as DOM using the expand method of the xmlTextReader API. It sounds nice, but copying turns out not to be so easy.
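
To make the “semi-streaming” idea concrete, here is a minimal sketch of the expand-on-demand pattern. Note that it uses the standard library's xml.dom.pulldom as a stand-in, not the libxml2 xmlTextReader bindings this post is about; the function name expand_subtrees and the default tag are only illustrative.

```python
from xml.dom import pulldom


def expand_subtrees(xml_text, tag="PAGE"):
    """Stream through xml_text and expand only <tag> elements into DOM.

    Stand-in for the xmlTextReader workflow: everything else is seen as
    a flat stream of events, and only the selected sub-trees are built
    in memory.
    """
    expanded = []
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            # Pull the rest of this element's events and attach the
            # children to node, so it can be handled as a DOM sub-tree.
            events.expandNode(node)
            expanded.append(node.toxml())
    return expanded
```

Called on a document with two <PAGE> elements, this returns two serialized sub-trees; the other nodes are never built as DOM.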

 

 

The (little) problems:

Pb 1 – how to process the XML declaration, e.g. <?xml version="1.0"?>

Pb 2 – the QuoteChar() method always seems to return " even when ' was used to enclose the attribute value, e.g. a='123'

Pb 3 – in text nodes and attribute values, entities are handled strangely by the Value() method: for instance, &amp; becomes & in the returned string

            Actually, rdr.CurrentDoc().encodeEntitiesReentrant(rdr.Value()) gives the correct output, which makes it even stranger to me
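
To illustrate what Pb 2 and Pb 3 mean for a copy tool: if the reader hands back decoded values, the writer has to escape them again and pick a quote character itself. A minimal sketch, assuming the standard library's xml.sax.saxutils instead of libxml2 (the helper names escape_text and format_attribute are made up for this example):

```python
from xml.sax.saxutils import escape, quoteattr


def escape_text(value):
    """Re-escape a decoded text value (&, <, >) before writing it out."""
    return escape(value)


def format_attribute(name, value):
    """Serialize name=value, escaping the value and choosing the quotes.

    quoteattr() uses double quotes unless the value itself contains a
    double quote (and no single quote), in which case it switches to
    single quotes. The original document's quote character is not kept.
    """
    return "%s=%s" % (name, quoteattr(value))
```

With these helpers, escape_text("a & b") gives back "a &amp; b", and format_attribute("a", "123") gives a="123" whatever quote character the input used.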

 

These problems can be reproduced with the attached xmldump.py, which simply copies its input to its output. A test file is attached as well.

 

Thanks for your help/comments,

 

JL

 

 

Attachment: xmldump.py
Description: xmldump.py

Attachment: test_simple.xml
Description: test_simple.xml


