[xml] Questions on usage: xmlTextReaderCurrentDoc, XPath and xmlTextReaderRead

From: Eric West <rice cruft gmail com>
To: xml gnome org
Subject: [xml] Questions on usage: xmlTextReaderCurrentDoc, XPath and xmlTextReaderRead
Date: Mon, 24 Dec 2007 13:38:51 -0500

[disclaimer: I am new to coding with libxml2]

I have a test program that writes and reads some sample XML. It "works", but I have noticed that
valgrind reports some problems related to xmlTextReaderRead and xmlNewTextReaderFilename.

[Details below]

My test program uses the TextReader APIs to extract XML content. At parent nodes, it uses
XPath queries to extract child and grandchild content. In pseudocode, this is:

reader = xmlNewTextReaderFilename( filename);
ret = xmlTextReaderRead( reader);
doc = xmlTextReaderCurrentDoc( reader);
while ( ret == 1) {
    processNode( reader, doc);
    ret = xmlTextReaderRead( reader);
}
xmlFree( doc);
xmlFreeTextReader( reader);
xmlCleanupParser();

Within processNode(), I get an XPath context via the doc handle and then make
a series of XPath queries. The appropriate libxml2 free routines are called on
the memory allocated there. valgrind finds no issues in processNode().

Now at the risk of solving my own problem... If I comment out the call to processNode, valgrind
still flags memory mismanagement. If I also comment out the call to xmlTextReaderCurrentDoc,
voilà! valgrind is happy.

Q: Thus I must conclude that there is an order-of-operations problem here. I noticed that the sample code
textReader3.c does the parsing with xmlTextReader and then calls xmlTextReaderCurrentDoc. That
observation and the documentation suggest that the appropriate approach is (a) parse the entire
file via xmlTextReader and then (b) get a doc pointer to process the in-memory data. Is this correct?

Q: Can xmlTextReader calls be interwoven with XPath queries? (I think not.) The Python
example does this, but the equivalent in C is not apparent to me. As best I can tell, one needs a
doc pointer to call the xmlXPath API:

node = xmlTextReaderExpand( reader);
ctx = xmlXPathNewContext( docPtr );
ctx->node = node;
pObj = xmlXPathEval( BAD_CAST xmlXPathQuery, ctx);

Q: Does mixing xmlTextReader API calls with XPath APIs defeat the memory utilization benefits
of the xmlTextReader implementation? At least as per the example, the series of xmlTextReader calls
will build a tree in memory so that the subsequent call to xmlTextReaderCurrentDoc returns a
pointer to the complete tree.

Thanks in advance.

--Eric

###################

$ valgrind --leak-check=full ./xmlTest
...
==4610== 23 bytes in 3 blocks are definitely lost in loss record 1 of 6
==4610== at 0x4C21D06: malloc (in /usr/lib64/valgrind/amd64-linux/vgpreload_memcheck.so)
==4610== by 0x4ED27BF: xmlStrndup (in /usr/lib64/libxml2.so.2.6.30)
==4610== by 0x4E8483E: xmlNewDoc (in /usr/lib64/libxml2.so.2.6.30)
==4610== by 0x4F20BDD: xmlSAX2StartDocument (in /usr/lib64/libxml2.so.2.6.30)
==4610== by 0x4E7BBB8: xmlParseChunk (in /usr/lib64/libxml2.so.2.6.30)
==4610== by 0x4F0C99D: (within /usr/lib64/libxml2.so.2.6.30)
==4610== by 0x4F0D5CD: xmlTextReaderRead (in /usr/lib64/libxml2.so.2.6.30)
==4610== by 0x40221E: main (xmlTest.c:492)
==4610==
==4610== 4,312 (48 direct, 4,264 indirect) bytes in 1 blocks are definitely lost in loss record 3 of 6
==4610== at 0x4C21D06: malloc (in /usr/lib64/valgrind/amd64-linux/vgpreload_memcheck.so)
==4610== by 0x4F1DA89: xmlDictCreate (in /usr/lib64/libxml2.so.2.6.30)
==4610== by 0x4E65F94: xmlInitParserCtxt (in /usr/lib64/libxml2.so.2.6.30)
==4610== by 0x4E6600D: xmlNewParserCtxt (in /usr/lib64/libxml2.so.2.6.30)
==4610== by 0x4E7E195: xmlCreatePushParserCtxt (in /usr/lib64/libxml2.so.2.6.30)
==4610== by 0x4F0E11C: xmlNewTextReader (in /usr/lib64/libxml2.so.2.6.30)
==4610== by 0x4F0E587: xmlNewTextReaderFilename (in /usr/lib64/libxml2.so.2.6.30)
==4610== by 0x402206: main (xmlTest.c:488)
==4610==
==4610== LEAK SUMMARY:
==4610== definitely lost: 71 bytes in 4 blocks.
==4610== indirectly lost: 4,264 bytes in 5 blocks.
==4610== possibly lost: 0 bytes in 0 blocks.
==4610== still reachable: 0 bytes in 0 blocks.
==4610== suppressed: 0 bytes in 0 blocks.

The problem points seem to be related to xml

E r i c W e s t

Spark! Creative Group

Boston, MA 02134-1406

http://www.sparkcg.com


Follow-Ups:
- Re: [xml] Questions on usage: xmlTextReaderCurrentDoc, XPath and xmlTextReaderRead
  - From: William M. Brack
