Re: [xml] Questions on usage: xmlTextReaderCurrentDoc, XPath and xmlTextReaderRead



I'm responding to your questions on Valgrind, but leaving the
questions on Text Reader vs. XPath to others:

Eric West wrote:
[disclaimer: I am new to coding with libxml2]

That is a good reason to rely heavily on the programs in
libxml2/doc/examples as code models.

I have a test program to write and read some sample xml. It "works",
but I have noticed that valgrind reports some problems related to
xmlTextReaderRead and xmlNewTextReaderFilename.

Using Valgrind on your programs is a GOOD thing.  I only wish more
would follow your example.

[Details below]

My test program uses the TextReader APIs to extract XML content. At
parent nodes, it utilizes XPath queries to extract child and
grandchild content. In pseudocode, this is:

     reader = xmlNewTextReaderFilename( filename);
     ret = xmlTextReaderRead( reader);
     doc = xmlTextReaderCurrentDoc( reader);
     while ( ret == 1) {
           processNode( reader, doc);
           ret = xmlTextReaderRead( reader);
      }

      xmlFree( doc);

There is room for improvement here - check what reader3.c and
reader4.c do with the doc returned by xmlTextReaderCurrentDoc. 
(Hint: they *don't* use xmlFree for this).

      xmlFreeTextReader( reader);
      xmlCleanupParser();

Within processNode(), I get path context via the doc handle and then
make a series of XPath queries. The various libxml2 free routines are
called on memory allocated as appropriate. valgrind finds no issues in
processNode().

Now at the risk of solving my own problem... If I comment out the
call to processNode, valgrind still flags memory mismanagement. If I
also comment out the call to xmlTextReaderCurrentDoc,
voilà! -- valgrind is happy.

Q: Thus I must conclude that there is an order of operations problem
here. I noticed that the sample code reader3.c does the parsing with
xmlTextReader and then calls xmlTextReaderCurrentDoc. That observation
and the documentation suggest that the appropriate approach is
(a) parse the entire file via xmlTextReader and then (b) get a doc
pointer to process the in-memory data. Is this correct?

A: Follow the sequence(s) used by the example programs.

Q: Can xmlTextReader calls be interwoven with XPath queries? (I think
not.) The python example does this, but the equivalent in C is not
apparent to me. As best I can tell one needs a doc pointer to call
the xmlXPath API:

      node = xmlTextReaderExpand( reader);
      ctx = xmlXPathNewContext( docPtr);
      ctx->node = node;
      pObj = xmlXPathEval( BAD_CAST xmlXPathQuery, ctx);

Q: Does mixing xmlTextReader API calls with XPath APIs defeat the
memory utilization benefits of the xmlTextReader implementation? At
least as per the example, the series of xmlTextReader calls will build
a tree in-memory so that the subsequent call to
xmlTextReaderCurrentDoc returns a pointer to the complete tree.

Thanks in advance.


  --Eric


###################

$ valgrind --leak-check=full ./xmlTest
    ...
==4610== 23 bytes in 3 blocks are definitely lost in loss record 1 of 6
==4610==    at 0x4C21D06: malloc (in /usr/lib64/valgrind/amd64-linux/vgpreload_memcheck.so)
==4610==    by 0x4ED27BF: xmlStrndup (in /usr/lib64/libxml2.so.2.6.30)
==4610==    by 0x4E8483E: xmlNewDoc (in /usr/lib64/libxml2.so.2.6.30)
==4610==    by 0x4F20BDD: xmlSAX2StartDocument (in /usr/lib64/libxml2.so.2.6.30)
==4610==    by 0x4E7BBB8: xmlParseChunk (in /usr/lib64/libxml2.so.2.6.30)
==4610==    by 0x4F0C99D: (within /usr/lib64/libxml2.so.2.6.30)
==4610==    by 0x4F0D5CD: xmlTextReaderRead (in /usr/lib64/libxml2.so.2.6.30)
==4610==    by 0x40221E: main (xmlTest.c:492)
==4610==
==4610==
==4610== 4,312 (48 direct, 4,264 indirect) bytes in 1 blocks are definitely lost in loss record 3 of 6
==4610==    at 0x4C21D06: malloc (in /usr/lib64/valgrind/amd64-linux/vgpreload_memcheck.so)
==4610==    by 0x4F1DA89: xmlDictCreate (in /usr/lib64/libxml2.so.2.6.30)
==4610==    by 0x4E65F94: xmlInitParserCtxt (in /usr/lib64/libxml2.so.2.6.30)
==4610==    by 0x4E6600D: xmlNewParserCtxt (in /usr/lib64/libxml2.so.2.6.30)
==4610==    by 0x4E7E195: xmlCreatePushParserCtxt (in /usr/lib64/libxml2.so.2.6.30)
==4610==    by 0x4F0E11C: xmlNewTextReader (in /usr/lib64/libxml2.so.2.6.30)
==4610==    by 0x4F0E587: xmlNewTextReaderFilename (in /usr/lib64/libxml2.so.2.6.30)
==4610==    by 0x402206: main (xmlTest.c:488)
==4610==
==4610== LEAK SUMMARY:
==4610==    definitely lost: 71 bytes in 4 blocks.
==4610==    indirectly lost: 4,264 bytes in 5 blocks.
==4610==      possibly lost: 0 bytes in 0 blocks.
==4610==    still reachable: 0 bytes in 0 blocks.
==4610==         suppressed: 0 bytes in 0 blocks.

The problem points seem to be related to xml



--
E r i c   W e s t
Spark! Creative Group
Boston, MA 02134-1406
http://www.sparkcg.com


Bill



