[xml] parsing an infinite sequence of XML documents



The task is to parse an infinite sequence of (rather simple) XML
documents, to be read from a socket. This is what I'm doing, using
libxml2-2.6.3 on Linux:

- read the next document (the end is known without having to parse)
  and store it into an xmlBuffer
- create a push parser (parser = xmlCreatePushParserCtxt(...))
- set up an error callback (xmlSetStructuredErrorFunc)
- turn on parser options (xmlCtxtUseOptions(parser, XML_PARSE_NOERROR |
  XML_PARSE_NOBLANKS | XML_PARSE_NONET))
- feed the contents of the buffer to the parser, free the buffer
- extract the document (doc = parser->myDoc), process it

...and now, to clean up, as memory leaks are obviously a no-no (and
building libxml2 with --with-mem-debug and calling xmlMemoryDump has
shown me what would be left behind):

- xmlFreeParserCtxt(parser)
- xmlFreeDoc(doc)

This, however, causes problems: it segfaults while recursing through the
document, somewhere in connection with dictionary lookup while freeing
memory. (I could and would provide details, if this is possibly a bug.)
Reversing the order of these two calls wasn't successful either.

Luckily, I hit upon adding the option XML_PARSE_NODICT, and now the
sequence shown above works fine.

Questions:
(1) Is there a better sequence to achieve the goal outlined above,
e.g. by just *resetting*, i.e. not destroying and recreating the parser
(and freeing the document tree)? I tried a few calls, but nothing seemed
to work.

(2) I'm somewhat uneasy about using the XML_PARSE_NODICT option:
does it have some disadvantage?

(3) Shouldn't the above sequence work even without XML_PARSE_NODICT?

Thanks for any non-NULL pointers ;-)
Wolfgang Laun



