Re: [xml] Parsing XML in embedded environment



On 06/12/2011 03:17 AM, Daniel Veillard wrote:
On Sat, Jun 11, 2011 at 01:28:37PM +0200, David Kubicek wrote:
Perhaps one more question. This one is going to be hard for me to find.
Now, with every document I parse, a new context is created before
and freed after "xmlParseDocument(ctx)".

Also, my external loader is called for every document, every time
creating a new xmlParserInput for the same DTD. Using Valgrind, I
can see that without repeatedly creating xmlParserInput, the app
only uses about 7 MB of heap after processing all documents. When I
enable the external loader, it consumes over 57 MB and runs visibly
slower (the hardware is weak).

How do I make this work without repeating the xmlParserInput
allocation for every document (every external entity loader
invocation)?
   Impossible to answer correctly. A parser input attached to a
parser context is freed when parsing finishes, as part of freeing
the context.
   If your memory usage grows, then either your set of documents is
growing, or you forgot to free some of the parser contexts, or ...
   But if you are parsing N documents you will need at least N
xmlParserInput structures, one per entity parsed. You can't "reuse"
them, as they are freed.

   Debugging can't be done with a crystal ball... take your code,
compile and run it on a platform where you have something like valgrind
and see where your memory goes.

   There are ways to limit memory usage both while parsing and for
the parsed documents, like using the newer xmlRead... APIs, which
build a dictionary of shared strings for the document(s).
You can also reuse a parser context, and there are tricks to reuse the
same dictionary for multiple documents.

Thank you for your answer.

I found the same thing when I analyzed memory allocation throughout your library. The xmlSAX2ExternalSubset() handler frees the input when it's done, which is why my external entity handler couldn't reuse it. I tried disabling that free, managing a cache in the loader and freeing it when finished. That actually worked fine and was much faster (no leaks, no reads or writes of unallocated or freed memory, 0 bytes in use on app exit), but I don't want to keep a modified version of libxml along with the app.

In the end, I stripped the DTD down to just the entity declarations instead of the whole schema, and performance and memory usage are now satisfactory. I also found out that the context can be reused, and I'm doing that too.

BTW, those memory numbers actually were from Valgrind. I wouldn't write a line without it. It is a must-have tool, isn't it?

--
David Kubicek



