Re: [xml] Help, but this looks like a massive memory leak



On Fri, Mar 23, 2001 at 05:16:35PM -0600, stan boehm philips com wrote:
Am I the only one using this code that is experiencing a major memory leak?

I performed the following experiment that is easily duplicated, and I
provide the code below to do so.

The code is a near exact implementation of the code as suggested
in 
         "The XML C library for Gnome"
section:   "Invoking the parser: the push method"

except that I added code to test for parse errors.

The idea is to  
      1) read a file
      2) parse it
      3) release all memory using  xmlParseChunk
      4) close the file
      repeat 10 times on the same file.

As I understand the documentation, I only need to call
xmlParseChunk  to free everything created by xmlCreatePushParserCtxt
and xmlParseChunk.   If this is not true, then the documentation
needs to be corrected.   If this is true, it sure looks like a massive
leak to me.

  Well 
  http://xmlsoft.org/#Invoking

------------------------------------
FILE *f;

f = fopen(filename, "r");
if (f != NULL) {
    int res, size = 1024;
    char chars[1024];
    xmlParserCtxtPtr ctxt;

    res = fread(chars, 1, 4, f);
    if (res > 0) {
        ctxt = xmlCreatePushParserCtxt(NULL, NULL,
                    chars, res, filename);
        while ((res = fread(chars, 1, size, f)) > 0) {
            xmlParseChunk(ctxt, chars, res, 0);
        }
        xmlParseChunk(ctxt, chars, 0, 1);
        doc = ctxt->myDoc;
        xmlFreeParserCtxt(ctxt);
    }
}
------------------------------------

  Sounds clear to me, but maybe I'm just more familiar with the code.
xmlParseChunk() parses some of the input; it does not free anything,
as you are suggesting in your point 3). It can't free anything
because:
  - it's called in the loop
  - the array passed is actually allocated on the stack
  It also looks clear to me from this excerpt (which is nearly
a direct cut and paste from some of the xmllint --push code) that
the routine used for freeing the parser data is actually
xmlFreeParserCtxt().

   You can check that this sequence doesn't leak memory
by configuring the package using:
  --with-mem-debug        Add the memory debugging module (off)
At the end of the run, once you have called xmlCleanupParser()
(to remove the general shared data needed by the parser(s), like
predefined entities), you can call xmlMemoryDump(), which will dump
the blocks still allocated. I always run with memory allocation debug
in my devel environment, and the content of .memdump is also
checked when I do make test for the whole regression suite. There may
be mem leaks, but not in any of the cases exercised by the regression
tests of libxml (or libxslt). I can save you the time of testing this
if you are willing to believe me:

orchis:~/XML -> ./xmllint --push --valid --repeat --noout test/valid/REC-xml-19980210.xml 
orchis:~/XML -> cat .memdump 
      02:59:30 PM

      MEMORY ALLOCATED : 0, MAX was 878821
BLOCK  NUMBER   SIZE  TYPE
orchis:~/XML -> 

  The test loops 100 times over parsing and validating the
XML from the XML specification. You can grab the code from
xmllint.c directly (check for push).

If one could count all the mallocs plus strdups, the result
should be exactly equal to the number of frees, ignoring any
use of realloc.

  Ignoring realloc is a mistake: libxml does use buffers which
may be resized dynamically as needed.

At least that is my understanding of how malloc and free
relate to one another.

  Forgetting realloc and strdup-like routines sounds fishy
if you want to make a decent evaluation. Your best bet is to
compile with memory debug and modify xmlmemory.c to do whatever
checking you need. You can also change the basic initial
buffer size, if you don't like it, or the allocation strategy,
using:
    xmlSetBufferAllocationScheme()
and
    xmlDefaultBufferSize = xxx; (this is a global variable).

  You're definitely not the first one to use libxml for embedded
stuff (as the use of QNX leads me to believe); others have paved the
way. You can also trim down the library by disabling at configure
time most of the extra parts (XPath/XPointer ...).

As it turns out it is not difficult to do this counting.  Simply tweak
xmlmemory.c as shown in my code example: leak.
To me it looks like there are 2 mallocs for every free.

  Don't forget the realloc()s and don't forget to call
xmlFreeParserCtxt(). Last but not least, call xmlMemoryDump()
to see the blocks left in use and where they were allocated.
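
  To illustrate why comparing raw malloc and free counts can mislead,
here is a minimal sketch in plain C. It is not the code from xmlmemory.c;
the count_* wrappers, the counters, and balanced_workload() are
hypothetical names made up for this example. The workload frees everything
it allocates, yet the malloc and free totals differ, because some blocks
are created by realloc(NULL, n) or by a strdup-like routine. Tracking the
number of live blocks is the more reliable check.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical counters for this sketch; not part of libxml. */
static long n_malloc, n_realloc, n_strdup, n_free;
static long live_blocks;            /* blocks currently allocated */

static void *count_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL) { n_malloc++; live_blocks++; }
    return p;
}

/* Assumes size > 0, so realloc never acts as a free. */
static void *count_realloc(void *old, size_t size)
{
    void *p = realloc(old, size);
    n_realloc++;
    /* realloc(NULL, n) creates a block; resizing an existing one does not. */
    if (old == NULL && p != NULL) live_blocks++;
    return p;
}

static char *count_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);
    if (p != NULL) { memcpy(p, s, len); n_strdup++; live_blocks++; }
    return p;
}

static void count_free(void *p)
{
    if (p == NULL) return;
    free(p);
    n_free++;
    live_blocks--;
}

/* A leak-free workload whose malloc and free counts still disagree. */
static void balanced_workload(void)
{
    char *buf = count_realloc(NULL, 16);  /* buffer born through realloc */
    char *name = count_strdup("root");    /* string born through strdup */
    void *node = count_malloc(32);
    char *bigger;

    bigger = count_realloc(buf, 64);      /* grow: still one logical block */
    if (bigger != NULL) buf = bigger;

    count_free(node);
    count_free(name);
    count_free(buf);
}
```

  After balanced_workload() returns, live_blocks is back to zero even
though only one block came from count_malloc() and three were freed.
A count of live blocks that keeps growing across repeated runs is what
points to a real leak.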

Perhaps I am an idiot, but I made this code as close to the
documented recommendation as possible, and the number
of mallocs far exceeds the number of frees, and not by just
a little bit.

  Check the P.S. at the bottom ...

I did this on the QNX  OS, if anyone out there could perform
this test on a different OS, I would appreciate knowing if QNX
has a problem.

  I don't see how this could relate to a given OS: none of the parsing,
tree manipulation, or memory interfaces has a single line of
OS-specific code in them.

  Well good luck for the leak quest,

Daniel

P.S.: I just looked at your code excerpt and I think there is no
      need to look much longer:
        - you fopen() the file twice; this won't leak memory
          in libxml (but it surely leaks file descriptors and the
          libc-allocated ones!)
        - you save the tree generated by the parser at the end of the
          computation but never free it. Of course the example
          in the doc assumes you make use of the tree and free it
          after use; you never call xmlFreeDoc(doc), and that's the
          memory leak!
      Also, xmlFreeParserCtxt() should be called within the
      if (ctxt) {}, but that's minor.
      Enjoy!

-- 
Daniel Veillard      | Red Hat Network http://redhat.com/products/network/
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/



