Re: [xml] ¿leak using libxml2 for sax-parsing html in python?

On Fri, 2007-01-05 at 10:26 -0500, Daniel Veillard wrote:
On Fri, Jan 05, 2007 at 04:06:57PM +0100, Cesar Ortiz wrote:
Hi all,

I am using libxml2 for parsing html in python. I was thinking that libxml2
could be involved, so I modified one of the website python examples in order
to process a relevant number of html files while I checked the memory
consumption with the top command.

  Which is just a very wrong way to try to assert a memory leak.

And... yes! The program's memory consumption keeps increasing until it finishes.

  That can be perfectly normal up to a point.

# debugMemory(1) returns the number of bytes libxml2 still has allocated
if libxml2.debugMemory(1) == 0:
    print("OK")
else:
    print("Memory leak %d bytes" % (libxml2.debugMemory(1)))

Libxml2-wise, that's the only serious way to check for leaks; check that output.

Daniel is right on the money here. A few other comments:

The python bindings for libxml2 are not "pythonic" in the sense that they
do not automatically manage the lifetime of python objects. You must
explicitly free some of the libxml2 objects, which is something python
programmers are not used to and may consequently overlook, producing
excessive memory use and leaks.
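
As a minimal sketch of what explicit freeing looks like, assuming the
libxml2 python bindings are installed: the helper name
leaked_bytes_after_parse and the sample markup are made up for
illustration. It parses a small document, frees it by hand, and then asks
libxml2 how many bytes are still allocated.

```python
try:
    import libxml2  # the libxml2 python bindings; may not be installed
except ImportError:
    libxml2 = None


def leaked_bytes_after_parse(markup):
    """Parse markup, free it explicitly, return bytes libxml2 still holds.

    Hypothetical helper for illustration only. Returns None when the
    bindings are not available. Meant to be called once, at the end of a
    program, because it calls cleanupParser().
    """
    if libxml2 is None:
        return None
    libxml2.debugMemory(1)           # turn on libxml2's allocation accounting
    doc = libxml2.parseDoc(markup)   # the tree lives in C memory, not python's
    doc.freeDoc()                    # without this call the tree is never released
    libxml2.cleanupParser()          # release parser-global state before measuring
    return libxml2.debugMemory(1)    # bytes libxml2 still has allocated


if __name__ == "__main__":
    leaked = leaked_bytes_after_parse("<doc><p>hello</p></doc>")
    if leaked is None:
        print("libxml2 bindings not installed")
    elif leaked == 0:
        print("OK")
    else:
        print("Memory leak %d bytes" % leaked)
        libxml2.dumpMemory()         # ask libxml2 to dump what is still allocated
```

Commenting out the freeDoc() call should make the reported number nonzero,
which is the kind of leak the check is meant to catch.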

Top and ps are very poor ways to evaluate memory usage; their output is
often misleading for a host of reasons. You're better off reading the
/proc filesystem. Here is a tool which will format that information in a
pleasant way.
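
For a quick look from inside the program itself, the numbers can also be
read straight from /proc. This is only a sketch, assuming Linux and the
VmRSS and VmSize fields of /proc/<pid>/status; the function names are
illustrative and not taken from any existing tool.

```python
def parse_status_kb(status_text, field):
    """Return the value in kB of a 'Field:   1234 kB' line, or None."""
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name == field:
            parts = rest.split()
            if parts and parts[0].isdigit():
                return int(parts[0])
    return None


def memory_kb(pid=None, field="VmRSS"):
    """Read one memory field for a process from /proc (Linux only)."""
    path = "/proc/%s/status" % (pid if pid is not None else "self")
    try:
        with open(path) as fh:
            return parse_status_kb(fh.read(), field)
    except (IOError, OSError):
        return None   # no /proc here (non-Linux) or the process is gone


if __name__ == "__main__":
    print("resident: %s kB" % memory_kb(field="VmRSS"))
    print("virtual:  %s kB" % memory_kb(field="VmSize"))
```

Calling memory_kb() before and after each batch of files gives a trend to
look at, but a growing number still only hints at a leak; it does not
prove one.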

One can also be fooled when investigating memory usage with python, as
python creates everything on the heap, not just objects. Every piece of
python code which gets loaded directly or indirectly (including all the
doc strings) is allocated as an object. Python programs use a lot of
memory, and large memory use is not an indicator of leaks.
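
To make that concrete, here is a small sketch, assuming CPython, where
sys.getsizeof reports the shallow size of an object. The function names
are made up for illustration. It shows that a function, its compiled code
and its doc string are each separate heap objects with their own size.

```python
import sys


def documented():
    """A doc string is an ordinary string object kept alive by the function."""
    return 42


def object_sizes(func):
    """Shallow sizes in bytes of the objects behind one function."""
    return {
        "function": sys.getsizeof(func),
        "code": sys.getsizeof(func.__code__),
        "docstring": sys.getsizeof(func.__doc__),
    }


if __name__ == "__main__":
    for name, size in sorted(object_sizes(documented).items()):
        print("%-10s %d bytes" % (name, size))
```

Multiply that by every function in every imported module and a large
baseline is expected before any document has been parsed.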
John Dennis <jdennis redhat com>
