Re: [xml] Leak using libxml2 for SAX-parsing HTML in Python?



Hi again,
What you say makes sense. I changed the test code to force a call to htmlFreeParserCtxt, and it behaves the same ;).

I have also translated the example to C, and it works nicely in my environment.

One last attempt with the Python example: I am sending you the Python test code and an HTML file it uses. Just run it and watch the memory consumption with top (or by checking /proc). If you replace the for loop with an infinite loop, the system runs out of memory. Could you tell me how the memory consumption evolves in your environment? (Your memory functions report that everything is OK.)
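
A minimal sketch of that measurement done from inside the script instead of with top, assuming Linux, where /proc/self/status reports the resident set size on a VmRSS line. `current_rss_kib`, `sample_rss` and `parse_once` are hypothetical helpers, not part of the libxml2 bindings; `parse_once` only stands in for one complete parse of the test file.

```python
def current_rss_kib():
    """Return this process's resident set size in KiB, read from
    /proc/self/status, or None where /proc is unavailable (non-Linux)."""
    try:
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    # The line looks like: "VmRSS:     12345 kB"
                    return int(line.split()[1])
    except (IOError, OSError):
        pass
    return None


def sample_rss(work, iterations, every):
    """Call work() `iterations` times and record the RSS after every
    `every` calls, returning the list of samples."""
    samples = []
    for i in range(1, iterations + 1):
        work()
        if i % every == 0:
            samples.append(current_rss_kib())
    return samples


if __name__ == "__main__":
    # Hypothetical placeholder: replace the body with the code that creates
    # the push parser, feeds it the HTML file and terminates the parse.
    def parse_once():
        pass

    for n, kib in enumerate(sample_rss(parse_once, 10000, 1000), 1):
        print("after %d iterations: %s KiB" % (n * 1000, kib))
```

A series that keeps climbing over many samples points to a leak; one that levels off after the first few samples usually does not. RSS covers all memory in the process, so on its own it cannot separate libxml2 allocations from Python's.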

I really do not know what else to do.
Thank you for your patience and help.
Cesar

Note: I am really amazed that I cannot find anything on the web about people using libxml2 for SAX-parsing HTML files in Python :(.

On 1/9/07, Daniel Veillard <veillard redhat com > wrote:
On Tue, Jan 09, 2007 at 11:40:40AM +0100, Cesar Ortiz wrote:
> I've been looking at the Python bindings and I think I have seen
> something....
> Well, I have to understand exactly how it works... as I don't know what
> every file is for.
>
> The thing is, I guess the function that needs to be called to free
> the context is htmlFreeParserCtxt.

  That's not a problem; the structures are basically the same:

/**
* htmlFreeParserCtxt:
* @ctxt:  an HTML parser context
*
* Free all the memory used by a parser context. However the parsed
* document in ctxt->myDoc is not freed.
*/

void
htmlFreeParserCtxt(htmlParserCtxtPtr ctxt)
{
    xmlFreeParserCtxt(ctxt);
}

> But in the context returned by htmlCreatePushParser (class parserCtxt),
> the function called in the __del__ method is xmlFreeParserCtxt.

  which is okay

> So, it is not true that the developer has to free the resources

  I said that the unit of allocation is the *document*. If you get a document,
you need to free it. Otherwise cleanup should be automatic in Python. And with
SAX you never get a document.

>  (not Python
> style), because when you assign None to the context a 'freeresources'
> function is called.
> Furthermore, it looks like the wrong function is called.

  To me, pushSAXhtml.py does not leak memory allocated by libxml2.
Maybe there is a leak, maybe not. Your test case relies on all the .html
and .htm files present somewhere in your directory, and possibly on other
things like your versions of Python, of the bindings, and of libxml2.
To get back to something I can debug and work on, you need to follow
what I said, i.e. go back to a simple case that shows a leak when execution
stops, with the following at the end of the script, and send its output:

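# At the end of the test script; libxml2.debugMemory(1) must also be
# called at its start so that libxml2 allocations are tracked.
libxml2.cleanupParser()
if libxml2.debugMemory(1) == 0:
    print("OK")
else:
    print("Memory leak %d bytes" % (libxml2.debugMemory(1)))
    libxml2.dumpMemory()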

You can also chase that bug your own way, but sorry, then I can't help,
except by reviewing a patch if you can suggest one in the end.

Daniel

--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library   http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit   http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine   http://rpmfind.net/

Attachment: test_python_leak.tgz
Description: GNU Zip compressed data


