Re: [xml] strange transformCtxt free-ing problem



While waiting for a proper test file, I did some more checking, this time
with Valgrind (unfortunately, Valgrind and Python tend to work in a very
adversarial manner, so this isn't very easy).  I think I now understand
where the original trouble comes from.

The pythonDocLoaderFuncWrapper function in python/libxslt.c creates a
parser context, pctxt, using xmlNewParserCtxt().  Apparently the reason
for doing this is so that later, after calling the user's loader, some
additional error checking and cleanup can be done.  pctxt is then
converted into a Python object [pctxtobj =
libxml_xmlParserCtxtPtrWrap(pctxt)] which is passed to the user's loader
function (the routine 'loader' in your test program).

Within the loader function that you posted, you create a new object:
  parserContext = libxml2.parserCtxt(_obj=pctx)
where 'pctx' is pctxtobj.  However, parserContext is only a local Python
object, so at the end of the loader function Python very kindly calls
upon its garbage collector to dispose of it.  That action causes
xmlFreeParserCtxt to be called for the underlying parser context pointer,
which in this instance is the (original C) variable pctxt.  The remainder
of the code within pythonDocLoaderFuncWrapper then works with (and
eventually frees) a context which has already been freed, which would
explain the double free.
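
To make the sequence concrete, here is a minimal sketch in plain Python
which does not touch libxml2 at all.  FakeCContext, fake_free,
OwningWrapper and call_loader are hypothetical stand-ins: they assume
only that the binding class frees its underlying pointer (_o) when the
Python object is destroyed, as described above.  The "detached" variant
shows one possible workaround (clearing _o before the wrapper dies); I
have not verified it against the real bindings.

```python
import gc


class FakeCContext:
    """Hypothetical stand-in for the C-level parser context (pctxt)."""

    def __init__(self):
        self.free_count = 0


def fake_free(ctx):
    """Hypothetical stand-in for xmlFreeParserCtxt: only counts calls."""
    ctx.free_count += 1


class OwningWrapper:
    """Assumed shape of the binding class: frees its pointer on destruction."""

    def __init__(self, _obj=None):
        self._o = _obj

    def __del__(self):
        if self._o is not None:
            fake_free(self._o)
        self._o = None


def loader_buggy(pctx):
    # The local wrapper dies when the function returns and frees pctx,
    # even though the caller still owns it.
    wrapper = OwningWrapper(_obj=pctx)
    return None


def loader_detached(pctx):
    wrapper = OwningWrapper(_obj=pctx)
    try:
        return None
    finally:
        # Give up the borrowed pointer so __del__ has nothing to free.
        wrapper._o = None


def call_loader(loader):
    """Plays the role of the C wrapper: create, call the loader, clean up."""
    ctx = FakeCContext()
    loader(ctx)
    gc.collect()    # make sure any pending __del__ has run
    fake_free(ctx)  # the caller's own cleanup
    return ctx.free_count


if __name__ == "__main__":
    print("buggy loader, number of frees:   ", call_loader(loader_buggy))
    print("detached loader, number of frees:", call_loader(loader_detached))
```

A count of two for the buggy loader corresponds to the "double free or
corruption" abort from glibc; the detached loader leaves the context to
be freed exactly once, by its real owner.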

That's about as far as I can go, since it now would appear to be a
problem with the basic design of the loader code.  Please let me know if
I have made some error in my analysis described above, or if I can
further assist in any way.

Bill

William M. Brack wrote:
Could you provide the file ("file2.html") that you are using for this
test which fails?  If I use a file like libxml2/test/HTML/doc2.htm:

bill bbopt ~/gnomesvn/work $ ln -s HTML/doc2.htm file2.html
bill bbopt ~/gnomesvn/work $ python bug.py
./file2.html:10: HTML parser error : Misplaced DOCTYPE declaration
<!-- END Naviscope Javascript --><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Tra
                                 ^
<?xml version="1.0"?>
<html>
  <head/>
  <body>
    <div>
<!-- saved from url=(0016)http://intranet/ -->
<!-- BEGIN Naviscope Javascript -->
<!-- END Naviscope Javascript -->
<!-- saved from url=(0027)http://www.agents-tech.com/ -->
    </div>
    <div>this is xml</div>
  </body>
</html>

which seems to indicate that at least something is working :-).

(note that I'm using the latest SVN for both libxslt and libxml2)


Bill

Nic James Ferrier wrote:
Daniel Veillard <veillard redhat com> writes:

Nic said:
 *** glibc detected *** double free or corruption (!prev): 0x081b6300 ***
 Aborted

  But did you update libxslt too, and make install for it too?  Please
do: he fixed the problems in libxslt, not in libxml2.

Ah!

Yes. It stopped segfaulting. I can't get it to parse the HTML... but
it has stopped segfaulting.

  doc.dump(sys.stdout)

shows this for every document I get back that parses:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">

Here's the relevant bit of the loader again:

  def loader(url, pctx, ctx, type):
      doc = None
      context_object = None
      if type:
          context_object = libxslt.stylesheet(_obj=ctx)
      else:
          context_object = libxslt.transformCtxt(_obj=ctx)
      # The parserContext and resulting document
      parserContext = libxml2.parserCtxt(_obj=pctx)
      doc = None
      if url == "/one":
          doc = parserContext.htmlCtxtReadFile("file2.html", "UTF8", 1)
      else:
          doc = parserContext.ctxtReadDoc("""<document>
  <h1>this is xml</h1>
  </document>""", url, "UTF8", 0)
      return doc


so when I ask for "/one" from my stylesheet I get back (practically)
nothing.

--
Nic Ferrier
http://www.tapsellferrier.co.uk   for all your tapsell ferrier needs
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml gnome org
http://mail.gnome.org/mailman/listinfo/xml


