Re: [xml] strange transformCtxt free-ing problem

While waiting for a proper test file, I did some more checking, this time
with Valgrind (unfortunately, Valgrind and Python tend to work in a very
adversarial manner, so this isn't very easy).  I think I now understand
where the original trouble comes from.

The pythonDocLoaderFuncWrapper function in python/libxslt.c creates a
parser context, pctxt, using xmlNewParserCtxt().  Apparently the reason
for doing this is so that later, after calling the user's loader, some
additional error checking and cleanup can be done.  pctxt is then
converted into a Python object [pctxtobj =
libxml_xmlParserCtxtPtrWrap(pctxt)] which is passed to the user's loader
function (the routine 'loader' in your test program).

Within the loader function which you posted, you create a new object:
  parserContext = libxml2.parserCtxt(_obj=pctx)
where 'pctx' is pctxtobj.  However, parserContext is only a local python
object, so at the end of the loader function Python very kindly calls
upon its garbage collector to dispose of it.  That action causes
xmlFreeParserCtxt to be called for the underlying parser context pointer,
which in this instance is the (original C) variable pctxt.  That, in
turn, causes nothing but trouble for the remainder of the code within
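
To illustrate the ownership problem described above, here is a minimal
pure-Python model of it.  It does not use libxml2 or libxslt at all:
FakeParserCtxt, fake_free, OwningWrapper and doc_loader_wrapper are
hypothetical stand-ins for the C parser context, xmlFreeParserCtxt, the
Python wrapper class and the C wrapper function respectively.  The only
assumption carried over from the analysis is that the wrapper frees its
underlying context when it is destroyed.

```python
import gc


class FakeParserCtxt:
    """Hypothetical stand-in for the C-level parser context (pctxt)."""

    def __init__(self):
        self.freed = False


def fake_free(ctxt):
    """Hypothetical stand-in for xmlFreeParserCtxt(); detects a second free."""
    if ctxt.freed:
        raise RuntimeError("double free")
    ctxt.freed = True


class OwningWrapper:
    """Models a wrapper that frees its context when it is destroyed."""

    def __init__(self, _obj=None):
        self._o = _obj

    def __del__(self):
        if self._o is not None:
            fake_free(self._o)
            self._o = None


def loader(pctx):
    # A local wrapper around a context that the caller still owns.
    parser_context = OwningWrapper(_obj=pctx)
    return "doc"  # parser_context is destroyed once the function returns


def doc_loader_wrapper():
    """Models the caller: create the context, call the loader, clean up."""
    pctxt = FakeParserCtxt()
    loader(pctxt)
    gc.collect()  # make sure the wrapper is gone on non-refcounting Pythons
    freed_by_loader = pctxt.freed
    try:
        fake_free(pctxt)  # the caller's own cleanup
        double_free = False
    except RuntimeError:
        double_free = True
    return freed_by_loader, double_free


freed_by_loader, double_free = doc_loader_wrapper()
print(freed_by_loader, double_free)
```

In this model the context has already been freed by the time the loader
returns, so the caller's own cleanup is a second free of the same
object.  Whether the real bindings behave exactly this way is the
analysis above, not something this sketch proves.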

That's about as far as I can go, since it now would appear to be a
problem with the basic design of the loader code.  Please let me know if
I have made some error in my analysis described above, or if I can
further assist in any way.


William M. Brack wrote:
Could you provide the file ("file2.html") that you are using for this
test which fails?  If I use a file like libxml2/test/HTML/doc2.htm:

bill bbopt ~/gnomesvn/work $ ln -s HTML/doc2.htm file2.html
bill bbopt ~/gnomesvn/work $ python
./file2.html:10: HTML parser error : Misplaced DOCTYPE declaration
<!-- END Naviscope Javascript --><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML
4.0 Tra
<?xml version="1.0"?>
<!-- saved from url=(0016)http://intranet/ -->
<!-- BEGIN Naviscope Javascript -->
<!-- END Naviscope Javascript -->
<!-- saved from url=(0027) -->
    <div>this is xml</div>

which seems to indicate that at least something is working :-).

(note that I'm using the latest SVN for both libxslt and libxml2)


Nic James Ferrier wrote:
Daniel Veillard <veillard redhat com> writes:

Nic said:
 *** glibc detected *** double free or corruption (!prev): 0x081b6300

  But did you update libxslt too and make install for it too ? Please
he fixed the problems in libxslt not in libxml2,


Yes. It stopped segfaulting. I can't get it to parse the HTML... but
it has stopped segfaulting.


shows this for every document I get back that parses:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"

Here's the relevant bit of the loader again:

  def loader(url, pctx, ctx, type):
      doc = None
      context_object = None
      if type:
          context_object = libxslt.stylesheet(_obj=ctx)
      else:
          context_object = libxslt.transformCtxt(_obj=ctx)
      # The parserContext and resulting document
      parserContext = libxml2.parserCtxt(_obj=pctx)
      doc = None
      if url == "/one":
          doc = parserContext.htmlCtxtReadFile("file2.html", "UTF8", 1)
      else:
          doc = parserContext.ctxtReadDoc("""<document>
  <h1>this is xml</h1>
  </document>""", url, "UTF8", 0)
      return doc

so when I ask for "/one" from my stylesheet I get back (practically)

Nic Ferrier   for all your tapsell ferrier needs
xml mailing list, project page
xml gnome org
