Re: [xml-bindings]HTML parser segfaults



On Fri, 10 May 2002, Gary Benson wrote:

> I've been getting segfaults when trying to write a SAX parser for HTML. It
> looks like libxml2.htmlCreatePushParser works correctly but the first time
> you call libxml2.htmlParseChunk it will segv because, in C land,
> ctxt->input (and possibly ctxt) has been trashed somewhere. I got a little
> lost trying to debug it as python seems to be doing some weird threading
> stuff (and I'm very tired now because of it :-/), so I thought I'd post it
> here and see if anyone else can find it whilst I sleep ;)

Hmmm, so I gave up on gdb and resorted to my old family favourite printf 
debugging and eventually found it: libxml2.htmlParseChunk was passing the 
python parserCtxt object to libxml2mod.htmlParseChunk rather than the C 
parserCtxt object. The attached patch to libxml2.py fixes the problem 
(though the generator needs fixing really) and the attached script, a 
modification of pushSAX.py, exercises the problem.
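
To illustrate the mix-up without needing libxml2 at all, here is a minimal sketch of the wrapper pattern involved. Everything in it is a hypothetical stand-in: FakeCContext and fake_parse_chunk only imitate the C-level context and libxml2mod.htmlParseChunk, and the type check is added purely so the mistake shows up as an exception rather than a crash. The one detail taken from the patch is that the Python-level object keeps the underlying object in its _o attribute.

```python
# Hypothetical stand-ins only: none of these names are libxml2 APIs.

class FakeCContext(object):
    """Plays the role of the C parser context owned by the extension module."""
    def __init__(self):
        self.chunks = []


def fake_parse_chunk(c_ctxt, chunk, size, terminate):
    """Plays the role of libxml2mod.htmlParseChunk: wants the raw object."""
    if not isinstance(c_ctxt, FakeCContext):
        # Assumption for the demo: raise here so the error is visible.
        # The real call segfaulted instead of reporting anything.
        raise TypeError("expected the raw context, got %s"
                        % type(c_ctxt).__name__)
    c_ctxt.chunks.append(chunk[:size])
    return 0


class ParserCtxtWrapper(object):
    """Plays the role of the Python parserCtxt class; raw object lives in _o."""
    def __init__(self, raw):
        self._o = raw


def parse_chunk_buggy(ctxt, chunk, size, terminate):
    # Mirrors the wrapper before the patch: hands over the wrapper itself.
    return fake_parse_chunk(ctxt, chunk, size, terminate)


def parse_chunk_fixed(ctxt, chunk, size, terminate):
    # Mirrors the patched wrapper: unwraps to the underlying object first.
    return fake_parse_chunk(ctxt._o, chunk, size, terminate)


ctxt = ParserCtxtWrapper(FakeCContext())

try:
    parse_chunk_buggy(ctxt, "<foo", 4, 0)
    buggy_outcome = "accepted"
except TypeError:
    buggy_outcome = "rejected"

fixed_result = parse_chunk_fixed(ctxt, "<foo", 4, 0)
```

In the sketch the buggy variant is rejected and the fixed one stores the chunk on the underlying object, which is the same unwrapping step the one-line patch adds.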

Cheers,
Gary

[ gary inauspicious org ][ GnuPG 85A8F78B ][ http://inauspicious.org/ ]
--- python/libxml2.py~	Sat May 11 13:20:10 2002
+++ python/libxml2.py	Sat May 11 13:39:53 2002
@@ -335,7 +335,7 @@
 
 def htmlParseChunk(ctxt, chunk, size, terminate):
     """Parse a Chunk of memory"""
-    ret = libxml2mod.htmlParseChunk(ctxt, chunk, size, terminate)
+    ret = libxml2mod.htmlParseChunk(ctxt._o, chunk, size, terminate)
     return ret
 
 def htmlParseDoc(cur, encoding):
#!/usr/bin/python -u
import sys
import libxml2

# Memory debug specific
libxml2.debugMemory(1)

log = ""

class callback:
    def startDocument(self):
        global log
        log = log + "startDocument:"

    def endDocument(self):
        global log
        log = log + "endDocument:"

    def startElement(self, tag, attrs):
        global log
        log = log + "startElement %s %s:" % (tag, attrs)

    def endElement(self, tag):
        global log
        log = log + "endElement %s:" % (tag)

    def characters(self, data):
        global log
        log = log + "characters: %s:" % (data)

    def warning(self, msg):
        global log
        log = log + "warning: %s:" % (msg)

    def error(self, msg):
        global log
        log = log + "error: %s:" % (msg)

    def fatalError(self, msg):
        global log
        log = log + "fatalError: %s:" % (msg)

handler = callback()

ctxt = libxml2.htmlCreatePushParser(handler, "<foo", 4, "test.xml")
chunk = " url='tst'>b"
libxml2.htmlParseChunk(ctxt, chunk, len(chunk), 0)
chunk = "ar</foo>"
libxml2.htmlParseChunk(ctxt, chunk, len(chunk), 1)
ctxt=None

reference = "startDocument:startElement foo {'url': 'tst'}:characters: bar:endElement foo:endDocument:"
if log != reference:
    print "Error got: %s" % log
    print "Expected: %s" % reference
    sys.exit(1)

# Memory debug specific
libxml2.cleanupParser()
if libxml2.debugMemory(1) == 0:
    print "OK"
else:
    print "Memory leak %d bytes" % (libxml2.debugMemory(1))
    libxml2.dumpMemory()

