Re: [xml-bindings]HTML parser segfaults
- From: Gary Benson <gary inauspicious org>
- To: xml-bindings gnome org
- Subject: Re: [xml-bindings]HTML parser segfaults
- Date: Sat, 11 May 2002 14:08:32 +0100 (BST)
On Fri, 10 May 2002, Gary Benson wrote:
> I've been getting segfaults when trying to write a SAX parser for HTML. It
> looks like libxml2.htmlCreatePushParser works correctly but the first time
> you call libxml2.htmlParseChunk it will segv because, in C land,
> ctxt->input (and possibly ctxt) has been trashed somewhere. I got a little
> lost trying to debug it as python seems to be doing some weird threading
> stuff (and I'm very tired now because of it :-/), so I thought I'd post it
> here and see if anyone else can find it whilst I sleep ;)
Hmmm, so I gave up on gdb and resorted to my old family favourite, printf
debugging, and eventually found it: libxml2.htmlParseChunk was passing the
Python parserCtxt wrapper object to libxml2mod.htmlParseChunk rather than
the underlying C parserCtxt object (ctxt._o). The attached patch to
libxml2.py fixes the problem (though really it's the generator that needs
fixing), and the attached script, a modification of pushSAX.py, exercises
the problem.
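The wrapper mix-up can be illustrated without libxml2 at all. This is a minimal sketch, assuming a wrapper class that keeps its raw C object in an `_o` attribute (as the patch's `ctxt._o` suggests). FakeCContext, FakeCModule, ParserCtxtWrapper, unwrap and the two htmlParseChunk_* functions are hypothetical stand-ins, not part of libxml2. Note that the real C extension cannot detect the wrong type the way this fake does; it simply dereferences the wrong pointer, which is where the segfault comes from.

```python
# Hypothetical stand-ins: FakeCModule plays the role of libxml2mod (the C
# extension) and ParserCtxtWrapper plays the role of the Python-level wrapper.


class FakeCContext(object):
    """Stands in for the raw C parser context that the extension expects."""

    def __init__(self):
        self.fed = []


class FakeCModule(object):
    """Stands in for libxml2mod: it only understands raw C contexts."""

    @staticmethod
    def htmlParseChunk(c_ctxt, chunk, size, terminate):
        if not isinstance(c_ctxt, FakeCContext):
            # Real C code performs no such check; it would read garbage
            # memory instead of raising a clean Python exception.
            raise TypeError("expected a raw C context, got %s"
                            % type(c_ctxt).__name__)
        c_ctxt.fed.append(chunk[:size])
        return 0


class ParserCtxtWrapper(object):
    """Python-side wrapper; the raw C object lives in the _o attribute."""

    def __init__(self, c_obj):
        self._o = c_obj


def unwrap(obj):
    """Return the raw object behind a wrapper, or obj itself if unwrapped."""
    return getattr(obj, "_o", obj)


def htmlParseChunk_buggy(ctxt, chunk, size, terminate):
    # Passes the wrapper straight through (the behaviour the patch fixes).
    return FakeCModule.htmlParseChunk(ctxt, chunk, size, terminate)


def htmlParseChunk_fixed(ctxt, chunk, size, terminate):
    # Unwraps first, equivalent to passing ctxt._o as the patch does.
    return FakeCModule.htmlParseChunk(unwrap(ctxt), chunk, size, terminate)


if __name__ == "__main__":
    ctxt = ParserCtxtWrapper(FakeCContext())
    chunk = " url='tst'>b"
    htmlParseChunk_fixed(ctxt, chunk, len(chunk), 0)
    print(ctxt._o.fed)  # [" url='tst'>b"]
    try:
        htmlParseChunk_buggy(ctxt, chunk, len(chunk), 0)
    except TypeError as exc:
        print(exc)  # expected a raw C context, got ParserCtxtWrapper
```

A getattr-based unwrap like this is only one way a generator could emit the conversion for every wrapped argument; the real libxml2 generator may handle it differently.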
Cheers,
Gary
[ gary inauspicious org ][ GnuPG 85A8F78B ][ http://inauspicious.org/ ]
--- python/libxml2.py~ Sat May 11 13:20:10 2002
+++ python/libxml2.py Sat May 11 13:39:53 2002
@@ -335,7 +335,7 @@
 
 def htmlParseChunk(ctxt, chunk, size, terminate):
     """Parse a Chunk of memory"""
-    ret = libxml2mod.htmlParseChunk(ctxt, chunk, size, terminate)
+    ret = libxml2mod.htmlParseChunk(ctxt._o, chunk, size, terminate)
     return ret
 
 def htmlParseDoc(cur, encoding):
#!/usr/bin/python -u
import sys
import libxml2

# Memory debug specific
libxml2.debugMemory(1)

# Every SAX callback appends a tagged entry to this global log string.
log = ""

class callback:
    def startDocument(self):
        global log
        log = log + "startDocument:"

    def endDocument(self):
        global log
        log = log + "endDocument:"

    def startElement(self, tag, attrs):
        global log
        log = log + "startElement %s %s:" % (tag, attrs)

    def endElement(self, tag):
        global log
        log = log + "endElement %s:" % (tag)

    def characters(self, data):
        global log
        log = log + "characters: %s:" % (data)

    def warning(self, msg):
        global log
        log = log + "warning: %s:" % (msg)

    def error(self, msg):
        global log
        log = log + "error: %s:" % (msg)

    def fatalError(self, msg):
        global log
        log = log + "fatalError: %s:" % (msg)

handler = callback()

# Create an HTML push parser primed with the first 4 bytes, then feed the
# rest in two chunks; the final call passes terminate=1.
ctxt = libxml2.htmlCreatePushParser(handler, "<foo", 4, "test.xml")
chunk = " url='tst'>b"
libxml2.htmlParseChunk(ctxt, chunk, len(chunk), 0)
chunk = "ar</foo>"
libxml2.htmlParseChunk(ctxt, chunk, len(chunk), 1)
ctxt = None

reference = "startDocument:startElement foo {'url': 'tst'}:characters: bar:endElement foo:endDocument:"
if log != reference:
    print("Error got: %s" % log)
    print("Expected: %s" % reference)
    sys.exit(1)

# Memory debug specific
libxml2.cleanupParser()
if libxml2.debugMemory(1) == 0:
    print("OK")
else:
    print("Memory leak %d bytes" % (libxml2.debugMemory(1)))
    libxml2.dumpMemory()