Re: [xml-bindings]HTML parser segfaults



On Sat, May 11, 2002 at 02:08:32PM +0100, Gary Benson wrote:
> 
> On Fri, 10 May 2002, Gary Benson wrote:
> 
> > I've been getting segfaults when trying to write a SAX parser for HTML. It
> > looks like libxml2.htmlCreatePushParser works correctly but the first time
> > you call libxml2.htmlParseChunk it will segv because, in C land,
> > ctxt->input (and possibly ctxt) has been trashed somewhere. I got a little
> > lost trying to debug it as python seems to be doing some weird threading
> > stuff (and I'm very tired now because of it :-/), so I thought I'd post it
> > here and see if anyone else can find it whilst I sleep ;)
> 
> Hmmm, so I gave up on gdb and resorted to my old family favourite printf 

  Yup, debugging in the stubs can become a bit ... messy.

> debugging and eventually found it: libxml2.htmlParseChunk was passing the 
> python parserCtxt object to libxml2mod.htmlParseChunk rather than the C 
> parserCtxt object. The attached patch to libxml2.py fixes the problem 
> (though the generator needs fixing really) and the attached script, a 
> modification of pushSAX.py, exercises the problem.
> 
> Cheers,
> Gary
> 
> [ gary inauspicious org ][ GnuPG 85A8F78B ][ http://inauspicious.org/ ]

> --- python/libxml2.py~	Sat May 11 13:20:10 2002
> +++ python/libxml2.py	Sat May 11 13:39:53 2002
> @@ -335,7 +335,7 @@
>  
>  def htmlParseChunk(ctxt, chunk, size, terminate):
>      """Parse a Chunk of memory"""
> -    ret = libxml2mod.htmlParseChunk(ctxt, chunk, size, terminate)
> +    ret = libxml2mod.htmlParseChunk(ctxt._o, chunk, size, terminate)
>      return ret
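
For readers following along, here is a minimal sketch of the wrapper pattern
the patch above touches. It is an illustration only: _CHandle,
lowlevel_parse_chunk, ParserCtxt and buggy_parse_chunk are hypothetical
stand-ins, not the real libxml2 bindings. The one detail taken from the patch
is that the Python object keeps its C-level counterpart in the _o attribute.

```python
# Hypothetical stand-ins: none of these names belong to the real libxml2 bindings.

class _CHandle:
    """Plays the role of the opaque C-level parser context."""

    def __init__(self):
        self.chunks = []


def lowlevel_parse_chunk(handle, chunk, size, terminate):
    """Stand-in for a C extension entry point that only accepts the raw handle."""
    if not isinstance(handle, _CHandle):
        # This sketch raises; a C stub that skips the check could instead
        # dereference the wrong pointer and crash.
        raise TypeError("expected the raw handle, got %s" % type(handle).__name__)
    handle.chunks.append(chunk[:size])
    return 0


class ParserCtxt:
    """Python-side wrapper; the raw handle lives in the _o attribute."""

    def __init__(self):
        self._o = _CHandle()

    def parse_chunk(self, chunk, size, terminate):
        # Unwrap self._o before calling into the low-level function.
        return lowlevel_parse_chunk(self._o, chunk, size, terminate)


def buggy_parse_chunk(ctxt, chunk, size, terminate):
    # Mirrors the pre-patch wrapper: it passes the Python object itself.
    return lowlevel_parse_chunk(ctxt, chunk, size, terminate)
```

In this sketch the mistake surfaces as a TypeError, whereas the report above
describes a segfault on the C side.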

  Hum, that code is generated ... this seems related to a problem in the
stub generator code, which doesn't treat htmlParserCtxtPtr arguments as
xmlParserCtxtPtr ones. I fixed generator.py and now libxml2.py contains the
proper libxml2mod.htmlParseChunk() call. It actually becomes a method of
the parserCtxt class in that case. I tried to fix the example but now I get:
paphio:~/XML/python/tests -> ./pushSAXhtml.py
Error got: startDocument:startElement html None:startElement body None:startElement foo {'url': 'tst'}:error: Tag foo invalid
:characters: bar:endElement foo:endElement body:endElement html:endDocument:
Exprected: startDocument:startElement foo {'url': 'tst'}:characters: bar:endElement foo:endDocument:
paphio:~/XML/python/tests ->
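
The extra html/body events and the "Tag foo invalid" error in the trace above
presumably come from the HTML parser supplying the implied html and body
elements and rejecting the unknown foo tag, while the expected string is the
trace an XML push parser would log. As a rough sketch of that XML-side trace,
here is the same logging idea using Python's standard xml.sax module rather
than the libxml2 bindings. TraceHandler and trace are hypothetical names
chosen for this illustration.

```python
import xml.sax


class TraceHandler(xml.sax.ContentHandler):
    """Records SAX events as a colon-separated string, like the test above."""

    def __init__(self):
        super().__init__()
        self.log = ""

    def startDocument(self):
        self.log += "startDocument:"

    def endDocument(self):
        self.log += "endDocument:"

    def startElement(self, name, attrs):
        # Log None when there are no attributes, matching the trace format.
        self.log += "startElement %s %s:" % (name, dict(attrs) or None)

    def endElement(self, name):
        self.log += "endElement %s:" % name

    def characters(self, data):
        self.log += "characters: %s:" % data


def trace(document):
    """Push a document through an incremental XML parser and return the log."""
    handler = TraceHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.feed(document)   # push-style: data is handed over in chunks
    parser.close()          # signals end of input, fires endDocument
    return handler.log


if __name__ == "__main__":
    print(trace('<foo url="tst">bar</foo>'))
```

An XML parser has no notion of implied html or body elements, so a trace
produced this way contains only the foo events.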

  I will commit anyway and fix this later.

  Thanks for pointing this out!

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


