Re: [xml] Crash when parsing bad HTML

On Wed, Aug 22, 2007 at 07:16:51PM -0400, Pierre Belzile wrote:
   I'm using the HTML parser (htmlParseDocument) and my application
   segfaults when processing this document:
     Some text
   The gdb traceback is:
   #0 0x0000002a96e3a4e4 in xmlSAX2ProcessingInstruction () from
   #1 0x0000002a96db9779 in htmlParseEntityRef () from
   #2 0x0000002a96dbb69a in htmlParseElement () from
   # ...
   Of course, it's bad HTML but an exception would be more acceptable. I
   tried using the latest official version (2.6.29) and the problem
   occurs there too. Is there a patch somewhere? If not, a hint would be
   appreciated because I'm going to have to fix it.
   Cheers, Pierre
  I can't reproduce this with xmllint --html. You MUST provide the full
input document as an attachment. If you do that right now and I can
reproduce the crash, it will be fixed immediately, as I think I will do a
release today. But with the data you have provided so far there is nothing
more I can do:

paphio:~/XML -> valgrind xmllint --html tst.html 
tst.html:5: HTML parser error : htmlParseStartTag: misplaced <html> tag
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "";>
Some text                                                                  
paphio:~/XML -> rpm -q libxml2
paphio:~/XML -> valgrind /usr/bin/xmllint --html tst.html 
tst.html:5: HTML parser error : htmlParseStartTag: misplaced <html> tag
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "";>
Some text                                                                  
paphio:~/XML -> 

  At this point I have to assume the bug is somewhere else in your
application, because I really can't reproduce it!


Red Hat Virtualization group
Daniel Veillard      | virtualization library
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine
