Re: [xml] Crash when parsing bad HTML



On Wed, Aug 22, 2007 at 07:16:51PM -0400, Pierre Belzile wrote:

   Hi,
   I'm using the HTML parser (htmlParseDocument) and my application
   segfaults when processing this document:
   <HTML>
   <PRE>
     Some text
   <?PRE>
   <HTML>
   The gdb traceback is:
   #0  0x0000002a96e3a4e4 in xmlSAX2ProcessingInstruction () from /usr/lib64/libxml2.so.2
   #1  0x0000002a96db9779 in htmlParseEntityRef () from /usr/lib64/libxml2.so.2
   #2  0x0000002a96dbb69a in htmlParseElement () from /usr/lib64/libxml2.so.2
   # ...
   Of course, it's bad HTML but an exception would be more acceptable. I
   tried using the latest official version (2.6.29) and the problem
   occurs there too. Is there a patch somewhere? If not, a hint would be
   appreciated because I'm going to have to fix it.
   Cheers, Pierre

  I can't reproduce this with xmllint --html. You MUST provide the full
input document as an attachment. If you do that right now and I can
reproduce the crash, it will be fixed immediately, as I think I will do a
release today. But with the data you have provided so far there is
nothing more I can do:

paphio:~/XML -> valgrind xmllint --html tst.html 
tst.html:5: HTML parser error : htmlParseStartTag: misplaced <html> tag
<HTML>        
     ^
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><pre>                                                                        
Some text                                                                  
<?PRE></pre></body></html>
paphio:~/XML -> rpm -q libxml2
libxml2-2.6.29-1
paphio:~/XML -> valgrind /usr/bin/xmllint --html tst.html 
tst.html:5: HTML parser error : htmlParseStartTag: misplaced <html> tag
<HTML>        
     ^
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><pre>                                                                        
Some text                                                                  
<?PRE></pre></body></html>
paphio:~/XML -> 

 At this point I have to assume the bug is somewhere else in your
application, because I really can't reproduce it!
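
 For anyone trying to narrow this down, here is a minimal sketch of a
standalone check. It assumes a POSIX shell and that the five lines quoted
in the report are the whole document; the exact whitespace is a guess,
since quoting may have altered it, which is why an attachment is still
needed. The file name tst.html just matches the session above.

```shell
#!/bin/sh
# Recreate the document as quoted in the report. The indentation of
# "Some text" is an assumption; the original bytes may differ.
cat > tst.html <<'EOF'
<HTML>
<PRE>
  Some text
<?PRE>
<HTML>
EOF

# Run the stock command-line parser if it is installed. A crash here would
# point at libxml2 itself; a clean run points back at the calling application.
if command -v xmllint >/dev/null 2>&1; then
    # xmllint reports its libxml2 version on stderr; keep the first line.
    xmllint --version 2>&1 | head -n 1
    xmllint --html tst.html
    echo "xmllint exit status: $?"
else
    echo "xmllint not found; attach tst.html to the report instead"
fi
```

 If this runs cleanly while the application still crashes on the same
file, sending that exact file as an attachment, plus the parser calls the
application makes, gives something concrete to compare.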

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


