Re: [xml] HTMLparser: comments in <style> element



On Thu, Apr 12, 2007 at 01:14:20PM +1000, Michael Day wrote:
> Hi Daniel,
>
> Here is the patch to stop htmlParseScript() interpreting <!-- as the
> start of a comment.


  Yeah, the patch is apparently quite simple. But it has non-negligible
side effects, as can be seen in test/HTML/doc3.htm around line 785:
there is a SCRIPT embedded in the middle of the document, and it uses
<!-- to escape some Javascript, which does
  document.write("42DF8478957377></IFRAME>");

  From a script perspective, '</' immediately ends the current
element (cf. the comment embedded in the function and the related
 http://www.w3.org/TR/html4/appendix/notes.html#h-B.3.2.1 )
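
  To make that rule concrete, here is a minimal sketch in C of how the
end of SCRIPT data could be located under the HTML 4 note linked above,
where the data stops at the first '</' followed by a letter. The name
script_data_end() is hypothetical; this is an illustration only and not
the code of htmlParseScript() in libxml2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of libxml2.
 * Returns the offset at which SCRIPT/STYLE data ends under the HTML 4
 * rule: the first "</" that is immediately followed by an ASCII letter.
 * Returns strlen(data) when no such delimiter exists.
 */
size_t script_data_end(const char *data)
{
    assert(data != NULL);

    const char *p = data;
    while ((p = strstr(p, "</")) != NULL) {
        char c = p[2];  /* safe: at worst this is the terminating NUL */
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
            return (size_t)(p - data);
        p += 2;         /* "</" not followed by a letter: keep scanning */
    }
    return strlen(data);
}
```

  Given a string shaped like the doc3.htm case, the returned offset
points at the '</IFRAME' inside the document.write() call rather than at
the real </SCRIPT> end tag, which is presumably where the extra errors
would come from once <!-- no longer hides that content.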

 So changing this is not a guaranteed gain in absolute terms. More
errors are going to be raised (and the regression tests will need to be fixed).
To some extent the <!-- is used to avoid errors in some environments,
and basically that's what the libxml2 parser was doing.

  I'm not against the change, but I must raise the drawback publicly
too before applying it.

  I have only one personal comment: HTML parsing is pure hell, you just
cannot do it right, no matter how hard you try.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


