On Thu, Apr 12, 2007 at 01:14:20PM +1000, Michael Day wrote:
Here is the patch to stop htmlParseScript() interpreting <!-- as the 
start of a comment.

  Yeah, the patch is apparently quite simple. But it has non-negligible
side effects. This can be seen in test/HTML/doc3.htm around line 785:
there is a SCRIPT embedded in the middle of the document, and it uses
<!-- to escape some Javascript, which does

  From a script perspective, '</' immediately ends the current
element (cf. the comment embedded in the function and the related )

 So changing this is not a guaranteed gain in absolute terms. More
errors are going to be raised (and the regression tests will need to be fixed).
To some extent the <!-- is used to avoid errors in some environments,
and basically that's what the libxml2 parser was doing.

  I'm not against the change, but I must raise the drawback publicly
too before applying it.

  I have only one personal comment: HTML parsing is pure hell, you just
cannot do it right, no matter how hard you try.


