Re: [xml] HTMLparser: comments in <style> element
- From: Daniel Veillard <veillard redhat com>
- To: Michael Day <mikeday yeslogic com>
- Cc: xml gnome org
- Subject: Re: [xml] HTMLparser: comments in <style> element
- Date: Thu, 12 Apr 2007 06:17:44 -0400
On Thu, Apr 12, 2007 at 01:14:20PM +1000, Michael Day wrote:
> Hi Daniel,
> Here is the patch to stop htmlParseScript() interpreting <!-- as the
> start of a comment.
Yeah, the patch is apparently quite simple. But it has non-negligible
side effects. This can be seen in test/HTML/doc3.htm around line 785:
there is a SCRIPT embedded in the middle of the document, and it uses
<!-- to escape some JavaScript, which does
document.write("42DF8478957377></IFRAME>");
From a script perspective, '</' immediately ends the current
element (cf. the comment embedded in the function and the related
http://www.w3.org/TR/html4/appendix/notes.html#h-B.3.2.1 )
So changing this is not a guaranteed gain in absolute terms. More
errors are going to be raised (and the regression tests will need to be fixed).
To some extent, the <!-- is used to avoid errors in some environments,
and basically that's what the libxml2 parser was doing.
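
To make the trade-off concrete, here is a minimal sketch of the two
behaviours. It is a simplified model and not the libxml2 code: the
function name script_end and the honor_comments flag are hypothetical,
and the termination rule (content ends at the first '</' followed by a
letter) is my reading of the HTML 4 note referenced above.

```python
import string

def script_end(data: str, honor_comments: bool) -> int:
    """Return the index where SCRIPT content would end, or -1 if it never does.

    honor_comments=True models a parser that skips over <!-- ... --> inside
    the script; honor_comments=False applies the '</' + letter rule everywhere.
    """
    i, n = 0, len(data)
    while i < n:
        if honor_comments and data.startswith("<!--", i):
            close = data.find("-->", i + 4)
            if close == -1:
                return -1          # unterminated comment swallows the rest
            i = close + 3          # resume scanning after the comment
            continue
        if data.startswith("</", i) and i + 2 < n and data[i + 2] in string.ascii_letters:
            return i               # '</' followed by a letter ends the element
        i += 1
    return -1

# Shaped like the case in test/HTML/doc3.htm (the surrounding lines are made up).
sample = '<!--\ndocument.write("42DF8478957377></IFRAME>");\n// -->\n</SCRIPT>'

strict = script_end(sample, honor_comments=False)
lenient = script_end(sample, honor_comments=True)
print(sample[strict:strict + 9])    # where the strict rule stops
print(sample[lenient:lenient + 9])  # where the comment-aware scan stops
```

With the strict rule, the script ends at the '</IFRAME>' inside the
string literal, so the rest of the script is parsed as markup and
produces errors. With comment skipping, the scan only stops at the real
'</SCRIPT>'.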
I'm not against the change, but I must raise the drawback publicly
too before applying it.
I have only one personal comment: HTML parsing is pure hell, you just
cannot do it right, no matter how hard you try.
Daniel
--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/