Re: [xml] HTMLparser: comments in <style> element



On Mon, Apr 09, 2007 at 06:29:08PM +1000, Michael Day wrote:
> Hi,
>
> Currently the HTML parser seems to incorrectly parse comments in the
> <style> element. For example:
>
> <style>
> <!--
> h1 { color: red }
> -->
> </style>
>
> Because this is HTML not XML and the <style> element is CDATA not PCDATA
> the <!-- should be treated as text, not as the beginning of a comment.
> However, the HTML parser seems to treat it as an actual comment.
> Surprisingly, the HTML parser does not treat &amp; as an entity
> reference, so it does seem to be partially treating <style> as CDATA.
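
A minimal sketch of the rule being argued for above, in plain C with no
libxml2 dependency. It assumes the reading of HTML 4.01 section 6.2 under
which STYLE and SCRIPT content is raw text that ends at the first "</", so
that "<!--" and "&amp;" are ordinary characters. The function name
style_content_length() is hypothetical and is not part of libxml2; this is
not how htmlParseScript() is written.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not a libxml2 function.
 *
 * Given a pointer to the first byte after a <style> start tag, return the
 * number of bytes that belong to the element's raw text content.
 *
 * Assumption: the content ends at the first "</" sequence. Nothing else is
 * interpreted, so "<!--", "-->" and "&amp;" are all counted as plain text.
 * If no "</" is found, the rest of the buffer is treated as content.
 */
static size_t style_content_length(const char *after_start_tag)
{
    const char *end = strstr(after_start_tag, "</");

    if (end == NULL)
        return strlen(after_start_tag);
    return (size_t)(end - after_start_tag);
}
```

Applied to the example in the message, the returned length covers the whole
"<!-- h1 { color: red } -->" run, so under this reading the stylesheet text
would reach the application unchanged instead of being parsed as a comment.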

  See htmlParseScript() in HTMLparser.c; it does indeed treat <!-- as
the start of a comment.
  http://www.w3.org/TR/html4/types.html#type-cdata
says nothing about comments, so presumably one has to know the SGML
specifics on the topic, and sorry, I never studied SGML. If you have a
pointer to a description explaining that comments are not to be
interpreted in CDATA, a patch should be easy to design.
But the whole thing is a pile of ad-hoc attempts at working around code
written 10+ years ago, and honestly I doubt any code in libxml2 could
satisfy the zillions of different behaviours expected by various
tools, agents, etc.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/


