Re: [xml] DOM parser and HTML entities inside the <script> tag



On Fri, Jul 20, 2012 at 08:23:26PM -0400, Liam R E Quin wrote:
> On Fri, 2012-07-20 at 09:03 -0500, Raymond Irving wrote:
>> Thanks for the feedback Micheal.
>>
>> I thought that the first occurrence of </script or </style would signal
>> the end of the element's content, but I guess the W3C had something else
>> in mind.
>
> HTML 4 (which you are using) was based on ISO 8879 SGML, and the
> ISO-defined rules for parsing CDATA elements are as described: the first
> </ ends the element. It's better either to use external JavaScript or to
> surround the script with a CDATA section:
>
> <![CDATA[
>
> then your script here
>
> ]]></script>
>
> However, be careful not to have ]]> inside the script!
>
> Or, short answer, it wasn't a W3C decision :-) For what it's worth, I
> always thought it should work as you describe, although people would
> still get caught out.
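A minimal sketch of the CDATA suggestion, assuming the page is generated from C. wrap_script_in_cdata is a hypothetical helper (not a libxml2 API): it wraps the script text and refuses any script that contains ]]>, because that sequence would end the CDATA section early. It only builds the string; it assumes whatever consumes the page treats the CDATA section as such.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libxml2): build
 *   <script type="text/javascript"><![CDATA[\n SCRIPT \n]]></script>
 * Returns a malloc'ed string, or NULL if the script contains "]]>"
 * (which would terminate the CDATA section early) or if allocation
 * fails. The caller frees the result.
 */
char *wrap_script_in_cdata(const char *script)
{
    static const char open_tag[] = "<script type=\"text/javascript\"><![CDATA[\n";
    static const char close_tag[] = "\n]]></script>";
    size_t len;
    char *out;

    assert(script != NULL);
    if (strstr(script, "]]>") != NULL)
        return NULL;                      /* unsafe: "]]>" inside the script */

    len = strlen(open_tag) + strlen(script) + strlen(close_tag) + 1;
    out = malloc(len);
    if (out == NULL)
        return NULL;
    strcpy(out, open_tag);
    strcat(out, script);
    strcat(out, close_tag);
    return out;
}
```

A script such as var s="]]>"; is rejected rather than silently producing a broken section.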

  I have had people complain about each behaviour over time! As a result,
I provide both behaviours:

htmlParseScript(....)

        if ((cur == '<') && (NXT(1) == '/')) {
            /*
             * One should break here, the specification is clear:
             * Authors should therefore escape "</" within the content.
             * Escape mechanisms are specific to each scripting or
             * style sheet language.
             *
             * In recovery mode, only break if end tag match the
             * current tag, effectively ignoring all tags inside the
             * script/style block and treating the entire block as
             * CDATA.
             */
            if (ctxt->recovery) {


 If you pass the HTML_PARSE_RECOVER option when creating the (HTML) parser
context, libxml2 will close the script only on the matching closing tag
(after complaining about the script's misbehaviour!).

 Default behaviour per the spec:

paphio:~/XML -> xmllint --html tst.html
tst.html:5: HTML parser error : Unexpected end tag : p
   var h="<p>Some other text</p>";
                                ^
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><script type="text/javascript">
   var d="&quot;Hello world;&quot; &lt;Test&gt; &amp; ";
   var h="<p>Some other text";
</script></head></html>
paphio:~/XML ->
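To keep the default (spec) behaviour and still get the whole script, the escape the spec comment asks for ("Authors should therefore escape "</" within the content") can be applied when generating the page. A minimal sketch, assuming JavaScript and assuming every "</" sits inside a string literal, where "\/" means the same as "/". escape_etago is a hypothetical helper, not part of libxml2; it rewrites each "</" as "<\/".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libxml2): return a malloc'ed copy of
 * the JavaScript source `js` with every "</" rewritten as "<\/".
 * Assumes each "</" is inside a JS string literal. Returns NULL on
 * allocation failure; the caller frees the result.
 */
char *escape_etago(const char *js)
{
    size_t count = 0;
    const char *p;
    char *out, *q;

    assert(js != NULL);
    /* Count occurrences so the output can be sized exactly. */
    for (p = js; (p = strstr(p, "</")) != NULL; p += 2)
        count++;

    out = malloc(strlen(js) + count + 1);   /* one extra '\\' per "</" */
    if (out == NULL)
        return NULL;

    for (p = js, q = out; *p != '\0'; p++) {
        *q++ = *p;
        if (p[0] == '<' && p[1] == '/')
            *q++ = '\\';                     /* "<" then "\", "/" follows */
    }
    *q = '\0';
    return out;
}
```

Applied to the tst.html line, it yields var h="<p>Some other text<\/p>";, which no longer contains a "</" for the parser to stop at.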

 Trying to recover broken HTML :-)

paphio:~/XML -> xmllint --html --recover tst.html
tst.html:5: HTML parser error : Element script embeds close tag
   var h="<p>Some other text</p>";
                            ^
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><script type="text/javascript">
   var d="&quot;Hello world;&quot; &lt;Test&gt; &amp; ";
   var h="<p>Some other text</p>";
</script></head></html>
paphio:~/XML ->

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/


