Re: [xml] DOM parser and HTML entities inside the <script> tag
- From: Daniel Veillard <veillard redhat com>
- To: Liam R E Quin <liam holoweb net>
- Cc: xml gnome org
- Subject: Re: [xml] DOM parser and HTML entities inside the <script> tag
- Date: Mon, 23 Jul 2012 15:47:26 +0800
On Fri, Jul 20, 2012 at 08:23:26PM -0400, Liam R E Quin wrote:
> On Fri, 2012-07-20 at 09:03 -0500, Raymond Irving wrote:
> > Thanks for the feedback Micheal.
> > I thought that the first occurrence of </script or </style would signal
> > the end of the element's content but I guess the W3C had something else in
> > mind.
>
> HTML 4 (that you are using) was based on ISO 8879 SGML, and the
> ISO-defined rules for parsing CDATA elements are as described: the first
> </ ends the element. It's better either to use external JavaScript or to
> surround it with a CDATA section,
> <![CDATA[
> then your script here
> ]]></script>
> However, be careful not to have ]]> inside the script!
>
> Or, short answer, it wasn't a W3C decision :-) For what it's worth I
> always thought it should work as you describe, although people would
> still get caught out.
I have had people complain about either behaviour over time!
As a result I provided both behaviours:
htmlParseScript(....)
        if ((cur == '<') && (NXT(1) == '/')) {
            /*
             * One should break here, the specification is clear:
             * Authors should therefore escape "</" within the content.
             * Escape mechanisms are specific to each scripting or
             * style sheet language.
             *
             * In recovery mode, only break if end tag match the
             * current tag, effectively ignoring all tags inside the
             * script/style block and treating the entire block as
             * CDATA.
             */
            if (ctxt->recovery) {
If you pass the HTML_PARSE_RECOVER option when creating the HTML parser
context, libxml2 will close the element only on the matching closing tag
(after complaining about the script's misbehaviour!)
Default behaviour per the spec:
paphio:~/XML -> xmllint --html tst.html
tst.html:5: HTML parser error : Unexpected end tag : p
var h="<p>Some other text</p>";
^
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><script type="text/javascript">
var d="&quot;Hello world;&quot; &lt;Test&gt; &amp; ";
var h="<p>Some other text";
</script></head></html>
paphio:~/XML ->
Trying to recover broken HTML :-)
paphio:~/XML -> xmllint --html --recover tst.html
tst.html:5: HTML parser error : Element script embeds close tag
var h="<p>Some other text</p>";
^
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><script type="text/javascript">
var d="&quot;Hello world;&quot; &lt;Test&gt; &amp; ";
var h="<p>Some other text</p>";
</script></head></html>
paphio:~/XML ->
Daniel
--
Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
daniel veillard com | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/