Re: [xml] HTML script/style parsing change in 2.6.28



Hi Edward,

HTMLparser.c: change the way script/style are parsed to
          not try to detect comments, reported by Mike Day (2.6.28)

That would be me.

But, despite my Google-fu, I couldn't find what exactly the change
entailed. Let's suppose we have the code:

<script><!--
alert('Test!');
// --></script>

In XML, this is a comment. In HTML, it isn't, because the content of <script> and <style> elements is unparsed CDATA.

The habit of commenting out the content dates back to ancient browsers that didn't recognise these elements, and would include the JavaScript or CSS as literal text in the page.

Since no browsers do this any more, there is no point adding comments, but millions of existing pages already have them.
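To illustrate the point, here is a rough sketch using Python's standard-library html.parser rather than libxml2 itself. That parser also treats <script> content as raw text, so it should show the same browser-like result. The ScriptProbe class name is made up for this example.

```python
from html.parser import HTMLParser


class ScriptProbe(HTMLParser):
    """Records which callbacks fire for the content of the document."""

    def __init__(self):
        super().__init__()
        self.comments = []  # filled only if the parser sees a real comment
        self.text = []      # raw character data, including <script> content

    def handle_comment(self, data):
        self.comments.append(data)

    def handle_data(self, data):
        self.text.append(data)


probe = ScriptProbe()
probe.feed("<script><!--\nalert('Test!');\n// --></script>")
probe.close()

# Data may arrive in several chunks, so join before inspecting it.
script_text = "".join(probe.text)
print(repr(script_text))
print(probe.comments)
```

The comment callback is never invoked here; the delimiters arrive as part of the script text, which is what the libxml2 change is meant to produce as well.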

1. Is the behavior, as I observed it, true to the intention of the change?

Yes, it makes the libxml2 HTML parser consistent with web browsers.

2. Is this behavior desirable? As it turns out, the new version returns
*invalid* JavaScript (unless our js parser is smart enough to ignore a
leading <!--)

The script or stylesheet parser should ignore HTML comment delimiters. The CSS spec, for example, explicitly says that <!-- is ignored.
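If your JavaScript parser turns out not to tolerate the delimiters, one option is to strip them before handing the text over. The following is only a sketch: strip_comment_wrapper is a hypothetical helper, not something libxml2 provides, and it assumes the common pattern of a leading "<!--" line and a trailing "-->" or "// -->" line.

```python
import re

# Hypothetical helper patterns; not part of libxml2 or any JavaScript engine.
# Leading wrapper: optional whitespace, "<!--", and the rest of that line.
_LEADING = re.compile(r"\A\s*<!--[^\n]*\n?")
# Trailing wrapper: a final line holding only "-->" or "// -->".
_TRAILING = re.compile(r"(?:^|\n)[ \t]*(?://[ \t]*)?-->\s*\Z")


def strip_comment_wrapper(script_text):
    """Remove a legacy HTML comment wrapper from script content, if present."""
    text = _LEADING.sub("", script_text, count=1)
    text = _TRAILING.sub("\n", text, count=1)
    return text.strip()


wrapped = "<!--\nalert('Test!');\n// -->"
print(strip_comment_wrapper(wrapped))
```

Script content without a wrapper passes through unchanged apart from surrounding whitespace.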

3. Is it a good idea to do a libxml version sniff (2.6.28 or later) to
accommodate this behavior change?

No idea, entirely up to you :)
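If you do go the sniffing route, a minimal sketch might look like the following. It assumes you can get the libxml2 version as a dotted string such as "2.6.28" from your binding; the function names are invented for the example, and version strings with non-numeric suffixes are not handled.

```python
# 2.6.28 is the release named in the changelog entry quoted above.
RAW_SCRIPT_SINCE = (2, 6, 28)


def parse_version(version):
    """Turn a dotted version string like '2.6.28' into (2, 6, 28)."""
    return tuple(int(part) for part in version.split("."))


def script_content_is_raw(version):
    """True if this libxml2 version no longer detects comments in script/style."""
    return parse_version(version) >= RAW_SCRIPT_SINCE


for candidate in ("2.6.27", "2.6.28", "2.10.0"):
    print(candidate, script_content_is_raw(candidate))
```

Comparing tuples of integers rather than the raw strings avoids ordering "2.10.0" before "2.6.28".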

Best regards,

Michael

--
Print XML with Prince!
http://www.princexml.com


