[xml] HTML script/style parsing change in 2.6.28
- From: "Edward Z. Yang" <edwardzyang thewritingpot com>
- To: xml gnome org
- Subject: [xml] HTML script/style parsing change in 2.6.28
- Date: Sat, 16 Feb 2008 00:01:43 -0500
I recently upgraded from libxml 2.6.26 to 2.6.31, and was somewhat
astonished when I found that libxml's parsing behavior for HTML
documents had changed slightly. I went to the changelog and dug out this
tasty tidbit:
    HTMLparser.c: change the way script/style are parsed to
    not try to detect comments, reported by Mike Day (2.6.28)
But, despite my Google-fu, I couldn't find what exactly the change
entailed. Let's suppose we have the code:
<script><!--
alert('Test!');
// --></script>
When I parse it and then inspect the node inside the <script> tag, I
get the following results:
libxml 2.6.26:
int(8) == XML_COMMENT_NODE
string(20) "
alert('Test!');
// "
libxml 2.6.31:
int(4) == XML_CDATA_SECTION_NODE
string(27) "<!--
alert('Test!');
// -->"
(These results are from the PHP bindings; the C libxml equivalents
should be self-explanatory.)
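The two strings differ by exactly 7 bytes (27 versus 20), which is the
length of "<!--" plus "-->". A minimal sketch in C of recovering the
old-style text from the new-style content follows. The helper name
strip_comment_wrapper is hypothetical and not part of libxml, and the
sketch assumes the wrapper sits at the very start and end of the node
content, as in the example above:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libxml. Returns a newly allocated copy
 * of `text` with one leading "<!--" and one trailing "-->" removed when
 * present. The caller frees the result. Returns NULL if malloc fails. */
char *strip_comment_wrapper(const char *text)
{
    size_t len = strlen(text);
    const char *start = text;
    char *out;

    /* Drop the opening delimiter only if it is the very first thing. */
    if (len >= 4 && strncmp(start, "<!--", 4) == 0) {
        start += 4;
        len -= 4;
    }
    /* Drop the closing delimiter only if it is the very last thing. */
    if (len >= 3 && strncmp(start + len - 3, "-->", 3) == 0) {
        len -= 3;
    }

    out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, start, len);
    out[len] = '\0';
    return out;
}
```

Applied to the 27-byte string from 2.6.31, this would yield the 20-byte
string that 2.6.26 reported. Text without the wrapper is copied
unchanged, so the helper could be applied regardless of libxml version.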
So, here are my questions:
1. Is the behavior, as I observed it, true to the intention of the change?
2. Is this behavior desirable? As it turns out, the new version returns
*invalid* JavaScript (unless the JS parser is smart enough to ignore a
leading <!--).
3. Is it a good idea to do a libxml version sniff (2.6.28 or later) to
accommodate this behavior change?
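For question 3, a version sniff might look like the sketch below. It
relies on the integer encoding that libxml's LIBXML_VERSION macro uses
(2.6.28 is 20628; the PHP constant of the same name follows the same
scheme). The function names are hypothetical, and the 2.6.28 cutoff is
taken from the changelog entry rather than verified against every
release:

```c
#include <stdio.h>

/* Encodes major.minor.micro the way LIBXML_VERSION does:
 * 2.6.28 becomes 20628. */
int encode_version(int major, int minor, int micro)
{
    return major * 10000 + minor * 100 + micro;
}

/* Hypothetical helper: parses a dotted version string such as "2.6.31".
 * Returns the encoded version, or -1 if the string is malformed. */
int parse_dotted_version(const char *dotted)
{
    int major, minor, micro;

    if (sscanf(dotted, "%d.%d.%d", &major, &minor, &micro) != 3)
        return -1;
    return encode_version(major, minor, micro);
}

/* Nonzero when <script>/<style> content is expected to come back as raw
 * text including any "<!--" and "-->", i.e. for 2.6.28 and later. */
int script_content_is_raw(int encoded_version)
{
    return encoded_version >= encode_version(2, 6, 28);
}
```

With this, script_content_is_raw(LIBXML_VERSION) would select the code
path at compile time in C, and parse_dotted_version() would cover the
case where only a dotted string is available. A malformed string maps
to -1, which falls on the pre-2.6.28 side of the check.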
Thanks!
--
Edward Z. Yang GnuPG: 0x869C48DA
HTML Purifier <http://htmlpurifier.org> Anti-XSS Filter
[[ 3FA8 E9A9 7385 B691 A6FC B3CB A933 BE7D 869C 48DA ]]