[xml] [PATCH] Visible HTML elements close the head tag
- From: conrad irwin gmail com
- To: xml gnome org
- Subject: [xml] [PATCH] Visible HTML elements close the head tag
- Date: Fri, 27 Jul 2012 15:42:27 -0700
From: Conrad Irwin <conrad irwin gmail com>
Hi Xml,
HTML email often contains arbitrary fragments of HTML. The one that
triggered this change was of the form:
<meta><font></font><div>...
Before this change, the <font> tag was placed inside the implicit <head>
that gets created for the <meta> tag. After this change, it is part of
the <body>, which more closely matches the behaviour of modern HTML
implementations.
Is there a good reason that these tags didn't close the <head> tag
before? I'm also not sure about applet/embed/object, so I've left them
out of the list for now.
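
For anyone not familiar with the table the patch touches, here is a
minimal sketch of how a flat, NULL-delimited list in the same layout as
htmlStartClose can be scanned. The names start_close and closes() are
made up for illustration, and the table is a short excerpt; this is not
the lookup code libxml2 itself uses.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical excerpt in the same layout as htmlStartClose: each entry is
 * the newly opened tag, followed by the open tags it closes, followed by
 * NULL. A second NULL ends the whole table. */
static const char * const start_close[] = {
    "div", "p", "head", NULL,
    "a", "a", "head", NULL,
    "font", "head", NULL,
    NULL
};

/* Return 1 if opening `newtag` should close an open `oldtag`, else 0.
 * Every entry is checked, so a tag listed more than once still works. */
static int closes(const char *newtag, const char *oldtag)
{
    size_t i = 0;

    while (start_close[i] != NULL) {
        /* First string of an entry: the tag being opened. */
        int match = strcmp(start_close[i], newtag) == 0;
        i++;
        /* Remaining strings up to NULL: the tags that get closed. */
        for (; start_close[i] != NULL; i++) {
            if (match && strcmp(start_close[i], oldtag) == 0)
                return 1;
        }
        i++; /* skip the NULL that ends this entry */
    }
    return 0;
}
```

With this excerpt, closes("font", "head") returns 1 while
closes("meta", "head") returns 0, which is the distinction the patch
relies on for the <meta><font> case.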
It might be better to move towards an approach closer to HTML5, where
any tag that is not allowed in the head causes the <head> to be closed.
See Section 12.2.5.4.4, The "in head" insertion mode. [1] What are the
current plans for HTML5 in libxml2?
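
A rough sketch of that alternative, under some assumptions: the
head_tags list below is my approximation of the start tags the "in
head" rules handle inside the head and should be checked against the
spec, tag names are assumed to be lowercased already, and both function
names are hypothetical rather than existing libxml2 API.

```c
#include <stddef.h>
#include <string.h>

/* Assumed list of elements that may stay inside <head>; an approximation
 * of the "in head" insertion mode, not copied from libxml2. */
static const char * const head_tags[] = {
    "base", "basefont", "bgsound", "link", "meta", "noframes",
    "noscript", "script", "style", "title", NULL
};

/* Return 1 if `tag` (already lowercased) may appear inside <head>. */
static int allowed_in_head(const char *tag)
{
    size_t i;

    for (i = 0; head_tags[i] != NULL; i++) {
        if (strcmp(head_tags[i], tag) == 0)
            return 1;
    }
    return 0;
}

/* Inverted rule: any start tag not on the allow-list closes an open
 * <head>, so no per-tag "head" entries would be needed. */
static int start_tag_closes_head(const char *tag)
{
    return !allowed_in_head(tag);
}
```

The trade-off is that an allow-list stays short and covers unknown
tags automatically, whereas the table approach in this patch needs one
entry per visible element.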
Conrad
[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#parsing-main-inhead
---
HTMLparser.c | 39 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 38 insertions(+), 1 deletion(-)
diff --git a/HTMLparser.c b/HTMLparser.c
index 66ff17b..53b3217 100644
--- a/HTMLparser.c
+++ b/HTMLparser.c
@@ -1082,7 +1082,7 @@ static const char * const htmlStartClose[] = {
"div", "p", "head", NULL,
"noscript", "p", NULL,
"center", "font", "b", "i", "p", "head", NULL,
-"a", "a", NULL,
+"a", "a", "head", NULL,
"caption", "p", NULL,
"colgroup", "caption", "colgroup", "col", "p", NULL,
"col", "caption", "col", "p", NULL,
@@ -1100,6 +1100,43 @@ static const char * const htmlStartClose[] = {
"option", "option", NULL,
"fieldset", "legend", "p", "head", "h1", "h2", "h3", "h4", "h5", "h6",
"pre", "listing", "xmp", "a", NULL,
+/* most tags in FONTSTYLE, PHRASE and SPECIAL should close <head> */
+"tt", "head", NULL,
+"i", "head", NULL,
+"b", "head", NULL,
+"u", "head", NULL,
+"s", "head", NULL,
+"strike", "head", NULL,
+"big", "head", NULL,
+"small", "head", NULL,
+
+"em", "head", NULL,
+"strong", "head", NULL,
+"dfn", "head", NULL,
+"code", "head", NULL,
+"samp", "head", NULL,
+"kbd", "head", NULL,
+"var", "head", NULL,
+"cite", "head", NULL,
+"abbr", "head", NULL,
+"acronym", "head", NULL,
+
+/* "a" */
+"img", "head", NULL,
+/* "applet" */
+/* "embed" */
+/* "object" */
+"font", "head", NULL,
+/* "basefont" */
+"br", "head", NULL,
+/* "script" */
+"map", "head", NULL,
+"q", "head", NULL,
+"sub", "head", NULL,
+"sup", "head", NULL,
+"span", "head", NULL,
+"bdo", "head", NULL,
+"iframe", "head", NULL,
NULL
};
--
1.7.12.rc0.10.g476109f