[xml] [PATCH] Visible HTML elements close the head tag



From: Conrad Irwin <conrad irwin gmail com>

Hi Xml,

In HTML email it's common to find arbitrary fragments of HTML. The one
that triggered this change was of the form:

    <meta><font></font><div>...

Before this change, the <font> tag was part of the implicit <head> that
gets created for the <meta> tag. After this change, it is part of the
<body>, which more closely matches the behaviour of modern HTML
implementations.
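
To illustrate how a table laid out like htmlStartClose can be read, here
is a minimal sketch. It is not the actual libxml2 lookup code: the table
holds only a few sample rows, and the names start_close and starts_close
are made up for this example.

```c
#include <string.h>

/* Same layout as htmlStartClose: each group is a new tag followed by the
 * open tags it implicitly closes, terminated by NULL. An extra NULL ends
 * the whole table. Only a few illustrative rows are included here. */
static const char * const start_close[] = {
    "div",  "p", "head", NULL,
    "a",    "a", "head", NULL,
    "font", "head", NULL,
    NULL
};

/* Hypothetical helper (not a libxml2 function): returns 1 if starting
 * new_tag should implicitly close an open open_tag, 0 otherwise. */
static int starts_close(const char *new_tag, const char *open_tag)
{
    size_t i = 0;

    while (start_close[i] != NULL) {
        /* The first entry of each group names the tag being started. */
        int match = strcmp(start_close[i], new_tag) == 0;

        /* Walk the rest of the group up to its NULL terminator. */
        for (i++; start_close[i] != NULL; i++) {
            if (match && strcmp(start_close[i], open_tag) == 0)
                return 1;
        }
        i++; /* step over the group's NULL terminator */
    }
    return 0;
}
```

With these sample rows, starts_close("font", "head") returns 1, while
starts_close("meta", "head") returns 0 because there is no "meta" group.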

Is there a good reason that these tags didn't close the <head> tag
before? I'm also not sure about applet/embed/object, so I've left them
out of the list for now.

It might be better to move towards a more HTML5-based approach, where
any tag that is not allowed in <head> causes the <head> to be closed
(see section 12.2.5.4.4, The "in head" insertion mode [1]). I'm not
sure what the current plans are for HTML5 in libxml2, though.
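
As a rough illustration of that alternative, the check could be inverted:
instead of listing every tag that closes <head>, list the few start tags
that may stay inside it. This is only a sketch under assumptions. The tag
list is an approximate reading of the "in head" section in [1] and should
be checked against the spec, tag names are assumed to be lowercased
already, and closes_head is a hypothetical name, not a libxml2 function.

```c
#include <string.h>

/* Start tags assumed to be handled inside <head> without closing it.
 * Approximate list based on the "in head" insertion mode; verify it
 * against the spec before relying on it. */
static const char * const head_tags[] = {
    "base", "basefont", "bgsound", "link", "meta", "title",
    "noscript", "noframes", "style", "script",
    "head", "html",   /* handled specially, but they do not close <head> */
    NULL
};

/* Hypothetical helper: returns 1 if starting this (lowercase) tag should
 * close an open <head>, 0 if the tag may remain inside it. */
static int closes_head(const char *tag)
{
    size_t i;

    for (i = 0; head_tags[i] != NULL; i++) {
        if (strcmp(head_tags[i], tag) == 0)
            return 0;
    }
    return 1; /* anything else ends the head */
}
```

Under these assumptions, closes_head("font") returns 1 and
closes_head("meta") returns 0, so the <meta><font> fragment above would
put <font> in the <body> without needing a table entry per tag.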

Conrad

[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#parsing-main-inhead

---
 HTMLparser.c | 39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/HTMLparser.c b/HTMLparser.c
index 66ff17b..53b3217 100644
--- a/HTMLparser.c
+++ b/HTMLparser.c
@@ -1082,7 +1082,7 @@ static const char * const htmlStartClose[] = {
 "div",         "p", "head", NULL,
 "noscript",    "p", NULL,
 "center",      "font", "b", "i", "p", "head", NULL,
-"a",           "a", NULL,
+"a",           "a", "head", NULL,
 "caption",     "p", NULL,
 "colgroup",    "caption", "colgroup", "col", "p", NULL,
 "col",         "caption", "col", "p", NULL,
@@ -1100,6 +1100,43 @@ static const char * const htmlStartClose[] = {
 "option",      "option", NULL,
 "fieldset",    "legend", "p", "head", "h1", "h2", "h3", "h4", "h5", "h6",
                "pre", "listing", "xmp", "a", NULL,
+/* most tags in FONTSTYLE, PHRASE and SPECIAL should close <head> */
+"tt",          "head", NULL,
+"i",           "head", NULL,
+"b",           "head", NULL,
+"u",           "head", NULL,
+"s",           "head", NULL,
+"strike",      "head", NULL,
+"big",         "head", NULL,
+"small",       "head", NULL,
+
+"em",          "head", NULL,
+"strong",      "head", NULL,
+"dfn",         "head", NULL,
+"code",        "head", NULL,
+"samp",        "head", NULL,
+"kbd",         "head", NULL,
+"var",         "head", NULL,
+"cite",        "head", NULL,
+"abbr",        "head", NULL,
+"acronym",     "head", NULL,
+
+/* "a" */
+"img",         "head", NULL,
+/* "applet" */
+/* "embed" */
+/* "object" */
+"font",        "head", NULL,
+/* "basefont" */
+"br",          "head", NULL,
+/* "script" */
+"map",         "head", NULL,
+"q",           "head", NULL,
+"sub",         "head", NULL,
+"sup",         "head", NULL,
+"span",        "head", NULL,
+"bdo",         "head", NULL,
+"iframe",      "head", NULL,
 NULL
 };
 
-- 
1.7.12.rc0.10.g476109f



