Re: [xml] HTML parsing problem (choking on embedded HTML tags) still exists for me



Cyrill Osterwalder wrote:
...
I'm addressing this issue once more because I'd like to find out if this kind
of HTML tag processing is by design given in the HTML parser of libxml.

No, it's by design because of standards.

...

Here is the example again:

If the parser processes the following HTML page it seems to interpret the
quoted "</HEAD>" end tag (at **) and inserts the assumed to be missing
"</script></head><body>" tags. Same thing with the subsequent quoted
"</HTML>" tag (at ***).

I'm often rather amazed at how carefully Daniel implements specs;
even specs that aren't that obvious at first reading.
HTML4 indeed treats the content of script and style somewhat specially.
But rather than ending only at </script> (or </style>), that content ends
at the first occurrence of "</" followed by a letter, i.e. </[a-zA-Z].

See  http://www.w3.org/TR/html4/appendix/notes.html#notes-specifying-data
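
To make the rule concrete, here is a minimal sketch (not libxml's actual code) of how a parser following that note could locate the end of script content. The helper name cdataEndIndex is made up for illustration.

```javascript
// Hypothetical helper, for illustration only: returns the index at which
// script/style content would be considered finished under the HTML 4 rule,
// or -1 if no terminating sequence is present.
function cdataEndIndex(content) {
  // The end-tag open delimiter followed by an ASCII letter ends the content,
  // whatever tag name comes after it.
  return content.search(/<\/[a-zA-Z]/);
}

// A quoted end tag inside a JavaScript string still matches the rule...
const unescaped = 'document.write("</HEAD>");';
// ...while the backslash-escaped form does not.
const escaped = 'document.write("<\\/HEAD>");';

console.log(cdataEndIndex(unescaped));
console.log(cdataEndIndex(escaped));
```

The first call reports a match inside the string literal, which is where the parser stops treating the text as script; the second reports no match at all.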

So, according to the spec, your example is illegal; inside the script the
quoted end tags should be written as <\/HEAD> and <\/HTML>.
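
A short sketch of why the escaped form is harmless on the JavaScript side: in a string literal, "\/" is just "/", so the markup that gets written is unchanged. The doc object below is a stand-in I made up so the snippet runs outside a browser; in the real page it would be pop_win.document.

```javascript
// Stand-in for pop_win.document (an assumption for this sketch): it records
// what would have been written instead of writing to a window.
const written = [];
const doc = { write: (s) => written.push(s) };

// The two marked lines, with the slash escaped so the HTML parser no longer
// sees an end-tag delimiter followed by a letter inside the script.
doc.write("<\/HEAD>");
doc.write("<\/HTML>");

// The script engine drops the backslash, so the output is the plain tags.
console.log(written.join(""));
```

Read strictly, the same rule would seem to apply to the closing title tag in the earlier write call too, so it is probably safest to escape every end tag that appears inside the script.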

<html>
<head>
<title>TEST LIBXML HTML PARSER</title>
<script LANGUAGE="JavaScript">
function preview(textarea_obj) {
        var txt = get_textarea(textarea_obj);
        var pop_win = window.open("", "win", "width=400,height=250");
        pop_win.document.open("text/html", "replace");
        pop_win.document.write("<HTML>");
        pop_win.document.write("<HEAD>");
        pop_win.document.write("<title>Post Previewer</title>");
        pop_win.document.write("<link rel=stylesheet type=text/css href=default.css>");
**      pop_win.document.write("</HEAD>");
        pop_win.document.write(txt);
***     pop_win.document.write("</HTML>");
        pop_win.focus();
}
</script>
</head>
<body>
...
...


Thanks for any hints, again.

Cyrill



--
bruce miller nist gov
http://math.nist.gov/~BMiller/


