Re: [xml] HTMLparser: body/noscript mismatch?



On Fri, Jan 21, 2005 at 05:30:04PM +0100, Cyrill Osterwalder wrote:

Hi there!

I'm stuck with a <body>/<noscript> HTMLparser mismatch and I don't know if
I'm seeing things wrong. It would be great if anybody could clarify. It
seems that the body/noscript blocks are parsed in the wrong order.

I put the following HTML code into the libxml2-2.6.16 HTMLparser:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>TEST</title>
</head>
<noscript>
<body text="#000000">
TEST-NOSCRIPT
</body>
</noscript>
<body>
anything else
</body>
</html>

Using the SAX interface for the HTMLparser I get the following calls:

start html
  start head
    start meta
    end meta
    start title
    end title
  end head
  start body         (<---- expecting noscript here)
    start noscript   (<---- expecting body here)
    end noscript
  end body
  start body
    start p
    end p
  end body
end html

The <body> and <noscript> blocks end up swapped, and browsers do not handle
the result well. Am I missing something on my side, or is this unintended
behaviour of the HTMLparser?

  The HTML parser seems to consider that only head and body are allowed
as children of html, so it opens a body when it sees the noscript.
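
  To illustrate the rule described above, here is a minimal sketch in
Python. It is a toy model built on the standard library's html.parser, not
libxml2's actual code, and it does not try to reproduce the full trace from
the original report. The names ImpliedBodyTracer, ALLOWED_HTML_CHILDREN and
trace are made up for this example. The only behaviour it models is: when an
element other than head or body appears directly under html, a body is
opened first.

```python
from html.parser import HTMLParser

# Assumption for this sketch: only these may be direct children of <html>.
ALLOWED_HTML_CHILDREN = {"head", "body"}


class ImpliedBodyTracer(HTMLParser):
    """Toy tracer: records SAX-like events and auto-opens <body> when needed."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open elements, innermost last
        self.events = []   # recorded "start x" / "end x" strings

    def _open(self, tag):
        self.events.append("start " + tag)
        self.stack.append(tag)

    def _close_through(self, tag):
        # Close open elements until `tag` itself has been closed.
        while self.stack:
            top = self.stack.pop()
            self.events.append("end " + top)
            if top == tag:
                break

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else None
        if parent == "html" and tag not in ALLOWED_HTML_CHILDREN:
            self._open("body")  # implied <body>, as in the rule above
        self._open(tag)

    def handle_endtag(self, tag):
        # End tags with no matching open element are ignored.
        if tag in self.stack:
            self._close_through(tag)


def trace(markup):
    """Return the list of start/end events for the given markup."""
    tracer = ImpliedBodyTracer()
    tracer.feed(markup)
    tracer.close()
    return tracer.events


if __name__ == "__main__":
    sample = ("<html><head><title>TEST</title></head>"
              "<noscript>TEST-NOSCRIPT</noscript></html>")
    for event in trace(sample):
        print(event)
```

In this model the event list for the sample contains "start body" immediately
before "start noscript", which matches the ordering seen at the top of the
reported trace. Everything after that point in the real trace depends on
libxml2 behaviour that this sketch does not model.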

Any hints would be helpful, thanks.

  This is the first time I have heard of noscript...

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


