[xml] HTMLparser: body/noscript mismatch?




Hi there!

I'm stuck with a <body>/<noscript> mismatch in the HTMLparser and I'm not
sure whether I'm misreading the results. It would be great if somebody could
clarify. It seems that the body and noscript blocks are reported in the wrong
order.

I put the following HTML code into the libxml2-2.6.16 HTMLparser:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>TEST</title>
</head>
<noscript>
<body text="#000000">
TEST-NOSCRIPT
</body>
</noscript>
<body>
anything else
</body>
</html>

Using the SAX interface for the HTMLparser I get the following calls:

start html
  start head
    start meta
    end meta
    start title
    end title
  end head
  start body         (<---- expecting noscript here)
    start noscript   (<---- expecting body here)
    end noscript
  end body
  start body
    start p
    end p
  end body
end html

The <body> and <noscript> blocks end up swapped, and browsers do not handle
the result well. Am I missing something on my side, or is this unintended
behaviour of the HTMLparser?

Any hints would be helpful, thanks.

Cyrill



