[xml] Change in HTML "embed" handling breaks parser in 2.6.29+



Hi,

I noticed a problem with the way libxml2 2.6.29+ now handles the HTML "embed"
tag. It serialises the element without its closing tag, so re-parsing the
output yields a different tree: the information about where the element ends
is lost. Here's an example:

$ cat embed.html
<html><body>
<embed src="http://www.youtube.com/v/183tVH1CZpA"
type="application/x-shockwave-flash"></embed>
<embed src="http://anothersite.com/v/another"></embed>
<script src="http://www.youtube.com/example.js"></script>
<script src="/something-else.js"></script>
</body></html>

$ xmllint --html embed.html > embed2.html

$ cat embed2.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<embed src="http://www.youtube.com/v/183tVH1CZpA"
type="application/x-shockwave-flash"><embed
src="http://anothersite.com/v/another"><script
src="http://www.youtube.com/example.js"></script><script
src="/something-else.js"></script>
</body></html>

$ xmllint --html embed2.html > embed3.html

$ cat embed3.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<embed src="http://www.youtube.com/v/183tVH1CZpA"
type="application/x-shockwave-flash"><embed
src="http://anothersite.com/v/another"><script
src="http://www.youtube.com/example.js"></script><script
src="/something-else.js"></script></embed></embed>
</body></html>

Note that the "script" elements have moved inside the "embed" elements (and
the second "embed" is now nested inside the first), although originally they
were all siblings.

I think the place to fix this is the serialiser rather than the parser. It
should always emit a closing tag for "embed".
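A toy model may help show why the serialiser is the right place for the fix. The Python sketch below is not libxml2 code. It assumes a parser that, like the xmllint runs above, treats "embed" as a container that stays open until its own end tag (or an enclosing end tag) is seen, and a serialiser that can be told to skip end tags for some elements. The names `PARSER_VOID`, `parse` and `serialize`, and the contents of the void-element set, are hypothetical choices for this illustration; text content is ignored because only the tree shape matters here.

```python
from html.parser import HTMLParser

# Hypothetical toy model, not libxml2 code. Tags in PARSER_VOID never take
# content; every other tag, including "embed", stays open until its end tag
# (or an enclosing end tag) is seen, like the reparse in embed3.html.
PARSER_VOID = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    def __init__(self, tag, attrs=()):
        self.tag = tag
        self.attrs = list(attrs)
        self.children = []

    def shape(self):
        """Nested (tag, [children]) tuples describing the tree structure."""
        return (self.tag, [child.shape() for child in self.children])


class TreeBuilder(HTMLParser):
    """Builds a Node tree from start/end tag events; text is ignored."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in PARSER_VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the innermost open element with this tag and everything
        # still open inside it; an end tag with no open match is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def parse(text):
    builder = TreeBuilder()
    builder.feed(text)
    builder.close()
    return builder.root


def serialize(node, omit_end_tags):
    """Serialize node's children; tags in omit_end_tags get no end tag."""
    out = []
    for child in node.children:
        attrs = "".join(
            f' {name}="{value}"' if value is not None else f" {name}"
            for name, value in child.attrs
        )
        out.append(f"<{child.tag}{attrs}>")
        out.append(serialize(child, omit_end_tags))
        if child.tag not in PARSER_VOID and child.tag not in omit_end_tags:
            out.append(f"</{child.tag}>")
    return "".join(out)


SOURCE = (
    "<html><body>"
    '<embed src="http://www.youtube.com/v/183tVH1CZpA"'
    ' type="application/x-shockwave-flash"></embed>'
    '<embed src="http://anothersite.com/v/another"></embed>'
    '<script src="http://www.youtube.com/example.js"></script>'
    '<script src="/something-else.js"></script>'
    "</body></html>"
)

original = parse(SOURCE)
# Serialiser that drops the "embed" end tag (the reported behaviour).
buggy = parse(serialize(original, omit_end_tags={"embed"}))
# Serialiser that always emits the end tag (the proposed fix).
fixed = parse(serialize(original, omit_end_tags=set()))

print("buggy round trip keeps structure:", buggy.shape() == original.shape())
print("fixed round trip keeps structure:", fixed.shape() == original.shape())
```

In this model, round-tripping with `omit_end_tags={"embed"}` nests both "script" elements inside the second "embed", which is itself inside the first, mirroring embed3.html; with the end tags emitted, the reparsed tree has the same shape as the original, so nothing needs to change on the parser side.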

Stefan



