Fwd: Re: [xml] [HTML parser] bug in multiple attributes parsing?



--- Begin Message ---
Emmanuel Saracco writes:
> Hi,
>
> It seems there is a problem in the HTML parser regarding the parsing of the
> IMG tag. The IMG tag allows multiple RECTANGLE attributes, but libxml2 seems
> to extract only one.
>
> HTML sample:
> -----
> <IMG RECTANGLE="(4,40) (68,55) mailto:test free fr" RECTANGLE="(4,19)
> (101,35) /dir/test.html" RECTANGLE="(3,2) (82,15) http://www.perl.com"
> SRC="images/image.png" BORDER="0" USEMAP="#testmap" WIDTH="139"
> HEIGHT="112">
> -----
>
> Am I wrong?

That depends on what you call HTML.

If you look at the HTML 4.01 specification, you won't find a RECTANGLE
attribute on IMG at all.
And since HTML is based on SGML, no element can ever have more than one
instance of the same attribute.
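To make the distinction concrete, here is a minimal sketch in Python using the standard-library html.parser module rather than libxml2. A tolerant tokenizer can report all three RECTANGLE attributes, but once the attributes are stored in a map keyed by attribute name, which is the model SGML implies, only one value per name can survive. Which duplicate a given parser keeps is implementation-specific; a Python dict simply keeps the last one.

```python
from html.parser import HTMLParser

# The tag from the report, joined onto one line.
SAMPLE = (
    '<IMG RECTANGLE="(4,40) (68,55) mailto:test free fr" '
    'RECTANGLE="(4,19) (101,35) /dir/test.html" '
    'RECTANGLE="(3,2) (82,15) http://www.perl.com" '
    'SRC="images/image.png" BORDER="0" USEMAP="#testmap" WIDTH="139" '
    'HEIGHT="112">'
)


class StartTagCollector(HTMLParser):
    """Records the raw (name, value) attribute pairs of every start tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names and passes
        # duplicate attributes through in source order.
        self.tags.append((tag, attrs))


parser = StartTagCollector()
parser.feed(SAMPLE)
parser.close()

tag, attrs = parser.tags[0]
rectangles = [value for name, value in attrs if name == "rectangle"]
print(len(rectangles))  # 3: the tokenizer sees every occurrence

# A per-element attribute map keyed by name can hold only one value
# per name; building a dict from the pairs keeps the last RECTANGLE.
as_map = dict(attrs)
print(as_map["rectangle"])  # (3,2) (82,15) http://www.perl.com
```

The same input therefore yields three values at the token level but one at the element level, which is why a parser that builds a document tree cannot be expected to preserve all of them.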

On the other hand, HTML as it is found on the web is often just tag soup,
and I wouldn't be surprised if some browsers recognize such constructs,
or if some browser vendors suggested such syntax in the past.
However, I didn't find a RECTANGLE attribute for any element in the rather
long list of HTML DTDs on my system. The only occurrences of "rectangle" are
in ie-2.0.dtd and ie-3.0.dtd, where it is used as an attribute value.

You cannot expect libxml2 to parse that.

Morus


--- End Message ---

