[xml] libxml-HTMLparser.html

As written at http://xmlsoft.org/html/libxml-HTMLparser.html:
It should be able to parse "real world" HTML, even if severely broken from a specification point of view.
The example is based on http://www.voicenews.ca/

Using xmllint --html file 

with the input:
<table>  <tr><td><font size=1><a 
class=menu  href="1125.pdf">  1125.<tr><td><font size=1><a class=menu
href="1124.pdf">  1124.</table>

the output is:
<table><tr><td><font size="1"><a class="menu" href="1125.pdf">
1125.<tr><td><font size="1"><a class="menu" href="1124.pdf">  1124.

"<tr><td><font> <tr><td><font> <tr><td><font>
</font></td></tr> </font></td></tr> </font></td></tr>" is the wrong fix for
this HTML input.

The correct fix is clearly
"<tr><td><font> </font></td></tr> <tr><td><font> </font></td></tr>
<tr><td><font> </font></td></tr>",
which is how any browser renders this HTML.
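
To make the expected repair concrete, here is a minimal sketch in Python using only the standard html.parser module. It is not libxml2's algorithm and makes no claim about how libxml2 should implement this; it only illustrates the rule I would expect: a new <tr> inside the same table implicitly closes the previous row and everything still open inside it. The names RowRepairer, IMPLIED_CLOSE and repair are hypothetical, and attributes are dropped to keep the output short.

```python
from html.parser import HTMLParser

# Hypothetical rule table: a start tag (key) implicitly closes the nearest
# open element whose name is in the value set. Only rows and cells are covered.
IMPLIED_CLOSE = {
    "tr": {"tr"},        # a new <tr> ends the previous row
    "td": {"td", "th"},  # a new <td> ends the previous cell
}
SCOPE_BOUNDARY = "table"  # never close anything outside the current table


class RowRepairer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []  # names of currently open elements
        self.out = []    # pieces of the repaired markup

    def _find_open(self, targets):
        # Nearest open element named in `targets`, searching innermost first
        # and stopping at the enclosing table.
        for name in reversed(self.stack):
            if name in targets:
                return name
            if name == SCOPE_BOUNDARY:
                return None
        return None

    def _close_through(self, tag):
        # Emit end tags for every open element up to and including `tag`.
        while self.stack:
            top = self.stack.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_starttag(self, tag, attrs):
        target = self._find_open(IMPLIED_CLOSE.get(tag, ()))
        if target is not None:
            self._close_through(target)
        self.stack.append(tag)
        self.out.append(f"<{tag}>")  # attributes omitted for brevity

    def handle_endtag(self, tag):
        if tag in self.stack:
            self._close_through(tag)

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text:
            self.out.append(text)


def repair(html):
    parser = RowRepairer()
    parser.feed(html)
    parser.close()
    while parser.stack:  # close whatever is still open at end of input
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)
```

Calling repair() on the input above gives
<table><tr><td><font><a>1125.</a></font></td></tr><tr><td><font><a>1124.</a></font></td></tr></table>
that is, sibling rows rather than rows nested inside each other.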

> If you feel that this is a bug in libxml2, please file a bug report
> there or report it on their mailing list.

I followed your tip and used xmllint --html file.

Sérgio M. B.

