Hi,

As written at http://xmlsoft.org/html/libxml-HTMLparser.html, the HTML parser should be able to parse "real world" HTML, even if it is severely broken from a specification point of view.

The example is based on http://www.voicenews.ca/

Running xmllint --html file with the input:

  <table>
  <tr><td><font size=1><a class=menu href="1125.pdf"> 1125.
  <tr><td><font size=1><a class=menu href="1124.pdf"> 1124.
  </table>

the output is:

  <table><tr><td><font size="1"><a class="menu" href="1125.pdf"> 1125.
  <tr><td><font size="1"><a class="menu" href="1124.pdf"> 1124.
  </a></font></td></tr></a></font></td></tr></table>

In other words, each new row is nested inside the previous one, and all the end tags are added only at the end:

  <tr><td><font>
  <tr><td><font>
  <tr><td><font>
  </font></td></tr>
  </font></td></tr>
  </font></td></tr>

This is the wrong fix for this HTML input. The correct fix is clearly to close each row before the next one starts:

  <tr><td><font> </font></td></tr>
  <tr><td><font> </font></td></tr>
  <tr><td><font> </font></td></tr>

which is how any browser renders this HTML.
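To make the expected repair concrete, here is a minimal sketch in Python, using only the standard library's html.parser. It is not libxml2's algorithm and not what a browser actually implements; it only illustrates the single rule described above: when a new <tr> starts inside a table, close everything still open in the previous row first. The names RowCloser and close_rows are hypothetical, chosen for this illustration.

```python
from html.parser import HTMLParser


class RowCloser(HTMLParser):
    """Re-serialize HTML, closing any open row before a new <tr> starts."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []  # names of the currently open elements
        self.out = []    # output fragments, joined at the end

    def _close_down_to(self, name):
        # Emit end tags until `name` is on top of the stack.
        while self.stack and self.stack[-1] != name:
            self.out.append("</%s>" % self.stack.pop())

    def handle_starttag(self, tag, attrs):
        if tag == "tr" and "table" in self.stack:
            # A new row implicitly ends whatever is still open in the
            # previous row (a, font, td, tr in the example).
            self._close_down_to("table")
        parts = []
        for key, value in attrs:
            # Attributes without a value are written as a bare name.
            parts.append(" %s" % key if value is None
                         else ' %s="%s"' % (key, value))
        self.out.append("<%s%s>" % (tag, "".join(parts)))
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Close any elements left open inside `tag`, then `tag` itself.
            self._close_down_to(tag)
            self.out.append("</%s>" % self.stack.pop())

    def handle_data(self, data):
        self.out.append(data)


def close_rows(html):
    """Return `html` with each table row closed before the next begins."""
    parser = RowCloser()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    broken = ('<table> <tr><td><font size=1><a class=menu href="1125.pdf">'
              ' 1125.<tr><td><font size=1><a class=menu href="1124.pdf">'
              ' 1124.</table>')
    print(close_rows(broken))
```

Run on the input above, this closes </a></font></td></tr> before the second <tr> and again before </table>, which is the row-by-row structure I expect.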
If you feel that this is a bug in libxml2, please file a bug report there or report it on their mailing list.
Following your tip, I tested with xmllint --html file.

Thanks,
--
Sérgio M. B.