[xml] libxml-HTMLparser.html

From: Sergio Monteiro Basto <sergio sergiomb no-ip org>
To: xml gnome org
Subject: [xml] libxml-HTMLparser.html
Date: Fri, 16 Apr 2010 03:40:49 +0100

Hi,
As written at http://xmlsoft.org/html/libxml-HTMLparser.html, the HTML parser
should be able to parse "real world" HTML, even if severely broken from a specification point of view.
The example is based on http://www.voicenews.ca/

Using xmllint --html file 

with the input:
<table>  <tr><td><font size=1><a 
class=menu  href="1125.pdf">  1125.<tr><td><font size=1><a class=menu
href="1124.pdf">  1124.</table>

the output is:
<table><tr><td><font size="1"><a class="menu" href="1125.pdf">
1125.<tr><td><font size="1"><a class="menu" href="1124.pdf">  1124.
</a></font></td></tr></a></font></td></tr></table>

"<tr><td><font> <tr><td><font> <tr><td><font>
</font></td></tr> </font></td></tr> </font></td></tr>" is the wrong fix for
this HTML input: each new row ends up nested inside the previous one.

The correct fix is clearly
"<tr><td><font> </font></td></tr> <tr><td><font> </font></td></tr>
<tr><td><font> </font></td></tr>",
which is how the HTML is rendered by any browser.
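
The sibling-row repair described above can be sketched in a few lines of
Python using only the standard library's html.parser. This is a simplified
illustration, not libxml2's code and not any browser's actual algorithm. The
names RowRepairer, repair, CLOSES and VOID are hypothetical, and the rule
table is an assumption covering only <tr> and <td>: a new <tr> closes the
nearest open <tr> inside the current <table>, and a new <td> closes the
nearest open <td> inside the current <tr>.

```python
from html import escape
from html.parser import HTMLParser

# Assumed, simplified rules (not libxml2's or any browser's real algorithm):
# a start tag (key) closes the nearest open element named `target`,
# searching no further up the stack than the `boundary` element.
CLOSES = {"tr": ("tr", "table"), "td": ("td", "tr")}
VOID = {"br", "hr", "img", "input", "meta", "link"}  # never pushed on the stack


class RowRepairer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []  # names of currently open elements
        self.out = []    # serialized output fragments

    def _close_through(self, index):
        # Emit end tags from the top of the stack down to position `index`.
        while len(self.stack) > index:
            self.out.append(f"</{self.stack.pop()}>")

    def handle_starttag(self, tag, attrs):
        if tag in CLOSES:
            target, boundary = CLOSES[tag]
            for i in range(len(self.stack) - 1, -1, -1):
                if self.stack[i] == boundary:
                    break  # do not look past the enclosing table/row
                if self.stack[i] == target:
                    self._close_through(i)  # close the previous row/cell
                    break
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
        )
        self.out.append(f"<{tag}{rendered}>")
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Also close everything opened after the matching start tag.
            index = len(self.stack) - 1 - self.stack[::-1].index(tag)
            self._close_through(index)
        # End tags with no matching open element are dropped.

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def repair(fragment):
    parser = RowRepairer()
    parser.feed(fragment)
    parser.close()            # flush any buffered text
    parser._close_through(0)  # close whatever is still open
    return "".join(parser.out)


print(repair("<table><tr><td>a<tr><td>b</table>"))
# prints: <table><tr><td>a</td></tr><tr><td>b</td></tr></table>
```

Fed the input from this message, the sketch closes the first row's <a>, <font>,
<td> and <tr> before opening the second <tr>, so the rows come out as siblings
rather than nested.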

If you feel that this is a 
bug in libxml2, please file a bug report there or report it on their 
mailing list.


I followed your tip and tested with xmllint --html file.


Thanks,
-- 
Sérgio M. B.

