Re: [xml] Bug in HTML parser output?



On Fri, Apr 26, 2002 at 11:16:51AM +0100, Matt Sergeant wrote:
> In using libxml2's HTML parser to create valid XML, I noticed a "bug"...
>
> xmllint --html --format http://www.messagelabs.com/VirusEye/ | xmllint -
>
> Croaks on the bad ---> comment in the HTML.
>
> Is there any way to make this just "work"?

  Hmm, right, this looks like a loophole: the HTML parser is deliberately
lenient so that it can parse what is found on the net, but it takes no
corrective measures to clean up things like malformed HTML comments.
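
To illustrate why the second xmllint pass fails: XML does not allow "--" inside a comment, so a comment closed with "--->" is not well-formed. The following is a minimal sketch using Python's standard XML parser rather than libxml2; the helper name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A comment closed with the usual "-->" is accepted.
good = "<html><!-- a comment --><body/></html>"
# A comment closed with "--->" contains "--" before the terminator, which XML forbids.
bad = "<html><!-- a comment ---><body/></html>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

A lenient HTML parser can accept the second form, which is how such a comment can reach the serializer in the first place.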

> (yeah I know I should get them to fix their nasty HTML too)

  I wonder what the best approach is:
    - fix the HTML importer
    - fix the XML serializer

The second option sounds more generic, so I would be tempted to go that
way. How urgent is this?
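
As a rough idea of what a serializer-side fix could look like, here is a sketch in Python, not libxml2 code. The function sanitize_comment is hypothetical, and separating hyphens with spaces is only one possible policy; the point is just that comment text gets rewritten before output so that it never contains "--" and never ends in "-".

```python
import re
import xml.etree.ElementTree as ET


def sanitize_comment(text: str) -> str:
    """Rewrite comment text so it is legal inside an XML comment.

    Hypothetical helper: the space-insertion policy is an assumption,
    not what libxml2 does.
    """
    # Put a space after every hyphen that is directly followed by another
    # hyphen, so no "--" sequence survives.
    text = re.sub(r"-(?=-)", "- ", text)
    # A trailing "-" would merge with the closing "-->" and give "--->".
    if text.endswith("-"):
        text += " "
    return text


def serialize_comment(text: str) -> str:
    """Emit a comment node with sanitized content."""
    return "<!--" + sanitize_comment(text) + "-->"


# Content of a comment that was closed with "--->" in the HTML source.
fixed = serialize_comment(" bad -")
print(fixed)

# The result can now be embedded in a document and parsed as XML.
ET.fromstring("<html>" + fixed + "<body/></html>")
```

Doing this at serialization time would cover any tree that ends up with such a comment, whichever parser or API call created it, which is the sense in which the second option is more generic.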

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


