On Tue, Apr 22, 2008 at 12:18:20PM -0400, Daniel Veillard wrote:
> On Tue, Apr 22, 2008 at 03:56:33PM +0200, Arnold Hendriks wrote:
> > Daniel Veillard wrote:
> > > I think the embedding error condition should be noted somewhere in the
> > > parser state and disable at least partially the closing tag processing,
> > > so that the 'end text' paragraph shows up as a sibling of the
> > > 'embbeded text' paragraph.
> >
> > It probably should generate an error, yes. My patch simply ignores the
> > situation.
>
> but break the normal cases, which is not acceptable, nice try ;-)

Proper patch attached. It reuses ctxt->depth, which is not used in the HTML
parser yet, to count the number of times an opening tag has been ignored, and
then uses that count to drop the corresponding closing tags. Of course extra
or missing ending tags are still possible, but at this point one can only do
heuristics.

Works properly for me, will commit soonish unless I hear a good reason
against it in the meantime:

  wei:~/XML -> ./xmllint --html autoskip.html
  autoskip.html:3: HTML parser error : htmlParseStartTag: misplaced <html> tag
  <html xml:lang="en" xmlns="foobar">
  ^
  autoskip.html:4: HTML parser error : htmlParseStartTag: misplaced <body> tag
  <body>
  ^
  <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
  <html><body>
  <p>some text </p>
  <p>embbeded text</p>
  <p>end text </p>
  </body></html>
  wei:~/XML ->

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/

Attachment:
autoskip.patch
Description: Text document
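
For readers without the attachment, the counting heuristic described in the message can be sketched roughly as follows. This is not the contents of autoskip.patch: ToyCtxt, toy_start_tag and toy_end_tag are hypothetical names invented for illustration, and only the field name depth echoes the ctxt->depth mentioned above. The sketch assumes that only <html> and <body> are treated as "misplaced" when they appear a second time.

```c
#include <string.h>

/* Hypothetical toy context, NOT libxml2's parser context. */
typedef struct {
    int seen_html;  /* an <html> start tag has already been accepted */
    int seen_body;  /* a <body> start tag has already been accepted */
    int depth;      /* ignored start tags whose end tags are still pending */
} ToyCtxt;

/* Returns 1 if the start tag should be processed, 0 if it is ignored
 * as misplaced. Each ignored tag bumps the counter. */
int toy_start_tag(ToyCtxt *ctxt, const char *name)
{
    int *seen = NULL;

    if (strcmp(name, "html") == 0)
        seen = &ctxt->seen_html;
    else if (strcmp(name, "body") == 0)
        seen = &ctxt->seen_body;

    if (seen == NULL)
        return 1;           /* ordinary tag, always processed here */
    if (*seen) {
        ctxt->depth++;      /* remember that one opening tag was skipped */
        return 0;
    }
    *seen = 1;
    return 1;
}

/* Returns 1 if the end tag should be processed, 0 if it is dropped
 * because it pairs with a start tag that was ignored earlier. */
int toy_end_tag(ToyCtxt *ctxt, const char *name)
{
    int structural = strcmp(name, "html") == 0 || strcmp(name, "body") == 0;

    if (structural && ctxt->depth > 0) {
        ctxt->depth--;      /* consume one pending ignored tag */
        return 0;
    }
    return 1;
}
```

With the embedded document from the example, the second <html> and <body> would be ignored and counted, the first </body> and </html> would be dropped, and the final </body></html> would close the real elements, so 'end text' stays a sibling of the other paragraphs. As the message says, this remains a heuristic: a document with extra or missing end tags can still confuse it.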