On Fri, Jan 29, 2010 at 12:16:40AM -0800, Aaron Patterson wrote:
Hi, I'm trying to use xmlParseInNodeContext along with an HTML document. It seems to be exhibiting strange behavior, but I'm not sure.

It seems that when I'm dealing with HTML documents, no matter what context node I give xmlParseInNodeContext, I always end up with an entire HTML document returned rather than just the few nodes I was hoping for.

I've written a sample program to demonstrate the problem:

http://gist.github.com/289553

Any help would be greatly appreciated!
Hum, xmlParseInNodeContext notices that the enclosing document is an HTML document, so it invokes the HTML parser for that fragment, and the HTML parser, finding a "<p>hello world!</p>" document, automatically augments it with defaulted <html> and <body> elements.

This defaulting would have to be turned off in the HTML parser for this to work, but there is no such HTML parser option. There is an htmlOmittedDefaultValue global variable that you could use, but really we should not rely on global variables for processing options anymore; best is to add an HTML_PARSE_NOIMPLIED option.

The enclosed patch seems to fix it for me (note: the new HTML parser option corresponds to an XML parser option that makes no sense in HTML parsing, so we can reuse its value, 1 << 13).

Daniel

--
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/
Attachment:
html_in_context_no_implied.patch
Description: Text document