Re: [xml] Strange behavior with xmlParseInNodeContext and HTML documents

From: Daniel Veillard <veillard redhat com>
To: Aaron Patterson <aaron patterson gmail com>
Cc: xml gnome org
Subject: Re: [xml] Strange behavior with xmlParseInNodeContext and HTML documents
Date: Fri, 29 Jan 2010 12:10:58 +0100

On Fri, Jan 29, 2010 at 12:16:40AM -0800, Aaron Patterson wrote:

> Hi, I'm trying to use xmlParseInNodeContext along with an HTML
> document.  It seems to be exhibiting strange behavior, but I'm not
> sure.
>
> It seems that when I'm dealing with HTML documents, no matter what
> context node I give xmlParseInNodeContext, I always end up with an
> entire HTML document returned rather than just the few nodes I was
> hoping for.
>
> I've written a sample program to demonstrate the problem:
>
> http://gist.github.com/289553
>
> Any help would be greatly appreciated!


  Hum, xmlParseInNodeContext notices that the enclosing document is
an HTML document, so it invokes the HTML parser for that fragment, and
the HTML parser, finding a "<p>hello world!</p>" document, automatically
augments it with implied <html> and <body> elements. This defaulting
should be turned off in the HTML parser for this to work, but there is
no such HTML parser option. There is an htmlOmittedDefaultValue global
variable that you could use, but we really should not rely on global
variables for processing options anymore; the best fix is to add an
HTML_PARSE_NOIMPLIED parser option.

  The enclosed patch seems to fix it for me.
(Note: the new HTML parser option takes the value of an XML parser option
 that makes no sense for HTML parsing, so that bit, 1 << 13, can be reused.)

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

Attachment: html_in_context_no_implied.patch
Description: Text document
