Re: [xml] xmlReader and HTML



On Fri, Jun 10, 2011 at 02:26:56PM +0200, Joachim Zobel wrote:
> Hi.
>
> It looks like the xmlReader parser is able to parse HTML. At least it
> accepts a doctype at document start. It does, however, behave differently
> from the SAX/DOM HTML parser. For example, it wants closing tags for META
> and LI.
>
> To what extent does xmlReader support HTML? I think a lot of things
> would be easier for me if I could move from SAX to xmlReader, but I
> need to be able to parse HTML.

  It doesn't. Right now the reader always operates on top of an
XML parser, not an HTML one, hence your result.
  Short of modifying it to allow HTML parsing (probably around
xmlTextReaderSetup()), the only way would be to parse the HTML documents
into an htmlDocPtr and then pass that htmlDocPtr as the input to
xmlReaderWalker(), i.e. provide the iteration over a fully built
document. That's the only solution I can think of without extending
the current code.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/


