Re: [xml] Fwd: HTML5 test cases



On Wed, Nov 03, 2010 at 10:56:17AM -0400, Sam Ruby wrote:
> Retrying...
>
> -------- Original Message --------
> Subject: HTML5 test cases
> Date: Thu, 21 Oct 2010 13:30:05 -0400
> From: Sam Ruby <rubys intertwingly net>
> To: xml gnome org
>
> I've taken a quick look at comparing the output of htmlParseDocument
> (via nokogiri[1]) against the HTML5 test cases, and noted quite a few
> differences:
>
> http://intertwingly.net/stories/2010/10/21/libxml2-html5-test.out
> http://intertwingly.net/stories/2010/10/21/libxml2-html5-tree-test.out
>
> Further background on my weblog[2].

  Ah, the W3C tech plenary. I'm not far away, just a one-hour drive,
but since XML Core didn't meet there this year I didn't plan to come.

> Any thoughts on the best path towards making a HTML5 compliant parser
> available?

  Well, if there are now well-defined semantics for what an HTML parser
should do in corner cases, I have no problem with getting patches in!
  The current HTML parser was basically implemented from the HTML4 spec,
but without the craziness of trying to mimic what browsers do with
that input. The main usage is screen-scraping or conversion to XML
(at least for me), and mimicking browsers didn't look worth the effort.
  Now if there are decent semantics for what a parser should do with
HTML5 and HTML5-like (that's the problem) kinds of input, then nice.
I'm sure once it gets REC status people will be enthusiastic to
develop small parsers, and maybe libxml2 can be one of them.
  I really welcome HTML5 parser patches. One could probably add
a new parsing option to the existing parser to select between the old
and new behaviour (or switch automatically, but we all know that's
error-prone :-). But I have no time to develop this myself; libvirt is
what I'm working on ATM.
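
  To illustrate the "new parsing option" idea, here is a minimal sketch
of an explicit opt-in flag, under stated assumptions: every name below
(sketch_html_option, SKETCH_HTML_HTML5, sketch_parse, ...) is
hypothetical and not part of libxml2, and the two parse functions are
stand-ins that only report which rules they would apply.

```c
/* Hypothetical option bits for illustration only; they are NOT part of
 * libxml2's htmlParserOption enum. */
enum sketch_html_option {
    SKETCH_HTML_LEGACY = 0,      /* current HTML4-based behaviour (default) */
    SKETCH_HTML_HTML5  = 1 << 0  /* opt in to HTML5 parsing rules */
};

/* Stand-ins for the two code paths; each just names the rules it would use. */
static const char *sketch_parse_legacy(const char *input)
{
    (void)input;
    return "html4";
}

static const char *sketch_parse_html5(const char *input)
{
    (void)input;
    return "html5";
}

/* Dispatch on the explicit option only. The input is never inspected to
 * guess the mode, which avoids the error-prone automatic switch. */
static const char *sketch_parse(const char *input, int options)
{
    if (options & SKETCH_HTML_HTML5)
        return sketch_parse_html5(input);
    return sketch_parse_legacy(input);
}
```

  With a default of 0, existing callers would keep the old behaviour,
and only callers that pass the new bit would get the HTML5 rules.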

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/


