Re: [xml] Fwd: HTML5 test cases

From: Sam Ruby <rubys intertwingly net>
To: veillard redhat com
Cc: xml gnome org
Subject: Re: [xml] Fwd: HTML5 test cases
Date: Wed, 03 Nov 2010 16:02:57 -0400

On 11/03/2010 02:50 PM, Daniel Veillard wrote:

  Well if there is now a good semantic about what an HTML parser should
do in corner cases, I have no problem with getting patches in !
  The current HTML parser was basically implemented using the HTML4 spec
but without the craziness of trying to mimic what browsers do with
that input. The main usage is screen-scraping or conversion to XML
(at least for me) and that wasn't looking worth the effort.
   Now if there is a decent semantic about what a parser should do with
HTML5 and HTML5-like (that's the problem) kind of input, then nice,
I'm sure once it gets REC status then people will be enthusiastic to
develop small parsers and maybe libxml2 can be one of them.
   Me I'm really welcoming HTML5 parser patches, one can probably make
a new parsing option for the existing parser to allow old and new
behaviour (or switch automatically but we all know it's error prone :-)
But I have no time for developing this myself, libvirt is what I'm
working on ATM,

This does not need to wait until REC status; the parsing algorithm is fairly stable.

Some background: Henri wrote a fully compliant HTML parser in Java, and has been keeping it in sync with the specification (at times even writing bug reports against the HTML5 spec as required):

http://about.validator.nu/htmlparser/

He then wrote a translator which mechanically converts his usage of Java into a C++ program with dependencies on some Mozilla libraries:

http://groups.google.com/group/mozilla.dev.platform/msg/35ace94ab1ae1511?pli=1
http://mxr.mozilla.org/mozilla-central/source/parser/

The result is not only compliant with the HTML5 specification, it is the actual parser which will ship with Firefox 4:

http://hg.mozilla.org/mozilla-central/rev/129e19d979f0

Oversimplifying, but if this same code could target the underlying string and DOM handling routines, the result of a parse would be immediately useful to applications which build on top of libxml2.

Daniel

- Sam Ruby

Follow-Ups:
- Re: [xml] Fwd: HTML5 test cases
  - From: Daniel Veillard

References:
- [xml] Fwd: HTML5 test cases
  - From: Sam Ruby
- Re: [xml] Fwd: HTML5 test cases
  - From: Daniel Veillard
