Re: [xml] Fwd: HTML5 test cases

On Wed, Nov 03, 2010 at 04:02:57PM -0400, Sam Ruby wrote:
On 11/03/2010 02:50 PM, Daniel Veillard wrote:

This does not need to wait until REC status, the parsing algorithm
is fairly stable.


Some background: Henri wrote a fully compliant HTML parser in Java,
and has been keeping it in sync with the specification (at times
even writing bug reports against the HTML5 spec as required):


He then wrote a translator which mechanically converts his usage of
Java into a C++ program with dependencies on some Mozilla libraries:

  Fine for Mozilla; maybe the Java code is easier to maintain.

The result is not only compliant with the HTML5 specification, it is
the actual parser which will ship with Firefox 4:

Oversimplifying, but if this same code could target the underlying
string and DOM handling routines, the result of a parse would be
immediately useful to applications which build on top of libxml2.

  Well I see 2 major issues with that even without getting into the
  - that's generated code, which means it cannot be modified/patched
    within the libxml2 project. That's untenable from a maintenance
    POV if it were to be embedded in libxml2
  - the internal string format of Mozilla is UTF-16, and libxml2
    operates on UTF-8; that was already one of the major problems
    we faced when we looked at using libxslt for Mozilla
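
  To make the second point concrete, here is a minimal sketch of the
  conversion that would be needed wherever UTF-16 strings cross into
  UTF-8 based code. This is an illustration only, not code from
  libxml2 or Mozilla: the function name utf16_to_utf8 and its
  signature are hypothetical, and it assumes the input is in native
  byte order.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a libxml2 or Mozilla API): convert inlen
 * UTF-16 code units in native byte order to NUL-terminated UTF-8.
 * Returns the number of bytes written, excluding the NUL, or -1 if
 * the input has an unpaired surrogate or the output buffer is too small.
 */
static long utf16_to_utf8(const uint16_t *in, size_t inlen,
                          unsigned char *out, size_t outlen)
{
    size_t i = 0, o = 0;

    if (outlen == 0)
        return -1;

    while (i < inlen) {
        uint32_t c = in[i++];
        size_t need;

        if (c >= 0xD800 && c <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow. */
            if (i >= inlen || in[i] < 0xDC00 || in[i] > 0xDFFF)
                return -1;
            c = 0x10000 + ((c - 0xD800) << 10) + (uint32_t)(in[i++] - 0xDC00);
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            return -1; /* unpaired low surrogate */
        }

        need = c < 0x80 ? 1 : c < 0x800 ? 2 : c < 0x10000 ? 3 : 4;
        if (o + need >= outlen) /* always keep room for the NUL */
            return -1;

        if (need == 1) {
            out[o++] = (unsigned char)c;
        } else if (need == 2) {
            out[o++] = (unsigned char)(0xC0 | (c >> 6));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else if (need == 3) {
            out[o++] = (unsigned char)(0xE0 | (c >> 12));
            out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            out[o++] = (unsigned char)(0xF0 | (c >> 18));
            out[o++] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }

    out[o] = 0;
    return (long)o;
}
```

  Every string handed from a UTF-16 parser to UTF-8 code would pay
  for a pass like this, plus the extra buffer, which is part of why
  the mismatch is more than a cosmetic problem.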

  That doesn't sound too easy,


