Re: [xml] Fwd: HTML5 test cases



On Wed, Nov 03, 2010 at 04:02:57PM -0400, Sam Ruby wrote:
On 11/03/2010 02:50 PM, Daniel Veillard wrote:

This does not need to wait until REC status, the parsing algorithm
is fairly stable.

  okay

Some background: Henri wrote a fully compliant HTML parser in Java,
and has been keeping it in sync with the specification (at times
even writing bug reports against the HTML5 spec as required):

http://about.validator.nu/htmlparser/

  okay

He then wrote a translator which mechanically converts his usage of
Java into a C++ program with dependencies on some Mozilla libraries:

http://groups.google.com/group/mozilla.dev.platform/msg/35ace94ab1ae1511?pli=1
http://mxr.mozilla.org/mozilla-central/source/parser/

  fine for Mozilla, maybe the Java code is easier to maintain

The result is not only compliant with the HTML5 specification, it is
the actual parser which will ship with Firefox 4:

http://hg.mozilla.org/mozilla-central/rev/129e19d979f0

Oversimplifying, but if this same code could target the underlying
string and DOM handling routines, the result of a parse would be
immediately useful to applications which build on top of libxml2.

  Well, I see two major issues with that, even without getting into the
details:
  - that's generated code, which means it cannot be modified/patched
    within the libxml2 project. That is untenable from a maintenance
    POV if it were to be embedded in libxml2
  - the internal string format of Mozilla is UTF-16, and libxml2
    operates on UTF-8; that was already one of the major problems
    we faced when we looked at using libxslt for Mozilla
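
  To make the second point concrete, here is a minimal sketch of the
kind of conversion that would be needed at every boundary between a
UTF-16 parser and UTF-8 consumers. The function name utf16_to_utf8 and
its signature are hypothetical; this is neither libxml2 nor Mozilla
code, and it assumes the input is in host byte order with the caller
supplying the output buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a libxml2 or Mozilla API).
 * Converts in_len host-order UTF-16 code units to UTF-8.
 * Returns the number of bytes written, or -1 on an unpaired
 * surrogate or when the output buffer is too small.
 */
long utf16_to_utf8(const uint16_t *in, size_t in_len,
                   unsigned char *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);

    while (i < in_len) {
        uint32_t cp = in[i++];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow. */
            if (i >= in_len || in[i] < 0xDC00 || in[i] > 0xDFFF)
                return -1;
            cp = 0x10000 + ((cp - 0xD800) << 10)
                         + (uint32_t)(in[i++] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            /* Low surrogate with no preceding high surrogate. */
            return -1;
        }

        if (cp < 0x80) {
            if (o + 1 > out_cap) return -1;
            out[o++] = (unsigned char)cp;
        } else if (cp < 0x800) {
            if (o + 2 > out_cap) return -1;
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            if (o + 3 > out_cap) return -1;
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            if (o + 4 > out_cap) return -1;
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }
    return (long)o;
}
```

  Every tag name, attribute value and text node coming out of a UTF-16
parser would have to go through something like this (with an
allocation and a copy) before libxml2 could store it, which is where
the cost and the complexity come from.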

  That doesn't sound too easy.

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/


