Re: [xml] Cleaning the Web - Implementing HTML 5 parsing in libxml2




On 20 August 2008 at 23:34, Andi Sidwell wrote:
FWIW, I've spent the summer working on a C HTML5 parser which is
approaching stability, called Hubbub[1]. It's about half as fast as
libxml2 at parsing the HTML 5 spec with an O(1) treebuilder, and it's
fairly easy to bind to the libxml2 interfaces (and is being used in lieu
of the libxml2 HTML parser in a small Web browser, NetSurf[2], in the
development branch). Note that it a) is not buildable as a shared
library and b) has not had a formal release, but if someone wants an
HTML5 parser in C, then it's probably not a bad bet.

Excellent news. The HTML 5 spec allows more than the usual event-driven parsing: the parser may retrospectively modify the tree it has already built (à la tidy). I wonder how much modification that would require in libxml2, and whether it is indeed a better strategy to provide an interface than to include the code directly in the library.
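
To make "retrospectively modifying the tree" concrete, here is a minimal sketch in Python of one such case: character data found directly inside a table is "foster parented", i.e. inserted before the table element that is already in the tree. This is a deliberately simplified illustration, not the full HTML 5 algorithm, and the `Node` class, `insert_text` and `serialize` are hypothetical helpers invented for this example; they are not part of Hubbub or libxml2.

```python
class Node:
    """Tiny stand-in for a DOM node (hypothetical, for illustration only)."""

    def __init__(self, name, text=None):
        self.name = name
        self.text = text
        self.parent = None
        self.children = []

    def append(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def insert_before(self, child, ref):
        child.parent = self
        self.children.insert(self.children.index(ref), child)
        return child


def insert_text(current, text):
    """Insert character data, foster-parenting it when inside a table."""
    node = Node("#text", text)
    if current.name == "table" and current.parent is not None:
        # Retroactive edit: the text lands *before* the table, which was
        # already added to the tree. A pure event stream cannot express this.
        return current.parent.insert_before(node, current)
    return current.append(node)


def serialize(node):
    """Render the tree back to markup so the result is easy to inspect."""
    if node.name == "#text":
        return node.text
    inner = "".join(serialize(child) for child in node.children)
    return f"<{node.name}>{inner}</{node.name}>"


# Simulate the tree builder seeing: <table>stray<tr>
body = Node("body")
table = body.append(Node("table"))
insert_text(table, "stray")   # moved in front of the table
table.append(Node("tr"))
print(serialize(body))
```

The stray text ends up as a sibling preceding the table rather than as its child, which is why a consumer that only receives start/end/characters events in document order would need some extra interface to learn about the move.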


[1] http://www.netsurf-browser.org/projects/hubbub/
[2] http://www.netsurf-browser.org/

--
Karl Dubost - W3C
http://www.w3.org/QA/
Be Strict To Be Cool








