Re: [xml] Cleaning the Web - Implementing HTML 5 parsing in libxml2

On Tue, Aug 26, 2008 at 09:36:37AM +0900, Karl Dubost wrote:

On 20 August 2008 at 23:34, Andi Sidwell wrote:
FWIW, I've spent the summer working on a C HTML5 parser which is
approaching stability, called Hubbub[1].  It's about half as fast as
libxml2 at parsing the HTML 5 spec with an O(1) treebuilder, and it's
fairly easy to bind to the libxml2 interfaces (and is being used in place
of the libxml2 HTML parser in a small Web browser, NetSurf[2], in the
development branch).  Note it's a) not buildable as a shared library and
b) has not had a formal release, but if someone wants an HTML5 parser in C,
it's probably not a bad bet.

Excellent news. The HTML 5 spec allows more than the usual event-based
parsing: it can retroactively modify the tree (à la tidy). I wonder how
much modification that would require in libxml2, and whether providing an
interface is indeed a better strategy than directly including the code in
the library.

  Well, the big big difference is deployment, and maintenance!


Daniel Veillard      | libxml Gnome XML XSLT toolkit
daniel veillard com  | Rpmfind RPM search engine | virtualization library
