Re: [xml] Cleaning the Web - Implementing HTML 5 parsing in libxml2
- From: Daniel Veillard <veillard redhat com>
- To: Karl Dubost <karl w3 org>
- Cc: xml gnome org, "Michael \(tm\) Smith" <mike w3 org>, Nick Kew <nick webthing com>
- Subject: Re: [xml] Cleaning the Web - Implementing HTML 5 parsing in libxml2
- Date: Tue, 26 Aug 2008 09:50:03 +0200
On Tue, Aug 26, 2008 at 09:36:37AM +0900, Karl Dubost wrote:
On 20 August 2008 at 23:34, Andi Sidwell wrote:
FWIW, I've spent the summer working on a C HTML5 parser called
Hubbub[1], which is approaching stability. It's about half as fast as
libxml2 at parsing the HTML 5 spec with an O(1) treebuilder, and it's
fairly easy to bind to the libxml2 interfaces (it is being used in lieu
of the libxml2 HTML parser in a small Web browser, NetSurf[2], in the
development branch). Note that it a) is not buildable as a shared
library and b) has not had a formal release, but if someone wants an
HTML5 parser in C, then it's probably not a bad bet.
Excellent news. The HTML 5 spec allows more than the usual event-based
parsing, because it retrospectively modifies the tree (à la tidy). I
wonder how much modification that would require in libxml2, and whether
it is indeed a better strategy to provide an interface than to include
the code directly in the library.
Well, the big, big difference is deployment and maintenance!
Daniel
--
Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
daniel veillard com | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/