Re: [xml] Cleaning the Web - Implementing HTML 5 parsing in libxml2
- From: Stefan Behnel <stefan_ml behnel de>
- To: xml gnome org
- Subject: Re: [xml] Cleaning the Web - Implementing HTML 5 parsing in libxml2
- Date: Fri, 8 Aug 2008 08:25:57 +0000 (UTC)
Karl Dubost <karl <at> w3.org> writes:
> I have written a short document to explain the project [Cleaning the
> Web][1].
> It describes what HTML5 is and what the benefits would be of
> implementing the HTML5 parsing algorithm in the libxml2 HTML parser.
There's already an HTML5 implementation in Python (html5lib) which you can use
together with lxml (so you can benefit from both HTML5 *and* libxml2 already).
IIRC, there was also a push towards a C implementation, but I'm not sure that
really led anywhere. What's in SVN doesn't look very complete:
http://html5lib.googlecode.com/svn/trunk/c/chtml5lib/
IMHO, it's better to stick with higher-level implementations during the
specification phase and to postpone work on an optimised, low-level C
implementation until the target is a bit more focussed. But then, maybe
that's just me...
I didn't read your proposal, so I'll just assume you meant to extend the
existing HTML parser instead of writing a new one. That sounds more
promising than starting from scratch.
Stefan