Re: [xml] Cleaning the Web - Implementing HTML 5 parsing in libxml2

From: Stefan Behnel <stefan_ml behnel de>
To: xml gnome org
Subject: Re: [xml] Cleaning the Web - Implementing HTML 5 parsing in libxml2
Date: Fri, 8 Aug 2008 08:25:57 +0000 (UTC)

Karl Dubost <karl <at> w3.org> writes:

> I have written a short document to explain the project [Cleaning the
> Web][1].
> It describes what is html5 and what would be the benefits of
> implementing the html 5 parsing algorithm in libxml2 html parser.


There's already an HTML5 implementation in Python (html5lib) which you can use 
together with lxml (so you can benefit from both HTML5 *and* libxml2 already). 
IIRC, there was also a push towards a C implementation, but I'm not sure that 
really led anywhere. What's in SVN doesn't look very complete:

http://html5lib.googlecode.com/svn/trunk/c/chtml5lib/
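
For reference, a minimal sketch of the html5lib + lxml combination mentioned 
above. It assumes both packages are installed and that the html5lib release in 
use provides html5lib.parse() with the treebuilder="lxml" option (older 
releases spelled this differently). The helper name parse_html5 and the sample 
markup are made up for illustration:

```python
import html5lib          # third-party: HTML5 parsing algorithm in Python
from lxml import etree   # third-party: Python binding for libxml2


def parse_html5(markup):
    """Parse markup with html5lib and return an lxml root element.

    Hypothetical helper for illustration only.
    """
    doc = html5lib.parse(
        markup,
        treebuilder="lxml",           # build an lxml tree instead of the default
        namespaceHTMLElements=False,  # plain tag names such as "p", not "{ns}p"
    )
    # Depending on the html5lib version, this is an ElementTree or an element.
    return doc.getroot() if hasattr(doc, "getroot") else doc


# Tag soup: no <html>, <head> or <body>, and unclosed <p> elements.
root = parse_html5("<title>Demo</title><p>Hello<p>World")

# The HTML5 algorithm supplies the missing structure and closes the first <p>.
paragraphs = [p.text for p in root.iter("p")]
print(paragraphs)

# From here on, the result is an ordinary lxml tree (XPath, serialisation, ...).
print(etree.tostring(root, encoding="unicode"))
```

The tree comes out of the HTML5 parsing algorithm, but everything after that 
runs on libxml2 through lxml.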

IMHO, it's better to stick with higher level implementations during the 
specification phase, and to push the work on an optimised, low-level C 
implementation back until the target is a bit more focussed. But then, maybe 
that's just me...

I didn't read your proposal, so I'll just assume you meant to extend the 
existing HTML parser instead of writing a new one. That would sound more 
promising than a start from scratch.

Stefan

Follow-Ups:
- Re: [xml] Cleaning the Web - Implementing HTML 5 parsing in libxml2
  - From: Karl Dubost

References:
- [xml] Cleaning the Web - Implementing HTML 5 parsing in libxml2
  - From: Karl Dubost
