Re: [xml] Cleaning the Web - Implementing HTML 5 parsing in libxml2

On Fri, Aug 08, 2008 at 11:07:52AM +0900, Karl Dubost wrote:


I have written a short document to explain the project [Cleaning the  
It describes what is html5 and what would be the benefits of  
implementing the html 5 parsing algorithm in libxml2 html parser.

It's a call for developers. Comments are welcome.
If you prefer me to post a full copy of this document as text-only  
here in the list, tell me.

  Well as long as any technical argument is kept on the list that's
  Now my gut reaction is that it's a good idea and goes in the right
direction. It's unclear yet whether this can be built as incremental
changes on the existing HTML parser code or if this would need a new
parsing core, but those are 'technical details' and at this point not
that important.
  My main concern is that HTML5 is a working draft. I can't tell just
from the draft (or rather
if people globally agree on parsing processing or if changes are likely
in the future. I have been bitten hard by being an early implementor
in the past (e.g. XPointer), and if there is controversy about the parsing
rules it's better to wait until this is solved. On the other hand, if
there is parsing agreement (which I hope) in the group and among the various
Web actors, then starting early is not a problem, as long as I get an
identified person ready to follow the work until completion :-)

  I know that some people like Michael Day rely heavily on the HTML
parser behaviour, and I would very much like to hear from them too, as
the change would have more impact on them than on me. I know the WebKit
project uses libxml2 but only for parsing XML, and I wonder if an HTML5
compliant parser in libxml2 might change this or not. Even if this
wasn't the case I would like to see HTML5 support in, but being able
to assess the possible impact like this would be nice.


Red Hat Virtualization group
Daniel Veillard      | virtualization library
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine
