Re: [xml] Fwd: Support of new HTML5 tags in libxml



On Tue, Apr 02, 2013 at 09:06:32AM +0200, Amandine Piguel wrote:
Hello,

I would like to know if libxml2 is able to parse HTML5 files, and if
not, if it will be supported in the future.

  Bonjour,

Actually, libxml2 is able to parse HTML5, but it does so using a
predefined set of HTML4 markup declarations. As such, it will generate
elements and attributes in the tree for syntax it doesn't know, but
it cannot do any specific handling for them if needed.

In fact, I already tried to load a pure HTML5 document using the
HTML parser that libxml provides. I am getting errors such as: "Tag
section invalid", "Tag header invalid", "Tag article invalid", "Tag
output invalid", ... They seem to be related to all HTML5-specific
tags, the ones that did not exist in HTML4 and appeared in
HTML5.

You should still get a resulting tree; those are more like warnings
than fatal errors. But it is true that libxml2 should be extended to
at least not complain about the new syntactic constructs of HTML5.

Do you intend to provide support for these tags in the HTML parser?

I'm not sure I will have time in the near future to make those
additions, but I definitely take patches! In the meantime you can
catch those specific errors and discard them.
Since HTML 5 is now in Candidate REC at W3C, I hope someone will have
the time to help fix this in the next months.

Daniel

-- 
Daniel Veillard      | Open Source and Standards, Red Hat
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/

