On Tue, 2013-04-02 at 09:06 +0200, Amandine Piguel wrote:
> Hello,
>
> I would like to know if libxml2 is able to parse HTML5 files, and if
> not, if it will be supported in the future.

Note, libxml's HTML parser is really good at making sense of HTML input,
but it is not a formal HTML parser - the tree you get is not guaranteed
to be the same as the one a Web browser would make, and even with HTML 4
there are differences, e.g. in when a "tbody" element is inferred. This
isn't a bad thing - often it's exactly what you want.
I'd guess that patches to provide an option to use the HTML 5 parsing
algorithm would be plausible.
Example: try the following input, and compare the tree libxml2 builds
with the DOM a Web browser builds for it:
<body>
<table><th>a</th><td>b</td>
</body>
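
One way to run that comparison is sketched below. It is only an illustration, and it assumes the third-party lxml package for Python (not mentioned above), which wraps libxml2's HTML parser; the helper name `outline` is made up for this example. It prints the element tree libxml2 builds, one indented line per element.

```python
# Sketch only: assumes the third-party lxml package, which wraps
# libxml2's HTML parser. The helper name `outline` is hypothetical.
from lxml import etree

# The sample input from the example above.
SAMPLE = "<body>\n<table><th>a</th><td>b</td>\n</body>"


def outline(element, depth=0):
    """Return one indented line per element in the subtree."""
    lines = ["  " * depth + element.tag]
    for child in element:
        # Comments and processing instructions have a non-string tag; skip them.
        if isinstance(child.tag, str):
            lines.extend(outline(child, depth + 1))
    return lines


# etree.HTML() parses with libxml2's HTML parser and returns the root element.
root = etree.HTML(SAMPLE)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

For the browser side, load the same input in a page and inspect the table in the developer tools. The HTML 5 parsing algorithm puts the cells inside an inferred "tbody" and "tr", so check whether those elements show up in the output above.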
Again, this isn't saying anything bad about libxml - I'm trying to give
examples so you can understand what it's doing. I don't actually know of
a good HTML 5 parser that can replace libxml2; I don't follow these
things, and in any case I think I'd rather see it folded into libxml2 in
some way.
Liam
--
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml