Re: [xml] Fwd: Support of new HTML5 tags in libxml



I'm also for HTML5 support. My only hope is that you guys can also speed up parsing and reduce memory usage for HTML documents. Don't get me wrong, libxml2 is fast, but making it even faster would be great :)


On Tue, Apr 2, 2013 at 6:51 PM, Liam R E Quin <liam holoweb net> wrote:
On Tue, 2013-04-02 at 09:06 +0200, Amandine Piguel wrote:
> Hello,
>
> I would like to know if libxml2 is able to parse HTML5 files, and if
> not, if it will be supported in the future.

Note, libxml's HTML parser is really good at making sense of HTML input,
but it is not a formal HTML parser - the tree you get is not guaranteed
to be the same as the one a Web browser would make, and even with HTML 4
there are differences, e.g. in when a "tbody" element is inferred. This
isn't a bad thing - often it's exactly what you want.

I'd guess that patches to provide an option to use the HTML 5 parsing
algorithm would be plausible.

Example: try the following input, and compare the resulting tree with
the DOM a Web browser builds...

<body>
    <table><th>a</th><td>b</td>
</body>
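
For reference, the HTML5 tree-construction rules imply a tbody and then a
tr when a th or td start tag appears directly inside a table, so a
browser's DOM for the input above nests the cells under table > tbody > tr.
The Python sketch below is a toy illustration of only that one rule, built
on the standard library's html.parser. The names TableInferenceSketch and
outline are made up for this example; it is not the full HTML5 algorithm,
and it says nothing about the tree libxml2 itself returns. It also leaves
out the html and head elements and the whitespace text nodes a browser
would add.

```python
from html.parser import HTMLParser


class TableInferenceSketch(HTMLParser):
    """Toy tree builder: models only the implied <tbody>/<tr> rule for cells."""

    def __init__(self):
        super().__init__()
        self.stack = []  # names of the currently open elements
        self.lines = []  # indented outline of the resulting tree

    def _open(self, tag):
        self.lines.append("  " * len(self.stack) + tag)
        self.stack.append(tag)

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            # A cell directly inside <table> implies <tbody>, then <tr>.
            if self.stack and self.stack[-1] == "table":
                self._open("tbody")
            if self.stack and self.stack[-1] == "tbody":
                self._open("tr")
        self._open(tag)

    def handle_endtag(self, tag):
        # Close everything up to and including the nearest matching element.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Whitespace-only text is skipped to keep the outline short.
        if data.strip():
            self.lines.append("  " * len(self.stack) + repr(data.strip()))


def outline(html):
    """Return an indented outline of the tree the toy builder produces."""
    parser = TableInferenceSketch()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


SAMPLE = "<body>\n    <table><th>a</th><td>b</td>\n</body>"
print(outline(SAMPLE))
```

Printing the outline shows th and td nested under tbody and tr, which is
the shape to compare against whatever tree libxml2 hands back for the same
input.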

Again, this isn't saying anything bad about libxml - I'm trying to give
examples so you can understand what it's doing. I don't actually know of
a good HTML 5 parser that can replace libxml2; I don't follow these
things, and in any case I think I'd rather see it folded into libxml2 in
some way.

Liam

--
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml

_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml gnome org
https://mail.gnome.org/mailman/listinfo/xml


