Re: [xml] Fwd: Support of new HTML5 tags in libxml



On Tue, 2013-04-02 at 09:06 +0200, Amandine Piguel wrote:
> Hello,
>
> I would like to know if libxml2 is able to parse HTML5 files, and if
> not, if it will be supported in the future.

Note, libxml's HTML parser is really good at making sense of HTML input,
but it is not a formal HTML parser - the tree you get is not guaranteed
to be the same as the one a Web browser would make, and even with HTML 4
there are differences, e.g. in when a "tbody" element is inferred. This
isn't a bad thing - often it's exactly what you want.

I'd guess that patches to provide an option to use the HTML 5 parsing
algorithm would be plausible.

Example: try parsing the following input with libxml, and compare the
resulting tree with the DOM that a Web browser builds from it...

<body>
    <table><th>a</th><td>b</td>
</body>
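
To make "inferred" concrete, here is a rough sketch (Python standard
library only, not libxml itself) that lists the tags literally present
in that input. Any element in a parser's tree that is not in this list
was supplied by the parser. The names SAMPLE, TagLister, literal_tags
and inferred_elements are made up for this illustration, and the list
of tree element names you pass in is something you would copy by hand
from whichever tree you are inspecting.

```python
from html.parser import HTMLParser

# The example input from this message, unchanged.
SAMPLE = """<body>
    <table><th>a</th><td>b</td>
</body>
"""


class TagLister(HTMLParser):
    """Record start and end tags as they occur in the source text.

    This is only a tokenizer-level view: no tree is built and no
    missing elements are inferred.
    """

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.end_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_endtag(self, tag):
        self.end_tags.append(tag)


def literal_tags(markup):
    """Return (start_tags, end_tags) found literally in the markup."""
    lister = TagLister()
    lister.feed(markup)
    lister.close()
    return lister.start_tags, lister.end_tags


def inferred_elements(markup, tree_element_names):
    """Return sorted names that appear in a tree but have no start tag
    in the source, i.e. elements the tree builder must have inferred.

    tree_element_names is a hand-supplied list (an assumption of this
    sketch), e.g. copied from a DOM inspector or a dump of the tree.
    """
    starts, _ = literal_tags(markup)
    return sorted(set(tree_element_names) - set(starts))


starts, ends = literal_tags(SAMPLE)
print("start tags:", starts)
print("end tags:  ", ends)
```

Calling inferred_elements(SAMPLE, names) with the element names from
the libxml tree and then with those from a browser's DOM gives two
lists you can compare directly; any name that shows up in one list but
not the other is a place where the two parsers disagree.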

Again, this isn't saying anything bad about libxml - I'm trying to give
examples so you can understand what it's doing. I don't actually know of
a good HTML 5 parser that can replace libxml2; I don't follow these
things, and in any case I think I'd rather see HTML 5 parsing folded
into libxml2 in some way.

Liam

-- 
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml


