Re: [xml] Support of HTML v5 parsing



On 05/28/2015 12:29 PM, Noam Postavsky wrote:
On Thu, May 28, 2015 at 12:13 PM, Frank Gross <fg 4js com> wrote:
  Are there any plans to support parsing of HTML5 in libxml? I tried the
function htmlCtxtReadMemory(), but it raises an error for HTML documents
containing tags introduced in HTML5, for example: "Tag header invalid".

I'd love to see this happen!  I'm so used to the libxml2 tools,
and the tools built upon them, it would SO simplify my life.

I think the same question has already been asked, and answered at
https://mail.gnome.org/archives/xml/2013-April/msg00006.html

Sorta, yes. But HTML5 is essentially _defined_ by its parser rather than
by its spec. In particular, the parser (annoyingly) rewrites the DOM to
turn what you wrote into what it wants. That being the case, there's
more to adapting libxml's HTML parser than just being more forgiving about
the unrecognized tags: the resulting DOM might not be quite what HTML5
specifies!

Which is all to say that it's not quite trivial; it would probably amount to
importing the "official" parser and modifying it to create libxml's internal
structures. Sadly, Daniel doesn't have the time. Nor, alas, do I.

bruce

