Re: [xml] Parsing tag-soup HTML

From: Michael Day <mikeday yeslogic com>
To: veillard redhat com
Cc: xml gnome org
Subject: Re: [xml] Parsing tag-soup HTML
Date: Mon, 18 Jun 2007 16:52:00 +1000

Hi Nick,

 Coming back with some kind of definition of what tag-soup parser
behaviour is is probably more important than digging into the libxml2 code.
I am not sure we can emulate the behaviour of web browser parsers.


It's worth looking at the HTML5 specification:

http://www.whatwg.org/specs/web-apps/current-work/

Section 8, "The HTML Syntax", is the relevant bit. It still needs some work, but it's actively being developed and is a good starting point for figuring out how to treat messy real world HTML and hopefully get similar behaviour to web browsers.
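
To give a feel for the kind of rule such a definition contains, here is a rough sketch in Python (standard library only, not libxml2). It applies one simplified recovery rule in the spirit of the HTML5 draft: a new <p> start tag implicitly closes a <p> that is still open, and stray end tags are ignored. The class name ImpliedPCloser, the normalise() helper and the short VOID set are made up for this illustration. This is not the spec's tree construction algorithm; in particular, formatting elements closed this way are not reopened.

```python
from html.parser import HTMLParser

# Partial, illustrative set of elements that never take an end tag.
VOID = {"br", "hr", "img", "input", "meta", "link"}


class ImpliedPCloser(HTMLParser):
    """Turns messy markup into a balanced stream of start/text/end events."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_elements = []  # stack of currently open tag names
        self.events = []         # normalised output events

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            self.events.append(("void", tag))
            return
        # Simplified rule: <p> cannot nest, so close the open one first.
        if tag == "p" and "p" in self.open_elements:
            self._close_through("p")
        self.open_elements.append(tag)
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        # An end tag with no matching open element is silently dropped.
        if tag in self.open_elements:
            self._close_through(tag)

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", data.strip()))

    def _close_through(self, tag):
        # Pop and close every open element up to and including `tag`.
        while self.open_elements:
            name = self.open_elements.pop()
            self.events.append(("end", name))
            if name == tag:
                break

    def close(self):
        super().close()
        # At end of input, close whatever is still open.
        while self.open_elements:
            self.events.append(("end", self.open_elements.pop()))


def normalise(html):
    parser = ImpliedPCloser()
    parser.feed(html)
    parser.close()
    return parser.events


if __name__ == "__main__":
    for event in normalise("<p>one<p>two</b>"):
        print(event)
```

Feeding it "<p>one<p>two</b>" yields two balanced paragraphs and drops the stray </b>. A real implementation would need the full set of insertion modes from the spec to match browsers.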


Best regards,

Michael

--
Print XML with Prince!
http://www.princexml.com

References:
- [xml] Parsing tag-soup HTML
  - From: Nick Kew
- Re: [xml] Parsing tag-soup HTML
  - From: Daniel Veillard
- Re: [xml] Parsing tag-soup HTML
  - From: Nick Kew
- Re: [xml] Parsing tag-soup HTML
  - From: Daniel Veillard
