Re: [xml] Parsing tag-soup HTML
- From: Michael Day <mikeday yeslogic com>
- To: veillard redhat com
- Cc: xml gnome org
- Subject: Re: [xml] Parsing tag-soup HTML
- Date: Mon, 18 Jun 2007 16:52:00 +1000
Hi Nick,
Coming back with some kind of definition of what tag-soup parser
behaviour is would probably be more important than digging into the
libxml2 code. I am not sure we can emulate the behaviour of web browser
parsers.
It's worth looking at the HTML5 specification:
http://www.whatwg.org/specs/web-apps/current-work/
Section 8, "The HTML Syntax", is the relevant bit. It still needs some
work, but it is actively being developed and is a good starting point for
figuring out how to treat messy real-world HTML and, hopefully, get
behaviour similar to that of web browsers.
Best regards,
Michael
--
Print XML with Prince!
http://www.princexml.com