Re: [xml] parsing html with libxml2

From: Daniel Veillard <veillard redhat com>
To: JASON JESSO <jesso1607 rogers com>
Cc: xml gnome org
Subject: Re: [xml] parsing html with libxml2
Date: Fri, 12 Nov 2004 14:06:58 -0500

On Fri, Nov 12, 2004 at 01:52:00PM -0500, JASON JESSO wrote:

I have saved an html page to a file from my browser.
When I parse the html with libxml2 (python) I get a
lot of html parser errors.

I looked at the html document and there are a number
of mismatched tags and the like.

I tried several web browsers and they all have no
problem with it.

Why is that?


  Because "real HTML", i.e. what you find on the web, is usually full
of errors; browsers just tend to handle those errors in the same way,
which makes the thing kind of work...  This real mess is the reason
why the designers of XML required that a parser must fail as soon as
it hits an error and stop returning data from that point.
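
To make the contrast concrete, here is a minimal sketch. It uses Python's
standard library (html.parser and xml.etree.ElementTree) rather than the
libxml2 bindings, so it illustrates the lenient-versus-strict behaviour
in general, not libxml2's own API. The BROKEN sample and the helper names
(TagCollector, parse_lenient, parse_strict) are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: <b> and <i> are closed in the wrong order,
# and <p> is never closed at all.
BROKEN = "<html><body><p><b><i>hello</b></i> world</body></html>"


class TagCollector(HTMLParser):
    """Lenient tokenizer: records start tags and text, never checks nesting."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def parse_lenient(markup):
    """HTML-style handling: keep going and return whatever was seen."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags, "".join(collector.text)


def parse_strict(markup):
    """XML-style handling: return None if well-formed, else the error text."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(parse_lenient(BROKEN))  # tags and text are still recovered
    print(parse_strict(BROKEN))   # the XML parser reports the first error
```

The lenient pass gets through the whole input despite the mismatched
tags, while the strict pass stops at the first well-formedness error,
which is the behaviour XML requires.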

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

