Re: [xml] parsing html with libxml2

From: JASON JESSO <jesso1607 rogers com>
To: xml gnome org
Subject: Re: [xml] parsing html with libxml2
Date: Fri, 12 Nov 2004 14:10:54 -0500 (EST)

So, is there a way to make the libxml2 HTML parser ignore
these errors?



 --- Daniel Veillard <veillard redhat com> wrote:

On Fri, Nov 12, 2004 at 01:52:00PM -0500, JASON JESSO wrote:

I have saved an HTML page to a file from my browser.
When I parse the HTML with libxml2 (Python) I get a lot of
HTML parser errors.

I looked at the HTML document and there are a number of
mismatched tags and the like.

I tried several web browsers and they all have no problem
with it.

Why is that?


  Because "real HTML", i.e. what you find on the web, is usually full
of errors; browsers just tend to handle those errors in the same way,
which makes the thing kind of work...  This real mess is the reason why,
when designing XML, they required that parsers fail as soon as they
hit an error and stop returning data from that point.

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
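
The contrast described above (an XML parser stops at the first error, while
HTML tooling keeps going) can be illustrated with a small sketch. Note the
assumptions: this uses Python's standard library (xml.etree.ElementTree and
html.parser) rather than libxml2 itself, and the BROKEN sample string and the
helper names xml_accepts, html_events and TagCollector are made up for the
illustration. It only shows the difference in behaviour; it is not a recipe
for configuring the libxml2 HTML parser.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: <b> is opened inside <p> but closed after it,
# the kind of mismatched tags found in pages saved from the web.
BROKEN = "<html><body><p>Hello <b>world</p></b></body></html>"


class TagCollector(HTMLParser):
    """Record start/end tag events without checking that tags nest properly."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def xml_accepts(markup):
    """Return True if a strict XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # The XML parser gives up at the first well-formedness error.
        return False


def html_events(markup):
    """Feed the markup to a tolerant HTML tokenizer and return the tag events."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.events


if __name__ == "__main__":
    print("strict XML parser accepts it:", xml_accepts(BROKEN))
    for kind, tag in html_events(BROKEN):
        print(kind, tag)
```

The XML parser rejects the sample outright, while the HTML tokenizer reports
every tag it sees and leaves it to the caller to decide what to do with the
bad nesting.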

