Re: [xml] parsing html with libxml2



So, is there a way to make the libxml2 HTML parser ignore these
errors?



--- Daniel Veillard <veillard redhat com> wrote:
On Fri, Nov 12, 2004 at 01:52:00PM -0500, JASON JESSO wrote:
I have saved an HTML page to a file from my browser. When I parse
the HTML with libxml2 (Python) I get a lot of HTML parser errors.

I looked at the HTML document and there are a number of mismatched
tags and the like.

I tried several web browsers and none of them has a problem with it.

Why is that?

  Because "real HTML", i.e. what you find on the web, is usually full
of errors; browsers just tend to handle those errors in the same way,
which makes the whole thing more or less work. This mess is the reason
why, when XML was designed, parsers were required to fail as soon as
they hit an error and to stop returning data from that point.
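
To illustrate the difference described above, here is a minimal sketch
that feeds the same mismatched markup to a strict XML parser and to a
forgiving HTML tokenizer. It uses only the Python standard library
(xml.etree.ElementTree and html.parser) as stand-ins, not libxml2
itself; the sample markup and the names BROKEN, TagCollector,
parse_strict and parse_lenient are made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: <b> is closed after </p>, and <br> is never closed.
BROKEN = "<html><body><p>one<b>two</p></b><br>three</body></html>"


class TagCollector(HTMLParser):
    """Records every token the lenient HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse_strict(markup):
    """Return the XML parser's error message, or None if it parsed."""
    try:
        ET.fromstring(markup)
        return None
    except ET.ParseError as exc:
        # XML rule: stop at the first well-formedness error.
        return str(exc)


def parse_lenient(markup):
    """Return all tokens seen by the HTML tokenizer, errors included."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.events


strict_error = parse_strict(BROKEN)
lenient_events = parse_lenient(BROKEN)

print("strict XML parser stopped with:", strict_error)
print("lenient HTML tokenizer reported", len(lenient_events), "events")
```

The XML parser gives up at the first mismatched tag, while the HTML
tokenizer keeps going and still reports the text after the broken
part. Note that html.parser only tokenizes; it does not build a
repaired tree the way a browser or an HTML-aware tree parser does.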

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
 


