Re: [xml] parse from urls with python bindings
- From: Andreas Pakulat <apaku gmx de>
- To: xml gnome org
- Subject: Re: [xml] parse from urls with python bindings
- Date: Sat, 10 Jun 2006 00:39:44 +0200
On 10.06.06 00:21:33, Andreas Pakulat wrote:
> I do see some functions in libxml2 module and
> doc=libxml2.htmlParseFile('http://localhost/','us-ascii')
> even works. But it doesn't for www.google.de and I thought libxml2's
> html parser is very forgiving? In fact letting lxml parse
> http://www.google.de using the HTMLParser works fine.
dooh :-)
I should've spent another 5 minutes on the API. The way it works is:
doc=libxml2.htmlReadFile('someurl',None,libxml2.HTML_PARSE_RECOVER)
to activate the not-so-strict HTML Parser (which lxml uses by default).
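
For the archives, here is a minimal sketch of that call wrapped in a helper. The function name parse_html_lenient and the lazy import are my own illustration, not part of libxml2; the only library pieces used are htmlReadFile and HTML_PARSE_RECOVER from the line above.

```python
def parse_html_lenient(source, encoding=None):
    """Parse HTML from a file path or URL with libxml2's recovering parser.

    `source` is passed straight to htmlReadFile, so it can be a local
    path or a URL that libxml2's own I/O layer is able to fetch.
    `encoding=None` leaves encoding detection to the parser.
    """
    # Imported lazily so that defining this (hypothetical) helper does
    # not require the libxml2 Python bindings to be installed.
    import libxml2

    # HTML_PARSE_RECOVER asks the parser to carry on past markup errors
    # instead of giving up on real-world pages.
    return libxml2.htmlReadFile(source, encoding, libxml2.HTML_PARSE_RECOVER)
```

Something like doc = parse_html_lenient('http://www.google.de') should then behave like the one-liner above. Documents returned by the bindings are not freed automatically, so call doc.freeDoc() when done.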
Hope nobody was disturbed by me :-)
Andreas
--
Your lucky number is 3552664958674928. Watch for it everywhere.