Re: [xml] parse from urls with python bindings



On 10.06.06 00:21:33, Andreas Pakulat wrote:

> I do see some functions in the libxml2 module, and
>
> doc=libxml2.htmlParseFile('http://localhost/','us-ascii')
>
> even works. But it doesn't for www.google.de, and I thought libxml2's
> HTML parser was very forgiving? In fact, letting lxml parse
> http://www.google.de using the HTMLParser works fine.

dooh :-)

I should've spent another 5 minutes on the API. The way it works is:

doc=libxml2.htmlReadFile('someurl',None,libxml2.HTML_PARSE_RECOVER)

to activate the not-so-strict HTML Parser (which lxml uses by default).
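
For the archives, a slightly fuller sketch of the same call. The helper name
link_targets is made up for illustration, and adding HTML_PARSE_NOERROR and
HTML_PARSE_NOWARNING is my own assumption (they only silence the parser's
messages on stderr; HTML_PARSE_RECOVER alone is what the line above uses).
It assumes the libxml2 Python bindings are installed.

```python
def link_targets(source, encoding=None):
    """Parse `source` (a file path or URL) leniently and return all href values.

    Hypothetical helper, not part of libxml2 itself.
    """
    # Imported here so the sketch can be loaded even where the
    # bindings are not installed; normally this goes at module top.
    import libxml2

    # RECOVER keeps going on broken markup; the other two flags
    # (an addition for this sketch) suppress error/warning output.
    options = (libxml2.HTML_PARSE_RECOVER
               | libxml2.HTML_PARSE_NOERROR
               | libxml2.HTML_PARSE_NOWARNING)

    doc = libxml2.htmlReadFile(source, encoding, options)
    try:
        # Each XPath result is an attribute node; .content is its value.
        return [attr.content for attr in doc.xpathEval('//a/@href')]
    finally:
        # Documents from the bindings are not garbage collected,
        # so free the tree explicitly.
        doc.freeDoc()
```

Usage would be something like link_targets('http://www.google.de'); passing
None as the encoding leaves detection to the parser.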

Hope nobody was disturbed by me :-)

Andreas

-- 
Your lucky number is 3552664958674928.  Watch for it everywhere.


