Hello,

I have tried parsing a webpage, but unfortunately the node /html/body is not found. I used lxml in Python, which is based on libxml2.

Firefox parses the page correctly, and if the page is then saved to disk (from Firefox), lxml parses it correctly. If the page is fetched via urllib instead of Firefox, parsing fails.

The HTML source is attached as a zipped txt file.

Thank you for taking the time, any help is appreciated.

Lydia Patrovic

N.B.: This is an answer from the lxml mailing list with a diagnosis:

I get the same result with "xmllint --html", so it's definitely a libxml2 problem. It seems to read all tags and then just stops parsing without further notice. The next tag would be the tag, and I actually suspect this to be a problem: Note the "main&20090924_2" attribute value, which can be interpreted as an unterminated entity.

Please report this on the libxml2 mailing list.

Stefan
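If the diagnosis above is right, one possible workaround until the parser is fixed is to escape bare ampersands before handing the source to lxml. The following is only a sketch under that assumption: the function name escape_bare_ampersands and the sample markup are hypothetical, and only the attribute value "main&20090924_2" comes from the actual page. It uses a plain regular expression, so it would also rewrite ampersands inside script blocks, which may or may not matter for a given page.

```python
import re

# Matches an "&" that does NOT start a well-formed reference such as
# "&amp;", "&#38;" or "&#x26;". Anything else is treated as a bare ampersand.
_BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(html_text):
    """Return html_text with every bare "&" replaced by "&amp;".

    Hypothetical helper for illustration; it is not part of lxml or libxml2.
    """
    return _BARE_AMP.sub("&amp;", html_text)


# Hypothetical markup; only the attribute value is taken from the report.
sample = '<a href="main&20090924_2">x</a>'
cleaned = escape_bare_ampersands(sample)
print(cleaned)
```

The cleaned string could then be passed to lxml.html.fromstring() in place of the raw text fetched with urllib. I have not verified that this makes /html/body appear for the attached page.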
Attachment:
sccmain.zip
Description: Zip archive