[xml] Parsing a particular HTML file causes parse_html_string to hang
- From: Sam Tregar <sam tregar com>
- To: xml gnome org
- Subject: [xml] Parsing a particular HTML file causes parse_html_string to hang
- Date: Sun, 1 Feb 2009 02:44:04 -0500
Hello all. I've hit a problem using libxml2 to parse HTML files. Usually everything works fine, but one particular input file makes the process hang, hogging the CPU until it is killed. When I run the file through xmllint I see the following (aside from a bunch of run-of-the-mill HTML parsing warnings):
$ /usr/local/bin/xmllint --html fail.html
fail.html:927: parser error : Excessive depth in document: change xmlParserMaxDepth = 1024
marcy playground<br /><option><em>
Then xmllint hangs, using 100% of the CPU until killed.
Another note: my first attempt to work around this was to call alarm() before parsing, hoping to terminate the parse if it took too long. For some reason that didn't work; the alarm signal never reached my signal handler. Any ideas why? I'm OK with the parser failing on bad HTML (that's just a fact of life), but I can't allow it to hang indefinitely!
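One possible workaround, sketched under stated assumptions: instead of relying on a signal handler inside the hung process, run the parse in a separate child process and enforce the time limit from the parent, which can always kill a child stuck in a C-level loop. This minimal sketch assumes the parse can be delegated to `xmllint` (assumed to be on PATH); the helper name `run_with_timeout` is hypothetical, and Python is used only for illustration. It relies on `subprocess.run(..., timeout=...)`, which kills the child and raises `TimeoutExpired` when the limit is exceeded.

```python
import subprocess
from typing import Optional, Sequence


def run_with_timeout(
    cmd: Sequence[str], seconds: float
) -> Optional[subprocess.CompletedProcess]:
    """Run cmd in a child process; return None if it exceeds `seconds`.

    When the timeout expires, subprocess.run kills the child and waits for
    it before re-raising TimeoutExpired, so a parser spinning in C code
    cannot block the caller indefinitely.
    """
    try:
        return subprocess.run(list(cmd), capture_output=True, timeout=seconds)
    except subprocess.TimeoutExpired:
        # The child has already been killed and reaped at this point.
        return None


# Hypothetical usage with the problem file from this report:
#   result = run_with_timeout(["xmllint", "--html", "--noout", "fail.html"], 10)
#   if result is None:
#       ...  # treat fail.html as unparseable
```

The design choice here is isolation: a timeout enforced by a separate process does not depend on whether signals are delivered or handled inside the code that is looping, at the cost of a process spawn per document.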
This is libxml2 v2.6.30 on Linux:
$ /usr/local/bin/xmllint --html --version
/usr/local/bin/xmllint: using libxml version 20630
compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude Iconv ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib
Would you like me to send in the killer file? It's around 208k, so I didn't think it would be polite to send it unasked.
Thanks for any help you can give me!