Re: [xml] Non recursive html parser
- From: Daniel Veillard <veillard redhat com>
- To: Eugene Pimenov <libc me com>
- Cc: xml gnome org
- Subject: Re: [xml] Non recursive html parser
- Date: Wed, 17 Feb 2010 10:04:08 +0100
On Tue, Feb 16, 2010 at 10:00:03AM +0300, Eugene Pimenov wrote:
> Hello everyone,
> As my colleague pointed out in December (http://mail.gnome.org/archives/xml/2009-December/msg00036.html),
> although not very clearly, there are real-world HTML pages that overflow the
> stack. We're using libxml through nokogiri (http://nokogiri.org/, a Ruby library).
> E.g.:
> >> Nokogiri::HTML::SAX::Parser.new(Nokogiri::XML::SAX::Document.new).parse_memory("<b>"*100_000)
> #=> SystemStackError: stack level too deep
> In the patch, I change htmlParseElement to return immediately and let its caller,
> htmlParseContent, do the job.
> htmlParseElement is not a static function, and I changed its behavior! I googled around
> (http://google.com/codesearch?q=htmlParseElement&hl=en&btnG=Search+Code) and I don't see anyone actually
> using it. But if this is an issue, I can make htmlParseElement call an internal (static) version of
> htmlParseElement and then htmlParseContent until the nesting level matches again. I'd rather see
> htmlParseElement made static, though.
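The idea behind the patch, replacing recursion per nested element with a loop that tracks open elements itself, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not libxml2 code: it handles only a tiny tag language (`<name>`, `</name>` and text), and `parse_iterative`, `Node` and `TOKEN_RE` are hypothetical names. The explicit `stack` plays roughly the role that the parser context's name stack (`nameTab`/`nameNr`/`nameMax`) plays in libxml2.

```ruby
# Toy model only: a tiny tag language (<name>, </name>, text), not
# libxml2 code. Nesting is tracked with an explicit stack instead of
# recursion, so depth is limited by heap memory, not the call stack.

Node = Struct.new(:name, :children)

TOKEN_RE = %r{<(/?)([A-Za-z][A-Za-z0-9]*)>|([^<]+)}

# Returns [root, max_depth]. Unclosed elements are simply left open
# at end of input, loosely mimicking lenient HTML parsing.
def parse_iterative(src)
  root = Node.new("#root", [])
  stack = [root] # stands in for the call stack of a recursive parser
  max_depth = 0
  src.scan(TOKEN_RE) do |slash, name, text|
    if text
      stack.last.children << text
    elsif slash.empty?
      node = Node.new(name.downcase, [])
      stack.last.children << node
      stack.push(node)
      max_depth = [max_depth, stack.size - 1].max
    else
      # Close the innermost matching open element (and anything still
      # open inside it); a stray end tag with no match is ignored.
      idx = stack.rindex { |n| n.name == name.downcase }
      stack.pop(stack.size - idx) if idx && idx > 0
    end
  end
  [root, max_depth]
end

# The input from the report: 100,000 nested <b> elements.
_root, depth = parse_iterative("<b>" * 100_000)
puts depth # prints 100000
```

A recursive version of the same toy parser would need one Ruby stack frame per open `<b>`, which is exactly the kind of depth that produced the `SystemStackError` above.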
The patch as it stands also breaks the ABI by changing the order of
the fields in the public parser context structure.
Also, changing the behaviour of the function to that extent is not
correct, and googling is not sufficient to tell whether that behaviour
is relied upon somewhere. libxml2 is used in many embedded projects; you
won't find their source in Google!
So at a minimum, the public function must be preserved, any new
fields in the parser context must be added at the end of the
structure (and I still need to be convinced they are really needed), and
all the regression tests must still pass with the patch applied.
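Eugene's fallback proposal, a public function that calls a non-recursive internal routine and then keeps parsing content until the nesting level is back where it started, would presumably be one way to keep the public function's contract while dropping the recursion. The sketch below shows that shape in the same kind of toy model; `ToyParser`, `step` and `parse_element` are hypothetical names, not libxml2 API, and the toy tag language is an assumption.

```ruby
# Toy model only (not libxml2 API): `step` consumes one token without
# recursing; `parse_element` keeps the old "parse the whole element"
# contract by looping until the nesting depth drops back.
class ToyParser
  Node = Struct.new(:name, :children)

  attr_reader :root

  def initialize(src)
    # Each token is [slash, name, text], as captured by the regexp.
    @tokens = src.scan(%r{<(/?)([A-Za-z][A-Za-z0-9]*)>|([^<]+)})
    @pos = 0
    @root = Node.new("#root", [])
    @stack = [@root] # explicit stack of open elements
  end

  def depth
    @stack.size - 1
  end

  def eof?
    @pos >= @tokens.size
  end

  # Consumes exactly one token; never recurses.
  def step
    slash, name, text = @tokens[@pos]
    @pos += 1
    if text
      @stack.last.children << text
    elsif slash.empty?
      node = Node.new(name.downcase, [])
      @stack.last.children << node
      @stack.push(node)
    else
      idx = @stack.rindex { |n| n.name == name.downcase }
      @stack.pop(@stack.size - idx) if idx && idx > 0
    end
  end

  # Old-style entry point, intended to be called when the next token opens
  # an element: returns only once that element (and everything nested in
  # it) has been closed or the input runs out, as a recursive parser would.
  def parse_element
    return if eof?
    start = depth
    step # opens the element, so depth becomes start + 1
    step until eof? || depth <= start
  end
end

src = "<b>" * 100_000 + "</b>" * 100_000 + "<i>tail</i>"
parser = ToyParser.new(src)
parser.parse_element # returns once the outermost <b> is closed
puts parser.depth # prints 0
```

The caller sees the same behaviour as before (one call, one complete element), while the nesting lives in `@stack` rather than on the call stack.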
> I also attach weirdness.patch, which deletes duplicate definitions and sets nameMax to 0 if it fails to
> allocate memory.
That one sounds fine, right!
Daniel
--
Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
daniel veillard com | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/