[xml] [Bug]: Extensive memory usage on invalid documents



Hi all,

With the help of the Zen community at #AxKit we identified quite a severe bug
that occurs when validating an XML document. I first encountered it when using the Perl
module, but after testing with xmllint --valid or --postvalid, it became clear that
the bug is in LibXML itself.

From the looks of it, the document contains so many errors that it can't
determine how to generate its error output. Memory usage climbs to the maximum
within a few seconds. On my shell account this is approx. 272 MB, and for the webserver
it is approx. 512 MB, on a 4.2 BSDi system with 1 GB of memory.

Attached are sample files, with descriptive names, and the backtrace of the coredump.

Furthermore, on the command line, a 'realloc' error appears.
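
For anyone who wants to reproduce this without exhausting the machine, here is a minimal sketch that runs xmllint under a memory cap. The helper name run_capped and the 256 MB limit are my own assumptions, not something from the attached files, and `ulimit -v` is not available in every shell or on every system, so treat this as a starting point only.

```sh
#!/bin/sh
# Run a command with a cap on virtual memory so that a runaway process
# fails early instead of eating all available memory.
# Usage: run_capped LIMIT_KB command [args...]
# (run_capped is a hypothetical helper name; ulimit -v support varies.)
run_capped() {
    limit_kb=$1
    shift
    # The subshell keeps the limit from affecting the calling shell.
    (
        ulimit -v "$limit_kb" || exit 125
        "$@"
    )
}

# Only attempt the reproduction when xmllint and the sample file are present.
if command -v xmllint >/dev/null 2>&1 && [ -f invalid.xml ]; then
    status=0
    # --valid validates against the DTD named in the document;
    # --noout suppresses the serialized result.
    # 262144 KB (256 MB) is an assumed limit; adjust it to your system.
    run_capped 262144 xmllint --valid --noout invalid.xml || status=$?
    echo "xmllint exit status: $status"
fi
```

With the cap in place, the allocation failure should show up quickly as a non-zero exit status rather than after memory has been used up. Swapping --valid for --postvalid exercises the other code path mentioned above.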

The DTD is referenced in the documents, but is also available here:
http://melvyn.idg.nl/dtd/idg.nl/idgml.dtd

The difference between the two documents (apart from the fact that their content differs) is the correction of the blockmodel. In Perl I'm using the following lines
to accomplish that:

my $parser=XML::LibXML->new();
my $verhaal=ms_charconv($$item{'verhaal'});
my $htmldoc=$parser->parse_html_string($verhaal);
my $xmlverhaal=$htmldoc->toString();
my $regex=qr(<\?xml version="1.0" standalone="yes"\?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>);
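##### The regexes below correct the blockmodel #####
# Strip the XML declaration, DOCTYPE and opening <html><body> matched by $regex
$xmlverhaal=~s/$regex//;
# Strip the trailing </body></html>
$xmlverhaal=~s/<\/body><\/html>$//;
# Move a <strong> element that sits right before </p> to just after it
$xmlverhaal=~s/(<strong>[^<]+<\/strong>)<\/p>/<\/p>$1/gi;
# Turn <strong> and </strong> into <h6> and </h6>
$xmlverhaal=~s/<(\/?)strong>/<$1h6>/gi;
# Open a <ul> when a <li> directly follows a closing </h6>
$xmlverhaal=~s/(<\/h6>)(\s*<li>)/$1<ul>$2/gi;
# Close the <ul> when a <h6> or <p> directly follows a closing </li>
$xmlverhaal=~s/(<\/li>)(\s*<h6>)/$1<\/ul>$2/gi;
$xmlverhaal=~s/(<\/li>)(\s*<p>)/$1<\/ul>$2/gi;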

$$item is a MySQL DBD hashref.
ms_charconv is a sub that replaces the invalid MS characters from Word documents.

The contents of $xmlverhaal are then added as the body of the IDGML document.

LibXML was compiled using the following options:
$ cat configure-cmd.sh
#!/bin/sh

./configure \
        --prefix=/weblib/local \
        --with-zlib=/weblib/local \
        --with-iconv=/weblib/local

`make check` only gave some errors on the syntax for 'cut', but after installing the
GNU text-utils, these went away (hmm, I might agree on the auto* stuff there,
Daniel).

If you need more info, let me know.



With kind regards,

IDG.nl
Melvyn Sopacua
Webmaster

PS: DAAAAAAAHUUUUUUUTTTTTTTT!!
(ya gotta pay yer dues)

Attachment: backtrace_xmllint.txt
Description: Text document

Attachment: valid.xml
Description: application/xml

Attachment: invalid.xml
Description: application/xml


