[xslt] xsltproc memory consumption w/ large DTD / docbook

Hi all,

I've got a usage problem with libxml2/libxslt/xsltproc. I want to use
Norman Walsh's Website DTD, which in turn is based on his DocBook XSL
stylesheets and the DocBook XML DTD. So far xsltproc has worked just great
for that task, both with current CVS of libxml2 and libxslt and with the
2.4.16/1.0.22 releases, on Linux 2.4.20 with glibc 2.1.3 (Red Hat Linux
6.2) as well as glibc 2.2.93 (RH 8.0) with the latest patches installed.

The problem is that xsltproc consumes huge amounts of memory, apparently
mostly for storing the DTDs. I know that an XSL/DTD framework as huge as
DocBook is a really bad test case, and I'm prepared to dig into it further.
But before I do so, I'd just like to know whether this is perhaps
considered normal for xsltproc.

As for my setup, I've got about 300 source XML files of doctype Website,
about 800 bytes each. They get loaded via document() calls in XPath
selects by a self-contained XSL stylesheet (autolayout.xsl). After about
150 of them the process size of xsltproc reaches 900MB, swap runs out and
the Linux OOM killer kills the process.
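
In case anyone wants to try this without my site, here is a minimal
sketch in Python of a generator for a comparable test case. It is not my
actual setup: the file names, the index.xml/driver.xsl layout, the element
names in the pages and the DTD_SYSTEM_ID value are all placeholders I made
up for illustration, and the system identifier has to be pointed at a real
copy of the DTD before the result means anything.

```python
import os

# Hypothetical placeholder: replace with the path or URL of the real DTD.
DTD_SYSTEM_ID = "website.dtd"

# One small source page (roughly 800 bytes once the filler is added).
# The element names are illustrative only.
PAGE = """<?xml version="1.0"?>
<!DOCTYPE webpage SYSTEM "{dtd}">
<webpage id="page{n}">
  <head><title>Page {n}</title></head>
  <para>{filler}</para>
</webpage>
"""

# Self-contained driver stylesheet: it only pulls each page in through
# document() and prints the id of its root element.
DRIVER = """<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="/pages/page">
      <xsl:value-of select="document(@href)/*/@id"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
"""


def write_testcase(directory, count, dtd=DTD_SYSTEM_ID):
    """Write `count` small pages, an index listing them, and the driver."""
    os.makedirs(directory, exist_ok=True)
    entries = []
    for n in range(count):
        name = "page-%d.xml" % n
        with open(os.path.join(directory, name), "w", encoding="utf-8") as fh:
            fh.write(PAGE.format(dtd=dtd, n=n, filler="lorem " * 100))
        entries.append('  <page href="%s"/>' % name)

    index_path = os.path.join(directory, "index.xml")
    with open(index_path, "w", encoding="utf-8") as fh:
        fh.write('<?xml version="1.0"?>\n<pages>\n')
        fh.write("\n".join(entries))
        fh.write("\n</pages>\n")

    style_path = os.path.join(directory, "driver.xsl")
    with open(style_path, "w", encoding="utf-8") as fh:
        fh.write(DRIVER)
    return index_path, style_path
```

After calling write_testcase("testcase", 300), running
"xsltproc driver.xsl index.xml" inside that directory should exercise the
same document() pattern, so the process size can be watched as the number
of pages grows.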

So far I've tried to make sure that it isn't a memory leak, using dmalloc
on both of my libxml2/libxslt installations. While the release versions
are completely clean, the CVS snapshots fail to free a few bytes overall
(less than 100 in 5 blocks or so), which doesn't seem that critical to me.

I've also tried narrowing down the source of that enormous memory
consumption. I had a look at the stylesheet and the only extraordinary
thing it does is loading those small external documents via document().
Otherwise it's completely self-contained, meaning it isn't including/using
the DocBook-XSL-Stylesheets at all because they haven't come into play yet
at that stage of processing. It can't be the document size either because
they're only about 800 bytes each.

But I thought that it might be the DTD size, because Website is based on
DocBook and that DTD is quite large. Therefore I tried processing just one
document, and dmalloc reported an overall memory consumption of 33MB.
After adding a second one of those 800 byte documents it went up to
about 66MB.
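
To put rough numbers on that, here is a small back-of-the-envelope sketch
in Python. The figures are the ones quoted in this mail; the assumption
that memory grows linearly with each added document is mine and unverified.
A linear projection from the dmalloc figures comes out far above the 900MB
I actually see after about 150 documents, so the dmalloc total and the
process size probably don't measure quite the same thing, but growth per
document is evident either way.

```python
def per_document_growth(total_one_doc_mb, total_two_docs_mb):
    """Extra memory attributed to the second document (dmalloc figures)."""
    return total_two_docs_mb - total_one_doc_mb


def projected_total_mb(first_doc_mb, growth_mb, documents):
    """Linear projection: first document plus constant growth per extra one."""
    return first_doc_mb + growth_mb * (documents - 1)


# dmalloc: 33MB for one document, about 66MB for two.
growth = per_document_growth(33, 66)

# What linear growth at that rate would mean for 150 documents.
projected = projected_total_mb(33, growth, 150)

# Observed: process size of about 900MB after about 150 documents.
observed_per_doc = 900 / 150

print("dmalloc growth per document: %d MB" % growth)
print("linear projection for 150 documents: %d MB" % projected)
print("observed process size per document: %.1f MB" % observed_per_doc)
```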

Therefore a wild guess: Does xsltproc load, parse and *store* the DTD of
every document it includes via document()? If so, is there a way to make
it stop doing so, or to reuse the DTDs already loaded in memory?

BTW: For now I do that particular processing stage using saxon-7.3.1,
which gets through it using about 50MB of memory. xsltproc then transforms
the individual pages (800 bytes of XML each), consuming about 30-40MB of
RAM in the process, which seems to go to the DTD as well as the DocBook XSL
stylesheets.

Thanks in advance for your help.
