Re: [xslt] xsltproc memory consumption w/ large DTD / docbook



Try the --novalid option to xsltproc; according to the man page, this
option should prevent the DTD from being loaded.
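For reference, here is a minimal Python sketch of running the stylesheet through xsltproc with --novalid. The file names layout.xml and autolayout.xml are assumptions about the Website setup (only autolayout.xsl comes from the question), and the helper names are hypothetical. --novalid and --output are ordinary xsltproc options. The command is built separately from being run, so it can be checked without xsltproc installed. Whether the setting also covers the small documents pulled in through document() is worth confirming against the libxslt version in use.

```python
import shutil
import subprocess
from typing import List


def xsltproc_command(stylesheet: str, source: str, output: str,
                     novalid: bool = True) -> List[str]:
    """Build an xsltproc command line (hypothetical helper).

    With novalid=True, --novalid is passed so that xsltproc skips
    loading the DTD, as described in its man page.
    """
    cmd = ["xsltproc"]
    if novalid:
        cmd.append("--novalid")
    cmd += ["--output", output, stylesheet, source]
    return cmd


def run_xsltproc(cmd: List[str]) -> int:
    """Run the command and return xsltproc's exit status."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("xsltproc not found on PATH")
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Assumed file names for a Website build; adjust to your own layout.
    cmd = xsltproc_command("autolayout.xsl", "layout.xml", "autolayout.xml")
    print(" ".join(cmd))
```

Note that skipping the DTD also means that entity declarations and default attribute values defined in it are not available. Documents that rely on, for example, DocBook character entities may therefore not expand as expected.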

Charlie B.

Quoting Michael Weiser <mweiser@fachschaft.imn.htwk-leipzig.de>:

> Hi all,
> 
> I've got a usage problem with libxml2/libxslt/xsltproc. I want to use
> Norman Walsh's Website DTD, which in turn is based on his
> DocBook-XSL-Stylesheets and the DocBook-XML-DTD. So far xsltproc has
> worked just great for that task, both with the current CVS versions of
> libxml2 and libxslt and with the 2.4.16/1.0.22 releases, on Linux 2.4.20
> with glibc 2.1.3 (Red Hat Linux 6.2) as well as glibc 2.2.93 (RH 8.0)
> with the latest patches installed.
> 
> The problem is that xsltproc consumes huge amounts of memory, seemingly
> mainly for storing the DTDs. I know that a framework as huge as the
> DocBook XSL/DTD framework is a really bad test case, and I'm prepared
> to dig into it further. But before I do so, I'd just like to know
> whether this is considered normal for xsltproc.
> 
> As for my setup, I've got about 300 source XML files of doctype Website,
> about 800 bytes each. They are loaded via document() XPath selects by a
> self-contained XSL stylesheet (autolayout.xsl). After about 150 of them,
> the xsltproc process reaches 900MB, swap runs out, and the Linux OOM
> killer kills the process.
> 
> So far I've tried to rule out a memory leak by running dmalloc on both
> of my libxml2/libxslt installations. While the release versions are
> completely clean, the CVS snapshots forget to free a few bytes overall
> (fewer than 100 bytes in 5 blocks or so), which doesn't seem critical
> to me.
> 
> I've also tried narrowing down the source of that enormous memory
> consumption. I had a look at the stylesheet, and the only unusual thing
> it does is load those small external documents via document().
> Otherwise it's completely self-contained, meaning it doesn't include or
> use the DocBook-XSL-Stylesheets at all, because they haven't come into
> play yet at that stage of processing. It can't be the document size
> either, because the documents are only about 800 bytes each.
> 
> But I thought it might be the DTD size, because Website is based on
> DocBook and that DTD is quite large. So I tried processing just one
> document, and dmalloc reported an overall memory consumption of 33MB.
> After adding a second one of those 800-byte documents, it went up to
> about 66MB.
> 
> So, a wild guess: does xsltproc load, parse and *store* the DTD of
> every document it includes via document()? If so, is there a way to
> make it stop doing so, or to reuse the DTDs already loaded in memory?
> 
> BTW: For now I do that particular processing stage using saxon-7.3.1,
> which needs about 50MB of memory for it. xsltproc then transforms the
> individual pages (800-byte XML files), consuming about 30-40MB of RAM,
> which seems to go to the DTD and the DocBook-XSL stylesheets.
> 
> Thanks in advance for your help.
> -- 
> Micha
> _______________________________________________
> xslt mailing list, project page http://xmlsoft.org/XSLT/
> xslt@gnome.org
> http://mail.gnome.org/mailman/listinfo/xslt
> 
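Micha's second question, whether already-loaded DTDs could be reused, amounts to caching parsed DTDs by their system identifier. The Python sketch below illustrates only that idea. DtdCache, parse_dtd and load_document are hypothetical names, not libxml2 or libxslt API, and the "parsing" is a stand-in that just records the call. Whether libxslt shares DTDs between documents loaded via document() is exactly the open question in this thread, so the sketch does not describe its actual behaviour.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dtd:
    """Stand-in for a parsed DTD (hypothetical)."""
    system_id: str


@dataclass
class DtdCache:
    """Hypothetical cache that parses each DTD once per system identifier."""
    parsed: Dict[str, Dtd] = field(default_factory=dict)
    parse_calls: List[str] = field(default_factory=list)

    def parse_dtd(self, system_id: str) -> Dtd:
        # Placeholder for the expensive step (reading and parsing the DTD).
        self.parse_calls.append(system_id)
        return Dtd(system_id)

    def get(self, system_id: str) -> Dtd:
        # Reuse the parsed DTD if this system identifier was seen before.
        if system_id not in self.parsed:
            self.parsed[system_id] = self.parse_dtd(system_id)
        return self.parsed[system_id]


def load_document(path: str, system_id: str, cache: DtdCache) -> Dict[str, object]:
    """Hypothetical loader: attach the shared DTD object instead of a fresh copy."""
    return {"path": path, "dtd": cache.get(system_id)}


if __name__ == "__main__":
    cache = DtdCache()
    # Hypothetical: 300 small Website pages that all declare the same DTD.
    docs = [load_document(f"page{i}.xml", "website.dtd", cache) for i in range(300)]
    print(len(docs), len(cache.parse_calls))  # prints "300 1"
```

Keying on the system identifier is what lets documents that declare the same external DTD share one parsed copy. A real implementation would also have to account for internal subsets, which can differ per document even when the external DTD is the same.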


