Re: [xslt] xsltproc memory consumption w/ large DTD / docbook



On Tue, Jan 28, 2003 at 04:13:52PM -0500, Daniel Veillard wrote:
> On Tue, Jan 28, 2003 at 09:59:13PM +0100, Michael Weiser wrote:
> > I've also tried narrowing down the source of that enormous memory
> > consumption. I had a look at the stylesheet and the only extraordinary
> > thing it does is loading those small external documents via document().
> 
>  1/ document() parsing result *must* be kept in memory to comply 
> with generate-id() requirements. 
>  2/ when parsing an XML file in XSLT, one *must* load the DTD to
> allow for defaulted attributes and ID/IDREF lookup; this is a
> requirement of the XPath data model and applies to resources
> loaded with document() too.
> 
>  Knowing that the DocBook DTD is a huge beast and can require
> as much as 3-4 Megabytes in the libxml2 DOM tree representation,
> your results are perfectly normal.
>  First point: sharing DTD instances doesn't work, because
> XML 1.0 allows an internal subset, which makes sharing
> dangerous.
>  However I think that in the document() case the DTD parts
> could be removed after the parsing process is finished. That
> could solve your specific problem.

  Please try the enclosed patch, and report improvements or crashes.
This is a relatively dangerous change and I want to get feedback before
committing any such change to CVS.

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
Index: libxslt/documents.c
===================================================================
RCS file: /cvs/gnome/libxslt/libxslt/documents.c,v
retrieving revision 1.15
diff -c -r1.15 documents.c
*** libxslt/documents.c	11 Dec 2002 17:53:36 -0000	1.15
--- libxslt/documents.c	29 Jan 2003 11:22:02 -0000
***************
*** 162,167 ****
--- 162,168 ----
  xsltLoadDocument(xsltTransformContextPtr ctxt, const xmlChar *URI) {
      xsltDocumentPtr ret;
      xmlDocPtr doc;
+     xmlDtdPtr dtd;
  
      if ((ctxt == NULL) || (URI == NULL))
  	return(NULL);
***************
*** 210,215 ****
--- 211,225 ----
       */
      if (xsltNeedElemSpaceHandling(ctxt))
  	xsltApplyStripSpaces(ctxt, xmlDocGetRootElement(doc));
+ 
+     /*
+      * Remove the DTD from the document, it's not needed anymore.
+      */
+     dtd = xmlGetIntSubset(doc);
+     if (dtd != NULL) {
+ 	xmlUnlinkNode((xmlNodePtr) dtd);
+ 	xmlFreeDtd(dtd);
+     }
  
      ret = xsltNewDocument(ctxt, doc);
      return(ret);

