Re: [xslt] FYI on string interning of XSLT transform mode/modeURI

From: Mark Vakoc <thevakoc-xml yahoo com>
To: xslt gnome org
Subject: Re: [xslt] FYI on string interning of XSLT transform mode/modeURI
Date: Tue, 29 Mar 2005 12:20:07 -0800 (PST)

--- Daniel Veillard <veillard redhat com> wrote:
>   Hum, yes I see, yeah I would take patches for this :-)
> Also Dodji pointed out the size of the result tree as being a big memory
> use factor in libxslt, if you look at 
>   http://xmlsoft.org/XSLT/html/libxslt-transform.html#xsltRunStylesheet
> notice the 2 arguments:
>   SAX: a SAX handler for progressive callback output (not implemented yet)
>   IObuf: an output buffer for progressive output (not implemented yet)
> 
>   it had been that way forever but that's something, non trivial, but 
> not much more difficult than dictionaries, it would allow to output
> to SAX or at least to an I/O front end directly without keeping the full
> output tree around. I would estimate implementing this around a full week
> of work, not that I really have time for this right now but it might be
> a good candidate in the future if I'm bored.
> 

Streaming output would be nice, though it is not as critical in my case: the
typical output document isn't all that large (usually less than 1 MB), but a
significant number of XSLT extension elements/functions return external
data as RVTs (result value trees) to build that document.
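
For readers unfamiliar with the idea, here is a minimal sketch of what
"output straight to SAX" means in general.  It is Python, not libxslt (whose
SAX and IObuf arguments are, per the quote above, not implemented), and the
element names and the stream_records helper are made up for illustration.
Each record is written as soon as it is produced, so no output tree is kept.

```python
import io
from xml.sax.saxutils import XMLGenerator


def stream_records(records, out):
    """Emit one <record> per input item as SAX events written directly to `out`.

    `records` may be a generator; nothing is accumulated in memory.
    """
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("report", {})           # hypothetical root element
    for key, status in records:
        gen.startElement("record", {"key": key})
        gen.characters(status)
        gen.endElement("record")
    gen.endElement("report")
    gen.endDocument()


if __name__ == "__main__":
    buf = io.StringIO()
    stream_records((str(i), "ok") for i in range(3))  if False else None
    stream_records(((str(i), "ok") for i in range(3)), buf)
    print(buf.getvalue())
```

This does nothing for the RVTs built by extension functions, which is the
part that matters more in my case.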

The extreme example is a stylesheet that basically does integrity checks
between databases, fetching records from one table and then ensuring that the
corresponding record(s) exist in another table.  It performs about 40k or more
database I/O operations to produce a 3 MB output file.  The XSLT extension
would, on average, return about 25 nodes, none with a significant amount of
text.  Written one way, the stylesheet used 500MB+ of memory (per the
.memdump).  Reworked to use many templates, while producing exactly the same
output, it used about 40MB.

> Daniel
> 
> P.S. any news on the XML diff front ;-)

Ah, yes, I made significant progress, but it has been shelved for a while.  I'll
hopefully have more time to work on it pretty soon.  I'll probably send a patch
or two in steps.

The first step, which is almost done, hashes all the nodes, compares the
trees for identical subtrees with a weighted algorithm for moves, removes
the matching subtrees, and replaces them with a cross-linked reference node
(the node in one document has pointers to the corresponding 'replaced' node
in the other document).
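
To make that first step concrete, here is a rough sketch in Python using
xml.etree.ElementTree rather than libxml2.  It is not the actual patch:
the names (index_subtrees, shrink, the subtree-ref element) are hypothetical,
the "weighting" is reduced to matching the largest subtrees first, tail text
is ignored when hashing, and the cross links are kept in a side table keyed
by a shared id instead of as pointers on the nodes.

```python
import hashlib
import xml.etree.ElementTree as ET

REF_TAG = "subtree-ref"  # hypothetical placeholder element, not a libxml2 node type


def index_subtrees(root):
    """Hash every element bottom-up.

    Returns (info, by_digest): info maps node -> (digest, subtree size),
    by_digest maps digest -> list of nodes with that digest.
    """
    info, by_digest = {}, {}

    def visit(node):
        h = hashlib.sha1(node.tag.encode())
        for name, value in sorted(node.attrib.items()):
            h.update(f"\0{name}={value}".encode())
        h.update(b"\0" + (node.text or "").strip().encode())
        size = 1
        for child in node:
            child_digest, child_size = visit(child)
            h.update(child_digest)
            size += child_size
        info[node] = (h.digest(), size)
        by_digest.setdefault(h.digest(), []).append(node)
        return info[node]

    visit(root)
    return info, by_digest


def replace_with_ref(parent, node, ref_id):
    """Swap node for a placeholder element carrying the shared reference id."""
    ref = ET.Element(REF_TAG, {"id": str(ref_id)})
    ref.tail = node.tail
    parent[list(parent).index(node)] = ref


def shrink(root_a, root_b):
    """Destructively replace identical subtrees in both trees with placeholders.

    Returns links: ref id -> (removed node from A, removed node from B).
    """
    info_a, _ = index_subtrees(root_a)
    _, by_digest_b = index_subtrees(root_b)
    parent_a = {c: p for p in root_a.iter() for c in p}
    parent_b = {c: p for p in root_b.iter() for c in p}
    used_a, used_b, links = set(), set(), {}

    # Largest subtrees first: a crude stand-in for a real weighting scheme.
    for node in sorted(parent_a, key=lambda n: -info_a[n][1]):
        if node in used_a:
            continue
        digest = info_a[node][0]
        match = next((m for m in by_digest_b.get(digest, [])
                      if m in parent_b and m not in used_b), None)
        if match is None:
            continue
        # Descendants of a matched subtree must not be matched again.
        used_a.update(node.iter())
        used_b.update(match.iter())
        ref_id = len(links)
        replace_with_ref(parent_a[node], node, ref_id)
        replace_with_ref(parent_b[match], match, ref_id)
        links[ref_id] = (node, match)
    return links
```

Because a moved subtree hashes the same wherever it sits, a pure move shows
up as the same reference id at different positions in the two shrunk trees.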

The second step is to analyze the "shrunk" trees to generate a diffgram.  That
should actually be pretty easy.

Note:  the diff is destructive to both trees (unavoidable).
