[xslt] external data in stylesheets



First of all I would like to thank Daniel Veillard and all the other 
contributors to libxslt for a wonderful toolkit.  I am using Matt 
Sergeant's XML::LibXSLT to access libxslt from Perl, and Michael Kay's 
XSLT Programmer's Reference has been my bible for the past two months.

For the project I'm working on (announcement RSN), we've reached the stage 
where everything basically works, but we would like it to run faster.  And 
although we're using Perl, we would like to keep the XSLT as vanilla as 
possible, so that our servers won't need to do any conversions in the 
future.

Recently we have been looking at different strategies for handling external 
data stored in XML files.

document($filename)
Initially we implemented our stylesheets using document($filename).  This 
works fine, but the external XML file is re-read each time a conversion 
is done.  Since our external XML files grow every day (one of them is 
already 6 Mbyte), this will become a problem sooner rather than later.
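One general remedy is to parse each external file once per process and hand 
the same in-memory tree to every later conversion.  Below is a minimal sketch 
of such a parse-once cache in plain C.  Everything in it is hypothetical: 
parse_fn stands in for a real parser (with libxml2, a thin wrapper around 
xmlParseFile(), which returns an xmlDocPtr), and count_parse() is a dummy 
parser that only counts its calls.  It also assumes conversions only read the 
cached trees and that nobody else frees them; whether and how libxslt lets you 
plug such a cache into document() is not something the sketch claims.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser hook: takes a file name or URI and returns an
 * opaque parsed tree, or NULL on failure. */
typedef void *(*parse_fn)(const char *uri);

#define CACHE_SLOTS 16

/* Process-wide cache: each external file is parsed once, and every
 * later lookup gets the same in-memory tree back. */
struct doc_cache {
    char *uri[CACHE_SLOTS];
    void *tree[CACHE_SLOTS];
    size_t count;
};

/* Return the cached tree for uri, parsing the file only on first use.
 * Returns NULL if parsing fails, memory runs out, or the cache is full
 * (the caller can then fall back to parsing directly). */
static void *doc_cache_get(struct doc_cache *c, const char *uri, parse_fn parse)
{
    size_t i;
    char *key;
    void *tree;

    assert(c != NULL && uri != NULL && parse != NULL);
    for (i = 0; i < c->count; i++)
        if (strcmp(c->uri[i], uri) == 0)
            return c->tree[i];              /* hit: no re-read */

    if (c->count == CACHE_SLOTS)
        return NULL;
    key = malloc(strlen(uri) + 1);
    if (key == NULL)
        return NULL;
    strcpy(key, uri);
    tree = parse(uri);                      /* miss: parse exactly once */
    if (tree == NULL) {
        free(key);
        return NULL;
    }
    c->uri[c->count] = key;
    c->tree[c->count] = tree;
    c->count++;
    return tree;
}

/* Release every cached tree with the supplied destructor (for libxml2
 * trees, a wrapper around xmlFreeDoc) and empty the cache. */
static void doc_cache_clear(struct doc_cache *c, void (*free_tree)(void *))
{
    size_t i;

    for (i = 0; i < c->count; i++) {
        free_tree(c->tree[i]);
        free(c->uri[i]);
    }
    c->count = 0;
}

/* Dummy parser for illustration only: counts its calls and returns a
 * heap copy of the name as a placeholder "tree". */
static int parse_calls = 0;

static void *count_parse(const char *uri)
{
    char *copy = malloc(strlen(uri) + 1);

    parse_calls++;
    if (copy != NULL)
        strcpy(copy, uri);
    return copy;
}
```

Looking up "data.xml" twice through doc_cache_get() with count_parse returns 
the same pointer both times and leaves parse_calls at 1, which is exactly the 
re-read that document($filename) currently pays on every conversion.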

document('')
Then we realised that we could import the external XML files as entities 
into the stylesheet and access them with document('').  However, this 
turns out to be _worse_ than document($filename), because document('') 
seems to read the stylesheet _itself_ each time it is called (or at least 
the first time).  Because the stylesheet with the external data imported 
as entities is _more_ XML to parse than the external XML file alone, using 
document('') is actually slower (at least, that is what I gather from a 
benchmark using xsltproc).  I'm not sure whether this is a bug.

<xsl:variable name="tree"><node>value</node></xsl:variable>
In Michael Kay's book, the XSLT 1.1 capability of specifying a whole tree 
(node-set) as the value of a variable is mentioned.  However, this does 
not seem to work in libxslt 1.0.  I'm not sure whether it should.


Solutions?
Either a document('') that does _not_ re-read the stylesheet each time, 
or a working <xsl:variable> tree, would be fine for us.

I've been looking at the source code of libxslt/functions.c.  Lines 168-197 
seem to be relevant.

         :
         URI = xmlBuildURI(obj->stringval, base);
         if (base != NULL)
             xmlFree(base);
         if (URI == NULL) {
             valuePush(ctxt, xmlXPathNewNodeSet(NULL));
         } else {
             xsltTransformContextPtr tctxt;
         :

I must admit my C is a bit rusty, but the test URI == NULL seems to be 
what handles '' as the argument to document().  I wonder if that case is 
handled correctly.  Shouldn't it fill in a pointer to the stylesheet's 
own DOM?
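For what it's worth, RFC 2396 (section 4.2) defines an empty URI reference as 
a reference to the current document, and the XSLT 1.0 Recommendation says 
document('') refers to the root node of the stylesheet.  If so, resolving '' 
against the stylesheet's base should give back the stylesheet's own URI rather 
than NULL, and since libxml2 documents xmlBuildURI() as returning NULL on 
error, the URI == NULL branch may well be the error path instead (worth 
checking against libxml2's uri.c).  The sketch below illustrates that 
reasoning in plain C.  resolve_reference() and names_stylesheet() are 
hypothetical helpers, not libxml2 or libxslt API, and the resolver 
deliberately handles only the empty-reference case.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical resolver covering only what matters here:
 *  - an empty reference means "the current document" (RFC 2396, 4.2),
 *    so return the base URI minus any fragment;
 *  - any other reference is returned unchanged (real relative-path
 *    merging is out of scope for this sketch).
 * Returns a malloc'd string, or NULL if there is no base to resolve an
 * empty reference against or memory runs out. */
static char *resolve_reference(const char *ref, const char *base)
{
    const char *src;
    size_t len;
    char *out;

    assert(ref != NULL);            /* caller must pass a reference */
    if (ref[0] == '\0') {
        if (base == NULL)
            return NULL;
        src = base;
        len = strcspn(base, "#");   /* drop the base URI's fragment */
    } else {
        src = ref;
        len = strlen(ref);
    }
    out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, src, len);
    out[len] = '\0';
    return out;
}

/* Hypothetical check: does the resolved URI name the stylesheet itself?
 * If it does, a processor could return the tree it already parsed
 * instead of reading the file again. */
static int names_stylesheet(const char *resolved, const char *style_url)
{
    return resolved != NULL && style_url != NULL &&
           strcmp(resolved, style_url) == 0;
}
```

With ref "" and base "file:///srv/app/style.xsl#top", resolve_reference() 
returns "file:///srv/app/style.xsl", which names_stylesheet() then matches 
against the stylesheet's own URL; a NULL result only comes from a missing 
base or an allocation failure.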


Am I looking in the right direction to solve this performance issue?  Or 
is there something wrong in my assumptions, or in my view of how one 
should work with XML?

Anyway, I would appreciate any help in solving this problem, which I think 
affects any heavy-duty (server-side) use of libxslt.



Elizabeth Mattijsen




