Re: [xslt] efficiency



In message <003901c20222$3e2d8770$9e01a8c0@office.abanes.org>
          "Michael Rothwell" <rothwell@holly-springs.nc.us> wrote:

> I'm applying a 24k stylesheet to a 96k xml file. It takes about two seconds
> on a P3-700, with very high CPU usage.
>
> It seems like the apply-stylesheet time does not grow linearly with the size
> of the input -- it grows much more rapidly.

That really depends on what you're making it do. In a pathological case, the
xml file might contain nothing but lots of <a /> elements, and your
stylesheet might have a template that outputs a count of all the preceding
elements by recursing over them, adding one to a passed value on each
recursion (counting the elements the very hard way). The time taken would
then be excessive, because the work grows roughly with the square of the
number of elements. But that's just a complexity problem.
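
To make that pathological case concrete, here is a minimal sketch of such a
stylesheet. It assumes the <a /> elements are all siblings under one parent;
the template name count-back and the parameters node and total are made up
for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <!-- For every a element, print how many a siblings come before it. -->
  <xsl:template match="a">
    <xsl:call-template name="count-back">
      <xsl:with-param name="node" select="preceding-sibling::a[1]"/>
    </xsl:call-template>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Hypothetical helper: step back one sibling per call, adding one
       to the running total each time (counting the very hard way). -->
  <xsl:template name="count-back">
    <xsl:param name="node"/>
    <xsl:param name="total" select="0"/>
    <xsl:choose>
      <xsl:when test="$node">
        <xsl:call-template name="count-back">
          <xsl:with-param name="node"
                          select="$node/preceding-sibling::a[1]"/>
          <xsl:with-param name="total" select="$total + 1"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$total"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

The i-th element triggers i-1 recursive calls, so n elements cost
n(n-1)/2 calls in total: 1,000 elements already mean 499,500 calls, and
doubling the input roughly quadruples the work.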

The particular amount of time it takes is very dependent on the OS you use,
the memory allocation routines you have linked against, what other processes
are on the system and any other system limitations (paging speed when memory
is exhausted, for example).

> I would like to be able to process the XML in a smaller amount of time,
> because the responsiveness of the web server (which is doing the transforms)
> suffers quite dramatically. I'm not sure how to achieve this, though.
> Smaller input is not really an option. Are there some general optimization
> strategies for stylesheets processed by libxslt? Some known slow
> constructs? Gotchas?

Use the correct methods for your data set - in the above example, recursing
would be foolish when you can use the count function. Recursion in any form
is going to have an impact on the system's performance, so reducing it to a
minimum is always useful (if at all possible). Anywhere that you're
searching the entire document for something (//blah for example) is going
to be more processing intensive than a plain child step from the current
node (blah), or a search from the root (/root/thing/blah), so using
whole-document searches in for-each or apply-templates may cause much more
processing.
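
As a rough sketch of both points, the stylesheet below uses the count
function instead of hand-written recursion and selects with explicit paths
instead of //. The element names root, thing and blah are just the
placeholders from the paragraph above, not anything from your data.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Explicit path from the root: only the children of /root are
         examined, rather than every node in the document. -->
    <xsl:apply-templates select="/root/thing"/>
  </xsl:template>

  <xsl:template match="thing">
    <!-- Relative child step: only this thing's own blah children.
         Writing select="//blah" here would search the whole document
         again for every thing that is processed. -->
    <xsl:for-each select="blah">
      <!-- The built-in count function does the counting; no recursive
           template is needed. -->
      <xsl:value-of select="count(preceding-sibling::blah)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Bear in mind that //blah and blah do not select the same nodes in general,
so the cheaper path is only a substitute when you know where in the tree
your elements live.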

Really, though, you just have to look at your stylesheet and see what it's
doing, just like any other programming language; if you're introducing a
construct which is very elegant but requires a massive amount of processing
then it may not be ideal.

It may also be that you can modify your data set to cover your processing
requirements - since you seem inclined to generate the data on the fly - for
example, if you have to use an XSLT template with recursion to do a
particular string transformation, then doing that transformation when your
data set is created might help you.

If it's static data you're processing, don't bother processing it on the
fly - ok, so that's obvious, but there's no point in burdening the server
with unnecessary work; not least because an autogenerated document could
have changed since the last request, so you get hit again for every request
rather than getting one hit and then it being the cache's problem.

Obviously, though, this isn't really a topic specific to libxslt; I don't
really use the other processors much so I don't know how they compare or what
gotchas there might be that are particular to this processor. Indeed, I would
try to avoid working toward a single processor's strengths or weaknesses.

-- 
Gerph {djf0-.3w6e2w2.226,6q6w2q2,2.3,2m4}
URL: http://www.movspclr.co.uk/
... Eyes to the heavens, screaming at the sky;
    Trying to send you messages, but choking on goodbye.


