RE: [xslt] RE: [xml] does xsltproc caches subexpressions


> -----Original Message-----
> From: xslt-bounces gnome org [mailto:xslt-bounces gnome org] 
> On Behalf Of Frans Englich
> On Monday 22 May 2006 10:54, Buchcik, Kasimier wrote:
> [...]
> > The tiny "[1]) = 1" is the part 
> > of the expression, which e.g. Saxon can use to optimize the 
> evaluation.
> > With 1732 nodes in the key, Saxon took ~12s on a test box, 
> while Libxslt
> > took ~50s for evaluation of this expression. On the other hand, if I
> > changed this expression to use "count(.....) > 1" (i.e. without the
> > "[1]")
> > then Saxon needed already ages to process 300 nodes, while 
> Libxslt times
> > were still linear. So Saxon is optimized, if "[1]" is used;
> > Libxslt/Libxml2
> > not.
> What kind of optimizations does libxml2 do on predicated, in general?
> For the "[1]" predicate, I think two types can be done(when 
> we're talking about implementations like Saxon and libxml2):
> * Analysis of context dependency, and as result conclude how 
> the predicate must be evaluated. For example, "1" doesn't depend on
> context and will therefore always evaluate to the same. However, an
> involving "." would have to be evaluated each time. End point, some 
> predicates only have to be evaluated once.
> * Early exit. For example, one knows that "1" will never 
> match beyond the first item.
> Saxon implements both.

The following might be a bit vague, since I only scratched Libxml2's
I think Libxml2 does only the "early exit" optimization; at least I
found code indicating the opposite. Although in the case of [n], it
evaluate "n", but just triggers execution of optimized evaluation
wrt the position. The 3 functions are:
1) xmlXPathCompOpEvalFirst()       - used for "[1]"
2) xmlXPathCompOpEvalLast()        - used for "last()"
3) xmlXPathNodeCollectAndTestNth() - used for "[n]"

But there's one catch for the "early exit": In Libxml2 each predicate is
evaluated on the node-set result of its preceding predicate; so the
"early exit" is only possible at the last level of predicates.


<xsl:stylesheet version="1.0"
  <xsl:key name="mykey" match="//foo" use="@val"/>
  <xsl:template match="/">    
	  <xsl:value-of select="count(key('mykey',
'a')[ boo][@zoo][1])"/>

<?xml version="1.0"?>
<foo val="a" boo="x" zoo="x">
  <foo val="a" boo="x" zoo="x"/>

Theoretically the evaluation could stop on the first "foo" element in
doc-order, but it does not: first the predicate "@boo" is applied to
both "foo" elements, which results in a node-set containing both
Then the predicate "@zoo" is applied to the node-set, and only here we
"exit early" on the first match. I know this goes quite hand in hand
what the XPath says, but I think we loose opportunities for optimization

I think the result would be same same, if the predicates were evaluated
in a row on each node of the initial node-set. If such a mechanism would
be more complex than the one existing in an other question.

It would be interesting to hear how Saxon has implemented this.



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]