Re: [xslt] RE: [xml] does xsltproc caches subexpressions

From: Frans Englich <frans englich telia com>
To: "Buchcik, Kasimier" <k buchcik 4commerce de>
Cc: The Gnome XSLT library mailing-list <xslt gnome org>
Subject: Re: [xslt] RE: [xml] does xsltproc caches subexpressions
Date: Wed, 24 May 2006 11:18:54 +0000

On Monday 22 May 2006 17:17, Buchcik, Kasimier wrote:
[...]

> Theoretically the evaluation could stop on the first "foo" element in
> doc-order, but it does not: first the predicate "@boo" is applied to
> both "foo" elements, which results in a node-set containing both
> elements.
> Then the predicate "@zoo" is applied to the node-set, and only here we
> can
> "exit early" on the first match. I know this goes quite hand in hand
> with
> what the XPath says, but I think we loose opportunities for optimization
> here.
>
> I think the result would be same same, if the predicates were evaluated
> all
> in a row on each node of the initial node-set. If such a mechanism would
> be more complex than the one existing in an other question.
>
> It would be interesting to hear how Saxon has implemented this.

I am not deeply familiar with Saxon, but have read parts of it, and in broad 
terms it is very much similar to my own implementation[1], which in some 
cases have been modelled after Saxon.

Evaluation is done by building a chain of Java-style Iterators. That is, 
evaluation logic doesn't sit in the AST node and builds a list containing its 
result, but an Iterator is allocated which handles requests on an 
item-per-item basis. An example:

/foo/bar/[1]

Evaluating this expression starts by asking "[1]" for one item. This in turn 
asks "bar" for one item, which in turn asks "foo" for one item. So, no more 
"foo" elements are detected than the first one that contains a bar element.

This lazy-evaluation, chaining of pull-based iterator applies to pretty much 
everything(even XQuery user defined functions when allowed), but sometimes 
has to be broken due to sorting, for example. Here's another example:

XPath 2.0 has the "range" expression which allows generation of sequences of 
integers. For example, "1 to 4" generates the sequence (1, 2, 3, 4). The 
implementation consists of two parts, an AST node(called RangeExpression) and 
an iterator(RangeIterator). When evaluating the expression (1 to 10000)
[last()], the RangeExpression returns an RangeIterator which subsequently is 
asked to return the last item. The end point is that it evaluates in constant 
time. (Not something one would encounter in the wild that often, but it 
demonstrates the point.)

This is not an exact description of Saxon nor my implementation, but the 
priniple is in the right direction.

Cheers,

		Frans

1.
It has not yet been released, but is becoming an XQuery 1.0/XPath 2.0/XSL-T 
2.0 implementation for KDE. Keywords in a nutshell is probably "statically 
typed", "pull-based data handling". Tokenizer is handwritten, parser Bison 
based. Source is here:
svn://anonsvn.kde.org/home/kde/trunk/kdenonbeta/kdom/xpath

To browse the documentation, run `doxygen` and browse xpath/html/index.html. 
The classes "Expression" and "Iterator" are central.

References:
- RE: [xslt] RE: [xml] does xsltproc caches subexpressions
  - From: Buchcik, Kasimier

[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]