RE: [xslt] RE: [xml] does xsltproc caches subexpressions

From: "Buchcik, Kasimier" <k buchcik 4commerce de>
To: "Frans Englich" <frans englich telia com>
Cc: xslt gnome org
Subject: RE: [xslt] RE: [xml] does xsltproc caches subexpressions
Date: Fri, 26 May 2006 12:14:24 +0200
Hi, 

> -----Original Message-----
> From: Frans Englich [mailto:frans englich telia com] 
> 
> On Monday 22 May 2006 17:17, Buchcik, Kasimier wrote:
> [...]
> 
> > Theoretically the evaluation could stop on the first "foo" 
> element in
> > doc-order, but it does not: first the predicate "@boo" is applied to
> > both "foo" elements, which results in a node-set containing both
> > elements.
> > Then the predicate "@zoo" is applied to the node-set, and 
> only here we
> > can
> > "exit early" on the first match. I know this goes quite hand in hand
> > with
> > what the XPath says, but I think we loose opportunities for 
> optimization
> > here.
> >
> > I think the result would be same same, if the predicates 
> were evaluated
> > all
> > in a row on each node of the initial node-set. If such a 
> mechanism would
> > be more complex than the one existing in an other question.
> >
> > It would be interesting to hear how Saxon has implemented this.
> 
> I am not deeply familiar with Saxon, but have read parts of 
> it, and in broad 
> terms it is very much similar to my own implementation[1], 
> which in some 
> cases have been modelled after Saxon.
> 
> Evaluation is done by building a chain of Java-style 
> Iterators. That is,
> evaluation logic doesn't sit in the AST node and builds a 
> list containing its 
> result, but an Iterator is allocated which handles requests on an 
> item-per-item basis. An example:
> 
> /foo/bar/[1]
> 
> Evaluating this expression starts by asking "[1]" for one 
> item. This in turn asks "bar" for one item, which in turn asks "foo"
for one 
> item. So, no more "foo" elements are detected than the first one that
contains 
> a bar element.

Ah, this is nice. The machinery is encapsulated in specialized
iterators, so one don't has to think much when initiating
the iterations; plus the ability to stop subsequent iterations
(e.g. if we have a "[1]").

> This lazy-evaluation, chaining of pull-based iterator applies 
> to pretty much 
> everything(even XQuery user defined functions when allowed), 
> but sometimes 
> has to be broken due to sorting, for example. Here's another example:
> 
> XPath 2.0 has the "range" expression which allows generation 
> of sequences of 
> integers. For example, "1 to 4" generates the sequence (1, 2, 
> 3, 4). The 
> implementation consists of two parts, an AST node(called 
> RangeExpression) and 
> an iterator(RangeIterator). When evaluating the expression (1 
> to 10000)
> [last()], the RangeExpression returns an RangeIterator which 
> subsequently is 
> asked to return the last item. The end point is that it 
> evaluates in constant 
> time. (Not something one would encounter in the wild that 
> often, but it 
> demonstrates the point.)

I'm not sure I understand this correctly: do you mean that
the iterator still will process all 10000 iterations?
If you say that the RangeIterator will be asked to return
the last item, then the RangeIterator could just return the
integer 10000; after all it should be a specilized class for
ranges. But maybe I got you wrong.

> This is not an exact description of Saxon nor my 
> implementation, but the priniple is in the right direction.

It would be interesting to see if some parts of Libxml2's
XPath module can be refactored in the future to also use this
technique.
I think many people using XPath on the edge implicitely rely on
optimizations in the implementation for certain expressions.
This particilar case here - I refer to Jeni Tennisons's 
"generate-english-index" template in the docbook stylesheets -
is a good example for such expected optimizations.

[...]

> 1.
> It has not yet been released, but is becoming an XQuery 
> 1.0/XPath 2.0/XSL-T 
> 2.0 implementation for KDE. Keywords in a nutshell is 
> probably "statically 
> typed", "pull-based data handling". Tokenizer is handwritten, 
> parser Bison 
> based. Source is here:
> svn://anonsvn.kde.org/home/kde/trunk/kdenonbeta/kdom/xpath
> 
> To browse the documentation, run `doxygen` and browse 
> xpath/html/index.html. 
> The classes "Expression" and "Iterator" are central.

Great! Seems like this will become the "standard" open source
tool for the second generation of XPath/XSLT in the future :-)

By the way, sorry that I moved this thread over to xslt gnome org:
what I thought would be a libxslt related issue has become a
libxml2 issue at the end.

Regards,

Kasimier
Follow-Ups:
- Re: [xslt] RE: [xml] does xsltproc caches subexpressions
  - From: Frans Englich
[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]