Re: [xml] [Patch] Optimizing '//' in XPath expressions
- From: Daniel Veillard <veillard redhat com>
- To: Nick Wellnhofer <wellnhofer aevum de>
- Cc: xml gnome org
- Subject: Re: [xml] [Patch] Optimizing '//' in XPath expressions
- Date: Fri, 24 Aug 2012 12:21:37 +0800
Hi Nick,
On Sun, Aug 19, 2012 at 07:42:38PM +0200, Nick Wellnhofer wrote:
When investigating the libxslt performance problem reported in bug
#657665, I found that '//' in XPath expressions can be very slow
when working on large subtrees.
I guess everybody agree on that statement :-)
One of the reasons is the seemingly quadratic time complexity of the
duplicate checks when merging result nodes. The other is a missed
optimization for expressions of the form
'descendant-or-self::node()/axis::test'. Since '//' is expanded to
'/descendant-or-self::node()/', this type of expression is quite
common. Depending on the axis of the expression following the
'descendant-or-self' step, the following replacements can be made:
from descendant-or-self::node()/child::test
to descendant::test
from descendant-or-self::node()/descendant::test
to descendant::test
from descendant-or-self::node()/self::test
to descendant-or-self::test
from descendant-or-self::node()/descendant-or-self::test
to descendant-or-self::test
'test' can be any kind of node test.
That sounds right !
All those are forward progressing axis so the order in the node set
is also preserved.
With these replacements the possibly huge result of
'descendant-or-self::node()' doesn't have to be stored temporarily,
but can be processsed in one pass. If the resulting nodeset is
small, the duplicate checks aren't a problem.
yes, as I said there is very little optimization in the
XPath engine and as you pointed out the way expressions are compiled
is far from optimal, i just followed the spec description
I found that there already is a function called
xmlXPathRewriteDOSExpression which performs this optimization for a
very limited set of cases. It employs a complicated iteration scheme
for rewritten expressions. AFAICS, this can be avoided by simply
changing the axis of the expression like described above.
yes I think at the time I was a bit afraid of really modifying the
compiled tree but the 4 expressions above are clearly big enhancements.
I suspect it's just the top of the iceberg, there is a number of other
post-compilation optimization which can certainly be made, but with
less drastic improvements.
With the attached patch against libxml2 and the files from bug
#657665 I got the following results.
Before:
$ time xsltproc/xsltproc --noout service-names-port-numbers.xsl
service-names-port-numbers.xml
real 2m56.213s
user 2m56.123s
sys 0m0.080s
After:
$ time xsltproc/xsltproc --noout service-names-port-numbers.xsl
service-names-port-numbers.xml
real 0m3.836s
user 0m3.764s
sys 0m0.060s
I also ran the libxml2 and libxslt test suites with the patch and
couldn't detect any breakage.
Impressive, a patch which removes more cruft than it adds and
provide an order of magnitude improvement! please send more patches :-)
Gratefully reviewed, applied and pushed,
thanks a lot !
Now I still need to review/apply the sort improvements...
Daniel
--
Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
daniel veillard com | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/
[
Date Prev][
Date Next] [
Thread Prev][
Thread Next]
[
Thread Index]
[
Date Index]
[
Author Index]