Re: [xml] confusing xpath performance characteristics

From: Stefan Behnel <stefan_ml behnel de>
To: xml gnome org
Subject: Re: [xml] confusing xpath performance characteristics
Date: Mon, 09 Nov 2009 19:23:47 +0100


Stefan Behnel, 09.11.2009 09:57:

It's the last operation, merging and sorting large sets of results, that
makes this extremely slow - it takes 92% of the evaluation time in my tests
(using libxml2 2.7.5). It's much faster to traverse the document in a
single step, and just select single attributes from it, that can quickly be
appended to the node set.

I imagine that this step could actually be optimised away in many cases
(like the case above, where results are guaranteed to be found in doc
order), so I guess it's just in there to avoid too much special casing. But
it seriously kills the performance here.


Would it be ok to add a new "int isInDocumentOrder" field to the xmlNodeSet
struct that would be true for node sets that are known to be in document order?

That would make it easy to skip the sorting step in all cases where
building the node set follows document order anyway. Given that node
comparison is horribly expensive (more than 90% of the sort time in my
tests), I think it's absolutely worth avoiding the sorting step whenever
possible.

Also, for sorted node sets, xmlXPathNodeSetAdd() could compare the new node
to the last node in the node-set and clear the flag if the new node breaks
the document order. That way, only N-1 comparisons would be required for a
sorted set of N nodes, instead of something like O(N^2) currently.

Stefan

References:
- Re: [xml] [lxml-dev] confusing xpath performance characteristics
  - From: Stefan Behnel

[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]