[xml] xpath - descendant::something versus */something performance

Hi Daniel, All,

I'm working with large XML documents (over 500K elements).  It's
structure (written as XPath) looks like
and below form is usually rather shallow structure with only a few
elements named "node".  I'm interested in selecting form elements. I
use xmlXPathOrderDocElems to increase performance. If I write the
XPath as above, I get the results in a second or so. But if I write it
e.g. as


(note that the subtree of frame_elements[1] is usually very small), it
takes several minutes. I used gprof to find out why. Here is the
interesting part:

index % time    self  children    called     name
[1]     98.8  432.52    0.00                 xmlXPathNodeSetMerge [1]

Examining of xpath.c revealed that while AXIS_DESCENDANT uses
xmlXPathNodeSetMerge, AXIS_CHILD uses xmlXPathNodeSetMergeUnique,
which makes it a lot faster. The doc-comment for both of them is the
same. My question is

First, what is the difference between these two: the code indicates
that the latter assumes that the node-sets are disjoint, right?  Can
xmlXPathNodeSetMergeUnique be used for AXIS_DESCENDANT and
AXIS_DESCENDANT_OR_SELF as well? (I probed if it would speed
descendant::form and it did, but maybe the change is semantically
incorrect). Alternatively, is there some space for optimization in


-- Petr

Attachment: pgp0noDpKStc5.pgp
Description: PGP signature

[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]