[xml] xpath - descendant::something versus */something performance



Hi Daniel, All,

I'm working with large XML documents (over 500K elements).  It's
structure (written as XPath) looks like
/v/body[1]/word/valency_frames[1]/frame/frame_elements[1]/element/form
and below form is usually rather shallow structure with only a few
elements named "node".  I'm interested in selecting form elements. I
use xmlXPathOrderDocElems to increase performance. If I write the
XPath as above, I get the results in a second or so. But if I write it
e.g. as

/v/body[1]/word/valency_frames[1]/frame/frame_elements[1]/descendant::form

(note that the subtree of frame_elements[1] is usually very small), it
takes several minutes. I used gprof to find out why. Here is the
interesting part:

index % time    self  children    called     name
                                                 <spontaneous>
[1]     98.8  432.52    0.00                 xmlXPathNodeSetMerge [1]



Examining of xpath.c revealed that while AXIS_DESCENDANT uses
xmlXPathNodeSetMerge, AXIS_CHILD uses xmlXPathNodeSetMergeUnique,
which makes it a lot faster. The doc-comment for both of them is the
same. My question is

First, what is the difference between these two: the code indicates
that the latter assumes that the node-sets are disjoint, right?  Can
xmlXPathNodeSetMergeUnique be used for AXIS_DESCENDANT and
AXIS_DESCENDANT_OR_SELF as well? (I probed if it would speed
descendant::form and it did, but maybe the change is semantically
incorrect). Alternatively, is there some space for optimization in
xmlXPathNodeSetMerge?

Thanks,

-- Petr

Attachment: pgp0noDpKStc5.pgp
Description: PGP signature



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]