Hi Daniel, All, I'm working with large XML documents (over 500K elements). It's structure (written as XPath) looks like /v/body[1]/word/valency_frames[1]/frame/frame_elements[1]/element/form and below form is usually rather shallow structure with only a few elements named "node". I'm interested in selecting form elements. I use xmlXPathOrderDocElems to increase performance. If I write the XPath as above, I get the results in a second or so. But if I write it e.g. as /v/body[1]/word/valency_frames[1]/frame/frame_elements[1]/descendant::form (note that the subtree of frame_elements[1] is usually very small), it takes several minutes. I used gprof to find out why. Here is the interesting part: index % time self children called name <spontaneous> [1] 98.8 432.52 0.00 xmlXPathNodeSetMerge [1] Examining of xpath.c revealed that while AXIS_DESCENDANT uses xmlXPathNodeSetMerge, AXIS_CHILD uses xmlXPathNodeSetMergeUnique, which makes it a lot faster. The doc-comment for both of them is the same. My question is First, what is the difference between these two: the code indicates that the latter assumes that the node-sets are disjoint, right? Can xmlXPathNodeSetMergeUnique be used for AXIS_DESCENDANT and AXIS_DESCENDANT_OR_SELF as well? (I probed if it would speed descendant::form and it did, but maybe the change is semantically incorrect). Alternatively, is there some space for optimization in xmlXPathNodeSetMerge? Thanks, -- Petr
Attachment:
pgp0noDpKStc5.pgp
Description: PGP signature