On Friday 12 November 2004 12:51, Daniel Veillard wrote:
On Fri, Nov 12, 2004 at 12:40:22PM +0100, Petr Pajas wrote:Hi All, let me contribute a bit to your discussion on this topic. Daniel Veillard <veillard redhat com> writes:On Thu, Nov 11, 2004 at 07:43:07PM +0100, Luca Padovani wrote:
<snip>
1) Assuming xmlXPathOrderDocElems has been called, if the compared nodes have different parents, then the nearest common ancestor is looked for (which is O(depth of the tree)), whereas it would IMO suffice to compare the parents (node->parent is always an element, right?) using indexed doc-order (getting parents is O(1)), right?
Sorry, the above reasoning of mine was wrong (comparing parents doesn't say anything if one of the nodes is a descendant of the other).
No, if xmlXPathOrderDocElems() has been called I would expect the comparison to be O(1) for nodes on different elements,
You still need to find their nearest common ancestor, which means you have to traverse the ancestor axis, which I wouldn't call O(1).
a list walk is still needed if for examples the nodes are siblings on a same element. Maybe libxml2 does not do the optimum there.
<snip>
Utilizing the fact that non-indexed elements (by index I mean the number assigned to them by xmlXPathOrderDocElems) are usually surrounded by indexed elements, the above could be improved as follows: as soon as we find a cur with an index, we could iterate from node2 backward (->prev) and either we find cur, or we find another node with an index. In the latter case we simply compare the indexes, i.e. (not tested)
<snip>
This would make comparing sibling text nodes O(1) in most cases since HAS_INDEX(node1->next) && HAS_INDEX(node2->prev) would hold in most cases (since as I already mentioned, except for a few cases like entities, processing instructions, and XInclude, text nodes are usually surrounded by elements).Interesting, yes, I would suggest trying that in the code and running the libxml2 and libxslt regression tests to see if this actually works :-)
<snip>
Try it and send a patch :-)
Ok, a patch is attached. The only snag is that I'm not sure what is the function supposed to return if node1 and node2 belong to different documents? I let it return -1 (which is what I believe it did before). On my rather complex >30MB neither too "flat", not too deeply-nested, dictionary-like XML document the change makes a significant improvement for just count(//node()): from 30s to only 15s (on a document "indexed" with xmlXPathOrderDocElems). I guess it's worth it ;-) Cheers, -- Petr
Attachment:
xpath.patch
Description: Text Data
Attachment:
pgpWHPXYt6caZ.pgp
Description: PGP signature