Re: [xml] libxml2 XPath performances

From: Petr Pajas <pajas ufal ms mff cuni cz>
To: xml gnome org, veillard redhat com
Cc: Luca Padovani <lpadovan cs unibo it>, Jose Commins <axora myrealbox com>
Subject: Re: [xml] libxml2 XPath performances
Date: Sun, 14 Nov 2004 18:46:05 +0100

On Friday 12 November 2004 12:51, Daniel Veillard wrote:

On Fri, Nov 12, 2004 at 12:40:22PM +0100, Petr Pajas wrote:

Hi All,

let me contribute a bit to your discussion on this topic.

Daniel Veillard <veillard redhat com> writes:

On Thu, Nov 11, 2004 at 07:43:07PM +0100, Luca Padovani wrote:

<snip>

1) Assuming xmlXPathOrderDocElems has been called, if the compared
nodes have different parents, then the nearest common ancestor is
looked for (which is O(depth of the tree)), whereas it would IMO
suffice to compare the parents (node->parent is always an element,
right?) using indexed doc-order (getting parents is O(1)), right?


Sorry, the above reasoning of mine was wrong (comparing parents doesn't say 
anything if one of the nodes is a descendant of the other).

  No, if xmlXPathOrderDocElems() has been called I would expect the
comparison to be O(1) for nodes on different elements,


You still need to find their nearest common ancestor, which means you have to 
traverse the ancestor axis, which I wouldn't call O(1).

a list walk 
is still needed if for examples the nodes are siblings on a same element.
Maybe libxml2 does not do the optimum there.

<snip>

Utilizing the fact that non-indexed elements (by index I mean the
number assigned to them by xmlXPathOrderDocElems) are usually
surrounded by indexed elements, the above could be improved as
follows: as soon as we find a cur with an index, we could iterate from
node2 backward (->prev) and either we find cur, or we find another
node with an index. In the latter case we simply compare the indexes,
i.e. (not tested)


<snip>

This would make comparing sibling text nodes O(1) in most cases since
HAS_INDEX(node1->next) && HAS_INDEX(node2->prev) would hold in most
cases (since as I already mentioned, except for a few cases like
entities, processing instructions, and XInclude, text nodes are usually
surrounded by elements).


  Interesting, yes, I would suggest trying that in the code and running the 
libxml2 and libxslt regression tests to see if this actually works :-)


<snip>

  Try it and send a patch :-)


Ok, a patch is attached. The only snag is that I'm not sure what is the 
function supposed to return if node1 and node2 belong to different documents? 
I let it return -1 (which is what I believe it did before).

On my rather complex >30MB neither too "flat", not too deeply-nested, 
dictionary-like XML document the change makes a significant improvement for 
just count(//node()): from 30s to only 15s (on a document "indexed" with 
xmlXPathOrderDocElems). I guess it's worth it ;-)

Cheers,
-- Petr

Attachment: xpath.patch
Description: Text Data

Attachment: pgpWHPXYt6caZ.pgp
Description: PGP signature

Follow-Ups:
- Re: [xml] libxml2 XPath performances
  - From: Jose Commins

References:
- [xml] XPath versus node walking performance.
  - From: Jose Commins
- Re: [xml] libxml2 XPath performances
  - From: Petr Pajas
- Re: [xml] libxml2 XPath performances
  - From: Daniel Veillard

[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]