[xslt] Re: Re: xsl:key degenerate performance



"Marc-Antoine Parent" <maparent@acm.org> wrote in message
AE717301-565C-11D8-AF14-003065D4F272@acm.org">news:AE717301-565C-11D8-AF14-003065D4F272@acm.org...
> Hello! I just looked at the sort routine and found out it was a shell
> sort. Good enough for most cases; but on an already sorted list it
> indeed has a cost.
> How about exploiting some alternative, faster sorting algorithms?
> Heapsort or Mergesort are generally faster than shell sort; and their
> worst case is acceptable, unlike quicksort.
> A better option is an algorithm designed to be resilient to
> already-sorted lists, such as Smoothsort.
> http://en.wikipedia.org/wiki/Smoothsort
> If there is interest, I can propose an implementation.
> Marc-Antoine

The degenerate performance comes not just from the sorting, but also from
the comparison function used, which takes longer as the number of nodes
increases. However, when comparing adjacent sibling nodes, the comparison
function works very fast, since it can rely on an optimisation of just
looking at the nodes' "next" and "previous" pointers.

My guess is that the vast majority of node sets in typical stylesheets (a)
are already sorted in document order (because of the way the expression is
evaluated), and (b) often contain adjacent sibling nodes. I have tried out
adding the following to xmlXPathNodeSetSort(), and it appears to work well
in practice.

/* See if already sorted */
len = set->nodeNr;
sorted = 1;
for (i = 0; i < len - 1; ++i) {
    if (xmlXPathCmpNodes(set->nodeTab[i],
        set->nodeTab[i + 1]) == -1) {
        sorted = 0;
        break;
    }
}
if (sorted)
    return;

A better sorting algorithm could also be tried, but I think it would need to
exploit the comparison of adjacent nodes if it is to work as well as this.






[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]