Re: [xml] xmlXPathNodeSetSort performance



Vojtech Fried, 26.07.2012 15:45:
Keeping it in header has the advantage that it remains generic and can
be used from anywhere and with any type of parameters (e.g. not only for
sorting xmlNodePtrs). If in .c file, there can only be one sort
function. Although since the sort is used from only one place, it does
not matter :-) Another thing would be the need to move
XP_OPTIMIZED_NON_ELEM_COMPARISON to a header included both from the
sort.c and xpath.c. But that would probably be for better.

Absolutely. If we ever need it for sorting other kinds of data, we can
simply add another entry point to the source file. Everything else can just
be static and hidden in the module.


I have done more performance tests. Timsort behaves better than I
thought (or rather Shellsort worse than I thought). For sorted nodesets
of size n like '/item[true()]' Timsort does only n-1 comparisons, unlike
current Shellsort. ('/item' does not need sorting.)

Well, "/item[true()]" doesn't need sorting either, if you know that the
underlying set on which you evaluate the condition is always sorted
already. O(n) is way worse than O(1), for most n. That's why I mentioned
the need for a flag "sorted" in the node set structure.


It does not matter
whether the data are small or big, Timsort wins. For partially unsorted
sequences like '/item[(position() mod 10) = 0] | /item[(position() mod
10) = 1] | ..' it wins too. It wins both at number of comparisons or
valgrind instructions (of the whole sort).

Obviously. Tim Peters specifically designed his algorithm to minimise the
number of comparisons, and as I said, node comparisons can be very costly.

Stefan



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]