Re: [xml] xmlXPathNodeSetSort performance

Keeping it in header has the advantage that it remains generic and can be used from anywhere and with any 
type of parameters (e.g. not only for sorting xmlNodePtrs). If in .c file, there can only be one sort 
function. Although since the sort is used from only one place, it does not matter :-)
Another thing would be the need to move XP_OPTIMIZED_NON_ELEM_COMPARISON to a header included both from the 
sort.c and xpath.c. But that would probably be for better.

I have done more performance tests. Timsort behaves better than I thought (or rather Shellsort worse than I 
thought). For sorted nodesets of size n like '/item[true()]' Timsort does only n-1 comparisons, unlike 
current Shellsort. ('/item' does not need sorting.) It does not matter whether the data are small or big, 
Timsort wins. For partially unsorted sequences like '/item[(position() mod 10) = 0] | /item[(position() mod 
10) = 1] | ..' it wins too. It wins both at number of comparisons or valgrind instructions (of the whole 


-----Original Message-----
From: xml-bounces gnome org [mailto:xml-bounces gnome org] On Behalf Of Stefan Behnel
Sent: 25. července 2012 18:26
To: xml
Subject: Re: [xml] xmlXPathNodeSetSort performance

Vojtech Fried, 25.07.2012 17:45:
Second version of Timsort patch, slightly more polished. It builds on 
my gcc, I have fixed some warnings and merged the two headers into 
one. I did not move the code to .c file though, because the sort 
implementation uses some macro magic, i.e. the functions you see in 
the code are really function "templates" and they are "instantiated" 
with the name and type you choose with the macros (basically it is a 
poor man's C++ template system :-). I could remove the macros and 
specialize the functions for libxml xmlNodePtr, but that seems quite ugly to me.

What about moving it all into a .c file, then adding a new entry point at the end that specialises the 
preceding code for the exact use case in xpath.c?

Thanks for doing this, BTW, the node set sorting performance is a huge problem. Not only for very large lists 
of nodes, but also for multi-step XPath expressions, which currently result in multiple sorting and 
re-sorting steps.

Note that node comparison can be very costly, so another thing to investigate would be to add a flag to the 
node set struct that remembers if it's sorted already. That would allow the sort algorithm to skip to the 
merging step directly, instead of traversing the whole node list first.

xml mailing list, project page xml gnome org

[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]