Re: [xml] xmlXPathNodeSetSort performance
- From: Vojtech Fried <vfried opentext com>
- To: xml <xml gnome org>
- Subject: Re: [xml] xmlXPathNodeSetSort performance
- Date: Thu, 26 Jul 2012 13:45:11 +0000
Keeping it in header has the advantage that it remains generic and can be used from anywhere and with any
type of parameters (e.g. not only for sorting xmlNodePtrs). If in .c file, there can only be one sort
function. Although since the sort is used from only one place, it does not matter :-)
Another thing would be the need to move XP_OPTIMIZED_NON_ELEM_COMPARISON to a header included both from the
sort.c and xpath.c. But that would probably be for better.
I have done more performance tests. Timsort behaves better than I thought (or rather Shellsort worse than I
thought). For sorted nodesets of size n like '/item[true()]' Timsort does only n-1 comparisons, unlike
current Shellsort. ('/item' does not need sorting.) It does not matter whether the data are small or big,
Timsort wins. For partially unsorted sequences like '/item[(position() mod 10) = 0] | /item[(position() mod
10) = 1] | ..' it wins too. It wins both at number of comparisons or valgrind instructions (of the whole
sort).
/Vojtech
-----Original Message-----
From: xml-bounces gnome org [mailto:xml-bounces gnome org] On Behalf Of Stefan Behnel
Sent: 25. července 2012 18:26
To: xml
Subject: Re: [xml] xmlXPathNodeSetSort performance
Vojtech Fried, 25.07.2012 17:45:
Second version of Timsort patch, slightly more polished. It builds on
my gcc, I have fixed some warnings and merged the two headers into
one. I did not move the code to .c file though, because the sort
implementation uses some macro magic, i.e. the functions you see in
the code are really function "templates" and they are "instantiated"
with the name and type you choose with the macros (basically it is a
poor man's C++ template system :-). I could remove the macros and
specialize the functions for libxml xmlNodePtr, but that seems quite ugly to me.
What about moving it all into a .c file, then adding a new entry point at the end that specialises the
preceding code for the exact use case in xpath.c?
Thanks for doing this, BTW, the node set sorting performance is a huge problem. Not only for very large lists
of nodes, but also for multi-step XPath expressions, which currently result in multiple sorting and
re-sorting steps.
Note that node comparison can be very costly, so another thing to investigate would be to add a flag to the
node set struct that remembers if it's sorted already. That would allow the sort algorithm to skip to the
merging step directly, instead of traversing the whole node list first.
Stefan
_______________________________________________
xml mailing list, project page http://xmlsoft.org/ xml gnome org https://mail.gnome.org/mailman/listinfo/xml
[
Date Prev][
Date Next] [
Thread Prev][
Thread Next]
[
Thread Index]
[
Date Index]
[
Author Index]