Re: [xml] Profiling and possible speed improvements



On Tue, Apr 23, 2002 at 08:41:10PM +0200, frodol dds nl wrote:
Hi folks,

My xslt stylesheet took a bit long to apply (many hours, but amongst other 
things it creates an index of my document, which is probably a bit of a
stretch to do with xslt), so I profiled it with gprof to see where it
spent its time. It seems that most time is consumed within xmlXPathCmpNodes,
which in turn is called from xmlXPathNodeSetSort. As this uses a Shell
sort algorithm, and it seems that I am handling somewhat large nodesets,
I wondered whether it would help to use another sorting algorithm (with
a little luck, that would reduce the number of node comparisons needed). 
To keep it simple, I tried the qsort algorithm of glibc-2.2.5, which
indeed uses a quick-sort. This seemed to help quite a lot.

I have attached the diff I used to illustrate the changes needed. 
Basically, I created xmlXPathCmpNodes_qsort that does the same job
as xmlXPathCmpNodes, except that it conforms to qsort semantics
(don't return -2, -1 means less than, use pointers to elements).
Don't apply this blindly, it needs some more work.

  Hum, this definitely sounds interesting, one should add the following
checks:
    - first verify that the resulting sort produces identical results
      (running on the test-suite input might be sufficient)
    - second check at configure time whether qsort is really available
      and if not fallback to the existing one.
    - another very effective improvement would be to use the lines held
      in the nodes (if line numbering was collected when the input document
      was parsed) and use them to speed up the compare in most cases.

 Still I have a hard time believing xsltproc could take an order of hours to
process a document, but well .... Can you produce the 10 top lines reported by
gprof before and after ?

  TIA,

Daniel


-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]