[xslt] Profiling and possible speed improvements



Hi folks,

My xslt stylesheet took a bit long to apply (many hours, but amongst other 
things it creates an index of my document, which is probably a bit of a
stretch to do with xslt), so I profiled it with gprof to see where it
spent its time. It seems that most time is consumed within xmlXPathCmpNodes,
which in turn is called from xmlXPathNodeSetSort. As this uses a Shell
sort algorithm, and it seems that I am handling somewhat large nodesets,
I wondered whether it would help to use another sorting algorithm (with
a little luck, that would reduce the number of node comparisons needed). 
To keep it simple, I tried the qsort algorithm of glibc-2.2.5, which
indeed uses a quick-sort. This seemed to help quite a lot.

I have attached the diff I used to illustrate the changes needed. 
Basically, I created xmlXPathCmpNodes_qsort that does the same job
as xmlXPathCmpNodes, except that it conforms to qsort semantics
(don't return -2, -1 means less than, use pointers to elements).
Don't apply this blindly, it needs some more work.

Thanks for the good work,
   Frodo

-- 
Frodo Looijaard <frodol@dds.nl>  PGP key and more: http://huizen.dds.nl/~frodol
Defenestration n. (formal or joc.):
  The act of removing Windows from your computer in disgust, usually followed
  by the installation of Linux or some other Unix-like operating system.
*** /tmp/libxml2-2.4.20/xpath.c	Thu Apr 11 04:35:16 2002
--- libxml2-2.4.20/xpath.c	Tue Apr 23 20:36:22 2002
***************
*** 1410,1413 ****
--- 1410,1491 ----
  
  /**
+  * xmlXPathCmpNodes:
+  * @node1:  the first node
+  * @node2:  the second node
+  *
+  * Compare two nodes w.r.t document order
+  *
+  * Returns 0 in case of error -1 if first point < second point, 0 if
+  *         that's the same node, 1 otherwise
+  */
+ int
+ xmlXPathCmpNodes_qsort(xmlNodePtr *node1_ptr, xmlNodePtr *node2_ptr) {
+     int depth1, depth2;
+     xmlNodePtr cur, root,node1=*node1_ptr,node2=*node2_ptr;
+ 
+     if ((node1 == NULL) || (node2 == NULL)) 
+     /*
+      * a couple of optimizations which will avoid computations in most cases
+      */
+     if (node1 == node2)
+ 	return(0);
+     if ((node1->type == XML_NAMESPACE_DECL) ||
+         (node2->type == XML_NAMESPACE_DECL))
+ 	return(-1);
+     if (node1 == node2->prev)
+ 	return(-1);
+     if (node1 == node2->next)
+ 	return(1);
+ 
+     /*
+      * compute depth to root
+      */
+     for (depth2 = 0, cur = node2;cur->parent != NULL;cur = cur->parent) {
+ 	if (cur == node1)
+ 	    return(-1);
+ 	depth2++;
+     }
+     root = cur;
+     for (depth1 = 0, cur = node1;cur->parent != NULL;cur = cur->parent) {
+ 	if (cur == node2)
+ 	    return(1);
+ 	depth1++;
+     }
+     /*
+      * Distinct document (or distinct entities :-( ) case.
+      */
+     if (root != cur) {
+ 	return(0);
+     }
+     /*
+      * get the nearest common ancestor.
+      */
+     while (depth1 > depth2) {
+ 	depth1--;
+ 	node1 = node1->parent;
+     }
+     while (depth2 > depth1) {
+ 	depth2--;
+ 	node2 = node2->parent;
+     }
+     while (node1->parent != node2->parent) {
+ 	node1 = node1->parent;
+ 	node2 = node2->parent;
+ 	/* should not happen but just in case ... */
+ 	if ((node1 == NULL) || (node2 == NULL)) 
+ 	    return(0);
+     }
+     /*
+      * Find who's first.
+      */
+     if (node1 == node2->next)
+ 	return(1);
+     for (cur = node1->next;cur != NULL;cur = cur->next)
+ 	if (cur == node2)
+ 	    return(-1);
+     return(1); /* assume there is no sibling list corruption */
+ }
+ 
+ /**
   * xmlXPathNodeSetSort:
   * @set:  the node set
***************
*** 1423,1426 ****
--- 1501,1505 ----
  	return;
  
+ #if 0
      /* Use Shell's sort to sort the node-set */
      len = set->nodeNr;
***************
*** 1440,1443 ****
--- 1519,1526 ----
  	}
      }
+ #else
+     /* Try QuickSort here */
+     qsort(set->nodeTab,set->nodeNr,sizeof(set->nodeTab[0]),xmlXPathCmpNodes_qsort);
+ #endif
  }
  


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]