[xslt] xsl:key degenerate performance



I have come across a performance issue with the implementation of xsl:key when 
dealing with a large number of nodes. To demonstrate this, if one processes 
documents like

<KeyTest>
	<Cell><Value>0</Value></Cell>
	<Cell><Value>1</Value></Cell>
	<Cell><Value>2</Value></Cell>
	<!--Lots more similar lines follow-->
</KeyTest>

using 

<xsl:key name="cells-key" match="/KeyTest/Cell" use="Value"/>

the processing time seems to grow exponentially/geometrically as the number of 
nodes grows. For 1000 nodes, the processing is virtually instantaneous, for 10000 
nodes, it takes about 6 seconds, while for 100000 nodes processing time is in the 
order of 1 hour! (This is with libxslt-1.1.2.)

I have been able to check what is happening in a profiler, and it turns that virtually 
all of this time is spent in the single call to xmlXPathNodeSetSort() which occurs as 
part of the compilation of the "match" expression, the idea being to put the resulting 
node set into document order. For example, for the 10000 node case, this sort 
operation takes all but about 0.1 second of the total processing time. This is 
particularly unfortunate in this case, since the nodes are already sorted, and the 
whole operation is unnecessary!

The next question is what to do about this. Some possibilities might be:

(a) Don't sort the result of the xsl:key "match" expression. From my reading of the 
XSLT standard, it does not suggest that the result returned by the key() function 
should be in document order (or any particular order).

(b) Since implementing (a) might break some existing stylesheets which relied on 
the document order assumption, introduce some global setting to force sorting on or 
off.

(c) Try and be a bit smarter in detecting when it is actually necessary to sort, since 
in almost all cases, the nodes will be in document order in any case.

It would be really nice to fix this, since the other parts of the xsl:key, key() 
implementation appear to be really fast, and the only thing holding back the 
handling of larger documents is the unnecessary sort operation.

Chris Trengove





[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]