Re: [xslt] xsl:key degenerate performance



Chris Trengove said:
> I have come across a performance issue with the implementation of
> xsl:key when
> dealing with a large number of nodes. To demonstrate this, if one
> processes
> documents like
>
> <KeyTest>
> 	<Cell><Value>0</Value></Cell>
> 	<Cell><Value>1</Value></Cell>
> 	<Cell><Value>2</Value></Cell>
> 	<!--Lots more similar lines follow-->
> </KeyTest>
>
> using
>
> <xsl:key name="cells-key" match="/KeyTest/Cell" use="Value"/>
>
> the processing time seems to grow exponentially/geometrically as the
> number of
> nodes grows. For 1000 nodes, the processing is virtually
> instantaneous, for 10000
> nodes, it takes about 6 seconds, while for 100000 nodes processing
> time is in the
> order of 1 hour! (This is with libxslt-1.1.2.)
>
> I have been able to check what is happening in a profiler, and it
> turns that virtually
> all of this time is spent in the single call to
> xmlXPathNodeSetSort() which occurs as
> part of the compilation of the "match" expression, the idea being to
> put the resulting
> node set into document order. For example, for the 10000 node case,
> this sort
> operation takes all but about 0.1 second of the total processing
> time. This is
> particularly unfortunate in this case, since the nodes are already
> sorted, and the
> whole operation is unnecessary!
>
> The next question is what to do about this. Some possibilities might
> be:
>
> (a) Don't sort the result of the xsl:key "match" expression. From my
> reading of the
> XSLT standard, it does not suggest that the result returned by the
> key() function
> should be in document order (or any particular order).
>
> (b) Since implementing (a) might break some existing stylesheets
> which relied on
> the document order assumption, introduce some global setting to
> force sorting on or
> off.
>
> (c) Try and be a bit smarter in detecting when it is actually
> necessary to sort, since
> in almost all cases, the nodes will be in document order in any
> case.
>
> It would be really nice to fix this, since the other parts of the
> xsl:key, key()
> implementation appear to be really fast, and the only thing holding
> back the
> handling of larger documents is the unnecessary sort operation.
>
> Chris Trengove

Thanks for the information, and for the analysis.  Could you please
open a bug report (enhancement request) for this so it doesn't get
overlooked?  Thanks -

Bill



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]