*From*: Daniel Veillard <veillard redhat com>*To*: Steinborn Thomas <Thomas Steinborn erl8 siemens de>*Cc*: "'libxml Mailing Liste'" <xml gnome org>*Subject*: Re: [xml] extremely long xslt transformation times*Date*: Mon, 30 Sep 2002 07:45:02 -0400

On Mon, Sep 30, 2002 at 01:10:38PM +0200, Steinborn Thomas wrote:

We got some surprising results when comparing the transformation times for files of identical size (5MB). One file transforms in 11 seconds and the other in 283 seconds. We did some ananlysis of the problem and found that the doc taking so long to transform contains a node list of 22,000 elements that are all child elements of a single element. We used Quantify to track down the bottleneck. The result is when applying the template <xsl:apply-templates select="@*|node()"/> a long time is spent in evaluating the XPath "@*|node()". The compiled XPath is SORT UNION COLLECT attrs all COLLECT nodes all The call stack when evaluating is: xmlXPathCompOpEval xmlXPathNodeSetSort xmlXPathCmpNodes The sort operation performs worst case in this scenario since the two collected node sets are already sorted. xmlXPathNodeSetSort uses Shell sort, which is O(log N * N) number of compares. xmlXPathCmpNodes itself is O(N) since it has to traverse the list of all child nodes to determine the order. Altogether this is O(N * N * log N)

Okay

Now we see three different ways to improve performance in this case: 1) optimize the XPath compilation not to sort since the collected node sets are already sorted and the union will not change that

Err, in general the union operator will force a sort, this is a specific case which seems hard to generalize.

2) Improve xmlXPathCmpNodes by adding an child index to xmlNode, thus quickly comparing just the index and not traversing the list

Changing the processeddocument format, hum no, this is not possible at this point.

3) Avoid sorting a node set by adding a flag to the xmlXPathNodeSet structure if the set is sorted already

Okay but the merge operation of the union in general will require another sort so, how do you compute that a priori

We'd like to ask the list what approache we should take for this case?

2 sounds impossible technically, I don't want to change the format of the in-memory representation. 1 and 3 are okay at an architectural level but I don't see how to do this easilly in practice. But if you know how to do this, feel free to work on it, I take patches ! Daniel -- Daniel Veillard | Red Hat Network https://rhn.redhat.com/ veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

**References**:**[xml] extremely long xslt transformation times***From:*Steinborn Thomas

[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]