Re: [xml] extremely long xslt transformation times



On Mon, Sep 30, 2002 at 01:10:38PM +0200, Steinborn Thomas wrote:
We got some surprising results when comparing the transformation times for
files of identical size (5MB). One file transforms in 11 seconds and the
other in 283 seconds.

We did some ananlysis of the problem and found that the doc taking so long
to transform contains a node list of 22,000 elements that are all child
elements of a single element. 

We used Quantify to track down the bottleneck. The result is when applying
the template

      <xsl:apply-templates select="@*|node()"/>

a long time is spent in evaluating the XPath "@*|node()".

The compiled XPath is
SORT
      UNION
              COLLECT attrs all
              COLLECT nodes all

The call stack when evaluating is:
xmlXPathCompOpEval
      xmlXPathNodeSetSort
              xmlXPathCmpNodes

The sort operation performs worst case in this scenario since the two
collected node sets are already sorted. xmlXPathNodeSetSort uses Shell sort,
which is O(log N * N) number of compares. xmlXPathCmpNodes itself is O(N)
since it has to traverse the list of all child nodes to determine the order.
Altogether this is O(N * N * log N)

  Okay

Now we see three different ways to improve performance in this case:

1) optimize the XPath compilation not to sort since the collected node sets
are already sorted and the union will not change that

  Err, in general the union operator will force a sort, this is a 
specific case which seems hard to generalize.

2) Improve xmlXPathCmpNodes by adding an child index to xmlNode, thus
quickly comparing just the index and not traversing the list

  Changing the processeddocument format, hum no, this is not possible
at this point.

3) Avoid sorting a node set by adding a flag to the xmlXPathNodeSet
structure if the set is sorted already

  Okay but the merge operation of the union in general will require another
sort so, how do you compute that a priori

We'd like to ask the list what approache we should take for this case?

  2 sounds impossible technically, I don't want to change the format
of the in-memory representation. 1 and 3 are okay at an architectural
level but I don't see how to do this easilly in practice. But if you
know how to do this, feel free to work on it, I take patches !

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]