Re: [xslt] XSLT compilation is slow for big XSLTs



On Thu, Aug 20, 2015 at 07:21:50PM +0200, Christian Ceelen wrote:
Hi,

Ok. I think i found the answer by trial and error myself. I was not able to
look up the named templates in the hash table for the (pre-) compiled
templates.

So i followed Nick's suggest and came up with this first draft:
   https://github.com/GNOME/libxslt/pull/1

  So I applied the patch, this looks reasonable, we need to add to the
end of the structure, to minimize risk of ABI breakage. Should change
the 1024 value that Nick pointed out.

I tested locally and didn't get any surprises. I also created a simple test
with 2 named templates that have the same name to test for collision
detection. The new code behaves in that case as the previous one. But so
far I have no idea yet where and how to add it to the tests in the test
suite as this is a test for failure detection.

Looking at a second glance at the imports.c and xsltFindTemplate i think it
would be worthwhile to add the new function i created for lookup in the
hash table in the file imports.c and try to encapsulate the behavior there.
Do you think i should refactor the code in that direction?

  That's where things started to go wrong, I modified xsltFindTemplate()
to look in the namedTemplatesHash instead of digging the templates list,
patch is attached, and that raised a number of errors in the regression tests.
So it seems to me that namedTemplatesHash isn't initialized with all the
named templates from the stylesheet and hence the collision detection
is also likely wrong even if it passed an initial simple test.

  So I think we need to give it more scrutiny, i.e. each time a template is
added to style->templates we need to make sure we also add it to
style->namedTemplatesHash.


Best regards,

Christian


On Thu, Aug 20, 2015 at 10:28 AM, Christian Ceelen <
christian ceelen gmail com> wrote:

Hi Nick,

Thank you very much for the answer. That sounds like a very good idea. Yet
would the validation still be correct, if the hashtable is local to the
stylesheet in which the template was defined? I was wondering what would
happen if the same template name is defined in two files, where one of the
files then imports or includes the other. Concerning the imports i think
the standard is clear, the defined template has lower preference, so having
two with the same name sounds legit to me. But what happens when the file
is included?
My current understanding is that real validation requires to walk the
include tree for each template name. Would that be correctly replicating
the existing behavior?

Btw. In xsltInernals.h:L1514 i found in struct _xsltStylesheet:
void *templatesHash;
Is this table already containing what we want?

Best regards,

Christian

On Wed, Aug 19, 2015 at 1:00 PM, Nick Wellnhofer <wellnhofer aevum de>
wrote:

On 19/08/2015 10:47, Christian Ceelen wrote:

At the moment i do not have a deep understanding of the libXSLT code. My
current guess is, that this point at which the template name is
validated to
be unique in the stylesheet. The current algorithm seems to be walking
linearly through the list of all already created templates and is
comparing
the name with each. Given that the compilation process is walking
through all
templates this loop means that we have an O(n^2) algorithm (with n being
the
number of template instances in the stylesheet to compile).
The huge number of templates in my XSLTs are just so far over the edge,
that
the compilation takes 35s. I ran a test in which i skipped the loop. This
reduced the compilation time to below 2.5 s.

Would anyone let me know if i have understood the code? What can i do to
improve the code that would get easily accepted and released? I am open
to any
kind of suggestions on what to do to speed this validation step up with
a data
structure or mechanism already existing ?


Yes, template names are verified by searching a linked list which is
O(n^2). Template lookup by name uses the same list:

    https://github.com/GNOME/libxslt/blob/master/libxslt/imports.c#L375

It shouldn't be too hard to change the code to use an xmlHashTable:

    https://github.com/GNOME/libxml2/blob/master/include/libxml/hash.h

Simply add a the hash table to struct _xsltStylesheet


https://github.com/GNOME/libxslt/blob/master/libxslt/xsltInternals.h#L1509

add some initialization and cleanup, then use xmlHashAddEntry2 and
xmlHashLookup2 with the template's name and namespace URI. This should make
name verification O(n) overall and lookup of a single template by name O(1).

What is now the developer or working repository for libxslt?
I found the git repository on git.gnome.org <http://git.gnome.org>, the
GNOME/libxslt on github, plus several forks which seem to be working
repos.
Can you point me please to the repo against which i should use for
further
investigation or patches?


The official repo is the one at git.gnome.org:

    https://git.gnome.org/browse/libxslt/

Nick

_______________________________________________
xslt mailing list, project page http://xmlsoft.org/XSLT/
xslt gnome org
https://mail.gnome.org/mailman/listinfo/xslt




_______________________________________________
xslt mailing list, project page http://xmlsoft.org/XSLT/
xslt gnome org
https://mail.gnome.org/mailman/listinfo/xslt


-- 
Daniel Veillard      | Open Source and Standards, Red Hat
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/

Attachment: xsltFindTemplate.patch
Description: Text document



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]