Re: [xslt] XSLT compilation is slow for big XSLTs



Hi,

No worries. I'm pretty sure this won't be forgotten in the long run. 
Let me know about your ideas on how to improve the code.

Cheers,
Christian

On Mon, Sep 14, 2015 at 3:38 AM, Daniel Veillard <veillard redhat com> wrote:
On Mon, Sep 07, 2015 at 09:47:40AM +0200, Christian Ceelen wrote:
> Hi,
>
> Did anyone find the time to look into this or test this?

  Sorry Christian,

not forgotten but not done yet, will try this week !

  thanks,

Daniel

> Cheers,
> Christian
>
> On Mon, Aug 31, 2015 at 11:47 AM, Christian Ceelen <
> christian ceelen gmail com> wrote:
>
> > Hi,
> >
> > i think i no know what i did wrong while filling the hash table. And yes.
> > My bad ... :
> >
> > xslt.c: 5448
> > + xmlHashAddEntry2(style->namedTemplatesHash, ret->name, ret->nameURI,
> > template);
> >
> > This adds a pointer to the xmlNode in the parsed stylesheet to the hash
> > table. For just detecting collisions that seems to be ok.
> > But it is utterly useless for using it in xsltFindTemplate in 'import.c'
> > as that expects a pointer to the compiled template struct 'xsltTemplate'.
> > So my initial patch for reusing it doing a bad pointer cast here:
> > --- a/libxslt/imports.c
> > +++ b/libxslt/imports.c
> > @@ -400,19 +400,12 @@ xsltFindTemplate(xsltTransformContextPtr ctxt, const
> > xmlChar *name,
> >         return(NULL);
> >      style = ctxt->style;
> >      while (style != NULL) {
> > -       cur = style->templates;
> > -       while (cur != NULL) {
> > -           if (xmlStrEqual(name, cur->name)) {
> > -               if (((nameURI == NULL) && (cur->nameURI == NULL)) ||
> > -                   ((nameURI != NULL) && (cur->nameURI != NULL) &&
> > -                    (xmlStrEqual(nameURI, cur->nameURI)))) {
> > -                   return(cur);
> > -               }
> > -           }
> > -           cur = cur->next;
> > -       }
> > -
> > -       style = xsltNextImport(style);
> > +        if(style->namedTemplatesHash != NULL){
> > +            cur = (xsltTemplatePtr)
> > xmlHashLookup2(style->namedTemplatesHash, name, nameURI);
> > +            if(cur != NULL)
> > +                return (cur);
> > +        }
> > +        style = xsltNextImport(style);
> >      }
> >      return(NULL);
> >  }
> >
> > This explains why it was not working as i had hoped. The hash table
> > returns the xmlNodePtr instead of the xsltTemplatePtr expected by
> > xsltFindTemplate. The patch Daniel does the same, it expects
> > xsltTemplatePtr.
> >
> > So changing in xslt.c: 5448
> > + xmlHashAddEntry2(style->namedTemplatesHash, ret->name, ret->nameURI,
> > template);
> > to
> > + xmlHashAddEntry2(style->namedTemplatesHash, ret->name, ret->nameURI,
> > ret);
> > Makes it possible to reuse the hash table for the named templates also in
> > xsltFindTemplate.
> >
> > I pushed the updated changes to github and attached a clean patch to the
> > email. It is created against 1.1.28.
> >
> > The patch adds the hash table initialization and population to
> > xsltAddTemplate (pattern.c), sets the initial size to a more reasonable 8
> > entries, reuses it in xsltFindTemplate (imports.c), and adds a test case to
> > check that an error is emitted during compilation for template names that
> > are not unique.
> >
> > So far i haven't replaced the helper functions as Nico suggested, as i was
> > thinking of moving the look up code to imports.c and share part of a lookup
> > local to a single stylesheet between xsltFindTemplate and
> > xsltParseStylesheetTemplate. I am not sure if you rather go for a small
> > code duplication or for having small helper classes. In terms of
> > performance i doubt that there is a noticeable difference.
> >
> > Best regards,
> > Christian
> >
> >
> > On Mon, Aug 31, 2015 at 8:27 AM, Daniel Veillard <veillard redhat com>
> > wrote:
> >
> >> On Thu, Aug 20, 2015 at 07:21:50PM +0200, Christian Ceelen wrote:
> >> > Hi,
> >> >
> >> > Ok. I think i found the answer by trial and error myself. I was not
> >> able to
> >> > look up the named templates in the hash table for the (pre-) compiled
> >> > templates.
> >> >
> >> > So i followed Nick's suggest and came up with this first draft:
> >> >    https://github.com/GNOME/libxslt/pull/1
> >>
> >>   So I applied the patch, this looks reasonable, we need to add to the
> >> end of the structure, to minimize risk of ABI breakage. Should change
> >> the 1024 value that Nick pointed out.
> >>
> >> > I tested locally and didn't get any surprises. I also created a simple
> >> test
> >> > with 2 named templates that have the same name to test for collision
> >> > detection. The new code behaves in that case as the previous one. But so
> >> > far I have no idea yet where and how to add it to the tests in the test
> >> > suite as this is a test for failure detection.
> >> >
> >> > Looking at a second glance at the imports.c and xsltFindTemplate i
> >> think it
> >> > would be worthwhile to add the new function i created for lookup in the
> >> > hash table in the file imports.c and try to encapsulate the behavior
> >> there.
> >> > Do you think i should refactor the code in that direction?
> >>
> >>   That's where things started to go wrong, I modified xsltFindTemplate()
> >> to look in the namedTemplatesHash instead of digging the templates list,
> >> patch is attached, and that raised a number of errors in the regression
> >> tests.
> >> So it seems to me that namedTemplatesHash isn't initialized with all the
> >> named templates from the stylesheet and hence the collision detection
> >> is also likely wrong even if it passed an initial simple test.
> >>
> >>   So I think we need to give it more scrutiny, i.e. each time a template
> >> is
> >> added to style->templates we need to make sure we also add it to
> >> style->namedTemplatesHash.
> >>
> >>
> >> > Best regards,
> >> >
> >> > Christian
> >>
> >>
> >> > On Thu, Aug 20, 2015 at 10:28 AM, Christian Ceelen <
> >> > christian ceelen gmail com> wrote:
> >> >
> >> > > Hi Nick,
> >> > >
> >> > > Thank you very much for the answer. That sounds like a very good
> >> idea. Yet
> >> > > would the validation still be correct, if the hashtable is local to
> >> the
> >> > > stylesheet in which the template was defined? I was wondering what
> >> would
> >> > > happen if the same template name is defined in two files, where one
> >> of the
> >> > > files then imports or includes the other. Concerning the imports i
> >> think
> >> > > the standard is clear, the defined template has lower preference, so
> >> having
> >> > > two with the same name sounds legit to me. But what happens when the
> >> file
> >> > > is included?
> >> > > My current understanding is that real validation requires to walk the
> >> > > include tree for each template name. Would that be correctly
> >> replicating
> >> > > the existing behavior?
> >> > >
> >> > > Btw. In xsltInernals.h:L1514 i found in struct _xsltStylesheet:
> >> > > void *templatesHash;
> >> > > Is this table already containing what we want?
> >> > >
> >> > > Best regards,
> >> > >
> >> > > Christian
> >> > >
> >> > > On Wed, Aug 19, 2015 at 1:00 PM, Nick Wellnhofer <wellnhofer aevum de
> >> >
> >> > > wrote:
> >> > >
> >> > >> On 19/08/2015 10:47, Christian Ceelen wrote:
> >> > >>
> >> > >>> At the moment i do not have a deep understanding of the libXSLT
> >> code. My
> >> > >>> current guess is, that this point at which the template name is
> >> > >>> validated to
> >> > >>> be unique in the stylesheet. The current algorithm seems to be
> >> walking
> >> > >>> linearly through the list of all already created templates and is
> >> > >>> comparing
> >> > >>> the name with each. Given that the compilation process is walking
> >> > >>> through all
> >> > >>> templates this loop means that we have an O(n^2) algorithm (with n
> >> being
> >> > >>> the
> >> > >>> number of template instances in the stylesheet to compile).
> >> > >>> The huge number of templates in my XSLTs are just so far over the
> >> edge,
> >> > >>> that
> >> > >>> the compilation takes 35s. I ran a test in which i skipped the
> >> loop. This
> >> > >>> reduced the compilation time to below 2.5 s.
> >> > >>>
> >> > >>> Would anyone let me know if i have understood the code? What can i
> >> do to
> >> > >>> improve the code that would get easily accepted and released? I am
> >> open
> >> > >>> to any
> >> > >>> kind of suggestions on what to do to speed this validation step up
> >> with
> >> > >>> a data
> >> > >>> structure or mechanism already existing ?
> >> > >>>
> >> > >>
> >> > >> Yes, template names are verified by searching a linked list which is
> >> > >> O(n^2). Template lookup by name uses the same list:
> >> > >>
> >> > >>
> >> https://github.com/GNOME/libxslt/blob/master/libxslt/imports.c#L375
> >> > >>
> >> > >> It shouldn't be too hard to change the code to use an xmlHashTable:
> >> > >>
> >> > >>
> >> https://github.com/GNOME/libxml2/blob/master/include/libxml/hash.h
> >> > >>
> >> > >> Simply add a the hash table to struct _xsltStylesheet
> >> > >>
> >> > >>
> >> > >>
> >> https://github.com/GNOME/libxslt/blob/master/libxslt/xsltInternals.h#L1509
> >> > >>
> >> > >> add some initialization and cleanup, then use xmlHashAddEntry2 and
> >> > >> xmlHashLookup2 with the template's name and namespace URI. This
> >> should make
> >> > >> name verification O(n) overall and lookup of a single template by
> >> name O(1).
> >> > >>
> >> > >> What is now the developer or working repository for libxslt?
> >> > >>> I found the git repository on git.gnome.org <http://git.gnome.org>,
> >> the
> >> > >>> GNOME/libxslt on github, plus several forks which seem to be working
> >> > >>> repos.
> >> > >>> Can you point me please to the repo against which i should use for
> >> > >>> further
> >> > >>> investigation or patches?
> >> > >>>
> >> > >>
> >> > >> The official repo is the one at git.gnome.org:
> >> > >>
> >> > >>     https://git.gnome.org/browse/libxslt/
> >> > >>
> >> > >> Nick
> >> > >>
> >> > >> _______________________________________________
> >> > >> xslt mailing list, project page http://xmlsoft.org/XSLT/
> >> > >> xslt gnome org
> >> > >> https://mail.gnome.org/mailman/listinfo/xslt
> >> > >>
> >> > >
> >> > >
> >>
> >> > _______________________________________________
> >> > xslt mailing list, project page http://xmlsoft.org/XSLT/
> >> > xslt gnome org
> >> > https://mail.gnome.org/mailman/listinfo/xslt
> >>
> >>
> >> --
> >> Daniel Veillard      | Open Source and Standards, Red Hat
> >> veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
> >> http://veillard.com/ | virtualization library  http://libvirt.org/
> >>
> >
> >

> _______________________________________________
> xslt mailing list, project page http://xmlsoft.org/XSLT/
> xslt gnome org
> https://mail.gnome.org/mailman/listinfo/xslt


--
Daniel Veillard      | Open Source and Standards, Red Hat
veillard redhat com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | virtualization library  http://libvirt.org/
_______________________________________________
xslt mailing list, project page http://xmlsoft.org/XSLT/
xslt gnome org
https://mail.gnome.org/mailman/listinfo/xslt



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]