Re: [xslt] Bug Fix doubles the XSL preprocessing time



On Sun, Apr 09, 2006 at 06:40:23AM -0700, Jerome Pesenti wrote:
> Greetings,
> I noticed that the processing time of my large
> stylesheets  (i.e., running xsltParseStylesheetDoc)
> almost doubled between 1.1.14 and 1.1.15.
> 
> I was able to trace the change to the following fix:
> 
> http://cvs.gnome.org/viewcvs/libxslt/libxslt/xsltutils.c?r1=1.89&r2=1.90
> 
> I am not very familiar with the actual bug fixed
> 
> libxslt/xsltutils.c: fixed a bug when size of xmlXPathContext
>   changes, uses the libxml2 alloc and dealloc functions instead.

  Very simple. If the xmlXPathContext structure changes, libxslt
would likely *crash*, because it relied on the size of the structure
it was compiled against and not the size expected by the libxml2
version in use.
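
  To make the size problem concrete, here is a minimal sketch in plain C.
It is only a model: CtxV1, CtxV2, lib_new_context and bytes_overrun are
hypothetical names, not libxml2 definitions. CtxV1 stands for the layout
the caller was compiled against, CtxV2 for the layout the installed
library really uses.

```c
#include <stdlib.h>

/* Hypothetical layout from the header the caller was compiled against. */
typedef struct {
    void *doc;
    void *node;
} CtxV1;

/* Hypothetical layout used by the newer library installed at run time. */
typedef struct {
    void *doc;
    void *node;
    void *cache;   /* field added in the newer version */
    int   flags;   /* field added in the newer version */
} CtxV2;

/* Library-side constructor: always allocates the size the library expects,
 * so callers never need to know sizeof() of the structure. */
void *lib_new_context(void) {
    return calloc(1, sizeof(CtxV2));
}

/* Number of bytes the library would touch past the end of a buffer that
 * the caller sized with its own, older sizeof(). */
size_t bytes_overrun(void) {
    return sizeof(CtxV2) > sizeof(CtxV1) ? sizeof(CtxV2) - sizeof(CtxV1) : 0;
}
```

  A caller that sizes the buffer itself from the old layout leaves
bytes_overrun() bytes unaccounted for, while a caller that goes through
lib_new_context() always gets a correctly sized object.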

> but I can see how adding xmlXPathNewContext for each
> xpath compiled can be expensive....
> 
> Here is a suggested patch which seems to fix the
> problem:
[...]
> +    xctxt = (xmlXPathContextPtr) alloca(sizeof(xmlXPathContext));
> +    memset(xctxt, 0 , (size_t) sizeof(xmlXPathContext));
> +

  The suggested fix is not acceptable as it reintroduces the bug.
You could try to force the reuse of the same xmlXPathContextPtr for
all the compilations instead of allocating a new one each time;
that could work, but the current patch is not acceptable.

> Also, on a similar topic, is the precomputation of the
> whole stylesheet required to do a transformation?

  Yes, for performance (see below), and also because you need to
reassemble the various includes and imports, i.e. all entities composing
the stylesheet must be parsed.

> Let's say that I have a very large stylesheet and only
> 10% of it is used during a transformation (for
> example, because certain modes are never called).
> Currently all the XPaths are precompiled even though
> 90% of them will never be used. Is there a way to
> compile the XPaths *only* when they are used? I tried
> to set nopreproc to 0 but that didn't seem to work.

  If you do that then the stylesheets are modified by transformations
and are no longer shareable between parallel transformations. This is a
huge regression (or you need to lock/unlock each time you check the
stylesheet content in a shared environment, which would kill performance
too).

  Usually the cost of compilation is small compared to transform
time. Also, one key to high-performance processing is the ability
to reuse compiled stylesheets over and over, even in parallel.

  It seems your processing is based on one compilation per transform,
and such an approach is clearly the worst you can get performance-wise
if your stylesheets are huge.

Daniel

-- 
Daniel Veillard      | Red Hat http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

