Re: [xslt] Isn't it time to make libxslt multi-threaded?



On Thu, Jul 26, 2012 at 01:59:38AM +0400, Дмитрий Грибов wrote:
> Hi, folks!
> 
> This is not about bugs or the FAQ, and it's going to be a long story. Yet I
> believe it is really worth it.
> 
> We have been using xslt for web-serving for, I believe, 10+ years. Today we
> have quite dedicated and robust template trees that process a huge
> and dedicated xml with libxslt. All is nice and fine, but the average
> transformation time is 0.15s, and that is turning from sad to unacceptable.
> Processors nowadays have many cores, but brute-force speed does not look
> really promising.
> 
> As I was searching for a way to fix the problem I considered some options:
> 1. Simplify layout and templates. Unacceptable for many reasons.
> 
> 2. Abandon xslt. I believe the speed issue is fixable, and in every other
> respect xslt is a complete winner - so not an option.
> 
> 3. Partial transformation + application-level assembling and caching of
> output fragments. This is good, but still not good enough.
> 
> 4. Multithreaded async partial transformation + application-level caching
> + application level assembly. That sounds promising (and looks fearsome).
> 
> 
> So I stuck to option 3.
> And then...
> 
> As I dug into the idea, I realized that I was going to spend
> tons of time emulating, from outside the xslt engine, a concept that is
> quite simple and native to xslt itself: asynchronous transformation. The
> threading magic must reside within libxslt! That becomes obvious quite soon
> once you start thinking about it. And that must be quite easy to do now, as
> libxml is thread-safe already.
> 
> Isn’t it time to do some threading?

  Well well well ... I discovered that thread only yesterday, sorry I'm
behind. I guess I still need to say something, so a few quick reactions:
  - yes some parts of xslt evaluation could be parallelized, but that's with
    a theoretical perfect 1.0 model; with extensions I'm always worried about
    uncontrolled side effects in user provided extensions
  - I'm a C coder, and I know it's *hard* to parallelize code, we would
    have to keep a pool of threads, check from the hardware how many it
    makes sense to run in parallel, and bulletproof the code. It's far
    from trivial
  - I have close to zero time to spend at the moment on libxslt besides
    trying to fix the worst bugs I'm being pointed at >:->
  - did the profiling on your test cases show something "interesting" ?
    see xsltproc --norman (or --profile :-)
  - why was libxslt fast enough 10 years ago and looking slow now, that
    to me makes little sense, did your data/xslt grow out of control ?
  - what is the variability factor between two consecutive requests,
    do you use the same document but different stylesheets ? or are the
    stylesheets the same but the document changes ?
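
  To make the pool-of-threads point above concrete, here is a minimal
sketch, in Python rather than C and not libxslt code. It assumes the work
can be split into fragments that are fully independent (no shared state,
no extension side effects). transform_fragment and transform_all are
hypothetical names, and the "transformation" is only a stand-in.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def transform_fragment(fragment: str) -> str:
    """Hypothetical stand-in for one independent partial transformation."""
    return f"<div>{fragment.upper()}</div>"


def transform_all(fragments, max_workers=None):
    """Transform independent fragments in a thread pool and join the results."""
    # Size the pool from the hardware; fall back to 1 if the count is unknown.
    workers = max_workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so assembly stays deterministic.
        return "".join(pool.map(transform_fragment, fragments))
```

  The sketch only covers the easy part (sizing and ordering); it says
nothing about whether a given stylesheet actually splits into independent
pieces, which is the hard question.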

 My coding experience is that optimizing blindly is usually a good
way to spend a lot of time without much result. By blindly I mean
assuming that your own XSLT would parallelize well (is there a big
for-each applied on the top elements, or a zillion small ones on small
leaf elements ?) and that this is actually where we could gain speed;
that's not something I would bet too much of my own time on without actual
data :-)
Parallelizing could also break the self balancing optimization of libxml2 XPath
(a bit of query on-the-fly optimization which can be quite effective
but must be disabled for parallel processing on a stylesheet).

  So I'm afraid I can't come up tomorrow with a patch magically solving
your speed problems, especially without a better understanding of
where the time is actually spent.

  Usually people cache XSLT output when they need fast answers (and
to lower CPU usage). IIRC the websites of Die Welt and The Register do
just that, but apparently you have already looked at that option
thoroughly. I'm still wondering, you were suggesting something along
the lines of <xsl:cache>, maybe as some internal extension for caching,
could you explain ?
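
  For reference, the kind of output caching I mean can be sketched as
below. This is a minimal illustration in Python, not something libxslt
provides: TransformCache and fake_transform are hypothetical names, the
cache is an unbounded in-memory dict, and it assumes the output depends
only on the stylesheet and document bytes (no parameters, no external
documents).

```python
import hashlib


class TransformCache:
    """Cache transformation output keyed by stylesheet and document content."""

    def __init__(self, transform):
        self._transform = transform  # callable(stylesheet, document) -> bytes
        self._store = {}
        self.misses = 0

    def get(self, stylesheet: bytes, document: bytes) -> bytes:
        # Hash both inputs so the key stays small even for huge documents.
        key = hashlib.sha256(stylesheet).digest() + hashlib.sha256(document).digest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._transform(stylesheet, document)
        return self._store[key]


def fake_transform(stylesheet: bytes, document: bytes) -> bytes:
    """Hypothetical stand-in for the real (expensive) XSLT transformation."""
    return stylesheet + b":" + document
```

  How much this helps depends on the answer to the variability question:
if the document changes on every request, the cache will almost never hit.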

  So even if that thread was quite long I'm still a bit puzzled
and wondering if there is actually anything I could do considering
I have (again) very little time for libxslt :-)

Daniel

-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
daniel veillard com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library  http://libvirt.org/

