Re: [xml-bindings]Memoization

From: Gary Benson <gary inauspicious org>
To: xml-bindings gnome org
Subject: Re: [xml-bindings]Memoization
Date: Thu, 22 Aug 2002 15:51:28 +0100

On Thu, Aug 22, 2002 at 08:13:21AM -0400, Daniel Veillard wrote:

>   Hum, there is no such mechanism; the only way would be to add an
> interface to libxslt to plug in a document cache. Note that for a single
> transformation all the document() results are cached (this is actually
> required by the XSLT spec), but there is no mechanism for a more global
> cache. It could make sense, and it could be a very small change if one
> doesn't try to implement a default cache in libxslt itself.

I'm not trying to cache the results of transformations: each one only
happens once in my application anyway so caching would not help.  The
problem I have is that I wrote the thing to be really easy to develop
for first and foremost, and threw everything else out of the window.

The app in question is used to generate static HTML for inauspicious.org
and the problem is mainly that each of the 250-odd diary entries
requires seven files to be parsed: the huge (~350k) file containing all
the diary entries, the XSLT file that converts one day's worth of
entries into i.o markup and the XSLT file that converts i.o markup to
HTML. The second XSLT file includes three more files, one of which
parses the file full of diary entries again to create the calendar.  I
could speed it up by rearranging it all with speed as a priority, but
that would be a lot of work; if I can simply cache the documents that
come from xmlParseFile(), that would work a treat.
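A minimal sketch of memoizing file parses at the Python level, keyed on the absolute path and the file's modification time so that an edited file is parsed again. It assumes the standard library's xml.etree.ElementTree as a stand-in for the libxml2 bindings, and parse_file_cached is a hypothetical helper. As the original message points out, a cache at this level cannot see documents that stylesheets load through document().

```python
import os
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Cache entries keyed by absolute path; each remembers the file's
# modification time so an edited file is parsed again.
_parse_cache: Dict[str, Tuple[float, ET.ElementTree]] = {}


def parse_file_cached(path: str) -> ET.ElementTree:
    """Return a parsed tree for path, reusing an earlier parse if the
    file has not changed since. Callers must treat the result as
    read-only, because every caller shares the same object."""
    key = os.path.abspath(path)
    mtime = os.path.getmtime(key)
    hit = _parse_cache.get(key)
    if hit is not None and hit[0] == mtime:
        return hit[1]
    tree = ET.parse(key)
    _parse_cache[key] = (mtime, tree)
    return tree
```

Sharing one parsed tree between many transformations is only safe if nothing modifies it in place; if something does, the cache has to hand out copies instead.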

Any ideas on where to start?

Cheers,
Gary

> > On Thu, Aug 08, 2002 at 02:34:07AM +0100, Gary Benson wrote:
> > 
> > > Hiya,
> > > 
> > > I have an application which uses the Python bindings, and I've been
> > > looking at ways to speed it up. The profiler shows that about three
> > > quarters of the time is spent parsing XML files and there are _a lot_ of
> > > repeat parsings going on.
> > > 
> > > Now, it occurred to me that I could speed it up tremendously if I could
> > > somehow cache the parsed xmlDocs, but it isn't possible at the Python
> > > level since a lot of the repeat parsings are files loaded in the XSLT
> > > stylesheets via document().  To be truly effective I'd have to cache
> > > them within libxml itself.  Ideally, you'd call xmlParsedDocsCache(1) to
> > > enable the cache, such that the normal behaviour is unchanged.
> > > 
> > > I'd appreciate some insight from those who know libxml best (so probably
> > > Daniel :)).  Is what I'm talking about possible, and if it is, where
> > > would be a nice place to implement it?  Obviously I'd like to be able to
> > > call xmlParsedDocsCache() from Python code, but how do I make that
> > > happen with the automated generator stuff? Libxml is a big bugger and
> > > it's hard to know where to start :)
> > > 
> > > Thanks in advance,
> > > Gary
> > > 
> > > [ gary inauspicious org ][ GnuPG 85A8F78B ][ http://inauspicious.org/ ]

Follow-Ups:
- Re: [xml-bindings]Memoization
  - From: Daniel Veillard

References:
- [xml-bindings]Memoization
  - From: Gary Benson
- [xml-bindings]Memoization
  - From: Gary Benson
- Re: [xml-bindings]Memoization
  - From: Daniel Veillard
