Re: [xslt] xsltproc for very large documents?



On Fri, Jan 17, 2003 at 08:55:52AM +0100, Eric van der Vlist wrote:
> Hi Stuart,
> 
> On Fri, 2003-01-17 at 08:29, Stuart Hungerford wrote:
> 
> > I'm working with some very large (well, large by my
> > standards) XML documents of around 300MB each.  
> 
> I can't tell specifically for libxslt, but AFAIK this is a design issue
> with XSLT which is not "streamable" in that it requires some kind
> representation of the input documents in memory to work.
> 
> You might be interested by STX [1], a project started a while ago to
> define "Streaming Transformations for XML".

  Right.
More specifically for xsltproc, there are two aspects of the problem
I can think of:
    - first, XSLT in general requires loading the full document into
      memory (well, xsltproc does; other tools try to work around
      this), which is a limitation of the general XSLT processing
      model. This shouldn't give atrocious performance unless you're
      swapping, i.e. the working set doesn't fit in memory and the
      hard drive is used constantly to swap blocks in and out. Such
      thrashing usually makes processing very slow; that's normal,
      and the only solution is more RAM.
    - second, if no thrashing happens, there may be a problem that
      was reported once on the list about libxslt/xsltproc speed
      when sorting very large node sets. It was reported only once
      and was related to very long sequences of children, for example.
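
  To check the second point without shipping 300MB of data, one option
is to generate a synthetic document with a very long run of sibling
children and a stylesheet that sorts them. The following is only a
sketch: the element names (items, item), the key attribute, and the
file names are made up for illustration and are not taken from the
original report.

```python
import os
import xml.etree.ElementTree as ET


def build_flat_document(n_children):
    """Return an XML string whose root has n_children sibling <item> elements."""
    root = ET.Element("items")  # hypothetical element name
    for i in range(n_children):
        item = ET.SubElement(root, "item")
        # Scrambled but deterministic keys, so the sort has real work to do.
        item.set("key", str((i * 7919) % n_children))
    return ET.tostring(root, encoding="unicode")


# Minimal XSLT 1.0 stylesheet that sorts all children numerically by @key.
SORT_STYLESHEET = """<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/items">
    <xsl:for-each select="item">
      <xsl:sort select="@key" data-type="number"/>
      <xsl:value-of select="@key"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
"""


def write_sample(directory, n_children):
    """Write sample.xml and sort.xsl (hypothetical names) into directory."""
    xml_path = os.path.join(directory, "sample.xml")
    xsl_path = os.path.join(directory, "sort.xsl")
    with open(xml_path, "w", encoding="utf-8") as f:
        f.write(build_flat_document(n_children))
    with open(xsl_path, "w", encoding="utf-8") as f:
        f.write(SORT_STYLESHEET)
    return xml_path, xsl_path
```

  Running "xsltproc sort.xsl sample.xml" on the generated files for
increasing values of n_children should show whether the run time grows
much faster than the input size.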

  The first one I can't really fix; the second I can work on, assuming
I have an example. So could you check whether your box is "thrashing",
i.e. problem 1? If not, could you provide a smaller sample of your data
and a stylesheet exhibiting the problem, with some measurement of the
time needed to process the full data?
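
  One rough way to tell the two cases apart is to run the
transformation as a child process and look at its major page faults
and peak memory alongside the wall-clock time. The sketch below
assumes a Unix-like system (it relies on Python's resource module);
the helper name measure and the file names in the usage note are
hypothetical.

```python
import resource
import subprocess
import sys
import time


def measure(command):
    """Run command and report wall time, peak RSS, and major page faults."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    completed = subprocess.run(command, stdout=subprocess.DEVNULL)
    elapsed = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "returncode": completed.returncode,
        "wall_seconds": elapsed,
        # High-water mark over all children waited for so far, not a delta.
        # Reported in kilobytes on Linux and in bytes on macOS.
        "max_rss": after.ru_maxrss,
        # Faults that needed disk I/O; a large count suggests swapping.
        "major_faults": after.ru_majflt - before.ru_majflt,
    }
```

  For example, measure(["xsltproc", "--noout", "style.xsl", "big.xml"])
would time one run while discarding the output. A high major_faults
count together with a max_rss close to the physical RAM points at
thrashing; a long run with few major faults points at the processing
itself. xsltproc also has a --timing option that prints the time spent
in each phase.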

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


