Re: [xml] Deep-copy of xsltStylesheetPtr?



On Sat, Sep 18, 2004 at 04:17:01PM -0700, Rasmus Lerdorf wrote:
On Sat, 18 Sep 2004, Daniel Veillard wrote:
Well sure, caching the result is always best :-) but it's unclear from
your test how much the stylesheet load/compilation costs. Trying to use
a read-only stylesheet from the cache, at least as an experiment, would be
interesting too.

Some direct measurements:

Loading data.xml doc  0.621080398559 ms.
Loading data.xsl doc  0.472068786621 ms.
Compiling stylesheet  0.223159790039 ms.
Applying stylesheet   0.448942184448 ms.
dump_mem              0.166893005371 ms.

And if I make the data.xml file 50k instead of 1k while keeping data.xsl
the same I get:

Loading data.xml doc  5.368947982788 ms.
Loading data.xsl doc  0.576972961425 ms.
Compiling stylesheet  0.231027603149 ms.
Applying stylesheet  12.245893478394 ms.
dump_mem              9.248018264775 ms.

-rw-r--r--  1 root  wheel  1142  Jul 19 17:17 data.xml
-rw-r--r--  1 root  wheel   831  Jul 19 17:17 data.xsl
-rw-r--r--  1 root  wheel  51367 Sep 18 16:01 data_big.xml

We are dealing with rather small numbers here so they float a bit, but
loading the docs is obviously dependent on their size, and it looks
like, at least for a small stylesheet, compiling it takes about half the
time of loading its doc.
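
To put those figures in proportion, here is a small worked calculation in C. It is only a sketch: the timings are the measurements quoted above rounded to three decimals, and the struct and function names are made up for illustration. With these inputs the stylesheet work (load plus compile) is roughly 36% of the total for the 1k document and roughly 3% for the 50k document, and the compile/load ratio comes out at about 0.47 and 0.40.

```c
/* Timings in milliseconds, rounded from the measurements quoted above. */
struct run {
    double load_xml;   /* Loading data.xml doc */
    double load_xsl;   /* Loading data.xsl doc */
    double compile;    /* Compiling stylesheet */
    double apply;      /* Applying stylesheet  */
    double dump;       /* dump_mem             */
};

static const struct run small_doc = { 0.621, 0.472, 0.223, 0.449, 0.167 };
static const struct run big_doc   = { 5.369, 0.577, 0.231, 12.246, 9.248 };

/* Sum of all five phases. */
static double total_ms(const struct run *r)
{
    return r->load_xml + r->load_xsl + r->compile + r->apply + r->dump;
}

/* Fraction of the total spent loading and compiling the stylesheet,
 * i.e. the part a stylesheet cache could save. */
static double stylesheet_share(const struct run *r)
{
    return (r->load_xsl + r->compile) / total_ms(r);
}

/* Compile time relative to the time needed to load the stylesheet doc. */
static double compile_to_load_ratio(const struct run *r)
{
    return r->compile / r->load_xsl;
}
```

So a cached stylesheet matters for small inputs, while for larger inputs applying the stylesheet and dumping the result dominate.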

  Parsing should mostly be linear, with a non-negligible startup time
to create and initialize the parser context on small files. With the
xmlReadxxx APIs a parser can be reused for multiple parses, so if you're
really into performance for small files, having a pool of ready-to-use
parsers might help.
  As an experiment I enhanced xmllint to show timing around the --copy
operation. I get (relatively fast processor, but compiled in debug mode):

paphio:~/XML -> ls -l db5000.xml
-rw-rw-r--    1 veillard www       1004640 Sep 19 13:42 db5000.xml
paphio:~/XML -> xmllint --copy --timing --noout db5000.xml
Parsing took 117 ms
Copying took 124 ms
Freeing original took 45 ms
Freeing took 79 ms
paphio:~/XML ->

  Copying a document is more expensive than reparsing ... ahem, I may need
to do a bit of valgrind/kcachegrind to try to fix this :-\
  Freeing the copy takes more time too: because the copy is not using a
dictionary, more actual free() calls are needed. I'm afraid that at the
moment, keeping a preparsed tree in memory is a really bad option. It
increases VM pressure, and at least on Linux with the unified memory handling,
it's better to have 1MB of data in the buffer cache than 8MB of parsed data
in memory, which will take more time to access if you need to copy it.

Daniel

-- 
Daniel Veillard      | Red Hat Desktop team http://redhat.com/
veillard redhat com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


