Re: [xml] libxml2 v2.5.7 appears much slower than libxml2 v2.4.24

On Mon, Aug 11, 2003 at 07:05:43PM +0100, kevin ruscoe ubs com wrote:
Hi all

* My toolchain is as follows:
docbook-dtd           4.2
docbook-xsl           1.61.3
libxml2 (old) 2.4.24
libxml2 (new) 2.5.7
Linux                 2.4.9-e.24enterprise

  okay, nothing surprising.

* libxml2 v2.5.7 seems dramatically (about 20x) slower than libxml2 v2.4.24
in my environment:

$ time $PD/libxml2/2.4.24/bin/xmllint --noout --postvalid --xinclude update_schema.dbk
$PD/libxml2/2.4.24/bin/xmllint --noout --postvalid --xinclude   2.75s user 0.71s system 70% cpu 4.895 total

$ time $PD/libxml2/2.5.7/bin/xmllint --noout --postvalid --xinclude update_schema.dbk
$PD/libxml2/2.5.7/bin/xmllint --noout --postvalid --xinclude update_schema.db  73.19s user 7.62s system 83% 
cpu 1:36.34 total

  Ouch, is that on the same box? There is something going on, 83% cpu
for 20x more time is absolutely not normal.
  The very first thing to do would be to run with the --timing flag to
try to identify if the problem is in the XInclude processing or the
DTD post validation or split between them.

* Here is the strace -c output:

$ strace -c -f $PD/libxml2/2.4.24/bin/xmllint --noout --postvalid --xinclude update_schema.dbk
["/sbcimp/run/pd/libxml2/2.4.24/bin/xmllint", "--noout", "--postvalid", "--xinclude", "update_schema.dbk"], 
[/* 233 vars */]) = 0
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 64.40    0.492642         411      1198        49 stat64
 18.76    0.143504         219       654        65 open
 10.28    0.078624          32      2487           read


% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 70.67   10.850226         330     32840       652 stat64
 21.92    3.364601         204     16532        65 open
  4.51    0.692042          10     69912           read

* The problem seems to be that v2.5.7 opens files from the DTD package far more often than does the 
previous version.  For example:

$ strace -e trace=stat64 -o strace.out -f $PD/libxml2/2.4.24/bin/xmllint --noout --postvalid --xinclude 
$ grep /home/ruscoekm/my_tree/links/CORE/links/docbook-dtd/ent/iso-pub.ent strace.out|wc -l

$ strace -e trace=stat64 -o strace.out -f $PD/libxml2/2.5.7/bin/xmllint --noout --postvalid --xinclude 
$ grep /home/ruscoekm/my_tree/links/CORE/links/docbook-dtd/ent/iso-pub.ent strace.out|wc -l

Any thoughts on what is happening here.  Is this a known issue or am I doing something wrong?

  Okay I have a guess:
    1/ your XInclude hierarchy is large (approx 30 files).
    2/ all of the Included parts reference the DocBook DOCTYPE

XInclude requires to coalesce IDs, which mean parsing of each the included
parts need to be done with DTD loaded. The older version of XInclude wasn't
doing this and hence wasn't compliant. The new one does it, is fully compliant,
but now load the DocBook DTD in memory for each included instance.

  So this may be normal, though more than one minute of processing for 
a document sounds horribly long, even if loading 30 times the docbook DTD
I think I timed this below 100ms on my machine (some time ago), and 
30x100ms is 3 seconds, not 60 , there is still something really strange.

  I would probably need a (private/obfuscated if needed) copy of the data 
to make a cleaner analysis.

  One obvious way to try to speed this up is to reuse a DTD pool for the
set of XIncluded documents. This could bring back performances to close
to their initial values. But I would need to confirm the problem first.


Daniel Veillard      | Red Hat Network
veillard redhat com  | libxml GNOME XML XSLT toolkit | Rpmfind RPM search engine

[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]