Re: [xml] libxml2 v2.5.7 appears much slower than libxml2 v2.4.24
- From: Daniel Veillard <veillard redhat com>
- To: kevin ruscoe ubs com
- Cc: xml gnome org
- Subject: Re: [xml] libxml2 v2.5.7 appears much slower than libxml2 v2.4.24
- Date: Mon, 11 Aug 2003 15:53:27 -0400
On Mon, Aug 11, 2003 at 07:05:43PM +0100, kevin ruscoe ubs com wrote:
Hi all
* My toolchain is as follows:
docbook-dtd 4.2
docbook-xsl 1.61.3
libxml2 (old) 2.4.24
libxml2 (new) 2.5.7
Linux 2.4.9-e.24enterprise
okay, nothing surprising.
* libxml2 v2.5.7 seems dramatically (about 20x) slower than libxml2 v2.4.24
in my environment:
$ time $PD/libxml2/2.4.24/bin/xmllint --noout --postvalid --xinclude update_schema.dbk
$PD/libxml2/2.4.24/bin/xmllint --noout --postvalid --xinclude 2.75s user 0.71s system 70% cpu 4.895 total
$ time $PD/libxml2/2.5.7/bin/xmllint --noout --postvalid --xinclude update_schema.dbk
$PD/libxml2/2.5.7/bin/xmllint --noout --postvalid --xinclude update_schema.db 73.19s user 7.62s system 83%
cpu 1:36.34 total
Ouch, is that on the same box? There is something going on, 83% cpu
for 20x more time is absolutely not normal.
The very first thing to do would be to run with the --timing flag to
try to identify if the problem is in the XInclude processing or the
DTD post validation or split between them.
* Here is the strace -c output:
$ strace -c -f $PD/libxml2/2.4.24/bin/xmllint --noout --postvalid --xinclude update_schema.dbk
execve("/sbcimp/run/pd/libxml2/2.4.24/bin/xmllint",
["/sbcimp/run/pd/libxml2/2.4.24/bin/xmllint", "--noout", "--postvalid", "--xinclude", "update_schema.dbk"],
[/* 233 vars */]) = 0
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
64.40 0.492642 411 1198 49 stat64
18.76 0.143504 219 654 65 open
10.28 0.078624 32 2487 read
[...]
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
70.67 10.850226 330 32840 652 stat64
21.92 3.364601 204 16532 65 open
4.51 0.692042 10 69912 read
* The problem seems to be that v2.5.7 opens files from the DTD package far more often than does the
previous version. For example:
$ strace -e trace=stat64 -o strace.out -f $PD/libxml2/2.4.24/bin/xmllint --noout --postvalid --xinclude
update_schema.dbk
$ grep /home/ruscoekm/my_tree/links/CORE/links/docbook-dtd/ent/iso-pub.ent strace.out|wc -l
40
$ strace -e trace=stat64 -o strace.out -f $PD/libxml2/2.5.7/bin/xmllint --noout --postvalid --xinclude
update_schema.dbk
$ grep /home/ruscoekm/my_tree/links/CORE/links/docbook-dtd/ent/iso-pub.ent strace.out|wc -l
1162
Any thoughts on what is happening here. Is this a known issue or am I doing something wrong?
Okay I have a guess:
1/ your XInclude hierarchy is large (approx 30 files).
2/ all of the Included parts reference the DocBook DOCTYPE
XInclude requires to coalesce IDs, which mean parsing of each the included
parts need to be done with DTD loaded. The older version of XInclude wasn't
doing this and hence wasn't compliant. The new one does it, is fully compliant,
but now load the DocBook DTD in memory for each included instance.
So this may be normal, though more than one minute of processing for
a document sounds horribly long, even if loading 30 times the docbook DTD
I think I timed this below 100ms on my machine (some time ago), and
30x100ms is 3 seconds, not 60 , there is still something really strange.
I would probably need a (private/obfuscated if needed) copy of the data
to make a cleaner analysis.
One obvious way to try to speed this up is to reuse a DTD pool for the
set of XIncluded documents. This could bring back performances to close
to their initial values. But I would need to confirm the problem first.
Daniel
--
Daniel Veillard | Red Hat Network https://rhn.redhat.com/
veillard redhat com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
[
Date Prev][
Date Next] [
Thread Prev][
Thread Next]
[
Thread Index]
[
Date Index]
[
Author Index]