[xml] libxml2 performance
- From: Gary Pennington <Gary Pennington sun com>
- To: libxml2 <xml gnome org>
- Subject: [xml] libxml2 performance
- Date: Tue, 28 May 2002 11:02:45 +0100
Hi,
I know that performance is becoming more of a concern for people, or at
least more people are saying it is. A month or so ago, I started looking
at ways to enhance performance inside libxml2.
Most applications can stand some performance tuning; the question is
how much tuning we can do without breaking the existing interfaces. I've
been focussing my attention on simply trying to improve the existing
algorithms and functional behaviour. In my opinion, that involves the
least risk to the correctness and behaviour of the library.
Here's some profiling data that I've collected. This data was collected
on libxml2-2.4.21:
Build Details

  Compiler:            cc: Sun WorkShop 6 update 1 C 5.2 2000/09/11
  OS:                  SunOS calvin 5.8 Generic_108528-13 sun4u sparc SUNW,Ultra-60
  Compiler invocation: cc -X03
  libxml2 options:     --with-threads --without-python

Profiling Data

    % cumulative     self              self    total
 time    seconds  seconds    calls  ms/call  ms/call  name
  6.9       0.24     0.24                             xmlParseTryOrFinish [11]
  3.5       0.36     0.12   846181     0.00     0.00  xmlParserInputBufferGrow [18]
  3.5       0.48     0.12    20481     0.01     0.01  xmlParseAttValue [14]
  3.2       0.59     0.11                             startElement [15]
  2.6       0.68     0.09  2235327     0.00     0.00  xmlCurrentChar [21]
  2.0       0.75     0.07  1689361     0.00     0.00  xmlStrEqual [30]
  2.0       0.82     0.07  1039422     0.00     0.00  xmlNextChar <cycle 1> [27]
  2.0       0.89     0.07                             xmlAttrSerializeContent [29]
  1.7       0.95     0.06  1090232     0.00     0.00  xmlParserInputGrow [16]
  1.7       1.01     0.06   215886     0.00     0.00  xmlParseChunk [33]

(I've extracted just the top 10 functions from the profiling list here,
and likewise for the listing that follows.)
I modified the xmlParserInputBufferGrow function to remove unnecessary
copies. As you'll note, this didn't have a great deal of impact, but it
was a fairly simple change to make, so I did it anyway and found a small
reduction in time (not reflected in the statistics below).
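
To illustrate what "removing unnecessary copies" in a buffer-grow routine can look like, here is a minimal sketch. It is not the actual libxml2 change: the demo_buf type, the demo_read_fn callback and every demo_* function are hypothetical names made up for this example. The first variant reads into a temporary block and then copies it into the buffer; the second reserves space first and lets the reader write straight into the buffer's free tail.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer (not libxml2's own buffer type). */
typedef struct {
    unsigned char *data;
    size_t used;
    size_t cap;
} demo_buf;

/* Hypothetical read callback: writes up to len bytes to dst and
 * returns the number written, 0 at end of input, < 0 on error. */
typedef int (*demo_read_fn)(void *ctx, unsigned char *dst, int len);

/* In-memory input source, used only to exercise the sketch. */
typedef struct {
    const unsigned char *src;
    size_t len;
    size_t pos;
} demo_mem_src;

int demo_mem_read(void *ctx, unsigned char *dst, int len) {
    demo_mem_src *s = ctx;
    size_t left = s->len - s->pos;
    size_t n = left < (size_t)len ? left : (size_t)len;
    memcpy(dst, s->src + s->pos, n);
    s->pos += n;
    return (int)n;
}

/* Make sure at least `extra` bytes are free after b->used. */
int demo_buf_reserve(demo_buf *b, size_t extra) {
    size_t ncap;
    unsigned char *p;
    if (b->cap - b->used >= extra) return 0;
    ncap = b->cap ? b->cap : 64;
    while (ncap - b->used < extra) ncap *= 2;
    p = realloc(b->data, ncap);
    if (p == NULL) return -1;
    b->data = p;
    b->cap = ncap;
    return 0;
}

/* Copying variant: read into a temporary block, then append it. */
int demo_grow_copy(demo_buf *b, demo_read_fn rd, void *ctx, int len) {
    unsigned char *tmp;
    int n;
    if (len <= 0) return -1;
    tmp = malloc((size_t)len);
    if (tmp == NULL) return -1;
    n = rd(ctx, tmp, len);
    if (n > 0) {
        if (demo_buf_reserve(b, (size_t)n) != 0) { free(tmp); return -1; }
        memcpy(b->data + b->used, tmp, (size_t)n);   /* the extra copy */
        b->used += (size_t)n;
    }
    free(tmp);
    return n;
}

/* Direct variant: reserve room, then read into the buffer's tail. */
int demo_grow_direct(demo_buf *b, demo_read_fn rd, void *ctx, int len) {
    int n;
    if (len <= 0) return -1;
    if (demo_buf_reserve(b, (size_t)len) != 0) return -1;
    n = rd(ctx, b->data + b->used, len);
    if (n > 0) b->used += (size_t)n;
    return n;
}
```

Both variants leave the same bytes in the buffer. The direct one saves a malloc/free pair and one memcpy per call, at the cost of sometimes reserving more space than the read ends up filling.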

    % cumulative     self              self    total
 time    seconds  seconds    calls  ms/call  ms/call  name
  6.0       0.23     0.23                             xmlParseTryOrFinish [15]
  5.7       0.45     0.22  2235327     0.00     0.00  xmlCurrentChar [17]
  3.9       0.60     0.15   846181     0.00     0.00  xmlParserInputBufferGrow [20]
  2.9       0.71     0.11  1090232     0.00     0.00  xmlParserInputGrow [14]
  2.9       0.82     0.11    20481     0.01     0.02  xmlParseAttValue [13]
  2.3       0.91     0.09  1039422     0.00     0.00  xmlNextChar <cycle 1> [26]
  1.8       0.98     0.07  1689361     0.00     0.00  xmlStrEqual [41]
  1.8       1.05     0.07    28541     0.00     0.00  xmlEncodeEntitiesReentrant [40]
  1.6       1.11     0.06   456261     0.00     0.00  xmlStrdup [49]
  1.6       1.17     0.06    58950     0.00     0.00  xmlParseCharData [39]

Since the functions above account for 29.1% and 30.5% (respectively) of
the total processing time during the profiling runs (which simply ran
gmake tests to generate the profiling data), I would suggest that any
ideas about reducing time in these functions should be a good investment
of effort.
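
The 29.1% and 30.5% figures are simply the sums of the "% time" column over the ten listed functions in each run. A small sketch of that check follows; the array and function names are made up for this example, and the values are stored in tenths of a percent so the sum stays in exact integer arithmetic.

```c
#include <stddef.h>

/* "% time" column of each listing, in tenths of a percent
 * (6.9% is stored as 69). Names are hypothetical. */
const int demo_run1[10] = { 69, 35, 35, 32, 26, 20, 20, 20, 17, 17 };
const int demo_run2[10] = { 60, 57, 39, 29, 29, 23, 18, 18, 16, 16 };

/* Sum n entries; the result is also in tenths of a percent. */
int demo_sum_tenths(const int *v, size_t n) {
    int total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

demo_sum_tenths(demo_run1, 10) gives 291 and demo_sum_tenths(demo_run2, 10) gives 305, matching the 29.1% and 30.5% quoted above.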
For instance, I would imagine that most compilers would struggle to
optimize xmlParseTryOrFinish (it's just too big) and so any effort spent
breaking down this code and streamlining/decomposing it should yield
good results.
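
As a rough picture of the kind of decomposition I mean, here is a toy sketch. It is not libxml2 code, and all of the demo_* names are invented for the example. Instead of one large function with a loop around a big switch on the parser state, each state gets its own small handler and the driver loop only dispatches. Whether a given compiler actually does better with the split form would need to be measured.

```c
#include <stddef.h>

/* Toy parser states; a real parser would have many more. */
typedef enum { DEMO_TEXT, DEMO_TAG, DEMO_NSTATES } demo_state;

typedef struct {
    const char *cur;      /* next unread byte */
    const char *end;      /* one past the last byte */
    demo_state state;
    int tags;             /* number of complete <...> seen */
    int text_chars;       /* bytes seen outside tags */
} demo_parser;

/* Handler for character data: consume up to and including '<'. */
void demo_in_text(demo_parser *p) {
    while (p->cur < p->end && *p->cur != '<') {
        p->cur++;
        p->text_chars++;
    }
    if (p->cur < p->end) {        /* found '<' */
        p->cur++;
        p->state = DEMO_TAG;
    }
}

/* Handler for tag content: consume up to and including '>'. */
void demo_in_tag(demo_parser *p) {
    while (p->cur < p->end && *p->cur != '>')
        p->cur++;
    if (p->cur < p->end) {        /* found '>' */
        p->cur++;
        p->tags++;
        p->state = DEMO_TEXT;
    }
}

typedef void (*demo_handler)(demo_parser *);

/* One small function per state, indexed by demo_state. */
const demo_handler demo_handlers[DEMO_NSTATES] = {
    demo_in_text,
    demo_in_tag
};

/* Driver: each handler call consumes at least one byte, so the
 * loop always terminates. */
void demo_parse(demo_parser *p) {
    while (p->cur < p->end)
        demo_handlers[p->state](p);
}
```

Run over the nine bytes of "<a>hi</a>" starting in DEMO_TEXT, demo_parse counts 2 tags and 2 text characters. A plain switch that calls the same handlers would work equally well; the point is only that each piece is small enough to read, profile and optimize on its own.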
If people are interested in getting involved in a libxml2 performance
improvement exercise, then contact me directly and we can start to
decide who's going to work on what and what changes make sense. I've
only got limited time to do this work and so the more people that are
involved the better (up to a point: I guess about 5 would be the limit
for useful interaction on a sub-project like this). I can realistically
only do my testing on Solaris (and maybe some limited work on Linux),
so developers from other platforms would be particularly useful in
validating that changes do have a cross-platform impact.
I don't anticipate this being a high-intensity project, more like a
marathon with low rates of change over a sustained period of time. I've
got a very conservative attitude when it comes to changing code for
performance benefits.
Gary
--
Gary Pennington
Solaris Kernel Development,
Sun Microsystems
Gary Pennington sun com