[xml] libxml2 performance



Hi,

I know that performance is becoming more of a concern for people, or at least more people are saying it is. A month or so ago, I started looking at ways to enhance performance inside libxml2.

Most applications can stand some performance tuning, and the question is how much tuning we can do without breaking the existing interfaces. I've been focussing my attention on simply trying to improve the existing algorithms and functional behaviour. In my opinion, that involves the least risk to the correctness and behaviour of the library.

Here's some profiling data that I've collected. This data was collected on libxml2-2.4.21:

Build Details
Compiler
cc: Sun WorkShop 6 update 1 C 5.2 2000/09/11
OS
SunOS calvin 5.8 Generic_108528-13 sun4u sparc SUNW,Ultra-60
cc -X03
libxml2 options
--with-threads --without-python

Profiling Data

  %   cumulative    self             self   total
time     seconds seconds    calls ms/call ms/call  name
 6.9        0.24    0.24                           xmlParseTryOrFinish [11]
 3.5        0.36    0.12   846181    0.00    0.00  xmlParserInputBufferGrow [18]
 3.5        0.48    0.12    20481    0.01    0.01  xmlParseAttValue [14]
 3.2        0.59    0.11                           startElement [15]
 2.6        0.68    0.09  2235327    0.00    0.00  xmlCurrentChar [21]
 2.0        0.75    0.07  1689361    0.00    0.00  xmlStrEqual [30]
 2.0        0.82    0.07  1039422    0.00    0.00  xmlNextChar <cycle 1> [27]
 2.0        0.89    0.07                           xmlAttrSerializeContent [29]
 1.7        0.95    0.06  1090232    0.00    0.00  xmlParserInputGrow [16]
 1.7        1.01    0.06   215886    0.00    0.00  xmlParseChunk [33]

(I've extracted just the top 10 functions from the profiling list, and likewise for the list that follows.)

I modified the xmlParserInputBufferGrow function to remove unnecessary copies, although you'll note that this didn't have a great deal of impact. It was a fairly simple change to make, so I did it anyway, and found a small reduction in time (not reflected in the statistics below).
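
To illustrate the general idea of removing a copy from a "grow" routine, here is a minimal sketch. It is not the libxml2 source and not the actual change: every demo_* name is hypothetical, and it assumes the extra copy comes from reading into a scratch block and then copying that block into the buffer. The direct variant reserves space first and lets the reader fill the tail of the buffer in place.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer (not libxml2's own buffer type). */
typedef struct {
    unsigned char *data;
    size_t used;  /* bytes currently stored */
    size_t cap;   /* bytes allocated */
} demo_buf;

/* Hypothetical read callback: writes up to len bytes to dst, returns the count. */
typedef size_t (*demo_read_fn)(void *ctx, unsigned char *dst, size_t len);

/* Hypothetical in-memory source, used only to exercise the two variants. */
typedef struct { const char *src; size_t pos, len; } demo_mem_src;

static size_t demo_mem_read(void *ctx, unsigned char *dst, size_t len)
{
    demo_mem_src *s = ctx;
    size_t left = s->len - s->pos;
    size_t n = len < left ? len : left;
    if (n > 0)
        memcpy(dst, s->src + s->pos, n);
    s->pos += n;
    return n;
}

/* Ensure room for `extra` more bytes. Returns 0 on success, -1 on failure. */
static int demo_buf_reserve(demo_buf *b, size_t extra)
{
    if (b->cap - b->used >= extra)
        return 0;
    size_t ncap = b->cap ? b->cap : 64;
    while (ncap - b->used < extra)
        ncap *= 2;
    unsigned char *p = realloc(b->data, ncap);
    if (p == NULL)
        return -1;
    b->data = p;
    b->cap = ncap;
    return 0;
}

/* Copying variant: read into a scratch block, then memcpy into the buffer. */
static long demo_grow_copy(demo_buf *b, demo_read_fn rd, void *ctx, size_t len)
{
    unsigned char *tmp = malloc(len ? len : 1);
    if (tmp == NULL)
        return -1;
    size_t n = rd(ctx, tmp, len);
    if (demo_buf_reserve(b, n) != 0) {
        free(tmp);
        return -1;
    }
    if (n > 0)
        memcpy(b->data + b->used, tmp, n);  /* the copy we want to avoid */
    b->used += n;
    free(tmp);
    return (long)n;
}

/* Direct variant: reserve space first, then read straight into the tail. */
static long demo_grow_direct(demo_buf *b, demo_read_fn rd, void *ctx, size_t len)
{
    if (demo_buf_reserve(b, len) != 0)
        return -1;
    size_t n = rd(ctx, b->data + b->used, len);
    assert(n <= len);  /* the reader must not overrun the reserved space */
    b->used += n;
    return (long)n;
}
```

Both variants leave the same bytes in the buffer. The direct one saves a malloc/free pair and one memcpy per call, at the cost of sometimes reserving more space than the reader ends up filling.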

  %   cumulative    self             self   total
time     seconds seconds    calls ms/call ms/call  name
 6.0        0.23    0.23                           xmlParseTryOrFinish [15]
 5.7        0.45    0.22  2235327    0.00    0.00  xmlCurrentChar [17]
 3.9        0.60    0.15   846181    0.00    0.00  xmlParserInputBufferGrow [20]
 2.9        0.71    0.11  1090232    0.00    0.00  xmlParserInputGrow [14]
 2.9        0.82    0.11    20481    0.01    0.02  xmlParseAttValue [13]
 2.3        0.91    0.09  1039422    0.00    0.00  xmlNextChar <cycle 1> [26]
 1.8        0.98    0.07  1689361    0.00    0.00  xmlStrEqual [41]
 1.8        1.05    0.07    28541    0.00    0.00  xmlEncodeEntitiesReentrant [40]
 1.6        1.11    0.06   456261    0.00    0.00  xmlStrdup [49]
 1.6        1.17    0.06    58950    0.00    0.00  xmlParseCharData [39]

Since the two top-10 lists above account for 29.1% and 30.5% (respectively) of the total processing time during the profiling runs (simply running gmake tests to generate the profiling data), I would suggest that any ideas for reducing the time spent in these functions would be a good investment of effort.

For instance, I would imagine that most compilers would struggle to optimize xmlParseTryOrFinish (it's just too big) and so any effort spent breaking down this code and streamlining/decomposing it should yield good results.
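
As a rough sketch of the kind of decomposition meant here (a toy example, not libxml2 code; the demo_* names and the two-state machine are hypothetical), a large loop that handles every parser state inline can be split into one small handler per state, leaving only a short dispatching driver:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-state scanner: inside or outside a <...> run. */
typedef enum { DEMO_TEXT, DEMO_TAG } demo_state;

typedef struct {
    demo_state state;
    size_t tags;        /* complete <...> runs seen so far */
    size_t text_chars;  /* characters seen outside tags */
} demo_parser;

/* One small handler per state; each consumes a single character. */
static void demo_step_text(demo_parser *p, char c)
{
    if (c == '<')
        p->state = DEMO_TAG;
    else
        p->text_chars++;
}

static void demo_step_tag(demo_parser *p, char c)
{
    if (c == '>') {
        p->tags++;
        p->state = DEMO_TEXT;
    }
}

/* The driver only dispatches on the current state, so it stays short.
 * State lives in *p, so input can arrive in arbitrary chunks. */
static void demo_feed(demo_parser *p, const char *chunk, size_t len)
{
    assert(p != NULL);
    for (size_t i = 0; i < len; i++) {
        switch (p->state) {
        case DEMO_TEXT: demo_step_text(p, chunk[i]); break;
        case DEMO_TAG:  demo_step_tag(p, chunk[i]);  break;
        }
    }
}
```

Each handler is small enough for a compiler to inline or optimize on its own, and each can be profiled and tuned separately. Whether this actually helps for a function like xmlParseTryOrFinish would have to be confirmed by measurement.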

If people are interested in getting involved in a libxml2 performance improvement exercise, then contact me directly and we can start to decide who's going to work on what and which changes make sense. I've only got limited time to do this work, so the more people that are involved the better (up to a point: I guess about 5 would be the limit for useful interaction on a sub-project like this). I can realistically only do my testing on Solaris (and maybe some limited work on Linux), so developers on other platforms would be particularly useful in validating that changes do have a cross-platform impact.

I don't anticipate this being a high-intensity project; it will be more like a marathon, with low rates of change over a sustained period of time. I've got a very conservative attitude when it comes to changing code for performance benefits.

Gary

--
Gary Pennington
Solaris Kernel Development,
Sun Microsystems
Gary Pennington sun com




