Re: [xml] libxml2 performance

Peter Jacobi wrote:

Hi Gary, All,

Thank you for your interesting performance data. A first question that arises for me is to what extent you see the malloc cost on your platform. Is this a (large?) part of the other 69.5%?

Clarification:

The performance data was collected only for libxml2.so. I didn't collect data on where time was spent in other libraries, because I was specifically looking for areas of improvement within libxml2 itself.

The reason I did this was to try to ensure that the potential performance improvements identified would be as independent as possible of the platform where the data was collected. I realise this is a somewhat fragile concept, since the elapsed time of any libxml2 function will depend on the underlying performance of the platform libraries, and the actual time will depend on the performance of the OS/hardware. But it seemed like a reasonable starting position, and it is one I have used successfully in past performance improvement exercises.


Then, as I'm still struggling to get the equivalent of gmake tests running on WIN32, I kindly ask you to test (for performance and validity):
- removing the ctxt->token test in CUR and RAW
- dis-inline (make function calls) all but the first line of SHRINK and GROW

I'll add this to the list of things to check.
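
For reference, if I understand the first suggestion correctly, the change would look something like the following (sketched from memory of parser.c, so the real macro bodies may differ between versions):

/* current shape (roughly): every character read pays for a test of
   ctxt->token, which is almost always zero */
#define CUR (ctxt->token ? ctxt->token : (*ctxt->input->cur))
#define RAW (ctxt->token ? -1 : (*ctxt->input->cur))

/* proposed shape: read the input buffer directly */
#define CUR (*ctxt->input->cur)
#define RAW (*ctxt->input->cur)

The validity testing should tell us whether anything still relies on ctxt->token being consulted at these call sites.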


  %  cumulative    self        self    total
time    seconds      seconds    calls  ms/call  ms/call name
 5.7        0.45    0.22  2235327    0.00    0.00  xmlCurrentChar [17]
 2.3        0.91    0.09  1039422    0.00    0.00  xmlNextChar [13]

For these two candidates, I'm still thinking about replacing all UNICODE codepoint extraction and testing with a lookup-based approach, as in this [rough sketch]:

xmlChar c0 = ctxt->input->cur[0];
xmlChar c1 = ctxt->input->cur[1];
xmlChar c2 = ctxt->input->cur[2];

/* table0..table2 and bitNameChar are hypothetical lookup tables and a
   class bitmask; note the parentheses around the shifts, since << binds
   less tightly than + in C */
int isNameChar(xmlChar c0, xmlChar c1, xmlChar c2) {
    return (table2[c2 + (table1[c1 + (table0[c0] << 8)] << 8)]
            & bitNameChar);
}
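
(For sizing: table0 needs 256 entries, while table1 needs 256 entries per distinct group id that table0 can produce, and table2 needs 256 entries per distinct table1 group, so the total footprint depends entirely on how well the byte values cluster into groups.)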

In fact, more complicated bit shuffling may be necessary to keep the lookup tables small, so it's not quite clear whether it's worth the trouble.

I can't comment on this as I haven't looked at this area in detail.


If people are interested in getting involved in a libxml2 performance
improvement exercise, then contact me directly and we can start to decide
who's going to work on what and what changes make sense.

I'm definitely interested, but being on WIN32 I feel a bit challenged tool-wise.

Also, the goal of optimization may differ to some extent. By making gmake tests the benchmark, you try to save time for quite a range of applications. My primary benchmark is XML files typically produced by object-forest freezing: a large number of short tags and little text.

Well, the fact that you build on WIN32 is a good thing. I don't have access to a WIN32 development environment at the moment.

If you can't profile, you can at least validate that library changes produce measured improvements for your application: you can still time whole program executions by wall clock.
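
Something as crude as the following would do; it's a throwaway harness (the repeat count is arbitrary and the file name comes from the command line), but wall-clock seconds over enough iterations are a perfectly usable signal:

/* timeparse.c -- crude wall-clock benchmark for libxml2 parsing.
   A hypothetical harness, not part of the library.
   Build on Unix with:
     cc timeparse.c -o timeparse `xml2-config --cflags --libs` */
#include <stdio.h>
#include <time.h>
#include <libxml/parser.h>

int main(int argc, char **argv) {
    int i, n = 100;        /* repeat enough times to get measurable seconds */
    time_t start, stop;

    if (argc < 2) {
        fprintf(stderr, "usage: %s file.xml\n", argv[0]);
        return 1;
    }
    start = time(NULL);
    for (i = 0; i < n; i++) {
        xmlDocPtr doc = xmlParseFile(argv[1]);
        if (doc == NULL) {
            fprintf(stderr, "parse failure on pass %d\n", i);
            return 1;
        }
        xmlFreeDoc(doc);
    }
    stop = time(NULL);
    printf("%d parses in %ld seconds\n", n, (long)(stop - start));
    xmlCleanupParser();
    return 0;
}

Run it before and after a library change on the same input, and the difference shows up directly.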



I don't anticipate this being a high-intensity project; it's more like a marathon, with a low rate of change sustained over a long period. I've got a very conservative attitude when it comes to changing code for performance benefits.

Very healthy attitude. But possibly not all changes are small and incremental. If we see mallocing and strduping as a common suspect across platforms, a larger refactoring of parser.c, sax.c and tree.c may be necessary.

Agreed, but difficult to assess without access to results from all the platforms.
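
One cheap, platform-neutral way to test that suspicion would be to hook counting wrappers in through xmlMemSetup() before doing any parsing. A sketch (the wrapper and counter names are mine):

/* allocstats.c -- count libxml2 allocator traffic via xmlMemSetup().
   A sketch only; error handling kept to a minimum. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libxml/parser.h>
#include <libxml/xmlmemory.h>

static unsigned long nmalloc, nrealloc, nstrdup, nfree;

static void *countMalloc(size_t size) { nmalloc++; return malloc(size); }
static void *countRealloc(void *mem, size_t size) { nrealloc++; return realloc(mem, size); }
static void countFree(void *mem) { nfree++; free(mem); }

static char *countStrdup(const char *str) {
    char *copy = malloc(strlen(str) + 1);   /* avoid non-ANSI strdup() */
    nstrdup++;
    if (copy != NULL) strcpy(copy, str);
    return copy;
}

int main(int argc, char **argv) {
    xmlDocPtr doc;

    if (argc < 2) return 1;
    /* must be installed before libxml2 makes its first allocation */
    xmlMemSetup(countFree, countMalloc, countRealloc, countStrdup);
    doc = xmlParseFile(argv[1]);
    if (doc != NULL) xmlFreeDoc(doc);
    printf("malloc %lu  realloc %lu  strdup %lu  free %lu\n",
           nmalloc, nrealloc, nstrdup, nfree);
    return 0;
}

Running that over the same documents on each platform would tell us quickly whether allocation counts (as opposed to allocator speed) are the common factor.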

I've done some profiling against libc.so.1. Here is a top 20 extract:

  %  cumulative    self        self    total
time    seconds      seconds    calls  ms/call  ms/call name
26.0        9.20    9.20                            _libc_fork [2]
14.3       14.28    5.08   251434    0.02    0.02  _lstat64 [4]
 8.4       17.26    2.98    54316    0.05    0.05  _libc_read [6]
 3.1       18.37    1.11    48827    0.02    0.02  _stat [8]
 2.3       19.20    0.83    33675    0.02    0.02  _libc_close [9]
 2.2       19.99    0.79    13342    0.06    0.06  _getdents64 [12]
 1.8       20.63    0.64                            __open [15]
 1.7       21.24    0.61                            realfree [17]
 1.6       21.82    0.58  7334388    0.00    0.00  _mbtowc [18]
 1.5       22.36    0.54                            _morecore [19]
 1.5       22.89    0.54                            _free_unlocked [20]
 1.4       23.38    0.49                            _private_fcntl [23]
 1.4       23.87    0.49                            _brk_unlocked [24]
 1.3       24.34    0.47    51109    0.01    0.01  _ioctl [25]
 1.3       24.79    0.45                            _malloc_unlocked [26]
 1.2       25.21    0.42                            _creat64 [29]
 1.1       25.61    0.40     8501    0.05    0.05  _write [31]
 1.0       25.97    0.36                            cleanfree [33]
 1.0       26.33    0.36                            __mbtowc_sb [34]
 0.9       26.63    0.31                            t_splay [39]

Obviously most of these functions are internal to the Solaris 8 libc, but it's fairly apparent that memory allocation accounts for a significant share of the time. The statistics are skewed by the large amount of forking during the run (gmake runs each test as a separate process, which is a lot of overhead), but several of the functions listed above relate directly to memory allocation (e.g. realfree, _morecore, _free_unlocked, _brk_unlocked, _malloc_unlocked). There is also a significant amount of work in file access (_stat, _lstat64) and I/O (_libc_read, _libc_close, _write, _creat64).

If we wanted to focus on macro-level performance improvements, then better memory allocation and I/O handling would be good places to look. Of course, the payoff from work in these areas will vary significantly across platforms.
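
As one illustration on the I/O side, reading the whole document in a single pass and handing it to xmlParseMemory() avoids a long series of small read() calls. Whether that beats the existing buffered path will vary by platform, so treat this as a sketch (the helper name is mine):

#include <stdio.h>
#include <stdlib.h>
#include <libxml/parser.h>

/* slurp the file in one fread() and parse from memory */
xmlDocPtr parseWholeFile(const char *path) {
    FILE *f = fopen(path, "rb");
    long size;
    char *buf;
    xmlDocPtr doc = NULL;

    if (f == NULL) return NULL;
    fseek(f, 0L, SEEK_END);
    size = ftell(f);
    rewind(f);
    buf = malloc((size_t)size);
    if (buf != NULL && fread(buf, 1, (size_t)size, f) == (size_t)size)
        doc = xmlParseMemory(buf, (int)size);
    free(buf);     /* the parsed tree doesn't point back into the buffer */
    fclose(f);
    return doc;
}

Dropping that in for xmlParseFile() in the timing harness above would make for an easy before/after comparison.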

Gary



