Re: Performance Patches (was Re: [xml] too many mallocs?)

From: Aleksey Sanin <aleksey aleksey com>
To: veillard redhat com
Cc: Peter Jacobi <pj walter-graphtek com>, xml gnome org
Subject: Re: Performance Patches (was Re: [xml] too many mallocs?)
Date: Mon, 27 May 2002 09:37:50 -0700

IMHO, the performance bottleneck is memory allocations. Every call to malloc/free
means global process lock/unlock and this really slows down everything. However,
in most cases you do not need to have every attribute, name, node structure, etc.
allocated separatelly. It's a rare case when you'll set different values for the same
attribute many times. In most cases the value is set once and forever. Also parsing/creating
the tree is almost always done in the same process. These points lead to the idea
of doing "buffered" memory allocation in the xmlDoc: when memory is allocated
to store a document node, node attribute, etc. libxml should get memory from *one* (per
xmlDoc) growing memory buffer. This can reduce the number of mallocs dramatically
and this number will depend only on the size of the input document! The number of
elements does not matter! The freeing is one operation everytime :)
However, there are some problems with this solution (all solvable, I hope):
    1) SAX
    2) move nodes from doc to doc
    3) nodes outside doc

This optimization mostly impacts multi-threaded application (most of the cases these days I guess).
And it is worth to do it. However, this means a *lot* of changes in the code and I am not
sure that it's the right time for this (probably this memory allocation should go to libxml3).

Aleksey.

Daniel Veillard wrote:

On Mon, May 27, 2002 at 10:52:50AM +0200, Peter Jacobi wrote:

Hi Daniel, All,

As I have to understand the inner workings of libxml2 anyway, I can as 
well try to look for some performance patches (given the fact that
it will run on some 100MHz AMD Elan PC104 boards soon).

As this is scheduled to be a weekend hobby mostly, progress may be 
slow.


  No worries, Garry Pennington is also doing some performance work
in the background, I'm actually quite happy to see people learning
the internals, the larger the pool of knowledgeable people about the
code the safer the project !

Two logistical difficulties: 
1) I'm still not able to CVS through our firewall and I've found no way
to get plain source files from Bonsai, so I would like to use the
snapshot tarballs. But the tarball linked from www.xmlsoft.org is dated
2002-03-07 - what's going on there?


  For some reasons the cron job ain't working, I relaunched it manually
  -rw-r--r--    1 veillard www       4523125 May 27 06:34 cvs-snapshot.tar.gz

2) You have packed zillions of test files under directory test, but I
didn't find something like an automatic harness to execute the
tests. Does something like this exists? It would be most usefull
for rapidly discovering broken optimizations.


  make tests in the Unix makefiles. I also use the check-xml-test-suite.py
Python based script to check against the W3C/NIST XML testsuite, the only
pointer I have at the moment is the public mail archives
   http://lists.w3.org/Archives/Public/public-xml-testsuite/

Some preliminary comments (and questions):

a) The "too many malloc problem"
It seems to me, that the different layers don't conspire enough
to save (time and memory) resources. This problem is worsened,
as the function signature of some layers are fixed (SAX). In effect
every layer tends to allocate a new copy of the data.


  Right, SAX is a big pain, especially since I want to keep references
to entities in attributes and SAX was not designed to allow this.

The largest gain so fair in my tests came from adding versions of 
xmlNewDocNode, xmlNewNode and xmlNewNsProp
which take ownership of the 'name' string instead of strduping it.


  Hum :-( that's a serious API change, but as separate function this
sounds fine,

Another idea is, as libxml2 is essentially acting as if strduping
is cheap, can we make it really cheap by going to reference
counted strings? I'm not clear on this, and anyway, it would have
to wait for libxml3.


  I did made an attempt at this and concluded at the time that there
was no significant gain to be made. But in the meantime I removed a number
of CPU intensive operations and this may change seriously. I also only
made those tests with linux.

b) Macros and the the ctxt->token case
I'm still not positive about the macros (but I may have
erred regarding the multiple returns). They hide their
cost from the programmer but not from the processor.
RAW and CUR are used about 350 times in parser.c
and parserinternals.c - eliminating the ctxt->token
test (and so replacing both of them with a simple
*ctxt->input->cur) was the greatest single factor for
decreasing the binary's size and the second largest
for increasing speed.


  yes but it's likely to seriously break the parser. Any such change
which wasn't tested with "make tests" holds little trust, conformance
is still my #1 goal.

So that raises the question, whether ctxt->token can
be buried as a relict from the past. As I see it, all


 I don't think so ! But you're right it might be a good idea to double
check,

nontrivial uses of it are already commented out (for
example in xmlParserHandleReference). The only
remaining assignment of other values than 0, is
the assignment of ' ' in xmlParserHandlePEReference
and xmlParsePEReference, and even there its use
is suspect if read the TODO comment right.
Can't we stuff the ' ' directly into the buffer if it is
really needed?


 Hum, maybe, yes

c) UTF8 conversion
I'm wondering whether parser.c can be changed to always working
on xmlChars and all costly conversion to 32bit UNICODE codepoints
can be avoided.


  no, you need to check the values of the codepoints for conformance.

My first impression is, that only the NameChar and
NameStartChar checking is really bothered about UNICODE codepoints
and these checks could be replaced by multi-level table lookups 
of the UTF8 bytes.


  no the restrictions of the Char production applies to the full content
of the XML document (except where stricter rules applies).

d) ctxt-sax checking
Another candidate for elimination are the about 250 cases of checking 
ctxt->sax before calling the SAX callback. I'm under the impression, that
ctxt->sax is always not zero, when I look in xmlSAXParseFile and
xmlSAXParseMemory. Also when changing the line
 ctxt->sax = sax
to
 *ctxt->sax = *sax
we are free to change the zeros in the SAX callback struc to NOP 
handlers, and the tests for the callbacks being zero can be eliminated
too. Finally, when disabling the sax callback by overwriting all
entries with NOP handlers, even the tests of ctxt->disableSAX can be
eliminated.


  Not sure you would gain much by removing those tests, and I would not feel
comfortable dereferencing the pointer without checking it first. It's just
too easy to get a new widely deployed security hole ... checks are good IMHO

Daniel

Follow-Ups:
- Re: Performance Patches (was Re: [xml] too many mallocs?)
  - From: Peter Jacobi

References:
- Re: [xml] too many mallocs?
  - From: Peter Jacobi
- Re: [xml] too many mallocs?
  - From: Daniel Veillard
- Performance Patches (was Re: [xml] too many mallocs?)
  - From: Peter Jacobi
- Re: Performance Patches (was Re: [xml] too many mallocs?)
  - From: Daniel Veillard

[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]