Re: [xml] First experiments with threading
- From: Gary Pennington <Gary Pennington uk sun com>
- To: veillard redhat com
- Cc: xml gnome org
- Subject: Re: [xml] First experiments with threading
- Date: Mon, 15 Oct 2001 15:50:43 +0100
Daniel Veillard wrote:
On Mon, Oct 15, 2001 at 12:59:16PM +0100, Gary Pennington wrote:
Daniel Veillard wrote:
I made the tests on an SMP box running Linux Red Hat with 2.4.7 kernel.
I think the impact comes from the access to the memory routines, libxml
is extremely aggressive on the memory allocator, and in the current code
the xmlMalloc/xmlRealloc/xmlFree are part of the per-thread data. This
means that we are currently calling pthread_self() in addition to the
existing routine each time we access them, and this affects performance.
I ran more tests: the problem is not with pthread_self() or the call to
__xmlFree() or __xmlMalloc(), but rather seems due to significant performance
penalties when compiling with -D_REENTRANT.
On Solaris 8, I'm seeing a different picture.
I've run the version built with the patches I submitted against 2.4.5
and I see a 12/12% degradation. You are showing 57/46%, which is a lot
higher and also not a consistent degradation.
Well, the fact that the multi-threaded allocator is slower is not surprising:
it's really fast usually, and the associated work needed for -D_REENTRANT
can have a significant impact.
The fact that --valid makes serious changes is not surprising either:
the working set of syscalls made is really affected, so having a different
degradation impact for the two modes is not surprising.
Also could you check the CVS version ?
I just built the CVS version. I used the same build flags to the
compiler to make sure the comparison was valid and the same compiler
(Sun Forte 6.1, -xO3) and I get a consistent 5/7% degradation. The
degradation is lower than I estimated previously, since now I am
comparing the same source (whereas before I compared non-threaded 2.4.5
source against the patched version I had written). Now I am comparing
the same source with thread support compiled or not. The thread support
cost in Solaris seems to be about 6% (averaging the data points above)
which is low.
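As a small sketch of the arithmetic behind those figures: the helpers below compute the percentage degradation of a threaded build relative to a non-threaded one and average several data points. The function names and any timings fed to them are hypothetical; only the 5%, 7% and roughly 6% figures come from the message above.

```c
#include <assert.h>
#include <stddef.h>

/* Percentage slowdown of the threaded build relative to the baseline.
 * Both arguments are wall-clock timings in the same unit. */
double degradation_percent(double baseline_secs, double threaded_secs)
{
    assert(baseline_secs > 0.0);
    return 100.0 * (threaded_secs - baseline_secs) / baseline_secs;
}

/* Arithmetic mean of `count` data points. */
double mean_of(const double *values, size_t count)
{
    double sum = 0.0;
    size_t i;

    assert(count > 0);
    for (i = 0; i < count; i++)
        sum += values[i];
    return sum / (double)count;
}
```

With made-up timings of 100 s for the baseline and 105 s and 107 s for the two threaded runs, the degradations come out at 5% and 7%, and their mean is the 6% quoted above.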
I "trussed" the code to make sure that our interceptor routines were
being invoked, and they are. Here's a sample:
10141/1: <- libxml2:xmlParserInputShrink() = 0x46d90
10141/1: -> libxml2:xmlSkipBlankChars(0x343d8, 0x0, 0x0, 0x0)
10141/1: <- libxml2:xmlSkipBlankChars() = 1
10141/1: -> libxml2:xmlParseName(0x343d8, 0x0, 0x0, 0x0)
10141/1: -> libxml2:xmlStrndup(0x4af7c, 0x4, 0x0, 0x0)
10141/1: -> libxml2:__xmlMalloc(0x0, 0x0, 0x0, 0x0)
10141/1: <- libxml2:__xmlMalloc() = 0xff357de0
10141/1: <- libxml2:xmlStrndup() = 0x326d8
10141/1: <- libxml2:xmlParseName() = 0x326d8
10141/1: -> libxml2:xmlSkipBlankChars(0x343d8, 0x0, 0x0, 0x0)
It looks like the difference we can see is down to the difference in
threading support between Solaris and Linux. Anyway, the degradation is
acceptable (I think) on Solaris - but I agree that we should still
leave the default as non-threaded.
I guess that this could be down to the difference between the
Solaris/Linux threading models or it could be partly as a result of
changes in the code above and beyond the patches I submitted.
Yes, I bet on the first one, since I mostly made name changes, not
mechanism changes.
I think that to make sure we are comparing like with like, I should
build against your modifications. Is there a CVS incantation that I can
use to pull down the same code that you tested with or can I just get
the latest code from the cvs repository?
The latest from tonight will do.
My current views are:
1/ that threaded mode should not be the default configuration
I agree. Lots of people don't need thread support and would rather not
pay the performance price (which will always exist no matter how much we
try to minimize it).
okay
2/ that by default xmlMalloc/xmlRealloc/xmlFree should be kept
application wide settings
This might be wise anyway, since a thread passing a memory pointer to
another thread may have problems if the memory allocator used by each
thread is significantly different.
agreed too, that can get really messy.
3/ that it shall be relatively straightforward to make them
thread specific with the use of a dedicated #define
this can still be useful for other thread models where the
equivalent of pthread_self() is really cheap.
The code is already written. Could you make it a configurable option for
the brave of heart?
Done, configure now lists:
--with-thread-alloc Add per-thread memory(off)
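The arrangement discussed in points 2/ and 3/ above could be sketched roughly as follows: the allocator hooks are application-wide by default, and a dedicated #define makes them thread-specific. This is not the libxml2 code. Every name here (DEMO_THREAD_ALLOC, demo_hooks, demo_malloc and so on) is hypothetical, and the sketch uses C11 _Thread_local storage rather than the pthread_self()-based per-thread data described earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook types; not libxml2's actual declarations. */
typedef void *(*demo_malloc_fn)(size_t size);
typedef void (*demo_free_fn)(void *ptr);

typedef struct {
    demo_malloc_fn malloc_fn;
    demo_free_fn free_fn;
} demo_allocator;

#ifdef DEMO_THREAD_ALLOC
/* Opt-in build: every thread gets its own copy of the hooks. */
#define DEMO_STORAGE static _Thread_local
#else
/* Default build: one set of hooks shared by the whole application. */
#define DEMO_STORAGE static
#endif

DEMO_STORAGE demo_allocator demo_hooks = { malloc, free };

/* Replace the hooks (for the calling thread only if DEMO_THREAD_ALLOC is set). */
void demo_set_allocator(demo_malloc_fn m, demo_free_fn f)
{
    assert(m != NULL && f != NULL);
    demo_hooks.malloc_fn = m;
    demo_hooks.free_fn = f;
}

/* All allocations go through the currently installed hooks. */
void *demo_malloc(size_t size) { return demo_hooks.malloc_fn(size); }
void demo_free(void *ptr) { demo_hooks.free_fn(ptr); }

/* Example replacement allocator that counts live blocks.
 * The counter is a plain global, so it is not safe for concurrent use. */
static size_t demo_live_blocks = 0;

static void *counting_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        demo_live_blocks++;
    return p;
}

static void counting_free(void *ptr)
{
    if (ptr != NULL)
        demo_live_blocks--;
    free(ptr);
}
```

In the default build, a block obtained with demo_malloc() in one thread can be released with demo_free() in another, because both threads see the same hooks. With DEMO_THREAD_ALLOC defined, that is only safe if the two threads have installed compatible allocators, which is the concern raised under point 2/.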
I will work on 2/ to check if my analysis of the added cost was right,
I was wrong,
and will do 3/ if it is confirmed. Then I will clean up the couple of places
where libxml code needs locking and add some threading regression tests.
I did 3/
Daniel
Gary
--
Gary Pennington
Solaris Kernel Development,
Sun Microsystems
Gary Pennington sun com