Re: [xml] First experiments with threading



Daniel Veillard wrote:

On Mon, Oct 15, 2001 at 12:59:16PM +0100, Gary Pennington wrote:

Daniel Veillard wrote:

I made the tests on an SMP box running Linux Red Hat with 2.4.7 kernel.
I think the impact comes from the access to the memory routines, libxml
is extremely aggressive on the memory allocator, and in the current code
the xmlMalloc/xmlRealloc/xmlFree are part of the per-thread data. This
means that we are currently calling pthread_self() in addition to the
existing routine each time we access them, and this shows up in performance.
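
For readers following along, the per-access cost described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not libxml's actual code: every demo_* name is made up, and the sketch uses a pthread key with pthread_getspecific() rather than the pthread_self()-based lookup mentioned above. The point is simply that each allocation pays for one extra thread-specific lookup before reaching the real allocator.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread allocator table; not libxml's real layout. */
typedef struct {
    void *(*malloc_fn)(size_t);
    void  (*free_fn)(void *);
} demo_alloc_table;

static pthread_key_t  demo_key;
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;

static void demo_make_key(void)
{
    /* free() is the destructor run on a thread's table when it terminates. */
    pthread_key_create(&demo_key, free);
}

/* Runs on every allocation: this lookup is the added per-call cost. */
static demo_alloc_table *demo_get_table(void)
{
    demo_alloc_table *t;

    pthread_once(&demo_once, demo_make_key);
    t = pthread_getspecific(demo_key);
    if (t == NULL) {
        /* First use in this thread: create its table lazily. */
        t = malloc(sizeof(*t));
        if (t == NULL)
            return NULL;
        t->malloc_fn = malloc;
        t->free_fn = free;
        if (pthread_setspecific(demo_key, t) != 0) {
            free(t);
            return NULL;
        }
    }
    return t;
}

void *demo_malloc(size_t size)
{
    demo_alloc_table *t = demo_get_table();
    return (t != NULL) ? t->malloc_fn(size) : NULL;
}

void demo_free(void *ptr)
{
    demo_alloc_table *t = demo_get_table();
    if (t != NULL)
        t->free_fn(ptr);
}
```

Calling demo_malloc(16) from any thread creates that thread's table on first use and then allocates through it; demo_free() pays for the same lookup again.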


 I ran more tests: the problem is not pthread_self() or the calls to
__xmlFree() or __xmlMalloc(), but seems rather to be due to significant
performance penalties when compiling with -D_REENTRANT.

On Solaris 8, I'm seeing a different picture.

I've run the version built with the patches I submitted against 2.4.5 and I see a 12/12% degradation. You are showing 57/46%, which is a lot higher and also not a consistent degradation.


 Well, the fact that the multi-threaded allocator is slower is not surprising:
the allocator is usually really fast, and the associated work needed for
-D_REENTRANT can have a significant impact.
 The fact that --valid makes serious changes is not surprising either:
the working set of syscalls made is really affected, so having a different
degradation impact for the two modes is not surprising.
 Also, could you check the CVS version?

I just built the CVS version. To make sure the comparison was valid, I used the same compiler (Sun Forte 6.1, -xO3) and the same build flags, and I get a consistent 5/7% degradation. The degradation is lower than I estimated previously because I am now comparing the same source, with thread support compiled in or not (before, I compared the non-threaded 2.4.5 source against the patched version I had written). The cost of thread support on Solaris seems to be about 6% (averaging the data points above), which is low. I "trussed" the code to make sure that our interceptor routines were being invoked, and they are. Here's a sample:

10141/1:                  <- libxml2:xmlParserInputShrink() = 0x46d90
10141/1: -> libxml2:xmlSkipBlankChars(0x343d8, 0x0, 0x0, 0x0)
10141/1:                  <- libxml2:xmlSkipBlankChars() = 1
10141/1:                  -> libxml2:xmlParseName(0x343d8, 0x0, 0x0, 0x0)
10141/1:                    -> libxml2:xmlStrndup(0x4af7c, 0x4, 0x0, 0x0)
10141/1:                      -> libxml2:__xmlMalloc(0x0, 0x0, 0x0, 0x0)
10141/1:                      <- libxml2:__xmlMalloc() = 0xff357de0
10141/1:                    <- libxml2:xmlStrndup() = 0x326d8
10141/1:                  <- libxml2:xmlParseName() = 0x326d8
10141/1: -> libxml2:xmlSkipBlankChars(0x343d8, 0x0, 0x0, 0x0)


It looks like the difference we can see is down to the difference in threading support between Solaris and Linux. Anyway, the degradation is acceptable (I think) on Solaris - but I agree that we should still leave the default as non-threaded.



I guess that this could be down to the difference between the Solaris/Linux threading models or it could be partly as a result of changes in the code above and beyond the patches I submitted.


Yes, I bet on the first one, since I mostly made name changes, not
mechanism changes.

I think that to make sure we are comparing like with like, I should build against your modifications. Is there a CVS incantation that I can use to pull down the same code that you tested with or can I just get the latest code from the cvs repository?


 latest from this night will do.

My current views are:
 1/ that threaded mode should not be the default configuration

I agree. Lots of people don't need thread support and would rather not pay the performance price (which will always exist no matter how much we try to minimize it).


 okay

 2/ that by default xmlMalloc/xmlRealloc/xmlFree should be kept
    application wide settings

This might be wise anyway, since a thread passing a memory pointer to another thread may have problems if the memory allocator used by each thread is significantly different.
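
To make the hazard concrete, here is a small sketch of what goes wrong when a block crosses between allocators. It is purely illustrative and assumes a made-up scheme in which each allocator stamps its blocks with an owner id in a hidden header; none of the demo_* names exist in libxml. A block handed to a thread whose allocator has a different id cannot be released safely, so the sketch refuses the free instead of corrupting the heap.

```c
#include <stdlib.h>

/* Hypothetical block header; the union keeps user data suitably aligned. */
typedef union {
    int         owner;      /* id of the allocator that created the block */
    long long   pad_ll;
    long double pad_ld;
    void       *pad_ptr;
} demo_header;

/* Allocate a block stamped with the allocator id 'owner'. */
void *demo_tagged_malloc(int owner, size_t size)
{
    demo_header *h = malloc(sizeof(*h) + size);

    if (h == NULL)
        return NULL;
    h->owner = owner;
    return h + 1;           /* user data starts right after the header */
}

/* Returns 0 on success, -1 if the block belongs to another allocator. */
int demo_tagged_free(int owner, void *ptr)
{
    demo_header *h;

    if (ptr == NULL)
        return 0;
    h = (demo_header *)ptr - 1;
    if (h->owner != owner)
        return -1;          /* wrong allocator: leave the block untouched */
    free(h);
    return 0;
}
```

With a single application-wide allocator every block has the same owner, so this class of mismatch cannot occur.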


 agreed too, that can get really messy.

 3/ that it shall be relatively straightforward to make them
    thread specific with the use of a dedicated #define;
    this can still be useful for other thread models where the
    equivalent of pthread_self() is really cheap.
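
As a rough sketch of point 3/, a dedicated #define could select between an application-wide setting and a per-thread one at compile time. Everything here is an assumption for illustration: DEMO_THREAD_ALLOC and the demo_* names are hypothetical, and the per-thread slot uses C11 _Thread_local as a stand-in for a thread model where the lookup is cheap. It is not how libxml implements the option.

```c
#include <stdlib.h>

typedef void *(*demo_malloc_fn)(size_t);

/* Application-wide setting: used in the default build. */
static demo_malloc_fn demo_global_malloc = malloc;

#ifdef DEMO_THREAD_ALLOC
/* One slot per thread; NULL means "fall back to the global setting". */
static _Thread_local demo_malloc_fn demo_thread_malloc;
#endif

/* Install an allocator; its scope depends on the build-time #define. */
void demo_set_malloc(demo_malloc_fn fn)
{
#ifdef DEMO_THREAD_ALLOC
    demo_thread_malloc = fn;    /* affects the calling thread only */
#else
    demo_global_malloc = fn;    /* affects every thread */
#endif
}

/* Allocate through whichever setting is in effect. */
void *demo_alloc(size_t size)
{
#ifdef DEMO_THREAD_ALLOC
    if (demo_thread_malloc != NULL)
        return demo_thread_malloc(size);
#endif
    return demo_global_malloc(size);
}
```

Built without -DDEMO_THREAD_ALLOC, demo_set_malloc() changes the allocator for the whole application; built with it, the same call only changes the calling thread, and callers of demo_alloc() do not need to change.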

The code is already written. Could you make it a configurable option for the brave of heart?


 Done, configure now lists:
     --with-thread-alloc  Add per-thread memory(off)

I will work on 2/ to check if my analysis of the added cost was right,


 I was wrong,

and will do 3/ if it is confirmed. Then I will clean up the couple of places
where libxml code needs locking and add some threading regression tests.


 I did 3/

Daniel


Gary

--
Gary Pennington
Solaris Kernel Development,
Sun Microsystems
Gary Pennington sun com





