Re: memory allocations.

This

> Histogram for block sizes:
>     0-15          28880  43% ==================================================
>    16-31          17489  26% ==============================

and this

> Histogram for block sizes:
>     0-15          12775  25% ====================================
>    16-31          17349  34% ==================================================

suggest that if you put a wrapper around the allocation routine which
allocates larger pools and serves all requests < 32 bytes out of
them, you'd save (in the non-debug code) 69% of the malloc calls
(43% + 26% in the first histogram).
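Such a wrapper could look roughly like this (a minimal sketch with
hypothetical names; a real version would have to deal with freeing,
since blocks carved out of a pool can only be released pool-by-pool,
which requires that the small objects share a lifetime):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Sketch of a wrapper: requests below 32 bytes are carved out of
 * larger pools obtained with one malloc each; everything else goes
 * straight to malloc.  Pool blocks are never freed individually --
 * the whole chain of pools is released at once. */

#define POOL_SIZE 4096
#define SMALL_LIMIT 32

struct pool {
    struct pool *next;          /* chain of pools for bulk release */
    size_t used;                /* bytes handed out from this pool */
    unsigned char data[POOL_SIZE];
};

static struct pool *pools;

static void *pool_alloc(size_t n)
{
    /* round up so returned pointers stay 8-byte aligned */
    n = (n + 7) & ~(size_t) 7;

    if (n >= SMALL_LIMIT)
        return malloc(n);       /* large requests bypass the pools */

    if (pools == NULL || pools->used + n > POOL_SIZE) {
        /* current pool exhausted (or none yet): get a fresh one */
        struct pool *p = malloc(sizeof *p);
        if (p == NULL)
            return NULL;
        p->next = pools;
        p->used = 0;
        pools = p;
    }

    void *r = pools->data + pools->used;
    pools->used += n;
    return r;
}

static void pool_free_all(void)
{
    while (pools != NULL) {
        struct pool *p = pools;
        pools = p->next;
        free(p);
    }
}
```

With the first histogram above, 69% of the requests would take the
pool path and only one malloc call per 4k pool would remain for them.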

> Graph with pools enabled:
>   http://pobox.com/~hp/gtk-demo-memusage.png
> All kind of hard to use without call stack information.

There are several things you can read from the graph.  Somebody who
knows the program should be able to figure out which parts of the
program this relates to.  A version of the graph with markers can be
found at

  http://people.redhat.com/drepper/commented-gtk-demo-memusage.png

(1) There is a not-so-steep slope up to about 300k where about 110k
    are allocated in about 9000 malloc calls.  This is probably all
    from the same function, and almost all allocations are small.  A
    candidate for a memory pool.

(2) A similar slope.  Another candidate.

(3) This is the most important.  There is constant allocation and
    deallocation of very small memory chunks.  If these could be
    cached, 30% of all operations would go away.

(4)
(5)
(6) These are a few more small regions where allocation happens really
    slowly.  Basically, every slope below a certain angle is
    noteworthy.
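The caching in (3) can be sketched as a free list sitting in front of
malloc (hypothetical names; a production version would need size
classes and thread safety, see below):

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of a cache for one fixed small size: freed chunks are kept
 * on a free list and reused, so a tight allocate/free loop stops
 * hitting malloc after the first iteration.  The next-pointer is
 * stored inside the free chunk itself, so CHUNK_SIZE must be at
 * least sizeof(void *). */

#define CHUNK_SIZE 16

static void *free_list;

static void *cached_alloc(void)
{
    if (free_list != NULL) {
        void *r = free_list;
        free_list = *(void **) r;   /* pop the cached chunk */
        return r;
    }
    return malloc(CHUNK_SIZE);      /* cache empty: fall back */
}

static void cached_free(void *p)
{
    *(void **) p = free_list;       /* push onto the free list */
    free_list = p;
}
```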


> Would a malloc() tuned for this histogram be faster than generic libc
> malloc(), or is libc malloc tuned for this sort of pattern already?

Every special-purpose allocator has the potential to be faster than a
general-purpose allocator, which malloc has to be.  But the
implementation cannot be too trivial.  It eventually has to cope with
multiple threads, and then things get ugly.  Simple locking can lead
to high contention; maintaining per-thread pools requires a lot of
administrative overhead.  All this could mean that a special
implementation is in fact slower.
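The per-thread variant can be sketched with a C11 thread-local free
list (again hypothetical names; the administrative overhead mentioned
above shows up exactly where this sketch cheats, namely when memory
freed in one thread should be returned to another):

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch of a per-thread cache: each thread keeps its own free list,
 * so the common path needs no lock at all.  Chunks freed in one
 * thread stay in that thread's cache -- a real implementation would
 * have to migrate them back, which is where the administrative
 * overhead comes from. */

#define CHUNK_SIZE 32   /* must be >= sizeof(void *) */

static _Thread_local void *thread_cache;

static void *tcache_alloc(void)
{
    void *r = thread_cache;
    if (r != NULL) {
        thread_cache = *(void **) r;   /* lock-free pop */
        return r;
    }
    return malloc(CHUNK_SIZE);
}

static void tcache_free(void *p)
{
    *(void **) p = thread_cache;       /* lock-free push */
    thread_cache = p;
}
```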

-- 
---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------


