Malloc profiler/callgraph



Recently, Behdad has turned his attention to reducing the number of
allocations Cairo makes. In order to measure his progress, he wrote a
tool to hook into malloc and record the callers. Unfortunately, in
order to get the best results, he needed to modify the source. As an
alternative, I present this valgrind skin. It is mostly based on the
massif skin, in that it overloads the malloc/free functions, records
the entire stack trace and accumulates statistics for each unique
trace. At the end it prints a table of the allocators (the function
that called malloc, or rather the first function not listed among
--alloc-fn, à la massif) and dumps the unique stack traces to a file.
At the moment, I have not translated this output into any common
format (I was thinking of writing it in the callgrind.out format so
that it could be used in kcachegrind); instead I include a very simple
mp-gui.py to read in the stack traces and provide a means of reviewing
the results.
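
For the curious, here is a rough sketch of the same idea done outside
valgrind with LD_PRELOAD (assuming glibc/Linux; the file name and build
command are made up): interpose malloc, capture the call stack at each
allocation and attribute the bytes to the caller. The skin does the
equivalent inside valgrind, also wraps free/realloc, and accumulates
the statistics per unique trace instead of printing every event:

/* mallochook.c -- illustrative only, not the memprof skin */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

static void *(*real_malloc)(size_t);
static __thread int in_hook;    /* backtrace() may itself allocate */

void *malloc(size_t size)
{
    if (!real_malloc)
        real_malloc = (void *(*)(size_t)) dlsym(RTLD_NEXT, "malloc");

    void *ptr = real_malloc(size);

    if (!in_hook) {
        in_hook = 1;
        void *stack[16];
        int depth = backtrace(stack, 16);
        /* stack[1] is the immediate caller; a real tool hashes the
         * whole trace and accumulates nBlocks/nBytes/nReallocs per
         * unique trace, skipping wrappers named with --alloc-fn so
         * that the interesting caller is reported instead. */
        fprintf(stderr, "malloc(%zu) = %p from %p (depth %d)\n",
                size, ptr, depth > 1 ? stack[1] : NULL, depth);
        in_hook = 0;
    }
    return ptr;
}

Build and run with something like:
$ gcc -shared -fPIC -o mallochook.so mallochook.c -ldl
$ LD_PRELOAD=./mallochook.so ./cairo-perf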

The patch is relative to valgrind's svn trunk. Apply, reconfigure and
make install. Usage is similar to other valgrind skins:
$ valgrind --tool=memprof --help
$ valgrind --tool=memprof ./cairo-perf

And the output is:
==18877== 216 distinct allocators.
==18877== nBlocks	nBytes		nReallocs  Lifespan (ms)
	...
==18877== 484,619	1,030,781,440	0 1      _cairo_traps_add_trap_from_points [cairo-traps.c::193]
==18877== 528,888	21,155,520	0 0      _cairo_pixman_format_create_masks [icformat.c::102]
==18877== 528,916	69,816,912	0 62     pixman_image_createForPixels [icimage.c::76]
==18877== 598,300	4,786,400	0 2      _cairo_freelist_alloc [cairo-freelist.c::52]
==18877== 967,584	290,275,200	0 2      _cairo_path_fixed_move_to [cairo-path-fixed.c::199]
==18877== 1,408,396	361,163,776	0 0      _cairo_spline_add_point [cairo-spline.c::110]
==18877== 1,763,825	32,374,496	0 2      skip_list_insert [cairo-skiplist.c::293]
==18877== 10,943,515	4,330,076,529	0 145    (total)
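
As an aside, should I get around to the callgrind.out conversion, a
single row of the table above would map onto that format roughly as
follows (the event names "Blocks" and "Bytes" are placeholders of my
own choosing):

version: 1
creator: memprof
events: Blocks Bytes
fl=cairo-traps.c
fn=_cairo_traps_add_trap_from_points
193 484619 1030781440

Recovering the full allocation callgraph in kcachegrind would also
require emitting the calls=/cfl=/cfn= call-edge records for each frame
in the stored traces.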

The downside to this tool is that it incurs an order-of-magnitude
performance overhead, which is a nuisance: before it extracted the
stack for each unique callsite it was only about a factor of 3-4
slower.

I hope you find this a useful little tool.
Happy Profiling!
--
Chris Wilson

Attachment: vg-memprof.patch.gz


