Recently, Behdad has turned his attention to reducing the number of allocations Cairo makes. In order to measure his progress, he wrote a tool to hook into malloc and record the callers. Unfortunately, in order to get the best results, he needed to modify the source.

As an alternative, I present this valgrind skin. It is mostly based on the massif skin, in that it overloads the malloc/free functions, records the entire stack trace for each allocation, and accumulates statistics for each unique trace. At the end it prints a table of the allocators (the function that called malloc, or rather the first function not listed among --alloc-fn, à la massif) and dumps the unique stack traces to a file.

At the moment, I have not translated this output into any common format (I was thinking of writing it in the callgrind.out format so that it could be viewed in kcachegrind). Instead I include a very simple mp-gui.py that reads in the stack traces and provides a means of reviewing the results.

The patch is relative to valgrind's svn trunk. Apply, reconfigure and make install.

Usage is similar to other valgrind skins:

  $ valgrind --tool=memprof --help
  $ valgrind --tool=memprof ./cairo-perf

And the output is:

==18877== 216 distinct allocators.
==18877==    nBlocks         nBytes  nReallocs  Lifespan (ms)
...
==18877==    484,619  1,030,781,440          0              1  _cairo_traps_add_trap_from_points [cairo-traps.c::193]
==18877==    528,888     21,155,520          0              0  _cairo_pixman_format_create_masks [icformat.c::102]
==18877==    528,916     69,816,912          0             62  pixman_image_createForPixels [icimage.c::76]
==18877==    598,300      4,786,400          0              2  _cairo_freelist_alloc [cairo-freelist.c::52]
==18877==    967,584    290,275,200          0              2  _cairo_path_fixed_move_to [cairo-path-fixed.c::199]
==18877==  1,408,396    361,163,776          0              0  _cairo_spline_add_point [cairo-spline.c::110]
==18877==  1,763,825     32,374,496          0              2  skip_list_insert [cairo-skiplist.c::293]
==18877== 10,943,515  4,330,076,529          0            145  (total)

The downside to this tool is that it incurs an order of magnitude performance overhead. That is a nuisance: before it extracted the stack for each unique call site, it was only about a factor of 3-4 slower.

I hope you find this a useful little tool. Happy Profiling!
--
Chris Wilson
Attachment:
vg-memprof.patch.gz
Description: Binary data