On Thu, 05 Oct 2006 11:32:37 -0700, Carl Worth wrote:
>
> Another tool that will be helpful to have is something for doing
> historical comparison over several runs. The simplest tool, and very
> useful, would be a "performance diff" that takes two runs and reports
> the difference, (perhaps only showing tests where the results differ
> more than a single standard deviation). I would use that kind of tool
> constantly to ensure that submitted patches to provide desired
> performance improvements.

I've written that program now. It's called cairo-perf-diff and it's
built in the cairo/perf directory. The Makefile won't install it or
anything, as I figure it's easy enough for interested people to just
manually copy it to ~/bin or whatever.

The interesting part of making this program work well is in what it
_doesn't_ show. Currently it discards as uninteresting any change for
which the mean values are not separated by more than 3 times the
standard deviation of each. Ideally, that's the only kind of
discarding we would do, but it's not quite working well enough yet.
So, currently I'm also discarding any changes below a given threshold,
(5% by default, but it can also be specified as the third argument on
the command line). Even then, it's still not discarding all the noise.

There's a really easy test for this. Just run cairo-perf twice (saving
the output from each run into first.perf and second.perf) and then
run:

	cairo-perf-diff first.perf second.perf 0.0

(That third argument, 0.0, forces it to discard only on the basis of
overlapping probability distributions, using the 3 standard
deviations, and not to discard things because the percentage change is
too small.)

If everything were working correctly, the output from the above would
be empty, since there should be no interesting changes in the
performance results, (and any variation should be captured by the
reported standard deviations). But the results aren't empty yet.

I did some things to attempt to improve this already.
For example, I've made cairo-perf output the number of ticks it
measures in addition to the time in milliseconds it estimates, (based
on an estimate of the CPU frequency that it measures). So
cairo-perf-diff computes only on the ticks columns, (but puts the time
in its output for readability).

I think other problems are the fixed-percentage outlier elimination
and the early bailout based on a stably low standard deviation. I
think these prevent the standard deviation from capturing the true
amount of variation.

I started some work to eliminate the early bailout and to do adaptive
outlier detection, (based on the conventional "1.5 times the
interquartile range above the third quartile or below the first
quartile" rule):

	http://mathworld.wolfram.com/Outlier.html

I haven't succeeded at making great improvements along those lines,
(particularly in light of the fact that removing the early bailout
slows things down a lot). And I really need to start using this tool
to land cairo patches rather than developing it. So if anyone else
wants to improve things to try to get the command above to report
nothing, that would be greatly appreciated.

In the meantime, here's a sample showing what the output can look
like.
Here's what cairo-perf-diff gives me when I give it the results of
cairo-perf before and after the patch that Monty provided for fixing
the subimage_copy performance bug in cairo:

-Carl

Speedups
========
xlib-rgba  subimage_copy-512                3.93 2.46% ->  0.07 2.71%: 52.91x faster ███████████████████████████████████████████████████▉
xlib-rgb   subimage_copy-512                4.03 1.97% ->  0.09 2.61%: 44.74x faster ███████████████████████████████████████████▊
xlib-rgba  subimage_copy-256                1.02 2.25% ->  0.07 0.56%: 14.42x faster █████████████▍
xlib-rgba  text_image_rgb_over-256         63.21 1.53% -> 11.87 2.17%:  5.33x faster ████▍
xlib-rgba  text_image_rgba_over-256        62.31 0.72% -> 11.87 2.82%:  5.25x faster ████▎
xlib-rgba  text_image_rgba_source-256      67.97 0.85% -> 16.48 2.23%:  4.13x faster ███▏
xlib-rgba  text_image_rgb_source-256       68.82 0.55% -> 16.93 2.10%:  4.07x faster ███▏
xlib-rgba  subimage_copy-128                0.19 1.72% ->  0.06 0.85%:  3.10x faster ██▏
xlib-rgb   text_image_rgb_over-256        108.22 0.40% -> 57.47 0.37%:  1.88x faster ▉
xlib-rgb   text_image_rgba_over-256       107.32 0.59% -> 57.32 0.78%:  1.87x faster ▉
xlib-rgb   text_image_rgb_source-256      114.92 0.44% -> 61.73 0.79%:  1.86x faster ▉
xlib-rgb   text_image_rgba_source-256     114.01 0.51% -> 61.69 0.51%:  1.85x faster ▉
xlib-rgba  subimage_copy-64                 0.11 2.24% ->  0.06 0.73%:  1.83x faster ▉
xlib-rgb   subimage_copy-256                2.81 1.57% ->  1.65 1.19%:  1.71x faster ▊
xlib-rgba  text_image_rgb_over-128          4.78 2.22% ->  2.85 1.06%:  1.68x faster ▋
xlib-rgba  text_image_rgba_over-128         4.72 1.38% ->  2.83 0.92%:  1.67x faster ▋
xlib-rgba  text_image_rgb_source-128        5.82 0.22% ->  3.92 0.57%:  1.48x faster ▌
xlib-rgba  text_image_rgba_source-128       5.79 0.25% ->  3.93 1.56%:  1.47x faster ▌
xlib-rgba  text_image_rgba_over-64          1.53 1.03% ->  1.13 0.42%:  1.35x faster ▍
xlib-rgba  text_image_rgb_over-64           1.52 0.45% ->  1.13 1.15%:  1.34x faster ▍
xlib-rgb   subimage_copy-64                 0.25 1.04% ->  0.19 2.61%:  1.34x faster ▍
xlib-rgb   subimage_copy-128                0.64 1.65% ->  0.50 1.09%:  1.27x faster ▎
xlib-rgba  fill_radial_rgba_over-256        9.75 0.95% ->  7.81 2.55%:  1.25x faster ▎
xlib-rgba  fill_image_rgb_over-256          2.56 0.77% ->  2.07 1.49%:  1.24x faster ▎
xlib-rgba  fill_image_rgba_over-256         2.55 0.41% ->  2.06 1.01%:  1.23x faster ▎
xlib-rgba  text_image_rgb_source-64         2.27 0.91% ->  1.88 0.20%:  1.21x faster ▎
xlib-rgba  fill_radial_rgb_over-256         9.68 0.60% ->  8.17 0.51%:  1.18x faster ▏
xlib-rgba  fill_image_rgba_source-256       3.95 2.11% ->  3.35 1.51%:  1.18x faster ▏
xlib-rgba  subimage_copy-32                 0.07 1.57% ->  0.06 0.91%:  1.17x faster ▏
xlib-rgba  text_image_rgba_source-64        2.25 0.28% ->  1.92 1.57%:  1.17x faster ▏
xlib-rgba  fill_image_rgb_source-256        3.85 0.39% ->  3.32 1.20%:  1.16x faster ▏
xlib-rgb   text_image_rgb_over-64           4.60 2.34% ->  4.06 0.51%:  1.13x faster ▏
xlib-rgb   text_image_rgb_over-128         16.05 1.57% -> 14.24 1.86%:  1.13x faster ▏
xlib-rgb   text_image_rgb_source-128       17.20 2.02% -> 15.32 1.76%:  1.12x faster ▏
xlib-rgb   text_image_rgba_over-64          4.54 0.71% ->  4.11 1.08%:  1.10x faster ▏
xlib-rgb   text_image_rgb_source-64         5.03 0.35% ->  4.59 0.16%:  1.10x faster ▏
xlib-rgba  fill_image_rgba_over-64          0.36 1.78% ->  0.33 0.61%:  1.09x faster ▏
xlib-rgb   text_image_rgba_source-64        4.99 0.20% ->  4.61 0.49%:  1.08x faster ▏
xlib-rgb   subimage_copy-32                 0.11 1.24% ->  0.10 1.13%:  1.07x faster ▏
xlib-rgba  fill_radial_rgb_source-128       2.54 0.44% ->  2.38 0.31%:  1.07x faster ▏
xlib-rgba  fill_image_rgba_source-64        0.48 0.65% ->  0.45 0.58%:  1.07x faster ▏
xlib-rgba  fill_radial_rgb_over-128         2.19 0.33% ->  2.06 1.00%:  1.06x faster ▏
xlib-rgba  fill_image_rgb_source-64         0.48 0.60% ->  0.45 0.72%:  1.06x faster

Slowdowns
=========
xlib-rgba  paint_similar_rgba_source-256    0.12 2.52% ->  0.16 2.81%:  1.33x slower ▍
image-rgba paint_image_rgba_source-256      0.08 0.39% ->  0.10 2.45%:  1.25x slower ▎
image-rgba paint_similar_rgba_source-256    0.09 0.38% ->  0.10 2.35%:  1.20x slower ▎
image-rgb  paint_solid_rgb_over-512         0.64 1.12% ->  0.74 1.57%:  1.17x slower ▏
image-rgb  paint_solid_rgba_source-512      0.64 1.21% ->  0.74 0.44%:  1.17x slower ▏
image-rgb  paint_solid_rgb_source-512       0.64 0.93% ->  0.74 0.59%:  1.16x slower ▏
image-rgb  paint_radial_rgb_source-512     53.05 2.18% -> 60.76 2.07%:  1.15x slower ▏
xlib-rgba  text_radial_rgb_over-64          3.95 0.57% ->  4.48 1.09%:  1.14x slower ▏
image-rgba paint_solid_rgba_source-512      0.66 1.65% ->  0.73 1.10%:  1.12x slower ▏
image-rgba paint_solid_rgb_source-512       0.66 1.90% ->  0.73 0.74%:  1.11x slower ▏
image-rgb  paint_similar_rgba_source-256    0.26 1.09% ->  0.29 0.98%:  1.11x slower ▏
image-rgb  fill_radial_rgba_over-256        5.57 0.30% ->  6.11 0.24%:  1.10x slower ▏
image-rgb  paint_radial_rgba_over-512      55.79 1.42% -> 60.80 0.68%:  1.09x slower ▏
image-rgb  fill_radial_rgba_source-128      1.64 0.20% ->  1.78 0.15%:  1.09x slower ▏
image-rgb  fill_radial_rgba_source-256      6.02 0.49% ->  6.55 0.26%:  1.09x slower ▏
image-rgb  fill_radial_rgba_over-128        1.54 1.08% ->  1.66 0.15%:  1.07x slower ▏
image-rgb  fill_radial_rgba_source-64       0.56 0.47% ->  0.60 0.46%:  1.07x slower ▏
image-rgb  paint_image_rgb_source-256       0.08 0.39% ->  0.09 0.78%:  1.06x slower
image-rgb  fill_radial_rgba_over-64         0.53 0.14% ->  0.56 0.47%:  1.06x slower
xlib-rgba  fill_radial_rgb_source-64        0.83 0.38% ->  0.88 0.46%:  1.05x slower
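
For anyone who wants to experiment with the filtering, the discard
rule described earlier in this message can be sketched in a few lines
of Python. This is not the code in cairo-perf-diff. The names Result
and is_interesting are made up for illustration. Reading "separated by
more than 3 times the standard deviation of each" as "the gap between
the means exceeds 3 standard deviations of both runs" is one
interpretation, and so is measuring the change as the larger mean over
the smaller one, minus 1. The first example uses the millisecond
columns from the first row of the sample output, whereas the real tool
computes on ticks.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One test's summary from a single run (hypothetical structure)."""
    mean: float        # mean time; the real tool compares ticks, not ms
    stddev_pct: float  # standard deviation as a percentage of the mean


def is_interesting(old, new, min_change=0.05, sigmas=3.0):
    """Return True if the change from old to new should be reported."""
    old_sd = old.mean * old.stddev_pct / 100.0
    new_sd = new.mean * new.stddev_pct / 100.0
    gap = abs(old.mean - new.mean)

    # Assumed reading of the rule: the means must be further apart than
    # `sigmas` standard deviations of *each* run.
    if gap <= sigmas * old_sd or gap <= sigmas * new_sd:
        return False

    # Assumed definition of the change: larger mean over smaller, minus 1.
    smaller = min(old.mean, new.mean)
    if smaller <= 0:
        return False  # cannot form a ratio; treated as uninteresting here
    change = max(old.mean, new.mean) / smaller - 1.0
    return change > min_change


# First row of the sample output: 3.93 2.46% -> 0.07 2.71%
print(is_interesting(Result(3.93, 2.46), Result(0.07, 2.71)))   # True
# Two identical summaries are never reported, even with a 0.0 threshold.
print(is_interesting(Result(1.00, 1.0), Result(1.00, 1.0),
                     min_change=0.0))                           # False
```

Passing min_change=0.0 corresponds to the "0.0" third argument in the
test described earlier: only the standard-deviation check is left to
filter out noise.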
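
The adaptive outlier detection mentioned earlier can be sketched in
Python as well. This only illustrates the conventional "1.5 times the
interquartile range" rule and is not what cairo-perf currently does.
The function name drop_outliers and the sample values are
hypothetical, and the quartiles follow the default convention of
Python's statistics.quantiles (other conventions can give slightly
different fences on small samples).

```python
import statistics


def drop_outliers(samples, k=1.5):
    """Keep samples within k * IQR of the first and third quartiles."""
    q1, _median, q3 = statistics.quantiles(samples, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [s for s in samples if low <= s <= high]


# Made-up tick counts with one obviously disturbed iteration.
ticks = [10, 11, 10, 12, 11, 10, 50]
kept = drop_outliers(ticks)
print(kept)  # [10, 11, 10, 12, 11, 10]

# Mean and standard deviation computed on what survives the filter.
print(statistics.mean(kept), statistics.stdev(kept))
```

Unlike a fixed-percentage cut, this drops nothing when the samples are
already tightly grouped, so the standard deviation is computed from
everything that was actually measured.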