Re: Gtk+ unit tests (brainstorming)

From: Tim Janik <timj imendio com>
To: Carl Worth <cworth cworth org>
Cc: Federico Mena Quintero <federico ximian com>, Gtk+ Developers <gtk-devel-list gnome org>
Subject: Re: Gtk+ unit tests (brainstorming)
Date: Tue, 31 Oct 2006 15:26:35 +0100 (CET)

On Wed, 25 Oct 2006, Carl Worth wrote:

On Wed, 25 Oct 2006 12:40:27 -0500, Federico Mena Quintero wrote:

There are some things I really don't like in cairo's "make check"
suite right now:

2. The tests take forever to link. Each test right now is a separate
  program. I chose this originally so that I could easily execute
  individual tests, (something I still do regularly and still
  require). The problem is that with any change to the library, "make
  check" goes through the horrifically slow process of using libtool
  to re-link the hundred or so programs. One idea that's been floated
  to fix this is something like a single test program that links with
  the library, and then dlopens each test module (or something like
  this). Nothing like that has been implemented yet.


that won't quite work either, because libtool has to link shared modules
also. and that takes even longer. for beast, that's the sole issue, the
plugins/ dir takes forever to build and forever to install (libtool relinks
upon installation).

to avoid this type of hassle with the test programs, what we've been
doing is basically to put multiple tests into a single program, i.e.

static void
test_paths (void)
{
  TSTART ("Path handling");
  TASSERT (...);
  TDONE();
}
[...]

int
main (int   argc,
      char *argv[])
{
  birnet_init_test (&argc, &argv);

  test_cpu_info();
  test_paths();
  test_zintern();
  [...]
  test_virtual_typeid();

  return 0;
}
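
a single program doesn't have to give up running individual tests, which
was the reason for separate programs in the first place. the following is
only a minimal sketch of one way to do that, not the birnet code: the
TestCase table, run_tests() and the two stand-in test bodies are
hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* hypothetical stand-ins for real test functions; non-zero means "passed" */
static int test_paths (void)   { return strcmp ("a/b", "a/b") == 0; }
static int test_zintern (void) { return strlen ("foo") == 3; }

typedef struct {
  const char *name;
  int       (*func) (void);
} TestCase;

static const TestCase test_cases[] = {
  { "paths",   test_paths   },
  { "zintern", test_zintern },
};

/* run every test, or only the one named `filter` if it is non-NULL.
 * returns the number of failures, or -1 if `filter` matched no test.
 */
static int
run_tests (const char *filter)
{
  const size_t n_cases = sizeof (test_cases) / sizeof (test_cases[0]);
  int n_run = 0, n_failed = 0;
  size_t i;

  for (i = 0; i < n_cases; i++)
    {
      int ok;
      if (filter && strcmp (filter, test_cases[i].name) != 0)
        continue;
      assert (test_cases[i].func != NULL);
      ok = test_cases[i].func ();
      printf ("%s: %s\n", test_cases[i].name, ok ? "OK" : "FAIL");
      n_run++;
      n_failed += !ok;
    }
  return n_run ? n_failed : -1;
}
```

main() would then reduce to something like
`return run_tests (argc > 1 ? argv[1] : NULL) != 0;`, so one link step
covers all tests while a single test can still be picked by name.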

Something that is worth stealing is some of the work I've been doing
in "make perf" in cairo. I've been putting a lot of effort into
getting the most reliable numbers out of running performance tests,
(and doing it as quickly as possible yet). I started with stuff that
was in Manu's torturer with revamping from Benjamin Otte and I've
further improved it from there.

Some of the useful stuff is things such as using CPU performance
counters for measuring "time", (which of course I didn't write, but
just got from liboil---thanks David!), and then some basic statistical
analysis---such as reporting the average and standard deviation over
many short runs timed individually, rather than just timing many runs
as a whole, (which gives the same information as the average, but
without any indication of how stable the results are from one to the
next).


we've looked at cairo's perf output the other day, and one thing we really
failed to understand is why you print average values for your test runs.
granted, there might be some limited use to averaging over multiple runs
to have an idea how much time "could" be consumed by a particular task,
but much more interesting are other numbers.

i.e. using averaging, your numbers include uninteresting outliers
that can result from scheduling artefacts (like measuring a whole second
for copying a single pixel), and they hide the interesting information,
which is the fastest possible performance encountered for your test code.

printing the median over your benchmark runs would give a much better
indication of the to-be-expected average runtime, because outliers
in either direction are essentially ignored that way.
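
to illustrate the difference, here is a small sketch (not taken from
cairo or beast; samples_mean() and samples_median() are hypothetical
helper names) that computes both values over a set of timing samples:

```c
#include <assert.h>
#include <stdlib.h>

/* qsort() comparison callback for doubles, ascending order */
static int
compare_doubles (const void *a, const void *b)
{
  double x = *(const double*) a, y = *(const double*) b;
  return (x > y) - (x < y);
}

/* arithmetic mean of n samples */
static double
samples_mean (const double *samples, size_t n)
{
  double sum = 0;
  size_t i;
  assert (n > 0);
  for (i = 0; i < n; i++)
    sum += samples[i];
  return sum / n;
}

/* median of n samples; note that this sorts the array in place */
static double
samples_median (double *samples, size_t n)
{
  assert (n > 0);
  qsort (samples, n, sizeof (samples[0]), compare_doubles);
  return n % 2 ? samples[n / 2] : (samples[n / 2 - 1] + samples[n / 2]) / 2;
}
```

with made-up samples of 1.0, 1.1, 1.0, 1.2 and 1000.0 ms (the last one
standing in for a scheduling hiccup), the mean comes out at roughly
200.9 ms while the median is 1.1 ms.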

most interesting for benchmarking and optimization however is the minimum
time a specific operation takes, since in machine execution there is a hard
lower limit we're interested in optimizing. and apart from performance
clock skews, there'll never be minimum time measurement anomalies which
we would want to ignore.

for beast, we've used a combination of calibration code to figure out the
minimum number of test run repetitions and taking measurement minimums,
which yields quite
stable and accurate results even in the presence of concurrent background
tasks like project/documentation build processes.
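
roughly, the idea can be sketched as follows. this is not the beast code,
just an illustration under assumptions: all function names are
hypothetical, the standard clock() serves as the timer purely to keep the
sketch portable, and doubling the repetition count is only one possible
calibration strategy.

```c
#include <assert.h>
#include <time.h>

typedef void (*BenchFunc) (void *data);

/* hypothetical workload: adds a few integers into a volatile sink */
static void
example_workload (void *data)
{
  volatile unsigned long *sink = data;
  unsigned long i;
  for (i = 0; i < 1000; i++)
    *sink += i;
}

/* time one batch of n_reps calls, in seconds */
static double
time_batch (BenchFunc func, void *data, unsigned long n_reps)
{
  clock_t start = clock ();
  unsigned long i;
  for (i = 0; i < n_reps; i++)
    func (data);
  return (double) (clock () - start) / CLOCKS_PER_SEC;
}

/* calibration: double the repetition count until a single batch runs
 * for at least min_batch_seconds, so timer resolution stops dominating
 */
static unsigned long
calibrate_reps (BenchFunc func, void *data, double min_batch_seconds)
{
  unsigned long n_reps = 1;
  while (n_reps < (1UL << 30) &&
         time_batch (func, data, n_reps) < min_batch_seconds)
    n_reps *= 2;
  return n_reps;
}

/* run n_batches batches and return the minimum per-call time in seconds */
static double
bench_min_seconds (BenchFunc func, void *data,
                   unsigned long n_reps, unsigned int n_batches)
{
  double best = -1;
  unsigned int b;
  assert (n_reps > 0 && n_batches > 0);
  for (b = 0; b < n_batches; b++)
    {
      double t = time_batch (func, data, n_reps) / n_reps;
      if (best < 0 || t < best)
        best = t;
    }
  return best;
}
```

a caller would first obtain n_reps from calibrate_reps() and then report
the result of bench_min_seconds(); batches disturbed by background
activity only ever come out slower, so they never affect the minimum.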

The statistical stuff could still be improved, (as I described in a
recent post to performance-list), but I think it is a reasonable
starting point.


well, apologies if median/minimum printing is simply still on your TODO ;)

Oh, and my code also takes care to do things like ensuring that the X
server has actually finished drawing what you asked it to, (I think
GtkWidgetProfiler does that as well---but perhaps with a different
approach). My stuff uses a single-pixel XGetImage just before starting
or stopping the timer.


why exactly is that a good idea (and better than say XSync())?
does the X server implement logic like globally carrying out all
pending/enqueued drawing commands before allowing any image capturing?

Never forget the truth that Keith Packard likes to share
often:

	Untested code == Broken code


heh ;)

-Carl


---
ciaoTJ
