Gtk+ unit tests (brainstorming)



Hi all.

as mentioned in another email already, i've recently worked on improving
unit test integration in Beast and summarized this in my last blog entry:
  http://blogs.gnome.org/view/timj/2006/10/23/0 # Beast and unit testing


while analysing the need for a testing framework and whether it makes sense
for GLib and Gtk+ to depend on yet another package for the sole purpose of
testing, i made the following observations and had these thoughts:

- Unit tests should run fast - a test taking 1/10th of a second is a slow
  unit test; i've mentioned this in my blog entry already.

- the important aspect of a unit test is the testing it does, not the
  testing framework around it. as such, a testing framework doesn't need
  to be big; here is one that is implemented in a whole 4 lines of C
  source, it gets this point across very well: ;)
    http://www.jera.com/techinfo/jtns/jtn002.html

- in the common case, test results should be reduced to a single boolean:
    "all tests passed" vs. "at least one test failed"
  many test frameworks provide means to count and report failing tests
  (even automake's standard check:-rule), there's little to no merit to
  this functionality though.
  letting more than one test fail while continuing work in an
  unrelated area rapidly leads to confusion about which tests are
  supposed to work and which aren't, especially in multi-contributor setups.
  figuring out whether the right tests passed suddenly requires scanning
  the test logs and remembering the last count of tests that may validly
  fail. this defeats the purpose of using a single quick make check run
  to be confident that one's changes didn't introduce breakage.
  as a result, the whole test harness should always either succeed or
  be immediately fixed.

- for reasons also mentioned in the aforementioned blog entry, it might
  be a good idea for Gtk+ as well to split up tests into things that can
  be checked quickly, things that are checked thoroughly but take long,
  and performance/benchmark tests.
  these can be executed by the make targets check, slowcheck and perf
  respectively.

- for tests that check abort()-like behavior, it can make sense to fork off
  a test program and check whether it fails in the correct place.
  although these types of checks are in the minority, the basic fork
  functionality shouldn't be reimplemented over and over and warrants
  a test utility function.

- for time-bound tasks it can also make sense to fork a test and, after
  a certain timeout, abort and fail the test.

- some test suites offer formal setup mechanisms for test "sessions".
  i fail to see the necessity for this; main() { } provides useful test
  grouping just as well. this idea is applied in an example below.

- multiple tests may need to support the same set of command line arguments,
  e.g. --test-slow or --test-perf as outlined in the blog entry.
  it makes sense to combine this logic in a common test utility function,
  which is usually pretty small.

- homogeneous or consistent test output might be desirable in some contexts.
  my experience so far is that for simple make check runs, the most
  important things are that they are fast enough for people to run
  frequently and that they succeed.
  if parts that feel slow are hard to avoid, a progress indicator can
  help a lot to bridge the required waiting time. so here, the exact
  output isn't too important as long as some progress is displayed.
  for performance measurements it makes sense to use somewhat canonical
  output formats though (ideally machine parsable), and it can simplify the
  test implementations if performance results may be intermixed with other
  test output (such as progress indicators).
  i've mentioned this in my blog entry as well; it boils down to using a
  small set of utility functions to format machine-detectable performance
  test result output.

- GLib based test programs should never produce a "CRITICAL **:" or
  "WARNING **:" message and still succeed. the reasoning here is that
  CRITICALs and WARNINGs are indicators of an invalid program or library
  state, from which anything can follow.
  since tests are in place to verify correct implementation/operation, an
  invalid program state should never be reached. as a consequence, all tests
  should make CRITICALs and WARNINGs fatal upon initialization (as if
  --g-fatal-warnings was given).

- test programs should be good glib citizens by defining G_LOG_DOMAIN, so
  WARNING, CRITICAL, and ERROR printouts can correctly indicate the failing
  component. since multiple test programs usually go into the same directory,
  something like DEFS += -DG_LOG_DOMAIN='"$(basename $(@F))"' (for GNU make)
  or DEFS += -DG_LOG_DOMAIN='"$@"' (for portable makefiles) needs to be used.


as far as a "testing framework" is needed for GLib/Gtk+, i think it would
be sufficient to have a pair of common testutils.[hc] files that provide:

1- an initialization function that calls gtk_init() and preparses
   arguments relevant for test programs. this should also make all WARNINGs
   and CRITICALs fatal.

2- a function to register all widget types provided by Gtk+ (useful for
   automated testing).

3- a function to fork off a test and assert it fails in the expected place
   (around a certain statement).

4- it may be helpful to have a fork-off and timeout helper function as well.

5- simple helper macros to indicate test start/progress/assertions/end.
   (we've at least found these useful to have in Beast.)

6- output formatting functions to consistently present performance measurements
   in a machine parsable manner.


if i'm not mistaken, test frameworks like Check would only help us out with
3, 4 and to some extent 5. i don't think this warrants a new package
dependency, especially since 5 might be highly customized and 3 or 4 could
be useful to provide in GLib generally.
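to make points 1-6 a bit more concrete, a hypothetical testutils.h could
declare something along these lines (all names here are invented; the
TSTART/TASSERT/TDONE macros mimic the output of the Beast example below):

```c
#ifndef TEST_UTILS_H
#define TEST_UTILS_H

#include <glib.h>
#include <gtk/gtk.h>

/* 1- calls gtk_init(), pre-parses --test-slow/--test-perf,
 *    makes WARNINGs and CRITICALs fatal */
void     test_init                   (int *argc, char ***argv);

/* 2- register all widget types provided by Gtk+ */
void     test_register_all_gtk_types (void);

/* 3- fork off func and assert it aborts in the expected place */
gboolean test_trap_assert_abort      (void (*func) (void));

/* 4- fork off func, fail if it exceeds timeout_ms */
gboolean test_trap_timeout           (void (*func) (void), guint timeout_ms);

/* 5- start/progress/assertion/end macros */
#define TSTART(name)  g_print ("%s: [", name)
#define TASSERT(cond) do { if (cond) g_print ("-"); \
                           else g_error ("assertion failed: %s", #cond); } while (0)
#define TDONE()       g_print ("]\n")

/* 6- machine-parsable performance result output */
void     test_perf_result            (double value, const char *format, ...);

#endif /* TEST_UTILS_H */
```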


here is an example to be more concrete about what i think Gtk+ tests could
look like, i.e. it shows what we have in beast right now:

==========tests/Makefile.am===================================================
DEFS            += -DG_LOG_DOMAIN='"$(basename $(@F))"'
TESTS           += threads    # "threads" is started by make check
PERFTESTS       += threads    # "threads --test-perf" is started by make perf
==========tests/threads.cc====================================================
/* --- sample test function --- */
static void
test_threads (void)
{
  TSTART ("C++OwnedMutex");
  TASSERT (NULL != &Thread::self());
  static OwnedMutex static_omutex;
  static_omutex.lock();
  TASSERT (static_omutex.mine() == true);
  static_omutex.unlock();
  TASSERT (static_omutex.mine() == false);
  TDONE();
}
/* --- an automatic test session setup is constituted by main() --- */
int
main (int   argc,
      char *argv[])
{
  birnet_init_test (&argc, &argv); // does arg parsing etc.
  test_threads();
  test_atomic();
  if (init_settings().test_perf)   // true for --test-perf
    {
      bench_auto_locker_cxx();
      bench_other_stuff();
    }
  return 0;
}
==========stdout of simple test run (brief)===================================
TEST: threads		# printed by birnet_init_test()
C++OwnedMutex: [---]	# each TASSERT produces a '-'
PASS: threads		# printed by make check
==========


also, i've given some thought to the things that would be nice to have under
automatic unit tests for Gtk+:

- for a specific widget type, test input/output conditions of all API
  functions (only for valid use cases though)
- similarly, test all input/output conditions of the Gdk API
- try setting & getting all widget properties on all widgets over the full
  value ranges (sparsely covered by means of random numbers for instance)
- try setting & getting all container child properties analogously
- check layout algorithms by laying out a child widget that does nothing but
  check the coordinates it is laid out at. i've played around with such
  a test item in Rapicorn. as food for thought, here's a list of the
  properties it currently supports (assertions are carried out upon exposure):
    MakeProperty (TestItem, epsilon,       "Epsilon",       "Epsilon within which assertions must hold",  DFLTEPS,   0,         +MAXFLOAT, 0.01, "rw"),
    MakeProperty (TestItem, assert_left,   "Assert-Left",   "Assert positioning of the left item edge",   -INFINITY, -INFINITY, +MAXFLOAT, 3, "rw"),
    MakeProperty (TestItem, assert_right,  "Assert-Right",  "Assert positioning of the right item edge",  -INFINITY, -INFINITY, +MAXFLOAT, 3, "rw"),
    MakeProperty (TestItem, assert_bottom, "Assert-Bottom", "Assert positioning of the bottom item edge", -INFINITY, -INFINITY, +MAXFLOAT, 3, "rw"),
    MakeProperty (TestItem, assert_top,    "Assert-Top",    "Assert positioning of the top item edge",    -INFINITY, -INFINITY, +MAXFLOAT, 3, "rw"),
    MakeProperty (TestItem, assert_width,  "Assert-Width",  "Assert amount of the item width",            -INFINITY, -INFINITY, +MAXFLOAT, 3, "rw"),
    MakeProperty (TestItem, assert_height, "Assert-Height", "Assert amount of the item height",           -INFINITY, -INFINITY, +MAXFLOAT, 3, "rw"),
    MakeProperty (TestItem, fatal_asserts, "Fatal-Asserts", "Handle assertion failures as fatal errors",  false, "rw"),
- create all widgets with mnemonic constructors and check that their
  activation works.
- generically query all key bindings of stock Gtk+ widgets, and activate them,
  checking that no warnings/criticals are generated.
- create a test rcfile covering all rcfile mechanisms that is parsed and
  whose values are asserted in the resulting GtkStyles.
- for all widget types, create and destroy them in a loop to:
  a) measure basic object setup performance
  b) catch obvious leaks
  (these would be slowcheck/perf tests)


as always, feedback is appreciated, especially objections/concerns
regarding the ideas outlined ;)

---
ciaoTJ


