Re: Gtk+ unit tests (brainstorming)

From: Stefan Kost <ensonic hora-obscura de>
To: Tim Janik <timj imendio com>
Cc: Gtk+ Developers <gtk-devel-list gnome org>
Subject: Re: Gtk+ unit tests (brainstorming)
Date: Tue, 31 Oct 2006 21:54:41 +0200

Hi Tim,

Tim Janik wrote:
> Hi all.
>
> as mentioned in another email already, i've recently worked on improving
> unit test integration in Beast and summarized this in my last blog entry:
>    http://blogs.gnome.org/view/timj/2006/10/23/0 # Beast and unit testing
>
>   
I gave a presentation on check-based unit tests at the last GUADEC. Here are
the slides:
http://www.buzztard.org/files/guadec2005_advanced_unit_testing.pdf

I have had good experiences with check in GStreamer and also in
buzztard. IMHO it does not make sense to write our own test framework. The
unit tests are optional, and IMHO it's not too much to ask developers
to install check. We could print a notice for people who build CVS code
and don't have check installed. And BTW, most distros ship it.
> while analysing the need for a testing framework and whether it makes sense
> for GLib and Gtk+ to depend on yet another package for the sole purpose of
> testing, i made/had the following observations/thoughts:
>
> - Unit tests should run fast - a test taking 1/10th of a second is a slow
>    unit test, i've mentioned this in my blog entry already.
>   
In the slides we talked about the concept of test aspects. You do
positive and negative tests, performance and stress tests. It makes
sense to organize the testsuite to reflect this. It might also make
sense to have one test binary per widget (class). This way you can
easily run single tests. IMHO it's no big deal if tests run slowly. If you
have good test coverage, the whole test run will be slow anyway. For
that purpose we have continuous integration tools like buildbot, which
will happily run your whole testsuite, even under valgrind, and bug the
developer via IRC/mail/whatever.
> - the important aspect about a unit test is the testing it does, not the
>    testing framework matter. as such, a testing framework doesn't need to
>    be big, here is one that is implemented in a whole 4 lines of C source,
>    it gets this point across very well: ;)
>      http://www.jera.com/techinfo/jtns/jtn002.html
>   
True, but reinventing the wheel usually just means repeating errors.
> - in the common case, test results should be reduced to a single boolean:
>      "all tests passed" vs. "at least one test failed"
>    many test frameworks provide means to count and report failing tests
>    (even automake's standard check:-rule), there's little to no merit to
>    this functionality though.
>    having/letting more than one test fail and to continue work in an
>    unrelated area rapidly leads to confusion about which tests are
>    supposed to work and which aren't, especially in multi-contributor setups.
>    figuring whether the right test passed, suddenly requires scanning of
>    the test logs and remembering the last count of tests that may validly
>    fail. this defeats the purpose of using a single quick make check run to
>    be confident that one's changes didn't introduce breakage.
>    as a result, the whole test harness should always either succeed or
>    be immediately fixed.
>   
Totally disagree. The whole point of using the fork-based approach
together with setup/teardown hooks is to provide a sane test environment
for each case. When you run the test suite on a buildbot, you want to
know the overall state (percentage of pass/fail) plus a list of the
tests that fail, so that you can fix the issues the tests uncovered.
GStreamer and many apps based on it use a nice logging framework (which
has also previously been offered for glib integration as glog). The tests
create logs that help to understand the problem.
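
A minimal sketch of the fork-based isolation and reporting described above,
assuming a POSIX system. This is not check's implementation; the names
test_case, run_isolated and run_suite are hypothetical and only illustrate
the idea: each test runs in its own child process, a crash counts as a
failure of that one test, and the runner reports both a pass percentage and
the list of failing tests.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical test case type: a name plus a function returning 0 on success. */
typedef struct {
    const char *name;
    int (*fn)(void);
} test_case;

/* Run one test in a forked child so that a crash cannot take down the runner.
 * Returns 1 if the child exited normally with status 0, otherwise 0. */
static int run_isolated(const test_case *tc)
{
    int status;
    pid_t pid;

    assert(tc != NULL && tc->fn != NULL);
    fflush(NULL);                 /* do not duplicate buffered output in the child */
    pid = fork();
    if (pid < 0)
        return 0;                 /* treat a failed fork as a failed test */
    if (pid == 0) {
        int rc = tc->fn();
        fflush(NULL);             /* keep whatever the test itself printed */
        _exit(rc == 0 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Run every test, name each failing one, print a summary line,
 * and return the number of tests that passed. */
static int run_suite(const test_case *tests, int n)
{
    int passed = 0;

    for (int i = 0; i < n; i++) {
        if (run_isolated(&tests[i]))
            passed++;
        else
            printf("FAIL: %s\n", tests[i].name);
    }
    printf("%d%%: checks: %d, failures: %d\n",
           n ? passed * 100 / n : 100, n, n - passed);
    return passed;
}

/* Three illustrative tests: one passes, one fails, one crashes. */
static int test_ok(void)    { return 0; }
static int test_wrong(void) { return 1; }
static int test_crash(void) { abort(); }
```

Calling run_suite() on an array holding the three example tests lists the
failing and the crashing test by name and still reaches the summary line,
which is the kind of overview a buildbot run needs.
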
> - for reasons also mentioned in the aforementioned blog entry it might
>    be a good idea for Gtk+ as well to split up tests into things that
>    can quickly be checked, thoroughly be checked but take long, and into
>    performance/benchmark tests.
>    these can be executed by make targets check, slowcheck and perf
>    respectively.
>
> - for tests that check abort()-like behavior, it can make sense to fork-off
>    a test program and check whether it fails in the correct place.
>    although this type of check is the minority, the basic
>    fork-functionality shouldn't be reimplemented all over again and warrants
>    a test utility function.
>   
This is available in 'check'.
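
For illustration, the underlying mechanism can be sketched in plain POSIX C.
The helper name aborts_in_child is hypothetical, and the sketch only checks
that the child died from SIGABRT, not where in the code it aborted. As far
as I know, check exposes the same idea through
tcase_add_test_raise_signal(); treat that name as something to verify
against the check manual.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run fn() in a forked child.
 * Returns 1 if the child was killed by SIGABRT (i.e. it called abort()),
 * 0 if it returned or died in some other way,
 * and -1 if the child could not be created or waited for. */
static int aborts_in_child(void (*fn)(void))
{
    int status;
    pid_t pid;

    assert(fn != NULL);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        fn();
        _exit(0);                 /* fn() returned, so it did not abort */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT;
}

/* Two illustrative functions under test. */
static void calls_abort(void)      { abort(); }
static void returns_normally(void) { }
```
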
> - for time bound tasks it can also make sense to fork a test and after
>    a certain timeout, abort and fail the test.
>   
This is available in 'check'.
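
A rough sketch of such a timeout, again in plain POSIX C rather than check's
own code. The name run_with_timeout is hypothetical. The child arms alarm(),
whose default action terminates the process, and the parent only has to
inspect the wait status. In check the corresponding knob is, to my
knowledge, tcase_set_timeout(); verify that against the check manual.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run fn() in a forked child with a time limit.
 * Returns 0 if fn() returned in time,
 *         1 if the child was killed by the timeout (SIGALRM),
 *        -1 on any other failure (fork/wait error, crash, non-zero exit). */
static int run_with_timeout(void (*fn)(void), unsigned int seconds)
{
    int status;
    pid_t pid;

    assert(fn != NULL && seconds > 0);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(SIGALRM, SIG_DFL); /* make sure SIGALRM terminates the child */
        alarm(seconds);
        fn();
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGALRM)
        return 1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

/* Two illustrative tests: one finishes at once, one never returns. */
static void finishes_quickly(void) { }
static void hangs(void)            { for (;;) pause(); }
```
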
> - some test suites offer formal setup mechanisms for test "sessions".
>    i fail to see the necessity for this. main() { } provides useful test
>    grouping just as well, this idea is applied in an example below.
>   
See above.
> - multiple tests may need to support the same set of command line arguments
>    e.g. --test-slow or --test-perf as outlined in the blog entry.
>    it makes sense to combine this logic in a common test utility function,
>    usually pretty small.
>   
Agree here. The tests should forward their argc/argv (except when they
test argc/argv handling).
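
One possible shape for such a shared helper, as a sketch only: the names
test_options and test_parse_args are hypothetical, and --test-slow and
--test-perf are the flags from the quoted proposal. The helper consumes the
flags it knows and compacts argv, so everything else can be forwarded
unchanged to whatever initialization the test calls next.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option set shared by all test programs. */
typedef struct {
    int slow;   /* --test-slow was given */
    int perf;   /* --test-perf was given */
} test_options;

/* Remove the recognised test flags from argv and record them in opts.
 * argv[0] is never interpreted as a flag. Returns the new argc; the
 * remaining arguments keep their order and can be forwarded as-is. */
static int test_parse_args(int argc, char **argv, test_options *opts)
{
    int out = 0;

    assert(opts != NULL);
    opts->slow = 0;
    opts->perf = 0;
    for (int i = 0; i < argc; i++) {
        if (i > 0 && strcmp(argv[i], "--test-slow") == 0)
            opts->slow = 1;
        else if (i > 0 && strcmp(argv[i], "--test-perf") == 0)
            opts->perf = 1;
        else
            argv[out++] = argv[i];
    }
    if (out < argc)
        argv[out] = NULL;         /* keep the vector NULL-terminated */
    return out;
}
```

A test's main() would call test_parse_args(argc, argv, &opts) first and then
hand the returned count and the same argv on to the next consumer.
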
> - homogeneous or consistent test output might be desirable in some contexts.
>    so far, i've made the experience that for simple make check runs, the most
>    important things are that it's fast enough for people to run frequently
>    and that it succeeds.
>    if somewhat slowly perceived parts are hard to avoid, a progress indicator
>    can help a lot to overcome the required waiting time. so, here the exact
>    output isn't too important as long as some progress is displayed.
>    for performance measurements it makes sense to use somewhat canonical
>    output formats though (ideally machine parsable) and it can simplify the
>    test implementations if performance results may be intermixed with existing
>    test outputs (such as progress indicators).
>    i've mentioned this in my blog entry as well, it boils down to using a
>    small set of utility functions to format machine-detectable performance
>    test result output.
>   
The standard check output is not very verbose, but it can additionally log
to XML.
> - GLib based test programs should never produce a "CRITICAL **:" or
>    "WARNING **:" message and succeed. the reasoning here is that CRITICALs
>    and WARNINGs are indicators for an invalid program or library state,
>    anything can follow from this.
>    since tests are in place to verify correct implementation/operation, an
>    invalid program state should never be reached. as a consequence, all tests
>    should upon initialization make CRITICALs and WARNINGs fatal (as if
>    --g-fatal-warnings was given).
>   
Wrong. In the docs you describe API usage (e.g. it is not valid to pass
NULL to this function). Thus a test could check whether, in the debug build, a
g_return_if_fail() is used to implement the contract (warn & fail if
NULL is passed). Such tests are called black-box tests. Testers know the
API and its docs, and the tests verify that the API docs are in sync with
the implementation.
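
To make the idea of such a negative test concrete, here is a sketch that
deliberately does not use GLib. RETURN_VAL_IF_FAIL is a simplified stand-in
for g_return_val_if_fail() that counts warnings instead of logging through
g_log(), and label_length is a made-up API whose docs are assumed to say
"name must not be NULL". The test passes NULL on purpose and verifies that
the documented contract is enforced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for GLib's g_return_val_if_fail(): NOT the real macro, only a
 * simplified model that prints a message and counts it. */
static int critical_count = 0;

#define RETURN_VAL_IF_FAIL(expr, val) \
    do { \
        if (!(expr)) { \
            fprintf(stderr, "CRITICAL: assertion '%s' failed\n", #expr); \
            critical_count++; \
            return (val); \
        } \
    } while (0)

/* Hypothetical API under test; documented as: "name must not be NULL". */
static size_t label_length(const char *name)
{
    RETURN_VAL_IF_FAIL(name != NULL, 0);
    return strlen(name);
}

/* Negative test: passing NULL must produce exactly one warning and the
 * fallback return value, without crashing. */
static void test_label_length_rejects_null(void)
{
    int before = critical_count;

    assert(label_length(NULL) == 0);
    assert(critical_count == before + 1);
}
```

With real GLib and fatal warnings enabled, the same check would have to run
in a forked child and expect the abort instead of counting messages.
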
> - test programs should be good glib citizens by defining G_LOG_DOMAIN, so
>    WARNING, CRITICAL, and ERROR printouts can correctly indicate the failing
>    component. since multiple test programs usually go into the same directory,
>    something like DEFS += -DG_LOG_DOMAIN='"$(basename $(@F))"' (for GNU make)
>    or DEFS += -DG_LOG_DOMAIN='"$@"' (for portable makefiles) needs to be used.
>
>   
I would love to see Gtk+ have a check-based test suite + coverage
reports + buildbot integration. I recommend having a look at the
Makefile.am in GStreamer. It has lots of useful stuff for handling the
tests.

Stefan
References:
- Gtk+ unit tests (brainstorming)
  - From: Tim Janik