Gtk+ unit tests (conclusions)

Hi all,

after the brainstorming on GTK+ unit tests I think it is time to start
drawing some conclusions. Generally, it seems that people are OK with
Tim's proposal, although there are some points which have led to debate.

In this mail I first try to summarize Tim's proposal for reference,
then I try to point out the current state of the open debates around
it, and finally, I add some other additions/suggestions/doubts/issues
raised by people during the brainstorming.

Regarding the open debates, there are some important issues that still
need a decision, like whether to go with a testing framework like Check
or without one, so any opinions on these topics will be much
appreciated.

Note: please keep in mind that I tried to summarize lots of info, so,
although I did my best to gather all opinions, it is possible that I
forgot something or made some mistakes. If so, excuse me, and feel free
to add/fix whatever you think should be added/fixed.


1.- About tests output:

1.1.- Report just "All tests passed" vs "at least one test failed"
instead of a list of passed and failed tests.
1.2.- Use a progress indicator for slow tests.
1.3.- Homogeneous/consistent test output. For performance measurements,
provide canonical and machine parsable output.

2.- About tests implementation:

2.1.- Provide make targets to split up tests (check, slowcheck, perf).
2.2.- Fork tests that test abort()-like behavior.
2.3.- Fork time bound tests, aborting and failing them once a timeout
has passed.
2.4.- Pass main() arguments to tests.
2.5.- Tests that produce a "CRITICAL **:" or "WARNING **:" message
should fail.
2.6.- Use G_LOG_DOMAIN properly.

3.- About the testing framework

3.1.- Do not add a new dependency to GTK+; instead of using an existing
testing framework like Check, develop a reduced common set of features
needed for unit testing. These features would be:

1) An initialization function that takes care of things like calling
gtk_init(), preparsing arguments, setting CRITICALS/WARNINGS as fatal,
etc.
2) Register all widget types provided by Gtk+.
3) Fork off a test and assert it fails in the expected place.
4) A fork-off and timeout helper function.
5) Helper macros to indicate test start/progress/assertions/end.
6) Output formatting function.

4.- Things that would be worth testing:

4.1.- For a specific widget type, test input/output conditions of all
API functions (only for valid use cases) for both Gtk and Gdk.
4.2.- Try setting & getting all widget properties on all widgets over
the full value ranges (sparsely covered by means of random numbers, for
instance).
4.3.- Try setting & getting all container child properties.
4.4.- Check layout algorithms by laying out a child widget and checking
the coordinates it is laid out at.
4.5.- Create all widgets with mnemonic constructors and check that their
activation works.
4.6.- Generically query all key bindings of stock Gtk+ widgets, and
activate them, checking that no warnings/criticals are generated.
4.7.- Create a test rcfile covering all rcfile mechanisms, parse it,
and assert its values in the resulting GtkStyles.
4.8.- For all widget types, create and destroy them in a loop to:
   a) measure basic object setup performance
   b) catch obvious leaks
   (these would be slowcheck/perf tests)


* Regarding 1.1.: 

Some people think that a list with all passed/failed tests should, at
least, also be provided. Some comments that support this idea have been:

   - People would like to know the overall state (percentage of
passed/failed tests).
   - People would like to have a list of tests that fail, so that they
can fix the issues.
   - It allows a group of people to work on fixing different issues in
parallel.
   - Some people prefer a more verbose output.
   - It could save valuable developer time in some situations.

Some comments against it have been:

   - Having/letting more than one test fail and continuing to work in an
   unrelated area rapidly leads to confusion about which tests are
   supposed to work and which aren't, especially in multi-contributor
   projects.
   - Figuring out whether the right tests passed suddenly requires
scanning the test logs and remembering the last count of tests that may
validly fail.
   - Defeats the purpose of using a single quick make check run to be
confident that one's changes didn't introduce breakage. (Tim Janik)
   - You usually have to fix the first issue before being able to move
on to the next one.

* Regarding 2.2.:

What about testing segmentation faults? Segfaults are not predictable,
so, if we want to be able to get a complete list of passed/failed
tests, we need to fork every single test. The downside, of course, is
execution time.

* Regarding 2.5.: 

Some people think that there are situations where you would not want
those tests to fail. Some comments that support this idea have been:

   - In the docs you describe API usage. Thus a test could check if in
the debug build a g_return_if_fail() is used to implement the contract
(warn & fail if NULL is passed).
   - It is worth knowing whether a function safely handles being passed
invalid arguments (like a NULL pointer) or whether it produces a
segmentation fault in that case.
   - Sometimes it is useful to check that a critical message was indeed
shown, and then move on.
   - Preemptively deciding it's always impossible to test resilience of
certain known warnings is a misstep.

Some comments against it have been:

   - In GLib context, once a program triggers any of g_assert*(),
g_error(), g_warning() or g_critical(), the program/library is in an
undefined state.
   - That can be implemented anyway by installing g_log handlers,
reconfiguring the fatality of certain log levels, and by employing
forked test mode. But these kinds of tests will be rare, and also need
to be carefully crafted.
   - Functions are simply just defined within the range specified by the
documentation.
   - Occasional g_return_if_fail statements in the glib/gtk code base
are a pure convenience tool to catch programming mistakes more easily.
They can be removed in production environments.

IMHO, there are two issues in this debate:

   - On one hand, people want to assert some criticals/warnings. That
could be done, as Tim suggested, by installing g_log handlers,
reconfiguring the fatality of certain log levels and by employing
forked test mode. However, it seems unclear when this kind of test
should be done.

   - On the other hand, although g_return_if_fail statements have been
added only to help developers, some people think they are/should be part
of the API contract, and thus, worth testing.

* Regarding 3.1.: 

Some people think that using a testing framework like Check would be
better. Some comments that support this idea have been:

   - It does not make sense to write our own test suite framework; that
is reinventing the wheel.
   - It wouldn't be a dependency to build GTK+, it would only be a
dependency to run the tests.
   - It is not too much to ask for developers to install Check.
   - Most distros have it.
   - Check is widely used and having a standard tool for testing,
instead of doing something ad-hoc, has its advantages.
   - You would need to maintain that ad-hoc framework; if new features
are needed in the future, you would have to add them yourself.

Some comments against it have been:
   - Test frameworks like Check would only help us with features
3, 4 and, to some extent, 5 (see above the list of features that Tim
suggested the unit test framework should provide). This does not warrant
a new package dependency, especially since 5 might be highly customized
and 3 or 4 could be generally useful to provide in GLib.

           OTHER ISSUES

* How many test programs?

Ideally, one test program per component makes developers' lives easier,
because it allows them to run only the tests for the components they
are interested in or working on. I think this is a very important issue
if we finally make the test suite stop at the first failed test. On the
other hand, the more test programs we have, the longer the build will
take, because libtool needs to relink all the test programs whenever
the library changes, which can become a really endless process if we
have too many test programs.

I guess we need to reach some kind of agreement on how to organize
groups of tests so we generate neither too few nor too many test
programs.

* Code coverage

Some people suggested including code coverage statistics (gcov, lcov,
etc.). Nobody seemed to be against this.

* Adding testing-only code to the lib

Adding conditionally compiled, testing-only code to the lib could be
useful to get effective tests in certain situations. However, for a
project of the size and build time of GTK+, with a quite large legacy
code base, it can be too high a price.

Quite related to the above issue is the idea of adding the tests to the
files containing the code being tested, as Nautilus does. The problems
would be the file growth and the fact that GTK+ already has some quite
big files. Also, some people added that the additional cruft the tests
would add to the files is rather distracting, so they prefer them to be
in separate files.
* Misc proposals/concerns

This is a misc set of minor recommendations, doubts, and suggestions
for future work beyond the scope of unit tests.

- Use AT-SPI (functional tests and accessibility support).
- Clearly document the purpose of each test.
- Develop use-case tests.
- Is signal emission part of the API contract? Should it be tested?

