reftests

From: Benjamin Otte <otte gnome org>
To: gtk-devel-list <gtk-devel-list gnome org>
Subject: reftests
Date: Tue, 3 May 2011 22:01:04 +0200
Hey,

with the latest commits[1] I have added reftests to GTK. Reftests are
my approach to getting the layout and rendering behavior of GTK tested.
I've added a bunch of tests already for the things I have fixed and
will continue to add tests for bugs I fix. For what the test runner
does, see the commit message in [1]; for what reftests are, see [2].
The test runner works very well. It is still a bit rough around the
edges, but that's mostly because gtester needs to get better at
coping with generic testing. (It's way too crash-happy as-is.)

In this mail, I want to go into the motivation for writing reftests
and why I didn't want to make use of the previous test infrastructure.
I tried to achieve the following goals (if you think I could achieve
them better, please speak up):
- It should be easy to create tests
- It should be easy to run tests
- It should be easy to understand tests
- It should be easy to fix problems shown by tests
- The test infrastructure should easily scale

That's the TL;DR version, here is the long one:

- It should be easy to create tests
Writing a test is something people hate to do. It's the #1 reason why
Open Source projects don't write tests. Also, it's the #1 reason why
bugs aren't fixed. If people filed bugs with easy-to-reproduce
tests instead of saying "in my custom application, when I do X, Y
happens and not Z", there'd be a much higher chance developers would
be interested in looking at it.
This is why the reftests use stock ui files that can be created in
Glade. So everyone who is able to use Glade can create a test file.
And we can just use it.

- It should be easy to run tests
It's quite hard to get someone to run a test. It requires compilation
of a GTK checkout. That is not good.
For a developer, too, it's quite complicated to run a test from
someone else, say from bugzilla or a pastebin. Either you have to
invoke gcc manually or you have to integrate it into the testsuite
infrastructure.
With reftests, you dump the ui file somewhere and run
tests/reftests/gtk-reftest path/to/file.ui and that's it. You can then
spend the rest of the day updating the testcase wherever you want, and
pastebin or mail it back and forth with whoever you are working on the
test with.

- It should be easy to understand tests
Here's an example output from the current testsuite:
  /FilterModel/filled/hide-root-level:
  ** ERROR **: Signal queue empty
  aborting...
It's hard to understand what might be broken. The output from current
tests is both sparse and not very informative. If somebody came into
IRC and said he ran make check and got this, I doubt anybody would
know how to fix it. Or be interested in actually fixing what is wrong.
So it is important that tests provide output that is easy to digest
and gives a hunch of what is actually wrong. Which is why gtk-reftest
outputs images - the reference rendering of the expected output[3],
the actual rendering[4] and the difference between those[5]. And it
should be reasonably easy to find the difference between them and get
an idea of what is wrong (Pango doesn't ellipsize every row, only the
last one. Bad Pango - and Behdad hasn't even applied my patch for
this, I need to poke him again as I've just committed that test,
ooops.)
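
To make the three images concrete, here is a minimal sketch of the
comparison idea in Python. It is not the gtk-reftest implementation
(see the commit in [1] for that). The names diff_images, WHITE and RED
are made up for illustration, and images are assumed to be plain
nested lists of RGB tuples rather than real rendered surfaces.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels (an assumption for this sketch)

WHITE: Pixel = (255, 255, 255)  # marks pixels that agree
RED: Pixel = (255, 0, 0)        # marks pixels that differ


def diff_images(reference: Image, actual: Image) -> Tuple[bool, Image]:
    """Compare two renderings pixel by pixel.

    Returns (matches, diff), where diff has the same size as the inputs
    and highlights every differing pixel in red.
    """
    same_height = len(reference) == len(actual)
    same_widths = all(len(r) == len(a) for r, a in zip(reference, actual))
    if not (same_height and same_widths):
        raise ValueError("renderings differ in size")

    diff = [
        [WHITE if ref_px == act_px else RED
         for ref_px, act_px in zip(ref_row, act_row)]
        for ref_row, act_row in zip(reference, actual)
    ]
    return reference == actual, diff


# A 2x2 reference rendering and an actual rendering with one wrong pixel.
reference = [[(0, 0, 0), (0, 0, 0)], [(0, 0, 0), (0, 0, 0)]]
actual = [[(0, 0, 0), (0, 0, 0)], [(0, 0, 0), (9, 9, 9)]]
matches, diff = diff_images(reference, actual)
```

The test passes only if matches is true. Otherwise the diff image
points straight at the pixels to look at, which is the whole point of
writing images out instead of a line on stderr.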

- It should be easy to fix problems shown by tests
This is really a combination of the previous points, but deserves
separate mention: If a test regresses in a year or so and the original
author has left to work on LibreOffice, Mozilla or other exciting
jobs, it should be easy for the current developer to fix the problem.

- The test infrastructure should easily scale
This is mostly a question about how to organize a test suite so that
people actually run it. Or at least run the parts that are relevant to
them, and an automatic testing infrastructure can do the full run and
actually produce useful output for developers if something fails. So
far we're pretty bad at this. Our patented test runner named Dan
Winship interacts with the developers by reopening bugs with a bit of
output from stderr. That works for now, but I'm not sure that test
runner wants to scale.
To give everyone a clue of what I'm aiming at:
* The Swfdec testsuite contains 2,500+ tests. It takes 3 minutes to run.
* The cairo testsuite contains 350 tests. It takes about 10 minutes to
run for a normal run. A full run easily takes an hour.
* The WebKit testsuite contains 20,000+ tests. It takes 15-20 minutes
to run them all.
So from looking at those numbers (and I didn't include Mozilla because
I couldn't find any numbers - but they would be frightening) I would
guess that a "proper" GTK testsuite should contain 10,000+ tests and a
full run would take at least 10 minutes. And in there, it should be
easy to identify tests, run some of them and generate useful outputs.
In particular, it should be easy to skip it.
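
As a rough illustration of "identify tests and run some of them", here
is a small Python sketch that picks a subset of tests by path pattern.
It is only an assumption about how such a selection could look, not
something the current test runner does. select_tests is a hypothetical
helper, and apart from /FilterModel/filled/hide-root-level (taken from
the output quoted earlier) the test paths are made up.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def select_tests(test_paths: Iterable[str], pattern: str) -> List[str]:
    """Return the test paths matching a shell-style pattern, in order."""
    return [path for path in test_paths if fnmatchcase(path, pattern)]


all_tests = [
    "/FilterModel/filled/hide-root-level",
    "/FilterModel/empty/show-root-level",  # hypothetical test path
    "/reftest/label-fun",                  # hypothetical test path
]

# Only the tests relevant to someone hacking on the filter model.
filter_model_tests = select_tests(all_tests, "/FilterModel/*")
```

With 10,000+ tests, a developer would run a selection like this
locally and leave the full run to the automatic infrastructure.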

This got longer than I expected, so I'd better close now. Questions?

Benjamin

PS: Credit for this test runner goes to David Baron, Robert
O'Callahan, Carl Worth and Sandro Santilli, who inspired me to spend
more time on testing and actually like it.


1: http://git.gnome.org/browse/gtk+/commit/?id=363dbb60397ebf683d8a97ae15517030c27357d7
2: http://weblogs.mozillazine.org/roc/archives/2008/12/reftests.html
3: http://people.freedesktop.org/~company/stuff/label-fun.ref.png
4: http://people.freedesktop.org/~company/stuff/label-fun.out.png
5: http://people.freedesktop.org/~company/stuff/label-fun.diff.png