Re: reftests



Hi Benjamin,

On May 3, 2011, at 10:01 PM, Benjamin Otte wrote:
> with the latest commits[1] I have added reftests to GTK. Reftests are
> my approach at getting layout and rendering behavior of gtk tested.
> I've added a bunch of tests already for the things I have fixed and
> will continue to add tests for bugs I fix. For what the test runner
> does, see the commit message in [1], for what reftests are, see [2].
> The test runner works very well, even though it is still a bit rough
> around the edges, but that's mostly because gtester needs to be made
> better to cope with generic testing. (It's way too crash-happy as-is.)

Very nice to see that we are (finally) getting testing in place for layout and rendering code!

> In this mail, I want to go into the motivation for writing reftests
> and why I didn't want to make use of the previous test infrastructure.
> I tried to achieve the following goals (if you think I could achieve
> them better, please speak up):
> - It should be easy to create tests
> - It should be easy to run tests
> - It should be easy to understand tests
> - It should be easy to fix problems shown by tests
> - The test infrastructure should easily scale

As we've already discussed on IRC some time ago, I would really like to see all GTK+ unit tests in one single place, instead of in several different places in the source code.  We really need people to run the unit tests more often, so this needs to be made easy (as you also mention in your enumeration above).  I don't think putting different unit tests in different places makes this easier.

So I think it would be good to consolidate into one location.  Some ideas below.


> - It should be easy to create tests
> Writing a test is something people hate to do. It's the #1 reason why
> Open Source projects don't write tests. Also, it's the #1 reason why
> bugs aren't fixed. If people would file bugs with easy to reproduce
> tests instead of saying "in my custom application, when I do X, Y
> happens and not Z", there'd be a much higher chance developers would
> be interested in looking at it.
> This is why the reftests use stock ui files that can be created in
> Glade. So everyone that is able to use Glade can create a test file.
> And we can just use it.

Agreed.  For all the different components of GTK+, we need to think about how to make it easy to write tests.  I did this for the filter model in the past and I actually receive additional tests in bugzilla now (which I am in the process of reviewing).


> - It should be easy to run tests
> It's quite hard to get someone to run a test. It requires compilation
> of a GTK checkout. That is not good.

We can always distribute the unit tests as a separate tarball if that will help, can't we?

> For a developer, too, it's quite complicated to run a test from
> someone else, say from bugzilla or a pastebin. Either you have to
> invoke gcc manually or you have to integrate it into the testsuite
> infrastructure.
> With reftests, you dump the ui file somewhere and run
> tests/reftests/gtk-reftest path/to/file.ui and that's it. You can then
> spend the rest of the day updating the testcase wherever you want, and
> pastebin or mail it back and forth with whoever you work on the test
> together.

Of course this will work fine with Glade files, but I don't see how this makes it easier to run other kinds of tests.

Another question: why was gtk-reftest put in gtk+/tests/reftests/gtk-reftest instead of in gtk+/gtk/tests/gtk-reftest, with a subdirectory reftests containing the Glade files?  Then running make check for the GTK+ unit tests would automatically execute the reftests as well.  Currently, you also need to compile a GTK+ checkout to use reftests, right?


> - It should be easy to understand tests
> Here's an example output from the current testsuite:
>  /FilterModel/filled/hide-root-level:
>  ** ERROR **: Signal queue empty
>  aborting...
> It's hard to understand what might be broken. The output from current
> tests is both sparse and not very informative. If somebody came into
> IRC and said he ran make check and got this, I doubt anybody would
> know how to fix it.

This error is very easy to improve; for example, four lines further down in the source code there are "expected this, got that" error messages.

I think your actual point is that the output of GTest can be significantly improved.  These filter model errors are just done with separate g_error() and g_assert_not_reached() calls, because GTest did not provide API for outputting more elaborate diagnostics about test failures.  I have a similar case in the scrolling tests for tree view:

        g_assert (allocation.y == rect.y + ((rect.height - allocation.height) / 2));

The output of this failed assertion is not really easy on the eyes.  It would be nice if the assertion macros could be improved to also accept a human-readable string describing what went wrong, together with the expected and received values.  But perhaps this is already present in gtestutils and I missed it.
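
For integer comparisons, gtestutils does have g_assert_cmpint(), which prints both operand values on failure, but it takes no descriptive string.  To illustrate the kind of macro meant here, below is a minimal sketch in plain C.  The names format_int_mismatch() and ASSERT_INT_EQUAL_MSG() are hypothetical and not part of GLib; a real version would presumably hook into the GTest failure reporting instead of calling abort() directly.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, NOT part of gtestutils: formats a failure message
 * that names what was being checked plus the expected and received values.
 * Returns the snprintf() result (length of the full message). */
static int
format_int_mismatch (char *buf, size_t size, const char *what,
                     const char *file, int line, long expected, long got)
{
  return snprintf (buf, size, "%s:%d: %s: expected %ld, got %ld",
                   file, line, what, expected, got);
}

/* Hypothetical assertion macro: evaluates both values once and, if they
 * differ, prints the formatted message to stderr and aborts. */
#define ASSERT_INT_EQUAL_MSG(what, expected, got)                      \
  do {                                                                 \
    long expected_ = (expected);                                       \
    long got_ = (got);                                                 \
    if (expected_ != got_)                                             \
      {                                                                \
        char msg_[256];                                                \
        format_int_mismatch (msg_, sizeof msg_, (what),                \
                             __FILE__, __LINE__, expected_, got_);     \
        fprintf (stderr, "%s\n", msg_);                                \
        abort ();                                                      \
      }                                                                \
  } while (0)
```

With something like this, the tree view check above could be written as ASSERT_INT_EQUAL_MSG ("allocation is vertically centered in rect", rect.y + ((rect.height - allocation.height) / 2), allocation.y), and a failure would state both numbers.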

In any case, we can improve gtestutils here and we should really try to do that :)

> - The test infrastructure should easily scale
> This is mostly a question about how to organize a test suite so that
> people actually run it. Or at least run the parts that are relevant to
> them and an automatic testing infrastructure can do the full run and
> actually produce useful output to developers if something fails. So
> far we're pretty bad at this. Our patented test runner named Dan
> Winship interacts with the developers by reopening bugs with a bit of
> output from stderr. That works for now, but I'm not sure that test
> runner wants to scale.
> To give everyone a clue for what I'm aiming at:
> * The Swfdec testsuite contains 2.500+ tests. It takes 3 minutes to run.
> * The cairo testsuite contains 350 tests. It takes about 10 minutes to
> run for a normal run. A full run easily takes an hour.
> * The Webkit testsuite contains 20.000+ tests. It takes 15-20 minutes
> to run them all.
> So from looking at those numbers (and I didn't include Mozilla because
> I couldn't find any numbers - but they would be frightening) I would
> guess that a "proper" GTK testsuite should contain 10.000+ tests and a
> full run would take at least 10 minutes. And in there, it should be
> easy to identify tests, run some of them and generate useful outputs.
> In particular, it should be easy to skip it.

These sound like the numbers I would expect.  What in GTest would need improvement to realize this?

About organization, I think that, for one, all GTK+ unit tests should be in one place (and the GDK tests in another place).

Secondly, we also need to develop a consistent naming scheme for tests.  Unit tests currently have different ways of naming tests:

/FilterModel/self/verify-test-suite:
/expander/click-expander:
/recent-manager/get-default:
/tests/column-new: 			 (these are for icon view)
/Builder/Window:

etc, etc.
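
Just to make the idea concrete, here is a rough sketch of a helper that forces every test path into a single form.  The convention it applies ("/component/test-case", all lowercase, words separated by hyphens) is only one possible choice and is my assumption, not something that has been agreed on; make_test_path() and append_normalized() are hypothetical names, not existing GLib or GTK+ API.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: appends src to dst in lowercase, inserting '-'
 * before an uppercase letter that follows a lowercase letter or digit
 * ("FilterModel" -> "filter-model").  Never writes past dst[size - 2],
 * so the caller always has room for the terminating NUL. */
static size_t
append_normalized (char *dst, size_t pos, size_t size, const char *src)
{
  size_t i;

  for (i = 0; src[i] != '\0'; i++)
    {
      unsigned char c = (unsigned char) src[i];
      unsigned char prev = i > 0 ? (unsigned char) src[i - 1] : 0;

      if (isupper (c) && (islower (prev) || isdigit (prev)))
        {
          if (pos + 1 < size)
            dst[pos++] = '-';
        }
      if (pos + 1 < size)
        dst[pos++] = (char) tolower (c);
    }

  return pos;
}

/* Hypothetical helper: builds "/component/test-case" into buf using the
 * assumed convention.  Output is silently truncated if buf is too small,
 * but is always NUL-terminated when size > 0. */
static char *
make_test_path (char *buf, size_t size,
                const char *component, const char *test_case)
{
  size_t pos = 0;

  if (size == 0)
    return buf;

  if (pos + 1 < size)
    buf[pos++] = '/';
  pos = append_normalized (buf, pos, size, component);
  if (pos + 1 < size)
    buf[pos++] = '/';
  pos = append_normalized (buf, pos, size, test_case);
  buf[pos] = '\0';

  return buf;
}
```

Under this assumed convention, "FilterModel" plus "hide-root-level" and "Builder" plus "Window" come out as /filter-model/hide-root-level and /builder/window, while names that are already lowercase, such as /recent-manager/get-default, are left unchanged.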



regards,

-kris.

