Re: [BuildStream] Proposal: Add support for running tests in BuildStream



Hi Tristan,

On Fri, Jul 27, 2018 at 8:21 AM Tristan Van Berkom
<tristan vanberkom codethink co uk> wrote:

On Mon, 2018-07-23 at 18:28 +0100, Chandan Singh wrote:
Hi Tristan,

Hi again Chandan,

As you can probably see it's been a busy week, but this proposal is
also important. I'll probably be distracted a lot with the upcoming
release; I'll try not to drop the ball on this thread, but if I do,
please don't hesitate to ping me and remind me :)

Now I have to share some of the blame too, as I wanted to reply to this thread
earlier but got caught up in other issues.

This is a lot to think about; I took the time to try to break down
these requirements and brainstorm a bit further...
OK, so there are a few main points I can glean from this; I'll try to
condense them here:

 a.) We don't want tests to block reverse dependency builds.

 b.) We want to ensure that tests are passing before deploying
     something (I think in your case it's a .deb, for other use cases
     it can be another packaging format, a full system firmware, etc).

 c.) We want to structure things such that we are sure which backing
     element is associated with a failing test; this should be clearer
     and stronger than a naming convention.

 d.) We want to minimize the amount of .bst files and the amount of
     YAML which needs to be maintained.

 e.) Tests often require a different execution environment and set
     of dependencies.

     E.g. many tests found in the modules which make up a typical Linux
     system require external services to run, like an X server or a
     database.

In addition to your points, I would add:

 f.) We want to be able to run many kinds of tests, e.g. some tests
     are provided by upstream module maintainers to test a specific
     module, while other tests may involve launching a VM and capturing
     screenshots.

 g.) With caching of build trees on the horizon, we probably want some
     semantic to allow reuse of that build tree for testing purposes.


Thanks for taking the time to break down my message into more concrete
requirements. (g) is definitely important as otherwise we would waste a
considerable amount of time in re-staging the sources and dependencies for the
tests.

Let's skip over (a); I think all solutions, aside from running tests
inside the BuildElement as a workaround, would cater to (a).

Focusing on (b) and (c), I would imagine that a natural way to
structure a package deploying project would have a pipeline structured
similar to this (with (B)uild, (T)est, (P)ackage and (S)tack elements):

     B(foo)     (additional test deps)
       |  \       /
       |    \    /
       |    T(foo)
       |    /
       |  /
     P(foo)
       |
       |
  S(main target)

It appears that satisfying (b) is very easy: we just need to make
packaging of "foo" contingent not only on building of "foo", but also
on testing of "foo".
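
For example, a hypothetical packaging element along these lines (the
file names foo.bst and foo-test.bst are made up, and `deb_deploy` is
the same hypothetical kind used further below) could simply declare
both dependencies:

   ===============================================================
   # foo-pkg.bst (hypothetical): packaging "foo" is contingent on
   # both the build and the tests of "foo" having succeeded.
   kind: deb_deploy
   depends:
   - filename: foo.bst        # B(foo)
     type: build
   - filename: foo-test.bst   # T(foo)
     type: build
   ===============================================================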

Addressing (c), I think, is a matter of having a strong relationship
between the test and the element that test is testing (although at
this point there are many ways of looking at the problem).

To brainstorm this, I would suggest that the potential test Element
plugin have the following properties:

   o Tests a specific element
   o Can add dependencies for the sake of testing; these must be
     build-only dependencies (so they are not propagated forward).
   o Can optionally stage the build tree of the built element
     - needed for `make check` style tests, not necessary for other
       integration style tests
     - satisfy (g) as an optimization (no need to rebuild just to run
       make check)
     - alternatively without (g), needs to stage the sources of the
       element it's testing
   o The output of the test element is exactly the same as the output
     of the build element itself (similar to a filter element without
     the filtering; it acts as a "pass through", but with a check/test)
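
As a rough sketch of how such an element might be declared (the `test`
kind and its configuration keys are hypothetical at this point, only
meant to illustrate the properties above):

   ===============================================================
   # foo-test.bst (hypothetical)
   kind: test

   # Additional dependencies needed only for testing; build-only,
   # so they are not propagated forward.
   depends:
   - filename: base/mysql.bst
     type: build

   config:
     # The element under test; presumably this also implies a
     # dependency on foo.bst, whose output is passed through
     # unmodified when the tests succeed.
     test-element: foo.bst

     # Reuse the cached build tree of foo.bst rather than
     # re-staging sources and rebuilding (requirement (g)).
     stage-build-tree: true

     commands:
     - make check
   ===============================================================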

With a test Element designed like this, the dependency graphs can be
simplified and I think we manage to address (c) by ensuring that
semantically, the test Element "tests a specific element".

     B(foo)  (additional test deps)
       |     /
       |   /
     T(foo)
       |
       |
     P(foo)
       |
       |
  S(main target)

Orthogonal BuildElements which depend on "foo" can depend directly on
B(foo), while deployments of "foo" should depend on T(foo). A failure
of T(foo) is always clearly a failure related to B(foo); there is no
room for doubt.


I'm not sure about the part where you say that orthogonal build elements should
directly depend on B(foo). If T(foo) fails, how would foo's dependents know
about that? This correctly addresses the problem where P(foo) fails if T(foo)
fails, but what about P(baz), which depends on B(baz) (which in turn depends
on B(foo))? How can we ensure that it also fails when T(foo) fails?

Right, you are making the assumption that it is important to deny the
existence of an artifact depending on whether some associated testing
has passed; I am asserting that this is false.

The point is that:

* The existence of the BuildElement's artifact is a statement that the
  code has built successfully.

* The existence of the testing element's artifact is a statement that
  the built code has also been tested.

* BuildElements which depend on other BuildElements for the purpose of
  building do not require that the corresponding testing elements have
  also passed.

  Note: this is what affords you the parallelism of builds not blocking
        on the testing of elements you depend on.

This is the part where I am not very convinced :) The existence of the
BuildElement's artifact is surely a statement that the code has been built
successfully. But as a user, what I care about the most is not whether or not
it is built successfully, but rather "is the output integratable?".
"Integratable" here means that other elements can safely depend on the output
of this element. I think elements other than "package" elements should also
have a way of asserting this.

The test elements can surely act as a pass-through, but if all dependents of
B(foo) now have to depend on T(foo) instead in order to assert that, I am not
sure that we have solved the problem.

How would you feel about adding something like the following to the list of
requirements you originally proposed?

h.) Elements should be able to assert that the tests for their dependencies are
    passing before they declare their own builds to be a success.

If we satisfy this requirement, then we can guarantee that when anything
declares a successful build, all its tests have passed and one does not have
to look to another element to find out about the status of tests.

* A potential Test element could have the convenience of providing
  passthrough behavior for the content of the element it is testing,
  such that a `stack` element or some other element involved later in
  the pipeline (closer to deployment of tested software) can depend on
  it directly.

  This kind of convenience just makes things easier to express later
  on; passthrough behavior of dependencies would just allow:

    - Depending on the test element, basically a statement that
      "this has been tested"
    - Still obtaining the same data and implied dependencies which
      the originating element provided.

  This passthrough behavior seems useful in a general sense for some
  plugins; while this part might be a bit tricky to implement, it will
  certainly be less tricky than having one element with alternative
  sets of dependencies and cache keys.

Part of the flexibility BuildStream affords is due to plugins not
becoming too complex on their own; try to think more about how multiple
elements can come together to form a pipeline, rather than trying to
add more functionality to base classes. This is always preferred.

The orthogonal but very related part I've been mentioning, about
enhancing the YAML format to make it easier for you to express more
elements with the same amount of YAML, is there to help in situations
where you want to create complex pipeline constructs which currently
require too many individual YAML files.

I am very much up for condensing multiple related element definitions into a
single YAML file but I am starting to think that we should consider that as an
orthogonal issue (as you said above) at this point and have that discussion
separately so that we can focus this conversation on testing.

Unless I'm missing something, this only leaves the problem of (d), which
can at least be thought of as a completely orthogonal problem. Solving
(d) is of high value on its own I think, as it should let us express
complex pipelines more easily.

As an example of what could be possible with a little YAML-foo, we
should be able to express B(foo), T(foo) and P(foo) in the same foo.bst
file, and it would probably make sense that if a multi-element .bst
file was referred to without a subscript, it should always mean the
last element. We could even have a grouping mode which implies a series
of elements which depend on each other, if that makes sense.

Such a foo.bst file might look like this:

   ===============================================================
   # Specify the grouping type, let's say `sequence` means that
   # the elements depend on each other in a sequence
   sequence

   --
   kind: autotools
   # depend on the successful *build* of bar
   depends:
   - bar.bst[0]

   --
   kind: test
   # Build depend on additional deps
   depends:
   - filename: base/mysql.bst
     type: build

   config:
     # Specify the element to test, probably in such a grouping
     # there is a way to specify this without saying "foo.bst",
     # perhaps self[0] or such could work.
     #
     test-element: foo.bst[0]
     commands:
     - make check

   --
   # In the deploy stage we probably don't need much configuration,
   # this is driven mostly by public data from the depended upon
   # elements, and if the test element is a passthrough, we already
   # implicitly depend on it due to being part of a "sequence".
   #
   kind: deb_deploy
   ===============================================================

If it is the case that a lot of the configuration in these files becomes
redundant (i.e. only the "name" of something changes, lots of boilerplate),
then it would be interesting to employ something like macros, turning large
parts of boilerplate into one-liners.

I like the idea of having the base build definition plus other related elements
like tests, deployables etc. in the same YAML file for clarity and readability,
so +1 on that.

For tests specifically, my concern is that if we follow this approach, we will
end up with roughly as many kinds of test element as there are build
elements, like 'make-test', 'cmake-test' etc., since most build systems support
some way of running tests. This also runs the risk that the build element and
the test element could get out of sync.

I think we need only 2 types of test element, only one of which we
really need to think of in the scope of this proposal (other kinds of
testing elements are not related to "make check" style testing, but are
certainly still pertinent to think about).

To solve the issue you mention above, I would assert that this is
exactly what public data is for.

  * We can add a new "test-commands" member to the builtin "bst"
    public data domain.

  * The BuildElements can add new defaults to their public data,
    so each BuildElement type would now additionally add a default
    way of testing.

  * The new test element would run the commands specified by the
    element it is testing, so the test element by itself would not need
    too much configuration.
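
As a rough sketch of how this could look (the "test-commands" public
data member is only being proposed here, it does not exist yet), the
defaults shipped with a BuildElement like autotools might declare:

   ===============================================================
   # Hypothetical addition to a BuildElement's defaults
   # (e.g. autotools.yaml): a default way of running its tests.
   public:
     bst:
       test-commands:
       - make check
   ===============================================================

An individual element could then override "test-commands" in its own
public data block when the defaults are not appropriate, and the test
element would simply read this public data from the element it is
testing.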

This could work. So long as BuildElements have _some_ way of providing defaults
for the tests, I think we should be good.

Another interesting thing to note here is that most things will
absolutely fail if you start enabling "make check" tests in a
controlled integration build environment.

Every module in the Linux stack is different; their maintainers use
their `make check` style tests in different ways, and special care is
needed to make these tests work, not only by adding additional
dependencies, but probably also by starting some services.

Some tests just won't work unless you have a video output on the machine
you are testing (e.g., think of GUI apps which run tests that
initialize a toolkit which just doesn't initialize if you are on a
headless machine).

This is true, but I am thinking about packages which come with simple unit
tests and not-so-complex integration tests that can fit nicely within the
sandbox with just some additional dependencies. So, even if we are not able to
support all use cases, we can at least support the basic ones with sensible
plugin defaults; more complex elements will have to provide their own
`test-commands` for the extra setup process.

For this, we probably need the "test" element to provide some way to
declare a "scaffold" for the test (a scaffold could start a fake X
server, or run mysqld, for the duration of a test).
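
Purely as a sketch of what such a scaffold might look like in the test
element's configuration (none of these keys exist, this is just to
illustrate the idea):

   ===============================================================
   # Hypothetical scaffold support: services started before the
   # test commands run and torn down again afterwards.
   kind: test
   config:
     test-element: foo.bst
     scaffold:
       start:
       - Xvfb :99 &
       - mysqld --datadir=/tmp/test-db &
       stop:
       - pkill Xvfb
       - pkill mysqld
     commands:
     - DISPLAY=:99 make check
   ===============================================================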

In the same way that we want the test element to be strongly connected to a
specific build element, perhaps it would also make sense to somehow strongly
tie the test element types to corresponding build element types. If the same
element plugin provided both the 'build-commands' and the 'test-commands',
it would be simpler, but I am not sure how that fits into this model.

In the end, as a user, I would like to be able to say "I am an element of kind
make, build and test me using the standard make commands", 'make check' in this
case. The case of 'make' is simpler and may not warrant an additional plugin
but such plugins would be useful for more complex test setups.

Right, I think public data addresses this part above.

In use cases where you do not deploy a "package", a later "compose" and
firmware-creating element would still depend on the last element of the
group.

In the unlikely but possible case where you want to `bst checkout` the
individual build results of each BuildElement as a part of your
production pipeline, then doing `bst checkout foo.bst` would still be a
checkout of the BuildElement content that is contingent on the tests
having passed.


Of course this is not entirely specced out, but do you see anything
about the general direction which fails to satisfy your use cases, or
the wider general use cases we should be considering for the tool?


Thanks! IMHO such an approach could work if we address the above comments.


I hope we are getting closer; this is definitely something important we
have to think about!

Cheers,
    -Tristan


Cheers!

Chandan

-- 
Regards,
Chandan Singh
https://chandansingh.net

