Re: [BuildStream] Update CI and testing process



Hi Chandan,

On Fri, 2018-12-07 at 19:56 +0000, Chandan Singh wrote:
Hi,

In this thread, I wish to document some of the problems that developers face
due to the way our CI process currently works, and propose some changes to the
way we run our tests. If people agree, I'd be happy to make these changes but
I want to ensure that we are on the same page before I do so.

[...]
Current CI process
==================

Since it may not be obvious to everyone, the way things work currently is that
we have some testsuite Docker images that are provided by the docker-images
repo [1].  These images have all the dependencies that are required to build,
run and test BuildStream. These images get rebuilt frequently, but we only
update their tags in BuildStream's `.gitlab-ci.yml` when we need to pick up
some changes, most commonly when we change the version of a dependency.
Due to the way this is set up, we essentially have to duplicate our
dependencies in the main BuildStream repo as well as the docker-images repo,
which is error-prone and not very intuitive.
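For context, the pinning looks roughly like the following `.gitlab-ci.yml`
fragment (the job name, image path, and tag here are illustrative, not the
repository's actual values):

```yaml
# Illustrative only: a test job pinned to a specific testsuite image tag.
# Picking up a new image means committing a change to this tag.
test-fedora:
  image: registry.gitlab.com/buildstream/buildstream-docker-images/testsuite-fedora:hypothetical-tag
  script:
    - python3 setup.py test
```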

From what I understand, there are two reasons for doing this:

a. Since we don't need to install anything during the testing phase, it saves
   some time.

We don't really care about (a), it might be of minor concern but was
never a motivation behind our setup.

b. It ensures that we always run our tests with pre-determined versions of our
   dependencies.

That is correct, except this phrasing of (b) is a dramatic
understatement.

The main reason is this:

   For any given run of CI, the conditions are always exactly the same
   in CI, unless we commit a change to .gitlab-ci.yml which explicitly
   changes the conditions and test environment.

   In other words: Only a commit to the BuildStream repository can ever
   be allowed to result in a pipeline failing.

An unacceptable situation is one where CI fails for reasons outside of our
control; or at least, in that case the failure should be reported as an
error in the GitLab infrastructure, rather than as a CI failure.

I hope this makes clear how paramount this consideration is compared to
any considerations of convenience in updating the images.

Problems
========

Local reproducibility
~~~~~~~~~~~~~~~~~~~~

It is not obvious which versions of dependencies the testsuite images are
running at a given tag. This makes it harder for anyone to debug CI failures
locally, and makes it quite common for developers to hit the classic
problem of "works on my laptop but fails in CI".

Beyond this, our `.gitlab-ci.yml` is already quite complicated in general,
which makes it hard to locally do "exactly whatever the CI is doing".

It will never be possible to reproduce the CI exactly locally without
using the docker image.

Ideally, running the test suites locally *always passes*, but we should
accept that a local installation will always be different from the
exact conditions we have explicitly programmed into the different CI
runs.

I should add that we are currently aware of exactly 2 issues where
tests are behaving differently in CI than on developer laptops:

  * Race condition which happens in test_push_pull
    https://gitlab.com/BuildStream/buildstream/issues/144

    This appears to be intermittent, it rarely fails on developer
    setups and never fails in CI.

    There were some other race conditions which were failing in CI
    but they were fixed.


  * pylint behaves differently depending on what is installed
    https://gitlab.com/BuildStream/buildstream/issues/570

    This issue is very, very frustrating for everyone: we cannot
    have a minimal bound dependency (i.e. pylint >= 1.2.3) on the
    linter, OR on various dependencies of the linter, because the linter
    and its dependencies are not API stable by a long shot.

    Updates to the linter and its dependencies cause new errors to
    occur, which means that a given commit in BuildStream history
    which passed its own tests last year does not pass its tests
    today, depending on what you've upgraded.
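To make the "what exactly differs" question concrete, here is a small
hypothetical helper (not part of BuildStream; package names and versions
are illustrative only) that compares a set of exact pins against the
versions present in an environment:

```python
# Hypothetical helper: report every package whose installed version
# differs from an exact pin, to see why a lint run behaves differently
# between CI and a developer laptop. Names/versions are illustrative.

def check_pins(pins, installed):
    """Return {name: (pinned, installed)} for every mismatched package."""
    return {
        name: (want, installed.get(name))
        for name, want in pins.items()
        if installed.get(name) != want
    }

# In a real environment the installed versions could be read with
# importlib.metadata.version(); a fake mapping keeps the sketch simple.
pins = {"pylint": "2.1.1", "astroid": "2.0.4"}
installed = {"pylint": "2.1.1", "astroid": "2.2.0"}
print(check_pins(pins, installed))  # only the astroid mismatch shows up
```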


I've spent some energy trying to track down exactly which versions of which
libraries need to be installed in order to reproduce the successful
test runs in CI, but I have still not been able to reproduce a successful
run on my laptop.

I was hoping to find out what we needed to control *before* fixing the
new linting errors; so that we can pin things in place and be 100% sure
that linting behavior never changes on arbitrary developer laptops, but
I have recently been considering just fixing the most recent errors and
missing out on the opportunity to fix it properly, because the
situation is very, very severe.


Updating versions of dependencies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since we have a setup.py in the main repo, folks are often tempted to change
that directly when restricting the version of any dependency. However, if the
testsuite images don't already have the required version, the build will just
fail. This has often been a source of confusion in the past. A recent example
is this discussion on !909 [2].

The "right" way to do this today is to make the change in the docker-images
repo first, wait for it to be merged, find the new tag, and finally use the
new tag in your MR on the main BuildStream repo. This process is more complex
than it needs to be, but more importantly makes it harder to test new versions
of dependencies as they have to go via the docker-images repo.

I know it is complicated, but it has to ensure the golden rule: that
nothing ever changes at all in a CI setting from one run to the next,
except for changes introduced via the BuildStream commits under test.

I think it is currently exactly as complicated as it has to be.


Proposed solution
=================

There are a few things that I think we can do to reduce the responsibility of
the testsuite images, and do everything in the main BuildStream repo to avoid
unnecessary complexity.

1. Do not install any python packages in the testsuite images. This will get
   rid of the hack (`pip install BuildStream && pip uninstall BuildStream`)
   that we currently use to install BuildStream's dependencies, and we won't
   need to maintain a duplicate copy of `dev-requirements.txt`.

2. Add a `requirements.in` file to list our dependencies, as we currently
   do in `setup.py`.

3. Generate a `requirements.txt` file based on `requirements.in` that will have
   frozen versions of all dependencies.

4. Similarly, have `dev-requirements.in` and `dev-requirements.txt` for test
   dependencies.

5. Install `requirements.txt` and `dev-requirements.txt` before running any
   tests.

The `.in` files will provide the minimum version requirements, while the `.txt`
files will list exact versions of the packages that are known to work.
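For illustration, such a pair of files might look roughly like this (the
package names, versions, and the use of pip-tools' `pip-compile` as the
generator are assumptions on my part, not a decided workflow):

```
# requirements.in -- hand-maintained minimal bounds
Click >= 7.0
ruamel.yaml >= 0.15

# requirements.txt -- exact pins generated from the above,
# e.g. with `pip-compile requirements.in` from pip-tools
click==7.0
ruamel.yaml==0.15.51
```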


The only way for this to possibly work, is to pin every single python
dependency to an exact version, including indirect dependencies
implicitly pulled in via other dependencies.

This cannot fly because the runtime dependencies (i.e. not the
dependencies needed *only* for tests, but those required to run
BuildStream) need to offer some measure of flexibility for distro
package maintainers (i.e. the outward statement of BuildStream needs to
be "We need >= x.y.z of package FOO", letting distro packagers decide
which exact version they will use).
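In other words, the outward statement stays expressed as minimal bounds,
roughly like this sketch of what `setup.py` would pass to setuptools'
`install_requires` (the package names and bounds here are illustrative,
not BuildStream's actual list):

```python
# Sketch only: outward-facing minimal bounds for runtime dependencies.
# Distro packagers may ship any version satisfying each bound.
install_requires = [
    'Click >= 7.0',
    'ruamel.yaml >= 0.15',
]
print(install_requires)
```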



Another variation of your plan, might be to have a completely separate
set of *specific* dependencies for running tests, and maintain that
separately from the official *minimal bound* dependencies we ship to
users and distro package maintainers.

However, I do not know that I trust us to always capture every version
of every dependency in such a file, and it also makes *sense* that
running `make check` on a developer laptop uses whatever is there; and
that in CI, we concoct a variety of situations and different distro
versions where different versions of things are there to test a variety
of likely installation combinations.

Cheers,
    -Tristan


