Hi Tristan,
Apologies for the late reply. I haven’t completely dropped the ball on this one; it just got delayed due to various seasonal happenings :)
I’ll try to summarize what we briefly talked about on IRC yesterday. Then, I’ll also leave inline replies to some of your comments.
First, we think that having something like `tox` as a frontend for running our tests will be useful for the following reasons:
- More consistent testing environment across CI and developer laptops
- Less chance of host contamination, by disallowing site packages
- Less boilerplate code for installing dependencies, etc.
- Dependency requirements will live closer to the BuildStream code, and not in a separate repo
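To make this a bit more concrete, here is a rough sketch of what such a `tox.ini` could look like. The env names and options here are illustrative only; the actual configuration lives in the MR:

    [tox]
    envlist = py37

    [testenv]
    # Install pinned runtime and test dependencies, rather than
    # whatever happens to be on the host or in the image
    deps =
        -rrequirements.txt
        -rdev-requirements.txt
    # Never fall back to the host's site packages
    sitepackages = false
    commands = pytest {posargs}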
To get there, I have created MR !1027 [1]. It is marked as WIP, but it should be ready for review. It is in WIP state because, before landing it, I would like to first land the corresponding changes in the buildstream-docker-images [2] repo. Once that happens, the testsuite images will no longer provide the pure-Python runtime dependencies of BuildStream, but rather the non-Python build dependencies of our Python dependencies. Both MRs are in a mergeable state, but I wanted to check with the community before making this change.
So, any feedback on this thread or the MR will be greatly appreciated.
As later work, we can also make the following enhancements:
* Provide a way to update the `.txt` files, which will be useful for picking up new versions, or when the corresponding `.in` file changes (see the sketch after this list).
* Other than running tests against frozen requirements, also attempt to run tests against the latest versions, to check if they are compatible. This will help us catch failures earlier in the cycle.
* Provide a way to run tests locally inside Docker, mimicking what happens in a CI job.
I haven’t done these as part of the current MR, to keep the review more manageable, but they can surely be added later.
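For example, assuming we generate the `.txt` files with something like pip-tools (an assumption about tooling on my part; `pip-compile` is pip-tools’ freezing command), updating the pins could look roughly like this:

    # Regenerate the frozen .txt files, picking up the latest
    # versions that still satisfy the bounds in the .in files
    pip-compile --upgrade requirements.in
    pip-compile --upgrade dev-requirements.in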
---
Now the inline bits... <snip>
b. It ensures that we always run our tests with pre-determined versions of our
dependencies.
That is correct, except this phrasing of (b) is a dramatic understatement. The main reason is this: For any given run of CI, the conditions are always exactly the same in CI, unless we commit a change to .gitlab-ci.yml which explicitly changes the conditions and test environment.
Agreed. And, I think you would agree that the proposed approach of having both the loose and frozen requirements keeps this behavior.
In other words: Only a commit to the BuildStream repository can ever be allowed to result in a pipeline failing.
An unacceptable situation is one where CI fails for reasons outside of our control; or at least, in that case, we should not have a CI failure reported, but rather an error in the gitlab infrastructure instead.
I agree in principle, but it’s worth noting that it might not be practical to completely guarantee this. For example, if Docker Hub is down, we won’t be able to pull any images, and thus won’t be able to run any tests. I would say that keeping the number of external dependencies in CI low is definitely desired. In this case, PyPI is already a dependency of our pipelines, so we won’t be adding any new dependencies.
I hope you first understand how paramount and important this consideration is compared to any considerations of convenience in updating the images.

Problems
========
Local reproducibility
~~~~~~~~~~~~~~~~~~~~~
It is not obvious which versions of dependencies the testsuite images are
running at a given tag. This makes it harder for anyone to debug CI failures
locally. It also makes it quite common for developers to hit the classic
problem of "works on my laptop but fails on CI".
Other than this, our `.gitlab-ci.yml` is already quite complicated in general,
which makes it hard for one to locally do "exactly whatever the CI is doing".
It will never be possible to reproduce the CI exactly locally without using the docker image. Ideally, running the test suites locally *always passes*, but we should accept that a local installation will always be different from the exact conditions we have explicitly programmed into the different CI runs.
This will always be true, but we can attempt to come close, and not let the perfect be the enemy of the good. Also, providing a way to run tests inside Docker might not be that far-fetched, and I think should be doable. I can see that as a possible enhancement, as mentioned above.
I should add that we are currently aware of exactly 2 issues where tests are behaving differently in CI than on developer laptops:

* Race condition which happens in test_push_pull
  https://gitlab.com/BuildStream/buildstream/issues/144
  This appears to be intermittent; it rarely fails on developer setups and never fails in CI. There were some other race conditions which were failing in CI, but they were fixed.

* pylint behaves differently depending on what is installed
  https://gitlab.com/BuildStream/buildstream/issues/570
  This issue is very, very frustrating for everyone. We cannot have a minimal bound dependency (i.e. pytest >= 1.2.3) on the linter, OR various dependencies of the linter, because the linter and its dependencies are not API stable by a long shot. Updates to the linter and its dependencies cause new errors to occur, which means that a given commit in BuildStream history which passed its own tests last year does not pass them today, depending on what you've upgraded.

I've spent some energy trying to track down what exact versions of what libraries need to be installed in order to reproduce the successful test runs in CI, but I have still not been able to reproduce a successful run on my laptop in a long time.
This is exactly the kind of problem I am hoping to avoid. Getting the exact versions of our Python dependencies (including transitive ones) should now be trivial with this change. Since they have caused the most problems in the past, I think the situation would improve. The non-Python dependencies will still be an issue. That can also be tackled, if we provide a simple way of running tests inside Docker.
I was hoping to find out what we needed to control *before* fixing the new linting errors; so that we can pin things in place and be 100% sure that linting behavior never changes on arbitrary developer laptops, but I have recently been considering just fixing the most recent errors and missing out on the opportunity to fix it properly, because the situation is very, very severe.

Updating versions of dependencies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Since we have a setup.py in the main repo, folks are often tempted to change
that directly when restricting the version of any dependency. However, if the
testsuite images don't already have the required version, the build will just
fail. This has often been a source of confusion in the past. A recent example
is this discussion on !909 [2].
The "right" way to do this today is to make the change in the docker-images
repo first, wait for it to be merged, find the new tag, and finally use the
new tag in your MR on the main BuildStream repo. This process is more complex
than it needs to be, but more importantly makes it harder to test new versions
of dependencies as they have to go via the docker-images repo.
I know it is complicated, but it has to ensure the golden rule: that nothing ever changes at all in a CI setting from one run to the next, except for changes introduced via the BuildStream commits under test. I think it is currently exactly as complicated as it has to be.

Proposed solution
=================
There are a few things that I think we can do to reduce the responsibility of
the testsuite images, and do everything in the main BuildStream repo to avoid
unnecessary complexity.
1. Do not install any python packages in the testsuite images. This will get
rid of the hack (`pip install BuildStream && pip uninstall BuildStream`)
that we currently use to install BuildStream's dependencies, and we won't
need to maintain a duplicate copy of `dev-requirements.txt`.
2. Add a `requirements.in` file to list our dependencies, as we currently do in
`setup.py`.
3. Generate a `requirements.txt` file based on `requirements.in` that will have
frozen versions of all dependencies.
4. Similarly, have `dev-requirements.in` and `dev-requirements.txt` for test
dependencies.
5. Install `requirements.txt` and `dev-requirements.txt` before running any
tests.
The `.in` files will provide the minimum version requirements, while the `.txt`
files will list exact versions of the packages that are known to work.
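As a made-up illustration of the difference (the package versions here are invented, not our actual pins):

    # requirements.in -- loose, minimum bound requirements
    Click >= 7.0
    ruamel.yaml >= 0.15

    # requirements.txt -- generated, with exact pins for
    # everything, including transitive dependencies
    click==7.0
    ruamel.yaml==0.15.87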
The only way for this to possibly work is to pin every single python dependency to an exact version, including indirect dependencies implicitly pulled in via other dependencies.

This cannot fly because, at least for the runtime dependencies (i.e. not the dependencies needed *only* for tests, but those required to run BuildStream), we need to offer some measure of flexibility for distro package maintainers (i.e. the outward statement of BuildStream needs to be "We need >= x.y.z of package FOO", and let distro packagers decide which exact version they will use).

Another variation of your plan might be to have a completely separate set of *specific* dependencies for running tests, and maintain that separately from the official *minimal bound* dependencies we ship to users and distro package maintainers.

However, I do not know that I trust us to always capture every version of every dependency in such a file, and it also makes *sense* that running `make check` on a developer laptop uses whatever is there; and that in CI, we concoct a variety of situations and different distro versions where different versions of things are there, to test a variety of likely installation combinations.
I think your concerns above should be resolved by having separate `.in` and `.txt` files. The `.in` files will be used for specifying the minimum bound dependencies we ship to users and package maintainers (via setup.py). And the `.txt` files will be used to indicate a consistent set of dependencies that we know works, because our CI will run using those. Since tox will put all the dependencies in a virtual environment, there’s no risk of contaminating developer laptops by randomly updating their python packages.
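To illustrate how the `.in` files could feed `setup.py`, here is a sketch of one possible approach (not necessarily what the MR actually does):

    # In setup.py: derive install_requires from requirements.in,
    # so the loose minimum bounds are maintained in a single place
    import os

    here = os.path.dirname(os.path.abspath(__file__))
    with open(os.path.join(here, 'requirements.in')) as f:
        install_requires = [
            line.strip() for line in f
            if line.strip() and not line.startswith('#')
        ]

    # ... later passed as setup(install_requires=install_requires, ...)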
Let me know what you think!
Cheers,
    -Tristan
Cheers,
Chandan