Re: [BuildStream] Update CI and testing process



(Please discard the previous message. My new email client messed up
the formatting, and I also missed a few points.)

Hi Tristan,

Apologies for the late reply. I haven’t completely dropped the ball on
this one; it just got delayed due to various seasonal happenings :)

I’ll try to summarize what we briefly talked about on IRC yesterday.
Then, I’ll also leave inline replies to some of your comments.

First, we think that having something like `tox` as a frontend for
running our tests will be useful for the following reasons:

- A more consistent testing environment across CI and developer laptops
- Less chance of host contamination, by disallowing site packages
- Less boilerplate code for installing dependencies, etc.
- Dependency requirements will live closer to the BuildStream code,
rather than in a separate repo
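
For illustration, a minimal `tox.ini` along these lines would do it (a
rough sketch only; the real configuration lives in the MR, and the
requirements file names here are assumptions):

    [tox]
    envlist = py35,py36,py37

    [testenv]
    # Never fall back to site packages, to avoid host contamination.
    sitepackages = False
    # Install from the frozen requirements files (assumed paths).
    deps =
        -rrequirements/requirements.txt
        -rrequirements/dev-requirements.txt
    commands =
        pytest {posargs}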

To get there, I have created MR !1027 [1]. It is marked as WIP, but it
should be ready for review. It is in the WIP state because, before
landing this, I would like to land the corresponding changes in the
buildstream-docker-images [2] repo first. Once that happens, the
testsuite images will no longer provide the pure-Python runtime
dependencies of BuildStream, but will instead provide the non-Python
build dependencies of our Python dependencies. Both MRs are in a
mergeable state, but I wanted to check with the community before
making this change.

So, any feedback on this thread or the MR will be greatly appreciated.

As follow-up work, we can also make the following enhancements:

* Skip ostree-related linting checks when ostree is not available.
This will bring them more in line with how the rest of the plugins
behave everywhere else in our tests.

* Additionally, separate the linting step from the testing steps. This
will let developers run either or both easily, and will also allow us
to run the linting job in parallel with the testing jobs.

* Provide a way to update the `.txt` requirements files, which will be
useful for picking up new versions, or when the corresponding `.in`
file is changed. Something like what we have on this branch [3] (see
the sketches after this list).

* Other than running tests against frozen requirements, also attempt
to run tests against the latest versions to check that they are
compatible. This will help us catch failures earlier in the cycle.

* Provide a way to run tests locally inside Docker, mimicking what
happens in a CI job.

I haven’t done these as part of my current MR to keep the reviews more
manageable, but they can surely be added later.
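
To make a couple of these concrete, here are rough sketches of what I
have in mind (the commands, paths, environment names and image name
are assumptions for illustration, not what the MR actually contains):

    # Refresh the frozen .txt requirements from the loose .in ones,
    # using pip-tools:
    pip-compile requirements/requirements.in \
        --output-file requirements/requirements.txt

    # With separate tox environments, run only linting, or only tests:
    tox -e lint
    tox -e py37

    # Mimic a CI job locally by running tox inside the testsuite image:
    docker run --rm -it --volume "$PWD":/src --workdir /src \
        registry.gitlab.com/buildstream/buildstream-docker-images/testsuite-fedora:28 \
        tox -e py37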

---

Now the inline bits...


On Sun, Dec 9, 2018 at 7:15 AM Tristan Van Berkom
<tristan.vanberkom@codethink.co.uk> wrote:

<snip>

> > b. It ensures that we always run our tests with pre-determined versions of our
> >    dependencies.

> That is correct, except this phrasing of (b) is a dramatic
> understatement.
>
> The main reason is this:
>
>    For any given run of CI, the conditions are always exactly the same
>    in CI, unless we commit a change to .gitlab-ci.yml which explicitly
>    changes the conditions and test environment.

Agreed. And, I think you would agree that having both the loose and
frozen requirements would satisfy this.
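
For example (hypothetical contents, just to illustrate the split):

    # requirements.in -- loose, minimal bound requirements:
    Click >= 7.0
    ruamel.yaml >= 0.15

    # requirements.txt -- frozen, generated from the .in file:
    Click==7.0
    ruamel.yaml==0.15.81

CI would run against the frozen file, so the conditions stay fixed
until we deliberately regenerate it.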

> In other words: Only a commit to the BuildStream repository can ever
> be allowed to result in a pipeline failing.
>
> An unacceptable situation is one where CI fails for reasons outside of
> our control; or at least in that case we should not have a CI failure
> reported, but rather an error in the gitlab infrastructure instead.

I agree in principle, but it’s worth noting that it might not be
practical to completely guarantee this. For example, if Docker Hub is
down, we won’t be able to pull any images. I would say that keeping
the number of external dependencies in CI low is definitely desirable,
but having some will probably be unavoidable. Having said that, I
agree with you that we should attempt to identify and mark such errors
as infrastructure failures, and not as test failures.
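
For instance, GitLab’s `retry` keyword can already distinguish some
infrastructure failures from test failures; a hypothetical
`.gitlab-ci.yml` fragment (the job name is made up):

    tests-fedora-28:
      # Retry only when the runner or infrastructure fails,
      # never when the tests themselves fail.
      retry:
        max: 2
        when:
          - runner_system_failure
          - stuck_or_timeout_failure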

> I hope you first understand how paramount and important this
> consideration is compared to any considerations of convenience in
> updating the images.

> > Problems
> > ========
> >
> > Local reproducibility
> > ~~~~~~~~~~~~~~~~~~~~~
> >
> > It is not obvious which versions of dependencies the testsuite images are
> > running at a given tag. This makes it harder for anyone to debug CI failures
> > locally. It also makes it quite common for developers to hit the classic
> > problem of "works on my laptop but fails on CI".
> >
> > Other than this, our `.gitlab-ci.yml` is already quite complicated in general,
> > which makes it hard for one to locally do "exactly whatever the CI is doing".

> It will never be possible to reproduce the CI exactly locally without
> using the docker image.
>
> Ideally, running the test suites locally *always passes*, but we should
> accept that a local installation will always be different from the
> exact conditions we have explicitly programmed into the different CI
> runs.

This will always be true, but we can attempt to come close, and not
let the perfect be the enemy of the good. Also, providing a way to run
tests inside Docker might be a good idea anyway, especially for folks
on Windows and Mac, who can’t otherwise run the integration tests
directly.

> I should add that we are currently aware of exactly 2 issues where
> tests are behaving differently in CI than on developer laptops:
>
>   * Race condition which happens in test_push_pull
>     https://gitlab.com/BuildStream/buildstream/issues/144
>
>     This appears to be intermittent; it rarely fails on developer
>     setups and never fails in CI.
>
>     There were some other race conditions which were failing in CI
>     but they were fixed.
>
>   * pylint behaves differently depending on what is installed
>     https://gitlab.com/BuildStream/buildstream/issues/570
>
>     This issue is very, very frustrating for everyone: we cannot
>     have a minimal bound dependency (e.g. pylint >= 1.2.3) on the
>     linter, or on various dependencies of the linter, because the
>     linter and its dependencies are not API stable by a long shot.
>
>     Updates to the linter and its dependencies cause new errors to
>     occur, which means that a given commit in BuildStream history
>     which passed its own tests last year does not pass them today,
>     depending on what you've upgraded.
>
> I've spent some energy trying to track down what exact versions of
> what libraries need to be installed in order to reproduce the
> successful test runs in CI, but I have still not been able to
> reproduce a successful run on my laptop in a long time.

This is exactly the kind of problem I am hoping to avoid. Getting the
exact versions of our Python dependencies (including transitive ones)
should be trivial with this change. The non-Python dependencies might
still be an issue, but they haven't caused any major problems in the
past.
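
As a quick illustration, once tox manages the virtualenvs, one can
inspect the exact transitive versions that a given run used (the
`py37` environment name is an assumption):

    # List every package version installed in the tox virtualenv:
    .tox/py37/bin/pip freeze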

> I was hoping to find out what we needed to control *before* fixing the
> new linting errors, so that we can pin things in place and be 100% sure
> that linting behavior never changes on arbitrary developer laptops, but
> I have recently been considering just fixing the most recent errors and
> missing out on the opportunity to fix it properly, because the
> situation is very, very severe.



> Another variation of your plan might be to have a completely separate
> set of *specific* dependencies for running tests, and maintain that
> separately from the official *minimal bound* dependencies we ship to
> users and distro package maintainers.
>
> However, I do not know that I trust us to always capture every version
> of every dependency in such a file, and it also makes *sense* that
> running `make check` on a developer laptop uses whatever is there; and
> that in CI, we concoct a variety of situations and different distro
> versions where different versions of things are there, to test a
> variety of likely installation combinations.

I think we are thinking the same thing, i.e. having both `.in` and
`.txt` files. The `.in` files will be used for specifying the minimal
bound dependencies we ship to users and package maintainers (via
setup.py). And the `.txt` files will be used to indicate a consistent
set of dependencies that we know works, because our CI will run using
those. Since tox will put all the dependencies in a virtual
environment, there’s no risk of contaminating developer laptops by
randomly updating their Python packages.
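
To illustrate how the `.in` files could feed setup.py, something like
this rough sketch (the file layout is an assumption, not necessarily
what the MR does):

    # setup.py (excerpt)
    from setuptools import setup

    # Read the loose, minimal bound requirements from the .in file,
    # skipping blank lines and comments.
    with open('requirements/requirements.in') as f:
        install_requires = [
            line.strip()
            for line in f
            if line.strip() and not line.startswith('#')
        ]

    setup(
        name='BuildStream',
        install_requires=install_requires,
    )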

> Let me know what you think!
>
> Cheers,
>     -Tristan


Cheers,
Chandan

[1]: https://gitlab.com/BuildStream/buildstream/merge_requests/1027
[2]: https://gitlab.com/BuildStream/buildstream-docker-images/merge_requests/84
[3]: https://gitlab.com/BuildStream/buildstream/blob/chandan/tox-make/Makefile#L25

