Hi Tristan,
Apologies for the late reply. I haven’t completely dropped the ball on this one; it just got delayed due to various seasonal happenings :)
I’ll try to summarize what we briefly talked about on IRC yesterday. Then, I’ll also leave inline replies to some of your comments.
First, we think that having something like `tox` as a frontend for running our tests will be useful for the following reasons:
- More consistent testing environment across CI and developer laptops
- Less chance of host contamination, by disallowing site packages
- Less boilerplate code for installing dependencies, etc.
- Dependency requirements will live closer to the BuildStream code, and not in a separate repo
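To make this a bit more concrete, here is a rough sketch of what such a `tox.ini` could look like. The env names and options here are illustrative only; the actual configuration lives in the MR:

    [tox]
    envlist = py37

    [testenv]
    # Install pinned runtime and test dependencies, rather than
    # whatever happens to be on the host or in the image
    deps =
        -rrequirements.txt
        -rdev-requirements.txt
    # Never fall back to the host's site packages
    sitepackages = false
    commands = pytest {posargs}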
To get there, I have created MR !1027 [1]. It is marked as WIP, but it should be ready for review. It is in WIP state because, before landing it, I would like to first land the corresponding changes in the buildstream-docker-images [2] repo. Once that happens, the testsuite images will no longer provide the pure-Python runtime dependencies of BuildStream, but rather the non-Python build dependencies of our Python dependencies. Both MRs are in a mergeable state, but I wanted to check with the community before making this change.
So, any feedback on this thread or the MR will be greatly appreciated.
As later work, we can also make the following enhancements:
* Provide a way to update the `.txt` files, which will be useful for picking up new versions, or when the corresponding `.in` file changes (see the sketch after this list).
* Other than running tests against frozen requirements, also attempt to run tests against the latest versions, to check if they are compatible. This will help us catch failures earlier in the cycle.
* Provide a way to run tests locally inside Docker, mimicking what happens in a CI job.
I haven’t done these as part of the current MR, to keep the review more manageable, but they can surely be added later.
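For example, assuming we generate the `.txt` files with something like pip-tools (an assumption about tooling on my part; `pip-compile` is pip-tools’ freezing command), updating the pins could look roughly like this:

    # Regenerate the frozen .txt files, picking up the latest
    # versions that still satisfy the bounds in the .in files
    pip-compile --upgrade requirements.in
    pip-compile --upgrade dev-requirements.in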
---
Now the inline bits... <snip>
b. It ensures that we always run our tests with pre-determined versions of our
dependencies.
That is correct, except this phrasing of (b) is a dramatic understatement. The main reason is this: For any given run of CI, the conditions are always exactly the same in CI, unless we commit a change to .gitlab-ci.yml which explicitly changes the conditions and test environment.
Agreed. And, I think you would agree that the proposed approach of having both the loose and frozen requirements keeps this behavior.
In other words: Only a commit to the BuildStream repository can ever be allowed to result in a pipeline failing.
An unacceptable situation is one where CI fails for reasons outside of our control; or at least, in that case, we should not have a CI failure reported, but rather an error in the gitlab infrastructure instead.
I agree in principle, but it’s worth noting that it might not be practical to completely guarantee this. For example, if Docker Hub is down, we won’t be able to pull any images, and thus won’t be able to run any tests. I would say that keeping the number of external dependencies in CI low is definitely desired. In this case, PyPI is already a dependency of our pipelines, so we won’t be adding any new dependencies.
I hope you first understand how paramount and important this consideration is compared to any considerations of convenience in updating the images.

Problems
========
Local reproducibility
~~~~~~~~~~~~~~~~~~~~~
It is not obvious which versions of dependencies the testsuite images are
running at a given tag. This makes it harder for anyone to debug CI failures
locally. It also makes it quite common for developers to hit the classic
problem of "works on my laptop but fails on CI".
Other than this, our `.gitlab-ci.yml` is already quite complicated in general,
which makes it hard for one to locally do "exactly whatever the CI is doing".
It will never be possible to reproduce the CI exactly locally without using the docker image. Ideally, running the test suites locally *always passes*, but we should accept that a local installation will always be different from the exact conditions we have explicitly programmed into the different CI runs.
This will always be true, but we can attempt to come close, and not let the perfect be the enemy of the good. Also, providing a way to run tests inside Docker might not be that far-fetched, and I think should be doable. I can see that as a possible enhancement, as mentioned above.
I should add that we are currently aware of exactly 2 issues where tests are behaving differently in CI than on developer laptops:

* Race condition which happens in test_push_pull
  https://gitlab.com/BuildStream/buildstream/issues/144
  This appears to be intermittent; it rarely fails on developer setups and never fails in CI. There were some other race conditions which were failing in CI, but they were fixed.

* pylint behaves differently depending on what is installed
  https://gitlab.com/BuildStream/buildstream/issues/570
  This issue is very, very frustrating for everyone. We cannot have a minimal bound dependency (i.e. pytest >= 1.2.3) on the linter, OR various dependencies of the linter, because the linter and its dependencies are not API stable by a long shot. Updates to the linter and its dependencies cause new errors to occur, which means that a given commit in BuildStream history which passed its own tests last year does not pass them today, depending on what you've upgraded.

I've spent some energy trying to track down what exact versions of what libraries need to be installed in order to reproduce the successful test runs in CI, but I have still not been able to reproduce a successful run on my laptop in a long time.
This is exactly the kind of problem I am hoping to avoid. Getting the exact versions of our Python dependencies (including transitive ones) should now be trivial with this change. Since they have caused the most problems in the past, I think the situation would improve. The non-Python dependencies will still be an issue. That can also be tackled, if we provide a simple way of running tests inside Docker.
I was hoping to find out what we needed to control *before* fixing the new linting errors; so that we can pin things in place and be 100% sure that linting behavior never changes on arbitrary developer laptops, but I have recently been considering just fixing the most recent errors and missing out on the opportunity to fix it properly, because the situation is very, very severe.

Updating versions of dependencies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Since we have a setup.py in the main repo, folks are often tempted to change
that directly when restricting the version of any dependency. However, if the
testsuite images don't already have the required version, the build will just
fail. This has often been a source of confusion in the past. A recent example
is this discussion on !909 [2].
The "right" way to do this today is to make the change in the docker-images
repo first, wait for it to be merged, find the new tag, and finally use the
new tag in your MR on the main BuildStream repo. This process is more complex
than it needs to be, but more importantly makes it harder to test new versions
of dependencies as they have to go via the docker-images repo.
I know it is complicated, but it has to ensure the golden rule: that nothing ever changes at all in a CI setting from one run to the next, except for changes introduced via the BuildStream commits under test. I think it is currently exactly as complicated as it has to be.

Proposed solution
=================
There are a few things that I think we can do to reduce the responsibility of
the testsuite images, and do everything in the main BuildStream repo to avoid
unnecessary complexity.
1. Do not install any python packages in the testsuite images. This will get
rid of the hack (`pip install BuildStream && pip uninstall BuildStream`)
that we currently use to install BuildStream's dependencies, and we won't
need to maintain a duplicate copy of `dev-requirements.txt`.
2. Add a `requirements.in` file to list our dependencies, as we currently do in
`setup.py`.
3. Generate a `requirements.txt` file based on `requirements.in` that will have
frozen versions of all dependencies.
4. Similarly, have `dev-requirements.in` and `dev-requirements.txt` for test
dependencies.
5. Install `requirements.txt` and `dev-requirements.txt` before running any
tests.
The `.in` files will provide the minimum version requirements, while the `.txt`
files will list exact versions of the packages that are known to work.
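As a made-up illustration of the difference (the package versions here are invented, not our actual pins):

    # requirements.in -- loose, minimum bound requirements
    Click >= 7.0
    ruamel.yaml >= 0.15

    # requirements.txt -- generated, with exact pins for
    # everything, including transitive dependencies
    click==7.0
    ruamel.yaml==0.15.87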
The only way for this to possibly work is to pin every single python dependency to an exact version, including indirect dependencies implicitly pulled in via other dependencies.

This cannot fly because, at least for the runtime dependencies (i.e. not the dependencies needed *only* for tests, but those required to run BuildStream), we need to offer some measure of flexibility for distro package maintainers (i.e. the outward statement of BuildStream needs to be "We need >= x.y.z of package FOO", and let distro packagers decide which exact version they will use).

Another variation of your plan might be to have a completely separate set of *specific* dependencies for running tests, and maintain that separately from the official *minimal bound* dependencies we ship to users and distro package maintainers.

However, I do not know that I trust us to always capture every version of every dependency in such a file, and it also makes *sense* that running `make check` on a developer laptop uses whatever is there; and that in CI, we concoct a variety of situations and different distro versions where different versions of things are there, to test a variety of likely installation combinations.
I think your concerns above should be resolved by having separate `.in` and `.txt` files. The `.in` files will be used for specifying the minimum bound dependencies we ship to users and package maintainers (via setup.py). And the `.txt` files will be used to indicate a consistent set of dependencies that we know works, because our CI will run using those. Since tox will put all the dependencies in a virtual environment, there’s no risk of contaminating developer laptops by randomly updating their python packages.
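To illustrate how the `.in` files could feed `setup.py`, here is a sketch of one possible approach (not necessarily what the MR actually does):

    # In setup.py: derive install_requires from requirements.in,
    # so the loose minimum bounds are maintained in a single place
    import os

    here = os.path.dirname(os.path.abspath(__file__))
    with open(os.path.join(here, 'requirements.in')) as f:
        install_requires = [
            line.strip() for line in f
            if line.strip() and not line.startswith('#')
        ]

    # ... later passed as setup(install_requires=install_requires, ...)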
Let me know what you think!
Cheers,
    -Tristan
Cheers,
Chandan