Stop the train ! Caching build trees is going to be too big
- From: Tristan Van Berkom <tristan.vanberkom@codethink.co.uk>
- To: BuildStream <buildstream-list@gnome.org>
- Subject: Stop the train ! Caching build trees is going to be too big
- Date: Fri, 27 Apr 2018 18:09:46 +0900
Hi all.
This is just a quick email to raise a problem early, make sure that we
adjust our expectations, take a pause and fix a big flaw in our plan.
So, last week we identified that it is not going to be realistic to
blindly cache build trees, because VCS data tends to cost a damn lot of
disk space (feel free to substitute "damn lot" with less family
friendly wording for dramatic effect).
For this, we opened this issue to block it:
https://gitlab.com/BuildStream/buildstream/issues/376
But it doesn't stop there, unfortunately.
For example, my workspace directory for WebKit (from a *tarball*, with
no VCS data added), costs me 5.8GB of disk space after a build. This is
only the source code we mean to build, plus the resulting object files.
The object files in the `_build/` subdirectory cost 5.6GB, so the
source code is only a couple of hundred MB.
To put this in perspective: when we started building GNOME against a
Debian sysroot runtime, which cost about 3GB, it was quite annoying
because it took a *damn long time* to download the base runtime before
we even started building.
Introducing a 5.8GB download for a prebuilt WebKit artifact is just not
gonna fly; we cannot start introducing downloads of that size into the
build process.
What I propose we do is the following:

* Split artifact keys in two:
  * The regular artifact remains "${project}/${element}/${key}"
  * The cached build tree is addressable as
    "${project}/${element}/${key}/build"
  * Alternatively, we split the artifact into metadata, logs,
    output and build components; this remains to be discussed
    and analyzed.
* Uploading of the build tree to artifact shares remains mandatory.
  * We should ensure the integrity of artifact share servers.
  * In the usual case, regular users do not contribute to artifact
    shares anyway; automated build servers do this part.
* Downloading the build tree of an artifact must only ever be done
  on demand.
  * We could have an option to force downloading all the sources if
    we expect to need them later for offline work, but I think this
    is not mandatory in order to land the feature.
* The build trees are only useful for a subset of purposes:
  - opening workspaces in a state ready for an incremental build
  - running a `bst shell` with all of the element's dependencies'
    source code and built objects "in tree", so that the debugging
    experience in a `bst shell` can be much more powerful

In both of these cases, I think it even makes sense to make the
download optional - I might rather enjoy opening a workspace
on WebKit *right now* instead of waiting for a 5.8GB download.
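To make the proposed addressing scheme concrete, here is a minimal
sketch in Python of how a split artifact reference might be composed.
The function name `artifact_ref` and the `build_tree` flag are
hypothetical illustrations, not actual BuildStream API:

```python
# Minimal sketch of the proposed split addressing scheme.
# artifact_ref() and the build_tree flag are hypothetical names
# chosen for illustration - not actual BuildStream API.

def artifact_ref(project, element, key, build_tree=False):
    """Compose the cache address for an artifact.

    The regular artifact lives at ${project}/${element}/${key};
    the cached build tree, when requested, is addressable as a
    separate unit under the ".../build" suffix, so that clients
    can pull the (potentially huge) build tree only on demand.
    """
    ref = "{}/{}/{}".format(project, element, key)
    if build_tree:
        ref += "/build"
    return ref


# A client that only needs the built output pulls the small artifact:
print(artifact_ref("gnome", "webkit.bst", "a1b2c3"))
# ... and only asks for the build tree when opening a workspace:
print(artifact_ref("gnome", "webkit.bst", "a1b2c3", build_tree=True))
```

The point of the suffix is that the default pull path never touches
the build tree at all; a 5.8GB WebKit build tree is only transferred
when a workspace or debugging shell explicitly asks for it.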
Things are probably not as apocalyptic as I'm making them sound, but we
have to keep in mind that:
* Dramatic effect is just super FUN !
* People are already working on this feature and related features,
so we need them to pause, think and strategize.
So I don't want people to panic, but please understand that we've hit
some roadblocks; reality has struck and we have to adapt to it.
Any thoughts about the proposed plan for splitting up the artifact
cache into separate addressable units, and making the downloads
optional ?
Cheers,
-Tristan