Hi all.
This is just a quick email to raise a problem early, make sure that we
adjust our expectations, take a pause and fix a big flaw in our plan.
So, last week we identified that it is not going to be realistic to
blindly cache build trees, because VCS data tends to cost a damn lot of
disk space (feel free to substitute "damn lot" with less family
friendly wording for dramatic effect).
To track this, we opened the following issue as a blocker:
https://gitlab.com/BuildStream/buildstream/issues/376
But the problem doesn't stop there, unfortunately.
For example, my workspace directory for WebKit (from a *tarball*, with
no VCS data added), costs me 5.8GB of disk space after a build. This is
only the source code we mean to build, plus the resulting object files.
The object files in the `_build/` subdirectory cost 5.6GB, so the
source code is only a couple of hundred MB.
To put this in perspective: when we started building GNOME against a
Debian sysroot runtime, which cost about 3GB, it was quite annoying
because it took a *damn long time* to download the base runtime before
we even started building.
Introducing a 5.8GB download for a prebuilt WebKit artifact is just not
gonna fly; we cannot start introducing these downloads into the build
process.
Ugh, yes, we cannot unconditionally introduce this overhead without
any added value.
What I propose we do is the following:
* Split artifact keys in two:
* The regular artifact remains "${project}/${element}/${key}"
* The cached build tree is addressable as
"${project}/${element}/${key}/build"
* Alternatively, we split the artifact into metadata, logs,
output and build components, this remains to be discussed
and analyzed.
I would prefer that we take a slightly different approach than storing
under different keys, and instead store a "BuildStreamArtifact"
message under the key. That message can then be used to download
the different elements of the artifact. This is a similar approach
to the ActionResult stored in the BuildFarm ActionCache.
* Uploading of the build tree to artifact shares remains mandatory
  * We should ensure the integrity of artifact share servers
  * In the usual case, regular users do not contribute to artifact
    shares anyway; automated build servers handle this part
* Downloading the build tree of an artifact must only ever be done
on demand
  * We could have an option to force downloading all the sources if
    we expect to need them later for offline work, but I don't think
    this is mandatory in order to land the feature
* The build trees are only useful for a subset of purposes:
- opening workspaces in a state ready for an incremental build
  - running a `bst shell` with the source code and built objects of
    all the element's dependencies "in tree", making for a much more
    powerful debugging experience in a `bst shell`
In both of these cases, I think it even makes sense to have the
download optional - I might rather enjoy opening a workspace
on WebKit *right now* instead of waiting for a 5.8GB download.
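
To make the proposed addressing concrete, here is a rough Python
sketch. None of these function names, nor the `pull_buildtree` flag,
exist in BuildStream today; they are purely illustrative of the
key split and the on-demand build tree download:

```python
# Illustrative sketch only -- these names are not part of any
# existing BuildStream API, they just demonstrate the proposed
# "${project}/${element}/${key}" vs ".../build" addressing.

def artifact_ref(project, element, key):
    """Address of the regular artifact (metadata, logs, output)."""
    return "{}/{}/{}".format(project, element, key)

def buildtree_ref(project, element, key):
    """Address of the optional cached build tree."""
    return artifact_ref(project, element, key) + "/build"

def pull(project, element, key, pull_buildtree=False):
    """Pull the regular artifact; fetch the build tree only on demand."""
    refs = [artifact_ref(project, element, key)]
    if pull_buildtree:
        refs.append(buildtree_ref(project, element, key))
    return refs
```

With this layout, a plain `pull()` never touches the (potentially
multi-gigabyte) build tree; only workflows like workspace opening
or an incremental-build shell would opt in.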
Things are probably not as apocalyptic as I'm making them sound, but we
have to keep in mind that:
* Dramatic effect is just super FUN !
It is... though I do caution that, with people coming from many
different backgrounds, drama definitely leaves room for
misinterpretation :). I like that you're calling this out to prevent
that from happening.
* People are already working on this feature and related features,
so we need them to pause, think and strategize.
So I don't want people to panic, but please be understanding that we've
hit some roadblocks, reality has struck and we have to adapt to that.
Yup.
Any thoughts about the proposed plan for splitting up the artifact
cache into separately addressable units, and making the downloads
optional?
+1 on splitting it up.
Cheers,
-Tristan
Cheers,
Sander
_______________________________________________
Buildstream-list mailing list
Buildstream-list@gnome.org
https://mail.gnome.org/mailman/listinfo/buildstream-list