Re: Stop the train ! Caching build trees is going to be too big
- From: Tristan Van Berkom <tristan vanberkom codethink co uk>
- To: Phillip Smyth <phillip smyth codethink co uk>
- Cc: BuildStream <buildstream-list gnome org>
- Subject: Re: Stop the train ! Caching build trees is going to be too big
- Date: Sat, 28 Apr 2018 21:54:37 +0900
On Fri, 2018-04-27 at 13:44 +0100, Phillip Smyth wrote:
Hi Tristan,
Just a quick follow up email on some of my thoughts for this problem.
On Fri, Apr 27, 2018 at 06:09:46PM +0900, Tristan Van Berkom wrote:
Hi all.
This is just a quick email to raise a problem early, make sure that we
adjust our expectations, take a pause and fix a big flaw in our plan.
So, last week we identified that it is not going to be realistic to
blindly cache build trees, because VCS data tends to cost a damn lot of
disk space (feel free to substitute "damn lot" with less family
friendly wording for dramatic effect).
For this, we opened this issue to block it:
https://gitlab.com/BuildStream/buildstream/issues/376
But the buck doesn't stop here, unfortunately.
For example, my workspace directory for WebKit (from a *tarball*, with
no VCS data added), costs me 5.8GB of disk space after a build. This is
only the source code we mean to build, plus the resulting object files.
The object files in the `_build/` subdirectory cost 5.6GB, so the
source code is only a couple of hundred MB.
To put this in perspective: when we started building GNOME against a
debian sysroot runtime, which cost about 3GB, it was quite annoying
because it takes a *damn long time* to download the base runtime before
we even start building.
Introducing a 5.8GB download for a prebuilt WebKit artifact is just not
gonna fly; we cannot start introducing these downloads into the build
process.
What I propose that we do, is the following:
* Split artifact keys in two:
* The regular artifact remains "${project}/${element}/${key}"
* The cached build tree is addressable as
"${project}/${element}/${key}/build"
I believe this is already done, as the build tree cache is currently being stored in a subdirectory of the
artifact.
Yes I understand this.
I think if the core functionality can be modified to download all subdirs excluding the build tree cache,
this issue can be avoided.
Anything is essentially possible, this is software after all :)
But, it is a simpler API contract (and probably also a simpler
implementation) to just address what we want to download separately,
than to ask that Artifact Cache implementations support fancy semantics
to allow partial extraction and partial downloads for addressable
blobs.
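To make the "address it separately" idea concrete, here is a rough sketch in Python. The function names and the dict standing in for an artifact share are hypothetical, for illustration only; this is not BuildStream's actual artifact cache API.

```python
# Hypothetical sketch: names are illustrative, not BuildStream's real API.

def artifact_ref(project, element, key):
    """Address of the regular artifact."""
    return "{}/{}/{}".format(project, element, key)


def build_tree_ref(project, element, key):
    """Address of the cached build tree, kept separate from the artifact."""
    return artifact_ref(project, element, key) + "/build"


def pull(remote, project, element, key, with_build_tree=False):
    """Fetch the artifact, and the build tree only when explicitly requested.

    `remote` is any mapping from ref to blob, standing in for an
    artifact share (an assumption made for this sketch).
    """
    refs = [artifact_ref(project, element, key)]
    if with_build_tree:
        refs.append(build_tree_ref(project, element, key))
    # Only refs the remote actually holds are returned.
    return {ref: remote[ref] for ref in refs if ref in remote}
```

With two plain addresses, the cache implementation only ever needs "fetch this ref", and never has to understand partial downloads of a single blob.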
* Alternatively, we split the artifact into metadata, logs,
output and build components, this remains to be discussed
and analyzed.
* Uploading of the build tree to artifact shares remains mandatory
* We should ensure integrity of artifact share servers
* In the usual cases, regular users do not contribute to artifact
shares anyway, automated build servers do this part
* Downloading the build tree of an artifact must only ever be done
on demand
Could we add a "--cached-build-tree" flag to bst build?
So that if the flag is false, it builds using its sources.
And if it is true, then it can download the cached build tree and use that instead.
* We could have an option to force download all the sources if
we expect to need them later for offline work, but this is
not mandatory in order to land the feature I think
I'm not certain about what you mean by this.
Are you suggesting an option to download the entire cache to a location on your local machine?
No, I'm essentially suggesting something similar to what you are
suggesting above: a flag to inform `bst build` and `bst pull` that we
definitely want to download the build trees along with the rest of the
artifact contents, for any artifacts which we do download.
The reason why I add:
"...if we expect to need them later for offline work"
Is because:
* You only need the cached build trees for certain activities, like
say if you want an enhanced `bst shell` experience where all the
object files are in context.
* You might not know in advance if you're going to need to run
`bst shell`, or open a workspace and desire an incremental build.
* If you have an option to download these at `bst build` time, and
also at `bst pull` time, then you can have these build trees in
advance, so they will be available when working offline.
The above option, however, I feel is more of an enhancement for
anticipating offline work; it does not change the fact that the option
to download these should most certainly be made available to the
commands which actually make use of them.
In other words:
* `bst workspace open` should let the user decide to download the
cached build trees at the time of opening the workspace, if the
cached build tree is not already available locally.
* Similar options should be made available to any commands for which
the UX is improved by cached build trees.
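
The decision each such command has to make could be sketched roughly as below, in Python. The helper name and its parameters are hypothetical, and how the user's choice is collected (a flag, a prompt, configuration) is deliberately left open.

```python
# Hypothetical sketch: illustrative names only, not an existing BuildStream API.

def needs_build_tree_download(local_refs, build_ref, user_wants_build_tree):
    """Return True if the command should fetch the cached build tree now.

    local_refs: collection of refs already present in the local cache
    build_ref: address of the build tree, e.g. "project/element/key/build"
    user_wants_build_tree: the user's choice for this invocation
    """
    if build_ref in local_refs:
        # Already available locally, e.g. pulled in advance for offline work.
        return False
    return user_wants_build_tree
```

The point is only that the download happens on demand: nothing is fetched unless the build tree is missing locally and the user has asked for it.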
Cheers,
-Tristan