Re: Stop the train ! Caching build trees is going to be too big



Hi Tristan,

Just a quick follow-up email with some of my thoughts on this problem.

On Fri, Apr 27, 2018 at 06:09:46PM +0900, Tristan Van Berkom wrote:
Hi all.

This is just a quick email to raise a problem early, make sure that we
adjust our expectations, take a pause and fix a big flaw in our plan.

So, last week we identified that it is not going to be realistic to
blindly cache build trees, because VCS data tends to cost a damn lot of
disk space (feel free to substitute "damn lot" with less family
friendly wording for dramatic effect).

For this, we opened this issue to block it:

    https://gitlab.com/BuildStream/buildstream/issues/376

But the buck doesn't stop here, unfortunately.

For example, my workspace directory for WebKit (from a *tarball*, with
no VCS data added), costs me 5.8GB of disk space after a build. This is
only the source code we mean to build, plus the resulting object files.
The object files in the `_build/` subdirectory cost 5.6GB, so the
source code is only a couple of hundred MB.

To put this in perspective: when we started building GNOME against a
debian sysroot runtime, which cost about 3GB, it was quite annoying
because it takes a *damn long time* to download the base runtime before
we even start building.

Introducing a 5.8GB download for a prebuilt WebKit artifact is just not
gonna fly, we cannot start introducing these downloads into the build
process.

What I propose that we do, is the following:

  * Split artifact keys in two:

    * The regular artifact remains "${project}/${element}/${key}"

    * The cached build tree is addressable as
      "${project}/${element}/${key}/build"

I believe this is already done, as the build tree cache is currently
stored in a subdirectory of the artifact.
I think if the core functionality can be modified to download all
subdirectories except the build tree cache, this issue can be avoided.

    * Alternatively, we split the artifact into metadata, logs,
      output and build components, this remains to be discussed
      and analyzed.

  * Uploading of the build tree to artifact shares remains mandatory

    * We should ensure integrity of artifact share servers

    * In the usual cases, regular users do not contribute to artifact
      shares anyway, automated build servers do this part

  * Downloading the build tree of an artifact must only ever be done
    on demand

Could we add a "--cached-build-tree" flag to bst build?
If the flag is false, it builds from its sources as usual.
If it is true, it downloads the cached build tree and uses that instead.

    * We could have an option to force download all the sources if
      we expect to need them later for offline work, but this is
      not mandatory in order to land the feature I think

I'm not certain about what you mean by this.
Are you suggesting an option to download the entire cache to a location on your local machine?

    * The build trees are only useful for a subset of purposes:

      - opening workspaces in a state ready for an incremental build
      - running a `bst shell` with all of the element's dependencies
        source code and built objects "in tree", such that debugging
        experience can be much more powerful in a `bst shell`

      In both of these cases, I think it even makes sense to have the
      download optional - I might rather enjoy opening a workspace
      on WebKit *right now* instead of waiting for a 5.8GB download.
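
To make the split a bit more concrete, here is a rough sketch in Python
of how I picture it. This is only an illustration and not BuildStream
code: the function names, the `want_build_tree` parameter and the
subdirectory names ("metadata", "logs", "output", "build") are all
assumptions on my part, loosely based on the ref layout and the
components mentioned above.

```python
# Hypothetical sketch, not BuildStream's actual API.
# "build" is assumed from the proposed "${project}/${element}/${key}/build" ref.
BUILD_TREE_SUBDIR = "build"


def artifact_ref(project, element, key):
    """Ref of the regular artifact: ${project}/${element}/${key}."""
    return f"{project}/{element}/{key}"


def build_tree_ref(project, element, key):
    """Ref of the cached build tree, one level below the artifact ref."""
    return f"{artifact_ref(project, element, key)}/{BUILD_TREE_SUBDIR}"


def subdirs_to_pull(available, want_build_tree=False):
    """Select which artifact subdirectories to download.

    The build tree is skipped unless it is explicitly requested,
    so a normal pull never pays for it.
    """
    return [
        subdir
        for subdir in available
        if want_build_tree or subdir != BUILD_TREE_SUBDIR
    ]


if __name__ == "__main__":
    available = ["metadata", "logs", "output", "build"]
    print(artifact_ref("gnome", "webkit", "abc123"))
    print(build_tree_ref("gnome", "webkit", "abc123"))
    print(subdirs_to_pull(available))
    print(subdirs_to_pull(available, want_build_tree=True))
```

With something like this, a default pull would fetch everything except
the build tree, and opening a workspace or a `bst shell` could pass
`want_build_tree=True` only when the user actually asks for it.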

Things are probably not as apocalyptic as I'm making them sound, but we
have to keep in mind that:

  * Dramatic effect is just super FUN !

  * People are already working on this feature and related features,
    so we need them to pause, think and strategize.

So I don't want people to panic, but please be understanding that we've
hit some roadblocks, reality has struck and we have to adapt to that.

Any thoughts about the proposed plan for splitting up the artifact
cache into separate addressable units, and making the downloads
optional ?

Cheers,
    -Tristan

_______________________________________________
Buildstream-list mailing list
Buildstream-list gnome org
https://mail.gnome.org/mailman/listinfo/buildstream-list

Thanks,

Phillip Smyth (Nexus)

