Re: Stop the train ! Caching build trees is going to be too big



On Mon, 2018-04-30 at 13:20 +0000, Sander Striker wrote:
Hi,

On Sat, Apr 28, 2018 at 2:31 PM Tristan Van Berkom <tristan vanberkom codethink co uk> wrote:
On Fri, 2018-04-27 at 11:25 +0000, Sander Striker wrote:
[...]
What I propose that we do, is the following:
 
  * Split artifact keys in two:
 
    * The regular artifact remains "${project}/${element}/${key}"
 
    * The cached build tree is addressable as
      "${project}/${element}/${key}/build"
 
    * Alternatively, we split the artifact into metadata, logs,
      output and build components, this remains to be discussed
      and analyzed.
 
I would prefer us to take a slightly different approach than storing
under different keys, and instead store a "BuildStreamArtifact"
message under the key.  That can then be used to download
the different elements of the artifact.
This is a similar approach to the ActionResult stored in the
BuildFarm ActionCache.
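
A rough sketch of that single-key alternative, assuming a content-addressed store: one ref points at a small manifest, and the manifest lists the digests of the individual components. All names here (`ContentStore`, `ArtifactManifest`, `commit`, `fetch`) are hypothetical, and the field layout is an assumption, not the actual "BuildStreamArtifact" message.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


class ContentStore:
    """Toy content-addressed store: blobs are keyed by their SHA-256 digest."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]


@dataclass(frozen=True)
class ArtifactManifest:
    """Hypothetical stand-in for the message stored under the artifact key."""
    output: str
    logs: str
    metadata: str
    build_tree: Optional[str] = None  # absent when no build tree was cached


def commit(store, refs, ref, output, logs, metadata, build_tree=None):
    """Store each component as a blob and record one manifest under `ref`."""
    manifest = ArtifactManifest(
        output=store.put(output),
        logs=store.put(logs),
        metadata=store.put(metadata),
        build_tree=store.put(build_tree) if build_tree is not None else None,
    )
    refs[ref] = manifest
    return manifest


def fetch(store, refs, ref, with_build_tree=False):
    """Look up the manifest, then download only the requested components."""
    manifest = refs[ref]
    parts = {
        "output": store.get(manifest.output),
        "logs": store.get(manifest.logs),
        "metadata": store.get(manifest.metadata),
    }
    if with_build_tree and manifest.build_tree is not None:
        parts["build_tree"] = store.get(manifest.build_tree)
    return parts
```

The point of the sketch is that the key space stays unchanged (one key per artifact), while the large build tree is still only transferred on request.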

The only real problem I have with the approach you suggest is that it
sounds like it can only be implemented with the CAS artifact cache,
while implementing it the way I suggest would probably be transparent
when we change artifact cache implementations from OSTree -> CAS.

If what you are suggesting is that multiple keys will be used nowhere
other than in the '_artifactcache/ostreecache.py' implementation, then
I have no concern.

If however, we are looking to change the code to use the
ArtifactCache API to deal with a key per partial artifact, that seems like
we're no longer keeping the change localized to the artifactcache
implementation.

Well, the ArtifactCache API is rather internal. While we currently have
both a Tar and an OSTree implementation, it's not a huge deal
considering we're hoping to move to a uniform CAS implementation.

Also I should note, the Tar cache never supported push/pull, so while
this change will possibly still require a separate update to the Tar
cache, I hope it's not a lot of needless churn which we're
intending to throw away with the arrival of CAS.

Really this is only a matter of making it easier for separate ongoing
developments to happen without friction or blocking on each other, and
this also sounds like an implementation detail which we could change
without too much effort later on, at the cost of an artifact version
bump.
Does this make sense ?

I'll admit that I'm not sure :).  If we're truly talking implementation detail,
and we'll be churning a version anyway, then disregard my previous
remarks.

To be honest, this is mostly immaterial and I think the decision should
mostly be based on what is ready to land in master first.

I think that a lot of time has been spent on caching of build trees and
we're overcoming a few obstacles already; I'd just rather not block
landing it on the new CAS cache.

If the new CAS cache is ready to land sooner (will have to consult with
Jürg), then I agree we should avoid this workaround of splitting up the
artifact completely.

Cheers,
    -Tristan


