Re: [BuildStream] Partial local CAS



Hi,

On Tue, Nov 20, 2018 at 10:50 AM Tristan Van Berkom <tristan vanberkom codethink co uk> wrote:
Hi Sander, Jim...

On Fri, 2018-11-16 at 17:28 +0000, Sander Striker wrote:
> Hi Tristan,
>
> I am not clear if we're violently agreeing or very much disagreeing :).  More inline.

I think that we agree on most points here.

Good to hear.
 
I feel very strongly that the default imperative to have the artifacts
in the local cache as a result of running `bst build` should not be
changed just because remote execution is active.
 
We may only disagree on this point.

That is likely the case.  It's a fairly fundamental disagreement though :).
 
As Jim points out:

On Mon, 2018-11-19 at 13:38 +0000, Jim MacArthur via BuildStream-list wrote:
> On 16/11/2018 11:42, Tristan Van Berkom via BuildStream-list wrote:
> > Most importantly I just think it is important to bear in mind that
> > remote execution is just an optimization, and the way people use remote
> > execution will not always match our expectations.
> >
>
> We have at least once discussed using remote execution for cases when 
> software cannot be built on the local machine; C code which cannot be 
> cross-compiled, for example. In these cases, not only would remote 
> execution be required for a build, but it's likely the resulting 
> artifacts would be of no use on the local system either.

It is true that remote execution is required for building things which
require an execution environment unsupported by the host, but I also
don't think that we should break the expectation to "have something you
just built" just because that thing which you built is not runnable on
your host.

I would argue that it is a false expectation to begin with, which we should reset as soon as possible.  The assumption that all artifacts resulting from a `bst build` of a pipeline, including any intermediate artifacts, are implicitly available on local disk seems overly broad.

Currently (before remote execution), whether an artifact is built by a
machine in CI and downloaded from an artifact server, or built locally
as a result of `bst build`, the expectation is that `bst build` gets
you the artifact; remote execution is not really much different than
the artifact having already been built by a third party and then
shared.

I would argue that the expectation should be that the artifact is built after the invocation of `bst build`, and that the artifact is _guaranteed_ to be available locally after a `bst pull`.
We could (and probably should) introduce the convenience of having `bst shell` or `bst [artifact] checkout` do an implicit pull.
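
To make the semantics I am proposing concrete, here is a toy sketch in Python.  This is not BuildStream code; `Cache`, `Session` and their methods are names made up purely for illustration, and "remote" stands in for wherever the build happened to leave its results.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    """Toy stand-in for a CAS: just a set of artifact names."""
    artifacts: set = field(default_factory=set)


@dataclass
class Session:
    """Hypothetical model of the proposed build/pull/checkout contract."""
    local: Cache = field(default_factory=Cache)
    remote: Cache = field(default_factory=Cache)

    def build(self, element, deps=()):
        # After a build, the artifacts exist *somewhere* (here: remote).
        # Nothing is promised about the local cache.
        for name in (*deps, element):
            self.remote.artifacts.add(name)

    def pull(self, element):
        # pull is the operation that guarantees local availability.
        if element not in self.remote.artifacts:
            raise LookupError(f"{element} has not been built")
        self.local.artifacts.add(element)

    def checkout(self, element):
        # Convenience: implicitly pull before using the artifact locally.
        if element not in self.local.artifacts:
            self.pull(element)
        return f"checked out {element}"
```

In this model, building `app.bst` with a dependency `base.bst` leaves the local cache empty, and a later `checkout("app.bst")` pulls only `app.bst`; the intermediate `base.bst` never has to touch local disk unless someone asks for it.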

The way I see this is essentially:

  * The behavior of BuildStream should not change unexpectedly; just
    because remote execution is enabled does not mean you expect
    the results of running BuildStream to differ.

I don't think it is behavior that users necessarily need to notice, apart from the going-offline case, which I think is ok to special-case.
 
  * We always optimize default settings for those who run BuildStream
    manually on the command line, not for a CI autobuilding setup.

I don't think we are talking about the same thing here; I am thinking of developer configurations as well as CI setups.
 
    This is because even if there are many more builds run in CI than
    those run by a developer on their laptop, those CI builds can be
    configured once - we should consider that there will always be
    many more separate developer installations than CI installations,
    and to minimize overall configuration pain; optimize defaults for
    developers because of this.

    I consider the case of running `bst build` and not being interested
    in having the results in your local cache to be the CI setting,

I would disagree on that.
 
    while the developer who runs `bst build` on their laptop/desktop
    almost certainly wants to do something with the artifact they
    built.

There is a huge piece of nuance here: the developer is almost certainly not interested in *all* of the intermediate artifacts that were built as a result of running `bst build app.bst`.

The point at which interest in an element is actually known is `checkout` or `pull` time.
 
Even the artifact for app.bst might not see as much interest as you expect.  For a subset of my `bst build` invocations, I might just be interested in knowing that it builds.

  * If we are going to add an option to make having the build results
    in the local cache not an imperative of `bst build`, it should not
    only be done for remote execution.

    I.e. currently when we run builds in CI, those builds ensure that
    the CI instances download artifacts from remote artifact servers
    regardless of whether anything even needs to be built.

Removing the implicit pull even for the local build case?  That makes sense for consistency, yes.
 
    So this optimization where we can avoid steps because we are only
    interested in a remote artifact server having the results, but not
    the local cache, is just as relevant for pulled artifacts as it is
    for remote execution; this should probably be opt-in with the same
    configuration option.

Sneaky on the opt-in :).  Agreed that it can be similarly configurable.

Cheers,

Sander
 
Cheers,
    -Tristan
 
--

Cheers,

Sander

