Re: [BuildStream] Partial local CAS



Hi Tristan,

I am not clear if we're violently agreeing or very much disagreeing :).  More inline.

On Fri, Nov 16, 2018 at 12:42 PM Tristan Van Berkom <tristan vanberkom codethink co uk> wrote:
On Thu, 2018-11-15 at 18:39 +0000, Sander Striker wrote:
> Hi,
>
> On Thu, Nov 15, 2018 at 2:43 PM Tristan Van Berkom <tristan vanberkom codethink co uk> wrote:
> > Hi Sander,
> >
> > In the abstract, it looks to me like you did a valuable in-depth
> > analysis of what code paths we are traversing in the process of
> > building remotely, and which parts are redundant.
> >
> > As you highlight, there are already plans for CAS-to-CAS import and
> > SourceCache which are intended to remove some of these redundancies,
> > but others apparently will remain.
> >
> > Thanks for doing this !
>
> You're most welcome.  While my contributions to the project are not
> in writing code, due to being too short on time, I do have the
> occasional moment to do an in-depth review - this one happened to
> coincide with air travel :).
>
> > As I recall, Jim volunteered to write-up a new section to the
> > Architecture documentation for remote execution, so I hope this email
> > is valuable to him in drafting an ideal description of the remote
> > execution architecture for BuildStream / BuildGrid.
> >
> > A little bit more at the end...
> >
> > On Thu, 2018-11-08 at 14:01 -0800, Sander Striker via BuildStream-list wrote:
> > > Hi,
> > > 
> > > After the exchange in the "Coping with partial artifacts" thread, I
> > > realize that we haven't actually had a conversation on list about
> > > partial local CAS, and by extension local ArtifactCache.  Let me
> > > first explain what I mean with partial local CAS.  Let's define it as
> > > a CAS that contains Tree and Directory nodes, but not [all of] the
> > > actual file content blobs.
> > > 
> > > I'll outline the context and importance of this concept.  In remote
> > > execution, builds do not run on the local machine.  As such, to be able
> > > to perform a build, it is important to be able to _describe_ the
> > > inputs to a build.  When all of the input files are locally
> > > available, this can be done.  However, when the input files are not
> > > locally available, should we then incur the cost of fetching them? 
> > > Is there another way?
> > > 
> > > To answer that question let's review how remote execution is supposed
> > > to work again in the context of BuildStream.  To build an element:
> > > 
> > > 1) Compose a merkle tree of all dependencies, and all sources
> > > 2) Create a Command and an Action message
> > > 3) FindMissingBlobs(command, action, blobs in the merkle tree)
> > > 4) Upload the missing blobs
> > > 5) Submit the request to the execution service
> > > 6) Wait for the request to complete
> > > 7) Download the result merkle tree
> > > 8) Construct a merkle tree for the Artifact (based on the result)
> > > 9) FindMissingBlobs(blobs in the artifact merkle tree)
> > > 10) Upload the missing blobs
> > > 11) Store a ref to the artifact merkle tree in ArtifactCache
> > > 
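(To make steps 1-11 above concrete, here is a rough, hypothetical sketch of
the client side of that flow against the REAPI protos.  The generated module
paths, the digest_of() helper, local_blobs and the output directory name are
assumptions for illustration only, and batching limits, ByteStream uploads
for large blobs and error handling are all ignored.)

    import hashlib

    from build.bazel.remote.execution.v2 import remote_execution_pb2 as re_pb2
    from build.bazel.remote.execution.v2 import remote_execution_pb2_grpc as re_grpc

    def digest_of(blob):
        # A REAPI Digest is the sha256 hash of the blob plus its size in bytes
        return re_pb2.Digest(hash=hashlib.sha256(blob).hexdigest(),
                             size_bytes=len(blob))

    def remote_build(channel, instance, input_root_digest, argv, local_blobs):
        # local_blobs: dict of digest hash -> bytes for every input blob we hold
        cas = re_grpc.ContentAddressableStorageStub(channel)
        execution = re_grpc.ExecutionStub(channel)

        # Steps 2-3: serialize Command and Action, ask which blobs are missing
        command = re_pb2.Command(arguments=argv,
                                 output_directories=["install-root"])
        command_blob = command.SerializeToString()
        action = re_pb2.Action(command_digest=digest_of(command_blob),
                               input_root_digest=input_root_digest)
        action_blob = action.SerializeToString()

        blobs = dict(local_blobs)
        blobs[digest_of(command_blob).hash] = command_blob
        blobs[digest_of(action_blob).hash] = action_blob

        missing = cas.FindMissingBlobs(re_pb2.FindMissingBlobsRequest(
            instance_name=instance,
            blob_digests=[digest_of(b) for b in blobs.values()]
        )).missing_blob_digests

        # Step 4: upload only what the server does not already have
        cas.BatchUpdateBlobs(re_pb2.BatchUpdateBlobsRequest(
            instance_name=instance,
            requests=[re_pb2.BatchUpdateBlobsRequest.Request(
                digest=d, data=blobs[d.hash]) for d in missing]))

        # Steps 5-7: submit, wait for the Operation to complete, and hand back
        # the digest of the output Tree for import into the local CAS
        for op in execution.Execute(re_pb2.ExecuteRequest(
                instance_name=instance, action_digest=digest_of(action_blob))):
            if op.done:
                response = re_pb2.ExecuteResponse()
                op.response.Unpack(response)
                return response.result.output_directories[0].tree_digest
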
> > > Let's dive in a bit and look where the inefficiencies are in the
> > > current implementation.
> > > 
> > > Step 1 happens during staging.  More specifically in
> > > buildelement.py:stage().  We start with the dependencies.  For
> > > directories backed by CAS, we don't need to actually stage them on
> > > the filesystem.  We can import files between CAS directories by
> > > reference (hash), without even needing the files locally.  This isn't
> > > currently implemented (_casbaseddirectory.py:import_files), but that
> > > should change with CAS-to-CAS import (MR !911).  
> > > After the dependencies are staged, we move on to the sources.  Currently
> > > this is still fairly clunky, as we are actually staging sources on
> > > the filesystem and then importing that into our virtual staging
> > > directory (element.py:_stage_sources_at).  With SourceCache this
> > > should be as efficient as staging dependencies for non-modified
> > > elements.
> > > 
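(An illustration of the by-reference import mentioned above for dependency
staging: only Directory metadata moves around, so none of the referenced file
content needs to be present in the local CAS.  This is a minimal sketch, not
the actual import_files() implementation; merging, name collisions and
re-sorting of the entries are left out.)

    from build.bazel.remote.execution.v2 import remote_execution_pb2 as re_pb2

    def import_directory_by_reference(dest, src):
        # Copy child references (name + digest) from one Directory message
        # into another.  The referenced file blobs are never read, so they do
        # not have to exist locally at all.
        dest.files.extend(src.files)              # FileNode: name, digest, ...
        dest.directories.extend(src.directories)  # DirectoryNode: name, digest
        dest.symlinks.extend(src.symlinks)        # SymlinkNode: name, target
        return dest

(The resulting Directory is then re-serialized and its own digest recomputed,
and the same applies recursively up to the new root.)
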
> > > Steps 2 through 11 all happen during _sandboxremote.py:run().
> > > Steps 2-4 aren't currently implemented in this fashion, and instead
> > > serially call a number of network RPCs.  In _sandboxremote.py:run() a
> > > call is made to cascache.push_directory().  This will push up any
> > > missing directory nodes, or any missing files.
> > > In _sandboxremote:run_remote_command() we are using
> > > cascache.push_message(), followed by
> > > cascache.verify_digest_pushed().  This results in a Write RPC,
> > > followed by a FindMissingBlobs RPC.  For both the Command and the
> > > Action.  In short, we could be eliminating a couple of RPCs and thus
> > > network roundtrips here.
> > > I'll skip over steps 5-6 as these are not very interesting.  Although
> > > it should be noted that _sandboxremote.py:run() is ignoring the build
> > > logs from the execution response.
> > > In step 7, which happens in _sandboxremote.py:process_job_output(),
> > > we take a Tree digest that we received from the execution service,
> > > and use it in a call to cascache.pull_tree().  This will fetch all of
> > > the file blobs that are present in the tree that are not available
> > > locally.  It will also store all of the directory nodes that are
> > > referenced in the tree, and return the root digest.  This is used to
> > > construct the result virtual directory of the sandbox.
> > > In step 8 we go back to constructing a file system representation of
> > > the artifact, instead of using a CAS backed directory.  This happens
> > > in element.py:assemble() through a call to cascache.commit().  This
> > > will do a local filesystem import of files, the majority of which we
> > > exported in step 7.  It will put an entry in the local ArtifactCache.
> > > Steps 9-11 happen during the push phase.  Here we rely on
> > > cascache.push() to ensure that the artifact is made available on the
> > > remote CAS server.
> > > 
> > > Sidenote while we're here: apart from steps 9-11 we don't actually
> > > make it clear to the scheduler which resources are needed.  As far as
> > > it is concerned a remote build job is currently taking up PROCESS
> > > tokens.
> > > 
> > > If you made it all the way here, thank you :).  I think we need to
> > > eliminate the unneeded filesystem access first.
> > > Then we can go further and support partial CAS by:
> > > - erroring when FindMissingBlobs() calls return digests that we don't
> > > actually have locally
> > > - retrieving just the Tree, rather than all blobs when we process
> > > ActionResults
> > > 
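(A rough sketch of what those two points could look like on top of the
standard CAS calls.  The local_cas methods used here, has_blob() and
add_directory_node(), are hypothetical names for illustration and not the
real cascache API.)

    from build.bazel.remote.execution.v2 import remote_execution_pb2 as re_pb2

    class BlobNotLocalError(Exception):
        """The remote needs a blob that we never had locally either."""

    def upload_missing_blobs(cas_stub, local_cas, instance, digests):
        # Point one: if the server is missing a blob that we also do not have
        # locally, we cannot recover by uploading it -- fail loudly instead.
        response = cas_stub.FindMissingBlobs(re_pb2.FindMissingBlobsRequest(
            instance_name=instance, blob_digests=digests))
        for digest in response.missing_blob_digests:
            if not local_cas.has_blob(digest):        # hypothetical local check
                raise BlobNotLocalError(digest.hash)
        return response.missing_blob_digests

    def import_tree_without_files(local_cas, tree):
        # Point two: store only the Directory nodes of the result Tree, so the
        # artifact is fully *described* locally while the file content blobs
        # stay remote until something actually needs them.
        for directory in [tree.root] + list(tree.children):
            local_cas.add_directory_node(directory)   # hypothetical metadata add
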
> > > Only when you actually want to use the artifact locally should we
> > > fetch the actual file objects.  For instance in case of bst
> > > [artifact] checkout.  Or bst shell.  If we are not using the files,
> > > there is no real point in downloading all this content, which takes
> > > both time and disk space.
> >
> > I mostly agree with this but I think it is also questionable and
> > depends on circumstance and user expectations.
> >
> > One way to look at things is simply:
> >
> >   * I run BuildStream to build, and whether I am getting assistance
> >     from a remote build service or not, my expectation is that I
> >     have the result locally.
>
> When do you actually expect to have these results exactly?  And what
> results specifically?  Because when you are not actually using these
> artifacts locally, you wouldn't even notice that you don't have them.
> If you're building a large stack, where you've made a change at the
> bottom and you want to evaluate the impact at the top, the likely two
> results that you care about are the bottom element and the top
> element.
> And maybe even just the top element, as that is the one that you are
> evaluating.  And how will you evaluate?  By running bst [artifact]
> checkout or bst shell, or something similar.
>
> I have a feeling we are probably on opposing ends on this, which can
> only lead to one compromise: make it configurable to always download
> all artifacts.  That way users that don't need artifacts are not
> penalized by having to download them anyway, and clean them up
> afterwards.

Starting from ground zero, before we had artifact servers (or before a
user sets up and configures an artifact server), building will result
in the artifacts being safely available on your host where you built
them, and available later for offline use.

"Safely" is a relative term, but sure, you have the artifacts locally.  And you can get to them through the 'bst' cli, since how they are stored is an implementation detail.

I wonder how high the offline _by default_ aspect ranks in our mostly online world.

Now that we have artifact servers, or if the user configures one, they
still have the certainty that after a build, the results are safely
available for later offline use whether or not they were also uploaded
to an artifact server.

Even if you didn't have the artifacts locally, you could get to them through the 'bst' cli.  The offline aspect may not be as important here, and I could see that as something that you are explicit about.
 
I don't think that it is correct that by default this behavior changes
just because the user has opted to try out building their project on a
remote execution service - this seems to me an unwelcome surprise.

To start with I don't think that characterizing this as "trying out" is fair.

Furthermore I think it is totally fine to have this behavior.  You built it remotely, therefore it is available remotely.  The artifacts are available through the 'bst' cli as before.

If you want to make sure you have everything so you can work offline, you'll need to pull artifacts and fetch sources, which is an expectation that we already set if you want to build locally offline.

I think it's fine for us to evolve and honestly I don't see the controversy you are seeing.
 
That said, I *do* see that in a CI setting that happens to use remote
execution services, it is not always an imperative to have the built
artifacts on the machine which ran BuildStream after a build succeeds,
so I would not be against an option to disable that imperative.

My perspective is the inverse :).  I don't think we need to have the [intermediate] built artifacts on the machine that ran BuildStream after a build succeeds (or fails), beyond the I-want-to-work-offline use case.

Everything still works.  We may want to pick up again the thread in which explicit vs. implicit fetching and pulling was discussed, in particular whether bst checkout and bst shell should pull implicitly by default.

> >   * I run BuildStream to build, and whether I am getting assistance
> >     from a remote build service or not, my expectation is that the
> >     resulting artifacts are uploaded to the appropriate artifact
> >     server[s].
>
> Only if you have push permissions to the artifact servers, right?

Right, I think the expectation right now is either to have the build
results locally, or to *also* have the build results uploaded
synchronously to an artifact server (i.e. the BuildStream client does
not exit until it is sure that all the artifacts have been uploaded or
failed to upload with an error message).

I think we can still set and manage expectations, especially if the user experience improves as a result; I have never heard complaints when wait times were reduced.

Concretely, if the artifact exists remotely, I think we can declare that the results exist.  We can from that point on get these results locally anywhere.

Or better, set the expectation that:
- after a bst build, an artifact exists
- when running bst [artifact] checkout, the artifact is extracted to a local location
- when running bst shell, the artifact (and its runtime deps) are staged and the user is dropped into a shell
- when running bst shell --build, all artifacts that are build dependencies are staged, along with the sources, and the user is dropped into a shell

That doesn't require us to state where artifacts come from.  I also think it's a reasonable expectation to set that:
- when running commands offline, artifacts and sources should be pulled / fetched ahead of time.

We can take this to another thread of course.

> > I have been hearing a lot about expectations that people will use
> > setups where the project has a dedicated execution service AND artifact
> > cache, and that the CAS tied to remote execution will happen to be the
> > same CAS which is used as an artifact server.
>
> Correct.  ArtifactCache is just a name -> digest mapping service. 
> All of the actual content of an artifact will be in a CAS service. 
> The optimal setup would have the same CAS store the buildstream
> artifact content as well as the remote execution action result
> content, as there will be a lot of overlap there.
>  
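(As an aside, conceptually that mapping service is tiny; something along
these lines, where the class and method names are purely illustrative and the
real thing is a gRPC service rather than an in-memory dict:)

    class ArtifactReferenceStore:
        """Illustration only: maps artifact refs to CAS digests."""

        def __init__(self):
            self._refs = {}   # artifact ref (cache key name) -> root Digest

        def update_reference(self, ref, digest):
            # Associate a ref with the digest of the artifact's root
            # Directory; all of the actual content lives in CAS.
            self._refs[ref] = digest

        def get_reference(self, ref):
            return self._refs[ref]   # KeyError means the artifact is unknown
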
> > While I agree that this seems to be an optimal setup, I have my doubts
> > as to how reasonable the expectation is; for this to be the norm in a
> > distributed environment where multiple projects maintained by multiple
> > entities overlap, and artifacts from subprojects are shared, etc - this
> > expectation only seems to hold water in a closed and controlled
> > environment.
>
>  
> I'm not sure I agree on that, but maybe your example below can be
> used to illustrate.
>
> > I rather envision that people will want to work on project A which
> > depends on project B, both with different artifact servers, but having
> > credentials only to upload to project A - and that one day a developer
> > might want to branch out and try/setup a remote execution service.
> >
> > In that light, I am also interested in the cases where you build
> > something on a remote execution service that is unrelated to your
> > artifact servers, and hope that the execution service can download and
> > upload artifacts to the appropriate servers without round tripping to
> > the host running the BuildStream client (unless we are in a scenario
> > where the goal of running the client is to have the created artifacts
> > locally, where at least the host needs to download results after builds
> > complete).
>
> In that scenario there is currently no provision in the protocol
> other than having the host "pull" and "push" the content from one to
> the other.
> In your scenario above, the artifacts would come from project B, be
> uploaded to project A's CAS, because of a FindMissingBlobs()
> returning missing blobs as part of remotely executing an element from
> project A.
>
> One could conceive of a Replication service definition that contains
> APIs for:
> - pushing Trees|Directories|blobs to a target cas endpoint
> - pulling Trees|Directories|blobs from a source cas endpoint
> We can't assume that any CAS endpoint configured in a BuildStream
> project will actually implement this service.

Certainly we can bail out with an error if what BuildStream needs to do
is not supported, which is the general approach so far for having
feature additions in the artifact cache service which are required by
BuildStream in the cases where there is no fallback.

Note that this replication API would have to be synchronous in order
for BuildStream to ensure that it does not exit until it is sure that
the build results have been pushed to the relevant artifact cache
servers.

I think we can design it once we get to that point.  Initially this interface won't be in place.
 
>   Also, this service definition comes with interesting
> authentication/authorization implications as well.

Right, the authentication implications seem to be the most important
ones to address in order to optimize this.

> In other words, in the less optimal setup, where artifactcache CAS
> endpoint is different from the remote execution endpoint _and_ you
> have push privileges to the artifactcache CAS endpoint, bst would
> need to pull the results from the remote execution CAS endpoint and
> push them to artifactcache CAS endpoint.
> Note that my use of push and pull is liberal here - I expect these to
> be implemented by using the normal ContentAddressableStorage service
> calls.

Right, that is how I expect it is working right now, by trucking along
the artifacts through the client host and pushing them to where they
need to go.

While it is desirable to have optimizations in place for this
bandwidth, it is alright that it "just works" even if it is not
optimal.

I would agree with the caveat that it should be optimal when the setup allows for it.
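For reference, the non-optimal fallback amounts to relaying blobs through the
client with the standard CAS calls, roughly like the sketch below (stub names
are assumptions; ByteStream for large blobs, batch size limits and error
handling are ignored):

    from build.bazel.remote.execution.v2 import remote_execution_pb2 as re_pb2

    def relay_blobs(source_cas, target_cas, instance, digests):
        # source_cas / target_cas: ContentAddressableStorage stubs for the
        # remote execution CAS and the artifact cache CAS respectively.
        missing = target_cas.FindMissingBlobs(re_pb2.FindMissingBlobsRequest(
            instance_name=instance, blob_digests=digests)).missing_blob_digests
        if not missing:
            return
        # Read the missing blobs from the source...
        read = source_cas.BatchReadBlobs(re_pb2.BatchReadBlobsRequest(
            instance_name=instance, digests=missing))
        # ...and write them to the target.
        target_cas.BatchUpdateBlobs(re_pb2.BatchUpdateBlobsRequest(
            instance_name=instance,
            requests=[re_pb2.BatchUpdateBlobsRequest.Request(
                digest=r.digest, data=r.data) for r in read.responses]))
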
 
Most importantly, I just think it is important to bear in mind that
remote execution is just an optimization,

I am not sure I would characterize it like that.  I consider local execution and remote execution both to be first-class citizens.
 
and the way people use remote
execution will not always match our expectations.

Maybe.  My expectation is that anyone who has a remote execution service at their disposal will use it as the primary way of building.  The only exception to that will be offline building.
 
For instance, consider an organization which right now can afford to
run and maintain an artifact server. One day they might say: "Hey, in
the next 3 months we have a very intense and fast moving development
branch and we need more build power, lets temporarily pay for an
account and use a nifty remote execution service to augment our build
power for this development phase".

Personally I think this comes down more to how much remote execution capacity you want, rather than whether you want remote execution capacity at all.  In other words, for these three months let's increase capacity.
Obviously we can agree to disagree.  We'll need to see what happens.

It is important that adding and removing a remote execution service to
an existing workflow is a seamless experience which does not result in
data loss, and of course desirable that it is not needlessly slow due
to not coupling the target artifact server with the ephemeral execution
service.

I don't think that is a controversial statement, and it should work.
 
The reverse could also be true, where the remote execution service is a
constant and an artifact storage is ephemeral.
 
Cheers,
    -Tristan

Cheers,

Sander 