Re: [BuildStream] Partial local CAS



Hi Sander,

In the abstract, it looks to me like you did a valuable in-depth
analysis of what code paths we are traversing in the process of
building remotely, and which parts are redundant.

As you highlight, there are already plans for CAS-to-CAS import and
SourceCache which are intended to remove some of these redundancies,
but others apparently will remain.

Thanks for doing this !

As I recall, Jim volunteered to write up a new section to the
Architecture documentation for remote execution, so I hope this email
is valuable to him in drafting an ideal description of the remote
execution architecture for BuildStream / BuildGrid.

A little bit more at the end...

On Thu, 2018-11-08 at 14:01 -0800, Sander Striker via BuildStream-list wrote:
Hi,

After the exchange in the "Coping with partial artifacts" thread, I
realize that we haven't actually had a conversation on list about
partial local CAS, and by extension local ArtifactCache.  Let me
first explain what I mean by partial local CAS.  Let's define it as
a CAS that contains Tree and Directory nodes, but not [all of] the
actual file content blobs.

I'll outline the context and importance of this concept.  In remote
execution, builds do not run on the local machine.  As such, to be able
to perform a build, it is important to be able to _describe_ the
inputs to a build.  When all of the input files are locally
available, this can be done.  However, when the input files are not
locally available, should we then incur the cost of fetching them? 
Is there another way?
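
To make "describe" concrete: in REAPI terms an input is a Directory
message that only carries names and digests, so it can be composed and
uploaded even when the file content itself is not on the local disk.
This is just an illustration of the proto usage (the hash value is made
up), not BuildStream code, and assumes the bindings vendored under
buildstream._protos:

    from buildstream._protos.build.bazel.remote.execution.v2 import (
        remote_execution_pb2)

    directory = remote_execution_pb2.Directory()
    f = directory.files.add()
    f.name = 'configure.ac'
    f.digest.hash = '9a27...'      # sha256 of the content (abbreviated here)
    f.digest.size_bytes = 1843
    f.is_executable = False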

To answer that question, let's review again how remote execution is
supposed to work in the context of BuildStream.  To build an element
(a rough sketch in code follows the list):

1) Compose a merkle tree of all dependencies, and all sources
2) Create a Command and an Action message
3) FindMissingBlobs(command, action, blobs in the merkle tree)
4) Upload the missing blobs
5) Submit the request to the execution service
6) Wait for the request to complete
7) Download the result merkle tree
8) Construct a merkle tree for the Artifact (based on the result)
9) FindMissingBlobs(blobs in the artifact merkle tree)
10) Upload the missing blobs
11) Store a ref to the artifact merkle tree in ArtifactCache
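
In code, steps 2 through 5 look roughly like the sketch below.  This is
not the current BuildStream implementation, just REAPI proto usage;
input_root_digest, input_tree_digests and channel stand in for the
merkle tree data from step 1 and an existing gRPC connection, and the
bindings are assumed to be the ones vendored under buildstream._protos:

    import hashlib

    from buildstream._protos.build.bazel.remote.execution.v2 import (
        remote_execution_pb2, remote_execution_pb2_grpc)

    def message_digest(msg):
        # CAS content is addressed by the hash of its serialized form
        data = msg.SerializeToString()
        digest = remote_execution_pb2.Digest(
            hash=hashlib.sha256(data).hexdigest(), size_bytes=len(data))
        return digest, data

    # 2) Create a Command and an Action message
    command = remote_execution_pb2.Command(arguments=['sh', '-c', 'make'])
    command_digest, command_data = message_digest(command)
    action = remote_execution_pb2.Action(
        command_digest=command_digest,
        input_root_digest=input_root_digest)  # root of the merkle tree (step 1)
    action_digest, action_data = message_digest(action)

    # 3) Ask the remote CAS which blobs it does not have yet
    cas = remote_execution_pb2_grpc.ContentAddressableStorageStub(channel)
    response = cas.FindMissingBlobs(
        remote_execution_pb2.FindMissingBlobsRequest(
            blob_digests=[command_digest, action_digest] + input_tree_digests))

    # 4) Upload only response.missing_blob_digests, then
    # 5) submit an ExecuteRequest referencing action_digest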

Let's dive in a bit and look where the inefficiencies are in the
current implementation.

Step 1 happens during staging.  More specifically in
buildelement.py:stage().  We start with the dependencies.  For
directories backed by CAS, we don't need to actually stage them on
the filesystem.  We can import files between CAS directories by
reference (hash), without even needing the files locally.  This isn't
currently implemented (_casbaseddirectory.py:import_files), but that
should change with CAS-to-CAS import (MR !911).  
After the dependencies are staged, we move on to the sources.  Currently
this is still fairly clunky, as we are actually staging sources on
the filesystem and then importing that into our virtual staging
directory (element.py:_stage_sources_at).  With SourceCache this
should be as efficient as staging dependencies for non-modified
elements.
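
Purely as an illustration of why that works (this is not the actual
import_files() implementation, and graft_subdirectory() is a made-up
helper name): importing a CAS-backed directory only means recording a
digest reference in the parent Directory message, so the file blobs it
points at never need to be in the local CAS.

    def graft_subdirectory(parent, name, subdir_digest):
        # `parent` is a remote_execution_pb2.Directory we are composing
        # locally; `subdir_digest` identifies a Directory that may only
        # exist in the remote CAS.  Only the reference is copied.
        node = parent.directories.add()
        node.name = name
        node.digest.CopyFrom(subdir_digest)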

Steps 2 through 11 all happen during _sandboxremote.py:run().
Steps 2-4 aren't currently implemented in this fashion, and instead we
serially make a number of network RPCs.  In _sandboxremote.py:run() a
call is made to cascache.push_directory().  This will push up any
missing directory nodes, or any missing files.
In _sandboxremote.py:run_remote_command() we are using
cascache.push_message(), followed by cascache.verify_digest_pushed().
This results in a Write RPC followed by a FindMissingBlobs RPC, for
both the Command and the Action.  In short, we could be eliminating a
couple of RPCs, and thus network roundtrips, here.
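
For the small Command and Action blobs this could collapse into a
single FindMissingBlobs plus a single BatchUpdateBlobs.  A hedged
sketch only, reusing the names from the earlier snippet (blobs above
the batch size limit would still need the ByteStream Write API):

    request = remote_execution_pb2.FindMissingBlobsRequest(
        blob_digests=[command_digest, action_digest] + input_tree_digests)
    missing = {(d.hash, d.size_bytes)
               for d in cas.FindMissingBlobs(request).missing_blob_digests}

    batch = remote_execution_pb2.BatchUpdateBlobsRequest()
    for digest, data in [(command_digest, command_data),
                         (action_digest, action_data)]:
        if (digest.hash, digest.size_bytes) in missing:
            entry = batch.requests.add()
            entry.digest.CopyFrom(digest)
            entry.data = data
    if batch.requests:
        cas.BatchUpdateBlobs(batch)
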
I'll skip over steps 5-6 as these are not very interesting.  Although
it should be noted that _sandboxremote.py:run() is ignoring the build
logs from the execution response.

In step 7, which happens in _sandboxremote.py:process_job_output(),
we take a Tree digest that we received from the execution service,
and use it in a call to cascache.pull_tree().  This will fetch all of
the file blobs that are present in the tree that are not available
locally.  It will also store all of the directory nodes that are
referenced in the tree, and return the root digest.  This is used to
construct the result virtual directory of the sandbox.
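
For reference, the Tree digest in question comes out of the
ActionResult carried by the ExecuteResponse (same proto assumptions as
above; execute_response stands in for the response we unpack from the
completed Operation):

    # The completed Operation carries an ExecuteResponse; its ActionResult
    # references one Tree blob per output directory.
    action_result = execute_response.result
    tree_digest = action_result.output_directories[0].tree_digest
    # cascache.pull_tree() then walks this Tree, storing the Directory
    # nodes and (currently) fetching every referenced file blob that is
    # not already available locally.
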
In step 8 we go back to constructing a file system representation of
the artifact, instead of using a CAS backed directory.  This happens
in element.py:assemble() through a call to cascache.commit().  This
will do a local filesystem import of files, the majority of which we
exported in step 7.  It will put an entry in the local ArtifactCache.

Steps 9-11 happen during the push phase.  Here we rely on
cascache.push() to ensure that the artifact is made available on the
remote CAS server.
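
In REAPI terms the push phase boils down to another FindMissingBlobs
against the artifact server, uploading whatever it reports as missing,
and then recording the ref.  A rough sketch only; push_blobs() and
set_ref() are made-up stand-ins for the real cascache/ArtifactCache
entry points:

    artifact_digests = [artifact_root_digest] + list(artifact_tree_digests)
    missing = remote_cas.FindMissingBlobs(
        remote_execution_pb2.FindMissingBlobsRequest(
            blob_digests=artifact_digests)).missing_blob_digests

    push_blobs(remote_cas, missing)              # upload only what is missing
    set_ref(artifact_ref, artifact_root_digest)  # record the artifact ref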

Sidenote while we're here: apart from steps 9-11, we don't actually
make it clear to the scheduler which resources are needed.  As far as
it is concerned, a remote build job is currently taking up PROCESS
tokens.

If you made it all the way here, thank you :).  I think we need to
eliminate the unneeded filesystem access first.
Then we can go further and support partial CAS by (a rough sketch
follows the list):
- erroring when FindMissingBlobs() calls return digests that we don't
actually have locally
- retrieving just the Tree, rather than all blobs, when we process
ActionResults
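
Roughly, and with the same proto assumptions as the earlier snippets
(local_cas.contains(), store_directory() and BlobNotFoundError are
made-up stand-ins):

    # Upload side: if the remote reports a blob as missing and we do not
    # have its content locally either, that is an error rather than an
    # implicit fetch.
    for digest in cas.FindMissingBlobs(request).missing_blob_digests:
        if not local_cas.contains(digest):
            raise BlobNotFoundError(digest)

    # Download side: fetch only the Tree message and record its Directory
    # nodes; the file blobs it references stay remote for now.
    read = remote_execution_pb2.BatchReadBlobsRequest(digests=[tree_digest])
    tree = remote_execution_pb2.Tree()
    tree.ParseFromString(cas.BatchReadBlobs(read).responses[0].data)
    for node in [tree.root] + list(tree.children):
        store_directory(node)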

Only when we actually want to use the artifact locally should we
fetch the actual file objects, for instance in the case of bst
[artifact] checkout, or bst shell.  If we are not using the files,
there is no real point in downloading all this content, which takes
both time and disk space.

I mostly agree with this but I think it is also questionable and
depends on circumstance and user expectations.

One way to look at things is simply:

  * I run BuildStream to build, and whether I am getting assistance
    from a remote build service or not, my expectation is that I
    have the result locally.

  * I run BuildStream to build, and whether I am getting assistance
    from a remote build service or not, my expectation is that the
    resulting artifacts are uploaded to the appropriate artifact
    server[s].

I have been hearing a lot about expectations that people will use
setups where the project has a dedicated execution service AND artifact
cache, and that the CAS tied to remote execution will happen to be the
same CAS which is used as an artifact server.

While I agree that this seems to be an optimal setup, I have my doubts
as to how reasonable the expectation is that this will be the norm in a
distributed environment where multiple projects maintained by multiple
entities overlap, artifacts from subprojects are shared, etc.; the
expectation only seems to hold water in a closed and controlled
environment.

I rather envision that people will want to work on project A which
depends on project B, both with different artifact servers, but having
credentials only to upload to project A; and that one day a developer
might want to branch out and try setting up a remote execution service.

In that light, I am also interested in the cases where you build
something on a remote execution service that is unrelated to your
artifact servers, and I hope that the execution service can download
and upload artifacts to the appropriate servers without round-tripping
through the host running the BuildStream client (unless we are in a
scenario where the goal of running the client is to have the created
artifacts locally, in which case the host at least needs to download
results after builds complete).

Cheers,
    -Tristan


