Re: Proposal for Remote Execution



On Wed, 2018-04-11 at 22:37 +0200, Jürg Billeter wrote:
Hi all,

This is a proposal to support remote execution in BuildStream. I will first
describe the overall goals and the proposed service architecture, followed
by a plan forward with some details to changes and additions required in the
BuildStream code base. As this touches many areas, I've split the plan into
multiple phases that can be merged one after the other. The initial phases
do not actually enable remote execution yet but bring local execution in
line with what is required for remote execution.

Hi Jürg,

Sorry it's taken me a while to get to this email.

Goals
~~~~~
Remote execution enables BuildStream to run build jobs in a distributed
network instead of on the local machine. This allows massive speedups when
a powerful cluster of servers is available.

Besides speeding up builds, a goal is also to allow running builds on
workers that use a different execution environment, e.g., a different
operating system or ISA.

The goal is not to offload a complete BuildStream session to a remote
system. BuildStream will still run locally and dispatch individual build
jobs for elements with the existing pipeline.


Service Architecture and API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This proposal builds upon Bazel's Remote Execution API¹. This allows use of
the same server infrastructure for BuildStream, Bazel, and other tools that
support the Remote Execution API.

The API defines a ContentAddressableStorage (CAS) service, an Execution
service, and an ActionCache service. They are all defined as gRPC² APIs
using protocol buffers.

The CAS is similar to an OSTree repository, storing directories as Merkle
trees. However, unlike OSTree, CAS does not include support for refs.
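
To illustrate the idea, here is a rough sketch in Python (not the actual
protobuf messages defined by the API): the digest of a directory is
computed over a serialized node that references the digests of its
children, so identical subtrees are stored only once.

    import hashlib
    import json
    import os

    def file_digest(path):
        # Digest of a file's contents; this sketch simply uses SHA-256
        # over the raw bytes.
        h = hashlib.sha256()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(65536), b''):
                h.update(chunk)
        return h.hexdigest()

    def directory_digest(path, cas):
        # Build a directory node referencing child digests, store its
        # serialized form in the (dict-based) CAS, return its digest.
        node = {'files': {}, 'directories': {}}
        for name in sorted(os.listdir(path)):
            child = os.path.join(path, name)
            if os.path.isdir(child):
                node['directories'][name] = directory_digest(child, cas)
            else:
                node['files'][name] = file_digest(child)
        blob = json.dumps(node, sort_keys=True).encode('utf-8')
        digest = hashlib.sha256(blob).hexdigest()
        cas[digest] = blob
        return digest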

The Execution service allows clients to execute actions remotely. An action
is described with a command, an input directory, output paths, and platform
requirements. The input directory is supplied as a CAS digest and the result
of an action also refers to CAS digests for output files and logs. This
roughly corresponds to BuildStream's Sandbox object for local execution.

The ActionCache service is a cache to map action descriptions to action
results to avoid executing the same action again if the result is already
available. As an action description includes the CAS digest of the input
directory, any changes in the input directory will result in a different
cache entry.
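
Put another way (again only an illustrative sketch, not the real message
types): the ActionCache is keyed on a digest of the complete action
description, which includes the input root digest, so any change to the
inputs produces a different key and therefore a cache miss.

    import hashlib
    import json

    def action_key(command, input_root_digest, output_paths, platform):
        # Key the cache on a digest of the full action description; a
        # different input tree yields a different key.
        desc = json.dumps({
            'command': command,
            'input_root': input_root_digest,
            'outputs': sorted(output_paths),
            'platform': platform,
        }, sort_keys=True).encode('utf-8')
        return hashlib.sha256(desc).hexdigest()

    # Maps action keys to action results (digests of output files and logs)
    action_cache = {}
    key = action_key(['make', 'install'], 'ab12...', ['usr'], {'arch': 'x86_64'})
    result = action_cache.get(key)  # None means the action must be executed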

The ActionCache service does not suffice as top-level caching service for
BuildStream for the following reasons:
* Incremental builds and non-strict build plans require support for less
  strict cache keys where some changes in the input directory are ignored.
* The build of a single element may involve execution of multiple actions.
  We want to be able to cache the artifact for the overall build job.
* A BuildStream artifact includes metadata generated locally outside the
  sandbox, which means that the output directory from the ActionResult
  does not constitute a complete artifact.
* As the sources are included in the input directory, BuildStream can't
  check the ActionCache service without fetching the sources.

For BuildStream I'm thus proposing to add a small artifact cache service on
top of the Remote Execution API that maps from BuildStream cache keys to CAS
artifact directories. Adding an entry to the BuildStream cache is considered
a privileged action, only trusted clients will be permitted write access.
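
Conceptually this service is little more than a mapping from BuildStream
cache keys to CAS directory digests, with writes restricted to trusted
clients. A rough sketch (the method names here are made up for
illustration, not a finished API):

    class ArtifactCacheService:
        # Hypothetical shape of the proposed BuildStream artifact cache
        # service; the real thing would be a gRPC service on top of CAS.

        def __init__(self, trusted_clients):
            self._refs = {}               # cache key -> CAS directory digest
            self._trusted = set(trusted_clients)

        def get_artifact(self, cache_key):
            # Lookups are unrestricted; None means a cache miss.
            return self._refs.get(cache_key)

        def update_artifact(self, client_id, cache_key, directory_digest):
            # Adding an entry is privileged: only trusted clients may push.
            if client_id not in self._trusted:
                raise PermissionError('push access denied')
            self._refs[cache_key] = directory_digest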

The Execution service may still internally use an ActionCache service, of
course. However, BuildStream will not directly use the ActionCache service
API.

¹ https://docs.google.com/document/d/1AaGk7fOPByEvpAbqeXIyE8HX_A3_axxNnvroblTZ_6s/edit
² https://grpc.io/

Architecturally this all sounds great, and is the missing piece we need
to enable cross architecture and cross OS builds.

My concerns mostly regard the layout of dependencies which come
together to form this, and the installation story for BuildStream
users.

Currently, on Linux, we have a hard dependency on OSTree for the local
artifact cache, which is critical for operation; but OSTree is a mature
C library which is able to provide a stable API surface.

If we are to move to a CAS based artifact cache, I wonder what we will
depend on here. I understand that the dependencies shall all be
relatively easily buildable and so this should not affect long term
reproducibility of a system which can build systems, so that is not
where my concern lies.

Rather, if the CAS implementation we use for our local and remote
artifact cache is an external moving part, it should be API stable and
reliable. If it is not stable and reliable, then we should probably
have it live inside our repository until such a time that it is viable
to have as an external dependency; otherwise I suspect our user
installation and upgrade story is severely at risk.

Strategies for making sure this is solid need to be considered before
moving in this direction.

Beyond this, I should make clear that BuildStream should always
continue to work in the standalone environment it currently does, even
if we implement the remote workers etc.; local execution should not be
compromised or require exorbitant dependencies.

CAS Artifact Cache
~~~~~~~~~~~~~~~~~~
As a first phase I'm proposing to introduce an artifact cache based on CAS
as defined in the Remote Execution API. This can completely replace the
existing tar and OSTree artifact cache backends with a single cross-platform
implementation.

As this is the first step, I will concentrate on it here; let us
revisit the other parts when their time comes.

This will add grpcio³ as a hard dependency and requires code generation for
protocol buffers. To avoid a hard dependency on the code generator
(grpcio-tools), which cannot always be easily installed via pip (a
development environment is required), I'm proposing to import the generated Python
code into the BuildStream repository. We can add a setup.py command to
make it easy to regenerate the code on systems where grpcio-tools is
available.
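
The regeneration command could look roughly like this (a sketch only;
the proto file and output paths are placeholders):

    # In setup.py: `python setup.py build_grpc` regenerates the protocol
    # buffer code on systems where grpcio-tools is available.
    from setuptools import Command

    class BuildGRPC(Command):
        description = 'regenerate gRPC protocol buffer code'
        user_options = []

        def initialize_options(self):
            pass

        def finalize_options(self):
            pass

        def run(self):
            from grpc_tools import protoc  # only needed for regeneration
            result = protoc.main([
                'grpc_tools.protoc',
                '-Iproto',                               # placeholder include path
                '--python_out=buildstream/_protos',      # placeholder output path
                '--grpc_python_out=buildstream/_protos',
                'proto/remote_execution.proto',          # placeholder proto file
            ])
            if result != 0:
                raise SystemExit('protoc failed')

    # setup(..., cmdclass={'build_grpc': BuildGRPC})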

I expect some back and forth on this; I think we've all lived through
bad experiences stemming from generated files committed to the VCS.

We also will be shooting for distro adoption at some point, where it
will make more sense to just require that users who need a bleeding
edge version of BuildStream have the additional tooling to build it.

That said, I don't want this detail to stall your work in the short
term, just mentioning that I don't think it's desirable in the long
term.

I already pushed a Python implementation of a local-only CAS artifact cache
to a branch a while ago, see WIP merge request !337. I've also prototyped
support for push and pull and a toy server.

To get this to a mergeable state, a complete solution for the server side is
required. I.e., we need a CAS server that projects can install with suitable
instructions without introducing unreasonable dependencies. We also need the
BuildStream artifact cache service described in the previous section,
including support for privileged push.

Projects will be required to migrate their artifact cache servers from
OSTree to CAS.

This is a less important part of our API contract, at least in the short
term; perhaps after this step has been completed, we can really declare
remote artifact cache servers a "stable thing".

I'm not planning on supporting anything like OSTree's summary file, which
is a list of all available refs and the corresponding checksums. This means
that BuildStream will no longer check at the beginning of a session which
artifacts are downloadable and we can no longer skip build dependencies of
artifacts that are in the remote cache. Such checks will instead happen as
part of the pipeline. The reasons for the change are as follows:
* With artifact expiry, the artifact might no longer be available when
  we actually want to pull.
* Conversely, the artifact may become available on the remote server after
  the session has already started, see also #179.
* The OSTree summary file doesn't scale. The server has to rewrite a
  potentially huge file in a cron job and the client always has to download
  the whole file.
* We don't always know the cache keys at the beginning of the session,
  e.g., in non-strict mode or when tracking, so we need support for dynamic
  checks in the pipeline anyway.

If I understand this properly, this is not a problem.

It will be a very, very serious problem if the build plan demands that
build-of-build dependencies become locally cached as a part of the
build, though; please keep this in mind.

I don't think supporting this is going to be too difficult though; we
will already need to transform our queuing system such that:
  * We first try to pull an artifact
  * If the artifact cannot be pulled, we try to build it.

From an abstract point of view, it seems logical that we can
conditionally build the build-of-build dependencies in the case where
the depended-on build dependencies cannot be pulled.

This could mean planning out the pipeline such that the build-of-build
dependencies are queued *after* direct build dependencies, and they are
*skipped* in the case that the direct build dependencies could in fact
be downloaded, causing the earlier elements in the pipeline to block in
a QueueStatus.WAIT state until the build dependencies are present.
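
Roughly what I have in mind for the status decision in the build queue
(the helper names below are hypothetical and only meant to illustrate
the idea; QueueStatus.WAIT is the only name meant literally):

    from enum import Enum

    class QueueStatus(Enum):
        WAIT = 1    # block in the queue until dependencies are resolved
        SKIP = 2    # nothing to do, e.g. the artifact was pulled
        READY = 3   # ready to be processed now

    def build_queue_status(element):
        # Hypothetical helpers on the element: cached(), pull_pending(),
        # build_deps_available().
        if element.cached():
            # The artifact was pulled, so the build (and the builds of its
            # build-of-build dependencies) can be skipped.
            return QueueStatus.SKIP
        if element.pull_pending() or not element.build_deps_available():
            # Wait until we know whether the artifact can be pulled and
            # the direct build dependencies are locally present.
            return QueueStatus.WAIT
        return QueueStatus.READY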

The Remote Execution API is not stable yet; however, in this first phase we
control both client and server, and we can support multiple protocol versions
in the server at the same time during transition periods.

This phase on its own does not enable remote execution. However, it still
provides benefits for local execution:
* On non-Linux systems we no longer need the tar artifact cache, which is
  much slower than OSTree/CAS and doesn't support remote caches.
* Support for LRU cache expiration is possible on the server as we can
  track artifact downloads (a rough sketch follows after this list).
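
A toy sketch of that server-side idea (not a concrete design): record
every artifact download and evict the least recently used entries once
the cache exceeds its quota.

    from collections import OrderedDict

    class LRUArtifactIndex:
        # Toy index of remote artifacts, ordered by most recent download.

        def __init__(self, max_artifacts):
            self._refs = OrderedDict()   # cache key -> CAS directory digest
            self._max = max_artifacts

        def downloaded(self, cache_key):
            # A client pulled this artifact: mark it most recently used.
            self._refs.move_to_end(cache_key)

        def add(self, cache_key, digest):
            self._refs[cache_key] = digest
            self._refs.move_to_end(cache_key)
            while len(self._refs) > self._max:
                # Expire the least recently used artifact; its CAS objects
                # can be garbage collected separately.
                self._refs.popitem(last=False)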

This will also drop OSTree as a hard dependency on Linux. However, it will
remain an optional dependency for projects using the OSTree source plugin.

This all mostly sounds fine to me, and I look forward to dropping the
multiple artifact cache implementations in favor of a single one which
we have the power to control and improve such that it will work on any
platform we would ever want to support.

Also, LRU expiry on the remote cache, and probably an easier-to-use
install story for artifact share servers, sound like great
improvements.

I have read through the rest of your email and while it seems mostly
sound, I suspect that we will have more discussions on those items
while we discover things during implementation, so I would rather not
elaborate on them too much.

The issue I raised first, about depending on external things, is
important: we have to have a rock-solid install and upgrade story,
which does not make BuildStream suddenly difficult to install or
upgrade, nor allow for situations where users are suddenly stuck with
incompatible versions of things.

Best Regards,
    -Tristan


