Re: Proposal for Remote Execution



On Thu, 2018-04-19 at 17:06 +0900, Tristan Van Berkom wrote:
If we are to move to a CAS-based artifact cache, I wonder what we will
depend on here. I understand that the dependencies should all be
relatively easy to build, so this should not affect the long-term
reproducibility of a system which can build systems; that is not where
my concern lies.

Rather, if the CAS implementation we use for our local and remote
artifact cache is an external moving part, it should be API-stable and
reliable. If it is not, then we should probably have it live inside our
repository until such a time as it is viable as an external dependency;
otherwise I suspect our user installation and upgrade story is severely
at risk.

The plan is indeed to have the implementation live in our repository.
One issue is that the whole Remote Execution API is still subject to
change, including CAS. However, with the implementation under our
control, we should be able to provide a graceful migration path for
CAS, if necessary.

CAS Artifact Cache
~~~~~~~~~~~~~~~~~~
[...]

This will add grpcio³ as a hard dependency and requires code generation
for protocol buffers. To avoid a hard dependency on the code generator
(grpcio-tools), which cannot always be easily installed via pip (a
development environment is required), I'm proposing to import the
generated Python code into the BuildStream repository. We can add a
setup.py command to make it easy to regenerate the code on systems
where grpcio-tools is available.
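
For illustration, here is a minimal sketch of what such a setup.py
command could look like, using setuptools and grpcio-tools; the proto
paths and the command name are assumptions, not a final layout:

    from setuptools import Command


    class BuildGRPC(Command):
        """Regenerate gRPC/protobuf modules (requires grpcio-tools)."""

        description = 'regenerate Python code from .proto definitions'
        user_options = []

        def initialize_options(self):
            pass

        def finalize_options(self):
            pass

        def run(self):
            try:
                from grpc_tools import protoc
            except ImportError:
                raise RuntimeError(
                    "grpcio-tools is required to regenerate gRPC code")

            # Paths are illustrative; the real layout is still open
            result = protoc.main([
                'protoc',
                '--proto_path=proto',
                '--python_out=buildstream/_protos',
                '--grpc_python_out=buildstream/_protos',
                'proto/remote_execution.proto',
            ])
            if result != 0:
                raise RuntimeError(
                    'protoc failed with exit code {}'.format(result))

Registering this via cmdclass={'build_grpc': BuildGRPC} in setup()
would then let developers run `python setup.py build_grpc` after
changing the .proto files.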

I expect some back and forth on this; I think we've all lived through
bad experiences stemming from generated files committed to the VCS.

We will also be shooting for distro adoption at some point, where it
will make more sense to simply require that users who need a
bleeding-edge version of BuildStream have the additional tooling to
build it.

That said, I don't want this detail to stall your work in the short
term, just mentioning that I don't think it's desirable in the long
term.

Yes, long term we may be able to depend on a stable Python package that
provides all of this, or rely on distro packaging.

Projects will be required to migrate their artifact cache servers from
OSTree to CAS.

This is a less important part of our API contract, at least in the
short term; perhaps after this step has been completed, we can really
declare remote artifact cache servers a "stable thing".

We probably should postpone this until the Remote Execution API itself
is declared stable upstream.

I'm not planning on supporting anything like OSTree's summary file,
which is a list of all available refs and the corresponding checksums.
This means that BuildStream will no longer check at the beginning of a
session which artifacts are downloadable, and we can no longer skip
build dependencies of artifacts that are in the remote cache. Such
checks will instead happen as part of the pipeline (see the sketch
after the list below). The reasons for the change are as follows:
* With artifact expiry, the artifact might no longer be available when
  we actually want to pull.
* Conversely, the artifact may become available on the remote server after
  the session has already started, see also #179.
* The OSTree summary file doesn't scale. The server has to rewrite a
  potentially huge file in a cron job and the client always has to download
  the whole file.
* We don't always know the cache keys at the beginning of the session,
  e.g., in non-strict mode or when tracking, so we need support for dynamic
  checks in the pipeline anyway.
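
As a rough illustration, a dynamic check in the pipeline could look
something like the sketch below; the service, request type and method
names are placeholders for whatever the final API turns out to be,
only the grpcio error handling is real:

    import grpc

    # Placeholder modules; the final service definition is still open
    from buildstream._protos import reference_pb2, reference_pb2_grpc


    def remote_contains(channel, ref):
        """Check whether `ref` is currently pullable from the remote."""
        stub = reference_pb2_grpc.ReferenceStorageStub(channel)
        try:
            # Resolving the artifact ref to a digest succeeds only if
            # the artifact is currently available on the server
            stub.GetReference(reference_pb2.GetReferenceRequest(key=ref))
            return True
        except grpc.RpcError as e:
            if e.code() == grpc.StatusCode.NOT_FOUND:
                # Not an error here: the artifact may have expired or
                # may not have been uploaded (yet)
                return False
            raise

Because the check runs right before we would pull, it naturally copes
with both expiry and late availability, unlike a session-start summary.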

If I understand this properly, this is not a problem.

It will be a very, very serious problem if the build plan demands that
build-of-build dependencies become locally cached as a part of the
build, though; please keep this in mind.

I don't think supporting this is going to be too difficult, though; we
will already need to transform our queuing system such that:
  * We first try to pull an artifact
  * If the artifact cannot be pulled, we try to build it.

From an abstract point of view, it seems logical that we can
conditionally build the build-of-build dependencies in the case where
the depended-on build dependencies cannot be pulled.

This could mean planning out the pipeline such that the build-of-build
dependencies are queued *after* direct build dependencies, and they are
*skipped* in the case that the direct build dependencies could in fact
be downloaded, causing the earlier elements in the pipeline to block in
a QueueStatus.WAIT state until the build dependencies are present.
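
To make that concrete, here is a simplified sketch of the per-element
status decision such a build queue could make; the element methods are
illustrative stand-ins for BuildStream internals:

    from enum import Enum


    class QueueStatus(Enum):
        WAIT = 1   # dependencies not in place yet, check again later
        SKIP = 2   # nothing to do for this element
        READY = 3  # ready to be processed


    def build_queue_status(element):
        # An element that was successfully pulled never needs to be
        # built, so its own build dependencies (the build-of-build
        # dependencies of its reverse dependencies) are never required.
        if element.cached():
            return QueueStatus.SKIP

        # Otherwise, block until the direct build dependencies have
        # themselves been pulled or built.
        if not all(dep.cached() for dep in element.build_dependencies()):
            return QueueStatus.WAIT

        return QueueStatus.READY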

As discussed on IRC, I'll try to support dynamic queueing of build-only 
dependencies, which should solve this issue.

Jürg

