Re: [BuildStream] [PROPOSAL] Adopt Remote Asset API



Hi Sander,

Thanks for your work on the Remote Asset API and writing this proposal.
Overall this definitely makes sense to me.

On Mon, 2020-02-10 at 22:45 +0100, Sander Striker via buildstream-list
wrote:
TL;DR
In January the Remote Asset API[1] landed in the Remote APIs
repository.  The API opens up a number of opportunities for
standardization and consolidation.

I propose we:
- Retire Artifact Cache, use a Remote Asset API based Asset Cache
instead
- Retire Source Cache, use a Remote Asset API based Asset Cache
instead

Yes, it will be good to drop the requirement to deploy
BuildStream-specific servers.

- Introduce caching of individual sources, using a Remote Asset API
based Asset Cache
- Introduce tracking of individual sources, using the Remote Asset
API

A few questions/comments regarding the server implementation:
 * As far as I know, there is no server implementation yet, but there
   are plans. Is there already a web page detailing these plans, or
   even a repository?
 * Will the planned server be suitable for use in the BuildStream test
   suite as well?
 * We probably want a buildbox-casd-like local proxy for this, see also
   #1064. That proxy could double as a simple local server for testing
   (and possibly small-scale server deployments) as well, similar to
   buildbox-casd being usable without a remote. In that case, our focus
   should likely be on the local proxy and we wouldn't be blocked by
   the "real"/scalable server implementation.

## Retire Artifact Cache, use a Remote Asset API based Asset Cache
instead

Instead of a dedicated ArtifactService, we can leverage the Remote
Asset API FetchService and PushService.  We will retain the Artifact
message proto to describe an artifact.  However, we will associate it
remotely via PushService.PushBlob:
- PushBlobRequest.uris is
  [ARTIFACT_URI_TEMPLATE.format(Artifact.strong_key),
   ARTIFACT_URI_TEMPLATE.format(Artifact.weak_key)]
- PushBlobRequest.blob_digest is the digest of the Artifact message. 
The Artifact message will need to be stored in CAS separately.
- PushBlobRequest.references_directories is [Artifact.files,
                                             Artifact.logs.digest,
                                             Artifact.buildtree,
                                             Artifact.sources]
depending on lifetime requirements.

Similarly, we will retrieve Artifacts using FetchService.FetchBlob:
- FetchBlobRequest.uris is ARTIFACT_URI_TEMPLATE.format(cache_key). 
The Artifact can be retrieved from CAS at
FetchBlobResponse.blob_digest.

This sounds reasonable. Same for the similar source cache replacement.

ARTIFACT_URI_TEMPLATE could be defined as
"urn:buildstream:artifact:{}".  However this would require
registration with IANA, see 
https://www.iana.org/assignments/urn-namespaces/urn-namespaces.xhtml.

Could it make sense to use "urn:fdc:buildstream.build:2020:" as prefix
instead?
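
To make the URI scheme concrete, here is a minimal sketch of how the
uris fields for push and fetch could be built. It assumes the "fdc"
prefix suggested above plus an "artifact:" component, neither of which
is decided. ArtifactKeys, push_uris and fetch_uris are hypothetical
names used only for illustration.

```python
from dataclasses import dataclass
from typing import List

# Assumption: the "fdc" prefix suggested above followed by an "artifact:"
# component. This is one possible value, not an agreed one.
ARTIFACT_URI_TEMPLATE = "urn:fdc:buildstream.build:2020:artifact:{}"


@dataclass
class ArtifactKeys:
    # Hypothetical stand-in for the strong_key/weak_key fields of the
    # Artifact message.
    strong_key: str
    weak_key: str


def push_uris(keys: ArtifactKeys) -> List[str]:
    """Values for PushBlobRequest.uris: strong key first, then weak key."""
    return [
        ARTIFACT_URI_TEMPLATE.format(keys.strong_key),
        ARTIFACT_URI_TEMPLATE.format(keys.weak_key),
    ]


def fetch_uris(cache_key: str) -> List[str]:
    """Values for FetchBlobRequest.uris when looking up one cache key."""
    return [ARTIFACT_URI_TEMPLATE.format(cache_key)]
```

Because push registers both keys, a later fetch by either the strong or
the weak key would resolve to the same Artifact digest.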

## Introduce caching of individual sources, using a Remote Asset API
based Asset Cache

We can reduce the load and reliance on additional services by
leveraging the Remote Asset API.  CAS and Remote Asset API combined
can serve as a cache of the content, with the additional benefit that
having the content in CAS means it can be referred to in Remote
Execution without additional uploads.

To support this we need to extend the Source Plugin API to return the
list of URIs and qualifiers as needed by the FetchService. 
Specifically:
- FetchDirectoryRequest.uris is the complete set of URLs that
represent the content of the source.  This is the full set after
alias expansion.  For example: git.example.com/foo/bar.git and
git-mirror.example.com/foo/bar.git.
- FetchDirectoryRequest.qualifiers is the minimal set of qualifiers
that uniquely identifies the source.  It's expected to be closely
tied to the value of `ref`.  For example:
  - vcs.commit = b5123b1bb2853393c7b9aa43236db924d7e32d61
  - resource_type = application/x-git

Do we want to replace the existing `ref` mechanism to be suitable for
both mechanisms or do we want mostly independent API methods for the
two?

We will need to expand the Source Plugin API similarly to return the
list of URIs and qualifiers as needed by the PushService.

Based on your examples, the qualifiers used for push are a combination
of the qualifiers used for fetching and tracking. Can we design the API
in such a way that plugins don't have to duplicate the common parts?
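
One possible shape, as a rough sketch only: the plugin reports its
fetch and tracking qualifiers separately, and the core derives the push
qualifiers by merging them. All class and method names below are
hypothetical and do not exist in the Source Plugin API today. The
vcs.branch qualifier for tracking is an assumption for illustration.

```python
from typing import Dict, List


class GitSourceSketch:
    """Hypothetical plugin used only to illustrate the API shape."""

    def __init__(self, urls: List[str], ref: str, track: str) -> None:
        self.urls = urls
        self.ref = ref
        self.track = track

    def get_asset_uris(self) -> List[str]:
        # Full set of URLs after alias expansion.
        return list(self.urls)

    def get_fetch_qualifiers(self) -> Dict[str, str]:
        # Minimal set that uniquely identifies the source, tied to `ref`.
        return {"resource_type": "application/x-git", "vcs.commit": self.ref}

    def get_track_qualifiers(self) -> Dict[str, str]:
        # Assumption: tracking is expressed through a branch qualifier.
        return {"resource_type": "application/x-git", "vcs.branch": self.track}


def push_qualifiers(source: GitSourceSketch) -> Dict[str, str]:
    """Derived by the core, so plugins never spell out the push set."""
    merged = dict(source.get_track_qualifiers())
    merged.update(source.get_fetch_qualifiers())
    return merged
```

With this split a plugin implements two small methods, and the common
parts (here resource_type) appear in the push set without being
repeated in a third method.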

Behavior should be configurable to support the following use cases:
- client does *not* use the Remote Asset API to fetch sources, and
only uses the source plugin native fetch
- client uses the Remote Asset API to fetch sources, and falls back
to using the source plugin native fetch
- client uses the Remote Asset API to fetch sources, and does *not*
fall back to using the source plugin native fetch
- client uses the Remote Asset API to push sources, after using the
source plugin native fetch
- client does *not* use the Remote Asset API to push sources

At first glance the number of possible behaviors seems rather high.
However, splitting the configuration up into a couple of orthogonal
knobs might make this simple enough.
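
As a sketch of what I mean, assuming two knobs (a three-valued fetch
mode and a push flag), the five use cases above fall out as
combinations. The names RemoteFetch, SourceAssetConfig, fetch_plan and
should_push are made up for this example and are not existing
BuildStream configuration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class RemoteFetch(Enum):
    NEVER = "never"    # only the source plugin's native fetch
    PREFER = "prefer"  # Remote Asset API first, native fetch as fallback
    ONLY = "only"      # Remote Asset API only, no fallback


@dataclass(frozen=True)
class SourceAssetConfig:
    # Hypothetical configuration; both defaults keep today's behavior.
    fetch: RemoteFetch = RemoteFetch.NEVER
    push: bool = False


def fetch_plan(config: SourceAssetConfig) -> List[str]:
    """Ordered list of fetch methods to attempt."""
    if config.fetch is RemoteFetch.NEVER:
        return ["native"]
    if config.fetch is RemoteFetch.PREFER:
        return ["remote-asset", "native"]
    return ["remote-asset"]


def should_push(config: SourceAssetConfig, fetched_via: str) -> bool:
    """Push only content that was obtained through the native fetch."""
    return config.push and fetched_via == "native"
```

The fetch mode covers the first three use cases and the push flag the
last two, so the knobs can be set independently.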

# Implementation plan

Implementing the whole proposal will likely take some time, especially
if we include the effort for the local proxy. I would suggest focusing
on retiring artifact and source caches and deferring caching/tracking
of individual sources until the former is ready. Or do you think it's
important to tackle everything right away?

We may also want to keep the release of BuildStream 2.0 in mind. We
most likely don't want to release 2.0 with the current artifact/source
cache and then drop it soon after. However, caching and tracking of
individual sources is more like an extension and could probably be
added after 2.0. That said, fleshing out the source plugin API before
2.0 would be useful.

Cheers,
Jürg


