[BuildStream] [PROPOSAL] Adopt Remote Asset API



TL;DR
In January the Remote Asset API[1] landed in the Remote APIs repository.  The API opens up a number of opportunities for standardization and consolidation.

I propose we:
- Retire Artifact Cache, use a Remote Asset API based Asset Cache instead
- Retire Source Cache, use a Remote Asset API based Asset Cache instead
- Introduce caching of individual sources, using a Remote Asset API based Asset Cache
- Introduce tracking of individual sources, using the Remote Asset API

Note that a change to the Source Plugin API is required for the last two items.

Cheers,

Sander

[1] https://github.com/bazelbuild/remote-apis/blob/master/build/bazel/remote/asset/v1/remote_asset.proto

## Retire Artifact Cache, use a Remote Asset API based Asset Cache instead

Instead of a dedicated ArtifactService, we can leverage the Remote Asset API FetchService and PushService.  We will retain the Artifact message proto to describe an artifact.  However, we will associate it remotely via PushService.PushBlob:
- PushBlobRequest.uris is [ARTIFACT_URI_TEMPLATE.format(Artifact.strong_key),
                           ARTIFACT_URI_TEMPLATE.format(Artifact.weak_key)]
- PushBlobRequest.blob_digest is the digest of the Artifact message.  The Artifact message will need to be stored in CAS separately.
- PushBlobRequest.references_directories is [Artifact.files,
                                             Artifact.logs.digest,
                                             Artifact.buildtree,
                                             Artifact.sources] depending on lifetime requirements.
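Below is a minimal Python sketch of the PushBlob association described above. It makes these assumptions:

- Plain dataclasses stand in for the generated protobuf classes.
- SHA-256 is the CAS digest function.
- `digest_of` and `artifact_push_request` are hypothetical helper names, not existing BuildStream code.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical: the proposal only says the template "could be" this URN.
ARTIFACT_URI_TEMPLATE = "urn:buildstream:artifact:{}"


@dataclass(frozen=True)
class Digest:
    """Stand-in for build.bazel.remote.execution.v2.Digest."""
    hash: str
    size_bytes: int


def digest_of(data: bytes) -> Digest:
    """CAS digest of a blob, assuming SHA-256 as the digest function."""
    return Digest(hashlib.sha256(data).hexdigest(), len(data))


@dataclass
class PushBlobRequest:
    """Stand-in for remote_asset.proto PushBlobRequest (subset of fields)."""
    uris: List[str]
    blob_digest: Digest
    references_directories: List[Digest] = field(default_factory=list)


def artifact_push_request(strong_key: str, weak_key: str,
                          artifact_bytes: bytes,
                          directories: List[Optional[Digest]]) -> PushBlobRequest:
    """Associate a serialized Artifact message with both of its cache keys.

    The Artifact message itself must already be uploaded to CAS; this only
    builds the request that points the URIs at it.
    """
    return PushBlobRequest(
        uris=[ARTIFACT_URI_TEMPLATE.format(strong_key),
              ARTIFACT_URI_TEMPLATE.format(weak_key)],
        blob_digest=digest_of(artifact_bytes),
        # Unset trees (e.g. no buildtree cached) are skipped; which trees to
        # list at all depends on the lifetime requirements.
        references_directories=[d for d in directories if d is not None],
    )
```

Listing both the strong and the weak key in one request means a later lookup by either key resolves to the same Artifact message.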

Similarly, we will retrieve Artifacts using FetchService.FetchBlob:
- FetchBlobRequest.uris is ARTIFACT_URI_TEMPLATE.format(cache_key).  The Artifact can be retrieved from CAS at FetchBlobResponse.blob_digest.

ARTIFACT_URI_TEMPLATE could be defined as "urn:buildstream:artifact:{}".  However, this would require registration with IANA, see https://www.iana.org/assignments/urn-namespaces/urn-namespaces.xhtml.
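To illustrate the lookup side, here is a toy in-memory model of a Remote Asset server backed by CAS. `InMemoryAssetCache` and `fetch_artifact` are hypothetical and only mimic the shape of PushBlob and FetchBlob. A real client would issue gRPC calls, and a real server signals a miss with an error status rather than returning `None`.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional

ARTIFACT_URI_TEMPLATE = "urn:buildstream:artifact:{}"  # hypothetical URN form


@dataclass(frozen=True)
class Digest:
    """Stand-in for build.bazel.remote.execution.v2.Digest."""
    hash: str
    size_bytes: int


class InMemoryAssetCache:
    """Toy Remote Asset server plus CAS, for illustrating the flow only."""

    def __init__(self) -> None:
        self.cas: Dict[Digest, bytes] = {}
        self.assets: Dict[str, Digest] = {}

    def upload_blob(self, data: bytes) -> Digest:
        digest = Digest(hashlib.sha256(data).hexdigest(), len(data))
        self.cas[digest] = data
        return digest

    def push_blob(self, uris: List[str], blob_digest: Digest) -> None:
        # PushBlob: every URI in the request resolves to the same blob.
        for uri in uris:
            self.assets[uri] = blob_digest

    def fetch_blob(self, uris: List[str]) -> Optional[Digest]:
        # FetchBlob: return the digest for the first URI the server knows.
        for uri in uris:
            if uri in self.assets:
                return self.assets[uri]
        return None


def fetch_artifact(cache: InMemoryAssetCache, cache_key: str) -> Optional[bytes]:
    digest = cache.fetch_blob([ARTIFACT_URI_TEMPLATE.format(cache_key)])
    if digest is None:
        return None
    # The Artifact message itself is then read from CAS at blob_digest.
    return cache.cas.get(digest)
```

After pushing one Artifact message under both keys, `fetch_artifact(cache, "weak")` and `fetch_artifact(cache, "strong")` return the same bytes.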


## Retire Source Cache, use a Remote Asset API based Asset Cache instead

Instead of a dedicated SourceService, we can leverage the Remote Asset API FetchService and PushService.  We will retain the Source message proto to describe a source asset.  However, we will associate it remotely via PushService.PushBlob:
- PushBlobRequest.uris is SOURCE_URI_TEMPLATE.format(cache_key)
- PushBlobRequest.blob_digest is the digest of the Source message.  The Source message will need to be stored in CAS separately.
- PushBlobRequest.references_directories is Source.files.

Similarly, we will retrieve Sources using FetchService.FetchBlob:
- FetchBlobRequest.uris is SOURCE_URI_TEMPLATE.format(cache_key).  The Source can be retrieved from CAS at FetchBlobResponse.blob_digest.

SOURCE_URI_TEMPLATE could be defined as "urn:buildstream:source:{}".  However, this would require registration with IANA, see https://www.iana.org/assignments/urn-namespaces/urn-namespaces.xhtml.
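The source variant differs from the artifact one in two ways: it uses a different template, and it references a single directory, `Source.files`. This sketch uses the same assumptions: dataclass stand-ins for the protobuf messages and hypothetical helper names. It shows that push and fetch address the asset by an identical URI.

```python
from dataclasses import dataclass, field
from typing import List

SOURCE_URI_TEMPLATE = "urn:buildstream:source:{}"  # hypothetical, pending IANA


@dataclass(frozen=True)
class Digest:
    """Stand-in for build.bazel.remote.execution.v2.Digest."""
    hash: str
    size_bytes: int


@dataclass
class PushBlobRequest:
    uris: List[str]
    blob_digest: Digest
    references_directories: List[Digest] = field(default_factory=list)


@dataclass
class FetchBlobRequest:
    uris: List[str]


def source_push_request(cache_key: str, source_msg: Digest,
                        files: Digest) -> PushBlobRequest:
    # The Source message and its files tree are assumed to be in CAS already.
    return PushBlobRequest(
        uris=[SOURCE_URI_TEMPLATE.format(cache_key)],
        blob_digest=source_msg,
        references_directories=[files],  # keep Source.files alive with the asset
    )


def source_fetch_request(cache_key: str) -> FetchBlobRequest:
    return FetchBlobRequest(uris=[SOURCE_URI_TEMPLATE.format(cache_key)])
```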


## Introduce caching of individual sources, using a Remote Asset API based Asset Cache

We can reduce the load and reliance on additional services by leveraging the Remote Asset API.  CAS and Remote Asset API combined can serve as a cache of the content, with the additional benefit that having the content in CAS means it can be referred to in Remote Execution without additional uploads.

To support this we need to extend the Source Plugin API to return the list of URIs and qualifiers as needed by the FetchService.  Specifically:
- FetchDirectoryRequest.uris is the complete set of URLs that represent the content of the source.  This is the full set after alias expansion.  For example: git.example.com/foo/bar.git and git-mirror.example.com/foo/bar.git.
- FetchDirectoryRequest.qualifiers is the minimal set of qualifiers that uniquely identifies the source.  It's expected to be closely tied to the value of `ref`.  For example:
  - vcs.commit = b5123b1bb2853393c7b9aa43236db924d7e32d61
  - resource_type = application/x-git
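Here is one possible shape for the proposed plugin API extension, sketched for a git source. The following are all assumptions for illustration:

- the method name `remote_asset_fetch_spec`;
- the class `GitSourceSketch`;
- the simplified `alias:path` expansion in `expand_alias`.

Qualifiers are shown as a dict rather than the repeated `Qualifier` message.

```python
from typing import Dict, List, Tuple

Qualifiers = Dict[str, str]


def expand_alias(url: str, aliases: Dict[str, List[str]]) -> List[str]:
    """Expand 'alias:path' into every configured base (primary, then mirrors).

    Hypothetical helper; URLs whose prefix is not a known alias pass through.
    """
    alias, sep, path = url.partition(":")
    if not sep or alias not in aliases:
        return [url]
    return [base + path for base in aliases[alias]]


class GitSourceSketch:
    """Hypothetical git source showing the proposed plugin API extension."""

    def __init__(self, url: str, ref: str, track: str) -> None:
        self.url = url
        self.ref = ref      # commit SHA
        self.track = track  # branch name

    def remote_asset_fetch_spec(
            self, aliases: Dict[str, List[str]]) -> Tuple[List[str], Qualifiers]:
        # Minimal qualifiers: the commit alone pins the content.
        qualifiers = {
            "vcs.commit": self.ref,
            "resource_type": "application/x-git",
        }
        return expand_alias(self.url, aliases), qualifiers
```

With `aliases = {"example": ["git.example.com/", "git-mirror.example.com/"]}`, a source URL of `example:foo/bar.git` expands to both example URLs listed above.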

We will need to expand the Source Plugin API similarly to return the list of URIs and qualifiers as needed by the PushService.  Specifically:
- PushDirectoryRequest.uris is the complete set of URLs that represent the content of the source.  This is the full set after alias expansion.  For example: git.example.com/foo/bar.git and git-mirror.example.com/foo/bar.git.
- PushDirectoryRequest.qualifiers is the complete set of qualifiers associated with the source.  For example:
  - vcs.commit = b5123b1bb2853393c7b9aa43236db924d7e32d61
  - resource_type = application/x-git
  - vcs.branch = master
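The push side reports the complete qualifier set. The sketch below assumes, as this proposal appears to, that the server treats a fetch as matching when every requested qualifier was pushed with the same value. That matching rule lives in the hypothetical `matches` function; it is an assumption, not something this sketch takes from remote_asset.proto. Under that rule, one push serves both a pinned fetch (by commit) and a tracking fetch (by branch).

```python
from typing import Dict, List, Tuple

Qualifiers = Dict[str, str]


def push_spec(uris: List[str], commit: str,
              branch: str) -> Tuple[List[str], Qualifiers]:
    # Complete set: everything the source knows about this content.
    return uris, {
        "vcs.commit": commit,
        "resource_type": "application/x-git",
        "vcs.branch": branch,
    }


def matches(fetch: Qualifiers, pushed: Qualifiers) -> bool:
    """Assumed rule: every requested qualifier was pushed with the same value."""
    return all(pushed.get(name) == value for name, value in fetch.items())
```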

Behavior should be configurable to support the following use cases:
- client does *not* use the Remote Asset API to fetch sources, and only uses the source plugin native fetch
- client uses the Remote Asset API to fetch sources, and falls back to using the source plugin native fetch
- client uses the Remote Asset API to fetch sources, and does *not* fall back to using the source plugin native fetch
- client uses the Remote Asset API to push sources, after using the source plugin native fetch
- client does *not* use the Remote Asset API to push sources
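The configurable fetch behavior could be modeled roughly as follows. `FetchMode`, `SourceCachePolicy`, and `fetch_source` are hypothetical names. The three callables stand in for the Remote Asset fetch, the plugin's native fetch, and the Remote Asset push.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class FetchMode(Enum):
    NATIVE_ONLY = "native-only"                # never ask the Remote Asset API
    REMOTE_THEN_NATIVE = "remote-then-native"  # fall back on a miss
    REMOTE_ONLY = "remote-only"                # no native fallback


@dataclass
class SourceCachePolicy:
    """Hypothetical client configuration; names are illustrative only."""
    fetch: FetchMode = FetchMode.REMOTE_THEN_NATIVE
    push_after_native_fetch: bool = False


def fetch_source(policy: SourceCachePolicy,
                 remote_fetch: Callable[[], Optional[str]],
                 native_fetch: Callable[[], str],
                 remote_push: Callable[[str], None]) -> str:
    """Return a directory digest (a plain string here) for the source.

    remote_fetch returns None on a miss; native_fetch is assumed to raise
    on failure.
    """
    if policy.fetch is not FetchMode.NATIVE_ONLY:
        digest = remote_fetch()
        if digest is not None:
            return digest  # cache hit: nothing to push
        if policy.fetch is FetchMode.REMOTE_ONLY:
            raise LookupError("source not available via the Remote Asset API")
    digest = native_fetch()
    if policy.push_after_native_fetch:
        remote_push(digest)
    return digest
```

Pushing only after a native fetch keeps the client from re-uploading content it just obtained from the Remote Asset API.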

## Introduce tracking of individual sources, using the Remote Asset API

We can reduce the load and reliance on additional services by leveraging the Remote Asset API.  For example, instead of having all clients poll git services, the FetchService.FetchDirectory API is used to resolve the commit at a certain branch.  As all clients track to the same revision, cache hits are more likely for sources, artifacts and actions.

To support this we need to extend the Source Plugin API to return the list of URIs and qualifiers as needed by the FetchService.  Specifically:
- FetchDirectoryRequest.uris is the complete set of URLs that represent the content of the source.  This is the full set after alias expansion.  For example: git.example.com/foo/bar.git and git-mirror.example.com/foo/bar.git.
- FetchDirectoryRequest.qualifiers is the minimal set of qualifiers that identifies the source.  This must exclude the commit for the branch, as resolving that commit is the purpose of the request.  For example:
  - vcs.branch = master
  - resource_type = application/x-git
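This sketch derives the tracking request from what the plugin knows. It assumes, hypothetically, that each plugin declares which qualifier names its `ref` pins; for git, that is only `vcs.commit`. The dataclass stands in for the generated message.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumption: for a git source, these qualifier names are what `ref` pins.
REF_QUALIFIER_NAMES = frozenset({"vcs.commit"})


@dataclass
class FetchDirectoryRequest:
    """Stand-in for remote_asset.proto FetchDirectoryRequest (subset)."""
    uris: List[str]
    qualifiers: Dict[str, str]


def tracking_request(uris: List[str],
                     known: Dict[str, str]) -> FetchDirectoryRequest:
    # Drop whatever the ref would pin (the commit) and keep what selects the
    # moving target (branch, resource type).
    qualifiers = {name: value for name, value in known.items()
                  if name not in REF_QUALIFIER_NAMES}
    return FetchDirectoryRequest(uris=list(uris), qualifiers=qualifiers)
```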

In the response the client will learn the Digest of the source as well as all other qualifiers the service knows about.  This would include identifying information the source plugin would use in its `ref`. For example:
- FetchDirectoryResponse.uri is the URL that matched, which represents the content of the source.  For example: git.example.com/foo/bar.git.
- FetchDirectoryResponse.qualifiers is the complete set of qualifiers associated with the source.  For example:
  - *vcs.commit = b5123b1bb2853393c7b9aa43236db924d7e32d61*
  - resource_type = application/x-git
  - vcs.branch = master
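On the response side, the plugin could turn the returned qualifiers into a new `ref`. This sketch uses a dataclass stand-in for FetchDirectoryResponse, including its `root_directory_digest` field. The helper `new_ref_from_response` is hypothetical; it also rejects responses that contradict the requested qualifiers.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Digest:
    """Stand-in for build.bazel.remote.execution.v2.Digest."""
    hash: str
    size_bytes: int


@dataclass
class FetchDirectoryResponse:
    """Stand-in for remote_asset.proto FetchDirectoryResponse (subset)."""
    uri: str
    qualifiers: Dict[str, str]
    root_directory_digest: Digest


def new_ref_from_response(resp: FetchDirectoryResponse,
                          requested: Dict[str, str]) -> str:
    # Every qualifier we asked for must come back unchanged.
    for name, value in requested.items():
        if resp.qualifiers.get(name) != value:
            raise ValueError(f"qualifier {name!r} mismatch in response")
    commit = resp.qualifiers.get("vcs.commit")
    if commit is None:
        raise ValueError("response does not identify a commit")
    return commit  # becomes the git source's new `ref`
```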

Behavior should be configurable to support the following use cases:
- client does *not* use the Remote Asset API to track sources, and only uses the source plugin native track
- client uses the Remote Asset API to track sources, and falls back to using the source plugin native track
- client uses the Remote Asset API to track sources, and does *not* fall back to using the source plugin native track
- client uses the Remote Asset API to push sources, after using the source plugin native track and fetch
- client does *not* use the Remote Asset API to push sources
- client is configured to only accept results younger than a certain age.
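For the last use case, FetchDirectoryRequest in remote_asset.proto has an `oldest_content_accepted` timestamp that appears to be a natural fit. The sketch below assumes a hypothetical client-side maximum-age setting and converts it into that cutoff. The dataclass is a stand-in for the generated message, and `with_max_age` is a hypothetical helper.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class FetchDirectoryRequest:
    """Stand-in for remote_asset.proto FetchDirectoryRequest (subset)."""
    uris: List[str]
    qualifiers: Dict[str, str]
    # Mirrors the oldest_content_accepted Timestamp; None means unset.
    oldest_content_accepted: Optional[datetime] = None


def with_max_age(req: FetchDirectoryRequest, max_age: Optional[timedelta],
                 now: Optional[datetime] = None) -> FetchDirectoryRequest:
    """Only accept results younger than max_age (no limit when None)."""
    if max_age is not None:
        now = now or datetime.now(timezone.utc)
        req.oldest_content_accepted = now - max_age
    return req
```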

