[BuildStream] bst-artifact-server plans



Hi all,

A few months ago we replaced the ReferenceStorage service with an
Artifact service, which works with BuildStream Artifact protos instead
of storing simple references to CAS directories. As that protocol is
BuildStream-specific, there are no plans to support it in BuildGrid or
other Remote Execution API server implementations, as far as I know.

bst-artifact-server works and can soon¹ also be used in combination
with an external CAS server. However, for larger projects or
organizations, the simple filesystem storage may not be sufficiently
scalable, depending on the filesystem that is used.

In my opinion, a single Artifact service implementation with
configurable storage backends would be sensible. I propose the
following changes:

# Make the CAS service part in bst-artifact-server a simple proxy

This should support a local buildbox-casd instance or an external CAS
server (e.g., Buildbarn, Buildfarm, BuildGrid). This allows us to drop
the dependency on the BuildStream-internal CASCache API (which is by
now just an API on top of buildbox-casd anyway).

# Move bst-artifact-server to a separate repository

The bst-artifact-server code base will get larger with additional
backends and features. As servers don't need the BuildStream client and
clients don't need the server, I think it makes sense to use separate
repositories. With bst-artifact-server no longer depending on
significant parts of BuildStream code, moving it into a new repository
should be relatively straightforward.

BuildStream's test suite will still need an artifact server. However,
if we release bst-artifact-server on PyPI, tox should be able to
install it automatically. I'd suggest creating the bst-artifact-server
repository in the BuildStream GitLab group.

# Configurable storage backends

As the next step we should make the storage backend for Artifact (and
Source) protos configurable. The simple filesystem storage should still
be supported but the plan is to add at least one scalable backend. The
first new backend will likely use PostgreSQL, allowing multi-master
operation, but other options exist as well, of course. For single
server deployments, an SQLite backend could be an interesting
alternative to the filesystem backend.
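
To make the idea a bit more concrete, here is a minimal sketch of what a
pluggable storage layer could look like. It is only an illustration, not
existing bst-artifact-server code: the `ArtifactStorage` interface, the
two backend classes, the `create_storage()` factory and the config keys
(`type`, `path`) are all hypothetical names, and the protos are treated
as opaque serialized bytes keyed by an artifact ref.

```python
import os
import sqlite3
import tempfile
from abc import ABC, abstractmethod
from typing import Optional


class ArtifactStorage(ABC):
    """Hypothetical interface: maps a ref to a serialized proto."""

    @abstractmethod
    def get(self, ref: str) -> Optional[bytes]:
        ...

    @abstractmethod
    def put(self, ref: str, proto: bytes) -> None:
        ...


class FilesystemStorage(ArtifactStorage):
    """One file per ref below a root directory (refs are not validated here)."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def get(self, ref: str) -> Optional[bytes]:
        try:
            with open(os.path.join(self.root, ref), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def put(self, ref: str, proto: bytes) -> None:
        path = os.path.join(self.root, ref)
        directory = os.path.dirname(path)
        os.makedirs(directory, exist_ok=True)
        # Write to a temporary file first, then rename it into place so
        # readers never observe a partially written proto.
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as f:
            f.write(proto)
        os.replace(tmp, path)


class SQLiteStorage(ArtifactStorage):
    """All protos in a single SQLite database file."""

    def __init__(self, path: str) -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS artifacts "
            "(ref TEXT PRIMARY KEY, proto BLOB NOT NULL)"
        )

    def get(self, ref: str) -> Optional[bytes]:
        row = self.db.execute(
            "SELECT proto FROM artifacts WHERE ref = ?", (ref,)
        ).fetchone()
        return bytes(row[0]) if row else None

    def put(self, ref: str, proto: bytes) -> None:
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR REPLACE INTO artifacts (ref, proto) VALUES (?, ?)",
                (ref, proto),
            )


def create_storage(config: dict) -> ArtifactStorage:
    """Pick a backend from a (hypothetical) server configuration dict."""
    kind = config.get("type")
    if kind == "filesystem":
        return FilesystemStorage(config["path"])
    if kind == "sqlite":
        return SQLiteStorage(config["path"])
    raise ValueError(f"unknown storage backend: {kind!r}")
```

With an interface like this, the Artifact service would only ever call
`get()` and `put()`, and a PostgreSQL backend could be added later as
one more subclass without touching the service code.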

# Query support

To manage a server with a large number of artifacts, supporting queries
and deletions will be important. The details are not defined yet.
However, I expect us to create a CLI tool in the same repository for
this.
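
As a rough illustration of the kind of operations I have in mind, here
is a small sketch on top of SQLite. Nothing in it is decided: the table
layout, the `last_used` timestamp column and the function names
(`open_index`, `record`, `query`, `delete`) are assumptions made up for
this example.

```python
import sqlite3
from typing import Iterable, List, Optional


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical index of artifact refs."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS artifacts "
        "(ref TEXT PRIMARY KEY, last_used INTEGER NOT NULL)"
    )
    return db


def record(db: sqlite3.Connection, ref: str, last_used: int) -> None:
    """Insert a ref or update its last-used timestamp."""
    with db:
        db.execute(
            "INSERT OR REPLACE INTO artifacts (ref, last_used) VALUES (?, ?)",
            (ref, last_used),
        )


def query(
    db: sqlite3.Connection,
    pattern: str = "*",
    unused_since: Optional[int] = None,
) -> List[str]:
    """Return refs matching a glob pattern, optionally only stale ones."""
    sql = "SELECT ref FROM artifacts WHERE ref GLOB ?"
    params: list = [pattern]
    if unused_since is not None:
        sql += " AND last_used < ?"
        params.append(unused_since)
    sql += " ORDER BY ref"
    return [row[0] for row in db.execute(sql, params)]


def delete(db: sqlite3.Connection, refs: Iterable[str]) -> int:
    """Delete the given refs and return how many rows were removed."""
    removed = 0
    with db:
        for ref in refs:
            cur = db.execute("DELETE FROM artifacts WHERE ref = ?", (ref,))
            removed += cur.rowcount
    return removed
```

A CLI tool could then be a thin wrapper around such functions, for
example `delete(db, query(db, "myproject/*", unused_since=cutoff))`.
Note that this only removes index entries. Reclaiming the CAS blobs
they referenced would be a separate concern.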


Any thoughts, concerns or comments on this proposal?

Cheers,
Jürg

¹ https://gitlab.com/BuildStream/buildstream/merge_requests/1540


