Re: [BuildStream] Proposal: Artifact as a Proto



On Fri, Jan 18, 2019 at 14:21:11 +0100, Sander Striker via BuildStream-list wrote:
We want the server to know which blobs are referenced by a particular
ref. This allows the server to influence cache expiry of blobs based on
GetReference requests. I.e., the server needs to be able to decode the
referenced object. And thus, I think only two options make sense: keep
using the directory proto-based approach with a generic service or
define a separate artifact-specific service where we can use the
artifact proto.


My vote would be with the latter.  I would like us to think about artifact
caching a lot more like we do about action caching.  Instead of assuming
you're talking to one service, you are actually dealing with two:
ArtifactCache and CAS.

Absolutely, and then with the introduction of SourceCache, there'd be a third
service associated with that CAS.

I would expect the ArtifactCache service to return Artifact protos.

As would I.

In its implementation it could use FindMissingBlobs to ensure the Artifact
has the non-optional content.  If not, it can remove the Artifact mapping
and return that there is no Artifact for this cache key [ref].  That
allows ArtifactCache and CAS to remain a bit more decoupled.  Giving
guarantees of availability is a separate concern, and has more to do with
CAS lifetimes and thus with which CAS endpoint you are pointing to.

In addition, it could use the same process on the optional components which the
Artifact it has stored claims to still have, and *remove* the entries before
returning it if the optional parts are no longer satisfiable from the CAS.
E.g. if an artifact claims to have a 'buildtree' but GetTree / FindMissingBlobs
doesn't indicate it's all there, then remove the claim before returning the
incomplete, but still valid, Artifact to the caller.
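
Purely as an illustration of the lookup behaviour described above (required
content checked first, optional claims pruned afterwards), here is a minimal
Python sketch.  Everything in it is an assumption made for the example: the
Artifact class is a simplified stand-in for the proto under discussion, and
find_missing_blobs is a hypothetical callable playing the role of the CAS
FindMissingBlobs call, taking digests and returning those that are absent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional, Set


@dataclass
class Artifact:
    """Hypothetical stand-in for the Artifact proto (digests as plain strings)."""
    required: Set[str]                                   # non-optional content
    optional: Dict[str, Set[str]] = field(default_factory=dict)  # e.g. "buildtree"


class ArtifactCache:
    """Sketch of an artifact service kept separate from the CAS."""

    def __init__(self, find_missing_blobs: Callable[[Iterable[str]], Set[str]]):
        self._refs: Dict[str, Artifact] = {}      # cache key -> Artifact
        self._find_missing = find_missing_blobs   # assumed CAS query

    def update(self, key: str, artifact: Artifact) -> None:
        self._refs[key] = artifact

    def get(self, key: str) -> Optional[Artifact]:
        artifact = self._refs.get(key)
        if artifact is None:
            return None
        # Required content gone from the CAS: drop the mapping and
        # report that there is no Artifact for this cache key.
        if self._find_missing(artifact.required):
            del self._refs[key]
            return None
        # Optional content gone: remove only the claim, keep the Artifact.
        for name in list(artifact.optional):
            if self._find_missing(artifact.optional[name]):
                del artifact.optional[name]
        return artifact


# Usage: a set of digests stands in for the CAS contents.
cas_blobs = {"files-1", "files-2", "bt-1"}
cache = ArtifactCache(lambda digests: set(digests) - cas_blobs)
cache.update("key", Artifact({"files-1", "files-2"}, {"buildtree": {"bt-1", "bt-2"}}))
pruned = cache.get("key")   # still returned, but without the buildtree claim
```

In this sketch the service only ever asks the CAS which digests are missing,
so it needs no knowledge of how or when the CAS expires blobs.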

Whether the implementation of the service uses a direct mapping of
artifact cache_key to Artifact, or whether it stores the actual
Artifact in CAS and uses a mapping of artifact cache_key to artifact
Digest, I'm ambivalent on.

My preference is for the Artifact proto to be held outside of the CAS since it
is nominally mutable and acts as a GC anchor.  Also it removes an entire layer
of indirection for the service handler *and* ensures that the CAS remains
effectively a collection of objects only as defined by RemoteExecution and
friends.
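
To make the difference between the two layouts concrete, here is a rough
Python sketch.  It is only an illustration under stated assumptions: both
class names are hypothetical, an Artifact is modelled as a plain dict, JSON
stands in for proto serialisation, and a dict of digest -> bytes stands in
for the CAS.

```python
import hashlib
import json
from typing import Dict, Optional


class DirectStore:
    """cache key -> Artifact, held outside the CAS and mutable in place."""

    def __init__(self) -> None:
        self._refs: Dict[str, dict] = {}

    def put(self, key: str, artifact: dict) -> None:
        self._refs[key] = dict(artifact)

    def get(self, key: str) -> Optional[dict]:
        return self._refs.get(key)

    def drop_component(self, key: str, name: str) -> None:
        # Removing a claim is a plain in-place edit; the CAS is untouched.
        artifact = self._refs.get(key)
        if artifact is not None:
            artifact.pop(name, None)


class DigestStore:
    """cache key -> Digest, with the serialised Artifact stored in the CAS."""

    def __init__(self, cas: Dict[str, bytes]) -> None:
        self._cas = cas
        self._refs: Dict[str, str] = {}

    def put(self, key: str, artifact: dict) -> None:
        blob = json.dumps(artifact, sort_keys=True).encode()
        digest = hashlib.sha256(blob).hexdigest()
        self._cas[digest] = blob
        self._refs[key] = digest

    def get(self, key: str) -> Optional[dict]:
        digest = self._refs.get(key)
        if digest is None:
            return None
        blob = self._cas.get(digest)   # extra hop through the CAS
        return None if blob is None else json.loads(blob)

    def drop_component(self, key: str, name: str) -> None:
        # Blobs are immutable, so a change means writing a new blob and
        # re-pointing the ref; the old blob lingers until it is expired.
        artifact = self.get(key)
        if artifact is not None:
            artifact.pop(name, None)
            self.put(key, artifact)
```

With the digest-based layout every change to an Artifact adds another blob
to the CAS, and that blob is not one of the object types RemoteExecution
defines; the direct layout avoids both.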

D.

-- 
Daniel Silverstone                          https://www.codethink.co.uk/
Solutions Architect               GPG 4096/R Key Id: 3CCE BABE 206C 3B69

