Re: [BuildStream] Source cache plan

From: Jürg Billeter <j bitron ch>
To: Sander Striker <s striker striker nl>, Raoul Hidalgo Charman <raoul hidalgocharman codethink co uk>
Cc: BuildStream <buildstream-list gnome org>
Subject: Re: [BuildStream] Source cache plan
Date: Mon, 21 Jan 2019 12:10:27 +0100

Hi Sander,

On Fri, 2019-01-18 at 23:29 +0100, Sander Striker via BuildStream-list
wrote:

Hi,

On Thu, Jan 17, 2019 at 1:57 PM Raoul Hidalgo Charman via BuildStream-list
<buildstream-list gnome org> wrote:
[...]

* fetch: depending on the consistency, will fetch from remote CAS or using
   source plugin.


This seems to fall into the same trap as we had with artifacts.  If we're
staging for remote execution the only thing we're interested in fetching is
the Tree representing the staged sources, but not the blobs (read: files).


While I agree, I think we should be consistent with artifacts in this
regard as well. I.e., if we merge SourceCache before properly
supporting partial CAS for ArtifactCache, let's also not use partial
CAS for SourceCache just yet and migrate them together. Otherwise, the
inconsistency might cause a lot of confusion and bugs.
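
As a rough illustration of the fetch step discussed above (try the remote
source cache first, fall back to the source plugin otherwise), a minimal
sketch follows. Everything in it is hypothetical: `fetch_source`, the plain
dict standing in for a remote cache, and the `plugin_fetch` callable are not
BuildStream APIs, and the source consistency states are not modelled. In line
with the point about not using partial CAS yet, it always returns the complete
staged source.

```python
def fetch_source(key, remote_cache, plugin_fetch):
    """Return (staged_source, origin) for a source cache key.

    remote_cache: hypothetical mapping of source cache key -> staged source.
    plugin_fetch: hypothetical zero-argument callable that fetches and
                  stages the source using the source plugin.
    """
    staged = remote_cache.get(key)
    if staged is not None:
        # Cache hit: the source plugin is not invoked at all.
        return staged, "remote-cas"
    # Cache miss: fall back to the source plugin.
    return plugin_fetch(), "plugin"
```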

* push: during the push stage, check if the remotes have the ref and if not push
   the staged source.


So really what this is doing under the hood is calling FindMissingBlobs on
the remote CAS, uploading any missing blobs, and then calling the
SourceCache service to put in a Source for a given source cache key?


We should probably first check whether it's already in the remote
SourceCache, which should trigger FindMissingBlobs on the server side
in the future. Same approach as for artifacts.
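
The push order described above (check the ref on the remote first, then find
and upload only the missing blobs, then record the ref) could be sketched
roughly as below. This is an illustration under stated assumptions, not
BuildStream code: `FakeRemote` is an in-memory stand-in for a remote CAS plus
a reference service, and all method names are made up for this example. Only
the idea of a FindMissingBlobs-style query comes from the discussion.

```python
import hashlib


class FakeRemote:
    """Hypothetical in-memory stand-in for a remote CAS and ref service."""

    def __init__(self):
        self.blobs = {}  # digest -> blob content
        self.refs = {}   # source cache key -> root digest

    def has_ref(self, key):
        return key in self.refs

    def find_missing_blobs(self, digests):
        # Mirrors the idea of FindMissingBlobs: report what the remote lacks.
        return [d for d in digests if d not in self.blobs]

    def upload(self, digest, data):
        self.blobs[digest] = data

    def update_ref(self, key, root_digest):
        self.refs[key] = root_digest


def digest_of(data):
    """Content digest used as the blob key in this sketch."""
    return hashlib.sha256(data).hexdigest()


def push_source(remote, key, root_digest, local_blobs):
    """Push a staged source; return the number of blobs uploaded."""
    if remote.has_ref(key):
        # Already in the remote source cache: nothing to upload.
        return 0
    missing = remote.find_missing_blobs(list(local_blobs))
    for digest in missing:
        remote.upload(digest, local_blobs[digest])
    # Record the ref only after all blobs are present on the remote.
    remote.update_ref(key, root_digest)
    return len(missing)
```

Checking the ref first means that a source which is already cached remotely
costs a single lookup and no blob traffic.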

* Should the unstaged source also be pushed to the remote CAS, or an option to
   allow this? This would prevent additional fetching when a user wants to
   initialise a workspace.


The unstaged source?  I would say, probably not.


Agreed. We might want to revisit this if and when we consider
remote fetching of sources, but for now I would only push 'staged'
sources.

The artifact as a proto proposal [5] may also affect this, if reference services
are to distinguish between artifacts and directory objects, but I think this
also makes sense to be added later if this goes ahead.


I would prefer to start with stronger semantics here.


While I expect us to switch to that approach for artifacts, I'm also in
favor of consistency here. There is no point in a generic
ReferenceStorage service if it's used for artifacts but not for
sources. So either we decide to move away from the generic service,
creating separate services for artifacts and sources, or we keep the
generic service and then use that also for sources. Following a mixed
approach doesn't make sense to me.

This raises the question of whether we should move away from the generic
service for artifacts before implementing/landing SourceCache, to avoid
changing it shortly after. Maybe something to discuss at the gathering.

Cheers,
Jürg
