[BuildStream] Source cache plan



Hi everyone,

Here's a bit of an update on the artifact cache refactor, and a plan for the
source cache to start discussion on implementation.

Artifact cache refactor
-----------------------

In the artifact cache refactor MR [1], Jurg pointed out that some of the
refactoring work, while useful, is not necessary for source cache work to go
ahead. As a result the MR has been split. The first part, which involves
renaming and moving modules around, is here [2] and is now merged.

The second part, which involves changing the CASCache and CASRemote API to
split them more strictly, can be postponed. That work will still be necessary
if we are to implement features that involve a remote CAS and a local
directory, for example uploading a source straight to a remote CAS and
skipping the local CAS. Some further work needs to be done before this can be
merged, discussion of which is on [2].

A further issue regarding the structure of cache directories, raised in
discussions, is here [3]. It should probably be resolved before source cache
work starts.

Source cache plan
-----------------

The source cache, originally raised in [4], will use both local and remote
CASes to store staged sources. It will preferentially try to fetch from the
remote cache(s) and, if the source is not present there, fetch from the actual
source.

I suggest that the 'SourceCache' class be part of the context (similar to how
'ArtifactCache' is), and contain config related to the source cache, such as
which remote(s) to use and the local CAS object. When the element class deals
with sources it can then do so via the source cache rather than directly
calling source methods.

So far I think the source cache needs the following API methods, which all
take a source object as an argument:
* get_consistency: Returns the source's consistency. I propose that the
  'Consistency' type gets an additional field 'STAGED', which corresponds to
  the source being staged in the CAS without the unstaged source being present
  in sourcedir.
* fetch: depending on the consistency, will fetch from the remote CAS or via
  the source plugin.
* push: during the push stage, check if the remotes have the ref and if not push
  the staged source.
* stage: Also takes a virtual directory object. Depending on where the source
  needs to be staged, this may involve importing a CAS-based directory, or
  copying the staged files into a directory.
* init_workspace: This requires an unstaged source, and so may require
  fetching the source if the consistency is just 'STAGED'.
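
To make the API above easier to discuss, here is a rough sketch of how the
methods could fit together. It is not taken from the BuildStream code base:
'FakeSource', the plain dicts standing in for the local CAS, the remotes and
the virtual directory, and the position of 'STAGED' between 'RESOLVED' and
'CACHED' are all assumptions made purely for illustration.

```python
from enum import IntEnum


class Consistency(IntEnum):
    # Ordering is an assumption; STAGED is the field proposed above.
    INCONSISTENT = 0  # no ref known
    RESOLVED = 1      # ref known, nothing available locally
    STAGED = 2        # staged tree in the local CAS, no unstaged source
    CACHED = 3        # unstaged source present in sourcedir


class FakeSource:
    """Hypothetical stand-in for a source plugin instance."""

    def __init__(self, ref, files):
        self.ref = ref
        self._files = files   # what the upstream would provide
        self.fetched = False  # True once the unstaged source is local

    def fetch(self):
        self.fetched = True

    def staged_files(self):
        if not self.fetched:
            raise RuntimeError("source has not been fetched")
        return dict(self._files)


class SourceCache:
    """Toy source cache; dicts keyed by ref stand in for CAS objects."""

    def __init__(self, local_cas, remotes):
        self.local_cas = local_cas
        self.remotes = remotes

    def get_consistency(self, source):
        if source.ref is None:
            return Consistency.INCONSISTENT
        if source.fetched:
            return Consistency.CACHED
        if source.ref in self.local_cas:
            return Consistency.STAGED
        return Consistency.RESOLVED

    def fetch(self, source):
        consistency = self.get_consistency(source)
        if consistency == Consistency.INCONSISTENT:
            raise ValueError("cannot fetch a source without a ref")
        if source.ref in self.local_cas:
            return
        if consistency < Consistency.CACHED:
            # Prefer the remote CAS(es) over the actual source.
            for remote in self.remotes:
                if source.ref in remote:
                    self.local_cas[source.ref] = dict(remote[source.ref])
                    return
            source.fetch()  # fall back to the source plugin
        self.local_cas[source.ref] = source.staged_files()

    def push(self, source):
        # Only push to remotes that do not already have the ref.
        tree = self.local_cas[source.ref]
        for remote in self.remotes:
            if source.ref not in remote:
                remote[source.ref] = dict(tree)

    def stage(self, source, directory):
        # 'directory' is a dict standing in for a virtual directory.
        directory.update(self.local_cas[source.ref])

    def init_workspace(self, source, workspace):
        # A workspace needs the unstaged source, so STAGED is not enough.
        if not source.fetched:
            source.fetch()
        workspace.update(source.staged_files())
```

In this sketch, a second client that shares the remote ends up with
consistency 'STAGED' after fetch, without the source plugin ever being
invoked, and only init_workspace forces the plugin fetch.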

There are some additional questions that warrant discussion:
* Do we want to use the same remote CAS as the artifact CAS, or should the user
  configure both separately?
* Should the unstaged source also be pushed to the remote CAS, or should there
  be an option to allow this? This would prevent additional fetching when a
  user wants to initialise a workspace.
* A source's '_preflight' isn't necessary if we only require the staged
  sources. Should this check be removed in the cases where it's not needed?
Some of these points may be optimisations that can be added later.

The artifact as a proto proposal [5] may also affect this, if reference
services are to distinguish between artifacts and directory objects, but I
think it also makes sense to add this later if the proposal goes ahead.

Points and criticisms appreciated.

Cheers,
Raoul

[1] https://gitlab.com/BuildStream/buildstream/merge_requests/1013
[2] https://gitlab.com/BuildStream/buildstream/merge_requests/1071
[3] https://gitlab.com/BuildStream/buildstream/issues/870
[4] https://gitlab.com/BuildStream/buildstream/issues/440
[5] https://mail.gnome.org/archives/buildstream-list/2019-January/msg00013.html


