[BuildStream] Source cache plan
- From: Raoul Hidalgo Charman <raoul hidalgocharman codethink co uk>
- To: buildstream-list gnome org
- Subject: [BuildStream] Source cache plan
- Date: Thu, 17 Jan 2019 12:56:57 +0000
Hi everyone,
Here's a bit of an update on the artifact cache refactor, and a plan for the
source cache to start discussion on implementation.
Artifact cache refactor
-----------------------
In the artifact cache refactor MR [1], Jurg pointed out that some of the
refactoring work, while useful, is not necessary for the source cache work
to go ahead. As a result the MR has been split: the first part, which
involves renaming and moving modules around, is here [2] and is now merged.
The second part, which involves changing the CASCache and CASRemote APIs to
separate them more strictly, can be postponed. That work will still be
necessary if we are to implement features that involve a remote CAS and a
local directory, for example uploading a source straight to a remote CAS
and skipping the local CAS. Some further work still needs to be done before
this can be merged; discussion of that is on [2].
A further issue regarding the structure of cache directories, raised in
discussions, is here [3]; it should probably be addressed before the source
cache work.
Source cache plan
-----------------
The source cache, originally raised in [4], will use both local and remote
CASes to store staged sources. It will preferentially try to fetch from the
remote cache(s) and, if the source is not present there, fetch from the
actual source.
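The lookup order described above can be sketched as follows. This is only an illustration under stated assumptions, not BuildStream code: 'fetch_staged', the 'Remote' callable type and the dict standing in for the local CAS are all hypothetical names, and checking the local CAS first is my assumption.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical stand-in: a "remote" is any callable that returns the staged
# source for a key, or None when it does not have it.
Remote = Callable[[str], Optional[bytes]]


def fetch_staged(
    key: str,
    local_cas: Dict[str, bytes],
    remotes: Iterable[Remote],
    fetch_upstream: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Return (staged_source, origin) and record the result in local_cas."""
    # Assumption: anything already in the local CAS is used as-is.
    if key in local_cas:
        return local_cas[key], "local"
    # Prefer the remote cache(s), in the order they are configured.
    for remote in remotes:
        data = remote(key)
        if data is not None:
            local_cas[key] = data
            return data, "remote"
    # Not cached anywhere: fall back to the actual source.
    data = fetch_upstream(key)
    local_cas[key] = data
    return data, "upstream"
```

Here 'fetch_upstream' stands in for whatever the source plugin does to download and stage the source.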
I suggest that the 'SourceCache' class be part of the context (similar to
how 'ArtifactCache' is) and contain the config related to the source cache,
such as which remote(s) to use, along with the local CAS object. When the
element class deals with sources, it can then do so via the source cache
rather than by calling source methods directly.
So far I think the source cache needs the following API, in which every
method takes a source object as an argument:
* get_consistency: Returns the source's consistency. I propose that the
  'Consistency' type gain an additional field, 'STAGED', which corresponds
  to the source being staged in the CAS while the unstaged source is not
  present in sourcedir.
* fetch: Depending on the consistency, fetches from a remote CAS or by
  using the source plugin.
* push: During the push stage, checks whether the remotes have the ref and,
  if not, pushes the staged source.
* stage: Also takes a virtual directory object. Depending on where the
  source needs to be staged, this may involve importing a CAS-based
  directory or copying the staged files into a directory.
* init_workspace: Requires an unstaged source, and so may require fetching
  the source if the consistency is only 'STAGED'.
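To make the proposed API a little more concrete, here is a rough sketch of its shape. Everything in it is an assumption for illustration: plain dicts stand in for the local CAS, the remotes, sourcedir and the virtual directory; 'FakeSource' and 'SourceCacheSketch' are hypothetical names; and the values and ordering of the 'Consistency' members are invented here, with only 'STAGED' being the proposed addition.

```python
from enum import IntEnum


class Consistency(IntEnum):
    """Illustrative values only; STAGED is the proposed addition."""
    INCONSISTENT = 0  # no ref known
    RESOLVED = 1      # ref known, nothing available locally
    STAGED = 2        # staged in the local CAS, no unstaged copy in sourcedir
    CACHED = 3        # unstaged source present in sourcedir


class FakeSource:
    """Hypothetical stand-in for a source plugin instance."""
    def __init__(self, ref, files):
        self.ref = ref
        self.files = files  # what the plugin would download

    def plugin_fetch(self):
        return dict(self.files)


class SourceCacheSketch:
    def __init__(self, remotes):
        self.local_cas = {}     # ref -> staged files
        self.sourcedir = {}     # ref -> unstaged source
        self.remotes = remotes  # list of dicts, each mapping ref -> staged files

    def get_consistency(self, source):
        if source.ref is None:
            return Consistency.INCONSISTENT
        if source.ref in self.sourcedir:
            return Consistency.CACHED
        if source.ref in self.local_cas:
            return Consistency.STAGED
        return Consistency.RESOLVED

    def _fetch_with_plugin(self, source):
        # Download the unstaged source, then stage it into the local CAS.
        self.sourcedir[source.ref] = source.plugin_fetch()
        self.local_cas[source.ref] = dict(self.sourcedir[source.ref])

    def fetch(self, source):
        consistency = self.get_consistency(source)
        if consistency == Consistency.INCONSISTENT:
            raise ValueError("source has no ref")
        if consistency >= Consistency.STAGED:
            return  # already available locally
        for remote in self.remotes:
            if source.ref in remote:
                self.local_cas[source.ref] = dict(remote[source.ref])
                return
        self._fetch_with_plugin(source)

    def push(self, source):
        # Push the staged source only to remotes that lack the ref.
        pushed = 0
        for remote in self.remotes:
            if source.ref not in remote:
                remote[source.ref] = dict(self.local_cas[source.ref])
                pushed += 1
        return pushed

    def stage(self, source, directory):
        # 'directory' is a dict standing in for a virtual directory object.
        directory.update(self.local_cas[source.ref])

    def init_workspace(self, source, directory):
        # A workspace needs the unstaged source, so STAGED is not enough.
        if self.get_consistency(source) < Consistency.CACHED:
            self._fetch_with_plugin(source)
        directory.update(self.sourcedir[source.ref])
```

In this sketch a source fetched from a remote ends up 'STAGED', and only 'init_workspace' (or a plugin fetch) moves it on to 'CACHED'.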
There are some additional questions that warrant discussion:
* Do we want to use the same remote CAS as the artifact CAS, or should the
  user configure the two separately?
* Should the unstaged source also be pushed to the remote CAS, or should
  there be an option to allow this? This would prevent additional fetching
  when a user wants to initialise a workspace.
* A source's '_preflight' isn't necessary if we only require the staged
  sources. Should this check be skipped in the cases where it's not needed?
Some of these points may be optimisations that can be added later.
The artifact-as-a-proto proposal [5] may also affect this if reference
services are to distinguish between artifacts and directory objects, but I
think this also makes sense to add later, if that proposal goes ahead.
Points and criticisms appreciated.
Cheers,
Raoul
[1] https://gitlab.com/BuildStream/buildstream/merge_requests/1013
[2] https://gitlab.com/BuildStream/buildstream/merge_requests/1071
[3] https://gitlab.com/BuildStream/buildstream/issues/870
[4] https://gitlab.com/BuildStream/buildstream/issues/440
[5] https://mail.gnome.org/archives/buildstream-list/2019-January/msg00013.html