Re: [BuildStream] Source cache plan



Hi Sander and Jürg,

On 18/01/2019 22:29, Sander Striker wrote:

[snip]

    So far I think the source cache needs the following API which all take a
    source object as an argument:
    * get_consistency: Returns the source's consistency, I propose that the
        'Consistency' type has an additional field 'STAGED', which corresponds
        to the source being staged in the CAS but not the unstaged source in
        sourcedir.


I am not sure if it is clear what this means.  Given that I needed to re-read 
this to come up with my interpretation, it might be best to clarify.  
Specifically "the source being staged in the CAS" part.  Is this state 
essentially mapping to having the staged sources for the element (identified by 
a source cache key) available in CAS?


Yes, rather than keeping the sources unstaged in the source directory as
BuildStream currently does, sources will be staged and put in the CAS,
with the reference service pointing to the root when given the source's key.
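
To make the proposed state concrete, here is a minimal sketch of how
'STAGED' could sit alongside the other consistency states. The state names
other than STAGED, the RefService class and the sourcedir_keys argument are
assumptions made for illustration only, not existing BuildStream code.

```python
from enum import Enum


class Consistency(Enum):
    # Hypothetical states for this sketch; only STAGED is the proposed addition.
    INCONSISTENT = "inconsistent"  # no key can be computed for the source yet
    RESOLVED = "resolved"          # key known, nothing available locally
    STAGED = "staged"              # staged tree is in the CAS, no unstaged copy
    CACHED = "cached"              # unstaged source is present in sourcedir


class RefService:
    """Toy reference service: maps a source key to the digest of a staged root."""

    def __init__(self):
        self._refs = {}

    def set_ref(self, source_key, root_digest):
        self._refs[source_key] = root_digest

    def resolve(self, source_key):
        # Returns None when no staged root is known for this key.
        return self._refs.get(source_key)


def get_consistency(source_key, refs, sourcedir_keys):
    """Classify a source from its key, the ref service and the sourcedir contents."""
    if source_key is None:
        return Consistency.INCONSISTENT
    if source_key in sourcedir_keys:
        return Consistency.CACHED
    if refs.resolve(source_key) is not None:
        return Consistency.STAGED
    return Consistency.RESOLVED
```

Whether CACHED should take precedence over STAGED when both hold is a design
choice this sketch makes arbitrarily.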

    * fetch: depending on the consistency, will fetch from remote CAS or using
        source plugin.


This seems to fall into the same trap as we had with artifacts.  If we're 
staging for remote execution the only thing we're interested in fetching is the 
Tree representing the staged sources, but not the blobs (read: files).

At the end of the fallback to fetching from the actual source repositories, can 
we expect the staged sources to exist in CAS?

Yes, I think it would make sense to stage sources and put them in the
CAS once they have been downloaded. Unless anyone thinks it would be better
to wait until the user wants them to be staged (i.e. don't stage them
into the CAS when just using the fetch command).

Do we need 2 separate calls here, one to just grab sources from CAS, and another 
to grab sources from the repositories?

In the case of fallback I think we would only need to fetch the source
from repositories. BuildStream would then stage them into the CAS and
push them to the remote source cache.
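
A rough sketch of that fallback order, using in-memory stand-ins. ToyCAS,
fetch() and plugin_fetch are hypothetical names, and a staged source is
reduced to a single blob here for brevity; a real CAS would store a
directory tree.

```python
import hashlib


class ToyCAS:
    """In-memory stand-in for a CAS: blobs keyed by SHA-256 digest, plus a
    ref table mapping source keys to a root digest."""

    def __init__(self):
        self.blobs = {}
        self.refs = {}

    def add(self, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs[digest] = data
        return digest


def fetch(source_key, plugin_fetch, local, remote):
    """Return (root_digest, origin) where origin says which path was taken."""
    # 1. Already staged in the local CAS: nothing to do.
    if source_key in local.refs:
        return local.refs[source_key], "local"

    # 2. Try the remote source cache before touching the real repository.
    digest = remote.refs.get(source_key)
    if digest is not None:
        local.blobs[digest] = remote.blobs[digest]
        local.refs[source_key] = digest
        return digest, "remote"

    # 3. Fallback: fetch with the source plugin, stage into the local CAS,
    #    then push to the remote so later fetches can skip the repository.
    data = plugin_fetch()
    digest = local.add(data)
    local.refs[source_key] = digest
    remote.blobs[digest] = data
    remote.refs[source_key] = digest
    return digest, "plugin"
```

This keeps a single fetch entry point; splitting it into two calls would
only mean exposing steps 2 and 3 separately.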

[snip]

    * stage: Also taking a virtual directory object, depending where this
        needs to be staged, this may involve importing a CAS based directory,
        or copying the staged file into a directory.


I'm not sure I follow.

This method should perhaps be renamed to export or similar, now that
it's the staged sources that are stored. It would be used where the current
source method stage is used, for example when exporting sources
to a sandbox.
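
As an illustration of the "copy into a directory" case, here is a small
sketch that writes a staged tree out to a plain directory. The tree layout
(a nested dict of names to blob digests) and the function names are
assumptions for this example only, not the CAS format.

```python
import hashlib
import os
import tempfile


def add_blob(blobs, data):
    """Store data in the toy blob store and return its digest."""
    digest = hashlib.sha256(data).hexdigest()
    blobs[digest] = data
    return digest


def export(tree, blobs, target_dir):
    """Write a staged tree to target_dir.

    tree maps a name to either a blob digest (a file) or a nested dict
    (a subdirectory)."""
    os.makedirs(target_dir, exist_ok=True)
    for name, entry in tree.items():
        path = os.path.join(target_dir, name)
        if isinstance(entry, dict):
            export(entry, blobs, path)
        else:
            with open(path, "wb") as f:
                f.write(blobs[entry])


if __name__ == "__main__":
    blobs = {}
    tree = {
        "README": add_blob(blobs, b"hello\n"),
        "src": {"main.c": add_blob(blobs, b"int main(void) { return 0; }\n")},
    }
    with tempfile.TemporaryDirectory() as tmp:
        export(tree, blobs, tmp)
        print(sorted(os.listdir(tmp)))
```

For a CAS-backed sandbox the same call would instead import the root
digest directly, with no file copying.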

    * init_workspace: This requires an unstaged source, and so may require
        fetching the source if the consistency is just 'STAGED'.


That's actually interesting.  I don't think it does mean that necessarily.  It 
may do for the git source case, and even in that case you could argue for using 
the staged source, and then _upgrading_ that to be a checked out clone of a git 
ref (this can be deferred as a later optimization, granted).  In the case of say 
a tar gz there should be no difference between staged and unstaged source (again 
this can be deferred as a later optimization - the 'upgrade' being effectively a 
no-op). Granted, this all gets a bit tricky when we're dealing with multiple 
sources for a single element...

Ah, yeah it isn't necessary for some other sources, the default for
init_workspace being the source's stage method. I'm not sure I follow the
idea of upgrading the staged source; we probably don't want to replace the
staged source with the full git repo as we don't want this when building
elements.

On 21/01/2019 10:53, Jürg Billeter wrote:

Source cache, originally raised in [4], will use both local and remote CAS's to
store staged sources, preferentially trying to fetch from the remote cache(s),
and if not present, fetch from the actual source.

I suggest that the 'SourceCache' class be part of the context (similar to how
'ArtifactCache' is), and contains config related to source cache such as
which remote(s) to use and the local CAS object. When the element class deals
with sources it can now do it via the source cache rather than directly calling
source methods.

So far I think the source cache needs the following API which all take a source
object as an argument:

I think we should aim for consistency with ArtifactCache/Element, at
least to some extent. The Element class is the one with the logic to
check/use ArtifactCache and ArtifactCache doesn't invoke any real
Element methods (besides getting cache key and name). Your proposal
does it the other way round, putting SourceCache in control.

Either approach could make sense, however, I think we should use the
same approach for both ArtifactCache and SourceCache, unless I'm
overlooking a fundamental difference.

Do you prefer the approach you've described? If so, do you think it
would make sense to also change ArtifactCache/Element?


Sorry, maybe I wasn't clear, I *do* want to do it the same way as
artifact cache, with the elements calling the source cache. It will also
be much easier to implement it this way (effectively just replacing
calls to source objects with calls to the source cache).
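
Roughly, the direction of control would look like the sketch below: the
element drives the cache, and the cache never calls back into the element
or the source. All class and method names here are hypothetical stand-ins,
not the real Element or Source API.

```python
class SourceCache:
    """Hypothetical cache keyed by source key; it holds no plugin logic."""

    def __init__(self):
        self._staged = {}

    def contains(self, key):
        return key in self._staged

    def commit(self, key, staged):
        self._staged[key] = staged

    def get(self, key):
        return self._staged[key]


class Source:
    """Stand-in for a source plugin; counts how often it is really fetched."""

    def __init__(self, key, payload):
        self._key = key
        self._payload = payload
        self.fetch_count = 0

    def get_key(self):
        return self._key

    def fetch_and_stage(self):
        self.fetch_count += 1
        return self._payload


class Element:
    """The element decides when to consult the cache, as with artifacts."""

    def __init__(self, name, sources, source_cache):
        self.name = name
        self.sources = sources
        self.source_cache = source_cache

    def fetch_sources(self):
        for source in self.sources:
            key = source.get_key()
            # Only fall through to the plugin when the cache has no entry.
            if not self.source_cache.contains(key):
                self.source_cache.commit(key, source.fetch_and_stage())
```

Calling fetch_sources() twice on the same element only reaches the plugin
once; the second call is satisfied from the cache.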

I think an important discussion missing is how to extend the public
Source API to allow for efficient operation, without requiring plugins
to stage to a temporary directory. Although, maybe we can and should
tackle this as a follow-up step, possibly after switching to a CAS-
based workflow also for local builds (using BuildBox)

Yes, there are definitely extensions to the source cache API that would be
useful. At the moment I'm just thinking about what a source cache needs
in order to replace current source functionality.

* A source's '_preflight' isn't necessary if we only require the staged sources.
   Should this check be removed in the cases where it's not needed?
Some of these points may be optimisations that can be added later.

Can we be sure that `get_unique_key()` never requires `preflight()`?

Ah, actually, having another look, we may be tracking the source, which
will require `preflight`, so the check should probably remain.

Cheers,
Jürg

Cheers,
Raoul

