Re: Storing Source references (e.g. commit shas) separately



On Wed, 2018-01-17 at 13:17 +0000, Sander Striker wrote:
[...]

So far so good - but we failed to look at the bigger picture here.


  A.) If you want to open a workspace, BuildStream needs to know what
      to use; we don't know what to stage into a workspace in the
      absence of a reference.

  B.) If you've run a build with `bst build --track` options and
      without saving the refs, congratulations. You cannot test it.

      o Since the refs were never stored, you cannot run a `bst shell`
        on what you've built, let alone be confident that it's the exact
        binary output of what you just built (a later `bst track`
        invocation can pull new references which occurred upstream
        after your build completed).

      o Neither can you run `bst checkout` to obtain the output of what
        you just built, without the references of what it was.

This makes the case for not having the separation of --track and --track-save.  Or rather, for reconsolidating them?

It's a bit of an annoying detail indeed - the distinction here becomes
rather pointless; its usefulness is replaced by something a bit more
complete, I think.

It may possibly be useful for autobuilders which simply populate a
shared artifact cache, and want to run:

   git pull --rebase
   bst build target.bst

In a loop or on a timeout.

Although the convenience is rather unneeded in a situation where you
can script it anyway...
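
For illustration, the loop above could be scripted roughly as follows.
This is only a sketch: the two commands are the ones quoted above, while
the function names, the injectable `run` parameter and the ten minute
interval are assumptions made for the example.

```python
import subprocess
import time

# The two commands from the loop described above; "target.bst" is the
# placeholder target name used in this message.
COMMANDS = [
    ["git", "pull", "--rebase"],
    ["bst", "build", "target.bst"],
]


def run_once(run=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for cmd in COMMANDS:
        result = run(cmd)
        if result.returncode != 0:
            return False
    return True


def main(interval_seconds=600):
    """Repeat the pull-and-build cycle forever, sleeping in between."""
    # The interval is a hypothetical value, not a recommendation.
    while True:
        run_once()
        time.sleep(interval_seconds)
```

Calling `main()` starts the loop; `run_once()` on its own performs a
single pull-and-build cycle.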


Proposed Solution
~~~~~~~~~~~~~~~~~
What I'm proposing to fix this, is to store the source references in
a separate YAML dictionary stored beside `project.conf`. For the sake
of discussion, let's call this `project.refs`.

This is an interesting solution because BuildStream project maintainers
(i.e. those who maintain BuildStream projects in YAML) can decide
whether or not to revision this file; or only revision the file on
tagged release commits.

Thinking about this, is the idea to separate out essentially the bits that would be modified by bst?
If not, then is separating out the refs sufficient?
Will there be any other fields that a specialized source plugin could write?  e.g. an ETag for an HTTP source?

Very nice catch, the answer I think is "Yes, and Yes".

This should not present an issue for ETags, let's elaborate:

* The refs are not required to be a simple string; any simple
  serializable object will do (bzr plugin iirc uses a dictionary with a
  few fields).

* What BuildStream modifies when tracking has always been the ref, and
  this is difficult to change because of the required set_ref/get_ref
  semantics we use to apply a newly discovered ref to the data model
  in the main process.

* The If-None-Match ETag thing can be a part of the ref, and this can
  be implemented in a backwards compatible fashion (by supporting
  both string and dictionary refs in the plugin, and bumping its format
  version for the new feature).

* Interestingly, the Source plugin is responsible for implementing
  Plugin.get_unique_key() with details that cause the input to be
  unique. So even if the ref field contains the ETag, the ETag itself
  need not necessarily be considered in the resulting cache key.

That last point is a bit mind boggling - but interesting because:

  A.) As a convention, we use the aliased URI for cache keys, so
      as to allow building from different mirrors without affecting
      the cache key.

  B.) Should we start downloading the exact same tarball from a new
      mirror, tracking may discover a new ETag for it, however if the
      sha256sum is the same, we also need not change the cache key.
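
To make the points above concrete, here is a rough sketch of a ref that
may be either a string or a dictionary, with the ETag left out of the
key material. It does not use the real plugin API: the function names,
the "sha256sum" and "etag" field names, the example URL and the way the
key is hashed are all assumptions made for illustration.

```python
import hashlib
import json


def normalize_ref(ref):
    """Accept a legacy string ref or a dictionary ref (hypothetical layout)."""
    if isinstance(ref, str):
        # Old format: the ref is just the sha256sum of the tarball.
        return {"sha256sum": ref, "etag": None}
    if isinstance(ref, dict) and "sha256sum" in ref:
        # New format: the checksum plus an optional ETag for If-None-Match.
        return {"sha256sum": ref["sha256sum"], "etag": ref.get("etag")}
    raise ValueError("ref must be a string or a dict with a 'sha256sum' field")


def unique_key(aliased_url, ref):
    """Key material: the aliased URL and the checksum, never the ETag."""
    return [aliased_url, normalize_ref(ref)["sha256sum"]]


def cache_key(aliased_url, ref):
    """Hash the key material into a hex digest (illustrative only)."""
    material = json.dumps(unique_key(aliased_url, ref), sort_keys=True)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


# The same tarball tracked via two mirrors: different ETags, same checksum.
url = "upstream:releases/foo-1.0.tar.gz"
old_ref = "ab12" * 16
from_mirror_a = {"sha256sum": "ab12" * 16, "etag": '"etag-from-mirror-a"'}
from_mirror_b = {"sha256sum": "ab12" * 16, "etag": '"etag-from-mirror-b"'}

assert cache_key(url, old_ref) == cache_key(url, from_mirror_a)
assert cache_key(url, from_mirror_a) == cache_key(url, from_mirror_b)
```

Because only the aliased URL and the checksum feed the key, neither a
switch from string to dictionary refs nor a new ETag from another mirror
changes it; a different sha256sum still does.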

Does this alleviate your concerns regarding the ETag optimization?

Cheers,
    -Tristan


