Re: Storing Source references (e.g. commit shas) separately





On Wed, Jan 17, 2018 at 3:45 PM Tristan Van Berkom <tristan vanberkom codethink co uk> wrote:
On Wed, 2018-01-17 at 13:17 +0000, Sander Striker wrote:
[...]
> This makes the case for not having the separation of --track and --track-save.  Or rather reconsolidate?

It's a bit of an annoying detail indeed - the distinction here becomes
rather pointless; its usefulness is replaced by something a bit more
complete, I think.

It may possibly be useful for autobuilders which simply populate a
shared artifact cache, and want to run:

   git pull --rebase
   bst build target.bst

In a loop or on a timeout.

Although the convenience is rather unneeded in a situation where you
can script it anyway...

Indeed.  So we should probably retire this distinction sooner rather than later, and re-simplify.
 
> > Proposed Solution
> > ~~~~~~~~~~~~~~~~~
> > What I'm proposing to fix this, is to store the source references in
> > a separate YAML dictionary stored beside `project.conf`. For the sake
> > of discussion, let's call this `project.refs`.
> >
> > This is an interesting solution because BuildStream project maintainers
> > (i.e. those who maintain BuildStream projects in YAML) can decide
> > whether or not to revision this file; or only revision the file on
> > tagged release commits.
>
> Thinking about this, is the idea to separate out essentially the bits that would be modified by bst?
> If not, then is separating out the refs sufficient?
> Will there be any other fields that a specialized source plugin could write?  e.g. an etag for a http source?

Very nice catch, the answer I think is "Yes, and Yes".
 
This should not present an issue for ETags, let's elaborate:

* The refs are not required to be a simple string; any simple
  serializable object will do (bzr plugin iirc uses a dictionary with a
  few fields).

* What BuildStream modifies when tracking has always been the ref, and
  this is difficult to change because of the required set_ref/get_ref
  semantics we use to apply a newly discovered ref to the data model in
  the main process.

* The If-None-Match ETag thing can be a part of the ref, and this can
  be implemented in a backwards compatible fashion (by supporting
  either a string or a dictionary ref in the plugin, and bumping its
  format version for the new feature).

* Interestingly, the Source plugin is responsible for implementing
  Plugin.get_unique_key() with details that cause the input to be
  unique. So even if the ref field contains the ETag, the ETag itself
  need not necessarily be considered in the resulting cache key.

That last point is a bit mind boggling - but interesting because:

  A.) As a convention, we use the aliased URI for cache keys, so
      as to allow building from different mirrors without affecting
      the cache key.

  B.) Should we start downloading the exact same tarball from a new
      mirror, tracking may discover a new ETag for it, however if the
      sha256sum is the same, we also need not change the cache key.
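
To make the points above concrete, here is a minimal sketch of a ref that
may be either a plain string or a dictionary carrying an ETag, with the
ETag left out of the unique key. All of the names here (normalize_ref,
unique_key, cache_key, and the "sha256sum" / "etag" fields) are
hypothetical and chosen for illustration only; this is not the actual
BuildStream plugin API, nor how BuildStream derives its cache keys.

```python
import hashlib
import json


def normalize_ref(ref):
    """Accept the older string ref or a newer dictionary ref (assumed layout)."""
    if isinstance(ref, str):
        # Old format: the ref is just the checksum.
        return {"sha256sum": ref, "etag": None}
    if isinstance(ref, dict):
        # New format: the checksum plus an optional ETag.
        return {"sha256sum": ref["sha256sum"], "etag": ref.get("etag")}
    raise TypeError("ref must be a string or a dictionary")


def unique_key(aliased_url, ref):
    """Build the key input from the aliased URL and the checksum only.

    The ETag is deliberately omitted, so a new ETag for the same
    content does not change the result.
    """
    normalized = normalize_ref(ref)
    return [aliased_url, normalized["sha256sum"]]


def cache_key(unique):
    """Hash the unique key input (a stand-in for a real cache key)."""
    payload = json.dumps(unique, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Same tarball, tracked once with a string ref and once with an ETag.
old_key = cache_key(unique_key("upstream:foo.tar.gz", "abc123"))
new_key = cache_key(
    unique_key("upstream:foo.tar.gz", {"sha256sum": "abc123", "etag": '"v2"'})
)
assert old_key == new_key
```

Under these assumptions, switching mirrors behind the same alias or
discovering a new ETag leaves the key untouched, while a different
sha256sum still produces a different key.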

Does this alleviate your concerns regarding the ETag optimization?

It was just an example of one case where there was more than one field.
Thanks for the elaborate answer, this addresses my concerns regarding
that.

I was thinking that maybe it would make sense to separate source out
altogether, assuming build commands are more stable than source
definitions.  Maybe not.  Limiting the scope to the fields that bst modifies
as proposed may be a better idea to move forward.  Food for thought.

Cheers,

Sander
 

Cheers,
    -Tristan


