Re: Feature proposal: multiple cache support



On Wed, 2017-10-11 at 15:52 +0100, Sam Thursfield wrote:
Hello

This is a proposal to:

   * allow projects and users to specify multiple artifact caches for
     pulling and pushing.

   * make pipelines pull artifacts from any cache that has a
     given artifact available, in a 'priority' order

   * make pipelines push artifacts to the highest 'priority' cache by
     default

   * add a `bst push --cache=$name` option to allow pushing to one
     of the other configured caches

We have a few use cases in mind for this feature.

   * A project that operates multiple caches with different expiry
     algorithms, maybe one cache with artifacts built from release
     branches and another cache for artifacts built from 'master'

   * A user or team that wants to set up a local cache for artifacts
     built from their work-in-progress branches, but wants to still
     be able to use pre-built artifacts from project-wide autobuilds

   * Projects that depend on other projects, where each project has
     its own autobuilder and cache. [Recursive pipelines, a.k.a. inter-
     project dependencies, aren't yet possible with BuildStream, but
     they have been planned for a while and Jürg is going to propose
     them as a feature very soon.]


Hi,

So I have a few points of contention on this proposal, but its
heart is in the right place :)


   A.) I feel that we are approaching the problem in the wrong
       order. I dislike that the user story for configuring artifact
       caches is already unstable, and this proposal adds another
       layer on top of it.

       We already have the issue that push/pull URIs are not
       canonical, which implies that this configuration will break.
       We should be careful not to break the configuration API
       more often than we absolutely must (it is mostly going to be
       unstable because of issue 112:
       https://gitlab.com/BuildStream/buildstream/issues/112)

   B.) Related to (A), I think naming artifact caches is really
       just tedium as long as issue 112 remains unfixed.

         `bst pull --cache bst://canonical-url.com/artifacts`

       would be nicer, and would demand less of users than having
       to remember which push/pull URL belongs to which name.

   C.) I don't really see the value added by configuring multiple
       caches for pushing artifacts, while the value of multiple
       fallback artifact shares for *pulling* is clear and was
       discussed at length at GUADEC.

       I can see obvious use cases for explicitly pushing to
       multiple caches from the command line, but I'm not sold on
       configuring multiple push URLs in the configuration files,
       or on what that would achieve.
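
       To make the distinction in (C) concrete, here is a minimal
       Python sketch under assumed names: `RemoteCache`, `pull` and
       `push` are hypothetical stand-ins, not BuildStream code.
       Pulling falls back through the configured caches in order,
       while pushing goes only to the one cache the user names
       explicitly, identified by URL as suggested in (B).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RemoteCache:
    """Hypothetical stand-in for a remote artifact share (not a BuildStream API)."""
    url: str
    artifacts: Dict[str, bytes] = field(default_factory=dict)

    def has(self, key: str) -> bool:
        return key in self.artifacts


def pull(caches: List[RemoteCache], key: str) -> Optional[bytes]:
    """Try each configured cache in priority order; the first hit wins."""
    for cache in caches:
        if cache.has(key):
            return cache.artifacts[key]
    return None  # not available anywhere; the caller has to build it


def push(caches: List[RemoteCache], key: str, data: bytes, url: str) -> None:
    """Push only to the cache the user named explicitly on the command line."""
    for cache in caches:
        if cache.url == url:
            cache.artifacts[key] = data
            return
    raise KeyError(f"no configured cache with url {url!r}")
```

       With a release cache listed before a master cache, a key
       present in both is served from the release cache, and a push
       touches only the cache whose URL was given.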

   D.) Of your presented use cases, I think the first two are
       handled in much the same way.

       Your third use case, related to recursive pipelines, I
       believe is moot: it should already be easily supported
       without any of the configuration added by this proposal.

       When it comes to recursive project dependencies, each project
       already has the first word on which artifact cache to use,
       and the user is able to override it per project name in
       their configuration, so this should be supported
       automatically.
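
       For illustration only, a minimal Python sketch of the
       per-project resolution described in (D). It assumes a simple
       model where each project's project.conf declares a list of
       cache URLs and the user configuration may supply a
       replacement list keyed by project name; the function and
       parameter names are hypothetical, not BuildStream's API.

```python
from typing import Dict, List, Mapping


def caches_for_pipeline(
    project_conf: Mapping[str, List[str]],
    user_conf: Mapping[str, List[str]],
) -> Dict[str, List[str]]:
    """Resolve artifact caches separately for every project in a pipeline.

    project_conf: project name -> caches declared in that project's own
                  project.conf (hypothetical model).
    user_conf:    project name -> the user's override, if any.
    """
    resolved: Dict[str, List[str]] = {}
    for name, declared in project_conf.items():
        # The project has the first word; a user override keyed by the
        # project name replaces it for that project only.
        resolved[name] = list(user_conf.get(name, declared))
    return resolved
```

       Because each project in the pipeline is resolved
       independently, a dependency project keeps its own cache
       without any extra multi-cache configuration.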

   E.) The rationale for allowing artifact caches declared in
       project.conf to be explicitly disabled

         "Note that user config overrides project config *completely*. If I have
          an empty 'artifacts' section for a project in my buildstream.conf, that
          means "ignore everything from project.conf". This will be inconvenient
          for some use cases, but it allows removing caches from the project.conf
          that one may not have access to on certain machines."

       This is weird.

       Can you explain why it is necessary for the user
       configuration to eliminate caches defined by project.conf in
       the case where the user cannot authenticate with or reach the
       remote server?

       I mean, can you explain why it is necessary beyond bugs in our
       own code that might cause hangs or delays, which should be
       fixed in our own code _anyway_?
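
       A minimal Python sketch of the alternative argued for in (E):
       rather than requiring users to strip unreachable project.conf
       caches, the pull loop can skip a remote that fails and carry
       on. The `fetch` callable and its contract (return bytes, None
       on a miss, raise OSError when the remote is unreachable) are
       assumptions for illustration; real code would also need a
       connection timeout so a dead remote cannot hang the pipeline.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical contract: fetch(url, key) returns the artifact bytes, None on
# a cache miss, or raises OSError (e.g. ConnectionError, TimeoutError) when
# the remote cannot be reached.
Fetch = Callable[[str, str], Optional[bytes]]


def pull_tolerant(
    urls: List[str], key: str, fetch: Fetch
) -> Tuple[Optional[bytes], List[str]]:
    """Pull from caches in order, skipping any that fail instead of aborting.

    Returns the artifact (or None) and the URLs that were unreachable,
    so the caller can warn about them once.
    """
    unreachable: List[str] = []
    for url in urls:
        try:
            data = fetch(url, key)
        except OSError:
            unreachable.append(url)  # skip this remote, try the next one
            continue
        if data is not None:
            return data, unreachable
    return None, unreachable
```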

Cheers,
    -Tristan


