Re: Feature proposal: multiple cache support



On 12/10/17 09:10, Tristan Van Berkom wrote:
    A.) I feel that we are approaching the problem in the wrong
        order, I dislike that we have an unstable configuration
        user story for artifact caches already, and this proposal
        adds another layer to this.

        We already have the issue that push/pull URIs are not
        canonical, which implies that this configuration will break,
        we should be careful not to break the configuration API
        more often than we absolutely must (it's mostly going to be
        unstable for the sake of issue 112:
        https://gitlab.com/BuildStream/buildstream/issues/112)

I'm a bit confused by "unstable configuration user story"... However, issues #111 (no artifact sharing for TarCache users) and #112 (separate but related pull-url and push-url configuration) make sense.


    B.) Related to (A), I think the naming of artifact caches
        is really just tedium in the absence of fixing issue 112
        first.

          `bst pull --cache bst://canonical-url.com/artifacts`

        Will be nicer and demand less of users than having to
        remember that this push/pull url is related to that name,
        etc.


    C.) I dont really see the value add of configuring multiple
        caches for pushing of artifacts, while the value add for
        multiple fallback artifact shares for *pulling* is clear
        and was discussed at length at GUADEC.

        Having a semantic for pushing to multiple caches explicitly
        on the command line I can see obvious use cases for, but
        I'm not sold on what configuring multiple push URLs in
        the configurations, and what that might achieve.

From a user experience point of view, I completely agree.

However, I'm not sure we can make a single URL work without a massive amount of effort. For example, consider what actually happens when pushing to `bst://canonical-url.com/artifacts`: currently we log in over SSH, which means BuildStream needs to know the username and port. So either...

* the canonical URL is actually the less convenient
  `bst://ostree@canonical-url.com:22200/artifacts`,
* or there's some separate unwritten config to set the username and port,
  and we're back to square one of the URL not actually being canonical,
* or we write and maintain our own internet-facing daemon that routes
  things in the correct way, which means we're now maintaining a
  security-sensitive component that does user authentication.

Git is not something to hold up as an example of good user experience design, but the fact that Git still uses named remotes with separate push and pull URLs after 12 years suggests that it's not an easy thing to solve.
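For comparison, this is what Git's named-remote model looks like in practice (the URLs below are made up purely for illustration; any repository would do):

```shell
# Git handles the pull/push asymmetry with named remotes that can carry
# separate fetch and push URLs. The example URLs are hypothetical.
cd "$(mktemp -d)"
git init -q demo
cd demo
git remote add origin https://example.com/project.git
# The push URL differs because pushing goes over authenticated SSH:
git remote set-url --push origin ssh://git@example.com:2222/project.git
git remote get-url origin          # fetch URL
git remote get-url --push origin   # push URL
```

Note that even Git never made the two URLs collapse into one canonical address; it just gave the pair a name.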

    D.) In your presented use cases, I think the first two are
        handled pretty much in the same way.

        Your third use case related to recursive pipelines however
        I believe is moot, this should already be easily supported
        without any added configuration from this proposal.

        When it comes to recursive project dependencies, each project
        already has the first word on what artifact cache to use
        separately, and the user is able to override those on a
        project name basis in their configuration - so this should
        automatically be supported.

Great, we can leave that aspect of recursive pipelines up to Jürg.

    E.) Rationale behind allowing explicit disabling of project.conf
        declared artifact caches

          "Note that user config overrides project config *completely*. If I have
           an empty 'artifacts' section for a project in my buildstream.conf, that
           means "ignore everything from project.conf". This will be inconvenient
           for some use cases, but it allows removing caches from the project.conf
           that one may not have access to on certain machines."

        This is weird.

Agreed -- but see below

        Can you explain why it is necessary to eliminate caches defined
        by the project.conf in the user configuration in the case which
        the user cannot authenticate or reach the remote server ?

        I mean, can you explain why it is necessary beyond bugs in our
        own code which might cause hangs or delays, which should be
        fixed in our own code _anyway_ ?

There will always be a delay when contacting a server that isn't reachable, and there's no way for code to tell the difference between a server that's down, a temporary network glitch, and a server that will respond after a 3-second lag.
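To illustrate the point (a sketch, not BuildStream code; `probe` is a hypothetical helper): a plain connection attempt with a timeout raises the exact same exception whether the server is dead, a firewall is eating packets, or the server is merely slow.

```python
import socket

def probe(host, port, timeout=3.0):
    """Hypothetical helper: check whether an artifact server is reachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "reachable"
    except socket.timeout:
        # Could be a dead server, a dropped packet, or a live server
        # that takes longer than `timeout` to answer -- all three look
        # identical from here.
        return "timed out"
    except OSError:
        # e.g. connection refused: the host answered, nothing listens there.
        return "unreachable"
```

All the caller can tune is the timeout value; it cannot recover which of the three situations actually occurred.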

Another way to solve this might be to allow users to configure the timeout on the command line. If you're on a fast network with a firewall that eats packets, you can set the timeout low and have minimal impact on startup time when BuildStream contacts each artifact server. If you're on a really slow network, you set the timeout higher.
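Something like the following, say, in the user configuration (to be clear: this option does not exist today; the key name and placement are invented purely to sketch the idea):

```yaml
# Hypothetical user config -- `connect-timeout` is not a real option
artifacts:
  url: https://cache.example.com/artifacts
  # Give up on an unresponsive cache after this many seconds
  connect-timeout: 3
```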

Another way to solve this would be to just edit the project.conf file to remove that server.

So there isn't much of a case for that overriding behaviour, in my opinion.

Thanks for the feedback
Sam

--
Sam Thursfield, Codethink Ltd.
Office telephone: +44 161 236 5575

