Re: Feature proposal: multiple cache support
- From: Tristan Van Berkom <tristan vanberkom codethink co uk>
- To: Sam Thursfield <sam thursfield codethink co uk>, buildstream-list gnome org
- Subject: Re: Feature proposal: multiple cache support
- Date: Thu, 12 Oct 2017 20:03:57 +0900
On Thu, 2017-10-12 at 11:21 +0100, Sam Thursfield wrote:
> On 12/10/17 09:10, Tristan Van Berkom wrote:
> > A.) I feel that we are approaching the problem in the wrong
> > order; I dislike that we already have an unstable configuration
> > user story for artifact caches, and this proposal adds another
> > layer to it.
> >
> > We already have the issue that push/pull URIs are not canonical,
> > which implies that this configuration will break; we should be
> > careful not to break the configuration API more often than we
> > absolutely must (it's mostly going to be unstable for the sake of
> > issue 112: https://gitlab.com/BuildStream/buildstream/issues/112)
> I'm a bit confused by "unstable user configuration story"... however
> issues #111 (no artifact sharing for TarCache users) and #112 (separate
> but related pull-url and push-url configuration) make sense.
What I mean by "unstable" is exactly that; users and projects which
have configured artifact caches will break when we change the user
configuration story to something different.
This is unstable configuration.
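To be concrete about what is unstable: today the same cache is addressed
by two separate, unrelated looking URLs, where after issue 112 it should
be a single canonical one. Roughly, in the parsed form of the
configuration (pull-url/push-url are today's key names, everything else
here is illustrative):

    # Today: one artifact cache, two unrelated URLs (illustrative values).
    today = {
        "artifacts": {
            "pull-url": "https://artifacts.example.com/artifacts",
            "push-url": "ssh://artifacts.example.com:22200/artifacts",
        },
    }

    # After issue 112: one canonical URL per cache (purely illustrative).
    after_112 = {
        "artifacts": "bst://artifacts.example.com/artifacts",
    }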
> > B.) Related to (A), I think the naming of artifact caches
> > is really just tedium in the absence of fixing issue 112
> > first.
> >
> >     `bst pull --cache bst://canonical-url.com/artifacts`
> >
> > will be nicer and demand less of users than having to
> > remember that this push/pull url is related to that name,
> > etc.
> > C.) I don't really see the value add of configuring multiple
> > caches for pushing of artifacts, while the value add of multiple
> > fallback artifact shares for *pulling* is clear and was discussed
> > at length at GUADEC.
> >
> > I can see obvious use cases for a semantic to push to multiple
> > caches explicitly on the command line, but I'm not sold on
> > configuring multiple push URLs in the configuration, or on what
> > that would achieve.
> From a user experience point of view, I completely agree.
>
> However I'm not sure we can make a single URL work without massive
> work involved. For example, what actually goes on when pushing to
> `bst://canonical-url.com/artifacts` -- currently we log in over SSH,
> which means BuildStream needs to know the username and port. So
> either...
>
> * the canonical URL is actually the less convenient
>   `bst://ostree@canonical-url.com:22200/artifacts`
> * or there's some separate unwritten config to set username and port,
>   and we're back to square one of the URL not actually being canonical
> * or we write and maintain our own internet-facing daemon that routes
>   things in the correct way, which means you're now maintaining a
>   security sensitive component that's doing user authentication
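(As an aside on the first of those options: a URL in that form already
carries the username and port that the SSH transport needs - roughly,
with a purely illustrative hostname:)

    from urllib.parse import urlparse

    # The single URL encodes scheme, user, host, port and path; the
    # values here are illustrative only.
    url = urlparse("bst://ostree@canonical-url.com:22200/artifacts")

    print(url.scheme, url.username, url.hostname, url.port, url.path)
    # bst ostree canonical-url.com 22200 /artifacts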
Yes, it's not an easy problem to solve; it probably requires real work
beyond a one week patch.

However, once we made the leap to supporting multiple platforms with
necessarily different ways of communicating artifacts, issue #111
became relevant, and this real work is sort of unavoidable, short of
resorting to hacks which treat different projects on different
platforms as totally separate, non-interoperable things.
> Git is not something to hold up as an example of good user experience
> design, but the fact that Git still uses named remotes with separate
> push and pull URLs after 12 years suggests that it's not an easy thing
> to solve.
Ummm, however; I believe git still uses canonical URLs, with the only
exception that some URL schemes can only be used for pull and not for
push. I feel this is still much, much more straightforward than having
the same remote configured with separate URLs for push and pull.

Also note that naming URLs for different artifact servers is annoying
in the same way as thinking about git remotes - I have also considered,
on multiple occasions, that instead of having user-facing configuration
for overriding projects, we might have a `bst remote` API for managing
remote cache configuration beside workspaces in the project's `.bst/`
directory.
> > D.) In your presented use cases, I think the first two are
> > handled in pretty much the same way.
> >
> > Your third use case, related to recursive pipelines, I believe
> > is moot; this should already be easily supported without any
> > added configuration from this proposal.
> >
> > When it comes to recursive project dependencies, each project
> > already has the first word on what artifact cache to use
> > separately, and the user is able to override those on a
> > project name basis in their configuration - so this should
> > automatically be supported.
> Great, we can leave that aspect of recursive pipelines up to Jürg.
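(To spell out the kind of override I mean, in terms of the parsed user
configuration - the key names here are only illustrative, not a promise
of the schema:)

    # A hypothetical per-project override in the user configuration; any
    # project not listed here falls back to whatever its own project.conf
    # declares.
    user_conf = {
        "projects": {
            "some-subproject": {
                "artifacts": {
                    "pull-url": "https://mirror.example.com/artifacts",
                },
            },
        },
    }

    def cache_config_for(project_name, project_conf):
        """Return the user's override for this project if present,
        otherwise the project.conf declared artifact cache."""
        override = user_conf.get("projects", {}).get(project_name, {})
        return override.get("artifacts", project_conf.get("artifacts"))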
> > E.) Rationale behind allowing explicit disabling of project.conf
> > declared artifact caches
> >
> > "Note that user config overrides project config *completely*. If I
> > have an empty 'artifacts' section for a project in my
> > buildstream.conf, that means "ignore everything from project.conf".
> > This will be inconvenient for some use cases, but it allows removing
> > caches from the project.conf that one may not have access to on
> > certain machines."
> >
> > This is weird.
> Agreed -- but see below
> > Can you explain why it is necessary to eliminate caches defined
> > by the project.conf in the user configuration, in the case where
> > the user cannot authenticate or reach the remote server?
> >
> > I mean, can you explain why it is necessary beyond bugs in our
> > own code which might cause hangs or delays, which should be
> > fixed in our own code _anyway_?
> There will always be a delay when contacting a server that isn't
> reachable. But there's no way for code to tell the difference between
> a server that's not responding, a temporary network glitch, and a
> server that does respond but with a 3 second lag.
>
> Another way to solve this might be to allow users to configure the
> timeout on the command line. If you're on a fast network with a
> firewall that eats packets, you can set your timeout low and have a
> minimal impact on startup time when BuildStream contacts each artifact
> server. If you're on a really slow network you set the timeout higher.
>
> Another way to solve this would be to just edit the project.conf file
> to remove that server.
>
> So there isn't much of a case for that overriding behaviour, in my
> opinion.
Yes for that last point, I am mostly just worried that we are papering
over our own bugs with additional features, which in turn give us a
larger bug vector :)
I'm not strongly against giving the user the ability to disable talking
with an artifact cache - and editing project.conf is the last thing we
want users doing for this sort of thing.
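(If we do keep that ability, I would expect the user configuration to
express it as something like the following - key names illustrative
again - rather than asking people to edit project.conf:)

    # Hypothetical: an explicitly empty per-project 'artifacts' override,
    # meaning "ignore every cache this project's project.conf declares".
    user_conf = {
        "projects": {
            "some-project": {
                "artifacts": {},
            },
        },
    }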
That said, it would be very nice to have a timeout option - which we
could hopefully apply uniformly to *all* network activity - but this is
complicated to implement on every Source type as well.
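(For the artifact caches at least, the kind of thing I have in mind is
roughly the sketch below; the timeout value would come from user
configuration or the command line, and the host and port are
illustrative:)

    import socket

    # Probe an artifact server with a user configurable timeout and skip
    # it if it does not answer in time; host, port and the default value
    # are illustrative only.
    def cache_is_reachable(host, port, timeout_seconds=3.0):
        try:
            with socket.create_connection((host, port), timeout=timeout_seconds):
                return True
        except OSError:
            return False

    if not cache_is_reachable("artifacts.example.com", 22200, timeout_seconds=1.0):
        print("Artifact cache not reachable, continuing without it")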
Cheers,
-Tristan