Re: Discussion on source mirroring (with counter proposal)



Hi Sander,

On Tue, 2018-03-20 at 10:00 +0000, Sander Striker wrote:
Hi Tristan,

On Tue, Mar 20, 2018 at 8:34 AM Tristan Van Berkom <tristan vanberkom codethink co uk> wrote:
Hi Sander,

First I have responded to your replies, below this I have some
extension to this proposal.

Let me try and step back a bit to what we are trying to achieve.  It seems useful for us to fully agree on 
that.
I'll reply to your replies in a separate post.

On Mon, Mar 19, 2018 at 6:47 AM Tristan Van Berkom <tristan vanberkom codethink co uk> wrote:

[...]
What are we trying to achieve ?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Before talking about how to achieve this, I want to pause and think
about what exactly we want to achieve - I feel that we are not all on
the same page about this.

I can see two separate ideas of what "mirror" means here:

  A.) Because a specific third party server proves to be unreliable,
      we want to be able to have a fallback for that server, which we
      can rollover to in the case that the upstream doesn't work.

      So this is a quick fix / bandaid for a specific pain point that a
      given organization experiences while using BuildStream; this
      allows us to have a tarball server for unreliable tarballs, or
      rollover to a github mirror for an unreliable upstream github.

I think we can summarize A as "alternative locations".  The alternative locations do not necessarily have 
to be full mirrors of each other.  An example:
I have a p.bst which uses a git source with the following properties:
  ...
  track: master
  ref: 9585191f37f7b0fb9444f35a9bf50de191beadc2

Now as an organization I need to retain this specific ref.  To do so I take a copy of the ref and store it 
in a separate git repo.
Imagine that the project owners now decide to rewrite history (it's git after all), and/or destroy the ref.
I am still perfectly fine with using the normal location to do track.  But if I want to recover the ref I 
had, I need to go to my alternate location.

I am having a hard time understanding and digesting this part.

And I think perhaps this is because I may have not understood what you
meant by "partial mirrors" in your previous post.

Are you saying you would *want* to have a mirrored git repository that
is shallow, and / or does not contain the full project history of that
repo ?

I think this is quite exotic and requires some further justification
(i.e., the storage space for a source mirror is not immensely
expensive, rather processing and mirroring a lot of repositories
continuously in a timely fashion is more difficult).

That said, if you have your own methods of mirroring which achieve
this, possibly for the purpose of remaining resilient across history
rewrites, I think my extensions to the proposal allow you to achieve
this.

  B.) For the same reasons, an organization may just never want to
      experience unreliable access to source code ever again.

      The cost of hosting all sources which their BuildStream projects
      require on a single server; or even mirrored in some
      strategically placed locations (so that one can choose a mirror
      that is geographically closer when building), is a relatively low
      cost.

      Instead of many points of failure on various servers scattered
      across the globe, a single point of failure *that is under the
      control of the organization in question*, is much more desirable.

I wouldn't want to describe it as a single point of failure, because
that would be a deliberate design flaw :).  But agreed if you assume
that these mirrors are set up in a resilient fashion, and your client
is able to fall back to a different instance in case of failure
(which you are alluding to with the "mirrored in some strategically
placed locations").

I wanted to avoid too much fallback and rollover logic to be honest,
and consider a mirror much like say, a debian package mirror.

Mostly, you achieve resilience by hosting things yourself, such that if
your builders are failing, it is clearly a problem with your own infra.

That said, rolling over to a new mirror could still be a thing, and is
required at *least* for rollover to the original upstream URL, as I
pointed out at the end of my email.
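
To make the rollover idea concrete, here is a minimal Python sketch of
trying a mirror first and rolling over to the original upstream URL when
the mirror fails. This is only an illustration, not BuildStream code:
`fetch_with_rollover`, `fake_fetch` and the example.com / example.org
URLs are all hypothetical names invented for this sketch.

```python
from __future__ import annotations

from typing import Callable, Iterable


def fetch_with_rollover(
    candidates: Iterable[str],
    fetch: Callable[[str], bytes],
) -> tuple[str, bytes]:
    """Try each location in order and return the first one that works.

    `fetch` is any callable that downloads a URL and raises OSError
    (or a subclass such as ConnectionError) on failure.
    """
    errors = []
    for url in candidates:
        try:
            return url, fetch(url)
        except OSError as exc:
            # Remember why this location failed, then roll over.
            errors.append(f"{url}: {exc}")
    raise OSError("all locations failed: " + "; ".join(errors))


def fake_fetch(url: str) -> bytes:
    """Stand-in downloader (hypothetical): the mirror is always down."""
    if url.startswith("https://mirror.example.com/"):
        raise ConnectionError("mirror unreachable")
    return b"payload from " + url.encode()


# The mirror is listed first, the original upstream URL last.
used_url, data = fetch_with_rollover(
    [
        "https://mirror.example.com/foo.tar.gz",
        "https://upstream.example.org/foo.tar.gz",
    ],
    fake_fetch,
)
```

Here `used_url` ends up being the upstream URL, since the fake mirror
always fails. The ordering of the candidate list is the only policy in
this sketch, which keeps the client-side logic small.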

I am not convinced about the need of a single concentrated mirror for
all sources.  For instance if you are hosting your own version
control systems, then those sources do not need to be included in the
mirror.

In the extension of my proposal this should be covered by this point:

  o A mirror may be allowed to have "gaps" and be incomplete, in which
    case project default aliases are used.
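
A rough Python sketch of how such "gaps" could be handled, assuming
source URLs are written as `alias:path` and that both the project and
the mirror provide a mapping from alias to base URL. The function name
`translate_url` and all URLs below are hypothetical and only serve the
illustration.

```python
def translate_url(url, project_aliases, mirror_aliases):
    """Expand an `alias:path` URL, preferring the mirror's base URL.

    If the mirror has no entry for the alias (a "gap"), the project's
    default alias is used instead.
    """
    alias, sep, path = url.partition(":")
    if not sep or alias not in project_aliases:
        # Not an aliased URL (e.g. a plain https:// URL): leave as-is.
        return url
    base = mirror_aliases.get(alias, project_aliases[alias])
    return base + path


# Hypothetical project defaults and an incomplete mirror.
project_aliases = {
    "gnome": "https://upstream-gnome.example.org/",
    "kernel": "https://upstream-kernel.example.org/pub/",
}
mirror_aliases = {
    "gnome": "https://mirror.example.com/gnome/",
    # no "kernel" entry: this is a gap in the mirror
}

mirrored = translate_url("gnome:glib.git", project_aliases, mirror_aliases)
fallback = translate_url("kernel:linux.tar.xz", project_aliases, mirror_aliases)
```

With these inputs, `mirrored` points at the mirror while `fallback`
points at the project's default location for the `kernel` alias.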

Considering that this concentrated mirror is accumulating all history
over time, I can see some scalability concerns.  A sharding approach
is not an option in this case.  If you do want to shard, you're back to
custom "mirroring" solutions anyway.

In the extension of my proposal again:

 o There is no restriction that a given alias / source kind be
   provided from the same domain.

This should mostly allow interoperability in the sense that you mean, I
believe.

That said, in my experience with git.baserock.org, which is a little
bit more of a strange beast as it normalizes every VCS to git, I don't
believe we were bottlenecking on size - rather we were bottlenecking on
processing when the list of things to mirror was large - I could be
wrong here, though.

That said, even if git.baserock.org was/is *huge*, I can appreciate
that some special cases will be much larger and will require more
custom solutions than what we could achieve with `bst mirror`.


I think that B is about implementing a mirror server as well as a
client.  I think that A is just looking at a [more generic] client.

Then we have a misunderstanding, this is not exactly what I mean.

I really meant to be speaking about use cases when talking about (A)
and (B). To me, (A) fixes a problem on a source-by-source basis, and
(B) is a more generic approach which addresses the problem as a whole
instead.

My concerns with the (A) approach are that it is very, very
configurable, or rather requires an immense amount of configuration to
be useful.

Also computationally speaking, the client side of things appears to be
much more complex with such fine granularity, much more than "here is
an alternative but reliable location to get your sources" - this
complexity causes me to worry.

Both have merit, but I feel that we're probably being too optimistic
about the investment needed in BuildStream to have B be useful long
term.

Make sense?

I think that you first assumed that (B) *must* be supported by
a backing `bst mirror`, which I had not expressed very well because the
content of my counter proposal did not cover this.

In my last email I have addressed this, and in earlier communications I
have tried to express that I want to have a model in place which
supports a `bst mirror` created mirror, i.e. I want configuration data
to be designed for this, while allowing alternatives.

Essentially, treating a mirror as a project wide "block" (coarse
project level grain for a single "mirror definition"), instead of
having lists be a possibility for every alias or source in use, allows
for a simpler to use `bst mirror` approach, where very minimal
project configuration is needed.
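
As a sketch of what this coarse grain could look like on the client
side, the following Python models each mirror as one named block and
switches the whole project to a mirror with a single selection. The
`Mirror` class, `effective_aliases` and the URLs are hypothetical and
not an existing BuildStream API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Mirror:
    """One project-wide mirror definition (hypothetical shape)."""

    name: str
    aliases: dict[str, str] = field(default_factory=dict)


def effective_aliases(project_aliases, mirrors, selected=None):
    """Return the alias map to use for the whole project.

    With no mirror selected, the project defaults are used unchanged.
    Otherwise the selected mirror's entries override the defaults.
    """
    if selected is None:
        return dict(project_aliases)
    for mirror in mirrors:
        if mirror.name == selected:
            return {**project_aliases, **mirror.aliases}
    raise KeyError(f"no mirror named {selected!r}")


project_aliases = {"gnome": "https://upstream-gnome.example.org/"}
mirrors = [
    Mirror("europe", {"gnome": "https://eu.mirror.example.com/gnome/"}),
    Mirror("asia", {"gnome": "https://asia.mirror.example.com/gnome/"}),
]

aliases = effective_aliases(project_aliases, mirrors, selected="europe")
```

The user only names a mirror; there is no per-alias or per-source list
to maintain in the project configuration.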

Back to this point regarding my optimism, I honestly think that for the
case of the GNOME or freedesktop-sdk projects, or for most projects in
the embedded sector, a `bst mirror` implementation as proposed will be
useful for a very long time, and much easier to use.

You may be correct that this will encounter problems at scale, where
scale is... large, and more investment would be needed to keep this
solution practical.

Please do go through my last email as a lot of the content of this
email should be covered by the other.

Cheers,
    -Tristan


