Re: Discussion on source mirroring (with counter proposal)



Hi,

[...]
On Tue, Mar 20, 2018 at 12:58 PM Tristan Van Berkom <tristan vanberkom codethink co uk> wrote:
[...]
> I think we can summarize A as "alternative locations".  The alternative locations do not necessarily have to be full mirrors of each other.  An example:
> I have a p.bst which uses a git source with the following properties:
>   ...
>   track: master
>   ref: 9585191f37f7b0fb9444f35a9bf50de191beadc2
>
> Now as an organization I need to retain this specific ref.  To do so I take a copy of the ref and store it in a separate git repo.
> Imagine that the project owners now decide to rewrite history (it's git after all), and/or destroy the ref.
> I am still perfectly fine with using the normal location to do track.  But if I want to recover the ref I had, I need to go to my alternate location.

I am having a hard time understanding and digesting this part.

And I think perhaps this is because I may not have understood what you
meant by "partial mirrors" in your previous post.

Are you saying you would *want* to have a mirrored git repository that
is shallow, and / or does not contain the full project history of that
repo ?

I think this is quite exotic and requires some further justification
(i.e., the storage space for a source mirror is not immensely
expensive, rather processing and mirroring a lot of repositories
continuously in a timely fashion is more difficult).

That said, if you have your own methods of mirroring which achieve
this, possibly for the purpose of remaining resilient across history
rewrites, I think my extensions to the proposal allow you to achieve
this.

The resiliency across history rewrites is indeed the use case.
 
> > > >   B.) For the same reasons, an organization may just never want
 [...] 
> I wouldn't want to describe it as a single point of failure, because
> that would be a deliberate design flaw :).  But agreed if you assume
> that these mirrors are set up in a resilient fashion, and your client
> is able to fall back to a different instance in case of failure
> (which you are alluding to with the "mirrored in some strategically
> placed locations").

I wanted to avoid too much fallback and rollover logic to be honest,
and consider a mirror much like say, a debian package mirror.

Mostly, you achieve resilience by hosting things yourself, such that if
your builders are failing, it is clearly a problem with your own infra.

If we have retry logic for the same "host" that's probably ok, since then you can at least mitigate in other ways transparent to the client.
 
That said, rolling over to a new mirror could still be a thing, and is
required at *least* for rollover to the original upstream URL, as I
pointed out at the end of my email.

> I am not convinced about the need of a single concentrated mirror for
> all sources.  For instance if you are hosting your own version
> control systems, then those sources do not need to be included in the
> mirror.

In the extension of my proposal this should be covered by this point:

  o A mirror may be allowed to have "gaps" and be incomplete, in which
    case project default aliases are used.
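
A rough sketch of how a client might handle such "gaps", assuming aliases
are held as plain alias-to-URL mappings. The function name `resolve_alias`,
the dictionary layout and the example.com URLs are all hypothetical and are
not taken from BuildStream itself:

```python
def resolve_alias(alias, mirror_aliases, project_aliases):
    """Return the base URL for an alias, preferring the mirror's entry.

    Both mappings are hypothetical alias -> base URL dictionaries.
    If the mirror has a "gap" for this alias, fall back to the
    project's default alias.
    """
    if alias in mirror_aliases:
        return mirror_aliases[alias]
    # The mirror is incomplete for this alias: use the project default.
    return project_aliases[alias]


# Illustrative data only: the mirror covers "gnome" but not "kernel".
project_aliases = {
    "gnome": "https://upstream.example.com/gnome/",
    "kernel": "https://upstream.example.com/kernel/",
}
mirror_aliases = {
    "gnome": "https://mirror.example.com/gnome/",
}

print(resolve_alias("gnome", mirror_aliases, project_aliases))
print(resolve_alias("kernel", mirror_aliases, project_aliases))
```

Here the mirror entry is used for "gnome", while "kernel" falls through to
the project default because the mirror does not provide it.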

> Considering that this concentrated mirror is accumulating all history
> over time, I can see some scalability concerns.  A sharding approach
> is not an option in this case.  If you do want to shard, you're back to
> custom "mirroring" solutions anyway.

In the extension of my proposal again:

 o There is no restriction that a given alias / source kind be
   provided from the same domain.

This should mostly allow interoperability in the sense that you mean, I
believe.

I believe that is indeed the case.
 
That said, in my experience with git.baserock.org, which is a little
bit more of a strange beast as it normalizes every VCS to git, I don't
believe we were bottlenecking on size - rather we were bottlenecking on
processing when the list of things to mirror was large - I could be
wrong here, though.

That said, even if git.baserock.org was/is *huge*, I can appreciate
that some special cases will be much larger and will require more
custom solutions than what we could achieve with `bst mirror`.


> I think that B is about implementing a mirror server as well as a
> client.  I think that A is just looking at a [more generic] client.

Then we have a misunderstanding, this is not exactly what I mean.

I really meant to be speaking about use cases when talking about (A)
and (B). To me; (A) fixes a problem on a source-by-source basis, and
(B) is a more generic approach which addresses the problem as a whole
instead.

My concerns with the (A) approach are that it is very, very
configurable, or rather requires an immense amount of configuration to
be useful.

Also computationally speaking, the client side of things appears to be
much more complex with such fine granularity, much more than "here is
an alternative but reliable location to get your sources" - this
complexity causes me to worry.

I guess I was more simplistically thinking:
The base source implementation provides a fetch_multi that takes a list.  The default implementation iterates over that list and calls the regular fetch for each entry, stopping on the first success.
A more sophisticated source may choose to fetch in parallel from all at the same time.
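
To make that concrete, here is a minimal sketch of the default behaviour
only (the parallel variant is not shown). Apart from the name `fetch_multi`,
everything in it is an assumption for illustration: the `Source` and
`FetchError` classes are hypothetical stand-ins rather than BuildStream's
real plugin API, and `FakeSource` exists purely so the sketch can run:

```python
class FetchError(Exception):
    """Hypothetical error raised when one location cannot be fetched."""


class Source:
    """Hypothetical base source, not BuildStream's actual Source class."""

    def fetch(self, url):
        # Concrete source kinds (git, tar, ...) would override this.
        raise NotImplementedError

    def fetch_multi(self, urls):
        """Try each location in order; return the first one that works."""
        errors = []
        for url in urls:
            try:
                self.fetch(url)
                return url  # Stop on first success.
            except FetchError as exc:
                errors.append((url, exc))
        # Every location failed (or the list was empty).
        raise FetchError("all locations failed: {}".format(errors))


class FakeSource(Source):
    """Stand-in source: only URLs listed in `available` can be fetched."""

    def __init__(self, available):
        self.available = set(available)
        self.attempts = []

    def fetch(self, url):
        self.attempts.append(url)
        if url not in self.available:
            raise FetchError("cannot fetch {}".format(url))
```

A source that wants smarter behaviour would override fetch_multi; sources
that only implement fetch get the sequential fallback for free.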
 
> Both have merit, but I feel that we're probably being too optimistic
> about the investment needed in BuildStream to have B be useful long
> term.
>
> Make sense?

I think that you have first assumed that (B) *must* be supported by
a backing `bst mirror`, which I had not expressed very well because the
content of my counter proposal did not cover this.

Right, that was the part that didn't quite come across for me.  As long as we don't require a bst mirror, I think it's ok.
 
In my last email I have addressed this, and in earlier communications I
have tried to express that I want to have a model in place which
supports a `bst mirror` created mirror, i.e. I want configuration data
to be designed for this, while allowing alternatives.

Essentially, by treating a mirror as a project wide "block" (coarse
project level grain for a single "mirror definition"), instead of
having lists be a possibility for every alias or source in use, allows
for a more simple to use `bst mirror` approach, where very minimal
project configuration is needed.

Back to this point regarding my optimism, I honestly think that for the
case of the GNOME or freedesktop-sdk projects, or for most projects in
the embedded sector, a `bst mirror` implementation as proposed will be
useful for a very long time, and much easier to use.

Practical uses are always good to test the theory :).
 
You may be correct that this will encounter problems at scale, where
scale is... large, and more investment would be needed to keep this
solution practical.

Please do go through my last email as a lot of the content of this
email should be covered by the other.

Cheers,
    -Tristan

Thanks for your patience,

Cheers,

Sander
 

