Re: Discussion on source mirroring

On 2018-03-15 16:30, Jonathan Maw wrote:
I've been giving some thought to source mirroring recently, after
reading the discussion at
https://gitlab.com/BuildStream/buildstream/issues/179.

Source mirroring will be valuable for us because:
* The canonical upstream may disappear without warning
* The canonical upstream may be slow to access due to limited
infrastructure or geographical distance.

I briefly considered whether a "one size fits all" source mirror is
possible, but I don't think it's doable.
Each source plugin is permitted to store files in the local source
cache in whatever format it considers appropriate. As a result, how a
remote mirror can be merged with the local cache depends on which
methods are suitable for each kind of source.

Since we will have to do separate work for each source anyway, we have
the opportunity to make fetching from a mirror use the same protocol
that we use for fetching sources normally, so I suggest the following:

In project.conf, each entry in the aliases dict could map to a list of
URLs instead of just a single URL, e.g.

aliases:
  github:
  - https://mirrorsrv.example.com/github
  - https://github.com
  sourceforge: http://downloads.sourceforge.net
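Existing project.conf files would keep working if a bare string is treated as a one-element list. A minimal sketch of that normalisation, assuming the YAML has already been loaded into a plain dict and that, as the example above suggests, list order is the order in which URLs are tried. `normalize_aliases` is a hypothetical helper, not an existing BuildStream function:

```python
from typing import Dict, List, Union

# Hypothetical shape of the aliases dict after project.conf is loaded.
AliasConfig = Dict[str, Union[str, List[str]]]


def normalize_aliases(aliases: AliasConfig) -> Dict[str, List[str]]:
    """Return a copy of aliases in which every value is a list of URLs.

    A bare string (today's single-URL form) becomes a one-element list.
    List order is kept, on the assumption that it is the order of preference.
    """
    normalized: Dict[str, List[str]] = {}
    for alias, value in aliases.items():
        if isinstance(value, str):
            normalized[alias] = [value]
        elif isinstance(value, list) and value and all(isinstance(v, str) for v in value):
            normalized[alias] = list(value)
        else:
            raise ValueError(
                f"alias {alias!r} must map to a URL or a non-empty list of URLs"
            )
    return normalized


aliases = {
    "github": ["https://mirrorsrv.example.com/github", "https://github.com"],
    "sourceforge": "http://downloads.sourceforge.net",
}
print(normalize_aliases(aliases))
```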

Implementing fetching from multiple URLs is not trivial, however.
At its simplest, we would update every source's fetch and track
methods to handle repo aliases that map to multiple URLs.

To reduce the amount of complexity that plugin authors have to deal
with, we might do one of the following:
* Create a method that takes an aliased URL and yields every URL it
can generate from the aliases it knows.
* Where we currently call fetch and track, iterate over every possible
URL and keep calling fetch/track for as long as they signal, through a
return value or an exception, that they failed because they couldn't
access that URL.
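The two options above fit together: a generator that expands an aliased URL, and a core-side loop that calls a plugin's fetch or track once per expanded URL. A minimal sketch, assuming URLs are written as `alias:path`, that each alias maps to an ordered list of URL prefixes, and that plugins signal "couldn't access this URL" with a dedicated exception. `alias_urls`, `try_each_url` and `SourceUnreachableError` are hypothetical names, not existing BuildStream API:

```python
from typing import Callable, Dict, Iterator, List, TypeVar

T = TypeVar("T")


class SourceUnreachableError(Exception):
    """Hypothetical: raised by a plugin when one URL cannot be accessed."""


def alias_urls(aliased_url: str, aliases: Dict[str, List[str]]) -> Iterator[str]:
    """Yield every concrete URL that an 'alias:path' URL can expand to.

    URLs whose prefix is not a known alias are yielded unchanged.
    """
    alias, sep, path = aliased_url.partition(":")
    if not sep or alias not in aliases:
        yield aliased_url
        return
    for prefix in aliases[alias]:
        # Join with exactly one slash, whatever the alias value ends with.
        yield prefix.rstrip("/") + "/" + path.lstrip("/")


def try_each_url(aliased_url: str, aliases: Dict[str, List[str]],
                 operation: Callable[[str], T]) -> T:
    """Call operation (e.g. a plugin's fetch or track) on each URL in turn.

    Returns the first successful result. Only SourceUnreachableError moves
    on to the next URL; any other exception propagates immediately.
    """
    failures: List[str] = []
    for url in alias_urls(aliased_url, aliases):
        try:
            return operation(url)
        except SourceUnreachableError as e:
            failures.append(f"{url}: {e}")
    raise SourceUnreachableError("no URL could be reached:\n" + "\n".join(failures))


# Usage: pretend the mirror is down, so the upstream URL is used instead.
aliases = {"github": ["https://mirrorsrv.example.com/github", "https://github.com"]}


def fake_fetch(url: str) -> str:
    if "mirrorsrv" in url:
        raise SourceUnreachableError("connection refused")
    return url


print(try_each_url("github:foo/bar.git", aliases, fake_fetch))
# prints https://github.com/foo/bar.git
```

Restricting the retry to a single exception type matters: a checksum mismatch or a malformed ref should fail loudly rather than silently fall through to the next mirror.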

Known issues:
* Are we likely to see a URL that uses multiple repo aliases?
* We are likely to see one mirror alias per type of source.
Users who mix many kinds of source with multiple mirrors will have a
lot of boilerplate configuration.

Does anyone have a better idea of what we could do?

Best regards,

Jonathan

This topic is also discussed in issue https://gitlab.com/BuildStream/buildstream/issues/244
