Re: Redistributing refs from multiple origins in a single repository



Hey,

On Tue, 2017-05-30 at 21:56 +0200, Krzesimir Nowak wrote:
On Sun, May 28, 2017 at 12:53 AM, Philip Withnall
<philip tecnocode co uk> wrote:
Hi all,

Here’s a bit of a writeup about something which I’ve been discussing
with Colin and Alex recently. It’s primarily of interest to them, but
it affects the core of OSTree and how flatpak uses OSTree, so feedback
from anyone else (especially other users of OSTree) is very welcome.

Apologies for the length. There are cookies at the end.

I probably will only have some bikeshedding comments here.

🚲🖌

Various people have suggested a different approach which disambiguates
ref names based on a second token (an ‘origin ID’, which has previously
been called an ‘originish’, but that’s not very obvious terminology),
so the combination of (origin ID, ref name) is globally unique.

My complaint would be that ostree already uses the "origin" name for
the files related to deployments. Overloading the name with yet
another meaning may be confusing - one may think that this file and
the origin ID are somehow related. My ideas for a name were "source"
or "init", but maybe googling for origin synonyms could result in some
better alternatives.

Fair. I’ve replied to Colin about this. My suggestion is ‘collection’.

The origin ID can be added to the summary file as a new metadata key,
leaving the existing ref map to be indexed by ref name as before.
Semantically, all the refs in the ref map can be assumed to have that
same origin ID.

If the summary file contains refs from more than one origin, one of the
origins is arbitrarily picked as the main one, to be treated as above;
and the refs from the other origins are listed in a second map, which
maps origin ID to a ref map of the refs from that origin (each with the
same semantics as the main ref map).

Picking one of the origins as the main origin for the summary file,
rather than leaving the main ref map empty and using only the second
map, means that new versions of Endless OS can propagate OS updates to
older versions via P2P redistribution without needing a separate
backwards compatibility path (we already do P2P OS updates without the
use of an origin ID).

I am not sure I understand this. Who picks the origin as the main one?
The server? The client? Is main origin ID always the one with refs in
the main ref map and the "propagated" origin IDs are always in the
second map?

If you are publishing a repository on the internet, you pick an origin
ID and include it in your summary file. If you are redistributing refs
on the LAN, whatever software is doing the redistribution (in our case,
eos-update-server) chooses the origin-id.

The main origin-id always relates to the refs in the main ref map.
Other origin IDs are included in the origin-map. That is, the set of
refs and origins we care about is (origin-id + refs-map) + origin-map.
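
A rough Python sketch of that rule, for anyone who prefers code to prose.
It is not taken from the branch: the summary is modelled as a plain dict,
and the key names ("origin-id", "refs", "origin-map") and the function
origin_refs() are assumptions made up for illustration, not the real
summary file format or OSTree API.

```python
from typing import Dict, List, Optional, Set, Tuple


def origin_refs(summary: dict) -> Set[Tuple[Optional[str], str]]:
    """Return every (origin ID, ref name) pair a summary advertises.

    `summary` is a hypothetical in-memory stand-in for a parsed summary
    file; it is not the real on-disk format.
    """
    pairs: Set[Tuple[Optional[str], str]] = set()

    # Refs in the main ref map all belong to the main origin-id,
    # which may be unset (None).
    main_origin: Optional[str] = summary.get("origin-id")
    for ref in summary.get("refs", []):
        pairs.add((main_origin, ref))

    # Refs from every other origin live in the origin-map.
    origin_map: Dict[str, List[str]] = summary.get("origin-map", {})
    for origin, refs in origin_map.items():
        for ref in refs:
            pairs.add((origin, ref))

    return pairs


# Illustrative data for a machine redistributing refs on the LAN.
lan_summary = {
    "origin-id": "eos-production",
    "refs": ["eos/amd64/master"],
    "origin-map": {"eos-apps": ["app1", "app2"]},
}

for origin, ref in sorted(origin_refs(lan_summary)):
    print(origin, ref)
```

A client that only understands the old format would read just "refs" and
ignore the rest, which is why the main ref map is kept as it is.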

Perhaps an example would help.

In this example, we’ve got two origin servers on the internet, and my
computer. My computer is redistributing refs via P2P, so it also
appears as a server.

Origin #1:
 - origin-id: eos-apps
 - Refs:
  - refs/heads/app1
  - refs/heads/app2
 - origin-map is unset in the summary file
 - refs map in the summary file lists [app1, app2]
 - refs/remotes and refs/mirrors are empty

Origin #2:
 - origin-id: eos-production
 - Refs:
  - refs/heads/eos/amd64/master
 - origin-map is unset in the summary file
 - refs map in the summary file lists [eos/amd64/master]
 - refs/remotes and refs/mirrors are empty

My computer:
 - origin-id is unset in the summary file (there is no summary file in
   my local repository)
 - Remotes in local config:
  - eos-apps
  - eos-production
 - Refs:
  - refs/remotes/eos-apps/app1
  - refs/remotes/eos-apps/app2
  - refs/remotes/eos-production/eos/amd64/master
 - origin-map is unset (no summary file)
 - refs map is unset (no summary file)
 - refs/heads and refs/mirrors are empty

My computer, as seen by another machine on the LAN:
 - origin-id is arbitrarily set to eos-production (there *is* a
   generated summary file)
 - No remotes exposed in the config file
 - Refs:
  - refs/heads/eos/amd64/master
    (alias of my local refs/remotes/eos-production/eos/amd64/master)
  - refs/mirrors/eos-apps/app1
    (alias of my local refs/remotes/eos-apps/app1)
  - refs/mirrors/eos-apps/app2
    (alias of my local refs/remotes/eos-apps/app2)
 - refs/remotes is empty
 - origin-map lists all the origins and their refs except
   eos-production:
  - eos-apps: [app1, app2]
 - refs map lists all the eos-production refs: [eos/amd64/master]

The choice of setting the origin-id to eos-production when
redistributing from my machine is an OS-specific one: in the EOS case,
we were already distributing OS updates over P2P, so we’d need to set
the origin-id to that of the OS repository so that the OS refs appear
in refs/heads for other, older, machines on the LAN to read.

If there were no backwards compatibility concerns, my computer could
appear as follows on the LAN, which is a bit simpler:
 - origin-id is unset (there *is* a generated summary file)
 - No remotes exposed in the config file
 - Refs:
  - refs/mirrors/eos-apps/app1
    (alias of my local refs/remotes/eos-apps/app1)
  - refs/mirrors/eos-apps/app2
    (alias of my local refs/remotes/eos-apps/app2)
  - refs/mirrors/eos-production/eos/amd64/master
    (alias of my local refs/remotes/eos-production/eos/amd64/master)
 - refs/heads and refs/remotes are empty
 - origin-map lists all the origins and their refs:
  - eos-apps: [app1, app2]
  - eos-production: [eos/amd64/master]
 - refs map is empty
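
The two LAN layouts above can be sketched in Python as follows. This is
only an illustration under assumptions: it treats origin ID = remote
name, models the summary as a dict, and the names build_lan_summary()
and exported_ref_paths() are hypothetical, not anything in OSTree or
eos-update-server.

```python
from typing import Dict, List, Optional


def build_lan_summary(remote_refs: Dict[str, List[str]],
                      main_origin: Optional[str] = None) -> dict:
    """Split local refs/remotes/<remote>/<ref> entries into the main
    ref map and the origin-map of a generated summary (hypothetical
    dict layout, assuming origin ID == remote name)."""
    summary = {"origin-id": main_origin, "refs": [], "origin-map": {}}
    for origin, refs in remote_refs.items():
        if origin == main_origin:
            # Refs of the main origin stay in the backwards-compatible map.
            summary["refs"] = sorted(refs)
        else:
            summary["origin-map"][origin] = sorted(refs)
    return summary


def exported_ref_paths(summary: dict) -> List[str]:
    """List the ref paths another machine on the LAN would see."""
    paths = [f"refs/heads/{ref}" for ref in summary["refs"]]
    for origin, refs in sorted(summary["origin-map"].items()):
        paths += [f"refs/mirrors/{origin}/{ref}" for ref in refs]
    return paths


# Contents of refs/remotes on my computer, keyed by remote name.
local = {
    "eos-apps": ["app1", "app2"],
    "eos-production": ["eos/amd64/master"],
}

# Backwards-compatible layout: OS refs stay in refs/heads.
compat = build_lan_summary(local, main_origin="eos-production")
# Simpler layout: everything goes through the origin-map.
simple = build_lan_summary(local)

print(exported_ref_paths(compat))
print(exported_ref_paths(simple))
```

The only difference between the two layouts is whether main_origin is
passed, which is the OS-specific choice described above.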

I hope that makes sense.

The other use for setting both origin-id and origin-map in the same
summary file is a server which is the origin for some refs and also
redistributes refs from other origins. I guess that could be used for
caching and distributed fault tolerance.

Origin naming scheme
---

So that origin IDs can match remote names, they must share the same
naming scheme (currently, for example, ‘gnome-apps’). We might want to
transition to a different naming scheme (reverse-DNS, for example,
‘org.flathub’) in future if it would make uniqueness easier.

In any case, origin IDs have to be globally unique. If this is hard to
achieve with free-form IDs, we might instead want to use GUIDs, and
match them to local remote configuration by including the origin GUID
in the remote configuration as an additional key. This would require a
migration step for existing configurations, whereas matching by origin
ID = remote name potentially doesn’t, if we assume that most people
give their remote configuration a predictable name.
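
To make the difference between the two matching strategies concrete,
here is a minimal Python sketch. Everything in it is assumed for
illustration: the remote configuration is a plain dict, the "origin-guid"
key is a hypothetical name for the additional key, and the GUID and URLs
are made up.

```python
from typing import Dict, Optional

# Hypothetical local remote configuration, keyed by remote name.
Remotes = Dict[str, Dict[str, str]]


def match_by_name(remotes: Remotes, origin_id: str) -> Optional[str]:
    """Origin ID == remote name: no extra configuration is needed."""
    return origin_id if origin_id in remotes else None


def match_by_guid(remotes: Remotes, origin_guid: str) -> Optional[str]:
    """Origin GUID stored as an additional (hypothetical) config key."""
    for name, options in remotes.items():
        if options.get("origin-guid") == origin_guid:
            return name
    return None


remotes = {
    # An existing configuration: works with name matching as-is.
    "gnome-apps": {"url": "https://example.org/gnome-apps"},
    # A migrated configuration: carries a made-up GUID as well.
    "eos-apps": {
        "url": "https://example.org/eos-apps",
        "origin-guid": "00000000-0000-0000-0000-000000000001",
    },
}

print(match_by_name(remotes, "gnome-apps"))
print(match_by_guid(remotes, "00000000-0000-0000-0000-000000000001"))
```

The unmigrated ‘gnome-apps’ entry can never be found by GUID, which is
the migration step mentioned above.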

I'd prefer to stick to and enforce one scheme, to avoid a situation
where we have a mix of conventions for naming the origin ID. Also,
anything but GUID. Or maybe we shouldn't care if the name is not going
to be used/seen/typed by the user.

The branch I’ve got at the moment uses origin IDs which are like remote
names (and matches the two based on that), and it seems to fit into the
code fairly well.

In the absence of arguments against that, or disasters when I try to
integrate this approach into flatpak, I’ll go with that.

New API
===

For the moment, this will only require new API for resolving and
pulling refs over P2P (a very similar API to what is already in my
current attempt at
https://github.com/pwithnall/ostree/tree/lan-and-usb).

None of the API which deals with local refspecs needs to change, as
their semantics remain unchanged.

A new version of ostree_repo_remote_list_refs() might need to be
created which returns the origin IDs as well as the refs. We’d have to
ensure the existing version only returned the refs which that remote is
an origin for (the ones listed in the summary file’s main ref map).
That should already be the case, so there is no backwards compatibility
concern.

A bit offtopic here - it is ostree_repo_list_refs. And a minor pet
peeve of mine - shouldn't this be named ostree_repo_list_refspecs? It
has bitten me more than once, where I thought I got a ref, but really
it was a refspec.

Indeed. I’ll try to ensure the new functions are named consistently.
I’m going with OstreeOriginRef (as the (origin ID, ref name) tuple) and
ostree_blah_origin_ref_blah() for method names which would previously
have been ostree_blah_ref_blah().

Philip



