Re: Redistributing refs from multiple origins in a single repository



On Sun, May 28, 2017 at 12:53 AM, Philip Withnall
<philip tecnocode co uk> wrote:
Hi all,

Here’s a bit of a writeup about something which I’ve been discussing
with Colin and Alex recently. It’s primarily of interest to them, but
it affects the core of OSTree and how flatpak uses OSTree, so feedback
from anyone else (especially other users of OSTree) is very welcome.

Apologies for the length. There are cookies at the end.

I probably will only have some bikeshedding comments here.


---

At Endless, we have a use case for supporting OS and flatpak updates
(and installations) over the LAN, or using USB sticks, to machines
which are completely disconnected from the internet.

The main problem which needs solving for this is that of redistributing
refs from LAN peers: my machine may have downloaded ref1 and ref2 from
remoteA, and ref2 and ref3 from remoteB; but it can’t then serve [ref1,
ref2, ref3] to other LAN peers, because I have two ref2s.
This entire design assumes that commit checksums from different
repositories will not collide; otherwise the approach of pulling
commits from multiple remotes into a single server to redistribute them
wouldn’t work.

I’ve got an existing branch which implements support for this in
OSTree, and another which implements support for it in flatpak, but the
flatpak changes are not very satisfactory since they require changing
how it uses ref names, and the migration path is non-trivial.
The idea with my existing branches is that ref names become globally
unique, and locally configured remotes become only one way out of
several to resolve and download those refs. The other ways are LAN and
USB resolution and downloads.

Various people have suggested a different approach which disambiguates
ref names based on a second token (an ‘origin ID’, which has previously
been called an ‘originish’, but that’s not very obvious terminology),
so the combination of (origin ID, ref name) is globally unique.

My complaint would be that ostree already uses the "origin" name for
the files related to deployments. Overloading the name with yet
another meaning may be confusing - one may think that this file and
the origin ID are somehow related. My ideas for a name were "source"
or "init", but maybe googling for origin synonyms could result in some
better alternatives.


Both approaches require a solution for merging summary files when
redistributing refs from multiple origins: the proposal is for unsigned
summary files.

See below for a discussion of the new suggestions. I’m interested in
whether it all makes sense, and whether there would be any problems in
interacting with third-party systems which use OSTree (flatpak being
one of them).
I haven’t included details of how LAN or USB ref resolution and
download work, since they are just a matter of coding and don’t impact
much on the design. If anyone’s interested, I can explain them
separately.
At the bottom I’ve included some example scenarios for how this is used
in P2P and non-P2P cases with OSTree and flatpak, and the migration
paths.


Open questions
===

For convenience, once you’ve read the sections below:

 - Is detached metadata signed? If so, would it be a better place to
put a ref list than the commit metadata? §(Unsigned summaries)
 - Are static deltas signed (like commits)? §(Unsigned summaries)
 - What naming scheme do we want to use for origin IDs? §(Origin naming
scheme)

Making ref names distributable
===

In order to allow refs to be resolved from a global namespace (which is
required for distributed P2P transmission of refs), they need to be
globally unique. One unforgeable global namespacing system is GPG keys,
which is what OSTree already uses for signing.
Let’s introduce the concept of an ‘origin’, which is a collection of
related refs, all within the same trust domain. For example, all the
refs for a particular OS as provided by that OS vendor; or all the refs
for flatpak apps packaged by a particular organisation; or all the
personal branches produced by a specific person.
An origin is tightly bound to the GPG key which is used to sign its
refs, although they are not actually equivalent, because the GPG key
could change in future (key rotation, etc.). So, instead of basing the
origin’s identity on its key, an opaque string identifier is used
instead.
This could be, for example, ‘gnome-apps’, or ‘flathub’, or
‘uk.co.tecnocode’. The exact naming scheme, and how to ensure
uniqueness, is up for discussion (see the section below).

With an origin ID, the tuple (origin ID, ref name) becomes globally
unique. It’s important to note that an origin ID is not the same as a
remote name: remote names are local configuration, and two peers could
easily be referring to the same origin repository using different
remote names.
However, iff the origin ID and remote name do match (and the configured
keyring matches the origin’s signing key), we can use the remote
configuration when doing a P2P pull of refs from that origin. See below
for details.

This new concept of origins is only used when doing P2P pulls. When
pulling from an origin repository on the internet, the locally
configured remote name and URIs are used as before.

When resolving a ref over P2P, the tuple (origin ID, ref name) is
queried with P2P peers. As in my existing branch for this, the P2P
peers expose an archive-z2 repository, which can be pulled from using
the existing OSTree code, if it’s found to contain the given (origin
ID, ref name).
Matching by (origin ID, ref name) requires some modifications to the
archive-z2 repository:
 - The origin ID must be added to the summary file
 - A new refs/mirrors directory must be added, similar to refs/remotes,
but listing refs as refs/mirrors/$origin_id/$ref_name rather than
refs/remotes/$remote_name/$ref_name — the distinction is necessary
because remote names don’t necessarily match origin IDs
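
As a rough illustration of the layout difference (not code from either branch), the two directory schemes could be expressed as below. The helper names are hypothetical, and the rule that an origin ID must be a single path component is an assumption made for the sketch, not something decided above.

```python
def remote_ref_path(remote_name: str, ref_name: str) -> str:
    """Existing layout: keyed by the locally chosen remote name."""
    return f"refs/remotes/{remote_name}/{ref_name}"


def mirror_ref_path(origin_id: str, ref_name: str) -> str:
    """Proposed layout: keyed by the globally unique origin ID."""
    # Assumption for this sketch: an origin ID is one path component.
    if not origin_id or "/" in origin_id:
        raise ValueError("origin ID must be a non-empty single path component")
    return f"refs/mirrors/{origin_id}/{ref_name}"


# Two peers may name the same origin differently in local configuration,
# but the path they publish for redistribution is identical.
peer1 = remote_ref_path("my-flathub", "app/org.example.App/x86_64/stable")
peer2 = remote_ref_path("flathub-mirror", "app/org.example.App/x86_64/stable")
published = mirror_ref_path("flathub", "app/org.example.App/x86_64/stable")
```

Here peer1 and peer2 differ, while both peers would publish the same refs/mirrors path.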

The origin ID can be added to the summary file as a new metadata key,
leaving the existing ref map to be indexed by ref name as before.
Semantically, all the refs in the ref map can be assumed to have that
same origin ID.
If the summary file contains refs from more than one origin, one of the
origins is arbitrarily picked as the main one, to be treated as above;
and the refs from the other origins are listed in a second map, which
maps origin ID to a ref map of the refs from that origin (each with the
same semantics as the main ref map).
Picking one of the origins as the main origin for the summary file,
rather than leaving the main ref map empty and using only the second
map, means that new versions of Endless OS can propagate OS updates to
older versions via P2P redistribution without needing a separate
backwards compatibility path (we already do P2P OS updates without the
use of an origin ID).

I am not sure I understand this. Who picks the origin as the main one?
The server? The client? Is the main origin ID always the one with refs
in the main ref map, with the "propagated" origin IDs always in the
second map?


Origin IDs matching remote names
---

When pulling a ref over P2P, OSTree needs to have a pre-configured GPG
keyring to use to verify what it pulls. Rather than introducing a new
set of configuration files for origins, it seems to make sense to re-
use the configuration for remotes. This already contains GPG keyrings.
The suggestion here is to use the configuration for a remote iff that
remote’s name matches the origin ID being pulled from. If no remote is
configured matching the given origin ID, a pull cannot happen. If a
remote is configured with the same name, but with the wrong key, the
pull will fail.

In any case, a non-P2P pull can still happen with any configured remote
name, regardless of whether an origin ID matches it, since only the URI
from that remote configuration will be accessed.
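
A minimal sketch of that lookup rule, assuming remote configuration is available as simple records. RemoteConfig and config_for_p2p_pull are hypothetical names for illustration, not OSTree API, and the keyring field is a stand-in for the real GPG keyring.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RemoteConfig:
    name: str
    url: Optional[str]
    keyring: str  # hypothetical stand-in for the remote's GPG keyring


def config_for_p2p_pull(remotes: Iterable[RemoteConfig],
                        origin_id: str) -> Optional[RemoteConfig]:
    """Return the remote whose name equals the origin ID, else None.

    None means no P2P pull can happen for this origin. A wrong keyring is
    not detected here: it makes signature verification fail later.
    """
    for remote in remotes:
        if remote.name == origin_id:
            return remote
    return None


remotes = [
    RemoteConfig("flathub", "https://example.invalid/flathub", "flathub.gpg"),
    RemoteConfig("my-os", None, "os-vendor.gpg"),
]
```

A non-P2P pull would skip this function entirely and use whichever remote the user named.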

Origin naming scheme
---

So that origin IDs can match remote names, they must share the same
naming scheme (currently, for example, ‘gnome-apps’). We might want to
transition to a different naming scheme (reverse-DNS, for example,
‘org.flathub’) in future if it would make uniqueness easier.
In any case, origin IDs have to be globally unique. If this is hard to
achieve with free-form IDs, we might instead want to use GUIDs, and
match them to local remote configuration by including the origin GUID
in the remote configuration as an additional key. This would require a
migration step for existing configurations, whereas matching by origin
ID = remote name potentially doesn’t, if we assume that most people
give their remote configuration a predictable name.

I'd prefer to stick to one scheme and enforce it, to avoid ending up
with a mix of conventions for naming the origin ID. Also, anything but
GUIDs. Or maybe we shouldn't care, if the name is not going to be
used/seen/typed by the user.


Regardless of the format of origin IDs, the .flatpakrepo and
.flatpakref formats should acquire a new key to specify the
repository’s origin ID. This would make the remote name argument to
`flatpak remote-add` optional.

Unsigned summaries
===

The second change necessary to support P2P redistribution of refs is
the use of unsigned summaries, so that P2P peers can rebuild the
summary file in the repository they publish, so that it contains refs
from multiple origins, without needing to re-sign it with keys they
don’t have.

Currently, summary files contain a ref map, and an additional metadata
map, and the entire file is signed. If we drop the external signature,
and move the signatures to particular parts of the file, inline, it
becomes possible to rearrange the file to redistribute refs.

The existing signature on the ref map ensures that a man-in-the-middle
attacker cannot point a ref to a commit which was never on that branch.
An attacker can, however, keep a ref pointing to an old commit on the
branch by replaying an old version of the summary file.
flatpak has recently been updated
(https://github.com/flatpak/flatpak/commit/aeb31f794115daa0517b874da29ae7d3e49d40b6)
to include the ref name in commit metadata (which is signed separately
from the summary file).
If a similar change was made in upstream OSTree, this could be used to
verify that the commit pointed to by a ref in an unsigned summary file
is intended to be on that branch. It cannot verify that it’s the *most
up to date* commit on that branch; but this is not a regression on the
old approach.

flatpak’s approach includes a single ref name in the commit metadata.
Upstream OSTree might want to use a list of ref names in detached
metadata instead, so that they can be updated after the commit is
written, and so that a single commit can be pointed to by multiple
refs.
That said, detached metadata is potentially not signed (I think?) which
would defeat the point of putting the ref name in the commit metadata.
Is there a good solution to this?

In addition, an origin ID needs to be included in the commit metadata,
paired with each ref name; otherwise an attacker could make a commit
from one origin (in a P2P server) be pointed at by an identically named
ref in another origin. This situation is not as rare as one might
think: it could easily apply to the `appstream/$arch` branches which
flatpak uses.
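
To make the check concrete, here is a small sketch, assuming the commit's signature has already been verified against the origin's keyring and that the bindings live under a metadata key called "refs" holding (origin ID, ref name) pairs. The function name and the dictionary representation are hypothetical.

```python
def commit_is_bound_to(commit_metadata: dict, origin_id: str, ref_name: str) -> bool:
    """True if the signed commit metadata lists this (origin ID, ref name)."""
    bindings = commit_metadata.get("refs", [])
    return (origin_id, ref_name) in bindings


# A commit legitimately published by originA on its appstream branch.
commit_from_a = {"refs": [("originA", "appstream/x86_64")]}

# An unsigned summary can claim anything, so the client checks the binding.
ok = commit_is_bound_to(commit_from_a, "originA", "appstream/x86_64")
spoofed = commit_is_bound_to(commit_from_a, "originB", "appstream/x86_64")
```

With only the ref name in the metadata, the second check could not tell the two origins apart, which is why the origin ID has to be part of each pair.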

The additional metadata in the summary file should be signed as needed,
using inline signatures. For example, this would include the
repository’s origin ID. P2P redistribution of this signed metadata
would require copying it and its signature without modification. We
would need a definition of which metadata keys need to be signed, and
how they are merged from multiple origins when doing P2P
redistribution.
 - ostree.summary.last-modified: This can be regenerated by whoever
generates the summary file and doesn’t need to be signed (signing it
doesn’t meaningfully prevent any attacks).
 - ostree.static-deltas: Static deltas don’t appear to be individually
signed, so the ostree.static-deltas value must be signed. This means
the whole blob would have to refer to the main origin in the summary
file. Static delta lists from other origins would have to go in a map
from origin ID to static delta blob (with inline signature). It also
means static deltas can’t be spliced (if a P2P server has only some of
the static deltas from an origin). This could be solved by signing
static delta files and including ref and origin ID metadata in them, as
for commits.
 - xa.cache: Would definitely need to be merged into a map of origin ID
to cache data (i.e. a map of type {s{s(tts)}}). The main `xa.cache` key
could refer to the refs for the main origin in the summary file. This
must be signed inline (one signature per origin entry).
 - xa.title: Would probably need to be dropped when doing P2P
redistribution, or merged into a map of origin ID to title. Could be
signed inline, otherwise an attacker could rename repositories.
(Signing this title doesn’t stop the more likely attack where an
attacker creates a fake official-looking repository with a key they
control, hosts official-looking apps in it, and tries to trick users
into accepting their repository/key configuration. Then they can sign
whatever title they like, since they have the key.)
 - xa.default-branch: Would definitely need to be merged into a map of
origin ID to default branch. The main `xa.default-branch` key could
refer to the main branch for the main origin in the summary file. This
must be signed inline.
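
The merging rules above could look roughly like the following sketch. It treats each value as an opaque blob (value plus inline signature) that is copied without modification. The ".by-origin" suffix for the per-origin maps is a made-up naming convention, and merge_summary_metadata is a hypothetical helper.

```python
import time

# Keys whose values are tied to one origin and carry an inline signature.
PER_ORIGIN_KEYS = (
    "ostree.static-deltas",
    "xa.cache",
    "xa.title",
    "xa.default-branch",
)


def merge_summary_metadata(per_origin: dict, main_origin=None, now=None) -> dict:
    """Merge metadata from several origin summaries into one summary.

    per_origin maps origin ID -> that origin's summary metadata.
    Values for main_origin keep their existing key; values for every
    other origin go into a map from origin ID to value.
    """
    merged = {
        # Regenerated by whoever builds the summary; not signed.
        "ostree.summary.last-modified": int(time.time()) if now is None else now,
    }
    for origin_id, metadata in per_origin.items():
        for key in PER_ORIGIN_KEYS:
            if key not in metadata:
                continue
            if origin_id == main_origin:
                merged[key] = metadata[key]
            else:
                merged.setdefault(key + ".by-origin", {})[origin_id] = metadata[key]
    return merged
```

Passing main_origin=None models a P2P server which has no main origin, so every value ends up in a per-origin map.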

An unsigned summary file has several advantages:
 - No race between updating summary and summary.sig when publishing on
a server (https://github.com/ostreedev/ostree/issues/487)
 - No need to have the signing key available and used frequently to
regenerate the summary file on a busy server like flathub
 - P2P support

For backwards compatibility, origin servers must continue to publish up
to date summary.sig files, so that old OSTree clients can verify the
summaries they download. P2P servers don’t need to do this.

New API
===

For the moment, this will only require new API for resolving and
pulling refs over P2P (a very similar API to what is already in my
current attempt at https://github.com/pwithnall/ostree/tree/lan-and-usb
).

None of the API which deals with local refspecs needs to change, as
their semantics remain unchanged.

A new version of ostree_repo_remote_list_refs() might need to be
created which returns the origin IDs as well as the refs. We’d have to
ensure the existing version only returned the refs which that remote is
an origin for — the ones listed in the summary file’s main ref map.
That should already be the case, so there is no backwards compatibility
concern.

A bit offtopic here - it is ostree_repo_list_refs. And a minor pet
peeve of mine - shouldn't this be named ostree_repo_list_refspecs? It
has bitten me more than once, where I thought I got a ref, but really
it was a refspec.


New API would have to be added to allow setting (origin ID, ref name)
pairs, similar to ostree_repo_transaction_set_ref().

New API would also have to be added to retrieve the origin ID from a
repo.

Example scenarios
===

Pulling (ref name) from an internet server using commits
---

 1. The user has a remote configured locally already, but no idea about
origin IDs or anything else. Their remote name may or may not match an
origin ID.
 2. ostree_repo_pull() is called with the remote and ref name
(refspec).
 3. The summary file is pulled from the configured server URI (or one
of its mirrors) as before. It contains the ref name in its ref map, and
additionally contains its origin ID as a separate metadata key. This is
ignored by the client.
 4. The summary file signature is pulled from the server as before, and
used to verify the summary file (if the client is old).
 5. The commit metadata which is pointed at by the ref in the summary
file is pulled from the server, as before. It contains an additional
metadata key (or detached metadata; see above) listing the refs which
point to the commit. If the client is old, these are ignored. If the
client is new, these are verified and must include the ref name being
pulled and be signed correctly. The origin ID is ignored.
 6. The commit data is pulled from the server to a local branch named
for the refspec.

Pulling (ref name) from an internet server using static deltas
---

 1–2. As for §(Pulling (ref name) from an internet server using
commits).
 3. The summary file is pulled from the configured server URI (or one
of its mirrors) as before. It contains the ref name in its ref map, and
additionally contains its origin ID as a separate metadata key. This is
ignored by the client. It also contains the appropriate static delta
name in its main static delta map.
 4. The summary file signature is pulled from the server as before, and
used to verify the summary file (if the client is old). If the client
is new, it should additionally verify the inline signature for the
static delta map.
 5. The static delta which is pointed at by the summary file is pulled
from the server, as before.

Publishing on a P2P server
---

 1. The user has four local branches already: remoteA:ref1,
remoteA:ref2, remoteB:ref2, remoteB:ref3.
 2. They publish an archive-z2 repository with a summary file with an
empty refs map, no origin ID key, but an origin refs map of {
'originA': { 'ref1': 'commit1', 'ref2': 'commit2' }, 'originB': {
'ref2': 'commit3', 'ref3': 'commit3' } }. originB/ref3 deliberately
points at the same commit as ref2.
 3. The summary file additionally contains ostree.static-deltas,
xa.cache, xa.title, xa.default-branch copied from the origin
repositories (where available) and potentially merged into new maps
from origin ID to value. ostree.summary.last-modified would be set to
the current timestamp.
 4. The repository has empty refs/heads and refs/remotes directories,
but the following in refs/mirrors: refs/mirrors/originA/ref1,
refs/mirrors/originA/ref2, refs/mirrors/originB/ref2,
refs/mirrors/originB/ref3.
 5. The commit metadata for commit1 has a refs key of [ ('originA',
'ref1') ]. commit2 has [ ('originA', 'ref2') ]. commit3 has [
('originB', 'ref2'), ('originB', 'ref3') ].
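
The summary described in step 2 could be derived from the local branches along these lines. This is only a sketch: the "refs" and "origin-refs" key names, the build_p2p_summary helper and the explicit remote-to-origin mapping are assumptions for illustration, not the actual summary format.

```python
def build_p2p_summary(local_refs: dict, origin_of_remote: dict) -> dict:
    """Group local refspecs by origin ID for redistribution.

    local_refs maps 'remote:ref' refspecs to commit checksums.
    origin_of_remote maps each local remote name to its origin ID.
    """
    origin_refs = {}
    for refspec, checksum in local_refs.items():
        remote_name, ref_name = refspec.split(":", 1)
        origin_id = origin_of_remote[remote_name]
        origin_refs.setdefault(origin_id, {})[ref_name] = checksum
    # Empty main ref map and no origin ID key, as in step 2 above.
    return {"refs": {}, "origin-refs": origin_refs}


summary = build_p2p_summary(
    {
        "remoteA:ref1": "commit1",
        "remoteA:ref2": "commit2",
        "remoteB:ref2": "commit3",
        "remoteB:ref3": "commit3",
    },
    {"remoteA": "originA", "remoteB": "originB"},
)
```

The two ref2 branches no longer clash, because each sits under its own origin ID.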

Pulling (origin ID, ref name) from a P2P server using commits
---

 1. If the user has not pulled a ref from this origin before, they must
configure a new remote with the appropriate GPG keyring and a name
matching the origin ID. The remote configuration does not have to
include an upstream URI for the origin, but that would allow pulls from
the origin in future (and OSTree currently requires a URI to be
specified).
 2. ostree_repo_find_remotes() is called with the origin ID and ref
name, and resolves some URIs to pull from and an appropriate
order/parallelisation to try them in.
 3. As part of this resolution process, the summary file is pulled from
each potential remote. No signature file is pulled. If the origin ID
metadata key in the summary file matches the requested origin ID, and
the requested ref name is in the summary’s refs map, that’s a match.
    Otherwise, if the requested origin ID is in the origin refs map,
and its refs map contains the requested ref name, that’s a match.
 4. The commit metadata which is pointed at by each potential remote’s
summary file is pulled from each potential remote. It must contain a
key listing the requested origin ID and ref name and be signed using
the GPG key configured for that origin.
 5. The commit data is pulled from the server to a local branch named
for the refspec (which is ‘$origin_id:$ref_name’, as the locally
configured remote name is the origin ID).
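
The matching in step 3 can be sketched as follows, assuming a parsed summary is a dictionary with hypothetical keys "origin-id", "refs" and "origin-refs". The real summary is a GVariant and these key names are not settled, so treat the names as placeholders.

```python
from typing import Optional


def find_ref(summary: dict, origin_id: str, ref_name: str) -> Optional[str]:
    """Return the commit checksum for (origin ID, ref name), or None."""
    # Case 1: the requested origin is the summary's main origin.
    if summary.get("origin-id") == origin_id:
        checksum = summary.get("refs", {}).get(ref_name)
        if checksum is not None:
            return checksum
    # Case 2: the requested origin is listed in the origin refs map.
    return summary.get("origin-refs", {}).get(origin_id, {}).get(ref_name)


origin_server = {"origin-id": "originA", "refs": {"ref1": "commit1"}}
p2p_server = {
    "refs": {},
    "origin-refs": {
        "originA": {"ref1": "commit1"},
        "originB": {"ref2": "commit3"},
    },
}

candidates = [s for s in (origin_server, p2p_server)
              if find_ref(s, "originA", "ref1") is not None]
```

A match here only selects candidate remotes. The result is not trusted until the commit metadata check in step 4 has passed.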

Pulling (origin ID, ref name) from a P2P server using static deltas
---

 1–3. As for §(Pulling (origin ID, ref name) from a P2P server using
commits).
 3.5. The inline signature for the static delta map for the given
origin ID (the main static delta map if the given origin ID is the main
origin ID for that summary file) in the summary file must be verified
using the GPG key for that origin.
 4. The static delta which is pointed at by the summary file is pulled
from the server, as before, along with the commit metadata for its from
and to commits. The from and to commits must contain a key listing the
requested origin ID and ref name and be signed using the GPG key
configured for that origin. The static delta itself is unsigned.

Pulling (origin ID, ref name) from a USB stick
---

Just like pulling it from a P2P server, including the initial setup of
the remote configuration. The archive-z2 repository is read from the
USB stick using file:// URIs. The same verification happens.

Installing a flatpak application from an internet server
---

 1. Add the remote configuration using a .flatpakref or .flatpakrepo
file. If these files include an origin ID, that’s used as the remote
name or added as a key to the local configuration (if the
flatpak/OSTree versions are new enough to support this).
 2. Pulling continues as for §(Pulling (ref name) from an internet
server using commits/static deltas), pulling the appstream/$arch branch
before the app’s branch.
 3. The summary file from the origin is cached locally. Its signature
is ignored, unless the flatpak/OSTree versions are old. The local cache
is now open to a MITM attack from a local user (but only as much as
it’s open to a network MITM attack).

Installing a flatpak application from a P2P server
---

 1. Add the remote configuration using a .flatpakref or .flatpakrepo
file. These files must include an origin ID, which is used as the
remote name or added as a key to the local configuration.
 2. Pulling continues as for §(Pulling (origin ID, ref name) from a P2P
server using commits/static deltas), pulling the appstream/$arch branch
before the app’s branch.

Philip

Cookies: 🍪🍪🍪

Om nom nom.

_______________________________________________
ostree-list mailing list
ostree-list gnome org
https://mail.gnome.org/mailman/listinfo/ostree-list




-- 
Kinvolk GmbH | Adalbertstr. 6a, 10999 Berlin
Geschäftsführer/Directors: Alban Crequy, Chris Kühl, Iago López Galeiras
Registergericht/Court of registration: Amtsgericht Charlottenburg
Registernummer/Registration number: HRB 171414 B
Ust-ID-Nummer/VAT ID number: DE302207000

