Hi all,

Here’s a bit of a write-up about something I’ve been discussing with Colin and Alex recently. It’s primarily of interest to them, but it affects the core of OSTree and how flatpak uses OSTree, so feedback from anyone else (especially other users of OSTree) is very welcome. Apologies for the length. There are cookies at the end.

---

At Endless, we have a use case for supporting OS and flatpak updates (and installations) over the LAN, or using USB sticks, to machines which are completely disconnected from the internet. The main problem which needs solving for this is that of redistributing refs from LAN peers: my machine may have downloaded ref1 and ref2 from remoteA, and ref2 and ref3 from remoteB; but it can’t then serve [ref1, ref2, ref3] to other LAN peers, because I have two ref2s.

This entire design assumes that commit checksums from different repositories will not collide; otherwise the approach of pulling commits from multiple remotes into a single server to redistribute them wouldn’t work.

I’ve got an existing branch which implements support for this in OSTree, and another which implements support for it in flatpak, but the flatpak changes are not very satisfactory, since they require changing how flatpak uses ref names, and the migration path is non-trivial. The idea with my existing branches is that ref names become globally unique, and locally configured remotes become just one of several ways to resolve and download those refs; the others are LAN and USB resolution and downloads.

Various people have suggested a different approach which disambiguates ref names using a second token (an ‘origin ID’, previously called an ‘originish’, which isn’t very obvious terminology), so that the combination (origin ID, ref name) is globally unique.

Both approaches require a solution for merging summary files when redistributing refs from multiple origins: the proposal is for unsigned summary files.
See below for a discussion of the new suggestions. I’m interested in whether it all makes sense, and whether there would be any problems interacting with third-party systems which use OSTree (flatpak being one of them).

I haven’t included details of how LAN or USB ref resolution and downloads work, since they are just a matter of coding and don’t have much impact on the design. If anyone’s interested, I can explain them separately.

At the bottom I’ve included some example scenarios showing how this is used in P2P and non-P2P cases with OSTree and flatpak, and the migration paths.

Open questions
===

For convenience, here are the open questions in one place, to come back to once you’ve read the sections below:

- Is detached metadata signed? If so, would it be a better place to put a ref list than the commit metadata? §(Unsigned summaries)
- Are static deltas signed (like commits)? §(Unsigned summaries)
- What naming scheme do we want to use for origin IDs? §(Origin naming scheme)

Making ref names distributable
===

In order to allow refs to be resolved from a global namespace (which is required for distributed P2P transmission of refs), they need to be globally unique. One unforgeable global namespacing system is GPG keys, which OSTree already uses for signing.

Let’s introduce the concept of an ‘origin’: a collection of related refs, all within the same trust domain. For example, all the refs for a particular OS as provided by that OS vendor; or all the refs for flatpak apps packaged by a particular organisation; or all the personal branches produced by a specific person.

An origin is tightly bound to the GPG key which is used to sign its refs, although the two are not equivalent, because the GPG key could change in future (key rotation, etc.). So, rather than basing the origin’s identity on its key, an opaque string identifier is used. This could be, for example, ‘gnome-apps’, or ‘flathub’, or ‘uk.co.tecnocode’.
The exact naming scheme, and how to ensure uniqueness, is up for discussion (see §(Origin naming scheme)). With an origin ID, the tuple (origin ID, ref name) becomes globally unique.

It’s important to note that an origin ID is not the same as a remote name: remote names are local configuration, and two peers could easily refer to the same origin repository using different remote names. However, iff the origin ID and remote name do match (and the configured keyring matches the origin’s signing key), we can use the remote configuration when doing a P2P pull of refs from that origin. See §(Origin IDs matching remote names) for details.

This new concept of origins is only used when doing P2P pulls. When pulling from an origin repository on the internet, the locally configured remote name and URIs are used as before.

When resolving a ref over P2P, the tuple (origin ID, ref name) is looked up on P2P peers. As in my existing branch, the P2P peers expose an archive-z2 repository which, if it’s found to contain the given (origin ID, ref name), can be pulled from using the existing OSTree code.

Matching by (origin ID, ref name) requires some modifications to the archive-z2 repository:

- The origin ID must be added to the summary file.
- A new refs/mirrors directory must be added, similar to refs/remotes, but listing refs as refs/mirrors/$origin_id/$ref_name rather than refs/remotes/$remote_name/$ref_name. The distinction is necessary because remote names don’t necessarily match origin IDs.

The origin ID can be added to the summary file as a new metadata key, leaving the existing ref map to be indexed by ref name as before. Semantically, all the refs in the ref map can be assumed to have that same origin ID.
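To make the two repository changes concrete, here is a minimal Python model (not OSTree code) of a single-origin summary and the refs/mirrors layout. The summary is modelled as nested dicts rather than a GVariant, and the metadata key name `origin-id` is a hypothetical placeholder, since no key name has been chosen yet.

```python
from typing import Optional

# Hypothetical summary metadata key for the origin ID; no name has been agreed.
ORIGIN_ID_KEY = "origin-id"


def mirror_ref_path(origin_id: str, ref_name: str) -> str:
    """Proposed layout: mirrored refs are keyed by the global origin ID."""
    return f"refs/mirrors/{origin_id}/{ref_name}"


def remote_ref_path(remote_name: str, ref_name: str) -> str:
    """Existing layout: keyed by the remote name, which is only local config."""
    return f"refs/remotes/{remote_name}/{ref_name}"


def lookup_main_ref(summary: dict, origin_id: str, ref_name: str) -> Optional[str]:
    """Look up (origin_id, ref_name) in a single-origin summary.

    Every ref in the main ref map is assumed to belong to the origin named by
    the metadata key, so the ref map itself stays indexed by ref name only.
    Returns the commit checksum, or None if there is no match.
    """
    if summary["metadata"].get(ORIGIN_ID_KEY) != origin_id:
        return None
    return summary["refs"].get(ref_name)


# A single-origin summary, modelled as nested dicts rather than a GVariant.
summary = {
    "refs": {"ref1": "commit1", "ref2": "commit2"},
    "metadata": {ORIGIN_ID_KEY: "originA"},
}

print(mirror_ref_path("originA", "ref1"))           # refs/mirrors/originA/ref1
print(lookup_main_ref(summary, "originA", "ref2"))  # commit2
print(lookup_main_ref(summary, "originB", "ref2"))  # None
```

Keeping the origin ID as a single metadata key, rather than re-keying the ref map, is what lets old clients keep reading the ref map unchanged.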
If the summary file contains refs from more than one origin, one of the origins is arbitrarily picked as the main one, to be treated as above, and the refs from the other origins are listed in a second map, which maps each origin ID to a ref map of the refs from that origin (each with the same semantics as the main ref map).

Picking one of the origins as the main origin for the summary file, rather than leaving the main ref map empty and using only the second map, means that new versions of Endless OS can propagate OS updates to older versions via P2P redistribution without needing a separate backwards compatibility path (we already do P2P OS updates without the use of an origin ID).

Origin IDs matching remote names
---

When pulling a ref over P2P, OSTree needs a pre-configured GPG keyring to verify what it pulls. Rather than introducing a new set of configuration files for origins, it seems to make sense to reuse the configuration for remotes, which already contains GPG keyrings.

The suggestion here is to use the configuration for a remote iff that remote’s name matches the origin ID being pulled from. If no remote is configured matching the given origin ID, a pull cannot happen. If a remote is configured with the same name, but with the wrong key, the pull will fail.

In any case, a non-P2P pull can still happen with any configured remote name, regardless of whether it matches an origin ID, since such a pull only accesses the URI from that remote’s configuration.

Origin naming scheme
---

So that origin IDs can match remote names, they must share the same naming scheme (currently, for example, ‘gnome-apps’). We might want to transition to a different naming scheme (reverse-DNS, for example, ‘org.flathub’) in future if it would make uniqueness easier to guarantee. In any case, origin IDs have to be globally unique.
If this is hard to achieve with free-form IDs, we might instead want to use GUIDs, and match them to local remote configuration by including the origin GUID in the remote configuration as an additional key. This would require a migration step for existing configurations, whereas matching on origin ID = remote name potentially doesn’t, if we assume that most people give their remote configurations predictable names.

Regardless of the format of origin IDs, the .flatpakrepo and .flatpakref formats should acquire a new key to specify the repository’s origin ID. This would make the remote name argument to `flatpak remote-add` optional.

Unsigned summaries
===

The second change necessary to support P2P redistribution of refs is the use of unsigned summaries, so that P2P peers can rebuild the summary file in the repository they publish to contain refs from multiple origins, without needing to re-sign it with keys they don’t have.

Currently, summary files contain a ref map and an additional metadata map, and the entire file is signed. If we drop the external signature, and move the signatures inline to particular parts of the file, it becomes possible to rearrange the file to redistribute refs.

The existing signature, which covers the ref map, ensures that a man-in-the-middle attacker cannot point a ref to a commit which was never on that branch. An attacker can, however, keep a ref pointing to an old commit on the branch by replaying an old version of the summary file.

flatpak has recently been updated (https://github.com/flatpak/flatpak/commit/aeb31f794115daa0517b874da29ae7d3e49d40b6) to include the ref name in commit metadata (which is signed separately from the summary file). If a similar change were made in upstream OSTree, it could be used to verify that the commit pointed to by a ref in an unsigned summary file is intended to be on that branch. It cannot verify that it’s the *most up to date* commit on that branch; but this is not a regression compared to the current approach.
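The check described above can be sketched as a small Python model; this is not OSTree or flatpak code. Commit metadata is a plain dict, the key name `refs` is a hypothetical placeholder, and the `signature_ok` flag stands in for the GPG verification OSTree already performs on commits. The sketch binds (origin ID, ref name) pairs rather than bare ref names, for the reason given a couple of paragraphs further on.

```python
# Hypothetical key under which a commit's signed metadata lists the
# (origin ID, ref name) pairs it is intended to be on.
REFS_KEY = "refs"


def commit_is_bound_to_ref(commit_metadata: dict, signature_ok: bool,
                           origin_id: str, ref_name: str) -> bool:
    """Decide whether a commit reached via an unsigned summary may be used.

    signature_ok stands in for the existing GPG check on the commit; the
    binding is only trustworthy because it is inside the signed commit.
    Note this cannot show the commit is the *latest* one on the branch.
    """
    if not signature_ok:
        return False
    bound_refs = commit_metadata.get(REFS_KEY, [])
    return (origin_id, ref_name) in bound_refs


# One commit pointed at by two refs from the same origin.
commit_metadata = {REFS_KEY: [("originB", "ref2"), ("originB", "ref3")]}

print(commit_is_bound_to_ref(commit_metadata, True, "originB", "ref3"))  # True
# Same ref name, different origin: rejected, so a P2P peer cannot
# point originA's ref2 at originB's commit.
print(commit_is_bound_to_ref(commit_metadata, True, "originA", "ref2"))  # False
```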
flatpak’s approach includes a single ref name in the commit metadata. Upstream OSTree might want to use a list of ref names in detached metadata instead, so that they can be updated after the commit is written, and so that a single commit can be pointed to by multiple refs. That said, detached metadata is potentially not signed (I think?), which would defeat the purpose of binding ref names to the commit. Is there a good solution to this?

In addition, an origin ID needs to be included in the commit metadata, paired with each ref name; otherwise an attacker running a P2P server could make an identically named ref in another origin point at a commit from the first origin. Identically named refs in different origins are not as rare as one might think: the `appstream/$arch` branches which flatpak uses are an obvious example.

The additional metadata in the summary file should be signed as needed, using inline signatures; for example, this would include the repository’s origin ID. P2P redistribution of this signed metadata would require copying it and its signature without modification. We would need a definition of which metadata keys need to be signed, and how they are merged from multiple origins when doing P2P redistribution:

- ostree.summary.last-modified: This can be regenerated by whoever generates the summary file, and doesn’t need to be signed (signing it doesn’t meaningfully prevent any attacks).
- ostree.static-deltas: Static deltas don’t appear to be individually signed, so the ostree.static-deltas value must be signed. This means the whole signed blob would have to refer to the main origin in the summary file. Static delta lists from other origins would have to go in a map from origin ID to static delta blob (each with an inline signature). It also means static delta lists can’t be spliced: a P2P server which has only some of the static deltas from an origin can’t list just those without invalidating the signature. This could be solved by signing static delta files and including ref and origin ID metadata in them, as for commits.
- xa.cache: Would definitely need to be merged into a map of origin ID to cache data (i.e. a map of type a{sa{s(tts)}}). The main `xa.cache` key could refer to the refs for the main origin in the summary file. This must be signed inline (one signature per origin entry).
- xa.title: Would probably need to be dropped when doing P2P redistribution, or merged into a map of origin ID to title. It could be signed inline; otherwise an attacker could rename repositories. (Signing the title doesn’t stop the more likely attack, where an attacker creates a fake official-looking repository with a key they control, hosts official-looking apps in it, and tries to trick users into accepting their repository/key configuration. They can then sign whatever title they like, since they have the key.)
- xa.default-branch: Would definitely need to be merged into a map of origin ID to default branch. The main `xa.default-branch` key could refer to the default branch for the main origin in the summary file. This must be signed inline.

The advantages of an unsigned summary file are significant:

- No race between updating summary and summary.sig when publishing on a server (https://github.com/ostreedev/ostree/issues/487)
- No need to have the signing key available, and used frequently, to regenerate the summary file on a busy server like flathub
- P2P support

For backwards compatibility, origin servers must continue to publish up-to-date summary.sig files, so that old OSTree clients can verify the summaries they download. P2P servers don’t need to do this.

New API
===

For the moment, this will only require new API for resolving and pulling refs over P2P (very similar to the API already in my current attempt at https://github.com/pwithnall/ostree/tree/lan-and-usb). None of the API which deals with local refspecs needs to change, as its semantics remain unchanged.

A new version of ostree_repo_remote_list_refs() might need to be created which returns the origin IDs as well as the refs.
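As a rough Python model of the difference (the real functions are C API; `list_refs()` and `list_origin_refs()` below are hypothetical names operating on a dict-shaped summary, and the `origin-id` and `origin-refs` metadata keys are placeholders): the existing behaviour only ever sees the main ref map, while a new variant would also report origin IDs, including those of refs in the per-origin map.

```python
from typing import Dict, Tuple

ORIGIN_ID_KEY = "origin-id"      # hypothetical: ID of the summary's main origin
ORIGIN_REFS_KEY = "origin-refs"  # hypothetical: origin ID -> ref map


def list_refs(summary: dict) -> Dict[str, str]:
    """Model of the existing behaviour: only the main ref map is returned,
    i.e. the refs this remote is an origin for."""
    return dict(summary["refs"])


def list_origin_refs(summary: dict) -> Dict[Tuple[str, str], str]:
    """Model of a hypothetical new variant: maps (origin ID, ref name) to
    the commit checksum for every ref in the summary."""
    result: Dict[Tuple[str, str], str] = {}
    main_origin = summary["metadata"].get(ORIGIN_ID_KEY)
    # A summary without an origin ID key (e.g. from an old server) gives no
    # way to attribute its main refs to an origin, so they are skipped here.
    if main_origin is not None:
        for ref_name, checksum in summary["refs"].items():
            result[(main_origin, ref_name)] = checksum
    for origin_id, ref_map in summary["metadata"].get(ORIGIN_REFS_KEY, {}).items():
        for ref_name, checksum in ref_map.items():
            result[(origin_id, ref_name)] = checksum
    return result


summary = {
    "refs": {"ref1": "commit1", "ref2": "commit2"},
    "metadata": {
        ORIGIN_ID_KEY: "originA",
        ORIGIN_REFS_KEY: {"originB": {"ref2": "commit3", "ref3": "commit3"}},
    },
}

print(sorted(list_refs(summary)))      # ['ref1', 'ref2']
print(len(list_origin_refs(summary)))  # 4
```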
We’d have to ensure the existing version only returns the refs which that remote is an origin for: the ones listed in the summary file’s main ref map. That should already be the case, so there is no backwards compatibility concern.

New API would have to be added to allow setting (origin ID, ref name) pairs, similar to ostree_repo_transaction_set_ref(). New API would also have to be added to retrieve the origin ID from a repo.

Example scenarios
===

Pulling (ref name) from an internet server using commits
---

1. The user already has a remote configured locally, with no knowledge of origin IDs or anything else new. Their remote name may or may not match an origin ID.
2. ostree_repo_pull() is called with the remote and ref name (refspec).
3. The summary file is pulled from the configured server URI (or one of its mirrors) as before. It contains the ref name in its ref map, and additionally contains its origin ID as a separate metadata key. This is ignored by the client.
4. The summary file signature is pulled from the server as before, and used to verify the summary file (if the client is old).
5. The commit metadata pointed at by the ref in the summary file is pulled from the server, as before. It contains an additional metadata key (or detached metadata; see above) listing the refs which point to the commit. If the client is old, these are ignored. If the client is new, they are verified: they must include the ref name being pulled, and the commit must be signed correctly. The origin IDs paired with the refs are ignored.
6. The commit data is pulled from the server to a local branch named for the refspec.

Pulling (ref name) from an internet server using static deltas
---

1–2. As for §(Pulling (ref name) from an internet server using commits).
3. The summary file is pulled from the configured server URI (or one of its mirrors) as before. It contains the ref name in its ref map, and additionally contains its origin ID as a separate metadata key. This is ignored by the client.
   It also contains the appropriate static delta name in its main static delta map.
4. The summary file signature is pulled from the server as before, and used to verify the summary file (if the client is old). If the client is new, it should additionally verify the inline signature for the static delta map.
5. The static delta pointed at by the summary file is pulled from the server, as before.

Publishing on a P2P server
---

1. The user has four local branches already: remoteA:ref1, remoteA:ref2, remoteB:ref2, remoteB:ref3.
2. They publish an archive-z2 repository whose summary file has originA (arbitrarily) picked as the main origin: its origin ID key is 'originA', its main refs map is { 'ref1': 'commit1', 'ref2': 'commit2' }, and its origin refs map is { 'originB': { 'ref2': 'commit3', 'ref3': 'commit3' } }. originB/ref3 deliberately points at the same commit as originB/ref2.
3. The summary file additionally contains ostree.static-deltas, xa.cache, xa.title and xa.default-branch copied from the origin repositories (where available), and potentially merged into new maps from origin ID to value. ostree.summary.last-modified is set to the current timestamp.
4. The repository has empty refs/heads and refs/remotes directories, but the following in refs/mirrors: refs/mirrors/originA/ref1, refs/mirrors/originA/ref2, refs/mirrors/originB/ref2, refs/mirrors/originB/ref3.
5. The commit metadata for commit1 has a refs key of [ ('originA', 'ref1') ]. commit2 has [ ('originA', 'ref2') ]. commit3 has [ ('originB', 'ref2'), ('originB', 'ref3') ].

Pulling (origin ID, ref name) from a P2P server using commits
---

1. If the user has not pulled a ref from this origin before, they must configure a new remote with the appropriate GPG keyring and a name matching the origin ID. The remote configuration does not have to include an upstream URI for the origin, but including one would allow pulls from the origin in future (and OSTree currently requires a URI to be specified).
2. ostree_repo_find_remotes() is called with the origin ID and ref name, and resolves some URIs to pull from and an appropriate order/parallelisation to try them in.
3. As part of this resolution process, the summary file is pulled from each potential remote. No signature file is pulled. If the origin ID metadata key in the summary file matches the requested origin ID, and the requested ref name is in the summary’s refs map, that’s a match. Otherwise, if the requested origin ID is in the origin refs map, and its refs map contains the requested ref name, that’s a match.
4. The commit metadata pointed at by each potential remote’s summary file is pulled from that remote. It must contain a key listing the requested origin ID and ref name, and be signed using the GPG key configured for that origin.
5. The commit data is pulled from the server to a local branch named for the refspec (which is ‘$origin_id:$ref_name’, as the locally configured remote name is the origin ID).

Pulling (origin ID, ref name) from a P2P server using static deltas
---

1–3. As for §(Pulling (origin ID, ref name) from a P2P server using commits).
4. The inline signature for the static delta map for the given origin ID (the main static delta map, if the given origin ID is the main origin ID for that summary file) must be verified using the GPG key for that origin.
5. The static delta pointed at by the summary file is pulled from the server, as before, along with the commit metadata for its from and to commits. The from and to commits must contain a key listing the requested origin ID and ref name, and be signed using the GPG key configured for that origin. The static delta itself is unsigned.

Pulling (origin ID, ref name) from a USB stick
---

Just like pulling it from a P2P server, including the initial setup of the remote configuration. The archive-z2 repository is read from the USB stick using file:// URIs, and the same verification happens.
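The publishing scenario and steps 1, 3 and 5 of the P2P pull scenario can be modelled in a few lines of Python. This is an illustrative sketch only: summaries and remote configurations are plain dicts, the `origin-id` and `origin-refs` keys and all function names are hypothetical, the main origin is picked by sorting (the design only says the choice is arbitrary), and GPG verification (step 4) is left out entirely.

```python
from typing import Dict, Optional

ORIGIN_ID_KEY = "origin-id"      # hypothetical: ID of the summary's main origin
ORIGIN_REFS_KEY = "origin-refs"  # hypothetical: origin ID -> ref map

RefMap = Dict[str, str]  # ref name -> commit checksum


def build_p2p_summary(refs_by_origin: Dict[str, RefMap]) -> dict:
    """Publishing: merge refs from several origins into one unsigned summary.

    One origin becomes the main one (here the first in sorted order, which is
    just one arbitrary choice); the others go into the per-origin ref map.
    """
    origins = sorted(refs_by_origin)
    if not origins:
        return {"refs": {}, "metadata": {}}
    main, others = origins[0], origins[1:]
    metadata: dict = {ORIGIN_ID_KEY: main}
    if others:
        metadata[ORIGIN_REFS_KEY] = {o: dict(refs_by_origin[o]) for o in others}
    return {"refs": dict(refs_by_origin[main]), "metadata": metadata}


def find_commit(summary: dict, origin_id: str, ref_name: str) -> Optional[str]:
    """Step 3: does this peer's summary offer (origin_id, ref_name)?"""
    metadata = summary["metadata"]
    if metadata.get(ORIGIN_ID_KEY) == origin_id and ref_name in summary["refs"]:
        return summary["refs"][ref_name]
    return metadata.get(ORIGIN_REFS_KEY, {}).get(origin_id, {}).get(ref_name)


def remote_for_origin(remotes: Dict[str, dict], origin_id: str) -> Optional[dict]:
    """Step 1: only a remote whose *name* equals the origin ID is used, since
    its keyring is what verifies the pulled commits. None means no pull."""
    return remotes.get(origin_id)


def local_refspec(origin_id: str, ref_name: str) -> str:
    """Step 5: the local remote name is the origin ID."""
    return f"{origin_id}:{ref_name}"


summary = build_p2p_summary({
    "originA": {"ref1": "commit1", "ref2": "commit2"},
    "originB": {"ref2": "commit3", "ref3": "commit3"},
})
print(summary["metadata"][ORIGIN_ID_KEY])       # originA
print(find_commit(summary, "originB", "ref2"))  # commit3
print(find_commit(summary, "originA", "ref2"))  # commit2
print(local_refspec("originB", "ref3"))         # originB:ref3
```

Because the main ref map is still populated, an old client which ignores the origin metadata keys would see originA’s refs exactly as it does today.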
Installing a flatpak application from an internet server
---

1. Add the remote configuration using a .flatpakref or .flatpakrepo file. If these files include an origin ID, it’s used as the remote name or added as a key to the local configuration (if the flatpak/OSTree versions are new enough to support this).
2. Pulling continues as for §(Pulling (ref name) from an internet server using commits/static deltas), pulling the appstream/$arch branch before the app’s branch.
3. The summary file from the origin is cached locally. Its signature is ignored, unless the flatpak/OSTree versions are old. The local cache is now open to tampering by a local user (but only to the same extent that the download is open to a network MITM attack).

Installing a flatpak application from a P2P server
---

1. Add the remote configuration using a .flatpakref or .flatpakrepo file. These files must include an origin ID, which is used as the remote name or added as a key to the local configuration.
2. Pulling continues as for §(Pulling (origin ID, ref name) from a P2P server using commits/static deltas), pulling the appstream/$arch branch before the app’s branch.

Philip

Cookies: 🍪🍪🍪