On Fri, 2017-03-31 at 14:46 -0400, Colin Walters wrote:
On Thu, Mar 30, 2017, at 08:03 PM, Philip Withnall wrote:
How rigorous is the structure for naming refs?
One can't *parse* them as there's no requirement for where e.g. the architecture lives, or whether the ref even has one. But I think we've established a precedent that ensures they're sufficiently unique.
Right, so we can assume they’re unique, opaque identifiers, which can typically (but not necessarily) share some common prefix (though that is seeming less useful if it can’t be guaranteed). Would it make sense to document some recommendations for how we strongly suggest refs should be named? (Does documentation like this exist already?)
That said...there are some deep questions here, like for the `org.gnome.Calculator/x86_64/stable` flatpak, what happens if e.g. Endless ships a flatpak of it, and then one adds the GNOME upstream remote? (Or vice versa)
I don’t know what flatpak currently does in that situation, although I believe it strongly binds each app to a single remote, so it probably would not be happy with a second remote. I think in the world I’m aiming for, adding the GNOME upstream remote would just give libostree one more place to pull from *if* the OSTrees hosted by Endless and GNOME were signed by the same key. If they were not, adding the second remote should result in an error, and the remotes should be treated as if they’re hosting completely different apps.
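A minimal sketch of that key-based rule, in Python rather than libostree's C, and purely illustrative: `Remote`, `add_source`, `KeyMismatchError`, the fingerprints and the example.com URLs are all hypothetical names, not flatpak or libostree API. It assumes each remote can be reduced to the fingerprint of the key that signs its commits.

```python
from dataclasses import dataclass
from typing import Dict, List


class KeyMismatchError(Exception):
    """A remote offers a ref already bound to a different signing key."""


@dataclass(frozen=True)
class Remote:
    name: str
    url: str
    key_fingerprint: str  # hypothetical: fingerprint of the remote's signing key


def add_source(sources: Dict[str, List[Remote]], ref: str, new_remote: Remote) -> None:
    """Register new_remote as one more place to pull `ref` from.

    The remote is accepted only if it is signed by the same key as the
    remotes already providing this ref; otherwise an error is raised and
    the existing sources are left untouched.
    """
    existing = sources.setdefault(ref, [])
    if existing and existing[0].key_fingerprint != new_remote.key_fingerprint:
        raise KeyMismatchError(
            f"{new_remote.name} signs {ref} with a different key than {existing[0].name}"
        )
    existing.append(new_remote)
```

With this shape, a second remote signed by the same key simply becomes another mirror for the ref, while a remote signed by a different key is rejected rather than silently merged.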
Now, we could try to say that only flatpaks delivered via GNOME can be org.gnome but without an enforcement mechanism we're certainly going to end up in situations where there are some conflicts.
I think we want to base this around key ownership, actually, and bind that to domain ownership separately. i.e. Any flatpak which is signed with the GNOME key is considered a GNOME flatpak (regardless of which domain the OSTree repository is actually hosted on); and conversely a gnome.org repository could host flatpaks signed with a non-GNOME key, which would not be considered to be GNOME flatpaks. (This situation would require two separate OSTree repositories on the same domain, signed with different keys.) I think that binding the keys to domain ownership would be part convention (when you create a key, name it after a domain you have control over) and part crypto (something like [OpenPGP Web Key Service](https://tools.ietf.org/id/draft-koch-openpgp-webkey-service-02.txt)).
(Right? Actually, I need to dive into this part of flatpak more...)
Me too.
Would it be possible to use the common prefix of the refs as an advert? That would avoid having to generate some kind of UUID for the repository, but should also mean the DNS-SD records are relatively small, assuming that each repository typically contains highly related refs which share a common prefix.
Perhaps we could advertise a [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) of the set of refs?
Seems reasonable. For a false positive rate of 1%, that’s around 10b per key. If we cap the bloom filter at 600b (which is roughly half the maximum size of an mDNS packet), that’s space for 60 refs. If we let the false positive rate go to 10%, that’s 5b per key, or 120 refs in total. Both of those numbers seem in the right ballpark for the number of refs in a repository, given that this is an optimisation.
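The per-key figures above line up with the standard Bloom filter sizing formula, m/n = -ln(p) / (ln 2)^2, if "b" is read as bits; that reading is an assumption. The sketch below works through the arithmetic and adds a small filter over ref names. `RefBloomFilter` and its SHA-256-based hashing are illustrative choices, not anything libostree defines.

```python
import hashlib
import math


def bits_per_key(false_positive_rate: float) -> float:
    """Optimal Bloom filter size per stored key, in bits."""
    return -math.log(false_positive_rate) / (math.log(2) ** 2)


def capacity(budget_bits: int, false_positive_rate: float) -> int:
    """Number of keys that fit in budget_bits at the target rate."""
    return int(budget_bits // bits_per_key(false_positive_rate))


class RefBloomFilter:
    """Tiny Bloom filter over ref names (hypothetical helper)."""

    def __init__(self, size_bits: int, num_hashes: int):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, ref: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{ref}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, ref: str) -> None:
        for pos in self._positions(ref):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, ref: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(ref)
        )
```

The formula gives roughly 9.6 bits per key at 1% and 4.8 bits per key at 10%, so a 600-bit filter holds about 62 or 125 refs respectively, close to the 60 and 120 estimated above. If the 600 cap were instead meant as bytes (4800 bits), the same formula would give room for roughly 500 refs at 1%.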
In that case, the process for working out which peers to download from would be (ignoring internet and locally-mounted repositories for the...
Yep, agree with 1..5 there!
Thanks, I guess the next step is working out the API. What we’ve ended up with is: a function which takes an OstreeRepo and 1 or more refs to update, and which does steps 1..5 (except actually pulling from the repos), returning a set of URIs to pull from, which can be treated as mirrors. I’m thinking of implementing that behind an interface, which can have multiple implementations: one for existing HTTPS remotes, one for Avahi, one for USB.
Note that this function doesn’t need to take a remote_name argument except for looking up the details for HTTPS remotes. None of the keys in a [remote "name"] section in /ostree/repo/config are relevant to Avahi or USB lookups (for which GPG would be mandatory).
I’ll start implementing this and see how the API pans out.
One question: I’m thinking of making the API async (GAsyncResult) so many queries can be run in parallel in the main loop easily. However, the rest of the libostree API seems to be almost entirely synchronous. What’s your preferred approach here? If I made the API asynchronous, it could still be run synchronously in a separate GMainContext in a worker thread if needed.
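To make the shape of that interface concrete, here is a rough sketch using Python's asyncio as a stand-in for GAsyncResult and the GLib main loop. None of this is libostree API: `RepoFinder`, `StaticFinder` and `find_mirrors` are hypothetical names, and the fixed lookup table stands in for real HTTPS, Avahi or USB discovery.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Set


class RepoFinder(ABC):
    """One discovery backend (e.g. HTTPS remotes, Avahi, USB)."""

    @abstractmethod
    async def resolve(self, refs: Iterable[str]) -> Set[str]:
        """Return URIs of repositories which may provide any of `refs`."""


class StaticFinder(RepoFinder):
    """Stand-in backend driven by a fixed ref -> URIs table."""

    def __init__(self, table: Dict[str, List[str]]):
        self._table = table

    async def resolve(self, refs: Iterable[str]) -> Set[str]:
        await asyncio.sleep(0)  # placeholder for real network or disk I/O
        return {uri for ref in refs for uri in self._table.get(ref, [])}


async def find_mirrors(finders: Iterable[RepoFinder], refs: Iterable[str]) -> Set[str]:
    """Query every backend in parallel and merge the candidate URIs.

    Only discovery happens here; pulling from the returned URIs is left
    to the caller, which can treat them as mirrors.
    """
    refs = list(refs)  # allow each backend to iterate the refs independently
    results = await asyncio.gather(*(finder.resolve(refs) for finder in finders))
    return set().union(*results)
```

A synchronous caller can still drive the same code with `asyncio.run(find_mirrors(finders, refs))`, which is loosely analogous to iterating a private GMainContext in a worker thread.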
Step 4 could be optimised by including the commit timestamp as additional metadata for each ref in the summary file, so we can order the commit checksums before downloading them. I don't know if summary files routinely contain these timestamps already (it doesn't look like the EOS ones do).
Yeah; we can add metadata to the summary file but - my instincts are telling me that what we have in 1..5 even without introducing bloom filters is going to work fine on e.g. an 802.11b wireless network with 30 machines.
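The ordering idea can be sketched as follows, assuming each peer's summary advertises a (checksum, commit timestamp) pair for the ref. The function name and data layout are hypothetical, and since the details of step 4 are not shown here, only the sort itself is illustrated.

```python
from typing import Dict, List, Tuple


def order_commits_newest_first(
    advertised: Dict[str, Tuple[str, int]]
) -> List[Tuple[str, List[str]]]:
    """Group peers by the commit they advertise, newest commit first.

    `advertised` maps a peer URI to (commit checksum, commit timestamp)
    for a single ref. The result lists each checksum once, together with
    the peers offering it, so the newest commit can be tried first.
    """
    by_checksum: Dict[str, Tuple[int, List[str]]] = {}
    for uri, (checksum, timestamp) in advertised.items():
        entry = by_checksum.setdefault(checksum, (timestamp, []))
        entry[1].append(uri)
    ordered = sorted(by_checksum.items(), key=lambda item: item[1][0], reverse=True)
    return [(checksum, sorted(uris)) for checksum, (_timestamp, uris) in ordered]
```

Without timestamps in the summary, the client would have to fetch each candidate commit's metadata before it could tell which one is newest.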
OK. In any case, this seems like a good basis to work from, with the metadata in the summary file as a nice way of extending things to optimise.
That would work well, and would complement what I put in step 2 about having the summary timestamp in a DNS-SD record.
I think we may want both the timestamp of the summary *and* the timestamp of the newest commit. But that said, again I'd like to get to a first pass we know works and will be free of any embarrassingly bad flaws, and revisit later optimizations based on more real-world results.
Agreed.
I'm not sold on the idea of an expected update time. That sounds like the kind of policy decision which should be made at a higher level in the stack (like gnome-software). When libostree is asked to resolve a repository to pull from, I think it should make a best effort at doing that.
Right. The reason I mentioned this is I'd like to standardize an attribute for this, since at least on the Fedora/CentOS/RHEL side we have regular cadences, and it'd be useful to show in the UI.
Ah, that makes sense. I’m in favour of standardised metadata for the expected update time (which I guess could also be considered a TTL for cache data), just as long as the policy decisions on when to take it into account are made at a higher level in the stack than libostree.
Yeah, definitely.
One more thing to note is that DNS-SD records are not secure ...
Right, but if you have a hostile machine on the same local network, they can do lots of ugly other stuff like DNS/DHCP spoofing too unless the network is specifically configured to prevent it.
Indeed. At this point, I think we need to make sure that the solution we come up with is *possible* to make secure, but is not necessarily bulletproof to begin with.
Philip