Re: Minimal changes version of new summary format for flatpak



On Fri, 2020-10-09 at 13:14 -0400, Colin Walters wrote:


On Thu, Oct 8, 2020, at 11:09 AM, Alexander Larsson via ostree-list wrote:
I've been doing some research on the summary discussions from
yesterday's meeting, and it looks like the changes needed for flatpak
to implement its own summary format are pretty small, while still
being very flexible.

Nice!  That said, I also wanted to dig a bit more into a thread that
was raised in the discussion: what if we structure things so that a
summary is unnecessary unless you want global state (a list of all
refs, for example)?

Well, that is really exactly what this code is: this, plus the tools
needed to reimplement ostree_repo_regenerate_summary().

First of all, flatpak already always resolves refs to commit IDs
before calling ostree to pull, so ostree only needs the summary file
for these things:
 * Finding deltas
 * Listing all refs when mirroring
 * Quick access to the "mode" and "tombstone-commits" options;
   otherwise it has to download the config file separately.

Right... hm, this is global state, but it should be cacheable for a
long time.  Once we use etags we could probably go back to fetching
repo/config but just cache it?  And perhaps if we have a cached
version, go ahead and issue the further requests (i.e. avoid a
blocking roundtrip for this).
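For illustration, the etag revalidation idea could look roughly like the C sketch below. It is hypothetical: struct cached_config and both helpers are invented names, not ostree API, and the HTTP transfer itself is left out. It only builds the If-None-Match header and interprets the status code, using standard HTTP semantics where 304 Not Modified means the cached copy is still valid.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical client-side cache entry for repo/config (not an ostree API). */
struct cached_config {
  char etag[128];  /* ETag from the last 200 response, "" if unknown */
  int have_body;   /* non-zero once a copy of the config is cached */
};

enum config_action { CONFIG_USE_CACHED, CONFIG_STORE_NEW, CONFIG_FAIL };

/* Write an If-None-Match header into buf. Returns 1 on success, 0 when
 * there is nothing to revalidate (send an unconditional GET instead) or
 * the header did not fit into buf. */
static int
build_if_none_match (const struct cached_config *c, char *buf, size_t len)
{
  assert (c != NULL && buf != NULL);
  if (!c->have_body || strlen (c->etag) == 0)
    return 0;
  int n = snprintf (buf, len, "If-None-Match: %s", c->etag);
  return n > 0 && (size_t) n < len;
}

/* Map the HTTP status of the config request to an action. */
static enum config_action
handle_config_status (int http_status, const struct cached_config *c)
{
  assert (c != NULL);
  if (http_status == 304)  /* Not Modified: the cached copy is still current */
    return c->have_body ? CONFIG_USE_CACHED : CONFIG_FAIL;
  if (http_status == 200)  /* new body: store it together with its new ETag */
    return CONFIG_STORE_NEW;
  return CONFIG_FAIL;      /* anything else is treated as an error here */
}
```

The non-blocking variant suggested above would start the dependent requests using the cached config right away and only act on the result of this check afterwards.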

I'm not against using etags to minimize data transfer. However, I
question how much it helps in the case of the config file. In the
flathub case the config file is 315 bytes, whereas I estimate the total
size of the etag headers at ~100 bytes. Both fit within a single IP
packet, so they are essentially the same in terms of download time.
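The packet argument is simple arithmetic. A quick check in C, assuming a 1500-byte Ethernet MTU with 20-byte IPv4 and 20-byte TCP headers and no TCP options (so 1460 bytes of payload per segment), and ignoring the rest of the HTTP response headers and any TLS overhead:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed link parameters; real paths vary (IPv6, TCP options, tunnels). */
enum {
  MTU_BYTES = 1500,
  IPV4_HEADER_BYTES = 20,
  TCP_HEADER_BYTES = 20,
  MSS_BYTES = MTU_BYTES - IPV4_HEADER_BYTES - TCP_HEADER_BYTES /* 1460 */
};

/* Figures from the text: flathub's config file and the estimated etag overhead. */
enum { CONFIG_BYTES = 315, ETAG_OVERHEAD_BYTES = 100 };

/* True if a payload of this size fits into one TCP segment. */
static bool
fits_in_one_segment (unsigned payload_bytes)
{
  return payload_bytes <= (unsigned) MSS_BYTES;
}
```

Under these assumptions both the full config body and the conditional-request overhead fit in one segment, so the saving is a few bytes rather than a roundtrip.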

Of course, if we decide that such global state should be "basically
static", we can be a bit cruder and just download it at most once a
day. Then we can skip an entire roundtrip, which would actually be
useful.
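A "download at most once a day" policy could be a simple age check on the locally cached copy. A minimal sketch, assuming the cached file's mtime records when it was fetched; the helper names and the 24-hour constant are illustrative, not existing ostree code:

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

#define CONFIG_MAX_AGE_SECS ((time_t) 24 * 60 * 60)  /* "at most once a day" */

/* Pure policy check: true if a copy fetched at fetched_at must be refreshed. */
static bool
is_stale (time_t fetched_at, time_t now, time_t max_age)
{
  if (now < fetched_at)  /* clock went backwards: don't trust the cache */
    return true;
  return now - fetched_at >= max_age;
}

/* Hypothetical wrapper: uses the cached file's mtime as the fetch time. */
static bool
cached_file_is_stale (const char *path, time_t now)
{
  struct stat st;

  assert (path != NULL);
  if (stat (path, &st) != 0)
    return true;  /* no cached copy (or it can't be read): fetch it */
  return is_stale (st.st_mtime, now, CONFIG_MAX_AGE_SECS);
}
```

Keeping the policy in a pure function makes it easy to test separately from the filesystem access.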

This seems like a minor detail in the overall picture, though. Until
two months ago (commit 8d09a1a8eaaedcc8fb1fc111a61616e9f7ad9a4e),
ostree downloaded the config file on every pull, and it wasn't a huge
problem.

So, my proposal is that:

 * We merge support for deltas outside the summary. Unless specifically
   configured otherwise, ostree still adds deltas to the summary for
   backwards compatibility.

Here's an alternative idea: this is out-of-band metadata keyed by a
commit, but we already have detached metadata for commits.  Why not
store the delta metadata there?  If we properly implement etags (per
https://github.com/ostreedev/ostree/pull/2205 ), then it'd be
cacheable.  Granted, this metadata isn't very useful on the client
once it's downloaded, but eh... I don't think it will be very large.

Storing the delta indexes in the detached metadata was actually my
initial plan. On the surface it seems like an obvious solution.
However, there are some details that make it less than ideal:

The information about deltas is essentially an index, which we
regenerate each time we update the summary (or similar). At update time
we enumerate all deltas, loop over every commit that is the target of a
delta, and replace the delta information for it. So far everything is
fine. However, we then need to clear out old information about removed
deltas, which turns into quite a large operation, as we have to scan
*all* commitmeta files to see which ones happen to contain stale delta
information. This is where having separate files for the delta indexes
helps a lot: you can just readdir beforehand and remove the files that
weren't updated.
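To make the cleanup argument concrete, here is a C sketch of the two steps with one index file per target commit. Everything in it is hypothetical and in memory: struct delta, the helpers, and plain-string checksums stand in for the real repo layout, and the `existing` array stands in for the readdir result. With the delta information stored in commitmeta instead, step 2 would have to look at every commit in the repo rather than one directory listing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A delta from one commit to another; NULL `from` means a from-scratch delta.
 * Checksums are plain strings here; the on-disk naming is not modelled. */
struct delta { const char *from; const char *to; };

static bool
contains (const char *const *list, size_t n, const char *s)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (list[i], s) == 0)
      return true;
  return false;
}

/* Step 1: collect the distinct target commits; one index file per target.
 * `targets` must have room for n_deltas entries. Returns the count. */
static size_t
collect_targets (const struct delta *deltas, size_t n_deltas,
                 const char **targets)
{
  size_t n = 0;

  assert (deltas != NULL || n_deltas == 0);
  for (size_t i = 0; i < n_deltas; i++)
    if (!contains (targets, n, deltas[i].to))
      targets[n++] = deltas[i].to;
  return n;
}

/* Step 2: any index that existed before (the readdir result) but was not
 * regenerated in step 1 is stale. `stale` needs room for n_existing. */
static size_t
find_stale_indexes (const char *const *existing, size_t n_existing,
                    const char *const *targets, size_t n_targets,
                    const char **stale)
{
  size_t n = 0;

  for (size_t i = 0; i < n_existing; i++)
    if (!contains (targets, n_targets, existing[i]))
      stale[n++] = existing[i];
  return n;
}
```

In a real implementation, step 1 would write an index file for each entry in `targets` and step 2 would unlink each path in `stale`.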

In terms of distribution, it makes things a lot easier to say that
everything in the objects directory is immutable forever in the CDN,
and then we can do different kinds of cache invalidation in the various
delta directories. I know this is not strictly true, as the detached
commitmeta files can change, but in practice, once they are initially
signed, this never happens.
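As an illustration of that split, an origin server or CDN rule could pick Cache-Control headers by repo-relative path roughly like this. The max-age values are made up, and the `.commitmeta` exclusion reflects the theoretical caveat above; a deployment relying on the "in practice" argument could drop it. `immutable` is the standard Cache-Control extension from RFC 8246.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static bool
has_prefix (const char *s, const char *prefix)
{
  return strncmp (s, prefix, strlen (prefix)) == 0;
}

static bool
has_suffix (const char *s, const char *suffix)
{
  size_t ls = strlen (s), lx = strlen (suffix);
  return ls >= lx && strcmp (s + ls - lx, suffix) == 0;
}

/* Hypothetical policy: content-addressed objects never change, so they can
 * be cached forever; everything else (summary, config, delta indexes,
 * detached commitmeta) is revalidated after a short, illustrative TTL. */
static const char *
cache_control_for (const char *repo_path)
{
  assert (repo_path != NULL);
  if (has_prefix (repo_path, "objects/")
      && !has_suffix (repo_path, ".commitmeta"))
    return "public, max-age=31536000, immutable";
  return "public, max-age=300";
}
```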

In terms of code changes, the way deltas are currently chosen during
ostree_repo_pull() is quite different from how and when the commitmeta
files are downloaded. So getting deltas from those would require a
larger restructuring of the pull code.




