Re: Summary deltas



On Mon, 2020-08-24 at 16:00 -0600, Dan Nicholson wrote:
On Tue, Feb 18, 2020 at 1:58 AM Alexander Larsson via ostree-list
<ostree-list gnome org> wrote:
So, flatpak is getting rather big these days, and one of the places
where this shows up is the summary file churn. The current flathub
summary file is 5.2 megabytes uncompressed. We serve it with gzip
content encoding at about 1.4 megabytes. However, *any* change to the
set of apps on flathub produces a completely new summary file which
all clients have to download.

There are some inefficiencies in how summaries are stored (for
example, they list all the deltas, which you don't typically need).
However, fixing that is a lot more work and an incompatible change.
And there is much lower-hanging fruit.

Today I did a quick test and ran bsdiff on two consecutive flathub
summary files after a single app got updated:

-rw-r--r--. 1 alex alex 5433240 18 feb 09.29 flathub-1
-rw-r--r--. 1 alex alex 5433240 18 feb 09.29 flathub-2
-rw-rw-r--. 1 alex alex    2111 18 feb 09.30 flathub-1to2.bsdiff

Due to the way summary files are stored (uncompressed, sorted, etc.)
they naturally diff very well. It would be very easy for flathub to
store deltas from, say, the 100 latest summary files to the current
one, which would make the summary update *much* more efficient.
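
A minimal sketch of the "deltas from the last 100 summaries" idea, in
Python. Everything here is an assumption made for illustration: the
real summary is a binary file and the test above used bsdiff, whereas
this sketch uses a text stand-in (one sorted "ref checksum" line per
ref) and a simple line-based delta built with difflib. The names
make_summary, SummaryServer and update_summary are hypothetical and
are not part of ostree or flatpak.

```python
import difflib
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_summary(refs: dict) -> bytes:
    # Text stand-in for a summary: one sorted "ref checksum" line per ref.
    return "".join(f"{ref} {refs[ref]}\n" for ref in sorted(refs)).encode()


def make_delta(old: bytes, new: bytes) -> list:
    # Line-based delta (a stand-in for bsdiff): copy ranges of old lines,
    # or ship literal new lines.
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif j2 > j1:  # "replace" or "insert": send the new lines
            ops.append(("data", new_lines[j1:j2]))
        # "delete" needs no op: the old lines are simply not copied
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    old_lines = old.splitlines(keepends=True)
    out = []
    for op in ops:
        if op[0] == "copy":
            out.extend(old_lines[op[1]:op[2]])
        else:
            out.extend(op[1])
    return b"".join(out)


class SummaryServer:
    # Hypothetical server side: keeps deltas from the last `keep`
    # summaries to the current one, keyed by the old summary's checksum.
    def __init__(self, keep: int = 100):
        self.keep = keep
        self.history = []
        self.current = b""
        self.current_digest = digest(b"")
        self.deltas = {}

    def publish(self, summary: bytes) -> None:
        if self.current:
            self.history = (self.history + [self.current])[-self.keep:]
        self.current = summary
        self.current_digest = digest(summary)
        self.deltas = {digest(old): make_delta(old, summary)
                       for old in self.history}


def update_summary(server: SummaryServer, cached: bytes) -> tuple:
    # Hypothetical client side: use a delta if the server has one for the
    # cached summary, otherwise download the full file.
    ops = server.deltas.get(digest(cached))
    if ops is not None:
        new = apply_delta(cached, ops)
        if digest(new) == server.current_digest:
            return new, "delta"
    return server.current, "full"
```

A client whose cached summary is among the stored versions only
transfers the changed lines; any other client silently falls back to
the full download, so the scheme degrades to today's behaviour.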

What about putting the summary as a file in the ostree-metadata
commit? You're already generating it and then you can just use static
deltas without inventing any new object types or processing. And it
fixes all the issues of races between fetching and updating of the
detached summary signature file.

For a client to fetch remote metadata in a backwards compatible way,
it would fall back to fetching the summary if the ostree-metadata
commit didn't exist or didn't contain the summary file. On the server
side, you'd have to continue publishing the standalone summary file
for old clients.
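
The fallback described above could look roughly like this on the
client. This is only a sketch: both fetch callables are hypothetical
placeholders supplied by the caller, not ostree or flatpak APIs, and
the first one is assumed to return None both when the ostree-metadata
commit is missing and when it does not contain a summary file.

```python
from typing import Callable, Optional, Tuple


def load_remote_summary(
    fetch_from_metadata_commit: Callable[[], Optional[bytes]],
    fetch_standalone_summary: Callable[[], bytes],
) -> Tuple[bytes, str]:
    # Prefer the summary stored in the ostree-metadata commit.
    summary = fetch_from_metadata_commit()
    if summary is not None:
        return summary, "ostree-metadata"
    # Old servers: fall back to the standalone summary file.
    return fetch_standalone_summary(), "summary"
```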

I'm not sure I understand exactly what you mean by storing the summary
file in the ostree-metadata commit. The summary file is how you find
the commit for a given ref, including the ostree-metadata ref. So, you
need the summary first.

But also, I think we should avoid overusing the ostree-metadata ref.
It was originally created for p2p, but flatpak has more or less
stopped using it, at least for non-local remotes.

The problem with it is that accessing it is quite complex compared to
the "load-single-file" approach of summary files. You basically have to
do a full ostree pull operation to get at it, which is a non-atomic
operation that modifies the state of the local repo, which may be in
use by some other process. 

For example, things get really complex when we have a p2p operation
with multiple peers with different versions of the ostree-metadata
branch, because you can't pull them all into one repo at the same time.
Whereas, with summary files you just load the file and keep it in
memory while working on it.



