Summary deltas



So, flatpak is getting rather big these days, and one of the places
where this shows up is summary file churn. The current flathub
summary file is 5.2 megabytes uncompressed. We serve it with gzip
content-encoding at about 1.4 megabytes. However, *any* change to the
set of apps on flathub produces a completely new summary file, which
all clients have to download.

There are some inefficiencies in how summaries are stored (for
example, they list all the deltas, which you don't typically need).
However, fixing that is a lot more work and an incompatible change.
And there is much lower-hanging fruit.

Today I did a quick test and ran bsdiff on two consecutive flathub
summary files after a single app got updated:

-rw-r--r--. 1 alex alex 5433240 18 feb 09.29 flathub-1
-rw-r--r--. 1 alex alex 5433240 18 feb 09.29 flathub-2
-rw-rw-r--. 1 alex alex    2111 18 feb 09.30 flathub-1to2.bsdiff

Due to the way summary files are stored (uncompressed, sorted, etc.)
they naturally diff very well. It would be very easy for flathub to
store deltas from, say, the 100 latest summary files to the current
one, which would make summary updates *much* more efficient: the
delta above is 2111 bytes, versus about 1.4 megabytes for the gzipped
summary.
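To make the "they naturally diff very well" point concrete, here is a
minimal sketch of a copy/insert byte delta. It uses Python's
difflib.SequenceMatcher as a stand-in for bsdiff (a different, much
simpler technique, and far too slow for a 5 MB summary), and the
"summary" is a made-up sorted list, not the real summary format. The
names make_delta, apply_delta and literal_bytes are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as ops against `old`: ("copy", start, end) reuses
    old[start:end], ("insert", data) carries literal bytes.
    Stand-in for bsdiff, for illustration only."""
    # autojunk=False: with the default, very frequent bytes (spaces,
    # newlines, ...) in large inputs are not used to seed matches.
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))
        # "delete": those old bytes are simply never copied
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new file from the old one plus the delta ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def literal_bytes(ops: list) -> int:
    # Bytes that must actually be transferred besides the op headers.
    return sum(len(op[1]) for op in ops if op[0] == "insert")


# Hypothetical sorted, uncompressed "summary" where one app gets a new commit.
old = b"".join(b"app.%03d commit=aaaa\n" % i for i in range(100))
new = old.replace(b"app.042 commit=aaaa", b"app.042 commit=bbbb")
ops = make_delta(old, new)
assert apply_delta(old, ops) == new
print(len(new), literal_bytes(ops))
```

Because the entries are sorted and stored uncompressed, updating one
app leaves everything before and after its entry byte-for-byte
identical, so the delta is essentially two copy ranges plus the few
changed bytes.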

The way I imagine this would work is to compute the checksum of the
summary file and put that checksum, as well as the checksum of the
previous summary, into the summary.sig file. We then store all
summaries locally in a "summaries" directory, named by checksum. From
that we can easily generate as many summary deltas as we want, and
the client can (if it wants) use the checksum of the summary it
already has and the checksum of the new one to pick a matching delta
and apply it, or just download the regular summary file.
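The scheme described above might look roughly like this. It is a
minimal sketch under several assumptions that are not part of the
proposal: SHA-256 as the summary checksum, a hypothetical
"<from>-<to>.delta" naming scheme inside summaries/, the summary.sig
additions modelled as a plain dict with hypothetical keys "checksum"
and "prev-checksum", and a pluggable delta codec (for example
something wrapping bsdiff/bspatch) passed in as functions. publish,
update, fetch and the other names are illustrative, not flatpak or
ostree API.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional

# Hypothetical delta codec signatures, e.g. thin wrappers around bsdiff/bspatch.
MakeDelta = Callable[[bytes, bytes], bytes]   # (old, new) -> delta
ApplyDelta = Callable[[bytes, bytes], bytes]  # (old, delta) -> new
Fetch = Callable[[str], Optional[bytes]]      # name under summaries/ -> bytes or None

MAX_DELTAS = 100  # "say the 100 latest summary files"


def summary_checksum(data: bytes) -> str:
    # Assumption: SHA-256 hex digest of the uncompressed summary.
    return hashlib.sha256(data).hexdigest()


def delta_name(from_csum: str, to_csum: str) -> str:
    # Hypothetical naming scheme for delta files in summaries/.
    return f"{from_csum}-{to_csum}.delta"


def publish(repo: Path, summary: bytes, history: list[str],
            make_delta: MakeDelta) -> dict:
    """Server side: store the new summary as summaries/<checksum> and
    write deltas to it from up to MAX_DELTAS earlier summaries.
    `history` lists earlier checksums, oldest first. Returns the
    (hypothetical) extra fields for summary.sig."""
    store = repo / "summaries"
    store.mkdir(parents=True, exist_ok=True)
    new = summary_checksum(summary)
    (store / new).write_bytes(summary)
    for old in history[-MAX_DELTAS:]:
        if old != new:
            old_data = (store / old).read_bytes()
            delta = make_delta(old_data, summary)
            (store / delta_name(old, new)).write_bytes(delta)
    return {"checksum": new, "prev-checksum": history[-1] if history else None}


def update(local: Optional[bytes], sig: dict, fetch: Fetch,
           apply_delta: ApplyDelta) -> bytes:
    """Client side: prefer a delta from the summary we already have,
    falling back to the full summary file."""
    new = sig["checksum"]
    if local is not None:
        have = summary_checksum(local)
        if have == new:
            return local  # already up to date, nothing to download
        delta = fetch(delta_name(have, new))
        if delta is not None:
            candidate = apply_delta(local, delta)
            # Only trust the patched result if it matches summary.sig.
            if summary_checksum(candidate) == new:
                return candidate
    full = fetch(new)
    if full is None or summary_checksum(full) != new:
        raise RuntimeError("no summary matching summary.sig available")
    return full
```

Checking the patched result against the checksum from summary.sig
means a missing or bad delta only costs a download of the full
summary, never a corrupt one.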

Colin, what do you think of doing something like this? Do you have any
other plans in this area?

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Alexander Larsson                                Red Hat, Inc
       alexl redhat com         alexander larsson gmail com
