Re: Repo scalability issues and solutions



On Tue, 2020-08-25 at 22:45 +0100, Jonathan Dieter wrote:
On Tue, 2020-08-25 at 14:38 +0200, Alexander Larsson via ostree-list wrote:
<snip>
Add deltas for incremental updates of summaries
===============================================

This only helps with the network transfer, but OTOH, it does so
extremely well. A simple bsdiff of the entire summary is probably the
easiest way to do this.

However, for this to work we need to be able to identify the summary
version you have (and which is on the server), and to store multiple
versions of it. The easiest way to do so is to store them by sha256,
just like objects. Then you have some top-level summary index file
that lists the sha256 of the current summary file. In fact, it will
probably list several summary files (for the per-arch summaries),
which is good because this single file allows atomically updating all
the sub-summary files in one change.
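To make the scheme above concrete, here is a minimal sketch of what such an index plus client-side fetch decision could look like. All names (make_index, plan_fetch, the "old-new" delta naming) are hypothetical, and JSON is used only for readability; a real ostree index would presumably be a GVariant like the rest of the repo format:

```python
import hashlib
import json

def summary_sha256(data):
    """Content-address a summary blob, like ostree objects."""
    return hashlib.sha256(data).hexdigest()

def make_index(summaries):
    """Build a top-level index mapping each (per-arch) summary name to
    the sha256 of its current content. Publishing this one file
    atomically updates all sub-summaries in a single change."""
    index = {name: summary_sha256(blob) for name, blob in summaries.items()}
    return json.dumps(index, sort_keys=True).encode()

def plan_fetch(index_data, name, local_sha):
    """Decide what a client needs to download for one summary.

    Returns ('none', new) if already up to date, ('delta', old, new)
    if a bsdiff-style '<old>-<new>' delta could be fetched, or
    ('full', new) for a first-time fetch."""
    index = json.loads(index_data)
    new_sha = index[name]
    if local_sha == new_sha:
        return ("none", new_sha)
    if local_sha is not None:
        return ("delta", local_sha, new_sha)
    return ("full", new_sha)
```

The point of storing summaries by sha256 is visible in plan_fetch: because the client knows exactly which version it has and which the server has, the server can pre-generate a bsdiff between those two specific blobs.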


Does the above make sense to everyone? Do we have any other ideas how
we could do better? Do we have some important feature we would like in
the new format?

Note: While some of these changes apply to ostree, some apply just to
flatpak. However, I want to synchronize the changes so that we only
have to do a single format change.

What about using zchunk?  It basically allows you to download just the
differences between a local older version of a file and a remote newer
version.  It's what we've been using in Fedora for metadata for the
last couple of releases.

(Disclaimer: I wrote it, so I'm obviously biased)

I didn't know about zchunk, but I took a look. It looks like a fine
solution for some things but I don't think it is a good fit here.

First of all, ostree is already a widely deployed system that has a
certain design. The changes I talk about are minor restructurings of
the details of that, not completely dropping in some external library
that does things in a different way.

Secondly, zchunk fundamentally depends on ranged downloads, and that is
not something we currently require of our http servers, nor is it a
good idea as it can interfere with things like CDNs.
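To illustrate what "ranged downloads" demands of the server side, here is a toy sketch of a server honoring an HTTP Range header (per RFC 7233). This is my illustration, not zchunk or ostree code; the point is that every server and CDN in the path must implement 206 partial responses correctly for a range-based scheme to work:

```python
import re

def serve_range(blob, range_header):
    """Minimal sketch of byte-range serving: given the stored blob and
    the client's Range header (or None), return (status, body).
    'Range: bytes=start-end' must yield a 206 with just that slice;
    a server that ignores it and returns 200 with the whole file
    defeats the bandwidth savings entirely."""
    if range_header is None:
        return 200, blob
    m = re.fullmatch(r"bytes=(\d+)-(\d+)", range_header)
    if not m:
        return 416, b""  # unsupported or unsatisfiable range form
    start, end = int(m.group(1)), int(m.group(2))
    if start >= len(blob):
        return 416, b""  # range starts past end of resource
    return 206, blob[start:end + 1]
```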

Third, the rolling checksum chunking that zchunk does is similar to
what we already do in the fallback case for ostree static deltas. But
those perform significantly worse than the primary delta approach we
use, which is bsdiff based. So I don't think zchunk will perform as
well as what we currently use.
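For readers unfamiliar with the technique being compared: rolling-checksum (content-defined) chunking cuts a file at positions determined by its content, so identical regions of the old and new file chunk identically and unchanged chunks need no transfer. A toy sketch, using a simple rolling sum rather than the stronger rolling hashes real implementations use:

```python
def chunk_boundaries(data, window=16, mask=0x3F):
    """Toy content-defined chunking: maintain a rolling sum over a
    small sliding window and cut wherever (sum & mask) == mask.
    Because cut points depend only on local content, matching regions
    in two versions of a file produce the same chunks, which is what
    lets zchunk (and ostree's static-delta fallback) reuse them.
    Illustrative only; real code uses e.g. buzhash, not a plain sum."""
    cuts = []
    rolling = 0
    for i in range(len(data)):
        rolling += data[i]
        if i >= window:
            rolling -= data[i - window]  # drop byte leaving the window
        if i >= window and (rolling & mask) == mask:
            cuts.append(i + 1)
    cuts.append(len(data))  # final chunk always ends at EOF
    return cuts
```

The trade-off the email describes follows from this: chunk-level reuse can only skip whole matching chunks, whereas bsdiff computes a byte-level diff between two known versions, which is why the primary (bsdiff-based) ostree deltas outperform the chunking fallback.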
