Re: Repo scalability issues and solutions

On Tue, 2020-09-15 at 16:42 +0200, Alexander Larsson wrote:
On Tue, 2020-09-15 at 09:49 -0400, Colin Walters wrote:

Some of your proposed changes make total sense, but I think given the
possibility to "greenfield" things here more it's probably worth
thinking about higher level and more long-term changes.  We don't
want to accumulate too many formats because the pull code is
incredibly complicated and under-documented.  Data formats are

As I've been slowly implementing these things over the last weeks I
ended up simplifying which changes we make on the ostree side. Currently
I'm ending up with these changes to the actual summary file format:

 * Split out deltas into their own per-target-commit index file
 * Add new shorter key name for per-ref `ostree.commit.timestamp`
 * Add a version key in the metadata

These are pretty simple optional features that don't really change
the code that accesses the summary format much.
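
To illustrate the first item, here is a minimal Python sketch of grouping
delta entries by their target commit, so that each target gets its own
small index instead of every delta living in the one summary file. It is
not the ostree implementation: the function name and the sample data are
hypothetical, the real files are binary rather than Python dicts, and
the sketch assumes delta names of the form "FROM-TO", or just "TO" for a
from-scratch delta.

```python
from collections import defaultdict


def split_deltas_by_target(static_deltas):
    """Group delta entries by target commit (hypothetical helper).

    static_deltas maps a delta name to the checksum of its superblock.
    The name is assumed to be "FROM-TO", or just "TO" for a delta from
    scratch, so the target is whatever follows the last "-".
    """
    per_target = defaultdict(dict)
    for name, superblock_checksum in static_deltas.items():
        target = name.rsplit("-", 1)[-1]
        per_target[target][name] = superblock_checksum
    return dict(per_target)


# Made-up, shortened commit ids and superblock checksums.
deltas = {
    "aaa-bbb": "s1",
    "bbb": "s2",
    "bbb-ccc": "s3",
}
index_files = split_deltas_by_target(deltas)
```

With this layout a client that wants commit "bbb" only needs the index
for "bbb", which in this example lists two deltas.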

Then I've created a summary index, that basically is a summary of
summaries, one default one with all the refs, and optionally some
partial summaries for named subsets. These summaries are accessed by
checksum, so that they cache well, can easily be delta:ed, etc.
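
As a rough illustration of the idea (again not the actual ostree code),
the following Python sketch builds an index that maps a subset name to
the checksum of a summary, and stores each summary under its checksum.
Everything named here is an assumption made for the example: the JSON
serialization stands in for the real binary format, SHA-256 is assumed
as the checksum, and selecting a subset by ref prefix is only one
possible way to define "named subsets".

```python
import hashlib
import json


def serialize(summary):
    # Stand-in for the real serialization; sorted keys keep it deterministic.
    return json.dumps(summary, sort_keys=True).encode("utf-8")


def build_summary_index(all_refs, subsets):
    """Return (index, store): index maps name -> checksum, store maps
    checksum -> serialized summary. subsets maps a subset name to a
    ref prefix (a hypothetical selection rule)."""
    index = {}
    store = {}

    def add(name, refs):
        data = serialize(refs)
        checksum = hashlib.sha256(data).hexdigest()
        store[checksum] = data
        index[name] = checksum

    add("default", all_refs)
    for name, prefix in subsets.items():
        add(name, {r: c for r, c in all_refs.items() if r.startswith(prefix)})
    return index, store


def fetch_summary(index, store, name="default"):
    """Look a summary up by name, then fetch and verify it by checksum."""
    checksum = index[name]
    data = store[checksum]
    if hashlib.sha256(data).hexdigest() != checksum:
        raise ValueError("summary checksum mismatch")
    return json.loads(data)


# Made-up refs and commit ids.
refs = {
    "app/org.example.Foo/x86_64/stable": "c1",
    "app/org.example.Foo/aarch64/stable": "c2",
    "runtime/org.example.Platform/x86_64/1.0": "c3",
}
index, store = build_summary_index(refs, {"apps": "app/"})
apps = fetch_summary(index, store, "apps")
```

Because a summary's file name is derived from its content, an unchanged
summary keeps the same name across index updates, so only the small
index needs to be re-fetched to find out whether anything changed.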

This does somewhat complicate the pull logic, but not that much
(also the work I've been doing has been cleaning up the pull code a
bit). I'll try to finish the branch with this work tomorrow and get
it into a state where it can be reviewed and discussed.

Ok, I now got a branch where the above is working:
 (Note: This is on top of the delta-indexes PR)

It doesn't yet do deltas, but the file formats are prepared for
it, and the code is shared such that it will be easy to add.

I think this shows that the changes needed are not *that* scary, and I
think that before any discussion/meeting this should be looked at as a
more detailed proposal.
