Re: Minimal changes version of new summary format for flatpak



On Mon, 2020-10-12 at 17:03 -0400, Colin Walters wrote:
On Mon, Oct 12, 2020, at 4:54 AM, Alexander Larsson wrote:

Of course, if we decide that such global state should be "basically
static" we can be a bit more crude and just download it at most once a
day. Then we can skip an entire roundtrip, which would actually be
useful.

Right, that's what I mean.

Yeah, that makes a lot of sense.

Storing the delta indexes in the detached metadata was actually my
initial plan. On the surface it seems like an obvious solution.
However, there are some details that make it less than ideal:

The information about deltas is essentially an index, which we
regenerate each time we update the summary (or similar). At update time
we enumerate all deltas, loop over all the commits that have a delta
going to them, and replace the information for each. So far everything
is fine. However, we then need to clear out old information about
removed deltas, which turns into quite a large operation as we have to
scan *all* commitmetas to see which ones happened to have old delta
information. This is where having isolated files for the deltas helps a
lot, as you can just readdir before and remove the files that we didn't
update.
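
To make the readdir-and-prune idea concrete, here is a minimal Python
sketch. It is not ostree code: the reindex() helper, the ".index" file
naming and the (from, to) tuple representation of a delta are all
hypothetical, chosen only to illustrate why isolated index files make
cleanup cheap.

```python
import os


def reindex(index_dir, deltas):
    """Rewrite one index file per target commit, then prune stale files.

    `deltas` is an iterable of (from_commit, to_commit) pairs; the
    layout and file naming here are assumptions for illustration only.
    """
    # Group the source commits by the commit each delta goes to.
    by_target = {}
    for src, dst in deltas:
        by_target.setdefault(dst, []).append(src or "")

    # "readdir before": remember which index files already exist.
    before = set(os.listdir(index_dir))

    # Rewrite the index file for every commit that still has deltas.
    written = set()
    for dst, sources in by_target.items():
        name = dst + ".index"
        with open(os.path.join(index_dir, name), "w") as f:
            f.write("\n".join(sorted(sources)))
        written.add(name)

    # Anything that existed before but was not rewritten is stale.
    stale = before - written
    for name in stale:
        os.remove(os.path.join(index_dir, name))
    return sorted(stale)
```

The cleanup cost here depends only on the contents of the index
directory, whereas with the index stored in commitmeta the same cleanup
would have to look at every commitmeta file in the repository.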

If we changed _ostree_repo_static_delta_delete() to
also update the commitmeta, we would avoid a global scan right?

Hmm, there are two options here. Either the delete of a delta will
trigger a rescan of all the deltas to see which ones are still going to
that commit. This makes deleting a delta O(n_deltas) and deleting them
all O(n^2). Alternatively we could read the old index and do
incremental changes, removing only the deleted delta from it. The latter
approach is faster, but less robust.
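
The two options can be sketched roughly as follows in Python. This is
only an illustration under assumptions: the in-memory set of
(from, to) pairs and the dict-based index are hypothetical stand-ins
for the on-disk deltas and the per-commit index, and none of these
function names exist in ostree.

```python
def rescan_index(deltas, target):
    """Rebuild the index for one target by scanning every delta: O(n_deltas)."""
    return sorted(src for src, dst in deltas if dst == target)


def delete_with_rescan(deltas, index, delta):
    """Option 1: delete the delta, then rebuild the target's index from scratch."""
    deltas.discard(delta)
    _, target = delta
    sources = rescan_index(deltas, target)
    if sources:
        index[target] = sources
    else:
        index.pop(target, None)


def delete_incremental(deltas, index, delta):
    """Option 2: delete the delta and patch the old index in place."""
    deltas.discard(delta)
    src, target = delta
    sources = index.get(target, [])
    if src in sources:
        sources.remove(src)
    if not sources:
        index.pop(target, None)
```

If the old index already contains an entry for a delta that no longer
exists, delete_with_rescan() drops it as a side effect, while
delete_incremental() carries it along. That is the robustness
trade-off: the incremental variant is only as correct as the index it
started from.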

In general updating the delta index information immediately like this
(vs at a single re-index time like the summary update) will cause much
more churn for the individual delta-index/commitmeta files. We can
compare it for instance to updating the summary each time we create a
ref.

This is also a practical issue for us in Flathub. We have code to
distribute the generation of deltas to other machines, because it's
kinda expensive to do. If the delta index update had to happen
immediately on delta import, that code would need to manually trigger
the right re-indexing (and we might need some API to make that
work).

In terms of distribution, it makes things a lot easier to say that
everything in the objects directory is just immutable forever in the
CDN, and then we can do different kinds of cache invalidation in the
various delta dirs. I know this is not theoretically true, as the
detached commitmeta files can change, but in practice once they are
initially signed this never happens.

Yes, agree it would be nice if everything in objects/ was immutable.
And it's true nothing else is using commitmeta today.

We could introduce a new commitmeta/ directory and hardlink them
there?

So, old clients would use the old directory over http, and possibly get
a staler copy, but they would not care because they don't use delta
info from it? I guess that would fix that, but at this point how is
this different from keeping it completely separate from the commitmetas?


But I think the core question here is - today deltas are pretty special
cased.  This continues that (which is fine).  But definitely one thing
I'd change if doing deltas again is moving them into objects/ as just
another type. That would also have argued for things like having the
index also be an object, etc.

I agree that this is the core question. However, I think it is kinda
natural and makes sense that both deltas and summary info are handled
specially and independently from the objects. Deltas, delta info, and
summary info are really independent metadata about the repository, not
really part of the content. I.e. if you were to talk about the abstract
model of an ostree repository you would use terms like branches,
commits, file objects, parents, etc., but not deltas.

Additionally, this information is kinda tied to the specific layout of
the archive mode repos, as we don't do deltas on bare repos. One could
imagine other repo modes where summaries and deltas would similarly not
really be involved, yet those repos would contain commitmeta files
carrying information about repos of a different mode.

Or to say this another way - if we make commitmeta more useful
we have a general mechanism to have per-commit dynamic metadata
which might be useful for something else in the future?

So, on a less abstract level this makes sense. Do we have ideas of
other potential dynamic metadata use?

OTOH the fact you've already written the code
for delta indexes also weighs towards that for sure.

To be clear I am inclined to merge, but these sorts of format changes
will add up to a substantial maintenance burden long term if we
accumulate too many, so let's be sure we've got this right and it
will be worth it.

Sure.



