Re: Minimal changes version of new summary format for flatpak





On Mon, Oct 12, 2020, at 4:54 AM, Alexander Larsson wrote:
I'm not against using etags to minimize data transfer. However, I
question the efficiency of it in the case of the config file. For the
Flathub case the config file is 315 bytes, whereas I estimate the total
size of the etags headers is ~100 bytes. Both are well within what
fits in a single IP packet, so they are essentially the same in terms
of download time.

Of course, if we decide that such global state should be "basically
static" we can be a bit more crude and say just download it at most
once a day. Then we can skip an entire roundtrip, which would actually
be useful.

Right, that's what I mean.
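
As a minimal sketch of that once-a-day policy (the function name, the 24-hour constant and the way the last fetch time is stored are all assumptions for illustration, not anything ostree or flatpak does today):

```python
import time

# Assumed policy: treat the remote config as "basically static" and
# refetch it at most once every 24 hours.
MAX_AGE_SECONDS = 24 * 60 * 60


def needs_refresh(last_fetch_time, now=None, max_age=MAX_AGE_SECONDS):
    """Return True if the cached config should be downloaded again.

    last_fetch_time is a Unix timestamp, or None if nothing is cached yet.
    """
    if last_fetch_time is None:
        return True  # nothing cached, so we have to fetch
    if now is None:
        now = time.time()
    return now - last_fetch_time >= max_age
```

When this returns False the client skips the request entirely, which is where the saved round trip comes from; an etag check would still cost one request per pull.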


This seems like a minor detail in the whole thing though. Until 2
months ago (8d09a1a8eaaedcc8fb1fc111a61616e9f7ad9a4e) ostree always
downloaded the config file on every pull, and it wasn't a huge
problem.

Agree, it's mostly orthogonal to this, though round trips can add up on high latency links.

Storing the delta indexes in the detached metadata was actually my
initial plan. On the surface it seems like an obvious solution.
However, there are some details that make it less than ideal:

The information about deltas is essentially an index, which we
regenerate each time we update the summary (or similar). At update time
we enumerate all deltas, loop over all the commits that have a delta
going to it and then replace the information for that. So far
everything is fine. However, then we need to clear out old information
about removed deltas, which turns into quite a large operation as we
have to scan *all* commitmetas to see which ones happened to have old
delta information. This is where having isolated files for the deltas
helps a lot, as you can just readdir beforehand and remove the files
that we didn't update.
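
The readdir-and-remove cleanup described above can be sketched roughly like this. It is an illustration only, not the actual ostree code: the directory layout, the function name and the `current_indexes` mapping (target commit checksum to serialized index bytes) are hypothetical.

```python
import os


def regenerate_delta_indexes(index_dir, current_indexes):
    """Rewrite one index file per target commit and drop stale ones.

    current_indexes maps a target commit checksum (used as the file
    name) to the serialized index for the deltas going to that commit.
    Returns the sorted list of stale file names that were removed.
    """
    os.makedirs(index_dir, exist_ok=True)

    # "readdir before": remember which index files existed already.
    before = set(os.listdir(index_dir))

    # Write or replace the index for every commit that still has deltas.
    for commit, data in current_indexes.items():
        with open(os.path.join(index_dir, commit), "wb") as f:
            f.write(data)

    # Anything that existed before but was not rewritten is stale.
    stale = sorted(before - set(current_indexes))
    for name in stale:
        os.remove(os.path.join(index_dir, name))
    return stale
```

The cost is proportional to the number of index files, not to the number of commits in the repository, which is the point of keeping the indexes out of the commitmeta files.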

If we changed _ostree_repo_static_delta_delete() to
also update the commitmeta, we would avoid a global scan, right?

In terms of distribution, it makes things a lot easier to say that
everything in the objects directory just is forever immutable in CDN
and then we can do different kinds of cache invalidation in the various
delta dirs. I know this is not theoretically true, as the detached
commitmeta files can change, but in practice once they are initially
signed this never happens.

Yes, agree it would be nice if everything in objects/ was immutable.
And it's true nothing else is using commitmeta today.

We could introduce a new commitmeta/ directory and hardlink them
there?
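
Very roughly, and only as a sketch of the idea: the code below assumes detached metadata lives at objects/XX/REST.commitmeta and that the new directory is keyed by the full checksum. The function name and the commitmeta/ layout are made up for illustration.

```python
import os

SUFFIX = ".commitmeta"


def link_commitmeta(repo_dir):
    """Hardlink every objects/XX/REST.commitmeta to commitmeta/XXREST.

    Returns the sorted list of commit checksums found. Files that are
    already linked are left alone, so this is safe to run repeatedly.
    """
    objects_dir = os.path.join(repo_dir, "objects")
    dest_dir = os.path.join(repo_dir, "commitmeta")
    os.makedirs(dest_dir, exist_ok=True)

    found = []
    for prefix in sorted(os.listdir(objects_dir)):
        subdir = os.path.join(objects_dir, prefix)
        if not os.path.isdir(subdir):
            continue
        for name in sorted(os.listdir(subdir)):
            if not name.endswith(SUFFIX):
                continue
            checksum = prefix + name[: -len(SUFFIX)]
            dest = os.path.join(dest_dir, checksum)
            if not os.path.exists(dest):
                # Same inode, second name: no extra disk space used.
                os.link(os.path.join(subdir, name), dest)
            found.append(checksum)
    return found
```

Note that a hardlink only keeps tracking the original if the file is modified in place; if the commitmeta is replaced via rename, the link would have to be recreated.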

In terms of code changes, the way deltas are currently chosen during
ostree_repo_pull() is quite different than how and when the commitmetas
are downloaded. So getting deltas from those will be a larger
restructuring of the pull code.

Also agree.

But I think the core question here is this: today deltas are pretty
special-cased. This continues that (which is fine). But one thing I'd
definitely change if doing deltas again is moving them into objects/ as
just another type. That would also have argued for things like having
the index also be an object, etc.

Or to say this another way - if we make commitmeta more useful
we have a general mechanism to have per-commit dynamic metadata
which might be useful for something else in the future?

OTOH the fact you've already written the code
for delta indexes also weighs towards that for sure.

To be clear, I am inclined to merge, but these sorts of format changes
will add up to a substantial maintenance burden long term if we
accumulate too many, so let's be sure we've got this right and that it
will be worth it.


