Re: ostree.sizes metadata issues?



On Mon, 2020-08-24 at 06:58 -0600, Dan Nicholson wrote:
> On Mon, Aug 24, 2020, 3:56 AM Alexander Larsson via ostree-list <ostree-list gnome org> wrote:
>> I noticed that the Endless ostree.sizes metadata support landed, and
>> I'm interested in using it to improve the flatpak download size
>> estimations.

>> However, I seem to remember that there was some kind of issue with
>> these the last time they were tried. Wasn't there a size limit for the
>> commit metadata or something that got overrun by large commits and
>> made this not work? Was something done to fix this?

> It definitely works, and I doubt any flatpaks will be larger than the
> OS we ship. I fixed several smaller issues when I was upstreaming it.
> The issue is that it makes the commit metadata really large for large
> commits. IIRC, our OS commit objects are around 4MB. So it can make
> things a little slow, since ostree always fetches the commit metadata
> when pulling. For smaller commits I don't think it would be much of an
> issue, but for runtimes you'd probably see it.

I don't remember the exact details, but OSTree has this:

/**
 * OSTREE_MAX_METADATA_SIZE:
 *
 * Default limit for maximum permitted size in bytes of metadata objects fetched
 * over HTTP (including repo/config files, refs, and commit/dirtree/dirmeta
 * objects). This is an arbitrary number intended to mitigate disk space
 * exhaustion attacks.
 */
#define OSTREE_MAX_METADATA_SIZE (10 * 1024 * 1024)

And I remember Endless running into this at some point, back when it was originally using ostree.sizes.

> The other thing that's a little funky is that you need to do a
> metadata-only pull to get it, which leaves the repo in a state where
> it thinks it has a partial commit, so it skips deltas, as was found
> before. I think there are ways to handle it, and flatpak already does
> something like this for a different reason I can't recall, but it's
> something I thought needed a better solution in ostree.
>
> I've always wanted to have flatpak use it, but I recall that you were
> opposed to it.

We did initially support it, but we stopped using it because it wasn't
very effective. For example, taking your OS commit size of 4MB above,
if we assume that runtimes are around half that, then for an update
operation we need to download 2MB extra per updated runtime before we
can even display the list of things to download (which shows the
estimated size).

Our current approach of assuming nothing is shared is not great for
size estimation, but it's a lot slimmer than that.

Maybe there is some in-between option? For example, we could store,
for a number of a commit's ancestors, the download size needed to go
from each ancestor to that commit, as a simple commit-id -> download
size mapping. It will not be perfect, because you might have additional
objects already available, but it will be a lot better than the current
worst-case scenario. It would also be a lot cheaper both to download
and to compute the download size than with the full object list.
