Re: Sizes data in flatpaks



On Mon, Oct 12, 2020 at 3:04 AM Alexander Larsson <alexl redhat com> wrote:

These are not near the 10MB metadata size limit, but they are still
quite large. For instance, it doesn't seem useful (too slow) to
download these before starting the actual flatpak download operations,
so it will not help produce more useful initial estimates for the
downloads in the flatpak cli prompt. That particular problem is imho a
larger problem than the actual progress bar, as the larger numbers
make people think that flatpak is more bloated than it actually is.

I agree there. I don't think trying to get accurate numbers before the
pull for the estimation is that helpful. If you're including the size
of the commit object in the size estimate, then it would increase
that.

Another approach is to have a dynamic REST call where you give a list
of ref/commit/local_commit tuples and the server will compute (with
server-side caching) the list of download sizes. This would be
optional, but implemented by e.g. flat-manager and used by flatpak to
amend the download size shown in the table. This way we only do one
extra network roundtrip to get this info, and we only transfer the
information we need (the size) rather than all the details about all
the objects.

This call would be optional, and if it is not implemented by the
server, flatpak could just do whatever it currently does.
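
As a rough illustration of the call described above, the request could
carry one entry per ref and the reply one size per ref. The JSON field
names and both helper functions are hypothetical; nothing here is an
existing flat-manager or flatpak API.

```python
import json


def build_size_request(refs):
    """Serialize (ref, commit, local_commit) tuples into a request body.

    local_commit is None when the ref is not installed yet.
    Field names are assumptions made for this sketch.
    """
    return json.dumps({
        "refs": [
            {"ref": ref, "commit": commit, "local_commit": local}
            for ref, commit, local in refs
        ]
    })


def apply_sizes(estimates, response):
    """Amend the existing per-ref estimates with server-computed sizes.

    `response` is None when the server does not implement the call,
    in which case the current estimates are kept unchanged.
    """
    amended = dict(estimates)
    if response is not None:
        amended.update(json.loads(response)["sizes"])
    return amended
```

The fallback path is the important part: a server without the call
leaves the table exactly as it is today.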

In theory I like this idea, however I think it would make things worse
for the client. In order to let the server do an accurate calculation,
the client would need to send its entire list of local objects, which
in many cases would be larger than the list of objects in the desired
commit. Also, at least in the US it's quite typical to have an
asymmetric connection with upload speeds a fraction of download
speeds.

A similar approach would keep the calculation on the client side with
the list of commit objects provided by the server. You could do this
in a separate REST API outside of the summary fetch. Or it could be
done as a separate repo object. The advantage of a separate object is
you don't need a server providing the API, it's still optional on the
server, and you can fetch it only when needed. The only 2 cases where
you want to do an object pull are when the server doesn't have a delta
or when you would prefer an object pull over a non-scratch delta.
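
A minimal sketch of the client-side calculation, assuming the commit
object list maps each object checksum to its download size in bytes
(that layout and the function name are assumptions, not an existing
ostree format):

```python
def remaining_download(commit_objects, local_objects):
    """Return (object_count, total_bytes) still to be fetched.

    commit_objects: dict mapping object checksum -> size in bytes,
                    as provided by the (hypothetical) object list.
    local_objects:  set of checksums already present in the local repo.
    """
    missing = [
        size for checksum, size in commit_objects.items()
        if checksum not in local_objects
    ]
    return len(missing), sum(missing)
```

Nothing about the local repo leaves the client here; only the object
list for the wanted commit is downloaded.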

In both of these cases, the client would fetch the commit object list
to do accurate progress reporting or fall back to the existing
progress reporting if it doesn't exist. If it does exist, then it
could do accurate progress, but you could also do 2 more clever
things. You could queue everything for pulling immediately rather than
the current scheme of fetching dirtree objects and scanning them to
find more objects to pull. It would also allow better decisions to
be made in the non-scratch delta case by calculating the total size of
each vs the number of fetches needed. E.g., if you have 20% of the
objects, the size of an object pull might be smaller, but if it's going
to take 1000 HTTP requests to get there instead of 10, it might take
less time to get the delta at the expense of some wasted bandwidth.
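
To make that tradeoff concrete, here is a rough sketch that estimates
the time for each option as transfer time plus a fixed cost per HTTP
request. The cost model, the function names and the numbers are all
assumptions; it ignores parallel or pipelined requests, which would
shrink the per-request term.

```python
def estimated_seconds(total_bytes, requests, bytes_per_second, seconds_per_request):
    """Transfer time plus a fixed overhead for every HTTP request."""
    return total_bytes / bytes_per_second + requests * seconds_per_request


def choose_strategy(object_pull, delta, bytes_per_second, seconds_per_request):
    """Pick the option with the lower estimated time.

    object_pull and delta are (total_bytes, requests) tuples.
    """
    t_objects = estimated_seconds(*object_pull, bytes_per_second, seconds_per_request)
    t_delta = estimated_seconds(*delta, bytes_per_second, seconds_per_request)
    return "delta" if t_delta < t_objects else "object-pull"
```

With 8 MB in 1000 requests for the object pull, 10 MB in 10 requests
for the delta, 1 MB/s and 50 ms per request, this picks the delta even
though it transfers more bytes.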

