Re: Sizes data in flatpaks



On Mon, Oct 12, 2020 at 8:11 AM Alexander Larsson <alexl redhat com> wrote:

On Mon, 2020-10-12 at 06:55 -0600, Dan Nicholson wrote:
On Mon, Oct 12, 2020 at 3:04 AM Alexander Larsson <alexl redhat com> wrote:


In both of these cases, the client would fetch the commit object list
to do accurate progress reporting, or fall back to the existing
progress reporting if the list doesn't exist. If it does exist, the
client could report accurate progress, and you could also do two more
clever things. First, you could queue everything for pulling
immediately rather than using the current scheme of fetching dirtree
objects and scanning them to find more objects to pull. Second, it
would allow better decisions in the non-scratch delta case by
comparing the total size of each option against the number of fetches
needed. I.e., if you already have 20% of the objects, an object pull
might be smaller in size, but if it's going to take 1000 HTTP requests
to get there instead of 10, it might take less time to get the delta
at the expense of some wasted bandwidth.
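
The size-versus-requests tradeoff described above can be sketched
roughly as follows. This is only an illustration, not ostree code: the
function names, the serial one-round-trip-per-request cost model, and
the bandwidth and latency figures are all assumptions made up for the
example.

```python
MIB = 1024 * 1024


def estimated_seconds(total_bytes, num_requests, bandwidth_bytes_per_s, round_trip_s):
    """Rough transfer time: payload time plus one round trip per request.

    Assumes requests are issued serially, which overstates the cost when
    the client pipelines or parallelizes fetches.
    """
    return total_bytes / bandwidth_bytes_per_s + num_requests * round_trip_s


def choose_strategy(object_pull, delta_pull, bandwidth_bytes_per_s, round_trip_s):
    """Pick the faster option; each pull is a (total_bytes, num_requests) tuple."""
    object_time = estimated_seconds(*object_pull, bandwidth_bytes_per_s, round_trip_s)
    delta_time = estimated_seconds(*delta_pull, bandwidth_bytes_per_s, round_trip_s)
    return "objects" if object_time <= delta_time else "delta"


# Hypothetical numbers: the object pull is smaller (50 MiB) but needs 1000
# requests; the delta is larger (60 MiB) but needs only 10.
choice = choose_strategy(
    object_pull=(50 * MIB, 1000),
    delta_pull=(60 * MIB, 10),
    bandwidth_bytes_per_s=10 * MIB,
    round_trip_s=0.05,
)
print(choice)  # prints "delta": 6.5 s estimated vs 55 s for the object pull
```

With zero latency the same inputs would favor the object pull, so the
outcome depends entirely on how expensive each request is assumed to be.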

What about this approach:

What if, instead of just having this list of the objects reachable
from the commit, we made a new (optional) object type, the "mega
dirtree" object? It would contain all the dirtree objects reachable
from a dirtree (typically the root of the commit). In terms of size
this would probably be similar to the list of reachable object ids
you've experimented with. But then we could immediately write all
these objects out and do a smarter pull operation.

Yeah, that's interesting. Since the dirtree objects contain the
checksums for the children as well as the names, it would be bigger
than the flat list of objects and wouldn't have deduplication of
objects in the listing. You could also put that in the commit object
and save yourself a roundtrip. It would be nice to get all the
checksums and paths at once instead of traversing the commit, though.

I did realize that my table the other day was wrong, since I was using
an older ostree from before I fixed several bugs in the sizes
generation. Here's an updated version with current ostree. I added a
couple of things this time. You were concerned about the size of the
commit object, since flatpak sizes are already a concern for people, so
I wanted to see the size of the commit object relative to both the
download and install size of the objects. I was also curious how much
you could save by compressing the commit object over the network, as
most HTTP servers offer. I used zlib level 1, as that's what nginx does
by default.
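
For anyone who wants to try the same kind of measurement, here is a
minimal sketch using Python's zlib at level 1. The payload layout (a
32-byte checksum followed by two 4-byte sizes per object) is a
hypothetical stand-in, not the actual ostree sizes metadata format, and
random bytes are used in place of real checksums.

```python
import os
import zlib


def build_sizes_payload(num_objects):
    """Build a fake sizes listing: one fixed-width entry per object.

    Hypothetical layout: 32 random bytes standing in for a SHA-256
    checksum, then a 4-byte compressed size and a 4-byte unpacked size.
    """
    entries = []
    for i in range(num_objects):
        checksum = os.urandom(32)
        compressed_size = (1000 + i).to_bytes(4, "big")
        unpacked_size = (3000 + i).to_bytes(4, "big")
        entries.append(checksum + compressed_size + unpacked_size)
    return b"".join(entries)


payload = build_sizes_payload(1000)
compressed = zlib.compress(payload, 1)  # level 1, the fastest setting
ratio = len(compressed) / len(payload)
print(f"{len(payload)} bytes -> {len(compressed)} bytes ({ratio:.0%} of original)")
```

Since checksums are effectively random data, most of each entry cannot
be compressed, so the compressed size stays close to the original no
matter which level is chosen.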

Ref                                            Objects  Download   Install    Current  With Sizes  Cur Comp   Sizes Comp
---------------------------------------------  -------  ---------  ---------  -------  ----------  ---------  ----------
runtime/org.freedesktop.Platform/x86_64/19.08    12822  214.2 MiB  602.7 MiB  2.6 KiB  515.6 KiB   1.1 KiB    481.0 KiB
runtime/org.gnome.Platform/x86_64/3.38           21740  313.4 MiB  835.1 MiB  2.4 KiB  872.9 KiB   1.0 KiB    813.2 KiB
app/org.gimp.GIMP/x86_64/stable                  10406  109.4 MiB  313.7 MiB  1.5 KiB  419.5 KiB   933 bytes  392.6 KiB
app/org.mozilla.firefox/x86_64/stable              229  76.2 MiB   208.1 MiB  1.5 KiB  10.1 KiB    908 bytes  9.4 KiB
app/com.spotify.Client/x86_64/stable              1010  11.4 MiB   32.0 MiB   1.8 KiB  40.0 KiB    1.1 KiB    39.1 KiB

One of the bugs I had fixed was that the sizes entries were being
reused between commits since they're stored in the repo struct. It
still makes the objects quite a bit bigger but only the GNOME platform
is approaching 1 MB, which is a fraction of the total size. The
compression helps a bit but not much.
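
To put those numbers in proportion, the following sketch computes the
commit object with sizes as a percentage of the download size, and the
saving from compression, using only figures copied from the table. The
variable and function names are just for illustration.

```python
# (ref, download MiB, commit with sizes KiB, compressed commit with sizes KiB)
rows = [
    ("runtime/org.freedesktop.Platform/x86_64/19.08", 214.2, 515.6, 481.0),
    ("runtime/org.gnome.Platform/x86_64/3.38", 313.4, 872.9, 813.2),
    ("app/org.gimp.GIMP/x86_64/stable", 109.4, 419.5, 392.6),
    ("app/org.mozilla.firefox/x86_64/stable", 76.2, 10.1, 9.4),
    ("app/com.spotify.Client/x86_64/stable", 11.4, 40.0, 39.1),
]


def overhead_percent(download_mib, commit_kib):
    """Commit object size as a percentage of the download size."""
    return 100.0 * commit_kib / (download_mib * 1024.0)


def compression_saving_percent(commit_kib, compressed_kib):
    """Percentage of the commit object saved by compressing it."""
    return 100.0 * (1.0 - compressed_kib / commit_kib)


for ref, download_mib, commit_kib, compressed_kib in rows:
    overhead = overhead_percent(download_mib, commit_kib)
    saving = compression_saving_percent(commit_kib, compressed_kib)
    print(f"{ref}: {overhead:.3f}% of download, {saving:.1f}% saved by compression")
```

For the GNOME platform this works out to roughly 0.27% of the download
size, with compression saving a little under 7% of the commit object.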

Attachment: add-commit-sizes
Description: Binary data
