On Mon, Oct 12, 2020 at 8:11 AM Alexander Larsson <alexl redhat com> wrote:
On Mon, 2020-10-12 at 06:55 -0600, Dan Nicholson wrote:
On Mon, Oct 12, 2020 at 3:04 AM Alexander Larsson <alexl redhat com> wrote:

In both of these cases, the client would fetch the commit object list to do accurate progress reporting, or fall back to the existing progress reporting if the list doesn't exist. If it does exist, then it could do accurate progress, but you could also do two more clever things. You could queue everything for pulling immediately rather than the current scheme of fetching dirtree objects and scanning them to find more objects to pull. It would also allow better decisions to be made in the non-scratch delta case by weighing the total size of each option against the number of fetches needed. I.e., if you have 20% of the objects, the size of an object pull might be smaller, but if it's going to take 1000 HTTP requests to get there instead of 10, it might take less time to get the delta at the expense of some wasted bandwidth.

What about this approach: instead of just having this list of the objects reachable from the commit, we make a new (optional) object type, the "mega dirtree" object. This would contain all the dirtree objects reachable from a dirtree (typically the root of the commit). In terms of size this would probably be similar to the list of reachable object ids you've experimented with. But then we can immediately write all these objects out and do a smarter pull operation.
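The size-versus-fetch-count trade-off in the quoted text can be sketched as a simple cost model. This is only an illustration, not anything ostree implements: `PullOption`, `estimated_seconds`, `choose`, and the bandwidth and per-request overhead figures are all hypothetical, and the model assumes requests are issued one after another with no parallelism.

```python
from dataclasses import dataclass


@dataclass
class PullOption:
    """One way of fetching a commit (hypothetical model, not an ostree type)."""
    name: str
    total_bytes: int  # bytes that would be transferred
    requests: int     # number of HTTP requests needed


def estimated_seconds(opt, bandwidth_bytes_per_s, per_request_overhead_s):
    # Transfer time plus a fixed cost per request (assumed sequential).
    return opt.total_bytes / bandwidth_bytes_per_s + opt.requests * per_request_overhead_s


def choose(options, bandwidth_bytes_per_s, per_request_overhead_s):
    # Pick the option with the lowest estimated wall-clock time.
    return min(
        options,
        key=lambda o: estimated_seconds(o, bandwidth_bytes_per_s, per_request_overhead_s),
    )


MIB = 1024 * 1024
# Made-up example: the object pull is smaller but needs many more requests.
options = [
    PullOption("objects", 20 * MIB, 1000),
    PullOption("delta", 30 * MIB, 10),
]

# 1.25 MB/s link; compare a high and a low per-request overhead.
print(choose(options, 1.25e6, 0.05).name)
print(choose(options, 1.25e6, 0.001).name)
```

With a high per-request overhead the larger delta wins on time, and with a low one the smaller object pull wins, which is the decision the commit's size list would make possible.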
Yeah, that's interesting. Since the dirtree objects contain the checksums for the children as well as the names, it would be bigger than the flat list of objects and wouldn't have deduplication of objects in the listing. You could also put that in the commit object and save yourself a roundtrip. It would be nice to get all the checksums and paths at once instead of traversing the commit, though.

I did realize that my table the other day was wrong since I was using an older ostree from before I fixed several bugs in the sizes generation. Here's an updated version with current ostree.

I added a couple of things this time. You were concerned about the size of the commit object since flatpak sizes are already a concern for people, so I wanted to see the size of the commit object relative to both the download and install size of the objects. I also was curious how much you could save by compressing the commit object over the network, as most HTTP servers offer. I used zlib level 1 as that's what nginx does by default.

Ref                                           Objects   Download   Install   Current   With Sizes   Cur Comp   Sizes Comp
--------------------------------------------- --------- ---------- --------- --------- ------------ ---------- ------------
runtime/org.freedesktop.Platform/x86_64/19.08 12822     214.2 MiB  602.7 MiB 2.6 KiB   515.6 KiB    1.1 KiB    481.0 KiB
runtime/org.gnome.Platform/x86_64/3.38        21740     313.4 MiB  835.1 MiB 2.4 KiB   872.9 KiB    1.0 KiB    813.2 KiB
app/org.gimp.GIMP/x86_64/stable               10406     109.4 MiB  313.7 MiB 1.5 KiB   419.5 KiB    933 bytes  392.6 KiB
app/org.mozilla.firefox/x86_64/stable         229       76.2 MiB   208.1 MiB 1.5 KiB   10.1 KiB     908 bytes  9.4 KiB
app/com.spotify.Client/x86_64/stable          1010      11.4 MiB   32.0 MiB  1.8 KiB   40.0 KiB     1.1 KiB    39.1 KiB

One of the bugs I had fixed was that the sizes entries were being reused between commits since they're stored in the repo struct. It still makes the objects quite a bit bigger, but only the GNOME platform is approaching 1 MB, which is a small fraction of the total size.
The compression helps a bit but not much.
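To put numbers on "a fraction of the total size" and "a bit but not much", here is a small sketch that derives both ratios from the table. The figures are copied from the table; the helper names `overhead_percent` and `compression_saving_percent` are made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB

# (ref, download size, commit with sizes, compressed commit with sizes), in bytes.
# Values are taken from the table above.
ROWS = [
    ("runtime/org.freedesktop.Platform/x86_64/19.08", 214.2 * MIB, 515.6 * KIB, 481.0 * KIB),
    ("runtime/org.gnome.Platform/x86_64/3.38", 313.4 * MIB, 872.9 * KIB, 813.2 * KIB),
    ("app/org.gimp.GIMP/x86_64/stable", 109.4 * MIB, 419.5 * KIB, 392.6 * KIB),
    ("app/org.mozilla.firefox/x86_64/stable", 76.2 * MIB, 10.1 * KIB, 9.4 * KIB),
    ("app/com.spotify.Client/x86_64/stable", 11.4 * MIB, 40.0 * KIB, 39.1 * KIB),
]


def overhead_percent(download, commit):
    # Commit object size as a percentage of the download size.
    return 100.0 * commit / download


def compression_saving_percent(raw, compressed):
    # Percentage of the commit object size removed by compression.
    return 100.0 * (1.0 - compressed / raw)


for ref, download, raw, comp in ROWS:
    print(
        f"{ref}: commit is {overhead_percent(download, raw):.2f}% of download, "
        f"compression saves {compression_saving_percent(raw, comp):.1f}%"
    )
```

For every ref in the table the commit with sizes stays below 0.4% of the download size, and zlib level 1 saves roughly 2% to 7% of the commit object.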
Attachment:
add-commit-sizes
Description: Binary data