Re: Idea: daily packs

On Tue, 2012-08-21 at 23:15 -0400, Colin Walters wrote:
> On Mon, 2012-08-20 at 11:18 -0400, Colin Walters wrote:
> > So...I'll experiment with trying the 50% heuristic for content by
> > tonight - I suspect doing that and dropping --related from the default
> > command will give us a lot of the win of just wget on a .tar.gz.
> I experimented with this a bit locally; ended up choosing 66% just on
> gut instinct.  I haven't tried doing a full download, but thinking about
> this more, I suspect the problem we're going to hit is a "sudden cliff"
> where most of the pack files stop having 66% of the desired objects.
> If you think about it, short of some sort of "important tree"-aware
> clustering (i.e. pack all objects from
> gnomeos-3.6-{x86_64,i686}-{runtime,devel} together), what's going to
> happen as the repository grows a longer history is that (due to SHA256)
> the objects for the latest tree will get spread out over more packfiles.
> Whether or not we periodically regenerate the packfiles is a factor here
> too.
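A minimal sketch of the per-pack heuristic discussed above. It assumes a pack is worth downloading when at least `threshold` of the objects it contains are ones the client still needs, and that anything not covered by a chosen pack is fetched as loose objects. `PackIndex` and `plan_fetch` are hypothetical names, not ostree APIs, and counting objects rather than bytes is an assumption too.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PackIndex:
    # Hypothetical: a pack's name plus the checksums of the objects it holds.
    name: str
    objects: frozenset[str]


def plan_fetch(packs, wanted, threshold=0.66):
    """Return (pack names to download, objects to fetch loose)."""
    remaining = set(wanted)
    chosen = []
    for pack in packs:
        if not pack.objects:
            continue
        needed = pack.objects & remaining
        # Only take the pack if enough of its contents are actually wanted.
        if len(needed) / len(pack.objects) >= threshold:
            chosen.append(pack.name)
            remaining -= needed
    # Whatever no chosen pack covered falls back to loose-object requests.
    return chosen, remaining
```

The "sudden cliff" shows up directly here: once the latest tree's objects are spread thinly over many packs, every ratio drops below the threshold and the plan degrades to fetching nearly everything loose.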

The basic point of my idea was to repack from scratch with "important
tree" aware clustering every night - with the simple clustering approach
of packing all the objects referenced from the important tree(s) and not
packing anything else.

(Beyond that you could imagine taking the remaining loose objects and
packing them up according to what is referenced by a series of older
trees receding into the past.)
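A rough sketch of that nightly repack, assuming the repository is modeled as a mapping from each object's checksum to the checksums it references (commit → tree → files). The first pack holds everything reachable from the important trees. The optional second pass packs what remains, one pack per older commit, newest first. Unreachable objects stay loose. All names here are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping


def reachable(roots: Iterable[str], children: Mapping[str, Iterable[str]]) -> set[str]:
    """All objects reachable from roots (iterative DFS over the object graph)."""
    seen: set[str] = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(children.get(obj, ()))
    return seen


def nightly_packs(
    important_roots: Iterable[str],
    older_roots: Iterable[str],
    children: Mapping[str, Iterable[str]],
) -> list[set[str]]:
    """Repack from scratch; returns pack contents, most important first."""
    # Group 1: everything the important trees need, together in one pack.
    # Then one group per older commit (newest to oldest), holding only the
    # objects not already packed by an earlier group.
    groups = [list(important_roots)] + [[root] for root in older_roots]
    packs: list[set[str]] = []
    packed: set[str] = set()
    for roots in groups:
        objs = reachable(roots, children) - packed
        if objs:
            packs.append(objs)
            packed |= objs
    return packs
```

Capping `older_roots` limits how much history goes into packs without deleting any commits from the repository.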

> This patch may still be worth it - however note that we're not fetching
> data pack files asynchronously, and objects from them are added
> serially, without checksumming in separate threads, etc.  On my laptop
> ostree processes a 25MiB pack file in about two seconds, but still.
> I may still apply this, but I think I'd like to try just gzipping all of
> the loose objects and eating the ~40% increase in disk space on the
> server.  Note if we do this - we also *halve* our HTTP request count
> which is a really big deal.
> Another approach to making packfiles better - since they get worse as
> history grows longer, we could fix that by just trimming history
> aggressively.  These are binaries, not source code - so we can in theory
> regenerate builds whenever we want. 

Why trim history when you can simply trim the history that you include
in pack files?

- Owen
