Re: Extending ostree's functionality (adding a metadat index)

From: Colin Walters <walters verbum org>
To: Vivek Dasmohapatra <vivek collabora co uk>
Cc: ostree-list gnome org
Subject: Re: Extending ostree's functionality (adding a metadat index)
Date: Fri, 30 Aug 2013 15:20:35 -0400

On Fri, 2013-08-30 at 18:31 +0100, Vivek Dasmohapatra wrote:

> Not quite, because the next commit adds the copy of the sizes from the existing
> ref into the new commit's size cache, so that committing from multiple refs
> still produces a size cache.


Ah, I see.  This is a bit magical though in that any other users of the
API will get invalid sizes unless they know to call that.  I do plan to
convert gnome-ostree at some point over to directly using the API rather
than spawning ostree.

Hmm.  Maybe what we could do is have an API to read directly from a
commit into an mtree, something like ostree_repo_stage_commit_to_mtree (),
that could call this internally.

> I suppose another problem arises because the overlay might contain files
> with the same paths, so we need to de-dupe by path as well as checksum
> so that the size cache is accurate.


While OstreeMutableTree doesn't at the moment have an API to *delete* a
pathname, it could.  Were we to support that it'd be really hard for an
API user to keep the sizes right.
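
To make the de-duplication concern concrete, here is a minimal Python
sketch.  It is not ostree code: the function name union_sizes and the
representation of a tree as a dict of path -> (checksum, size) are
assumptions made purely for illustration.  Later trees override earlier
ones by path, and each object is counted once by checksum.

```python
def union_sizes(trees):
    """Merge trees in order and return (merged, total_size).

    Each tree is a hypothetical dict: path -> (checksum, size_in_bytes).
    A later tree overrides an earlier one at the same path, and an
    object shared by several paths is counted only once.
    """
    merged = {}
    for tree in trees:
        merged.update(tree)  # de-dupe by path: the last tree wins

    sizes_by_checksum = {}
    for checksum, size in merged.values():
        sizes_by_checksum[checksum] = size  # de-dupe by checksum

    return merged, sum(sizes_by_checksum.values())


base = {"/usr/bin/foo": ("aaa", 100), "/usr/lib/libfoo.so": ("bbb", 400)}
overlay = {"/usr/bin/foo": ("ccc", 120), "/usr/bin/foo-copy": ("ccc", 120)}

merged, total = union_sizes([base, overlay])
print(total)  # 520: "bbb" (400) + "ccc" (120); "aaa" was overridden
```

Simply concatenating the two per-ref size caches would give 740 here.
Deleting a path makes this worse for a cached total, since an object's
size can only be dropped once no remaining path refers to it, which is
why recomputing from the final merged tree is the simpler thing to get
right.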

It's a bit of a correctness-versus-performance thing here.

For gnome-ostree the final tree compose is the slowest part, and that's
with the hacky --link-checkout-speedup where we backreference from the
device/inode pair -> checksum.  Although I suspect it's becoming a
bit of a pessimization by now, because the repository has over a
million objects (all builds since last September)...I
really should prune it =)
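
As a rough illustration of the device/inode -> checksum idea, here is a
Python sketch.  It is not how ostree implements it: the helper names,
the flat objects directory with files named by checksum, and the use of
a plain SHA-256 of the content in place of ostree's real object checksum
are all assumptions for the example.

```python
import hashlib
import os
import tempfile


def build_devino_cache(objects_dir):
    """Map (st_dev, st_ino) -> checksum for every file under objects_dir.

    Assumes, for illustration only, that each object file is named by
    its checksum.  This is the full scan whose cost grows with the repo.
    """
    cache = {}
    for dirpath, _dirnames, filenames in os.walk(objects_dir):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            cache[(st.st_dev, st.st_ino)] = name
    return cache


def checksum_for(path, cache):
    """Return the checksum of path, reusing the cache for hardlinks."""
    st = os.lstat(path)
    hit = cache.get((st.st_dev, st.st_ino))
    if hit is not None:
        return hit  # hardlink to a known object: no need to read it
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    objects = os.path.join(tmp, "objects")
    checkout = os.path.join(tmp, "checkout")
    os.makedirs(objects)
    os.makedirs(checkout)

    data = b"hello\n"
    digest = hashlib.sha256(data).hexdigest()
    with open(os.path.join(objects, digest), "wb") as f:
        f.write(data)
    # A checkout entry hardlinked to the object, as in a link checkout.
    os.link(os.path.join(objects, digest), os.path.join(checkout, "greeting"))

    cache = build_devino_cache(objects)
    linked_checksum = checksum_for(os.path.join(checkout, "greeting"), cache)
    print(linked_checksum == digest)
```

The lookup itself is cheap; it is build_devino_cache that has to stat
every object in the repository, which is the part that hurts once the
repository holds a million objects.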

What would help for gnome-ostree is to check out the union of all the
components, run triggers like /sbin/ldconfig, then walk the tree and
find *new* files, then commit the components plus the new files created
by the triggers as a union of trees.  Then we avoid both scanning the
entire repository to build up the devino->checksum cache and
re-checksumming each tree.

This optimization would only work if we assume triggers aren't
overwriting files from the tree, but I think that's a safe assumption.
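
The "find *new* files" step could look something like the following
Python sketch.  The function names are hypothetical, and writing a file
by hand stands in for running a real trigger.  It snapshots the
device/inode pairs in the checkout before the triggers run and reports
anything whose pair was not there before.

```python
import os
import tempfile


def snapshot_devinos(root):
    """Return the set of (st_dev, st_ino) for every file os.walk reports."""
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            seen.add((st.st_dev, st.st_ino))
    return seen


def find_new_files(root, known):
    """Return sorted paths, relative to root, whose inode is not in known."""
    new = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if (st.st_dev, st.st_ino) not in known:
                new.append(os.path.relpath(path, root))
    return sorted(new)


with tempfile.TemporaryDirectory() as root:
    # Stand-in for the checked-out union of components.
    os.makedirs(os.path.join(root, "usr", "lib"))
    os.makedirs(os.path.join(root, "etc"))
    with open(os.path.join(root, "usr", "lib", "libfoo.so"), "wb") as f:
        f.write(b"component content")

    before = snapshot_devinos(root)

    # Stand-in for a trigger such as ldconfig creating a new file.
    with open(os.path.join(root, "etc", "ld.so.cache"), "wb") as f:
        f.write(b"generated by trigger")

    new_files = find_new_files(root, before)
    print(new_files)
```

Note that a trigger rewriting an existing file in place would keep its
inode and go unnoticed here, so the sketch relies on the same
assumption: triggers only add files.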

If we were to calculate the sizes by walking the tree during commit,
that's a bit unfortunate in archive-z2 because by default we have to
actually open each .filez and read the header.  On the other hand, I
think it's reasonable to assume that someone will toss sufficient RAM at
a builder that the recent files remain in the page cache.

So...I think I like the "obviously correct" approach of calculating
during commit, but if you feel strongly I could be convinced it's worth
having the more complex code to cache during the commit process.

