Re: Extending ostree's functionality (adding a metadata index)



On Fri, 2013-08-30 at 18:31 +0100, Vivek Dasmohapatra wrote:

> Not quite, because the next commit adds the copy of the sizes from the existing
> ref into the new commit's size cache, so that committing from multiple refs
> still produces a size cache.

Ah, I see.  This is a bit magical though in that any other users of the
API will get invalid sizes unless they know to call that.  I do plan to
convert gnome-ostree at some point over to directly using the API rather
than spawning ostree.

Hmm.  Maybe what we could do is have an API to read directly from a
commit into a mtree, like ostree_repo_stage_commit_to_mtree () that
could call this internally.

I suppose another problem arises because the overlay might contain files
with the same paths, so we need to de-dupe by path as well as checksum
so that the size cache is accurate.
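
To make the de-dupe concrete, here is a rough Python sketch (not ostree
code; the function name and the path -> (checksum, size) layout are made
up for illustration).  Later layers win on path collisions, and each
surviving checksum is counted once:

```python
def unique_object_sizes(layers):
    """Return {checksum: size} for the objects reachable after overlaying.

    `layers` is a list of dicts mapping path -> (checksum, size), applied
    in order.  This layout is hypothetical, chosen only for the example.
    """
    final_tree = {}
    for layer in layers:
        # De-dupe by path: a later layer replaces an earlier entry.
        final_tree.update(layer)

    sizes = {}
    for checksum, size in final_tree.values():
        # De-dupe by checksum: identical content is stored only once.
        sizes[checksum] = size
    return sizes


base = {
    "/usr/bin/foo": ("aaa", 100),
    "/usr/lib/libfoo.so": ("bbb", 200),
}
overlay = {
    "/usr/bin/foo": ("ccc", 150),       # same path, new content
    "/usr/bin/foo-copy": ("ccc", 150),  # same content, new path
}

sizes = unique_object_sizes([base, overlay])
total = sum(sizes.values())
```

Object "aaa" drops out because its only path was overwritten, and "ccc"
is counted once even though two paths point at it.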

While OstreeMutableTree doesn't at the moment have an API to *delete* a
pathname, it could.  Were we to support that it'd be really hard for an
API user to keep the sizes right.

It's a bit of a correctness-versus-performance thing here.

For gnome-ostree the final tree compose is the slowest part, and that's
with the hacky --link-checkout-speedup where we backreference from the
device/inode pair -> checksum.  I suspect it's becoming a
bit of a pessimization, though, because the repository now has over a
million objects (all builds since last September)...I
really should prune it =)
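
For anyone not familiar with the trick, the idea is roughly the
following.  This is a Python sketch only, assuming a flat directory of
objects named by checksum (the real repository layout differs) and a
checkout made of hard links into it; both helper names are invented:

```python
import os


def build_devino_cache(objects_dir):
    """Map (device, inode) -> checksum for every file in objects_dir.

    Assumes, for the sketch only, that each file name is its checksum.
    The cost is one lstat() per object, which is what hurts once the
    repository holds a million objects.
    """
    cache = {}
    for name in os.listdir(objects_dir):
        st = os.lstat(os.path.join(objects_dir, name))
        cache[(st.st_dev, st.st_ino)] = name
    return cache


def lookup_checksum(cache, path):
    """Return the cached checksum for a hard-linked file, or None.

    None means the file is not a link to a known object and would have
    to be checksummed the slow way.
    """
    st = os.lstat(path)
    return cache.get((st.st_dev, st.st_ino))
```

A hit avoids re-reading the file contents; a miss falls back to
checksumming as usual.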

What would help for gnome-ostree is to check out the union of all the
components, run triggers like /sbin/ldconfig, then
walk the tree and find *new* files, then commit the components + trigger
new files as a union of trees.  Then we avoid both scanning the entire
repository to build up the devino->checksum cache and re-checksumming
each tree.

This optimization would only work if we assume triggers aren't
overwriting files from the tree, but I think that's a safe assumption.
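
Roughly what I have in mind, as a Python sketch rather than anything
that exists in gnome-ostree today (the helper names are made up):
snapshot the checkout before the triggers run, snapshot it again
afterwards, and diff.  The "changed" list doubles as a check on the
no-overwrite assumption:

```python
import os


def snapshot(root):
    """Map relative path -> (inode, size, mtime_ns) for files under root."""
    snap = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            rel = os.path.relpath(full, root)
            snap[rel] = (st.st_ino, st.st_size, st.st_mtime_ns)
    return snap


def diff_snapshots(before, after):
    """Return (new, changed) lists of relative paths.

    `new` holds files the triggers created; only these would need to be
    checksummed and committed on top of the component trees.  `changed`
    holds pre-existing files whose stat data differs; if it is
    non-empty, the assumption that triggers don't overwrite tree files
    was violated.
    """
    new = sorted(p for p in after if p not in before)
    changed = sorted(
        p for p in after if p in before and after[p] != before[p]
    )
    return new, changed
```

Usage would be snapshot(), run the triggers, snapshot() again, then
commit only the paths in the first list.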

If we were to calculate the sizes by walking the tree during commit, that's a
bit unfortunate in archive-z2 because by default we have to actually
open each .filez and read the header.  On the other hand, I think it's
reasonable to assume that someone will toss sufficient RAM at a builder
that the recent files remain in the page cache.

So...I think I like the "obviously correct" approach of calculating
during commit, but if you feel strongly I could be convinced it's worth
having the more complex code to cache during the commit process.



