Re: Extending ostree's functionality (adding a metadata index)
- From: Colin Walters <walters verbum org>
- To: Vivek Dasmohapatra <vivek collabora co uk>
- Cc: ostree-list gnome org
- Subject: Re: Extending ostree's functionality (adding a metadata index)
- Date: Fri, 30 Aug 2013 15:20:35 -0400
On Fri, 2013-08-30 at 18:31 +0100, Vivek Dasmohapatra wrote:
> Not quite, because the next commit adds the copy of the sizes from the existing
> ref into the new commit's size cache, so that committing from multiple refs
> still produces a size cache.
Ah, I see. This is a bit magical though in that any other users of the
API will get invalid sizes unless they know to call that. I do plan to
convert gnome-ostree at some point over to directly using the API rather
than spawning ostree.
Hmm. Maybe what we could do is have an API to read directly from a
commit into a mtree, like ostree_repo_stage_commit_to_mtree () that
could call this internally.
> I suppose another problem arises because the overlay might contain files
> with the same paths, so we need to de-dupe by path as well as checksum
> so that the size cache is accurate.
While OstreeMutableTree doesn't at the moment have an API to *delete* a
pathname, it could. Were we to support that it'd be really hard for an
API user to keep the sizes right.
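
To make the de-duplication point concrete, here is a minimal sketch, assuming a tree can be modelled as a plain mapping from path to (checksum, size). The names `merge_trees`, `size_cache` and `remove_path` are hypothetical and are not part of the ostree API. It shows why the cache has to be keyed by checksum after paths have been resolved, and why deleting a path makes incremental bookkeeping awkward.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical model: a tree maps path -> (checksum, size_in_bytes).
Tree = Dict[str, Tuple[str, int]]


def merge_trees(layers: Iterable[Tree]) -> Tree:
    """Union the layers; when a path repeats, the later layer wins."""
    merged: Tree = {}
    for layer in layers:
        merged.update(layer)
    return merged


def size_cache(tree: Tree) -> Dict[str, int]:
    """Map each distinct checksum to its size, so shared content counts once."""
    return {checksum: size for checksum, size in tree.values()}


def remove_path(tree: Tree, path: str) -> None:
    """Drop a path. The size cache must then be rebuilt rather than
    decremented, because another path may still reference the checksum."""
    tree.pop(path, None)


base = {"/usr/bin/a": ("c1", 100), "/usr/lib/b": ("c2", 50)}
overlay = {"/usr/bin/a": ("c3", 120), "/usr/lib/copy-of-b": ("c2", 50)}

merged = merge_trees([base, overlay])
total = sum(size_cache(merged).values())
```

Here the overlay replaces `/usr/bin/a`, so `c1` drops out of the cache, and `c2` is counted once even though two paths point at it. A caller that tracked sizes by hand would have to get both cases right, plus deletion.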
It's a bit of a correctness-versus-performance thing here.
For gnome-ostree the final tree compose is the slowest part, and that's
with the hacky --link-checkout-speedup where we backreference from the
device/inode pair -> checksum. Although I suspect it's becoming a bit of
a pessimization now, because the repository has over a million objects
(all builds since last September)...I really should prune it =)
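
The device/inode backreference can be sketched roughly as follows. This is an illustration only, not ostree's implementation: it assumes a hypothetical layout in which every object file is named after its checksum, and it hashes file content alone, whereas real ostree checksums also cover metadata. `build_devino_cache` and `checksum_for` are made-up names.

```python
import hashlib
import os
from typing import Dict, Tuple

DevIno = Tuple[int, int]


def file_checksum(path: str) -> str:
    """SHA-256 of the file content only (a simplification for this sketch)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_devino_cache(objects_dir: str) -> Dict[DevIno, str]:
    """Scan every object once, recording (st_dev, st_ino) -> checksum.

    Assumes each object file is named after its checksum (hypothetical).
    The cost grows with the total number of objects in the repository.
    """
    cache: Dict[DevIno, str] = {}
    for root, _dirs, files in os.walk(objects_dir):
        for name in files:
            st = os.lstat(os.path.join(root, name))
            cache[(st.st_dev, st.st_ino)] = name
    return cache


def checksum_for(path: str, cache: Dict[DevIno, str]) -> str:
    """Reuse the cached checksum for hardlinked files; hash everything else."""
    st = os.lstat(path)
    hit = cache.get((st.st_dev, st.st_ino))
    return hit if hit is not None else file_checksum(path)
```

A file hardlinked out of the repository shares its device/inode pair with the object, so the lookup avoids re-reading it. The trade-off is the up-front scan, which touches every object whether or not the checkout uses it.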
What would help for gnome-ostree is to check out the union of all the
components, run triggers like /sbin/ldconfig, then
walk the tree and find *new* files, then commit the components + trigger
new files as a union of trees. Then we avoid both scanning the entire
repository to build up the devino->checksum cache and re-checksumming
each tree.
This optimization would only work if we assume triggers aren't
overwriting files from the tree, but I think that's a safe assumption.
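
The "find *new* files" step could look something like the sketch below. It assumes the set of paths contributed by the components is already known (in practice it would come from the component trees), and the helper names `list_files` and `new_files` are hypothetical. It compares by path only, which is exactly why it relies on triggers not overwriting existing files.

```python
import os
from typing import Set


def list_files(root: str) -> Set[str]:
    """Return the paths of all non-directory entries under root, relative to root.

    Sketch only: empty directories and symlinks to directories are ignored.
    """
    found: Set[str] = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def new_files(root: str, component_paths: Set[str]) -> Set[str]:
    """Paths present in the checkout that no component provided.

    A file that a trigger overwrote in place keeps its path and is NOT
    reported here, hence the assumption that triggers only create files.
    """
    return list_files(root) - component_paths
```

Only the paths returned by `new_files` would need checksumming; everything else is already known from the component commits.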
If we were to calculate the sizes by walking the tree during commit, that's a
bit unfortunate in archive-z2 because by default we have to actually
open each .filez and read the header. On the other hand, I think it's
reasonable to assume that someone will toss sufficient RAM at a builder
that the recent files remain in the page cache.
So...I think I like the "obviously correct" approach of calculating
during commit, but if you feel strongly I could be convinced it's worth
having the more complex code to cache during the commit process.