Re: Extending ostree's functionality (adding a metadat index)



On Fri, 2013-08-23 at 12:44 +0100, Vivek Dasmohapatra wrote:

Ok, been hacking on this for a bit, and have a branch which implements
this feature. The branch adds:

  - index of the filez object checksums + archived size + unpacked size

First, how about instead of "index" (which might confuse people who know
git since it has nothing to do with git's "index"), we call it something
like "cachedsizes"?  Or "sizelist"?  Below I'll call it "sizelist", but
I'm happy to choose other names[1].

Now, you mentioned this before but I forgot to reply; I don't think this
should be a new object type. 

The object types are things that are "core" to the data model, where as
this data is entirely derivative/duplicative; we're carrying it in the
repo as an addition/optimization.  It should be possible to safely
delete it and the only loss is that pull no longer gives you a reliable
progress bar.  (Which is the current status quo)

So I think that the .sizelist or whatever should simply be something
that ostree-repo-pull.c knows about, but ostree-core.c maybe only has
functions to retrieve the filename.

While on the "static deltas" branch I haven't added the code to actually
retrieve them over HTTP yet, look at the changes to ostree-core.c:

https://git.gnome.org/browse/ostree/commit/?h=wip/delta&id=47ccde3c365d90ffe0ada85f7c28b81b2c23b8ee

That just has the filename bits.  Then I added a new header:

https://git.gnome.org/browse/ostree/diff/src/libostree/ostree-repo-static-delta-private.h?h=wip/delta&id=47ccde3c365d90ffe0ada85f7c28b81b2c23b8ee

With the intent that only ostree-repo-pull.c includes this.

Concretely, now that libostree exists as a public shared library, I
don't think we should expose the sizelist existence/internals there to
potential consumers like PackageKit.

  - said index is downloaded on pull if it is available
  - the index is entirely optional, pull doesn;t care if it isn't there

Right.

  - pull has a metadata flag which pulls only the .commit and .index

And then we're just printing it?  I'll look at your code, but I think
what we want here is to cache the .sizelist for a given .commit, so we
don't redownload it each time the upgrade UI queries it.

Concretely if userspace code (PackageKit?) is using ostree pull to poll
for updates, it would be unfortunate if each time we polled (once a
day?) we noticed there was still a new commit, and redownloaded
the .sizelist.

  - the `summary' command shows the size of a commit (archived & unpacked)
    - if invoked on a remote ref, summary pulls the metadata if it is not
      available.

Ok, I think the end goal we are shooting for here though is that there's
some higher level system management tool like e.g. PackageKit which
should be consuming a C API.

Given that what ot-builtin-summary.c is doing here is parsing
the .sizelist only to add up the individual entries, how about we add an
API like:

gboolean
ostree_repo_get_commit_disk_usage (OstreeRepo  *repo,
                                   const char  *checksum,
                                   guint64     *out_uncompressed_size,
                                   guint64     *out_compressed_size,
                                   GCancellable *cancellable,
                                   GError     **error);
                      
Then ot-builtin-summary could do the first part of what it's doing now,
with the partial metadata fetch.  Then it could simply invoke the above
API, rather than parsing the internals of the sizelist itself.

  - there are tests for the index creation and metadata pull, followed by
    a pull of the rest of the content

Yay, tests!

Obviously the ideal scenario for us is if this gets absorbed by
upstream ostree - if it's not suitable for inclusion for some reason
please let me know and I'll see what I can do about any problem(s).

So some of your patches I am just going to cherry-pick immediately.
Like:
http://cgit.collabora.com/git/user/vivek/ostree.git/commit/?id=1c7a7a009af79c0357dbaafdc16a0e954314f73a
is obviously useful and I will want it for static deltas too.  However,
can you match the existing commit style of:

"topicname: What changed

Why"

For example, I would write the commit message here as:

"fetcher: Return NOT_FOUND error code for 404 or 410

This will be used by the pull code for optional data."

https://live.gnome.org/GnomeLove/SubmittingPatches has a more extensive
explanation of the style ostree uses.

I tend to include at least a very brief "why" for every commit, since
while sometimes it's obvious in context, a lot of times if you come back
5 years later it's a lot less obvious.
 
Thanks for working on this!

[1] As they say, there are two hard problems in computer science; cache
    invalidation, naming things, and off by one errors.




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]