Dual cache key calculation and build planning modes



Hi all,

First post to the mailing list, thought this was a good opportunity to
discuss something here.

So we had an initial, vague draft here[0] regarding dual cache key
calculation modes, but the design has changed after meeting with Emmet
in Manchester a few months ago.

We discussed this briefly a while back when Jürg came back from
vacation but it's worthwhile to go over the details here on the list I
think.

Initially we just wanted a way to build locally and calculate the keys
in a different way, so that things would only be rebuilt if their
sources or build instructions have changed, but not when only their
dependencies have changed. This essentially remains the same, but Emmet
made a good case that just because we want this behavior, we should not
have to lose the strong cache key, which represents exactly what it is
that we built. I agree with this, and it is a better design.

So for this email I'll set forth a premise and examine the cases where
it would have to be taken into account.


Premise
~~~~~~~
The premise is simply that every element has 2 cache keys generated at
all times.

  Strong Keys
  ~~~~~~~~~~~
  A strong cache key, which is the same cache key we know today.

  Weak Keys
  ~~~~~~~~~
  A weak cache key, which is derived from the element's own inputs
  only; it is independent of the cache keys of its dependencies.

  As a detail, I think the names of the immediate build dependencies
  should be included in the weak cache key, which is to say that if we
  explicitly change an element's build dependencies, it forces a rebuild
  under weak cache key calculation mode.
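
A rough sketch of how the two keys could differ, for illustration only.
The element layout (`sources`, `config`, `build-depends`) and the
function names are assumptions made up for this example, not the actual
BuildStream implementation:

```python
import hashlib
import json


def _digest(payload):
    # Serialize deterministically so equal inputs always hash the same.
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def strong_key(element, dep_strong_keys):
    # Strong key: the element's own inputs plus the strong keys of its
    # dependencies, so any change further down the graph propagates up.
    return _digest({
        "sources": element["sources"],
        "config": element["config"],
        "dependencies": sorted(dep_strong_keys),
    })


def weak_key(element):
    # Weak key: the element's own inputs plus only the *names* of its
    # immediate build dependencies (hypothetical field name).
    return _digest({
        "sources": element["sources"],
        "config": element["config"],
        "build-depends": sorted(element["build-depends"]),
    })
```

With this shape, rebuilding a dependency changes the element's strong
key but leaves its weak key alone, while adding or removing a build
dependency by name changes both.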



So what will this mean ?


Cache key lookups
~~~~~~~~~~~~~~~~~
We will have to be able to look up and determine the presence of an
artifact either by its weak or strong cache key.


Resolving build plan
~~~~~~~~~~~~~~~~~~~~
Build plan resolution will have to learn about weak vs. strong planning
modes, and this will affect any of the BuildStream commands which use a
build plan (for instance `bst fetch`, `bst show` and of course
`bst build`).

When resolving a build plan in weak planning mode, the presence of an
artifact under an element's weak key is sufficient for that element to
be skipped in the build plan.
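
As a minimal sketch of that rule (the `resolve_plan` name, the element
dictionaries and the `mode` argument are all hypothetical, chosen only
to illustrate the idea):

```python
def resolve_plan(elements, cached_keys, mode="strong"):
    """Return the names of the elements which still need to be built.

    elements:    dicts with "name", "strong" and "weak" entries
    cached_keys: set of keys which have an artifact in the cache
    mode:        "strong" or "weak" planning mode
    """
    plan = []
    for element in elements:
        # An exact (strong) match can always be reused.
        if element["strong"] in cached_keys:
            continue
        # In weak mode, any artifact under the weak key is good enough.
        if mode == "weak" and element["weak"] in cached_keys:
            continue
        plan.append(element["name"])
    return plan
```

For example, if only a base element's sources changed, strong mode
would plan the base element and everything that depends on it, while
weak mode would plan only the base element, as long as the reverse
dependencies still have an artifact under their weak keys.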


Artifact caching
~~~~~~~~~~~~~~~~
This is related somewhat to the cache key lookups, but I think
essentially it makes sense to push the same commit to two separate refs
at cache time.

So one invariant is that, in an artifact cache, there is always only
ever one ref for the strong key.

This is different from the weak key refs in the artifact cache, which
act as real branches, and may have multiple artifacts stored under the
name of a weak cache key ref.
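
A toy model of the two kinds of refs might look like the following.
The `ArtifactCache` class and its methods are invented for this
example, and "newest" here simply means "most recently committed",
which is only a stand-in for whatever notion of age we end up with:

```python
class ArtifactCache:
    def __init__(self):
        # Exactly one commit per strong key.
        self.strong_refs = {}
        # A weak key acts like a branch: a history of commits,
        # oldest first.
        self.weak_refs = {}

    def commit(self, strong, weak, commit_id):
        # Push the same commit to both refs at cache time.
        self.strong_refs[strong] = commit_id
        history = self.weak_refs.setdefault(weak, [])
        if commit_id not in history:
            history.append(commit_id)

    def lookup(self, key):
        # Try the key as a strong key first, then as a weak key,
        # returning the most recent commit on the weak "branch".
        if key in self.strong_refs:
            return self.strong_refs[key]
        history = self.weak_refs.get(key)
        return history[-1] if history else None
```

Two builds of the same element against different dependency versions
would then end up as two strong refs but as two commits on a single
weak ref.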

Something interesting to note here is that with GNOME Continuous, which
uses (approximately) the weak key only approach for determining what
should be rebuilt, the build may report false failures at times. These
disappear by themselves after a subsequent rebuild (which will, in our
terminology, by pure accident produce a strong cache key).

For this reason, we have to at least be able to consider weak cache key
age. 

  A.) We need to be able to always build against the newest artifact
      for a given weak cache key, where "newest" is ideally an artifact
      that is closer to the strong key.

  B.) We probably need to have a way to force a rebuild of "old" weak
      artifacts, otherwise we never get this behavior where a false
      failure eventually lands back on its feet and succeeds.

      Probably this just means building in strong cache key calculation
      mode every once in a while ?


Artifact Metadata
~~~~~~~~~~~~~~~~~
We should encode both strong and weak cache keys into the artifact
metadata.

Aside from the value of knowing the artifact's provenance, this will
provide a lookup mechanism that may be essential for BuildStream.

For instance, when downloading an artifact, either by its strong or
weak key, we will need to examine this metadata at download time to
determine which keys to populate in the local cache with this artifact
(i.e. if you download an artifact by its strong key, and you later
build in weak planning mode, you want to be able to use the downloaded
artifacts).
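
Sketched very roughly, and assuming a made-up metadata layout with a
`keys` entry holding both keys (the real format is still to be
decided), the download side could do something like:

```python
def store_downloaded(local_cache, artifact):
    # Read both keys from the artifact's own metadata, regardless of
    # which key the artifact was requested by.
    keys = artifact["metadata"]["keys"]

    # Populate both refs locally so that a later build in either
    # planning mode can find this artifact. A plain dict stands in
    # for the local cache here.
    local_cache[keys["strong"]] = artifact
    local_cache[keys["weak"]] = artifact
    return keys
```

The local cache is modelled as a single mapping here for brevity; it
ignores the fact that a weak ref may hold more than one artifact.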


Artifact Publishing and Downloads
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
We should consider what impacts it will have for the user to have the
ability to share weak artifacts with their peers.

It's possible that with a swarm of builders which use primarily weak
cache key build planning mode... they might not see much benefit of
sharing artifacts at all, if we only ever upload the strong keys.

On the other hand, it's possible that even with one or two continuous
automated build machines contributing to the cache and building in
strong build planning mode, users who use weak cache key calculation
mode can at least benefit by downloading the artifacts they need.

In any case it looks like, at the very least, we should be able to
look up an artifact by its weak key on the remote artifact cache, even
if the remote cache only ever contains artifacts built in strong cache
key calculation mode.

Any thoughts on this ?

Is it overly complex ? Or are there huge gaping holes in this plan ?

Cheers,
    -Tristan


[0]: https://wiki.gnome.org/Projects/BuildStream/Roadmap/CacheKeyModes


