[BuildStream] Cache Key stability [Was: Execution environments]



Hi Sander,

Forking subject line as we're a bit off the original topic...

On Fri, 2018-11-02 at 11:07 +0000, Sander Striker wrote:


On Fri, Nov 2, 2018 at 8:53 AM Tristan Van Berkom <tristan vanberkom codethink co uk> wrote:
On Thu, 2018-11-01 at 23:21 +0000, Sander Striker via BuildStream-list wrote:
Hi Jürg,

[...]
 
This requires a cache key version bump, doesn't it? If we are only assuming
backward compatibility in terms of consuming from existing caches, we
would generate the cache key with both the old and the new version. We
check whether an artifact exists under the new key; if not, we check under
the old key. If it is present under neither, we build the artifact. We
then store the artifact under the new version key.
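The dual-key lookup described above can be sketched as follows. This is a minimal illustration, assuming a cache that behaves like a key/value map and a key derived by hashing the element configuration together with a key-format version; `cache_key`, `fetch_or_build`, `OLD_KEY_VERSION` and `NEW_KEY_VERSION` are hypothetical names, not BuildStream API.

```python
import hashlib
import json
from typing import Callable, Dict, Optional

# Hypothetical key-format versions, for illustration only.
OLD_KEY_VERSION = 1
NEW_KEY_VERSION = 2


def cache_key(element_config: dict, key_version: int) -> str:
    """Derive a cache key from an element's configuration and a key-format version."""
    payload = json.dumps(
        {"key-version": key_version, "config": element_config}, sort_keys=True
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def fetch_or_build(
    element_config: dict,
    cache: Dict[str, bytes],
    build: Callable[[dict], bytes],
) -> bytes:
    """Look up an artifact under the new key, then the old key, else build it."""
    new_key = cache_key(element_config, NEW_KEY_VERSION)
    artifact: Optional[bytes] = cache.get(new_key)
    if artifact is not None:
        return artifact

    # Fall back to an artifact produced by an older client (assumed to be
    # readable by the new client, i.e. backward compatible for consumption).
    artifact = cache.get(cache_key(element_config, OLD_KEY_VERSION))
    if artifact is not None:
        return artifact

    # Miss under both keys: build, then store under the new key only.
    artifact = build(element_config)
    cache[new_key] = artifact
    return artifact
```

With this ordering, old and new clients can share a cache during a transition. A possible variation is to also copy an old-key hit to the new key, so that later lookups succeed on the first probe.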

We currently do not make guarantees about the cache key stability
beyond the promise that cache keys will not change from one major
release to the next (e.g. they have not changed in 1.2, and they will
not change for the duration of 1.4).

Which is to say cache keys will not change until 2.0? Or did you mean "will not change from one _minor_
release to the next"?

Technically I mean "minor" release, semantic versioning-wise. I tend to
loosely say "major" here because I consider API breaks to be a failure
of our mission and thus far have been hoping to never need to roll out
a 2.0 version.

So yes, so far cache key breakages only ever happen in a "minor"
release, and cache keys are guaranteed to remain the same in every
"micro" release within that "minor" release line.
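The stated policy (keys fixed across every micro release of a minor release line, possibly different across minor releases) can be expressed as a small check. This is a sketch of the policy as described in this thread only; `parse_version` and `cache_keys_guaranteed_equal` are hypothetical helpers, not part of BuildStream.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a "major.minor.micro" version string into integers."""
    major, minor, micro = (int(part) for part in version.split("."))
    return major, minor, micro


def cache_keys_guaranteed_equal(client_a: str, client_b: str) -> bool:
    """Keys are only promised to match within one (major, minor) release line."""
    return parse_version(client_a)[:2] == parse_version(client_b)[:2]
```

Note that a `False` result means "not guaranteed", not "certainly different": two minor releases may happen to produce the same keys.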

There is a sensible plan for supporting cache key stability; however, it
requires that we version the artifact format in such a way that we can
cope with artifacts which are formatted differently.

i.e.:

* When implementing features, we necessarily change how an artifact is
  stored (most notably the metadata changes, but the introduction of
  build trees is another tangible example).

  BuildStream would have to continue to understand every previous
  artifact version, from the point of implementing this stability
  onwards.

* Projects would need to inform BuildStream which artifact version they
  intend to use.

* BuildStream would also have to restrict some new features so that they
  are only available from a given artifact version onwards, and error out
  gracefully otherwise.
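The three requirements above (keep a reader for every artifact format, let the project declare its artifact version, and gate features on that version with a clean error) might fit together roughly like this. A minimal sketch under assumed names: `READERS`, `FEATURE_MIN_VERSION`, `Project`, `read_artifact`, `require_feature` and `ArtifactVersionError` are all hypothetical, and the two artifact formats shown are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


class ArtifactVersionError(Exception):
    """Raised instead of crashing when a project/feature combination is unsupported."""


# Hypothetical readers: one per artifact format ever released, kept forever
# once stability is promised.
def _read_v1(raw: dict) -> dict:
    # Version 1 artifacts predate build trees.
    return {"files": raw["files"], "buildtree": None}


def _read_v2(raw: dict) -> dict:
    return {"files": raw["files"], "buildtree": raw.get("buildtree")}


READERS: Dict[int, Callable[[dict], dict]] = {1: _read_v1, 2: _read_v2}

# Hypothetical mapping of features to the first artifact version supporting them.
FEATURE_MIN_VERSION: Dict[str, int] = {"build-trees": 2}


@dataclass
class Project:
    name: str
    artifact_version: int  # declared by the project in its configuration


def read_artifact(version: int, raw: dict) -> dict:
    """Dispatch to the reader for whichever format the artifact was stored in."""
    try:
        reader = READERS[version]
    except KeyError:
        raise ArtifactVersionError(f"unknown artifact version {version}") from None
    return reader(raw)


def require_feature(project: Project, feature: str) -> None:
    """Error out gracefully if the project's artifact version predates a feature."""
    if project.artifact_version not in READERS:
        raise ArtifactVersionError(
            f"{project.name}: unsupported artifact version {project.artifact_version}"
        )
    minimum = FEATURE_MIN_VERSION[feature]
    if project.artifact_version < minimum:
        raise ArtifactVersionError(
            f"{project.name}: '{feature}' requires artifact version {minimum} or later, "
            f"but the project declares version {project.artifact_version}"
        )
```

The cost that the thread worries about is visible here: every new format adds a reader that can never be deleted, and every feature needs an entry in the gating table.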

While this is very much a long term goal for us, I have been advising
against implementing this in the short term due to the immense amount
of churn and feature additions we've been seeing in the last year.

I would advise waiting until the dust settles a bit more; once
BuildStream is less of a fast-moving target, we can implement this with
much less overhead.

I think that's still ok; we can revisit post-1.4. To spell it out:
it currently means a client upgrade could result in 0% cache hits.
And as multiple versions of clients exist, they cannot benefit from
each other's cached results.

Yes it means that, but it means something a bit more than that as well.

Essentially, cache key stability means that one can keep distributing
and trusting the same binary artifacts for many years, in a project
where some binary artifacts move faster than others.

If you have a 100% bit-for-bit reproducible build for a given artifact,
it will be possible to create a new artifact for it and confidently
state that it is already validated (e.g., it is exactly the same binary
which has been controlling the landing gear in your Boeing 747 for 15
years, thus it need not be revalidated).
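The argument hinges on being able to show that a fresh build is identical to the one that was validated. A minimal sketch, assuming artifacts can be checked out as plain directory trees and that a digest of the validated build was recorded at validation time; `tree_digest` and `matches_validated_build` are hypothetical names, and this is not how BuildStream itself identifies artifacts.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> str:
    """SHA-256 over every regular file's relative path and contents, in sorted order.

    Only regular-file paths and contents are hashed; permissions, timestamps
    and symlinks are not covered by this sketch.
    """
    digest = hashlib.sha256()
    files = sorted(
        (p for p in root.rglob("*") if p.is_file() and not p.is_symlink()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    for path in files:
        name = path.relative_to(root).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length-prefix each field so different path/content splits cannot collide.
        for field in (name, data):
            digest.update(len(field).to_bytes(8, "big"))
            digest.update(field)
    return digest.hexdigest()


def matches_validated_build(new_artifact: Path, validated_digest: str) -> bool:
    """True if a fresh build is bit-for-bit identical to the previously validated one."""
    return tree_digest(new_artifact) == validated_digest
```

A match shows the new build produced the same bytes as the validated one; it says nothing about builds that are not reproducible, which is exactly the case discussed next.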

If your build is not 100% bit-for-bit reproducible, then it is a shame
that your build tool mandates that you rebuild and revalidate something
which has been trusted for 15 years.

I've had this conversation many times in the last two years and have
mostly been against the complexity that cache key stability necessarily
brings to the codebase; still, the above argument about requiring
revalidation of trusted artifacts is, I think, the most important and
winning one.

Cheers,
    -Tristan
