Re: [BuildStream] cache key changes



Hi,

I might make sense to clean house in this area indeed, maybe it's time
to drop the ability of `project` and `context` to potentially
contribute to the cache key (they are mainly there for extensibility,
but maybe by now we know with reasonable certainty that these entities
have no business contributing to the cache keys).

I have a couple of strong objections to the particulars here though,
comments inline:

On Fri, 2019-07-05 at 15:00 +0100, Darius Makovsky via buildstream-list 
wrote:
[...]
* `artifact-version` is replaced by `core-version`

I think this is wrong. If this is only a name change, I think it's more
clear as `artifact-version`, as this is clearly *not* tied to the core
BuildStream release version and very intentionally so.

If this is a proposal to tie artifact cache keys directly to the core
release version, then I think this is very wrong, as it would incur
cache key changes where there is no need for them.

In an ideal world:

* The core artifact version almost never changes

* We add features to BuildStream, YAML format enhancements, plugin
  features, etc, always in ways which do not impact cache keys

* In the edge cases where we absolutely need to change the format
  of the artifact itself in order to implement a new feature (i.e.
  we need to add or change machine readable metadata in the artifact),
  then we bump the artifact version at that time

This is basically what we have now, but *further* than this, it will be
desirable to support older versions of artifacts at the expense of
newer features not working in BuildStream. This would be a feature we
would want to implement after all the dust settles and BuildStream
development slows down, it would allow project authors to specify which
"artifact version" they want to use for their project (possibly
limiting the featureset accessible to users of that project).

This is to say that you should be ideally able to upgrade BuildStream
all the time and keep using your already built artifacts, you may have
an expensive procedure for validating these artifacts so you want to
keep them as long as possible and not trigger rebuilds and incur
revalidation where it is not needed.


* `element` is replaced by:
    - `element-plugin-key`
* `element-plugin-name` is added
* `element-plugin-version` is added

Again, for elements and sources, we don't want to force the format or
release version to be tied to the artifact version, we want to allow
plugins to provide long term artifact cache key stability.

Asides from this, we also want projects to be able to decide how they
trust their plugins to be stable in this way - such that a project
could decide that they *want* to bump cache keys for this plugin if
anything about the plugin has changed (perhaps by checksumming the
python file itself), but this should be optional.



As a separate point, it appears that adding the plugin names to the
cache keys is a very important missing piece (as plugins with similar
configuration formats could be otherwise interchangeable), but I should
point out that a plugin name is not sufficient to specify a given
plugin's identity.

Plugins are project bound, and what might be known as the "git" plugin
for one project can be something entirely different for another project
being loaded across junctions, BuildStream ensures that projects should
have full control on what plugins they load, and that those plugins can
coexist in the same session safely.

So, while it is certainly good to include the plugin name, it might be
important to include the "plugin origins" configuration from the
project in the cache keys as well, coupled together, these two pieces
of information should ideally uniquely identify the plugins in use.

If we include the plugin origins, we should probably *keep* the project
specific key from Project.get_unique_key(). Delegate that contribution
to the cache key to Project would both be more elegant than trying to
do it all in Element, and also more optimal, as the Project could hash
it's own checksum once and only contribute a string to every cache key.

Cheers,
    -Tristan




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]