Revisions Draft RFC [Re: Proposed public data fields for dpkg build and deploy elements]



Forking this thread.

This started as a reply to Jonathan about how to handle format changes
and migrations for public data, and evolved into a complete draft of
how we should handle versioning information in general in BuildStream.

On Thu, 2017-07-06 at 17:49 +0100, Jonathan Maw wrote:
On 2017-07-06 12:32, Tristan Van Berkom wrote:

[...]
Okay, I have some concerns about how this format for packaging
scriptlets will work with other packaging systems, but to avoid
blocking your progress, let's try to address these concerns separately.

Instead, can you outline a plan for revisioning of this data ?

What I'm mainly concerned with, is if we eventually want to use this
data for another packaging system but share the same scriptlets to
deploy to multiple packaging systems, what happens then ?

  A.) The dpkg-deploy element should probably understand what revision
      the scriptlets were written for, and behave in a backwards
      compatible way.

  B.) A potential new package deployment element, say RPM, which cannot
      handle the old version, should be able to bail out with an error
      when detecting an incompatible 'package-scripts' syntax

I take it you're referring to versioning of the public data, so that we
can handle changes to the format sensibly?

I haven't given it much thought. I suppose there are two options:

1) Add a version number to the "bst" domain.

    i.e. public.bst.version is defined in the defaults, and when we
    alter the format for public data, that version goes up.

2) Add version numbers to the subdomains in "bst".

    i.e. there'd be public.bst.split-rules.version,
    public.bst.dpkg-data.version and
    public.bst.package-scripts.version defined in the defaults.

I'd prefer option 2), since that way someone who wants to add something
completely separate doesn't have to comprehend the entire history of
public data formats.

So I've been giving this some thought and haven't arrived at any
conclusions yet, but I can see this as a potential repeating pattern.
Also, there are more than just format revisions to consider with
BuildStream.

Element 'config' sections will probably have to be revisioned; the base
buildstream format will have to be revisioned, and now it seems that
public data will have to be revisioned.

At what granularity we revision public data is difficult to say, but
at face value it seems to make sense to only revision public data at
the domain level but not at the subdomain level.

I think this activity constitutes more thought process than actual work
(enforcing and supporting revisions will be a handful of relatively
trivial patches), so I'm drafting an initial pass on revisions here.


BuildStream Software Version
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I have not quite decided on when it is appropriate to bump the major
version, but every 6 months there should be a public release of a
stable BuildStream version with a minor point revision bump (e.g. 1.2
becomes 1.3 if a feature was added in a release cycle).

Starting from BuildStream 1.0, any API additions or keyword argument
additions in the public API should include an annotation of what
version of BuildStream they were added in (e.g. "Since: 1.2").
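
As an illustration only, such an annotation might look like this in a
docstring. The class, method and keyword argument names below are
hypothetical and are not part of the BuildStream API.

```python
class ExampleElement:
    """Hypothetical element class, used only to illustrate the annotation."""

    def stage(self, path, *, follow_links=False):
        """Stage files at the given path.

        Args:
           path (str): Directory to stage into
           follow_links (bool): Whether to follow symlinks. Since: 1.2

        Since: 1.1
        """
        # The body is a placeholder; only the docstring matters here
        return (path, follow_links)
```

The keyword argument carries its own annotation because it was added
after the method itself.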

Source and Element plugins must advertise the minimal bound BuildStream
dependency - for convenience, any plugins residing in the BuildStream
source base will just advertise the current version of BuildStream as
their dependency.

After the release of BuildStream 1.0, all public Python API surfaces
must be considered stable and remain backwards compatible moving
forward.


BuildStream Format Versions
~~~~~~~~~~~~~~~~~~~~~~~~~~~
The format versions are different as they provide a guarantee of what
features are available to a project.

This is rendered a bit more complex by the fact that third party
plugins are allowed to exist. This means that the core BuildStream
format (i.e. conditional statements, dependency declarations, variants,
etc.) needs to have one format revision, and all plugins need to have an
individual revision as well.

To simplify things, I would propose that we keep a single BuildStream
format revision and we have all plugins which are hosted in the
BuildStream source repository use the same revision number.

A project will be able to assert a minimal bound revision for
BuildStream and for any plugins it uses in the project.conf. If the
installation of BuildStream has an older revision for the overall
format, or for any of the loaded plugins, BuildStream will abort and
tell the user that they need a newer installation for the given project.
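
A minimal sketch of that check, assuming the required and installed
revisions are available as plain dictionaries. The "core" key, the
function name and the exception are all hypothetical, and how
project.conf would actually spell this is not decided here.

```python
class FormatVersionError(Exception):
    """Raised when the installation is too old for the project."""


def check_required_versions(required, installed):
    """Compare a project's minimal bound revisions with the installation.

    Both arguments map a name ("core" or a plugin name, an assumed
    layout) to an integer format revision.
    """
    too_old = []
    for name, minimum in sorted(required.items()):
        have = installed.get(name)
        # A missing plugin is treated the same as one that is too old
        if have is None or have < minimum:
            too_old.append((name, minimum, have))

    if too_old:
        details = ", ".join(
            "{} (need >= {}, have {})".format(name, minimum, have)
            for name, minimum, have in too_old
        )
        raise FormatVersionError(
            "This project needs a newer installation: " + details
        )


# Example data, invented for illustration
project_requires = {"core": 3, "cmake": 2}
installation = {"core": 3, "cmake": 1, "autotools": 4}
```

With the example data, calling
check_required_versions(project_requires, installation) raises
FormatVersionError because the installed cmake plugin is older than the
project requires.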

Again, from BuildStream 1.0, any additions and enhancements to the
BuildStream format must remain backwards compatible with older versions.

The policy for bumping the BuildStream format revision is as follows:
if any features have been added to the base format or to any of the
plugins, or if a new plugin has been added over the course of a release
cycle (which should be 6 months, following the GNOME release schedule),
then the format revision must be bumped once, immediately. There should
be at most one revision bump in a given release cycle.


Public Data
~~~~~~~~~~~
Public data is a bit tricky, but I think the most straightforward way of
dealing with this is to say that:

  o Public Data in the "bst" domain is revisioned with the main
    BuildStream format revision

  o If Public Data is not in the "bst" domain, then it is specific to
    a given element type which consumes that data, and as such it is
    revisioned with the format version of the given element plugin.
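
The two rules above can be sketched as a small lookup. This is only an
illustration: the function name, its arguments and the way plugin
revisions are stored are assumptions.

```python
def public_data_revision(domain, core_revision, plugin_revisions, consumer=None):
    """Return the format revision governing a public data domain.

    domain:           public data domain name, e.g. "bst"
    core_revision:    the main BuildStream format revision
    plugin_revisions: mapping of element plugin name to format revision
    consumer:         element plugin which consumes a non-"bst" domain
    """
    if domain == "bst":
        # "bst" public data follows the main format revision
        return core_revision

    if consumer is None:
        raise ValueError("a non-'bst' domain needs a consuming element plugin")

    # Any other domain follows the plugin which consumes it
    return plugin_revisions[consumer]
```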


BuildStream Artifact Versions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The artifact version is a different beast altogether, and is only ever
bumped when the underlying code causes the build output to differ for a
given calculated cache key - OR - if the cache key calculation algorithm
has changed in any way.

Note that incrementing the BuildStream version itself, OR incrementing
the format version, does not by itself constitute bumping the artifact
version.

This provides two separate guarantees:

  A.) The same cache key for a given artifact will always produce the
      same output (bit-for-bit identical ideally, if we have
      sufficiently reproducible builds).

  B.) Building the same project with a version of BuildStream that
      carries the same artifact version, will produce the same
      cache key.

Here again, unfortunately, we have to consider third party plugins, so
it cannot be one single BuildStream artifact version. This is because
third party plugins loaded in the pipeline may change over time in ways
that can potentially affect both how the output is created and how a
cache key is calculated for the given element (via
Element.get_unique_key()).

So, this means that the artifact version is actually a single master
revision, plus a dictionary of plugin artifact revisions.
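
One possible shape for this, as a sketch only. The class and field
names are hypothetical, as is the helper which reports the plugins
whose artifact revision differs between two installations.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ArtifactVersion:
    # Single master revision for BuildStream itself
    master: int
    # Artifact revision of each loaded plugin, keyed by plugin name
    plugins: Dict[str, int] = field(default_factory=dict)


def changed_plugins(old, new):
    """Return the sorted names of plugins whose revision differs."""
    names = set(old.plugins) | set(new.plugins)
    return sorted(
        name for name in names
        if old.plugins.get(name) != new.plugins.get(name)
    )
```

Two ArtifactVersion values compare equal only if the master revision
and every plugin revision match.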

There are two avenues we can follow regarding this revision:

  o For convenience, only ever bump a single artifact version for any
    and all BuildStream first class citizen plugins (plugins which are
    maintained as a part of BuildStream).

    In this case we can ignore the plugins which are maintained as a
    part of BuildStream and have a more comprehensive artifact version.

  o It is generally undesirable to bump the artifact version over time,
    because it means you need to go get an old version of BuildStream
    if you want to produce the same cache key and output for the same
    project, years later.

    In this light we could choose to revision the artifact version of
    each plugin separately. If for example the cmake build element
    artifact version is bumped, it need not have any effect on projects
    which do not use cmake.
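
A sketch of the second avenue, assuming each project knows which
plugins it uses. The function name and the plugin revision numbers are
invented for illustration.

```python
def effective_artifact_version(master, plugin_versions, used_plugins):
    """Return a hashable artifact version covering only plugins in use."""
    used = tuple(sorted(
        (name, plugin_versions[name]) for name in used_plugins
    ))
    return (master, used)


# Plugin artifact revisions before and after a bump of cmake only
before = {"autotools": 1, "cmake": 1}
after = {"autotools": 1, "cmake": 2}

# A project which uses only autotools sees no change
unchanged = (
    effective_artifact_version(1, before, {"autotools"})
    == effective_artifact_version(1, after, {"autotools"})
)

# A project which uses cmake does see a change
changed = (
    effective_artifact_version(1, before, {"cmake"})
    != effective_artifact_version(1, after, {"cmake"})
)
```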


Specifically Regarding Artifact Revisions in Plugins
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Another thing to consider is how we add features which affect cache key
calculation to individual Element plugins which provide the
Element.get_unique_key() implementations.

If, for example, an Element adds a new feature to its format, this
constitutes a format bump; however, it still must remain backwards
compatible with older formats.

So, if Element.get_unique_key() is written properly, the artifact
version for this element need not be bumped, as only *usage* of the new
feature would cause the cache key to change; projects which do not use
the new feature would still produce the same output and can still
produce the same cache key.

An example here... of a bad Element.get_unique_key() implementation:

  ===============================
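  # The "new-feature" key is always present, even when the feature is
  # unused, so the cache key changes for every project
  return {
    "foo": self.get_foo(),
    "bar": self.get_bar(),
    "new-feature": self.get_new_feature_configuration()
  }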
  ===============================

An example of a backward compatible Element.get_unique_key()
implementation which may not require an artifact revision bump:

  ===============================
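  unique_key = {}
  unique_key["foo"] = self.get_foo()
  unique_key["bar"] = self.get_bar()

  # Only add the new key when the feature is in use, so that projects
  # which do not use it keep producing the same cache key
  new_feature = self.get_new_feature_configuration()
  if new_feature is not None:
    unique_key["new-feature"] = new_feature

  return unique_key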
  ===============================
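
To make the difference concrete, here is a self-contained sketch which
hashes both styles of unique key. The hashing scheme (sorted JSON fed
to SHA-256) is only an assumption for the demonstration and is not
meant to describe how BuildStream calculates cache keys; the function
names are hypothetical as well.

```python
import hashlib
import json


def cache_key(unique_key):
    # Assumed scheme: deterministic JSON serialization, then SHA-256
    data = json.dumps(unique_key, sort_keys=True).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def old_unique_key(foo, bar):
    # The key as it was before the new feature existed
    return {"foo": foo, "bar": bar}


def bad_unique_key(foo, bar, new_feature=None):
    # Always includes the new entry, even when the feature is unused
    return {"foo": foo, "bar": bar, "new-feature": new_feature}


def compatible_unique_key(foo, bar, new_feature=None):
    # Includes the new entry only when the feature is in use
    unique_key = {"foo": foo, "bar": bar}
    if new_feature is not None:
        unique_key["new-feature"] = new_feature
    return unique_key


old = cache_key(old_unique_key("a", "b"))

# Feature unused: the compatible style keeps the old cache key
same_when_unused = cache_key(compatible_unique_key("a", "b")) == old

# Feature unused: the bad style changes the cache key anyway
bad_differs = cache_key(bad_unique_key("a", "b")) != old

# Feature used: the cache key changes, as it should
differs_when_used = cache_key(compatible_unique_key("a", "b", "x")) != old
```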


Sorry for the long email, but there are a lot of details to consider as
usual. Any thoughts or comments appreciated.

Cheers,
    -Tristan


