Storing Source references (e.g. commit shas) separately



Hi All,

This is a proposal to change the structure of the BuildStream project
format regarding Source references (normally git commit shas), such
that they can be stored separately. This would be opt-in, enabled by a
setting in the `project.conf` of any project that wants to use it.

I've added Michael on CC as he initially raised this issue on IRC.

Below is a more detailed proposal including a problem statement,
general solution and outline of implementation details.

Cheers,
    -Tristan


Problem Statement
~~~~~~~~~~~~~~~~~
It appears that we have overlooked some issues in our endeavors to make
BuildStream convenient in cases where one desires to always build the
"latest of this branch" - a regular case for a group of developers
working on components of a common, integrated system.

With the knowledge that users would want to build the latest and
greatest, we added `bst build --track` and made `--track-save` an
explicit option in order to avoid modifying the elements with new
source references.

The intention here is to avoid having to doctor your ref-less project
whenever you want to pull new updates of the BuildStream project itself
with, e.g. `git pull --rebase` (if you are storing your project in
git).

So far so good - but we failed to look at the bigger picture here.


  A.) If you want to open a workspace, BuildStream needs to know what
      to stage into it. In the absence of a reference, we don't know
      what to stage.

  B.) If you've run a build with `bst build --track` options and
      without saving the refs, congratulations. You cannot test it.

      o Since the refs were never stored, you cannot run a `bst shell`
        on what you've built, let alone be confident that it's the exact
        binary output of what you just built (a later `bst track`
        invocation can pull new references which appeared upstream
        after your build completed).

      o Neither can you run `bst checkout` to obtain the output of what
        you just built, without the references of what it was.



Proposed Solution
~~~~~~~~~~~~~~~~~
What I'm proposing to fix this is to store the source references in
a separate YAML dictionary stored beside `project.conf`. For the sake
of discussion, let's call this `project.refs`.

This is an interesting solution because BuildStream project maintainers
(i.e. those who maintain BuildStream projects in YAML) can decide
whether or not to keep this file under revision control, or to commit
it only on tagged release commits.


  Opt-in Nature
  ~~~~~~~~~~~~~
  For backward compatibility, the default behavior should not change.
  The feature can be opted into with a simple setting in `project.conf`.


  Behavior when enabled
  ~~~~~~~~~~~~~~~~~~~~~
  When the feature is enabled, Source references will be read and
  written to `project.refs` instead of their respective element files.

  Should the project element files *also* contain tracked source
  references, a warning should be issued for those at load time
  explaining that the element `.bst` references will be ignored.
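
  As a rough illustration of the behavior described above, here is a
  minimal sketch in plain Python. The function name `resolve_refs()`,
  the `use_project_refs` flag and the warning text are all hypothetical;
  the dictionaries stand in for already-parsed YAML, and nothing here is
  actual BuildStream code.

```python
import warnings


def resolve_refs(element_name, element_sources, project_refs, use_project_refs):
    """Return the refs to use for one element's ordered list of sources.

    element_sources: list of dicts, as loaded from the element's .bst file
    project_refs:    dict, as loaded from project.refs
    (All names here are hypothetical, for illustration only.)
    """
    inline = [source.get("ref") for source in element_sources]
    if not use_project_refs:
        # Default behavior: refs live in the element file
        return inline

    # Feature enabled: refs found in the element file are ignored,
    # and the user is warned about it at load time
    if any(ref is not None for ref in inline):
        warnings.warn(
            "{}: refs in the element file are ignored because "
            "project.refs is in use".format(element_name)
        )

    stored = project_refs.get("references", {}).get(element_name, [])
    refs = [entry.get("ref") for entry in stored][:len(element_sources)]
    # Sources with no stored entry simply have no ref yet
    refs += [None] * (len(element_sources) - len(refs))
    return refs
```

  With the feature disabled the inline refs are returned untouched, so
  existing projects keep working as they do today.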


Implementation Details
~~~~~~~~~~~~~~~~~~~~~~

  Format of project.refs
  ~~~~~~~~~~~~~~~~~~~~~~
  This file will contain a simple YAML dictionary using the element
  name as the key, the value of which is a list of dictionaries
  corresponding to the ordered list of sources for a given element.

  At the toplevel, we reserve some namespace for future expansion (I
  expect this approach to also allow for solving the issue of depending
  on third party projects via junction elements, where the referred
  project itself is in a ref-less state - this would benefit from also
  storing referred project related refs in this file as well).

  Example:
  ~~~~~~~~

  # The dictionary of source references
  references:

    # foo.bst has one git source
    foo.bst:
    - ref: 02349cfbbf6c5c1242681aa50b828f841e0e3a42

    # bar.bst has two tarball sources
    bar.bst:
    - ref: 0b78b483c179f6998a0df582aea3d77340bb1e9d887b52ed8fae677d535fd19d
    - ref: 185f0f175a90bcfc55cf3cf6ceff8d447a6269492c0ca1a1fc0748ea2c181363
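
  Once parsed, the example above is just nested dictionaries and lists.
  The following sketch shows that shape and a positional lookup of the
  node for the Nth source of an element. The helper `lookup_ref_node()`
  is a hypothetical name, not part of BuildStream.

```python
# The example project.refs as it would look after YAML parsing
project_refs = {
    "references": {
        "foo.bst": [
            {"ref": "02349cfbbf6c5c1242681aa50b828f841e0e3a42"},
        ],
        "bar.bst": [
            {"ref": "0b78b483c179f6998a0df582aea3d77340bb1e9d887b52ed8fae677d535fd19d"},
            {"ref": "185f0f175a90bcfc55cf3cf6ceff8d447a6269492c0ca1a1fc0748ea2c181363"},
        ],
    }
}


def lookup_ref_node(refs, element_name, source_index):
    """Return the node for the Nth source of an element, or None.

    Entries are matched to sources purely by position, following the
    ordered list of sources declared in the element. (Hypothetical helper.)
    """
    sources = refs.get("references", {}).get(element_name, [])
    if 0 <= source_index < len(sources):
        return sources[source_index]
    return None
```

  Because entries are matched by position, reordering the sources in an
  element would require reordering its entries in `project.refs` too.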


  Source API
  ~~~~~~~~~~
  Since the ref of a Source can be loaded from multiple places, it is
  not possible to implement this without requiring that Source plugins
  implement some mechanism for loading a reference from a `node` that
  is specified by the BuildStream core (where `node` in our terminology
  refers to a Python dictionary loaded from YAML).

  For this, I propose an additional `Source.load_ref()` method to
  complement the existing `Source.get_ref()` and `Source.set_ref()`,
  the latter of which is already suitable for serializing the reference
  to a core specified `node`.

    # load_ref():
    #
    # Loads and returns the reference for this Source from a
    # specified YAML node.
    #
    # Args:
    #    node: The YAML node to load the ref from
    #
    # Returns:
    #    The source reference, suitable for Source.set_ref()
    #

  This will be painless and easy to implement for any existing Source
  plugins.
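
  To give an idea of how small the plugin-side change could be, here is
  a sketch of a plugin whose ref is stored under a single `ref` key.
  `ExampleSource` is a stand-alone stand-in, not the real Source base
  class, and the `set_ref(ref, node)` signature is assumed from the
  description above.

```python
class ExampleSource:
    """Stand-in for a Source plugin (hypothetical, for illustration)."""

    def __init__(self):
        self.ref = None

    def load_ref(self, node):
        # Read the ref from whichever node the core hands over: the
        # source's own YAML in the element file, or its entry in
        # project.refs. Returns None when no ref is present.
        return node.get("ref")

    def get_ref(self):
        return self.ref

    def set_ref(self, ref, node):
        # Update the in-memory state and serialize the ref into the
        # core-specified node
        self.ref = ref
        node["ref"] = ref
```

  The core would then do something like
  `source.set_ref(source.load_ref(node), node)` with a node of its own
  choosing, without the plugin knowing where that node came from.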


  BuildStream core changes
  ~~~~~~~~~~~~~~~~~~~~~~~~
  Some of the obvious, and some of the trickier, parts of the core changes:

    o We need to detect whether `project.refs` is enabled early in
      the load from `project.conf` settings.

    o While loading each individual Source from the `.bst` files,
      we need to additionally call `Source.load_ref()` if that is
      expected.

    o In order to provide a useful warning about ignored references
      in the element `.bst` files, we can also use `Source.load_ref()`
      on the Source YAML representation at load time to see if it
      returns a ref that will be ignored.

    o Callers of `Source.load_ref()` should check for the `ImplError`
      exception to ensure that projects using this feature are
      supported by the plugins in play - causing an early error
      in case the plugin does not yet support `load_ref()`.

    o To retain round-tripping and preservation of possible user
      modifications in `project.refs`, we need to store the
      appropriate origin node (see source.py). This is used to
      keep track of the original loaded dictionary so we can
      later use that in `Source.set_ref()`.

    o In the TrackQueue(), after successfully obtaining a new
      ref to serialize, care must be taken to keep an updated
      version of `project.refs` in memory (to avoid undoing the
      result of a previous track job by overwriting it with the
      old ref).

      The file will be updated many times in a single tracking
      session. This is already done in the main process when the
      actual tracking work completes, so modifications to the
      YAML which result from tracking are already serialized
      (they already don't happen in parallel child tasks).

    o And of course... Tests.
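
  Two of the points above, the early `ImplError` check and keeping a
  single in-memory copy of `project.refs` during tracking, could be
  sketched as follows. Everything here is a stand-in: the `ImplError`
  class is redefined locally, and `LegacySource`,
  `check_plugin_support()` and `apply_tracked_ref()` are hypothetical
  names rather than BuildStream internals.

```python
class ImplError(Exception):
    """Local stand-in for the error raised by an unimplemented method."""


class LegacySource:
    """Hypothetical plugin which predates load_ref()."""

    def load_ref(self, node):
        raise ImplError("load_ref() is not implemented")


def check_plugin_support(source):
    """Fail early if a plugin cannot load refs from a core-specified node."""
    try:
        source.load_ref({})
    except ImplError as e:
        raise RuntimeError(
            "project.refs is enabled, but {} does not support "
            "load_ref()".format(type(source).__name__)
        ) from e


def apply_tracked_ref(project_refs, element_name, source_index, new_ref):
    """Record one completed track job in the in-memory project.refs.

    Always mutating the same dictionary means a later job can never
    write back a stale copy holding an older ref.
    """
    references = project_refs.setdefault("references", {})
    sources = references.setdefault(element_name, [])
    # Grow the list so that the Nth source has an entry to update
    while len(sources) <= source_index:
        sources.append({})
    sources[source_index]["ref"] = new_ref
    return project_refs
```

  Since track results are already handled in the main process, calls
  like `apply_tracked_ref()` would run one after another, and the file
  can be rewritten from the same dictionary after each one.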


