Storing Source references (e.g. commit shas) separately
- From: Tristan Van Berkom <tristan vanberkom codethink co uk>
- To: BuildStream <buildstream-list gnome org>
- Cc: Michael Catanzaro <mcatanzaro gnome org>
- Subject: Storing Source references (e.g. commit shas) separately
- Date: Sun, 14 Jan 2018 21:39:05 +0900
Hi All,
This is a proposal to change the structure of the BuildStream project
format regarding Source references (normally git commit shas), such
that they can be stored separately. This would be opt-in: a project
wanting to use the proposed approach would enable it in its project.conf.
I've added Michael on CC as he initially raised this issue on IRC.
Below is a more detailed proposal including a problem statement,
general solution and outline of implementation details.
Cheers,
-Tristan
Problem Statement
~~~~~~~~~~~~~~~~~
It appears that we have overlooked some issues in our endeavors to make
BuildStream convenient in cases where one desires to always build the
"latest of this branch" - a regular case for a group of developers
working on components of a common, integrated system.
With the knowledge that users would want to build the latest and
greatest, we added `bst build --track` and made `--track-save` an
explicit option in order to avoid modifying the elements with new
source references.
The intention here is to avoid having to doctor your ref-less project
whenever you want to pull new updates of the BuildStream project itself
with, e.g. `git pull --rebase` (if you are storing your project in
git).
So far so good - but we failed to look at the bigger picture here.
A.) If you want to open a workspace, BuildStream needs to know what
to stage into it, and we don't know that in the absence of a
reference.
B.) If you've run a build with `bst build --track` options and
without saving the refs, congratulations. You cannot test it.
o Since the refs were never stored, you cannot run a `bst shell`
on what you've built, let alone be confident that it's the exact
binary output of what you just built (a later `bst track`
invocation can pull new references which occurred upstream
after your build completed).
o Neither can you run `bst checkout` to obtain the output of what
you just built, without the references of what it was.
Proposed Solution
~~~~~~~~~~~~~~~~~
To fix this, I'm proposing that we store the source references in
a separate YAML dictionary kept beside `project.conf`. For the sake
of discussion, let's call this `project.refs`.
This is an interesting solution because BuildStream project maintainers
(i.e. those who maintain BuildStream projects in YAML) can decide
whether or not to revision this file; or only revision the file on
tagged release commits.
Opt-in Nature
~~~~~~~~~~~~~
For backward compatibility, the default behavior should not change.
Projects can opt in with a simple setting in the project.conf.
Behavior when enabled
~~~~~~~~~~~~~~~~~~~~~
When the feature is enabled, Source references will be read from and
written to `project.refs` instead of their respective element files.
Should the project element files *also* contain tracked source
references, a warning should be issued for those at load time
explaining that the element `.bst` references will be ignored.
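A rough sketch of that load-time warning follows. The function name and the message wording are made up for illustration, and it assumes every source keeps its reference under a `ref` key; the Source API section of this proposal replaces that assumption with a call into the plugin.

```python
def ignored_ref_warnings(elements):
    """Collect warnings for refs left behind in element files.

    `elements` maps an element name to its list of source dictionaries,
    as they might look after loading the element's YAML.
    (Hypothetical helper, not part of BuildStream.)
    """
    warnings = []
    for name, sources in elements.items():
        for index, source in enumerate(sources):
            # Assumption: the reference lives under the "ref" key.
            if source.get("ref") is not None:
                warnings.append(
                    "{}: source {} has a ref in the element file; it is "
                    "ignored because project.refs is enabled".format(name, index)
                )
    return warnings


elements = {
    "foo.bst": [{"kind": "git", "ref": "02349cfbbf6c5c1242681aa50b828f841e0e3a42"}],
    "bar.bst": [{"kind": "tar"}],
}
for message in ignored_ref_warnings(elements):
    print(message)
```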
Implementation Details
~~~~~~~~~~~~~~~~~~~~~~
Format of project.refs
~~~~~~~~~~~~~~~~~~~~~~
This file will contain a simple YAML dictionary using the element
name as the key, the value of which is a list of dictionaries
corresponding to the ordered list of sources for a given element.
At the toplevel, we reserve some namespace for future expansion (I
expect this approach to also allow for solving the issue of depending
on third party projects via junction elements, where the referred
project itself is in a ref-less state - this would benefit from also
storing referred project related refs in this file as well).
Example:
~~~~~~~~
# The dictionary of source references
references:
  # foo.bst has one git source
  foo.bst:
  - ref: 02349cfbbf6c5c1242681aa50b828f841e0e3a42
  # bar.bst has two tarball sources
  bar.bst:
  - ref: 0b78b483c179f6998a0df582aea3d77340bb1e9d887b52ed8fae677d535fd19d
  - ref: 185f0f175a90bcfc55cf3cf6ceff8d447a6269492c0ca1a1fc0748ea2c181363
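To make the lookup concrete, here is a small sketch of how the core might find the node for one source once the file has been loaded. The dictionary below mirrors the example as it would look after YAML parsing; `lookup_ref_node()` is a hypothetical helper name, not an existing BuildStream function.

```python
# The example file, as a plain Python dictionary after YAML loading.
project_refs = {
    "references": {
        "foo.bst": [
            {"ref": "02349cfbbf6c5c1242681aa50b828f841e0e3a42"},
        ],
        "bar.bst": [
            {"ref": "0b78b483c179f6998a0df582aea3d77340bb1e9d887b52ed8fae677d535fd19d"},
            {"ref": "185f0f175a90bcfc55cf3cf6ceff8d447a6269492c0ca1a1fc0748ea2c181363"},
        ],
    }
}


def lookup_ref_node(refs, element_name, source_index):
    """Return the node for the Nth source of an element, or None.

    Sources are matched by position, following the ordered list of
    sources declared in the element. (Hypothetical helper.)
    """
    sources = refs.get("references", {}).get(element_name, [])
    if 0 <= source_index < len(sources):
        return sources[source_index]
    return None


node = lookup_ref_node(project_refs, "bar.bst", 1)
print(node["ref"])
```

Matching by position means the order of sources in an element must stay in step with the order of entries in `project.refs`.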
Source API
~~~~~~~~~~
Since the ref of a Source can be loaded from multiple places, it is
not possible to implement this without requiring that Source plugins
implement some mechanism for loading a reference from a `node` that
is specified by the BuildStream core (where `node` in our terminology
refers to a Python dictionary loaded from YAML).
For this, I propose an additional `Source.load_ref()` method to
complement the existing `Source.get_ref()` and `Source.set_ref()`,
the latter of which is already suitable for serializing the reference
to a core-specified `node`.
# load_ref():
#
# Loads and returns the reference for this Source from a
# specified YAML node.
#
# Args:
#    node: The YAML node to load the ref from
#
# Returns:
#    The source reference, suitable for Source.set_ref()
#
This will be painless and easy to implement for any existing Source
plugins.
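As an illustration of how small the plugin side could be, here is a self-contained sketch. The `Source` and `ImplError` classes below are stand-ins written for this example, not the real BuildStream classes, and `SketchGitSource` and `LegacySource` are hypothetical plugins that assume the reference is stored under a `ref` key.

```python
class ImplError(Exception):
    """Stand-in for the error raised when a plugin lacks a method."""


class Source:
    """Stand-in base class (not the real BuildStream Source)."""

    def load_ref(self, node):
        raise ImplError("{} does not implement load_ref()".format(type(self).__name__))


class SketchGitSource(Source):
    """Hypothetical plugin keeping its ref under the 'ref' key."""

    def __init__(self):
        self.ref = None

    def load_ref(self, node):
        # Load the ref from whichever node the core hands us.
        self.ref = node.get("ref")
        return self.ref

    def get_ref(self):
        return self.ref

    def set_ref(self, ref, node):
        # Remember the ref and serialize it into the given node.
        self.ref = ref
        node["ref"] = ref


class LegacySource(Source):
    """Hypothetical plugin which has not been updated yet."""


source = SketchGitSource()
print(source.load_ref({"ref": "02349cfbbf6c5c1242681aa50b828f841e0e3a42"}))

try:
    LegacySource().load_ref({})
except ImplError as e:
    print("early error:", e)
```

The same `load_ref()` works on a node from an element file or from `project.refs`, since the plugin only sees the node it is given.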
BuildStream core changes
~~~~~~~~~~~~~~~~~~~~~~~~
Some of the obvious parts of the core changes, and some of the trickier ones:
o Early in the load, we need to detect from the `project.conf`
settings whether `project.refs` is enabled.
o While loading each individual Source from the `.bst` files,
we need to additionally call `Source.load_ref()` on the
corresponding `project.refs` node when the feature is enabled.
o In order to provide a useful warning about ignored references
in the element `.bst` files, we can also use `Source.load_ref()`
on the Source YAML representation at load time to see if it
returns a ref that will be ignored.
o The code calling `Source.load_ref()` should check for the
`ImplError` exception to ensure that projects using this feature
are supported by the plugins in play, causing an early error
in case a plugin does not yet support `load_ref()`.
o To retain round-tripping and preservation of possible user
modifications in `project.refs`, we need to store the
appropriate origin node (see source.py). This is used to
keep track of the original loaded dictionary so we can
later use that in `Source.set_ref()`.
o In the TrackQueue(), after successfully obtaining a new
ref to serialize, care must be taken to keep an updated
version of `project.refs` in memory (avoid undoing the
result of a previous track job by overwriting it with the
old ref).
The file will be updated many times in a single tracking
session. Those updates already happen in the main process
when the actual tracking work completes, so modifications to
the YAML which result from tracking are already serialized
(they don't happen in parallel child tasks).
o And of course... Tests.
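To illustrate the TrackQueue() point above, here is a minimal sketch of keeping one in-memory copy of `project.refs` and rewriting the file from it after each track job. `ProjectRefsStore` and its `save` callback are hypothetical names; the callback stands in for whatever writes the YAML to disk, and all calls are assumed to happen in the main process.

```python
import copy


class ProjectRefsStore:
    """Hypothetical in-memory holder for the loaded project.refs."""

    def __init__(self, loaded, save):
        self._data = loaded  # the originally loaded dictionary
        self._save = save    # stand-in for the YAML writer

    def set_ref(self, element_name, source_index, ref):
        references = self._data.setdefault("references", {})
        sources = references.setdefault(element_name, [])
        # Pad the list so the entry lines up with the source's position.
        while len(sources) <= source_index:
            sources.append({})
        sources[source_index]["ref"] = ref
        # Every write carries all refs tracked so far, never a stale copy.
        self._save(copy.deepcopy(self._data))


written = []  # records each version that would be written to disk
store = ProjectRefsStore({"references": {}}, written.append)
store.set_ref("foo.bst", 0, "02349cfbbf6c5c1242681aa50b828f841e0e3a42")
store.set_ref("bar.bst", 0, "0b78b483c179f6998a0df582aea3d77340bb1e9d887b52ed8fae677d535fd19d")

print(len(written))
print(sorted(written[-1]["references"]))
```

Because each write is produced from the single updated dictionary, the second track job cannot overwrite the first job's result with an old ref.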