[BuildStream] Plans for workspaces and incremental builds

From: Darius Makovsky <darius makovsky codethink co uk>
To: buildstream-list gnome org
Subject: [BuildStream] Plans for workspaces and incremental builds
Date: Wed, 02 Oct 2019 12:13:45 +0100

Recently I've been thinking about workspaces and how they currently work
versus how they should work in the future. One of the main goals is to
facilitate remote execution (RE) builds of workspaced sources in addition
to local build support.

I've had some initial thoughts about this.

In order to support RE, workspaces will be staged via the sourcecache.
This will fundamentally change the nature of workspaces from their current
implementation such that test expectations should be revisited: a
scheduled process no longer affects the directory on the local filesystem
(wsdir). (This change was committed in !1563[1].) In this context a
process is something encapsulating any rule-based change (such as a
build).

`f(x) = x' = T_x`

Consequently, the post-process wsdir key is identical to the pre-process
wsdir key and the concept of key stability can be removed: WS keys do not
require resetting or post-process recalculation, and meaningful keys are
obtained at staging.
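
To illustrate why the key no longer needs recalculating, here is a minimal
sketch. It assumes a tree can be modelled as a dict of path to file
content and that the key is a plain content hash. `tree_key`, `stage` and
`process` are hypothetical stand-ins for illustration, not BuildStream
functions.

```python
import hashlib


def tree_key(tree):
    """Hypothetical content key: a hash over sorted (path, content) pairs."""
    digest = hashlib.sha256()
    for path in sorted(tree):
        digest.update(path.encode())
        digest.update(b"\0")
        digest.update(hashlib.sha256(tree[path]).digest())
    return digest.hexdigest()


def stage(wsdir):
    """Stand-in for staging via the source cache: the process gets a copy."""
    return dict(wsdir)


def process(staged):
    """Stand-in for f(): a build that writes outputs into its staged tree."""
    staged["hello.o"] = b"object code"
    return staged


wsdir = {"hello.c": b"int main(void) { return 0; }"}

key_before = tree_key(wsdir)
build_tree = process(stage(wsdir))
key_after = tree_key(wsdir)
```

Because the process only ever touches the staged copy, `key_before` and
`key_after` are the same value and nothing has to be reset afterwards.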

In order to support incremental builds it will be necessary to have a
mechanism to produce the difference of source trees (`h(x,y) = d`) and
apply a difference (`h^-1(x,d) = y`). It will also be necessary to track a
previous state of the workspace.
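
As a rough sketch of what `h` and `h^-1` could look like, assuming a tree
is modelled as a dict of path to file content (a real implementation would
more likely operate on cached directory digests). The names `diff_trees`
and `apply_diff` and the shape of the difference are assumptions made for
illustration only.

```python
def diff_trees(x, y):
    """h(x, y) = d: record what must change to turn tree x into tree y."""
    return {
        # Paths present in x but absent from y.
        "removed": sorted(path for path in x if path not in y),
        # Paths that are new in y or whose content differs from x.
        "written": {
            path: blob for path, blob in y.items() if x.get(path) != blob
        },
    }


def apply_diff(x, d):
    """h^-1(x, d) = y: apply difference d to a copy of tree x."""
    result = {
        path: blob for path, blob in x.items() if path not in d["removed"]
    }
    result.update(d["written"])
    return result


x = {"a.c": b"old a", "b.c": b"old b", "gone.c": b"to be deleted"}
y = {"a.c": b"old a", "b.c": b"new b", "added.c": b"brand new"}

d = diff_trees(x, y)
roundtrip = apply_diff(x, d)
```

Here `roundtrip` equals `y`, and the difference only mentions `b.c`,
`added.c` and `gone.c`; the unchanged `a.c` does not appear in it.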

Currently only successful builds are tracked in the workspace (via the
persisting workspace metadata) but I think this must change to track the
last WS key regardless of the success of the process. Assuming that the
previous digest is stored then the associated build tree is recoverable
via the cache.

The scheme for incremental builds could then be expressed as:

1. Given current workspace state `y`, and stored input state `x => T_x`
2. Verify that `h^-1(x, T_x) == T_x`. If this verification fails, then the
   incremental build cannot continue and we should fall back to
   `f(y) = T_y`

3. Compute the delta between `x` and `y`: `h(x,y) = d`
4. Apply that delta to the previous build's output: `h^-1(T_x, d) => y'`
5. Apply the process to that new input state: `f(y') = T_y'`

Assuming that `f()` represents a sane build system, we can believe that
the application of `f()` to `y'` will produce a build tree functionally
equivalent, if not identical, to `f(y)` (`T_y = T_y'`). The verification
step in 2 may fail if, for example, a build system chooses to remove one
of its inputs as part of the build process.
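
The five steps can be sketched roughly as follows. This assumes trees are
dicts of path to content, and it interprets the verification in step 2 as
"every input in `x` is still present and unchanged in `T_x`", which is my
reading of the removed-input example rather than a settled definition. All
names (`diff_trees`, `apply_diff`, `inputs_intact`, `incremental_build`,
`toy_build`) are hypothetical.

```python
def diff_trees(x, y):
    """h(x, y) = d."""
    return {
        "removed": sorted(path for path in x if path not in y),
        "written": {p: b for p, b in y.items() if x.get(p) != b},
    }


def apply_diff(tree, d):
    """h^-1(tree, d): apply difference d to a copy of tree."""
    result = {p: b for p, b in tree.items() if p not in d["removed"]}
    result.update(d["written"])
    return result


def inputs_intact(x, t_x):
    """Step 2 (assumed reading): the build left all of its inputs in place."""
    return all(t_x.get(path) == blob for path, blob in x.items())


def incremental_build(f, x, t_x, y):
    """Return (build tree, whether the incremental path was taken)."""
    if not inputs_intact(x, t_x):
        return f(dict(y)), False      # fall back to f(y) = T_y
    d = diff_trees(x, y)              # step 3: h(x, y) = d
    y_prime = apply_diff(t_x, d)      # step 4: h^-1(T_x, d) => y'
    return f(y_prime), True           # step 5: f(y') = T_y'


def toy_build(tree):
    """Toy f(): 'compile' every .c file into a .o file next to it."""
    out = dict(tree)
    for path, blob in tree.items():
        if path.endswith(".c"):
            out[path[:-2] + ".o"] = b"obj:" + blob
    return out


x = {"a.c": b"1", "b.c": b"2"}
t_x = toy_build(x)
y = {"a.c": b"1", "b.c": b"3", "c.c": b"4"}

t_y_prime, was_incremental = incremental_build(toy_build, x, t_x, y)
t_y = toy_build(y)
```

For this toy build `t_y_prime` is identical to `t_y`. Note that if `y`
had deleted a source file, the stale object file from `T_x` would survive
in `y'`, which is the kind of case where the result is only functionally
equivalent, or where a less forgiving build system might misbehave.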

In addition to storing the source digest of the previous wsdir on each
process it will be useful to store the dependency hash and the artifact
ref (necessary for application of the source difference). If the
dependency hash changes between processes then a complete build will be
required rather than an incremental build.
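
A small sketch of the record that could be kept per workspace and of the
resulting decision. The class name, field names and the
`choose_build_mode` helper are assumptions for illustration; they do not
describe the existing workspace metadata format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LastProcess:
    """State remembered after each process, successful or not."""
    source_digest: str    # digest of the wsdir as it was staged
    dependency_hash: str  # hash over the element's dependencies
    artifact_ref: str     # where the previous build tree can be found


def choose_build_mode(last: Optional[LastProcess],
                      dependency_hash: str) -> str:
    """Return "incremental" or "full" for the next process."""
    if last is None:
        return "full"     # nothing recorded yet
    if last.dependency_hash != dependency_hash:
        return "full"     # dependencies changed between processes
    return "incremental"


last = LastProcess(
    source_digest="src-digest-1",
    dependency_hash="deps-1",
    artifact_ref="example/element/ref-1",
)

same_deps = choose_build_mode(last, "deps-1")
new_deps = choose_build_mode(last, "deps-2")
first_run = choose_build_mode(None, "deps-1")
```

With the same dependency hash the next process may be incremental; a
changed hash, or no stored record at all, forces a complete build.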

I would like to get the opinions of the list on this before moving further
ahead. There is a development branch removing the concept of cache key
stability and key recalculation[2] which currently seems to only fail
`tests/integration/shell.py::test_workspace_visible`. In summary:

* remove unstable cache key concept
* do not reset or recalculate workspace cache keys
* store source digest, dependency hash, and artifact ref for workspaces
* introduce mechanism to diff and apply trees
* add logic to decide to continue or abort incremental builds

[1] https://gitlab.com/BuildStream/buildstream/merge_requests/1563

[2] https://gitlab.com/BuildStream/buildstream/tree/traveltissues/benchmark-3


Best Regards,
Darius

