[BuildStream] Proposal: Reduce the amount of work that's done at places where we call Element._update_state()



Hi,

I have been looking at ways to reduce the amount of time we spend in Element._update_state(), because it is responsible for a significant amount of time. e.g. in the last round of performance profiling, it took 16-20% of the time spent in `bst show`.

To that end, I have spent some time tracing around how _update_state is called (https://gitlab.com/BuildStream/buildstream/issues/902#note_153368417), and produced the following tasks:


* via `Loader._get_loader()`:
- [x] Only calculate the weak cache key <-- Implemented in https://gitlab.com/BuildStream/buildstream/merge_requests/1251
* via `Element._schedule_tracking()`:
- [x] Remove call to _update_state() <-- Implemented in https://gitlab.com/BuildStream/buildstream/merge_requests/1251
* via `Pipeline.resolve_elements()`:
  - [ ] Don't attempt to schedule assembly
* via `Element._set_required()`:
  - Conceivably all of _update_state could happen. Nothing to split out.
* via `Element._schedule_assemble()`:
- [ ] Only needs to wipe workspaced elements and calculate cache keys/cached state.
* via `Element._tracking_done()`:
  - Conceivably all of _update_state could happen. Nothing to split out.
* via `Element._assemble_done()`:
- [ ] Cache key calculation (even weak keys, because workspaced elements and BST_STRICT_REBUILD)
  - [ ] Scheduling assembly when in strict mode
* via `Element._fetch_done()`:
- [x] Only update source state <-- https://gitlab.com/BuildStream/buildstream/merge_requests/1251
* via `Element._pull_done()`:
- [ ] Strict and strong cache key calculation only (everything else has already been done)

Broadly, the purpose of these changes should be to improve performance and readability, and while this is easy for the ones already implemented, this becomes harder for the other tasks.

e.g. `Element._update_state()` via `Pipeline.resolve_elements()` - In _update_state(), the logic for scheduling assembly is interleaved with calculating cache keys, because when buildstream isn't being run in strict mode, we can decide whether to schedule assembly based on the artifacts' weak cache key, instead of waiting for the strict cache key to become available (i.e. in non-strict mode we don't need to know the state of an element's dependencies to know whether it's ready to be built).

To avoid making an unmaintainable mess, `Pipeline.resolve_elements()` and `Element._update_state()` should both call common functions.

The best idea I can come up with to do this is:

1. Move the logic to calculate (and check if cached under that key) the weak (`self.__weak_cache_key`), strict (`self.__strict_cache_key`), and strong (`self.__cache_key`) cache keys into separate methods, e.g. `_get_weak_cache_key`, `_get_strict_cache_key`, `_get_strong_cache_key`. These methods return the cache key, and calculate it if it's not already set.

2. Replace the cache key generation logic in `Element._update_state()` with the new methods.

3. `Pipeline.resolve_elements()` will:
   a. call `Element.__update_source_state()`
b. return early if the element's consistency is Consistency.INCONSISTENT c. call _get_weak_cache_key, _get_strict_cache_key and _get_strong_cache_key in order, returning early if any of those fail.

Does the above task list, and the suggested implementation of another task match people's expectations of how to reduce the amount of work that _update_state is doing?

Thanks,

Jonathan

--
Jonathan Maw, Software Engineer, Codethink Ltd.
Codethink privacy policy: https://www.codethink.co.uk/privacy.html


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]