Re: [BuildStream] Proposal: Reduce the amount of work that's done at places where we call Element._update_state()



Hi Jonathan,

Thanks for the detailed analysis.

On Mon, 2019-04-01 at 14:59 +0100, Jonathan Maw via buildstream-list wrote:
Hi,

I have been looking at ways to reduce the amount of time we spend in 
Element._update_state(), because it is responsible for a significant 
amount of time.
e.g. in the last round of performance profiling, it took 16-20% of the 
time spent in `bst show`.

I think we all know that _update_state() itself is a bottleneck. I
think we know this mostly from intuition: we know that it is being
called overly frequently, we know that it does too much in one place,
and we also observe that it takes a large portion of the load
process.

However, 16-20% is quite meaningless here; it is entirely possible
that after all optimizations are done to the best of our abilities,
it will still take 16-20% of load time and no longer be a problem.

I would very much like to observe how the algorithms we are analyzing
scale, i.e. I want to see time-per-record in a graph which shows
whether the time-per-record spent in _update_state() increases with
the number of elements being loaded. It would also be interesting to
know whether this time-per-record increases for horizontal graphs,
where I expect it currently increases mostly only for vertical graphs
(i.e. dependency chains vs one stack depending on all elements).

In short, my point here is that we should not be counting 16-20% as any
kind of indication that there is a problem (we know there is a problem,
but not because of the number 16-20%).
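
For what it is worth, something along these lines would be enough to
produce such numbers (a rough sketch only; the synthetic project
layout is just an assumption for the sake of illustration, not a
proposal for the test suite):

    #!/usr/bin/env python3
    # Rough sketch: generate synthetic projects with a linear dependency
    # chain of N elements and time `bst show`, to see whether the
    # time-per-element grows with N (the "vertical" case; a "horizontal"
    # project would instead have one stack element depending on all N).
    import os
    import subprocess
    import time

    def make_chain_project(path, n):
        os.makedirs(os.path.join(path, "elements"), exist_ok=True)
        with open(os.path.join(path, "project.conf"), "w") as f:
            f.write("name: scaling-test\nelement-path: elements\n")
        for i in range(n):
            element = os.path.join(path, "elements", "element-%d.bst" % i)
            with open(element, "w") as f:
                f.write("kind: manual\n")
                if i > 0:
                    f.write("depends:\n- element-%d.bst\n" % (i - 1))
        return "element-%d.bst" % (n - 1)

    for n in (100, 1000, 5000):
        path = "/tmp/scaling-%d" % n
        target = make_chain_project(path, n)
        start = time.time()
        subprocess.run(["bst", "show", target], cwd=path, check=True,
                       stdout=subprocess.DEVNULL)
        elapsed = time.time() - start
        print("%d elements: %.2fs total, %.2fms per element"
              % (n, elapsed, 1000 * elapsed / n))

Plotting the per-element number for a few sizes of both layouts would
tell us much more than a flat percentage does.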

To that end, I have spent some time tracing around how _update_state is 
called 
(https://gitlab.com/BuildStream/buildstream/issues/902#note_153368417), 
and produced the following tasks:


* via `Loader._get_loader()`:
   - [x] Only calculate the weak cache key <-- Implemented in 
https://gitlab.com/BuildStream/buildstream/merge_requests/1251
* via `Element._schedule_tracking()`:
   - [x] Remove call to _update_state() <-- Implemented in 
https://gitlab.com/BuildStream/buildstream/merge_requests/1251

I think both of these changes are wrong, as explained in these comments:

    https://gitlab.com/BuildStream/buildstream/merge_requests/1251/diffs#note_156279890
    https://gitlab.com/BuildStream/buildstream/merge_requests/1251/diffs#note_156281026

* via `Pipeline.resolve_elements()`:
   - [ ] Don't attempt to schedule assembly
* via `Element._set_required()`:
   - Conceivably all of _update_state could happen. Nothing to split out.
* via `Element._schedule_assemble()`:
   - [ ] Only needs to wipe workspaced elements and calculate cache 
keys/cached state.
* via `Element._tracking_done()`:
   - Conceivably all of _update_state could happen. Nothing to split out.
* via `Element._assemble_done()`:
   - [ ] Cache key calculation (even weak keys, because workspaced 
elements and BST_STRICT_REBUILD)
   - [ ] Scheduling assembly when in strict mode
* via `Element._fetch_done()`:
   - [x] Only update source state <-- 
https://gitlab.com/BuildStream/buildstream/merge_requests/1251
* via `Element._pull_done()`:
   - [ ] Strict and strong cache key calculation only (everything else 
has already been done)

Broadly, the purpose of these changes should be to improve performance 
and readability, and while this is easy for the ones already 
implemented, this becomes harder for the other tasks.

I appreciate your attention to detail here as it appears you have
examined what is actually required to be done in each case where
_update_state() is called, and considered what could be "split out" and
avoided in each of these cases.

I don't think the above should be considered a task list at all
though; let's interpret the above as a valuable case study to assist
us in making the right changes to address them (as you venture to do
below).

e.g. `Element._update_state()` via `Pipeline.resolve_elements()` - In 
_update_state(), the logic for scheduling assembly is interleaved with 
calculating cache keys, because when buildstream isn't being run in 
strict mode, we can decide whether to schedule assembly based on the 
artifacts' weak cache key, instead of waiting for the strict cache key 
to become available (i.e. in non-strict mode we don't need to know the 
state of an element's dependencies to know whether it's ready to be 
built).

This is the best observation here; what in fact is interleaved ?

Also, a very good question here is: What is the real purpose of
_schedule_assemble() ?

It appears that this is just caching a value which we want to avoid
repeatedly resolving in the BuildQueue's Queue.status() implementation,
and that this cached state *might* go away entirely after Benjamin's
scheduling refactor:

    https://mail.gnome.org/archives/buildstream-list/2019-March/msg00034.html

Let's assume for the purpose of this discussion that we still need this
caching of "whether an element is ready to be assembled" state.

To avoid making an unmaintainable mess, `Pipeline.resolve_elements()` 
and `Element._update_state()` should both call common functions.

The best idea I can come up with to do this is:

1. Move the logic to calculate (and check if cached under that key) the 
weak (`self.__weak_cache_key`), strict (`self.__strict_cache_key`), and 
strong (`self.__cache_key`) cache keys into separate methods, e.g. 
`_get_weak_cache_key`, `_get_strict_cache_key`, `_get_strong_cache_key`.
These methods return the cache key, and calculate it if it's not already 
set.

2. Replace the cache key generation logic in `Element._update_state()` 
with the new methods.

3. `Pipeline.resolve_elements()` will:
    a. call `Element.__update_source_state()`
    b. return early if the element's consistency is 
Consistency.INCONSISTENT
    c. call _get_weak_cache_key, _get_strict_cache_key and 
_get_strong_cache_key in order, returning early if any of those fail.

I want to point out that what you are suggesting in (3) here, is
_exactly_ what _update_state() is already doing.
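
To illustrate, the shape of what is proposed for resolve_elements()
would be roughly the following (a sketch only; the _get_*_cache_key()
getters do not exist yet and the surrounding details are glossed
over):

    # Sketch only: the proposed Pipeline.resolve_elements() using the
    # getters suggested above (which do not exist yet).  Each getter
    # calculates and caches its key on first call, and returns None if
    # the key cannot be resolved yet.
    def resolve_elements(self, targets):
        for element in self.dependencies(targets, Scope.ALL):
            element._update_source_state()

            if element._get_consistency() == Consistency.INCONSISTENT:
                continue

            for resolve in (element._get_weak_cache_key,
                            element._get_strict_cache_key,
                            element._get_strong_cache_key):
                if resolve() is None:
                    break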

Does the above task list, and the suggested implementation of another 
task match people's expectations of how to reduce the amount of work 
that _update_state is doing?

Not exactly for me, but certainly close. Let's take a step back and
look at the bigger picture.

What do we know ?

* We know that the logic of whether an element is ready to be assembled
  or not is interleaved with the calculation of cache keys.

  We probably know that an element *cannot* be scheduled to be
  assembled until at least all of its dependencies' cache keys are
  resolved, and it *cannot* be scheduled to be assembled until its
  own keys have been resolved, unless it is a workspaced element, in
  which case its cache key can only be discovered after a build.

  Does this mean that we can safely move the logic which decides
  whether an element can be scheduled to be assembled to a location
  which runs *after* the cache key logic runs ? (see the sketch
  after this list)

* We also know that the logic of whether an element is *required*
  is controlled by whether an element requires a build dependency
  which could not be pulled from a remote, and that similarly to
  the state of whether an element is ready to be assembled, this
  is controlled as a side effect of _update_state().

* We know that weak, strong and strict cache keys are being calculated
  at the same time, when perhaps they don't need to be.

* We know that the logic of how to treat cache keys differs depending
  on certain circumstances; these include:

  - If we are not in strict mode, we want to delay calculation of the
    cache key which will be used until after a pull is complete, unless
    we already have a cached artifact whose strong cache key matches
    our strict cache key for this session.

    Note: I have a feeling that we are not properly delaying reverse 
          dependency builds on discovery of the ultimately used
          artifact in this case; we might be scheduling assemble
          prematurely here

  - If an element is workspaced, the cache key is also calculated
    differently
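
If the answer to that first question turns out to be yes, the end
result could look something like this sketch (illustrative names
only; none of this is the current element.py API):

    # Sketch only, illustrative names: once deciding whether an element
    # can be scheduled for assembly no longer lives inside cache key
    # calculation, it becomes a separate step which runs only after the
    # relevant keys have been resolved.
    def _update_ready_for_assembly(self):
        # All dependency cache keys must be resolved first
        if any(dep._get_cache_key() is None
               for dep in self.dependencies(Scope.BUILD, recurse=False)):
            return

        # Our own keys must be resolved too, except for workspaced
        # elements, whose strong key is only discovered after a build
        if self._get_cache_key() is None and not self._get_workspace():
            return

        if self._is_required() and not self._cached():
            self._schedule_assemble()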

What current refactors are already in flight ?

  - Benjamin has recently landed a patch which causes _update_state()
    to be called on reverse dependencies instead of at Queue.status()
    time, so now we have the ability to perform reverse dependency
    walks in the build graph:

    https://gitlab.com/BuildStream/buildstream/merge_requests/1070

  - The natural next step to the above, will be to remove all calls
    to Element.dependencies() when calculating cache keys.

    In the current state after Benjamin's refactor, we already
    recurse into reverse dependencies only when we have a new cache
    key to contribute to the reverse dependency's cache key.

    So the next step here will be to *hand over* our freshly
    calculated key to the reverse dependencies directly, such
    that the reverse dependency will never need a loop to accumulate
    the keys of its dependencies (see the sketch below).
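
In other words, something of this shape (illustrative only; this is
not the code from that merge request, nor a finished design):

    # Illustrative sketch of the "hand over" idea: a dependency pushes
    # its freshly resolved key to its reverse dependencies, so that no
    # element ever needs to loop over Element.dependencies() in order
    # to accumulate the keys of its dependencies.
    def __key_resolved(self, key):
        self.__cache_key = key
        for rdep in self.__reverse_dependencies:
            rdep.__receive_dependency_key(self, key)

    def __receive_dependency_key(self, dep, key):
        # Accumulate keys as they arrive; recompute our own key only
        # once every dependency has reported in
        self.__received_keys[dep.name] = key
        if len(self.__received_keys) == len(self.__dependencies):
            self.__key_resolved(self.__calculate_cache_key(self.__received_keys))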


When looking at the above, what I am thinking is:

* We need to identify what exactly is the difference between
  _set_required() and _schedule_assemble().

* We need to safely move _set_required() and _schedule_assemble()
  outside of cache key calculation logic

* We *might* need to split up cache key calculation logic into separate
  phases, instead of calculating weak/strong/strict in the same
  go.

  This one is a bit less certain; I am not sure if we gain anything
  by conditionally resolving these keys separately, or by handing the
  responsibility of "calculating the keys" to a single function which
  conditionally resolves these keys separately.

Now, to speak to the fact that we know that cache keys are calculated
differently in different scenarios, I still wonder if we can delegate
this to a separate object whose responsibility is to calculate keys.

I think we know at load time, even as early as Element instantiation
time, what technique will be employed and what cache key behaviors will
be used for a given element. With this in mind, could we then do the
following ?

  * Create a CacheKey abstract class

  * Decide which CacheKey implementation to use for each element at
    load time

  * Delegate some methods to the CacheKey object

I think it would be ideal to move all of this out of element.py, and
having separate subclasses to handle the different scenarios of how a
cache key needs to be calculated would allow us to avoid the spaghetti
of conditional statements which is currently _update_state().
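
In rough shape, perhaps something like this (the class and method
names here are invented purely for illustration):

    # Rough shape only; the class and method names are invented.
    from abc import ABC, abstractmethod

    class CacheKey(ABC):

        def __init__(self, element):
            self.element = element
            self.weak = None
            self.strict = None
            self.strong = None

        @abstractmethod
        def resolve(self):
            """Resolve whichever keys can currently be resolved"""

    class StrictCacheKey(CacheKey):
        def resolve(self):
            ...  # strict mode: the strong key is the strict key

    class NonStrictCacheKey(CacheKey):
        def resolve(self):
            ...  # the strong key is deferred until after a pull

    class WorkspaceCacheKey(CacheKey):
        def resolve(self):
            ...  # the strong key is only known after a build

    # Decided once, at Element instantiation time
    def _choose_cache_key(element, context):
        if element._get_workspace():
            return WorkspaceCacheKey(element)
        elif context.get_strict():
            return StrictCacheKey(element)
        else:
            return NonStrictCacheKey(element)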

Does this make sense ?

Cheers,
    -Tristan


