Re: [BuildStream] Proposal: Reduce the amount of work that's done at places where we call Element._update_state()



On 2019-04-02 08:17, Tristan Van Berkom wrote:
Hi Jonathan,

Thanks for the detailed analysis.

[snip]

When looking at the above, what I am thinking is:

* We need to identify what exactly is the difference between
  _set_required() and _schedule_assemble().

The difference seems to be:
_set_required():
* The target elements are required.
* All runtime dependencies of a required element are also required.
* All build-depends of an element that is scheduled for assembly are also required (though the latest we can know whether an element is scheduled for assembly is after attempting to pull it). * All workspaced elements that aren't cached are required (since a workspace needs to be cached before we know its cache key).
* An element that is required and not cached will attempt to be pulled.
* A non-cached required element that failed to be pulled will be scheduled for assembly.

_schedule_assemble():
* All required elements that aren't cached and can't be pulled are required. * A workspaced element that is not cached is automatically scheduled for assembly, because it can't be pulled.

* We need to safely move _set_required() and _schedule_assemble()
  outside of cache key calculation logic

I agree whole-heartedly to this.

* We *might* need to split up cache key calculation logic into separate
  phases, instead of calculating weak/strong/strict in the same
  go.

  This one is a bit less certain, I am not sure if we gain anything
  by conditionally resolving these keys separately, or handing the
  responsibility of "calculating the keys" to a single function which
  conditionally resolves these keys separately.

I am also not sure here. For example, I will need to spend some time to work out whether we need the weak cache key at all when running in strict mode.

Now, to speak the fact that we know that cache keys are calculated
differently in different scenarios, I still wonder if we can delegate
this to a separate object who's responsibility is to calculate keys.

I think we know at load time, even as early as Element instantiation
time, what technique will be employed and what cache key behaviors will
be used for a given element, with this in mind, could we then do the
following ?

  * Create a CacheKey abstract class

  * Decide which CacheKey implementation to use for each element at
    load time

  * Delegate some methods to the CacheKey object

I think it would be ideal to move all of this out of element.py, and
having separate subclasses to handle the different scenarios of how a
cache key needs to be calculated would allow us to avoid the spaghetti
of conditional statements which is currently _update_state().

Does this make sense ?


This makes sense to me. I also would have replied sooner, but I took some time to see if I could come up with answers to these open questions, but that took longer than expected.

Thanks,

Jonathan

--
Jonathan Maw, Software Engineer, Codethink Ltd.
Codethink privacy policy: https://www.codethink.co.uk/privacy.html


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]