Re: [BuildStream] Proposal: A small number of subprocesses handling jobs



On 2019-03-04 10:01, Tristan Van Berkom via buildstream-list wrote:
On Mon, 2019-03-04 at 10:40 +0100, Jürg Billeter wrote:
On Mon, 2019-03-04 at 18:07 +0900, Tristan Van Berkom wrote:
> On Mon, 2019-03-04 at 18:02 +0900, Tristan Van Berkom via
> buildstream-list wrote:
> > Hi,
>
> [...]
> > I would personally rather be reluctant about imposing this explicit
> > parallelism knowledge on the plugin API, and seek other
> > justifications (performance ?) before making plugins aware of what is
> > processed where.

Command batching already breaks that serial plugin API, though. My
proposal would simply take this further to cover staging.

One possibly significant difference is that plugins don't have to use
command batching, while my proposal would likely no longer allow
plugins to ignore it.

> So here is another idea...
>
> What if we elaborated and generalized further on state synchronization,
> such that:
>
> * We had a dedicated process pool
> * Plugins are run in the process pool
> * The Element/Source API used in a worker pool would obtain state from
>   the main process on demand, such that we load state lazily on demand
>   by querying the main process.
> * The core's responsibility then is to:
>   * Dispatch jobs in worker threads
>   * Respond to generic get/set requests to the data model in the
>     main process
>   * Update the UI periodically
>
> This approach might incur a lot more context switching between
> processes where plugins load state from the main process, but python
> processing remains parallelized and the whole implementation is
> transparent to the plugin.

On initial thoughts, this doesn't sound very appealing to me with
regards to implementation complexity and maintainability. Or can you
think of a way to implement this without significant extra complexity
in the core?

Firstly, it's not that I want to avoid significant extra complexity in
the core; I want to avoid *any* complexity in a plugin.

I personally attribute a huge amount of value to how simple the plugin
API is, and am happy to trade 1kloc in the core, under our control, in
order to avoid 1loc in the plugins which users should be able to easily
write themselves.

As an implementation, I would imagine perhaps we have a separate
subprocess to manage actual state and have both the main process and
worker processes be clients to that state-serving process.

The core Element and Source APIs would pass through separate
ElementProxy and SourceProxy APIs for storing and loading state.
This would probably amount to more boilerplate code than real
complexity; to fine-tune things for performance we would probably end
up making compromises with lazy loading.
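As a rough sketch of the proxy pass-through described above (a single-process simulation; `GraphService`, `ElementProxy` and the request tuples are hypothetical names, not BuildStream code — in the real design the service would live in its own subprocess and `send` would be an IPC round trip):

```python
class GraphService:
    """Owns the authoritative element state; answers get/set requests."""

    def __init__(self):
        self._state = {}  # element name -> {field: value}

    def handle(self, request):
        op, name, *rest = request
        if op == "get":
            (field,) = rest
            return self._state.get(name, {}).get(field)
        if op == "set":
            field, value = rest
            self._state.setdefault(name, {})[field] = value
            return None
        raise ValueError("unknown op: {!r}".format(op))


class ElementProxy:
    """Boilerplate shim: turns state access into service requests."""

    def __init__(self, name, send):
        self._name = name
        self._send = send  # stand-in for the IPC channel

    def get(self, field):
        return self._send(("get", self._name, field))

    def set(self, field, value):
        self._send(("set", self._name, field, value))


service = GraphService()
proxy = ElementProxy("base.bst", service.handle)
proxy.set("cached", True)
```

Most of the proxy is mechanical pass-through, which is the "boilerplate rather than complexity" trade-off: the protocol stays trivial, and the cost shows up as one round trip per state access.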

It might not be the best approach; the best approach might very well be
the current fork()-on-demand model we already have (unless forking
really *is* costing us too much time, which I think still needs to be
proven).

Cheers,
    -Tristan


Hi,

Today I had a discussion with Daniel, James, Chandan, Gokcen, Angelos and Chiara about the subprocess model, and our conclusions were broadly the same.

We discussed the possibility of a thread-based model, but didn't take it particularly seriously or explore implementation in detail because of the effect the GIL would have on virtual filesystem behaviour.

We discussed the pool of subprocesses, and came up with broadly two different ways to go about it:

1. A multiprocessing pool that forks off at the start of the scheduler.
===

* Changes to the element over the lifetime of a job will be captured, and passed through the job's result object when the job finishes.
* Changes from a job result will be received by the scheduler and pushed to each worker subprocess.
* Mandate that only the element that the job is running for can be changed in the job. A "soft" mandate (changes will not be propagated to the other workers) is enough for normal operation, but a separate mode where such changes are forbidden (or any changes outside the element are thrown away) would be useful for debugging.
* A "pristine" subprocess that is unchanged by jobs would be useful for forking new subprocesses, especially if we decide that each worker should have a finite lifetime.
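The capture-and-replay flow above could look roughly like this (a minimal single-process sketch; `StateDelta`, `TrackedElement` and `run_job` are illustrative names, not BuildStream API — in the real design `run_job` would execute in a forked worker and the delta would travel back in the job result):

```python
class StateDelta:
    """Records the field changes made to one element during a job."""

    def __init__(self, element_name):
        self.element_name = element_name
        self.changes = {}


class TrackedElement:
    """Element whose state mutations are captured as a delta."""

    def __init__(self, name, state=None):
        self.name = name
        self.state = dict(state or {})

    def run_job(self, job_fn):
        before = dict(self.state)
        job_fn(self)  # the job may only touch this element
        delta = StateDelta(self.name)
        for key, value in self.state.items():
            if before.get(key) != value:
                delta.changes[key] = value
        return delta  # travels back through the job's result object

    def apply(self, delta):
        # Replayed by the scheduler, then pushed to each worker's copy.
        assert delta.element_name == self.name
        self.state.update(delta.changes)


# The scheduler's copy and a worker's copy of the same element:
scheduler_copy = TrackedElement("base.bst", {"cached": False})
worker_copy = TrackedElement("base.bst", {"cached": False})

delta = worker_copy.run_job(lambda e: e.state.update(cached=True, key="abc"))
scheduler_copy.apply(delta)  # scheduler receives the job result
```

The "soft" mandate falls out naturally here: anything a job changes outside its own element simply never makes it into a delta, so it is silently dropped rather than propagated.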

2. An element graph service
===

i.e. one subprocess holds the pipeline and uses some form of IPC to get and set state changes.

This is a valuable long-term goal to work towards once we have a much better idea of where/when we access the element graph, and have completely encapsulated every place a plugin would access it. At that point, the graph-serving process will be acutely affected by any slowness in `_update_state()`, and if plugin authors can affect this then we will have to hope/impress/demand that it has a small time impact. (As an aside, this is currently not the case: git-based plugins implement `validate_cache()` and fork off a git subprocess to find the branch and tag. Using libgit2 here would be valuable.)
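One possible mitigation for that per-request sensitivity is a worker-side read cache in the proxy, assuming only the job that owns an element writes its state (a hypothetical sketch; `CachingProxy` and `send` are illustrative names, with `send` standing in for the IPC round trip to the element graph service):

```python
class CachingProxy:
    """Proxy to the graph service that memoizes reads per element."""

    def __init__(self, name, send):
        self._name = name
        self._send = send
        self._cache = {}
        self.round_trips = 0  # instrumentation for the example

    def get(self, field):
        if field not in self._cache:
            self.round_trips += 1
            self._cache[field] = self._send(("get", self._name, field))
        return self._cache[field]

    def set(self, field, value):
        self.round_trips += 1
        self._send(("set", self._name, field, value))
        self._cache[field] = value  # keep the cache coherent with our write


# Fake service endpoint so the sketch is self-contained:
backing = {"cached": False}

def fake_send(request):
    op, _name, *rest = request
    if op == "get":
        return backing.get(rest[0])
    backing[rest[0]] = rest[1]

proxy = CachingProxy("base.bst", fake_send)
first = proxy.get("cached")   # one round trip to the service
second = proxy.get("cached")  # served from the worker-side cache
```

This only stays correct under the single-writer assumption; if other processes can mutate an element's state mid-job, the service would additionally need to invalidate worker caches.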

===

Overall, our next actions in this regard will be:

1. Continue studying/simplifying/splitting up `_update_state()`, as the more we understand the ways that the element's state is altered, the better.
2. Work out all the places where elements read/write to the element graph, so we can identify which parts of the API can be extended with state tracking (for returning all state changes in a build result, or calling an element graph service), and consider adding new methods for the places where the graph is affected directly.

Best regards,

Jonathan
--
Jonathan Maw, Software Engineer, Codethink Ltd.
Codethink privacy policy: https://www.codethink.co.uk/privacy.html
