Re: [BuildStream] Proposal: A small number of subprocesses handling jobs

On Mon, 2019-03-04 at 10:40 +0100, Jürg Billeter wrote:
On Mon, 2019-03-04 at 18:07 +0900, Tristan Van Berkom wrote:
On Mon, 2019-03-04 at 18:02 +0900, Tristan Van Berkom via
buildstream-list wrote:
Hi,

[...]
I would personally be reluctant to impose this explicit
parallelism knowledge on the plugin API, and would seek other
justifications (performance?) before making plugins aware of what is
processed where.

Command batching already breaks that serial plugin API, though. My
proposal would simply take this further to cover staging.

One possibly significant difference is that plugins don't have to use
command batching, while my proposal would likely no longer allow
plugins to ignore it.

So here is another idea...

What if we elaborated and generalized further on state synchronization,
such that:

* We had a dedicated process pool
* Plugins are run in the process pool
* The Element/Source API used in a worker pool would obtain state from
  the main process on demand, such that state is loaded lazily by
  querying the main process.
* The core's responsibility then is to:
  * Dispatch jobs in worker threads
  * Respond to generic get/set requests to the data model in the
    main process
  * Update the UI periodically

This approach might incur a lot more context switching between
processes when plugins load state from the main process, but Python
processing remains parallelized and the whole implementation is
transparent to the plugin.
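To make the shape of this concrete, here is a rough, hypothetical
sketch (none of these names are real BuildStream API) of the main
process answering get/set state requests from a job running in a
worker process:

```python
import multiprocessing

def worker_job(conn, element_name):
    # Runs in the worker pool: query state lazily, process, write back.
    conn.send(("get", element_name, None))
    state = conn.recv()
    new_state = state + ":built"   # stand-in for real plugin processing
    conn.send(("set", element_name, new_state))
    conn.send(("done", None, None))
    conn.close()

def main_process():
    model = {"hello.bst": "waiting"}           # the data model lives here
    ctx = multiprocessing.get_context("fork")  # POSIX-only, for sketch brevity
    parent, child = ctx.Pipe()
    worker = ctx.Process(target=worker_job, args=(child, "hello.bst"))
    worker.start()
    # The core's loop: respond to generic get/set requests from workers.
    while True:
        op, key, value = parent.recv()
        if op == "get":
            parent.send(model[key])
        elif op == "set":
            model[key] = value
        elif op == "done":
            break
    worker.join()
    return model
```

A real implementation would of course multiplex many workers (e.g.
with multiprocessing.connection.wait()) and update the UI from the
same loop; this only shows the request/response shape.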

On initial thoughts, this doesn't sound very appealing to me with
regards to implementation complexity and maintainability. Or can you
think of a way to implement this without significant extra complexity
in the core?

Firstly, it is not significant extra complexity in the core that I
want to avoid; I want to avoid *any* complexity in a plugin.

I personally attribute a huge amount of value to how simple the plugin
API is, and I am happy to trade 1kloc in the core, under our control,
in order to avoid 1loc in the plugins, which users should be able to
easily write themselves.

As an implementation, I would imagine we perhaps have a separate
subprocess to manage the actual state, with both the main process and
the worker processes acting as clients of that state-serving process.

The core Element and Source APIs would pass through separate
ElementProxy and SourceProxy APIs for storing and loading state.
This would probably amount to a lot of boilerplate code more than
actual complexity; to fine-tune things for performance, we would
probably end up making compromises around lazy loading.
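As a hand-wavy illustration of that boilerplate (ElementProxy and
state_server are invented here, not actual BuildStream code), the
pass-through with a lazy-loading cache might look like:

```python
import multiprocessing
import threading

class ElementProxy:
    """Hypothetical pass-through: every state access becomes a request
    to the state-serving process, cached to soften the context-switch
    cost."""

    def __init__(self, conn, element_name):
        self._conn = conn
        self._name = element_name
        self._cache = {}

    def get_state(self, key):
        if key not in self._cache:       # lazy loading: fetch only once
            self._conn.send(("get", self._name, key, None))
            self._cache[key] = self._conn.recv()
        return self._cache[key]

    def set_state(self, key, value):
        self._cache[key] = value
        self._conn.send(("set", self._name, key, value))

def state_server(conn, model, n_requests):
    # Would live in its own subprocess; a thread stands in for it here.
    for _ in range(n_requests):
        op, name, key, value = conn.recv()
        if op == "get":
            conn.send(model[name][key])
        else:
            model[name][key] = value

# Demo: one get (answered by the server) and one set (forwarded).
plugin_end, server_end = multiprocessing.Pipe()
model = {"hello.bst": {"cache-key": "abc123"}}
server = threading.Thread(target=state_server,
                          args=(server_end, model, 2))
server.start()
element = ElementProxy(plugin_end, "hello.bst")
assert element.get_state("cache-key") == "abc123"
element.set_state("built", True)
server.join()
assert model["hello.bst"]["built"] is True
```

The point being: each method is trivial forwarding code, but there
would be a lot of it, and the cache is exactly the sort of lazy
loading compromise I mean.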

This might not be the best approach; the best approach might very well
be the fork()-on-demand model we already have (unless forking really
*is* costing us too much time, which I think still needs to be
proven).
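For what it is worth, that cost is easy to measure; a quick
POSIX-only sketch (not BuildStream code) to put a rough number on the
per-job fork() overhead:

```python
import os
import time

def average_fork_cost(n=50):
    # Average wall-clock seconds for fork() + immediate child exit + wait.
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:             # child: do nothing, exit right away
            os._exit(0)
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / n
```

This only bounds the raw fork()+wait cost from below; real jobs also
pay for job setup and result pickling on top of it.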

Cheers,
    -Tristan


