Re: [BuildStream] Proposal: A small number of subprocesses handling jobs



Hi Tristan,

On Mon, 2019-03-04 at 15:38 +0900, Tristan Van Berkom wrote:
On Mar 4, 2019, at 2:35 PM, Jürg Billeter <j bitron ch> wrote:
Another option was discussed during the gathering.  We could execute
the plugin code itself in the main process/thread and hand off only the
expensive parts to a worker pool.  Not allowing arbitrary plugin code
to run in the worker makes it much simpler and more efficient to pickle
the data that is actually needed by the worker.

A possible implementation approach would be to extend the command
batching concept to also cover staging dependencies and sources, and
to use a single batch/context to cover integration and build commands.
I.e., instead of having the plugin actually do the expensive
operations, the plugin will create an operation list (possibly using a
context manager), which the BuildStream core will execute outside the
control of the plugin.

This sounds like complicating the plugin APIs significantly due to
technical difficulties which should be solvable outside of the
plugin.

In general I would much rather go to great lengths in the core in
order to provide a luxuriously simple and attractive API for plugin
authors.

Beyond this, I suspect that what you describe will have similar
performance to moving to a threading model, and I think we should try
that before resorting to complicating the plugin API.

Assuming you mean running plugin code in additional threads, I think
that would actually be much more complicated than my suggestion, from
the API point of view.  The complete API would either have to become
thread-safe or have clear documentation of what is allowed from which
thread (the GIL doesn't avoid the need for higher-level
synchronization).  Due to this, I strongly dislike this approach.  It
brings all the problems of multi-threaded programming with very limited
benefit (GIL).
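
To illustrate the synchronization point, here is a minimal sketch in
plain Python.  SharedCache and get_or_compute are hypothetical names
for illustration, not BuildStream API.  Even if individual dictionary
operations are effectively atomic in CPython, the check-then-insert
sequence is not, so shared state still needs an explicit lock:

```python
import threading


class SharedCache:
    """Hypothetical shared state touched from several threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}
        self.computations = 0

    def get_or_compute(self, key, compute):
        # The GIL does not make "check, then insert" a single step;
        # the lock does.  Without it, two threads could both see the
        # key as missing and both run compute().
        with self._lock:
            if key not in self._entries:
                self.computations += 1
                self._entries[key] = compute(key)
            return self._entries[key]


cache = SharedCache()
threads = [
    threading.Thread(target=cache.get_or_compute, args=("element", str.upper))
    for _ in range(8)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Every piece of API state reachable from plugin threads would need this
kind of treatment, or documentation saying it must not be touched.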

With a context manager-based or similar approach, my suggestion
wouldn't make the API much more complicated beyond the complexity
introduced by the already existing command batching (although I suspect
we'll need a larger API change/break).
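
As a rough sketch of the operation-list idea, assuming a context
manager that only records what the plugin asks for: OperationList,
batch() and assemble() below are hypothetical names, not existing
BuildStream API.  The recorded operations are plain data, which is what
makes them cheap to pickle and hand to a worker:

```python
import pickle
from contextlib import contextmanager


class OperationList:
    """Hypothetical recorder; nothing is executed while the plugin runs."""

    def __init__(self):
        self.operations = []

    def stage_dependencies(self, path):
        self.operations.append(("stage-dependencies", path))

    def stage_sources(self, path):
        self.operations.append(("stage-sources", path))

    def run(self, command):
        self.operations.append(("run", command))


@contextmanager
def batch(recorded):
    # Collect operations inside the with-block; the batch is only
    # handed over if the plugin code completed without raising.
    ops = OperationList()
    yield ops
    recorded.append(ops.operations)


def assemble(recorded):
    # Stand-in for plugin code: it describes the work but does not do it.
    with batch(recorded) as ops:
        ops.stage_dependencies("/")
        ops.stage_sources("/buildstream/src")
        ops.run("make")
        ops.run("make install")


recorded = []
assemble(recorded)

# The core could now ship the batch to a worker process.
payload = pickle.dumps(recorded)
restored = pickle.loads(payload)
```

The plugin never holds a reference to anything the worker needs, so
only the tuples above have to cross the process boundary.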

That said, allowing plugins to generate only a single batch/context
might be somewhat limiting. While this limit would allow more
flexibility in BuildStream core and likely be beneficial for remote
execution, supporting multiple batches/contexts should definitely be
possible, perhaps using Python async/await syntax for convenience.
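
A minimal sketch of how multiple batches might look with async/await,
assuming the core exposes some awaitable that executes a batch:
execute_batch() and assemble() are hypothetical names, and the sleep is
only a stand-in for handing the batch to a worker:

```python
import asyncio


async def execute_batch(operations):
    """Hypothetical core hook: run one batch outside the plugin."""
    await asyncio.sleep(0)  # stand-in for dispatching to a worker
    return ["done: " + operation for operation in operations]


async def assemble():
    # The plugin stays in the main thread and is suspended while each
    # batch runs, so it can look at one result before building the next.
    configure = await execute_batch(["./configure"])
    build = await execute_batch(["make", "make install"])
    return configure + build


results = asyncio.run(assemble())
```

While one plugin is suspended at an await, the main loop is free to
run other plugins' code, without any plugin code running in a worker.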

With the GIL in place, we only run Python in the main thread, while
long-running I/O and system calls should be parallelized (sounds
similar to what you are suggesting).

Also, (virtual) staging and artifact caching take Python CPU time, so
I suspect the benefit of the threading model would be minimal with the
GIL in place.

Cheers,
Jürg


