Re: [BuildStream] Proposal: A small number of subprocesses handling jobs





On Mar 4, 2019, at 3:52 PM, Jürg Billeter <j bitron ch> wrote:

Hi Tristan,

On Mon, 2019-03-04 at 15:38 +0900, Tristan Van Berkom wrote:
On Mar 4, 2019, at 2:35 PM, Jürg Billeter <j bitron ch> wrote:
Another option was discussed during the gathering.  We could execute
the plugin code itself in the main process/thread and hand off only the
expensive parts to a worker pool.  Not allowing arbitrary plugin code
to run in the worker makes it much simpler and more efficient to pickle
the data that is actually needed by the worker.
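A minimal sketch of that split, assuming the expensive step is something like hashing staged files. Plugin-side code runs in the main process and only produces small, plain-data descriptions (the hypothetical `HashFileOp`). Only those descriptions are pickled and sent to a worker pool. All names are illustrative, not existing BuildStream API, and the `fork` start method assumes Linux.

```python
import hashlib
import multiprocessing
from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class HashFileOp:
    """Hypothetical work item: plain data only, so it pickles cheaply."""
    path: str


def plan_ops(paths: Iterable[str]) -> List[HashFileOp]:
    # Stand-in for plugin code in the main process: it only decides *what*
    # needs doing and returns picklable descriptions of it.
    return [HashFileOp(path=p) for p in sorted(paths)]


def execute_op(op: HashFileOp) -> str:
    # Runs in a worker process; it sees only the pickled op, never plugin
    # objects, so no plugin state has to be transferred.
    digest = hashlib.sha256()
    with open(op.path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_in_pool(ops: List[HashFileOp], workers: int = 2) -> List[str]:
    # "fork" keeps the sketch short on Linux; a core would more likely keep
    # one long-lived pool instead of creating one per call.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as pool:
        return list(pool.map(execute_op, ops))
```

Because workers only ever receive `HashFileOp` values, the pickling cost is proportional to the size of the descriptions rather than to whatever the plugin object graph happens to reference.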

A possible implementation approach would be to extend the command
batching concept to also cover staging dependencies and sources, and
to use a single batch/context to cover integration and build commands.
I.e., instead of having the plugin actually do the expensive
operations, the plugin will create an operation list (possibly using a
context manager), which the BuildStream core will execute outside the
control of the plugin.
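A minimal sketch of the operation-list idea, assuming a hypothetical `BatchingCore.batch()` context manager. The plugin only records operations on a `BatchRecorder`. The core receives the finished list when the `with` block exits normally and executes it later, outside plugin control. The names, operation kinds and the `/buildstream/src` path are illustrative assumptions, not BuildStream API.

```python
from contextlib import contextmanager
from typing import Iterator, List, Tuple

Op = Tuple[str, Tuple[str, ...]]


class BatchRecorder:
    """Hypothetical plugin-facing object: records operations, executes nothing."""

    def __init__(self) -> None:
        self.ops: List[Op] = []

    def stage_dependencies(self, scope: str) -> None:
        self.ops.append(("stage-deps", (scope,)))

    def stage_sources(self, directory: str) -> None:
        self.ops.append(("stage-sources", (directory,)))

    def run(self, *argv: str) -> None:
        self.ops.append(("run", argv))


class BatchingCore:
    """Stand-in for the core side; all names here are hypothetical."""

    def __init__(self) -> None:
        self.pending: List[List[Op]] = []

    @contextmanager
    def batch(self) -> Iterator[BatchRecorder]:
        recorder = BatchRecorder()
        yield recorder
        # Reached only if the plugin's with-block did not raise: hand the
        # finished list to the core for later execution.
        self.pending.append(list(recorder.ops))

    def execute_pending(self) -> List[str]:
        # Runs outside plugin control (worker, remote execution, ...);
        # here each operation is only rendered as a log line.
        log = []
        for ops in self.pending:
            for kind, args in ops:
                log.append(f"{kind} {' '.join(args)}")
        self.pending.clear()
        return log


def plugin_assemble(core: BatchingCore) -> None:
    """What a plugin's assemble step might look like under this scheme."""
    with core.batch() as b:
        b.stage_dependencies("build")
        b.stage_sources("/buildstream/src")
        b.run("make")
        b.run("make", "install")
```

If the plugin raises inside the `with` block, nothing is submitted, so a half-built operation list never reaches the core.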

This sounds like it would complicate the plugin APIs significantly, due
to technical difficulties which should be solvable outside of the
plugin.

In general I would much rather go to great lengths in the core in
order to provide a luxuriously simple and attractive api for plugin
authors.

Beyond this, I suspect that what you describe will have similar
performance to moving to a threading model, and I think we should try
that before resorting to complicating the plugin API.

Assuming you mean running plugin code in additional threads, I think
that would actually be much more complicated than my suggestion, from
the API point of view.  The complete API would either have to become
thread-safe or have clear documentation of what is allowed from which
thread (the GIL doesn't remove the need for higher-level
synchronization).

AFAICS, making the API thread-safe does not complicate the API; it only complicates the implementation. Plugin 
implementations would happily not need to lock anything or know they are running in a thread.

They should also not have to understand which code runs in the main thread and which runs in a subprocess; I 
hope to avoid pushing any of this understanding onto plugin authors.
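A minimal sketch of where the locking would live in that model, assuming a hypothetical `ThreadSafeCore` object. It also shows the kind of higher-level synchronization the GIL does not provide: a check-then-insert on a shared cache. The core method takes the lock, and the plugin function calling it from many threads contains no locking at all. All names are illustrative, not BuildStream API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


class ThreadSafeCore:
    """Hypothetical core object whose public methods lock internally."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._cache: Dict[str, str] = {}
        self.creations = 0

    def lookup_or_create(self, key: str) -> str:
        # Check-then-insert is a compound operation: in CPython the GIL does
        # not make the membership test and the insert one atomic step, so
        # the core takes a lock here on the plugin's behalf.
        with self._lock:
            if key not in self._cache:
                self.creations += 1
                self._cache[key] = f"artifact-for-{key}"
            return self._cache[key]


def plugin_job(core: ThreadSafeCore, key: str) -> str:
    # Plugin code: no locks, no knowledge of which thread it runs in.
    return core.lookup_or_create(key)


def run_plugin_jobs(core: ThreadSafeCore, keys: List[str], workers: int = 8) -> List[str]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda key: plugin_job(core, key), keys))
```

The cost of thread-safety lands in the core implementation. Whether every API entry point can be made this simple is exactly the open question in this thread.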

Due to this, I strongly dislike this approach.  It
brings all the problems of multi-threaded programming with very limited
benefit (GIL).

With a context manager-based or similar approach, my suggestion
wouldn't make the API much more complicated beyond the complexity
introduced by the already existing command batching (although I suspect
we'll need a larger API change/break).

That said, allowing plugins to generate only a single batch/context
might be somewhat limiting. While this limit would allow more
flexibility in BuildStream core and likely be beneficial for remote
execution, supporting multiple batches/contexts should definitely be
possible, maybe using Python's async/await syntax for convenience.
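A minimal sketch of the multiple-batch variant with async/await, assuming a hypothetical core-side `Sandbox.run_batch()` coroutine. The plugin reads as ordinary sequential code, and each `await` hands a whole batch to the core. Plugin logic between batches can inspect the previous result, which a single batch cannot express. A plain thread pool stands in for the core's executor only to keep the sketch self-contained.

```python
import asyncio
from typing import List, Sequence, Tuple


class Sandbox:
    """Hypothetical core-side object; plugins only hand it batches to await."""

    def __init__(self) -> None:
        self.executed: List[Tuple[str, ...]] = []

    def _execute_batch(self, commands: Sequence[Sequence[str]]) -> List[int]:
        # Stand-in for the expensive part the core runs outside plugin code
        # (worker process, remote execution, ...); here it only records the
        # commands and pretends each one exited with status 0.
        codes = []
        for argv in commands:
            self.executed.append(tuple(argv))
            codes.append(0)
        return codes

    async def run_batch(self, commands: Sequence[Sequence[str]]) -> List[int]:
        loop = asyncio.get_running_loop()
        # The blocking batch runs in the loop's default thread pool, so other
        # elements' coroutines keep making progress in the meantime.
        return await loop.run_in_executor(None, self._execute_batch, commands)


async def assemble_element(sandbox: Sandbox) -> None:
    # Plugin code: two batches, with plugin logic in between deciding
    # whether the second one is needed.
    codes = await sandbox.run_batch([["./configure"]])
    if all(code == 0 for code in codes):
        await sandbox.run_batch([["make"], ["make", "install"]])
```

Several elements can then be driven concurrently from the main thread, for example with `asyncio.gather(assemble_element(a), assemble_element(b))`.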

With the GIL in place, we only run Python in the main thread, while
long-running I/O and system calls should be parallelized (which sounds
similar to what you are suggesting).
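A minimal sketch of that model, assuming build commands are launched with `subprocess`. Python-level scheduling stays on the main thread, and the blocking waits on child processes happen in a small thread pool. CPython releases the GIL during those blocking system calls, so the waits overlap. `run_command` and `run_commands` are hypothetical helpers.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def run_command(argv: Sequence[str]) -> str:
    # Blocking: while this thread waits on the child, CPython releases the
    # GIL, so the main thread keeps running Python code.
    result = subprocess.run(list(argv), capture_output=True, text=True, check=True)
    return result.stdout.strip()


def run_commands(commands: List[Sequence[str]], workers: int = 4) -> List[str]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_command, argv) for argv in commands]
        # The main thread is free here for scheduling and other Python work.
        return [future.result() for future in futures]
```

Results come back in submission order regardless of which command finishes first, because the futures are collected in the order they were submitted.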

Also, (virtual) staging and artifact caching consume Python CPU time, so
I suspect the benefit of the threading model would be minimal with the
GIL in place.

The Python core libraries themselves responsibly release the GIL when calling into C libraries; I suspect 
that with BuildBox integration we should be doing the same.
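A minimal sketch of that effect with a standard-library C extension. CPython's `hashlib` releases the GIL while hashing sufficiently large buffers (an implementation detail), so threads can hash in parallel. A BuildBox binding would presumably need to do the same around its own blocking calls. That part is an assumption, not something shown here, and no timings are claimed.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def digest(blob: bytes) -> str:
    # For large inputs CPython's hashlib drops the GIL while the C code
    # hashes, so several of these calls can run on different cores at once.
    return hashlib.sha256(blob).hexdigest()


def digest_all(blobs: List[bytes], workers: int = 4) -> List[str]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, blobs))
```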

While threading makes some implementation details tricky, it also simplifies other parts (state 
synchronization would become simpler, and queues would not have to inform the data model of changes at all).
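A minimal sketch of the bookkeeping difference, using a hypothetical `Element` record. A job run in a child process mutates a pickled copy, so the parent only learns about the change through the returned value and has to apply it. A job run in a thread mutates the shared object directly. The `fork` start method assumes Linux.

```python
import multiprocessing
import threading
from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for an element in the data model."""
    name: str
    cached: bool = False


def mark_cached(element: Element) -> Element:
    # The "job": update state on whatever object it was handed.
    element.cached = True
    return element


def run_in_process(element: Element) -> Element:
    # The child receives a pickled copy; its change is only visible to the
    # parent through the returned copy, which must then be applied.
    ctx = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        return pool.submit(mark_cached, element).result()


def run_in_thread(element: Element) -> None:
    # Same address space: the job mutates the caller's object directly.
    worker = threading.Thread(target=mark_cached, args=(element,))
    worker.start()
    worker.join()
```

After `run_in_process(a)`, `a.cached` is still `False` in the parent. After `run_in_thread(b)`, `b.cached` is `True` with no message passing.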

Also, I think my original question still needs answering: how heavy were the builds in the sample which shows that 
spawning a process is unreasonably slow? How do we know this is a non-negligible overhead?

Given that most builds will themselves spawn many processes anyway, why is it worth making such 
drastic changes?
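One way to put a number on this, sketched under the assumption that per-job overhead is dominated by starting and joining one child process. Measure the mean spawn cost on the target machine and compare it with the duration of the jobs in the sample. This approximates, rather than reproduces, how BuildStream actually spawns jobs. `mean_spawn_seconds` and `overhead_fraction` are hypothetical helpers, and the `fork` default assumes Linux.

```python
import multiprocessing
import time


def noop() -> None:
    pass


def mean_spawn_seconds(n: int = 20, method: str = "fork") -> float:
    """Average wall-clock cost of starting and joining one child process."""
    ctx = multiprocessing.get_context(method)
    start = time.perf_counter()
    for _ in range(n):
        child = ctx.Process(target=noop)
        child.start()
        child.join()
    return (time.perf_counter() - start) / n


def overhead_fraction(spawn_seconds: float, job_seconds: float) -> float:
    # Share of a job's total wall-clock time spent on process creation.
    return spawn_seconds / (spawn_seconds + job_seconds)
```

For example, `overhead_fraction(1.0, 3.0)` returns 0.25. Plugging in the measured spawn cost and the real job durations from the sample would answer the question directly.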

Cheers,
    -Tristan




Cheers,
Jürg




