Re: [BuildStream] Proposal: A small number of subprocesses handling jobs



On Mon, 2019-03-04 at 16:39 +0900, Tristan Van Berkom wrote:
> On Mar 4, 2019, at 3:52 PM, Jürg Billeter <j bitron ch> wrote:
>> Assuming you mean running plugin code in additional threads, I think
>> that would actually be much more complicated than my suggestion, from
>> the API point of view.  The complete API would either have to become
>> thread-safe or have clear documentation of what is allowed from what
>> thread (the GIL doesn't avoid the need for higher-level
>> synchronization).

> Afaics, making the API thread-safe does not complicate the API, it
> only complicates the implementation; plugin implementations would
> happily not need to lock anything or know they are running in a
> thread.
>
> They should also not have to understand what code is running in the
> main thread vs. what is subprocessed; I hope to avoid pushing any of
> this understanding onto plugin authors.

Implicit element-level locking might work from the plugin API point of
view.  However, considering implementation and long-term maintenance,
I'm still strongly in favor of an async approach with all Python code
running in a single thread (per process).

>> Also, (virtual) staging and artifact caching take Python CPU time and
>> thus, I suspect the benefit of the threading model to be minimal with
>> the GIL in place.

> The Python core libraries themselves release the GIL responsibly when
> calling into C libraries; I suspect that with BuildBox integration we
> should be doing the same.

Virtual staging, i.e., combining the files/trees of the build
dependencies, is not planned to be moved to BuildBox.  BuildBox will
require an already merged tree as input, just like remote execution.
With the recent (+ pending) optimizations, virtual staging has become
much faster; however, it might still be a bottleneck if we have to do
it in the main Python process.
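
As a rough, generic illustration of the GIL point above (this is not
BuildStream code; the loop merely stands in for CPU-heavy Python work
such as tree merging): running it on threads gives no parallel speedup,
because each thread has to hold the GIL, while a process pool sidesteps
that entirely:

import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def merge_like_work(n):
    # Pure-Python loop; holds the GIL for its entire duration.
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(executor_cls, label):
    start = time.perf_counter()
    with executor_cls(max_workers=4) as pool:
        list(pool.map(merge_like_work, [2_000_000] * 4))
    print("{}: {:.2f}s".format(label, time.perf_counter() - start))

if __name__ == "__main__":
    timed(ThreadPoolExecutor, "threads (GIL-bound)")
    timed(ProcessPoolExecutor, "processes")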

> While threading makes some implementation details tricky, it also
> simplifies other parts (state synchronization would become simpler,
> queues would not have to inform the data model of changes at all).

Moving the state logic to the main process is what simplifies these
parts.  This would be the case for both the threading approach and the
async approach.
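
As a minimal sketch of the shape I have in mind (hypothetical names, not
the actual scheduler code): the main process runs a single-threaded
asyncio loop that owns all the state, and the job bodies run in a small
pool of worker processes:

import asyncio
from concurrent.futures import ProcessPoolExecutor

def build_job(element_name):
    # Stands in for the CPU-heavy part of a job, run in a worker process.
    return "artifact-for-" + element_name

async def main():
    state = {}  # all state lives here, in the main process, single thread
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=4) as pool:
        async def run_one(name):
            artifact = await loop.run_in_executor(pool, build_job, name)
            # State updates happen on the event loop: no locking, and no
            # separate notifications back to the data model are needed.
            state[name] = artifact
        await asyncio.gather(*(run_one(n) for n in ("base.bst", "app.bst")))
    print(state)

if __name__ == "__main__":
    asyncio.run(main())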

> Also, I think my original question needs answering: how heavy were the
> builds in the sample which shows that spawning a process is
> unreasonably slow?  How do we know this is a non-negligible overhead?
>
> With the knowledge that most builds will themselves spawn many
> processes anyway, why is it worth making such drastic changes?

I agree that this question should be answered; however, my main
motivation is not fork(2) overhead but rather:
 * The already mentioned simplified state handling.
 * Avoiding the issue with background threads (e.g., gRPC, OSTree) in
   the main process.
 * Allowing long-living gRPC connections, see #810. buildbox-casd
   would mitigate this, though, as a shared connection is much less
   important for a local service.
 * Possible future native support for Windows, which doesn't support
   fork(2), although I don't see this happening in the foreseeable
   future.

My comments so far have been focusing on the Element methods
stage()/prepare()/assemble().  Sources also need to be considered,
though.  Some source implementations are CPU-intensive, but they might
be significantly easier to hand off to a worker pool, as their API
surface is much more limited, as far as I can tell.
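
To illustrate what I mean by a limited API surface (purely hypothetical
names, just a sketch): if a source operation can be described as plain
data, it is straightforward to ship it to a worker process:

from collections import namedtuple
from concurrent.futures import ProcessPoolExecutor

SourceTask = namedtuple("SourceTask", ["url", "ref", "directory"])

def stage_source(task):
    # Placeholder for CPU-heavy work such as unpacking a tarball or
    # checking out a ref into the given directory.
    return "staged {}@{} into {}".format(task.url, task.ref, task.directory)

if __name__ == "__main__":
    tasks = [
        SourceTask("https://example.com/a.tar.gz", "deadbeef", "/tmp/a"),
        SourceTask("https://example.com/b.tar.gz", "cafebabe", "/tmp/b"),
    ]
    with ProcessPoolExecutor(max_workers=2) as pool:
        for result in pool.map(stage_source, tasks):
            print(result)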

Cheers,
Jürg


