[BuildStream] Proposal: A small number of subprocesses handling jobs
- From: Jonathan Maw <jonathan maw codethink co uk>
- To: buildstream-list gnome org
- Subject: [BuildStream] Proposal: A small number of subprocesses handling jobs
- Date: Fri, 22 Feb 2019 17:04:27 +0000
Hi all.
I've been looking at optimisations, and it seems that a significant amount of time is spent forking new processes (jennis' profile showed 126s in a 576s build).
I propose that we can reduce this by instantiating a small number of subprocesses and having them perform jobs instead.
There is a summary at the bottom if you're not interested in the details.
# Changes in detail
This is my first time looking at the scheduler in detail, so my understanding may be incorrect. In the places where I propose changes, I will outline how I think it works now, and then how I would change it.
## Jobs
### Now
Currently, a Job is a combination of three things: the messaging framework a subprocess uses to send messages to the parent process, the handler of all multiprocessing logic, and the handler for doing the actual work.
The main process runs an event loop for the entire duration that subprocesses are running, and the loop is subscribed to the queue in `Job._parent_start_listening()`.
The child process does not receive any messages from the main process for its entire duration.
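As an illustration of this pattern (a minimal sketch, not BuildStream's actual code): each job today amounts to forking a fresh child that inherits the parent's memory and only ever sends messages upward.

```python
import multiprocessing
import os

def child_action(queue):
    # The child inherits a full copy of the parent's memory at fork
    # time, so it already sees the pipeline state; it only sends
    # messages to the parent, never receives any.
    queue.put(("result", os.getpid()))

def spawn_job():
    queue = multiprocessing.Queue()
    process = multiprocessing.Process(target=child_action, args=(queue,))
    process.start()        # fork() copies the parent's memory here
    message = queue.get()  # the parent's loop listens for child messages
    process.join()
    return message
```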
### What I'd change
I would separate the Job class into a Job that contains the actual work logic, and a WorkerSubprocess that handles subprocess management and messaging.
Beyond this, I'm less certain. Currently there is no way to send a message to a Job; it is created with all the information it needs.
My thought right now is for the Scheduler to create a `multiprocessing.Queue` that every WorkerSubprocess is subscribed to: the Scheduler puts Jobs into the queue, and the WorkerSubprocesses pop jobs from it when they're ready.
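A minimal sketch of that model, with hypothetical `worker_loop` and `run_scheduler` names standing in for whatever the real WorkerSubprocess API becomes:

```python
import multiprocessing

NUM_WORKERS = 4

def worker_loop(job_queue, result_queue):
    # Each long-lived worker pops jobs until it receives the None
    # sentinel, then exits.
    while True:
        job = job_queue.get()
        if job is None:
            break
        result_queue.put(job())  # run the job's work logic

def run_scheduler(jobs):
    job_queue = multiprocessing.Queue()
    result_queue = multiprocessing.Queue()
    workers = [
        multiprocessing.Process(target=worker_loop,
                                args=(job_queue, result_queue))
        for _ in range(NUM_WORKERS)
    ]
    for worker in workers:
        worker.start()
    for job in jobs:
        job_queue.put(job)   # jobs must be picklable to cross the queue
    for _ in workers:
        job_queue.put(None)  # one sentinel per worker
    results = [result_queue.get() for _ in jobs]
    for worker in workers:
        worker.join()
    return results
```

Note that the workers are forked once up front; after that, dispatching a job costs a queue round-trip rather than a fork.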
## Resources
### Now
`buildstream/_scheduler/resources.py` contains the Resources object. This keeps track of how many of each resource can be allocated, and which ones are currently allocated.
`buildstream/_scheduler/queues/queue.py` is responsible for reserving resources and dispatching jobs.
Resources currently supports four kinds of resource:
* CACHE, i.e. whether a job needs to access the artifact cache.
* DOWNLOAD, i.e. whether a job needs to download something.
* PROCESS, i.e. whether a job is processor-intensive.
* UPLOAD, i.e. whether a job needs to upload something.
### What I'd change
I would add a new resource type, SUBPROCESS, which all jobs need to claim. This is a little bit silly, as every kind of job needs a subprocess, but it makes use of a common code path.
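A hedged sketch of the idea, with names only loosely modelled on `buildstream/_scheduler/resources.py` (the real limits and semantics may differ); the SUBPROCESS count caps concurrent jobs at the number of live workers:

```python
class ResourceType:
    CACHE = 1
    DOWNLOAD = 2
    PROCESS = 3
    UPLOAD = 4
    SUBPROCESS = 5  # proposed: one token per live WorkerSubprocess

class Resources:
    def __init__(self, num_builders, num_fetchers, num_pushers, num_workers):
        # In this sketch, a maximum of 0 means "unlimited".
        self._max = {
            ResourceType.CACHE: 0,
            ResourceType.DOWNLOAD: num_fetchers,
            ResourceType.PROCESS: num_builders,
            ResourceType.UPLOAD: num_pushers,
            ResourceType.SUBPROCESS: num_workers,
        }
        self._used = {resource: 0 for resource in self._max}

    def reserve(self, resources):
        # Claim all of the requested resource tokens, or none of them.
        for resource in resources:
            if self._max[resource] and self._used[resource] >= self._max[resource]:
                return False
        for resource in resources:
            self._used[resource] += 1
        return True

    def release(self, resources):
        for resource in resources:
            self._used[resource] -= 1
```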
# The big problem
The big problem with moving to this kind of model is that we need to synchronise the state of the pipeline to the worker subprocesses.
Previously, that happened automatically: creating a new subprocess copies the parent process' current memory, so the state was synchronised at the start of the job, and a job did not need to resynchronise during its lifetime.
This is a problem because there are parts of the pipeline which do not remain static. I don't have a complete list of all the ways a pipeline changes, but I know of:
* An element's ref is not known until its sources have been tracked.
* An element's cache key is not known until its ref and the cache keys of all its dependencies are known.
* An element's public data may be altered at runtime.
Any information passed to an existing subprocess has to be pickled and unpickled, so ideally this would need to be as little as possible.
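To make that concrete, here is a sketch of what a small, picklable state delta might look like; `StateDelta` and its fields are illustrative names, not an existing BuildStream type:

```python
import pickle
from typing import NamedTuple

class StateDelta(NamedTuple):
    element_name: str
    field: str     # e.g. "ref", "cache_key", "public_data"
    value: object

# A small delta like this is what would cross the queue, rather than
# a whole element (the cache key value here is made up).
delta = StateDelta("base/alpine.bst", "cache_key", "8f3a...")
payload = pickle.dumps(delta)
assert pickle.loads(payload) == delta
```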
I am not sure how I would go about providing this information, assuming I can track down and isolate all the parts of the pipeline that change during a build.
The most efficient approach would probably be to have every Job report its exact state changes in its completion callback, give every WorkerSubprocess a queue to read inbound state changes from, and have the Scheduler push state changes to every worker other than the one that sent them.
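A sketch of that broadcast step, where the `WorkerHandle` and `broadcast_state_change` names are assumptions of mine:

```python
import multiprocessing

class WorkerHandle:
    # Scheduler-side handle for one WorkerSubprocess.
    def __init__(self):
        # Each worker reads inbound pipeline state deltas from its
        # own queue.
        self.state_queue = multiprocessing.Queue()

def broadcast_state_change(workers, sender, delta):
    # Forward the (picklable) delta to every worker except the one
    # that produced it, since that worker already applied the change
    # locally before reporting it.
    for worker in workers:
        if worker is not sender:
            worker.state_queue.put(delta)
```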
It would be good to have a prototype of this new model as soon as possible: serialising, deserialising and updating the state is a lot of work within python, whereas forking does this automatically at a low level.
# Summary
In summary:
1. I propose creating a WorkerSubprocess that pulls Jobs from a queue populated by the Scheduler.
2. Keeping the pipeline state synchronised is a Hard problem that I'm not confident I have an answer to.
3. It is very important to test whether this actually saves us time.
--
Jonathan Maw, Software Engineer, Codethink Ltd.
Codethink privacy policy: https://www.codethink.co.uk/privacy.html