Re: [BuildStream] Proposal: A small number of subprocesses handling jobs



Hey!

I like the high-level idea (fewer processes). I have some comments on the method though; please see inline!

<snip>
## Jobs

### Now
 
<snip>
 
### What I'd change

I would separate the Job class into a Job that contains the actual work
logic, and a WorkerSubprocess that handles process management and
messaging.

Beyond this, I'm less certain. Currently there is no way to send a
message to a Job; it is created with all the information it needs.

My thought right now is for the Scheduler to create a
multiprocessing.Queue that every WorkerSubprocess is subscribed to,
where the Scheduler puts Jobs into the queue and the WorkerSubprocesses
pop jobs from the queue when they're ready.

It seems this would reinvent Python's multiprocessing.Pool. Would there be a problem with using one (or multiple) pools?
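
For illustration, here is a minimal sketch of dispatching a job through a plain multiprocessing.Pool; the job payload and callback names below are hypothetical stand-ins for the real job logic:

```python
import multiprocessing

def run_job(payload):
    # Hypothetical worker entry point: unpack the pickled job
    # description and run its work logic in the worker process.
    element_name, action = payload
    return "%s: %s done" % (element_name, action)

def on_job_complete(result):
    # Runs back in the scheduler process when a worker finishes.
    print(result)

if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        pool.apply_async(run_job, (("base/alpine.bst", "build"),),
                         callback=on_job_complete)
        pool.close()
        pool.join()
```

The Pool already owns the worker processes and the queue that feeds them, which is exactly the plumbing the WorkerSubprocess proposal would otherwise rebuild.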


## Resources

### Now

`buildstream/_scheduler/resources.py` contains the Resources object.
This keeps track of how many of each resource can be allocated, and
which ones are currently allocated.
`buildstream/_scheduler/queues/queue.py` is responsible for reserving
resources and dispatching jobs.

Resources currently supports four kinds of resource:
* CACHE, i.e. whether a job needs to access the artifact cache.
* DOWNLOAD, i.e. whether a job needs to download something.
* PROCESS, i.e. whether a job is processor-intensive.
* UPLOAD, i.e. whether a job needs to upload something.

### What I'd change

I would add a new resource type, SUBPROCESS, which all jobs need to
claim. This is a little bit silly, as every kind of job needs a
subprocess, but it makes use of a common code path.

Most of those resources are used to ensure we are not spawning too many concurrent jobs. I think we can simplify the model:

1) have a pool with ${DOWNLOAD} processes, which would handle pull and fetch jobs
2) have a pool with ${UPLOAD} processes which would handle push jobs
3) have a pool with ${PROCESS} processes which would handle build jobs

This would mean we would end up with only a CACHE resource.
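
A rough sketch of that layout, assuming the scheduler can route each job type to a dedicated pool (the sizes and job-type names below are made up):

```python
import multiprocessing

# Hypothetical sizes; these would come from the user's configured
# limits on downloaders, uploaders and builders.
download_pool = multiprocessing.Pool(processes=10)  # pull and fetch jobs
upload_pool = multiprocessing.Pool(processes=4)     # push jobs
process_pool = multiprocessing.Pool(processes=8)    # build jobs

POOL_FOR_JOB = {
    "fetch": download_pool,
    "pull": download_pool,
    "push": upload_pool,
    "build": process_pool,
}

def dispatch(job_type, func, args, callback):
    # Route each job to the pool that models its resource type.
    return POOL_FOR_JOB[job_type].apply_async(func, args, callback=callback)
```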

The CACHE resource is special: many different kinds of job use it, whereas each of the other resources is used by specialized jobs.

This is how I understand it to work; please let me know if I'm somehow mistaken.

This resource is there in order to prevent any jobs (pull/build) from accessing the cache while a cleanup operation is taking place.

The way the scheduler uses it is by checking, after each job, whether a cleanup operation should be scheduled. If so, it will prevent every other job from being scheduled until the cache cleanup can run (by registering an exclusive interest in this resource).

That means that every already-scheduled job will finish before the cache cleanup takes place.
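
To illustrate that drain-then-clean behaviour, here is a readers-writer sketch (threads for brevity; this is not the real Resources code, and a multi-process version would need process-safe primitives):

```python
import threading

class CacheInterest:
    """Illustration only: an exclusive interest drains shared holders
    before the cleanup runs, and blocks new jobs from claiming the
    cache in the meantime."""

    def __init__(self):
        self._cond = threading.Condition()
        self._holders = 0             # jobs currently using the cache
        self._exclusive_wanted = False

    def acquire_shared(self):         # called by pull/build jobs
        with self._cond:
            while self._exclusive_wanted:
                self._cond.wait()     # no new cache jobs during cleanup
            self._holders += 1

    def release_shared(self):
        with self._cond:
            self._holders -= 1
            self._cond.notify_all()

    def acquire_exclusive(self):      # called by the cleanup job
        with self._cond:
            self._exclusive_wanted = True
            while self._holders > 0:
                self._cond.wait()     # already-running jobs finish first

    def release_exclusive(self):
        with self._cond:
            self._exclusive_wanted = False
            self._cond.notify_all()
```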

There are two ways I can see to handle this.

The easiest from our point of view (but the hardest from the cache cleaner's) would be to make the cache cleaning able to work without an exclusive lock. This would also give us the best speed.

Another way of handling this, without having to require a "resource", would be to have jobs that require write access to the cache lock/unlock a shared semaphore, with the cache-cleaner job acquiring all of its slots before running, which would block all other jobs. (This might require an event in front of the semaphore, depending on the underlying implementation, to prevent starvation of the cache cleaner.)

This has the disadvantage that it would potentially block fetch() jobs. However, casd would remove this problem, so it could be an acceptable temporary solution.
That way we would get rid of the resources altogether.
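
A minimal sketch of that semaphore scheme, assuming a fixed upper bound on concurrent cache writers (all names here are made up):

```python
import multiprocessing

MAX_CACHE_WRITERS = 8  # hypothetical cap on concurrent cache writers
cache_slots = multiprocessing.BoundedSemaphore(MAX_CACHE_WRITERS)

def writer_job():
    # Each job needing write access to the cache takes one slot.
    with cache_slots:
        pass  # ... pull/build work that touches the cache ...

def cleanup_job():
    # The cleaner takes every slot, blocking all writers. An Event
    # checked by writers before acquiring could prevent new jobs
    # from starving the cleaner, as noted above.
    for _ in range(MAX_CACHE_WRITERS):
        cache_slots.acquire()
    try:
        pass  # ... clean the cache ...
    finally:
        for _ in range(MAX_CACHE_WRITERS):
            cache_slots.release()
```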



# The big problem

The big problem with moving to this kind of model is that we need to
synchronise the state of the pipeline to the worker subprocesses.

Previously, that happened automatically: the creation of a new
subprocess creates a copy of the process' current memory, so the state
was synchronised as of the start of the job, and a job did not need to
resynchronise during its lifetime.

This is a problem because there are parts of the pipeline which do not
remain static. I don't have a complete list of all the ways a pipeline
changes, but I know of:
* An element's ref is not known until its sources have been tracked.
* An element's cache key is not known until its ref and the cache keys
   of all its dependencies are known.
* An element's public data may be altered at runtime.

Any information passed to an existing subprocess has to be pickled and
unpickled, so ideally this would be as small as possible.

I am not sure how I would go about providing this information, assuming
I can track down and isolate all the parts of the pipeline that change
during a build.
The most efficient approach would probably be to have every Job report
the exact state changes in its completion callback, every
WorkerSubprocess read inbound state changes from a queue, and the
Scheduler push state changes to every worker other than the one that
sent them.
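
For what it's worth, a rough sketch of that broadcast scheme (all names are made up):

```python
import multiprocessing
import queue

class StateBroadcaster:
    """Illustration only: the Scheduler fans state deltas out to
    every worker except the one that produced them."""

    def __init__(self, num_workers):
        # One inbox per WorkerSubprocess, handed over at spawn time.
        self.inboxes = [multiprocessing.Queue() for _ in range(num_workers)]

    def on_job_complete(self, sender, delta):
        # 'delta' could be e.g. {"element": "foo.bst", "ref": "..."}
        for i, inbox in enumerate(self.inboxes):
            if i != sender:
                inbox.put(delta)

def drain_state(inbox, pipeline_state):
    # Called in a worker before it starts its next job, applying
    # any deltas that arrived while it was busy.
    while True:
        try:
            delta = inbox.get_nowait()
        except queue.Empty:
            break
        pipeline_state.update(delta)
```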

Most jobs would need access to their element, which we could send directly. That would have all of its dependencies up to date (we might need to clean up internal state though). It might be that in the end this is a non-problem. My best bet here is benchmarking.


It would be good to have a prototype of this new model as soon as
possible: serialising, deserialising and updating the state is a lot of
work within Python, whereas forking does this automatically at a low
level.

# Summary

In summary:
1. I propose creating a WorkerSubprocess that pulls Jobs from a queue
    populated by the Scheduler.
2. Keeping the pipeline state synchronised is a Hard problem that I'm
    not confident I have an answer to.
3. It is very important to test whether this actually saves us time.

1. I would rather try to stick to a multiprocessing.Pool as much as possible.
2. Agreed. However, we know the public interface needed for elements in jobs. We could have a special pickler omitting everything that starts with "_" (see the sketch after this list). This would have the neat advantage of forcing users to respect the public API.
3. Absolutely.
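
To expand on point 2, here is a minimal sketch of such a pickler using reducer_override from Python 3.8's pickle module; the Element stand-in and its fields are hypothetical:

```python
import io
import pickle

class Element:
    """Stand-in for BuildStream's Element (hypothetical fields)."""
    def __init__(self, name):
        self.name = name           # public: survives pickling
        self._internal_state = []  # private: stripped below

def _rebuild_element(public_state):
    # Reconstruct an Element from its public attributes only.
    elem = Element.__new__(Element)
    elem.__dict__.update(public_state)
    return elem

class PublicOnlyPickler(pickle.Pickler):
    # Drop every attribute starting with "_" when pickling an
    # Element, so workers only receive the public API surface.
    def reducer_override(self, obj):
        if isinstance(obj, Element):
            public = {k: v for k, v in vars(obj).items()
                      if not k.startswith("_")}
            return (_rebuild_element, (public,))
        return NotImplemented  # anything else pickles normally

buf = io.BytesIO()
PublicOnlyPickler(buf).dump(Element("base/alpine.bst"))
restored = pickle.loads(buf.getvalue())
assert not hasattr(restored, "_internal_state")
```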

Thanks for the thorough write up!

Cheers,
Ben
