Re: [BuildStream] Scheduler optimization



On 2019-03-28 09:30, Benjamin Schubert via buildstream-list wrote:
Hey everyone,

TLDR: I want to refactor the queues and scheduler from their current
model to a push based model.
I would need feeback on the approach to see if I'm not missing something.
I would also need to know if there is any objections to my plan since
this would be a
significant refactoring effort.


I'll explain why and how in the rest of the email

---

I've been looking at the scheduler and the queues and from what I can see in some benchmarks and profiles, they are becoming an important bottleneck,
especially with a large number of workers.

A few issues have been raised over time about that:
- https://gitlab.com/BuildStream/buildstream/issues/824
- https://gitlab.com/BuildStream/buildstream/issues/943

Currently, the queues go over every element until they find enough elements to fill up all resources. While this works fine with few workers, since the probability of finding a job that is ready early in the queue is quite high, the performance degrades when more workers are added, as the probability to have
many jobs ready early decreases.

The problem becomes even worse when we are using remote execution with many
workers.

I therefore think we should be pushing items in the queue as soon as they are
needed.

This is in details what I think should be done:

1) Don't store a list of all elements in each queue.
- Currently, every queue contains everything, and we need to go over it
      everytime.

2) Remove Queue.status() method, allowing to check if an element is ready
    - Currently, this method is called on every element of the queue,
in sequence,
      until the queue has enough elements to use all the resources.

3) When setting an element for tracking, put it in the `track` queue.

4) When calling "set_required" on element, put the element in the fetch_queue.
    - Currently, what happens is that the element is already in the
queue, and when
      "set_required" is called, a boolean is toggled, making the
status of the element
      go from WAITING to READY, and allows the queue to pick it.

5) When an element is put in the build queue, if all its dependencies
are not ready,
add the element to a list in every dependency that is not ready. Whenever an element becomes done in this queue, go over this list and for every ready
element put it in the queue.
    - Currently, what happens is that the element is already in the
queue, and whenever
      it's status is checked, it will check if all its dependencies
are available before
      saying that it is ready.


All other queues should just get the elements from the previous queue pushed
to them when ready, as they don't have dependencies on other elements.

One concern that was raised in the past was that we would lose the ordering
of the build and that would decrease performance.

We could prevent this by using a priority queue and sort elements when adding
them. We would therefore still keep the stable ordering we have now.


Does this plan seem sensible? Am I missing something? Any feedback is welcome!
I plan to start working on this next week unless anyone has any
objections to the
plan.

Cheers,
Benjamin

Hi Benjamin,

The proposal looks good to me, though I'm wondering about how you're adding the elements to the BuildQueue.

If I understand you correctly, you intend for every element that's done being fetched to be added to the BuildQueue. Currently, the fetch queue has an option to skip all cached elements, which adds the element to the skipped_elements list for the frontend to read. AIUI here you'd have to also directly add the element to the BuildQueue.

Is there a particular reason you prefer this way of doing it to deferring the logic of adding elements to the BuildQueue to Element._schedule_assembly()?

Cheers,

Jonathan

--
Jonathan Maw, Software Engineer, Codethink Ltd.
Codethink privacy policy: https://www.codethink.co.uk/privacy.html


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]