Re: [BuildStream] Scheduler optimization

From: Jonathan Maw <jonathan maw codethink co uk>
To: buildstream-list gnome org
Subject: Re: [BuildStream] Scheduler optimization
Date: Thu, 28 Mar 2019 16:39:24 +0000

On 2019-03-28 09:30, Benjamin Schubert via buildstream-list wrote:

Hey everyone,

TLDR: I want to refactor the queues and scheduler from their current
model to a push based model.
I would need feeback on the approach to see if I'm not missingsomething.
I would also need to know if there is any objections to my plan since
this would be a
significant refactoring effort.


I'll explain why and how in the rest of the email

---
I've been looking at the scheduler and the queues and from what I cansee insome benchmarks and profiles, they are becoming an importantbottleneck,
especially with a large number of workers.

A few issues have been raised over time about that:
- https://gitlab.com/BuildStream/buildstream/issues/824
- https://gitlab.com/BuildStream/buildstream/issues/943
Currently, the queues go over every element until they find enoughelements tofill up all resources. While this works fine with few workers, sincetheprobability of finding a job that is ready early in the queue is quitehigh,the performance degrades when more workers are added, as theprobability to have
many jobs ready early decreases.
The problem becomes even worse when we are using remote execution withmany
workers.
I therefore think we should be pushing items in the queue as soon asthey are
needed.

This is in details what I think should be done:

1) Don't store a list of all elements in each queue.
- Currently, every queue contains everything, and we need to goover it
      everytime.
2) Remove Queue.status() method, allowing to check if an element isready
    - Currently, this method is called on every element of the queue,
in sequence,
      until the queue has enough elements to use all the resources.

3) When setting an element for tracking, put it in the `track` queue.
4) When calling "set_required" on element, put the element in thefetch_queue.
    - Currently, what happens is that the element is already in the
queue, and when
      "set_required" is called, a boolean is toggled, making the
status of the element
      go from WAITING to READY, and allows the queue to pick it.

5) When an element is put in the build queue, if all its dependencies
are not ready,
add the element to a list in every dependency that is not ready.Whenever anelement becomes done in this queue, go over this list and for everyready
element put it in the queue.
    - Currently, what happens is that the element is already in the
queue, and whenever
      it's status is checked, it will check if all its dependencies
are available before
      saying that it is ready.
All other queues should just get the elements from the previous queuepushed
to them when ready, as they don't have dependencies on other elements.
One concern that was raised in the past was that we would lose theordering
of the build and that would decrease performance.
We could prevent this by using a priority queue and sort elements whenadding
them. We would therefore still keep the stable ordering we have now.
Does this plan seem sensible? Am I missing something? Any feedback iswelcome!
I plan to start working on this next week unless anyone has any
objections to the
plan.

Cheers,
Benjamin


Hi Benjamin,

The proposal looks good to me, though I'm wondering about how you'readding the elements to the BuildQueue.

If I understand you correctly, you intend for every element that's donebeing fetched to be added to the BuildQueue.Currently, the fetch queue has an option to skip all cached elements,which adds the element to the skipped_elements list for the frontend toread. AIUI here you'd have to also directly add the element to theBuildQueue.

Is there a particular reason you prefer this way of doing it todeferring the logic of adding elements to the BuildQueue toElement._schedule_assembly()?


Cheers,

Jonathan

--
Jonathan Maw, Software Engineer, Codethink Ltd.
Codethink privacy policy: https://www.codethink.co.uk/privacy.html

References:
- [BuildStream] Scheduler optimization
  - From: Benjamin Schubert

[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]