Re: [BuildStream] Proposal: Moving UI into a subprocess

From: Phil Dawson <phil dawson codethink com>
To: Tristan Van Berkom <tristan vanberkom codethink co uk>, Benjamin Schubert <contact benschubert me>
Cc: buildstream-list gnome org
Subject: Re: [BuildStream] Proposal: Moving UI into a subprocess
Date: Fri, 17 May 2019 17:15:34 +0100

Thanks both for the responses. I've replied inline.

On 17/05/2019 13:59, Benjamin Schubert wrote:

<snip>

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, 17 May 2019 10:31, Tristan Van Berkom via buildstream-list <buildstream-list gnome org> wrote:

<snip>

Let's determine if this is a problem

     If you are raising this, I presume it might already be a problem
     indeed, but please can we have some details ?

     [...]
     > In Daniel's latest batch of profile analysis [1]. He noted that UI
     > rendering was consuming around 50% of the straight line cpu time of the
     > parent `bst` process. As such, it may make sense to delegate the UI
     > rendering to a subprocess to avoid blocking the scheduling of useful
     > work while rendering UI events.


     I think this is a non sequitur, it is not necessarily true that:

      * If the main process is spending half of it's time in logging...
      * It is a problem and we should fix it.

I can add more data here. I'm using BuildStream in docker through bash.exe
on a Windows machine. The IO and screen update it slow, meaning when I run
bst, 40% of the time is spent in writing to the screen. I usually end up
running something like:

`bst command 2>&1 | tee`

in order to get a quicker BuildStream run.

Doing so _does_ increase my build times in a matter that I can see it.
Both problems from Daniel's profiles and mines would be aleviated by moving
the UI in it's own process.

     We need to know that this is in fact a problem before going to such
     extremes as proposed in this mail.


     Spending half of the main processes's work in logging doesn't mean that
     we are bottlenecking on logging, it only means that half of the work
     that is being done is being done in logging.

     Consider that we are currently improving how cache keys get calculated,
     there is a plan in place to push resolved keys onto the reverse
     dependencies such that cache key calculation becomes cumulative, with a
     goal of eliminating the need to ever call Element.dependencies() in
     cache key resolution algorithms entirely, causing the algorithm to be
     nearly linear if possible.

     The result of this will mean that:

      * The remaining logging being done in the main process will be more
        than 50% of the work.

      * The logging will be *less* of a bottleneck than it was before, even
        if it constitutes more than half of the work being done, and even
        without improving logging performance.


     Let's try some less aggressive measures first

When running BuildStream, the main process uses 100% of a processor. There
is therefore a real bottleneck here. The scheduler shouldn't need that much.
I'd be much happier if the scheduler could use 100% of it's own process, having
the UI updated separately.

I also think that when adding Remote Execution in the picture, with many more
processes to orchestrate, we will still have this bottleneck, evne with all
the other optimisations.

Thanks Benjamin for providing this extra detail.

Tristan, is this enough to convince you that having the UI in a separateprocess to the scheduler is worth pursuing (implementation details aside)?

I already replied with some hints on how we can improve logging
performance here[0], I'm certain we can do better without resorting to
splitting up roles and responsibilities into multiple processes.

Thanks for those, I indeed think they should be implemented. I however don't
think this is enough.


+1 here.

First, consider - is it really that important that we update the status
bar and spit out log lines at a sub-second frequency ?

-   In a session that has a lot of ongoing tasks, naturally has a more
     expensive status bar to update (more tasks in the status bar)

-   Similarly in a session that has a lot of ongoing tasks, we get a lot
     of start/stop messages, plus a lot of status messages, often we see
     many messages appear in the same second.

     Compounded together, of course the logging is going to get
     exponentially more expensive with a lot of ongoing tasks.

     Instead of unconditionally updating the UI at every incoming message,
     we can limit the frequency at which we update the terminal whilst
     preserving the integrity and ordering of incoming messages.

Agreed, this doesn't fix anything though, when using BuildStream, my UI is already
not updating more than once a second.

     When the scheduler is already running, we already have a ticker which
     we use to unconditionally update the status bar, that would be a good
     time IMO to:

     -   Render all of the messages received since the last "tick"
     -   Update the status bar only once at the end

         This would have the additional benefit of reducing a lot of "flicker"
         we can sometimes experience when overly updating the UI and processing
         a lot of messages (in these times, we definitely update the status bar
         way more than what is needed).

         One thing to keep in mind in this approach, is we should still handle
         the error messages immediately upon receiving them, at least in the
         case when we know that it will result in pausing all ongoing tasks and
         launching an interactive prompt.

         I think we should definitely improve this area regardless of other more
         extreme measures we might take as it will improve the UI experience and
         reduce overall work anyway, and then we should revisit my point above
         and reevaluate again whether it is still a bottleneck.

         Split up work in multiple processes


     If:
       * We have reduced the work in calculating state, allowing the main
         process to breathe easier
       * Reduced the redundant work we do in logging by logging on a fixed
         timer rather immediately than with every incoming message, further
         allowing the main process to breathe easier
       * It is *still* demonstrably a bottleneck

     Then we should consider splitting up the work into separate processes.

I don't think those are mutually exclusive and do think both are needed.

Again, thanks for the detailed suggestions. I agree that these would beworth doing in addition to having the UI in a separate process.

     However, the way I see it putting logging into a subprocess is just
     about the worst way I can imagine for going about this - running
     `Scheduler.run()` in a subprocess separated from frontend activities
     would make a lot more sense.

Completely agree here

I agree that this does seem like a better abstraction. Thanks forcalling me out here :)

<snip>

     One thing to point out is that no matter how we slice things, we are
     still going to have the issue of synchronizing element state across
     processes, as state mutates over the course of a session, and the
     logger needs to display state.

I would hope that the worker process could eventually give all the necessary
information to the frontend process, that would remove any problem arrising from
this and would remove lots of state synchronisation.

As far as state synchronisation goes, as far as I understand, thefollowing bits of state are needed by the front-end.

1) The element lookup table, this will be populated after load time andwon't change during a scheduler run, so I don't think we'd anysynchronisation here.

2) Element names. Again, these are available after load time and won'tchange during a scheduler run, so I don't think we'd need to communicatethis back to the main process.

3) Element cache keys. I'm not sure if the cache keys as displayed bythe front-end can change during a scheduler run. I feel like theyprobably do, but I'm not sure.


4) The status of the scheduler queues to be displayed on the status bar.

5) Cache usage.

Am I missing anything here?


Thanks again for taking the time to respond.

Cheers,

Phil

--
Phil Dawson, Software Engineer                          Codethink Ltd
https://www.codethink.co.uk/
We respect your privacy. See https://www.codethink.co.uk/privacy.html

Follow-Ups:
- Re: [BuildStream] Proposal: Moving UI into a subprocess
  - From: Tristan Van Berkom

References:
- [BuildStream] Proposal: Moving UI into a subprocess
  - From: Phil Dawson
- Re: [BuildStream] Proposal: Moving UI into a subprocess
  - From: Tristan Van Berkom
- Re: [BuildStream] Proposal: Moving UI into a subprocess
  - From: Benjamin Schubert

[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]