Re: [BuildStream] Proposal: Moving UI into a subprocess



Hey all,

TLDR;

I agree with Tristan about keeping the frontend in the frontend.
Running Scheduler.run() in a subprocess is a much nicer abstraction.

I agree that this split needs to happen now. It is a bottleneck and is
impacting some users in multiple ways.

More details inline


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, 17 May 2019 10:31, Tristan Van Berkom via buildstream-list <buildstream-list gnome org> wrote:

<snip>

Let's determine if this is a problem

    If you are raising this, I presume it might already be a problem
    indeed, but please can we have some details ?

    [...]
    > In Daniel's latest batch of profile analysis [1], he noted that UI 
    > rendering was consuming around 50% of the straight line cpu time of the 
    > parent `bst` process. As such, it may make sense to delegate the UI 
    > rendering to a subprocess to avoid blocking the scheduling of useful 
    > work while rendering UI events.


    I think this is a non sequitur, it is not necessarily true that:

     * If the main process is spending half of its time in logging...
     * It is a problem and we should fix it.


I can add more data here. I'm using BuildStream in Docker through bash.exe
on a Windows machine. The IO and screen updates are slow, meaning that when
I run bst, 40% of the time is spent writing to the screen. I usually end up
running something like:

`bst command 2>&1 | tee`

in order to get a quicker BuildStream run.

Without that workaround, the UI _does_ increase my build times to a degree
that I can see. Both the problems from Daniel's profiles and mine would be
alleviated by moving the UI into its own process.

    We need to know that this is in fact a problem before going to such
    extremes as proposed in this mail.


    Spending half of the main process's work in logging doesn't mean that
    we are bottlenecking on logging, it only means that half of the work
    that is being done is being done in logging.

    Consider that we are currently improving how cache keys get calculated,
    there is a plan in place to push resolved keys onto the reverse
    dependencies such that cache key calculation becomes cumulative, with a
    goal of eliminating the need to ever call Element.dependencies() in
    cache key resolution algorithms entirely, causing the algorithm to be
    nearly linear if possible.

    The result of this will mean that:

     * The remaining logging being done in the main process will be more
       than 50% of the work.

     * The logging will be *less* of a bottleneck than it was before, even
       if it constitutes more than half of the work being done, and even
       without improving logging performance.


    Let's try some less aggressive measures first


When running BuildStream, the main process uses 100% of a processor, so
there is a real bottleneck here. The scheduler shouldn't need that much.
I'd be much happier if the scheduler could use 100% of its own process,
with the UI updated separately.

I also think that when we add Remote Execution to the picture, with many
more processes to orchestrate, we will still have this bottleneck, even
with all the other optimisations.


    I already replied with some hints on how we can improve logging
    performance here[0], I'm certain we can do better without resorting to
    splitting up roles and responsibilities into multiple processes.

Thanks for those, I indeed think they should be implemented. I however don't
think this is enough.

    First, consider - is it really that important that we update the status
    bar and spit out log lines at a sub-second frequency ?

-   In a session that has a lot of ongoing tasks, naturally has a more
    expensive status bar to update (more tasks in the status bar)

-   Similarly in a session that has a lot of ongoing tasks, we get a lot
    of start/stop messages, plus a lot of status messages, often we see
    many messages appear in the same second.

    Compounded together, of course the logging is going to get
    exponentially more expensive with a lot of ongoing tasks.

    Instead of unconditionally updating the UI at every incoming message,
    we can limit the frequency at which we update the terminal whilst
    preserving the integrity and ordering of incoming messages.

Agreed, though this doesn't fix anything for me: when using BuildStream, my
UI is already not updating more than once a second.
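For reference, the frequency limiting described above can be done without
reordering anything: messages are queued in arrival order and the terminal
is only redrawn when the interval has elapsed. A minimal sketch (the class
and names are mine, not BuildStream's frontend code):

```python
import time
from collections import deque

class ThrottledRenderer:
    """Queue messages in order; redraw at most once per interval."""

    def __init__(self, interval=1.0, write=print):
        self.interval = interval
        self.write = write               # output sink, injectable for tests
        self.pending = deque()           # preserves message ordering
        self.last_render = time.monotonic()

    def message(self, msg):
        self.pending.append(msg)
        now = time.monotonic()
        if now - self.last_render >= self.interval:
            self.flush(now)

    def flush(self, now):
        # Drain every queued message in arrival order; the caller would
        # redraw the status bar exactly once here.
        while self.pending:
            self.write(self.pending.popleft())
        self.last_render = now

out = []
r = ThrottledRenderer(interval=3600, write=out.append)
r.message("first")    # queued: interval has not elapsed yet
r.message("second")   # queued behind "first"
r.flush(time.monotonic())   # the periodic tick would eventually do this
```

Integrity and ordering are preserved because flushing only ever pops from
the front of the queue.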

    When the scheduler is already running, we already have a ticker which
    we use to unconditionally update the status bar, that would be a good
    time IMO to:

    -   Render all of the messages received since the last "tick"
    -   Update the status bar only once at the end

        This would have the additional benefit of reducing a lot of "flicker"
        we can sometimes experience when overly updating the UI and processing
        a lot of messages (in these times, we definitely update the status bar
        way more than what is needed).

        One thing to keep in mind in this approach, is we should still handle
        the error messages immediately upon receiving them, at least in the
        case when we know that it will result in pausing all ongoing tasks and
        launching an interactive prompt.

    I think we should definitely improve this area regardless of other more
    extreme measures we might take as it will improve the UI experience and
    reduce overall work anyway, and then we should revisit my point above
    and reevaluate again whether it is still a bottleneck.
    Split up work in multiple processes


    If:
      * We have reduced the work in calculating state, allowing the main
        process to breathe easier
      * Reduced the redundant work we do in logging by logging on a fixed
        timer rather than immediately with every incoming message, further
        allowing the main process to breathe easier
      * It is *still* demonstrably a bottleneck

    Then we should consider splitting up the work into separate processes.

I don't think those are mutually exclusive, and I do think both are needed.


    However, the way I see it putting logging into a subprocess is just
    about the worst way I can imagine for going about this - running
    `Scheduler.run()` in a subprocess separated from frontend activities
    would make a lot more sense.

Completely agree here.
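The shape I have in mind is roughly this: the child process does the
scheduling work and streams events back to the parent, which owns the
frontend and renders at its own pace, so slow terminal IO never blocks
scheduling. A sketch only, with made-up event names:

```python
import multiprocessing as mp

def scheduler_run(events):
    # Stand-in for Scheduler.run() in the child process: do the work
    # and emit an event for everything the frontend needs to show.
    for element in ("base.bst", "app.bst"):
        events.put(("start", element))
        events.put(("success", element))
    events.put(("done", None))

def frontend():
    events = mp.Queue()
    child = mp.Process(target=scheduler_run, args=(events,))
    child.start()
    # The parent now renders at its own pace; a slow terminal only
    # delays the parent, never the scheduling in the child.
    while True:
        kind, element = events.get()
        if kind == "done":
            break
        print(f"[{kind}] {element}")
    child.join()
    return child.exitcode

if __name__ == "__main__":
    frontend()
```

This keeps the frontend in the frontend: the parent owns the terminal and
interactivity, the child owns the scheduler.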

<snip>

    One thing to point out is that no matter how we slice things, we are
    still going to have the issue of synchronizing element state across
    processes, as state mutates over the course of a session, and the
    logger needs to display state.

I would hope that the worker process could eventually send all the necessary
information to the frontend process; that would remove any problem arising
from this and eliminate a lot of state synchronisation.
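Concretely, what I mean is that each event would be self-describing: the
display data travels with the message, so the frontend never has to mirror
mutable element state across the process boundary. A sketch of such a
message (the field names are invented, not a proposed wire format):

```python
import json

def make_event(element_name, action, cache_key, status):
    # A self-contained event: everything the frontend needs to display
    # this line is in the message itself, so no cross-process state
    # lookup or synchronisation is required.
    return json.dumps({
        "element": element_name,
        "action": action,
        "cache_key": cache_key,
        "status": status,
    })

def render(event_json):
    e = json.loads(event_json)
    return f"[{e['action']}] {e['element']} ({e['cache_key'][:8]}) {e['status']}"

line = render(make_event("app.bst", "build", "a3f9c2d41b", "succeeded"))
```

The frontend then becomes a pure renderer of an ordered event stream.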

Thanks,

Benjamin
