Re: [BuildStream] Proposal: Moving UI into a subprocess

From: Tristan Van Berkom <tristan vanberkom codethink co uk>
To: Phil Dawson <phil dawson codethink com>
Cc: buildstream-list gnome org
Subject: Re: [BuildStream] Proposal: Moving UI into a subprocess
Date: Thu, 23 May 2019 16:44:42 +0900

Hi,

So this is just a tentative theory at this point; from what I
understand, we have not yet explored how to handle the problems outlined
in "The Interactive Shell" below in your mail.

We should spend some time in discovery, brutally hacking BuildStream in
a side branch to set up a similar scenario, and come up with a solution
to how we intend to handle this, before committing to a design decision
and starting to really implement this.


On Wed, 2019-05-22 at 14:48 +0100, Phil Dawson wrote:

# Tldr: 

Based on discussion that happened on IRC, there seems to be a
vague agreement towards the approach (originally suggested by Tristan
in [0]) of having the UI run in the main `bst` process, with `Stream`
running in a subprocess, assuming we can make signal handling and the
interactive shell both work together nicely. 

I've attempted to articulate in a bit more detail what I think this
approach will look like, and a couple of possible approaches to handle
the interactive shell.


Please do not call this "the Stream running in a subprocess", reading
through your text I think you do understand that in this model, the
Stream would be spawning processes to deal with the entire function
bodies of its main entry points, but there is a risk this will be lost
in translation.

It is the process where the build graph state is managed and scheduling
is done (and potentially also the loading), call it the "scheduling"
process if you like.

# The process model

The front-end would remain in the parent process. Each of the main
`Stream` entry points would be spawned into a separate process. This
subprocess would call `setsid` to ensure that signals are received and
handled only by the front-end process. I'll add some more detail on
this below.


Right, the gist of this is that basically:

  * Stream remains the main frontend facing API for doing tasks
  * Callbacks about what is happening continue to be marshalled through
    Stream, as the single point of contact for the frontend
  * The frontend does not have to care about processes at all, it only
    ever talks to Stream in its own process
  * Stream abstracts the entire core away from frontend, and provides
    the frontend with a simple API for "doing things" and issuing
    callbacks while those things are getting done.

This is with the exception of Context which also serves as the
frontend's entry point for receiving logging message events.
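
To make the division of labour above a bit more concrete, here is a
minimal Python sketch. It is not BuildStream code: `StreamFacade`,
`_scheduling_main` and the event names are hypothetical, and the "work"
is only a loop over target names. It assumes a POSIX system where the
"fork" start method is available. The point is only that the frontend
talks to one object in its own process, while the body of the entry
point runs in a forked process that has called `setsid`.

```python
import multiprocessing
import os

# Explicit "fork" context (POSIX only): the child starts as a copy of the
# frontend process, as it would after a plain fork().
_mp = multiprocessing.get_context("fork")


def _scheduling_main(targets, notifications):
    # Start a new session so that terminal-generated signals (^C, ^Z)
    # reach the frontend process only.
    os.setsid()
    for name in targets:
        notifications.put(("start", name))
        notifications.put(("done", name))
    # Report whether this process now leads its own session.
    notifications.put(("finished", os.getsid(0) == os.getpid()))


class StreamFacade:
    """Hypothetical stand-in for Stream, living in the frontend process."""

    def __init__(self, on_event):
        self._on_event = on_event

    def build(self, targets):
        notifications = _mp.Queue()
        proc = _mp.Process(target=_scheduling_main,
                           args=(targets, notifications))
        proc.start()
        while True:
            kind, payload = notifications.get()
            if kind == "finished":
                break
            # Marshal the event to the frontend as an ordinary callback.
            self._on_event(kind, payload)
        proc.join()
        return payload
```

The frontend here only ever constructs a `StreamFacade` and receives
callbacks; it never sees the process or the queue.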

# Message handling

The message handler in `Context` will send messages over a queue to the
front-end to handle. Currently, the front-end uses global state to gain
some of the information it renders. After the split, all state needed
by the front-end will be passed as explicit parameters of each `Message`
object (with the exception of any BuildStream configuration, which
will be still loaded and available before the `Stream` process is
spawned and therefore available to the front-end without any changes
needed).


Right, good observation about configuration (this brings to mind that
we still have an ugly situation of artifact caches and external
entities parsing user configuration outside of Context(), which should
really be fixed... but probably doesn't interfere with this process
model refactor).

Also, please do not call this the `Stream` process, this can be
mistranslated into "The frontend forks a process in which the stream
runs", which is not the aim :)
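
As a rough illustration of the message handling described above, a
message could carry everything the frontend needs to render it, so that
no global state is consulted on the receiving side. The field names
below are assumptions made for this sketch and are not the actual
attributes of BuildStream's `Message` class. A plain `queue.Queue`
stands in for the inter-process queue.

```python
import queue
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Message:
    # Hypothetical fields: everything needed for rendering travels
    # with the message itself.
    kind: str
    text: str
    element_name: Optional[str] = None
    elapsed: Optional[float] = None


def render(message):
    """Format one message using only the data it carries."""
    prefix = "[{}]".format(message.element_name or "core")
    timing = ""
    if message.elapsed is not None:
        timing = " ({:.1f}s)".format(message.elapsed)
    return "{} {}: {}{}".format(prefix, message.kind.upper(),
                                message.text, timing)


def drain(message_queue):
    """Frontend side: render every message currently queued."""
    lines = []
    while True:
        try:
            message = message_queue.get_nowait()
        except queue.Empty:
            return lines
        lines.append(render(message))
```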

# Signal handling

All signals will be caught by the front-end process. The front-end will
be responsible for terminating/suspending/resuming the `Stream` process
as appropriate. Although I'm still a bit hazy on the implementation
details here, I imagine this will work in a very similar way to how we
currently interact with scheduler jobs and will reuse much of the same
code.


It will not be similar, this is what I was trying to explain in our IRC
conversation, at this index:

  https://irclogs.baserock.org/buildstream/%23buildstream.2019-05-20.log.html#t2019-05-20T10:52:22

The nature of the scheduler process is that it runs an event loop,
waking up for very short intervals to process incoming job completion
events, updating state in the build graph, dispatching more jobs, and
goes to sleep as soon as possible.

The nature of job processes is that we run long standing, blocking
workloads, we cannot run an event loop there because we're running a
job.

Essentially this means we can have a much simpler interaction between
the "frontend <--> scheduler" than what we have to have with the
"scheduler <--> job".

The scheduler process itself does not need to handle signals *at all*
anymore and really shouldn't, it should instead just receive commands
sent to it from the frontend process via its IPC Queue.

The relationship between the "scheduler <--> job" will hopefully be a
bit simplified, since the scheduler process itself no longer needs to
handle signals, it can run with SIGINT/SIGTSTP/SIGTERM blocked for its
entire lifetime, and only send SIGTSTP/SIGCONT/SIGTERM/SIGKILL to child
jobs at the appropriate times.
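
A minimal sketch of what this could look like, assuming a POSIX system
and Python's `signal.pthread_sigmask`. The command names ("suspend",
"resume", "terminate") and the functions are made up for illustration,
and forwarding signals to child jobs is left out. The scheduler process
blocks the signals once at startup and from then on reacts only to
commands arriving on its queue.

```python
import multiprocessing
import os
import signal

_mp = multiprocessing.get_context("fork")  # POSIX only

BLOCKED = {signal.SIGINT, signal.SIGTSTP, signal.SIGTERM}


def scheduler_main(commands, events):
    # Block the signals for the whole lifetime of this process; they
    # stay pending and are never delivered.
    signal.pthread_sigmask(signal.SIG_BLOCK, BLOCKED)
    events.put("ready")
    while True:
        command = commands.get()
        if command == "terminate":
            # A real scheduler would send SIGTERM to its jobs here.
            events.put("terminating")
            return
        events.put("handled " + command)


def run_demo():
    commands, events = _mp.Queue(), _mp.Queue()
    proc = _mp.Process(target=scheduler_main, args=(commands, events))
    proc.start()
    log = [events.get()]  # wait until the signals are blocked
    # A stray SIGINT does not interrupt the scheduler process.
    os.kill(proc.pid, signal.SIGINT)
    for command in ("suspend", "resume", "terminate"):
        commands.put(command)
        log.append(events.get())
    proc.join()
    return log, proc.exitcode
```

In this sketch the only way to influence the scheduler process is the
`commands` queue, which is the property argued for above.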

# Interactive UI prompts

The front-end will provide a service which can be called to display
interactive prompts. I think this would work something like:

[... a lot of snip ...]

Please no, I hope nothing of the sort is needed.

There should be no sync calls, or "questions asked" to the frontend
from the scheduler process.

Perhaps at the very most; the scheduler might stop queuing new jobs for
a moment (depending on the "on-error" configuration) in the case of a
failed build, and await further instructions from the frontend on what
to do next.

In general, the current model need not change much here:

* The frontend receives messages over Context and Stream

  * It may be the frontend is receiving some job start/stop via the
    scheduler, but keep in mind that this was only because we didn't
    finish the work of sealing away the scheduler behind the Stream
    process.

    Start/Stop messages should be callbacks on the Stream
    object, not callbacks invoked via the scheduler.

* The frontend observes that a job has failed and that it is time to
  debug a failed build or ask the user what to do.

* The frontend, of its own accord, explicitly informs the scheduler to
  go to sleep, depending on the configuration of what to do
  "on-error".

* Depending on what the user informs the frontend, the frontend will
  then go on to instruct the scheduler what to do, because the
  frontend is in control at all times.

The distinction here is that the frontend is not "asked what to do",
the frontend barks out orders via the Stream() APIs and the scheduler
process obeys them.
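
The control flow above can be sketched as follows. This is a
single-process toy with hypothetical names (`SchedulerSketch`,
`FrontendSketch`, `ask_user`), where method calls stand in for the
commands and callbacks that would really cross a process boundary. What
it tries to show is the direction of control: the scheduler only
reports what happened, and every decision (suspend, resume, terminate)
originates in the frontend.

```python
from collections import deque


class SchedulerSketch:
    def __init__(self, jobs, notify):
        self._pending = deque(jobs)  # (name, succeeds) pairs
        self._notify = notify
        self.suspended = False
        self.terminated = False

    # Commands issued by the frontend.
    def suspend(self):
        self.suspended = True

    def resume(self):
        self.suspended = False

    def terminate(self):
        self.terminated = True
        self._pending.clear()

    def step(self):
        """Run one job unless suspended; False when nothing is left."""
        if self.terminated or not self._pending:
            return False
        if not self.suspended:
            name, succeeds = self._pending.popleft()
            # Report the outcome; never ask what to do next.
            self._notify("success" if succeeds else "failure", name)
        return True


class FrontendSketch:
    def __init__(self, jobs, ask_user):
        self.log = []
        self._ask_user = ask_user
        self.scheduler = SchedulerSketch(jobs, self._on_event)

    def _on_event(self, kind, name):
        self.log.append((kind, name))
        if kind == "failure":
            self.scheduler.suspend()  # the frontend's own decision
            if self._ask_user(name) == "continue":
                self.scheduler.resume()
            else:
                self.scheduler.terminate()

    def run(self):
        while self.scheduler.step():
            pass
        return self.log
```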

# The Interactive shell

As pointed out by Tristan, calling `setsid` in the `Stream` process means 
the `Stream` is not in the process group which owns the terminal. As a
result, it wouldn't be easy (or possible?) to have a shell created in
the `Stream` process take over the terminal.

One approach I think would circumvent this limitation would be for
the front-end to provide an API along the lines of
execute_in_foreground, this would do the specified work in the
front-end process, taking over the terminal while it is running. This
would:
* pause the displaying of messages, allowing any new messages to queue
  up.
* execute the given work in the front-end process
* pass any return value back to the calling process
* Continue printing queued messages


I want to ensure that the correct terminology is used so that people
get the correct picture of who is in charge of what:

 * The frontend does not implement services for the sake of the
   scheduler process.

   Instead: It receives callbacks that things occurred over the
            course of calling one of the main Stream() functions.

 * The scheduler process does not make any presumptions that the
   frontend will react in any specific way to the callbacks it
   issues.

   Instead: The behavior of the scheduler is clearly explained
            to the frontend in the Stream() API contract, and the
            scheduler behaves as advertised.

Bottom line: the frontend is always boss, the scheduler process does
what it is told and doesn't ask questions.


Besides this, I think that any "passing terminal control between
processes" ideas need to be proven possible before we can consider this
approach to be plausible.

Another approach would be to simply special case the `Stream.shell`
method, so that it does not run in a subprocess. The shell only
displays loading and staging messages and as far as I can see will
never be rendering a large number of messages. While I'm not really a
fan of special casing one of the `Stream` methods, this would have the
advantage of reducing the complexity of the implementation.


Right, this is the simple but impossible approach.

The frontend process cannot shell into the elements because in this
design (so far), there is no need to coordinate state which has been
resolved in the scheduler process back to any build graph in the
frontend process, in fact, Element instances live entirely inside the
scheduling process and never need to be known by the frontend.

This simplicity is one of the reasons that this approach is so
attractive, it is also the reason why the main process cannot shell
into anything, because the frontend doesn't know the cache keys which
have been resolved during the session, and it may not even know what an
Element is (we're missing data to launch a shell).

Maybe:

* We need to keep the initial loading process before the initial fork()

* At the time of an interactive shell (or at any time the frontend may
  ever *need* to know state for any other reason), we can have the
  frontend download the state of an element recursively from the
  scheduling process.

I think this might be the middle ground which allows us to move
forward.
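
As a very rough sketch of the "download the state recursively" idea,
assuming the scheduling process can flatten an element and its
dependencies into plain data: the dictionary layout, `export_state` and
`handle_request` below are invented for illustration, and a real
Element carries far more state than a cache key and a dependency list.

```python
import json


def export_state(elements, name, _seen=None):
    """Collect the state of `name` and, recursively, its dependencies."""
    seen = {} if _seen is None else _seen
    if name not in seen:
        element = elements[name]
        seen[name] = {
            "cache_key": element["cache_key"],
            "dependencies": list(element["dependencies"]),
        }
        for dep in element["dependencies"]:
            export_state(elements, dep, seen)
    return seen


def handle_request(elements, request):
    """Scheduling-process side: answer one state request with JSON."""
    wanted = json.loads(request)["element"]
    return json.dumps(export_state(elements, wanted), sort_keys=True)


def download_state(send_request, name):
    """Frontend side: ask for the state of one element on demand."""
    reply = send_request(json.dumps({"element": name}))
    return json.loads(reply)
```

Here `send_request` is whatever transport ends up carrying the request
to the scheduling process. Only elements reachable from the requested
one are transferred, so the frontend learns state when it needs it and
not before.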

What do people think?


I think we need to resolve this last point before settling on a design,
the opposite design of synchronizing state into the frontend process
via state change messaging is not entirely horrible either (but I admit
that this approach is rather growing on me).

Cheers,
    -Tristan

