Re: [BuildStream] Proposal: Moving UI into a subprocess

Hi Tristan,

I've replied inline.

On 23/05/2019 08:44, Tristan Van Berkom via buildstream-list wrote:
Hi,

So this is just a tentative theory at this point; from what I
understand we have not yet explored how to handle the problems outlined
in "The Interactive Shell" below in your mail.

We should spend some time in discovery, brutally hacking BuildStream in
a side branch to set up a similar scenario, and come up with a solution
to how we intend to handle this, before committing to a design decision
and starting to really implement this.

Agreed. I wanted to be sure that the direction I'd be exploring in at least seems sane first, though.

<snip>

Please do not call this "the Stream running in a subprocess", reading
through your text I think you do understand that in this model, the
Stream would be spawning processes to deal with the entire function
bodies of its main entry points, but there is a risk this will be lost
in translation.

It is the process where the build graph state is managed and scheduling
is done (and potentially also the loading), call it the "scheduling"
process if you like.
Yes, that is what I meant to articulate. Apologies if the nomenclature I used was ambiguous.

# The process model

The front-end would remain in the parent process. Each of the main
`Stream` entry points would be spawned into a separate process. This
subprocess would call `setsid` to ensure that signals are received and
handled only by the front-end process. I'll add some more detail on
this below.
Right, the gist of this is that basically:

   * Stream remains the main frontend facing API for doing tasks
   * Callbacks about what is happening continue to be marshalled through
     Stream, as the single point of contact for the frontend
   * The frontend does not have to care about processes at all, it only
     ever talks to Stream in its own process
   * Stream abstracts the entire core away from the frontend, and provides
     the frontend with a simple API for "doing things" and issuing
     callbacks while those things are getting done.

This is with the exception of Context which also serves as the
frontend's entry point for receiving logging message events.
Yes, that nicely summarises it.
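
For what it's worth, here is a rough sketch of the sort of thing I have
in mind for spawning the entry point bodies. The helper names
`_run_in_subprocess` and `_build_body` are purely illustrative, and all
of the message/signal plumbing is omitted:

    import os
    import multiprocessing

    def _subprocess_main(func, result_queue, *args):
        # Detach from the controlling terminal's process group so that
        # terminal-generated signals (SIGINT etc.) are only delivered
        # to the front-end process.
        os.setsid()
        result_queue.put(func(*args))

    class Stream:

        def build(self, targets):
            # Each main entry point spawns its body into the scheduling
            # process; the front-end stays in the parent.
            return self._run_in_subprocess(self._build_body, targets)

        def _run_in_subprocess(self, func, *args):
            result_queue = multiprocessing.Queue()
            process = multiprocessing.Process(
                target=_subprocess_main, args=(func, result_queue) + args)
            process.start()
            result = result_queue.get()
            process.join()
            return result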

# Message handling

The message handler in `Context` will send messages over a queue to the
front-end to handle. Currently, the front-end uses global state to gain
some of the information it renders. After the split, all state needed
by the front-end will be passed as explicit parameters of each `Message`
object (with the exception of any BuildStream configuration, which
will still be loaded and available before the `Stream` process is
spawned and therefore available to the front-end without any changes
needed).
Right, good observation about configuration (this brings to mind that
we still have an ugly situation of artifact caches and external
entities parsing user configuration outside of Context(), which should
really be fixed... but probably doesn't interfere with this process
model refactor).

Also, please do not call this the `Stream` process, this can be
mistranslated into "The frontend forks a process in which the stream
runs", which is not the aim :)
Okay, in the future I'll talk about "the scheduling process" as you described it above :)
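
Concretely, the kind of thing I'm picturing for the message queue is
roughly the following. The fields on `Message` here are illustrative
only; the point is that everything the front-end renders travels with
the message itself:

    # message_queue would be a multiprocessing.Queue shared between
    # the front-end and the scheduling process.

    class Message:
        def __init__(self, element_name, action, text, task_id=None):
            # Everything the front-end needs to render is carried
            # explicitly rather than looked up from global state.
            self.element_name = element_name
            self.action = action
            self.text = text
            self.task_id = task_id

    def message_handler(message_queue, message):
        # Installed as the Context message handler in the scheduling
        # process; it simply forwards messages to the front-end.
        message_queue.put(message)

    def frontend_render_loop(message_queue):
        # Runs in the front-end (parent) process.
        while True:
            message = message_queue.get()
            if message is None:  # sentinel: scheduling process is done
                break
            print("[{}] {}: {}".format(
                message.action, message.element_name, message.text))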

# Signal handling

All signals will be caught by the front-end process. The front-end will
be responsible for terminating/suspending/resuming the `Stream` process
as appropriate. Although I'm still a bit hazy on the implementation
details here, I imagine this will work in a very similar way to how we
currently interact with scheduler jobs and will reuse much of the same
code.
It will not be similar, this is what I was trying to explain in our IRC
conversation, at this index:

   https://irclogs.baserock.org/buildstream/%23buildstream.2019-05-20.log.html#t2019-05-20T10:52:22

The nature of the scheduler process is that it runs an event loop,
waking up for very short intervals to process incoming job completion
events, updating state in the build graph, dispatching more jobs, and
going back to sleep as soon as possible.

The nature of job processes is that we run long standing, blocking
workloads, we cannot run an event loop there because we're running a
job.

Essentially this means we can have a much simpler interaction between
the "frontend <--> scheduler" than what we have to do have with the
"scheduler <--> job".

I think I didn't do a good job of explaining myself here. I'll try again.

When I said it will work very similarly to how it does now, I meant that little to no change would be required in the communication between the scheduling process and any running jobs when a signal is received. Signals would be received by the front-end, which would then be responsible for telling the scheduling process what to do (quit, terminate, suspend, etc.) in response to the signal. The front-end won't know or care about what's happening with jobs and will rely on the stream (and indirectly the scheduler) to do the correct things.

I think this more or less corresponds to what you've said here?

The scheduler process itself does not need to handle signals *at all*
anymore and really shouldn't, it should instead just receive commands
sent to it from the frontend process via its IPC Queue.

The relationship between the "scheduler <--> job" will hopefully be a
bit simplified, since the scheduler process itself no longer needs to
handle signals, it can run with SIGINT/SIGTSTP/SIGTERM blocked for its
entire lifetime, and only send SIGTSTP/SIGCONT/SIGTERM/SIGKILL to child
jobs at the appropriate times.
This makes sense.
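
Just to check my understanding of that division of responsibility, I'm
imagining something along these lines (the command strings are
illustrative only, not an actual API proposal):

    import signal

    # command_queue would be e.g. a multiprocessing.Queue shared with
    # the scheduling process.

    def scheduling_process_main(command_queue):
        # The scheduling process never handles terminal signals itself;
        # it blocks them for its entire lifetime and only acts on
        # commands sent by the front-end.
        signal.pthread_sigmask(
            signal.SIG_BLOCK,
            {signal.SIGINT, signal.SIGTSTP, signal.SIGTERM})
        while True:
            command = command_queue.get()
            if command == "suspend":
                pass  # send SIGTSTP to running job processes
            elif command == "resume":
                pass  # send SIGCONT to running job processes
            elif command == "terminate":
                break  # terminate jobs and shut down cleanly

    def install_frontend_handlers(command_queue):
        # Only the front-end installs signal handlers; it decides what
        # each signal should mean and issues the corresponding order.
        signal.signal(signal.SIGINT,
                      lambda *_: command_queue.put("terminate"))
        signal.signal(signal.SIGTSTP,
                      lambda *_: command_queue.put("suspend"))
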
# Interactive UI prompts

The front-end will provide a service which can be called to display
interactive prompts. I think this would work something like:
[... a lot of snip ...]

Please no, I hope nothing of the sort is needed.

There should be no sync calls, or "questions asked" to the frontend
from the scheduler process.

Perhaps, at the very most, the scheduler might stop queuing new jobs for
a moment (depending on the "on-error" configuration) in the case of a
failed build, and await further instructions from the frontend on what
to do next.

In general, the current model need not change much here:

* The frontend receives messages over Context and Stream

   * It may be that the frontend is receiving some job start/stop via the
     scheduler, but keep in mind that this was only because we didn't
     finish the work of sealing away the scheduler behind the Stream
     process.

     Start/Stop messages should be callbacks on the Stream
     object, not callbacks invoked via the scheduler.

* The frontend observes that a job has failed and that it is time to
   debug a failed build or ask the user what to do.

* The frontend, of its own accord, explicitly informs the scheduler to
   go to sleep, depending on the configuration of what
   to do "on-error".

* Depending on what the user informs the frontend, the frontend will
   then go on to instruct the scheduler what to do, because the
   frontend is in control at all times.

The distinction here is that the frontend is not "asked what to do",
the frontend barks out orders via the Stream() APIs and the scheduler
process obeys them.
It looks like I was overcomplicating things somewhat here.

Have I understood correctly that the only difference from how we're doing things now would be that the callbacks to the front-end would arrive via an IPC queue, rather than a plain method call as they are now?
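
If so, then I imagine the failure flow in the front-end looking roughly
like this, assuming Stream grows methods along the lines of suspend(),
resume(), retry_job() and terminate() (the names are just placeholders):

    def on_job_failed(stream, element_name, on_error):
        # Callback invoked in the front-end when it observes a failed job.
        if on_error == "continue":
            return  # nothing to do, the scheduler keeps going

        # The front-end, of its own accord, tells the scheduler to go
        # quiet while the user is consulted.
        stream.suspend()

        choice = input("Build of {} failed. [c]ontinue/[r]etry/[q]uit? "
                       .format(element_name))
        if choice == "r":
            stream.retry_job(element_name)
            stream.resume()
        elif choice == "c":
            stream.resume()
        else:
            stream.terminate()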

# The Interactive shell

As pointed out by Tristan, calling `setsid` in the `Stream` process means
the `Stream` is not in the process group which owns the terminal. As a
result, it wouldn't be easy (or possible?) to have a shell created in
the `Stream` process take over the terminal.

One approach which I think would circumvent this limitation would be
for the front-end to provide an API along the lines of
`execute_in_foreground`, which would do the specified work in the
front-end process, taking over the terminal while it is running. This
would:
* pause the displaying of messages, allowing any new messages to queue
   up
* execute the given work in the front-end process
* pass any return value back to the calling process
* continue printing queued messages
I want to ensure that the correct terminology is used so that people
get the correct picture of who is in charge of what:

  * The frontend does not implement services for the sake of the
    scheduler process.

    Instead: It receives callbacks that things occurred over the
             course of calling one of the main Stream() functions.

  * The scheduler process does not make any presumptions that the
    frontend will react in any specific way to the callbacks it
    issues.

    Instead: The behavior of the scheduler is clearly explained
             to the frontend in the Stream() API contract, and the
             scheduler behaves as advertised.

Bottom line: the frontend is always boss, the scheduler process does
what it is told and doesn't ask questions.
Okay.


Besides this, I think that any "passing terminal control between
processes" ideas need to be proven possible before we can consider this
approach to be plausible.

Another approach would be to simply special case the `Stream.shell`
method, so that it does not run in a subprocess. The shell only
displays loading and staging messages and as so far as I can see will
never be rendering a large number of messages. While I'm not really a
fan of special casing one of the `Stream` methods, this would have the
advantage of reducing the complexity of the implementation.
Right, this is the simple but impossible approach.

I'm not sure if I'm misunderstanding your point or if I didn't adequately articulate mine.

I'm struggling to see why this is impossible. If the scheduling process is never forked, then the shell will be launched in the main process and so will work almost exactly as it currently does, ignoring any changes to the details of message propagation from the rest of the proposal.
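
To make what I mean concrete, the special case would be something like
this (reusing the hypothetical `_run_in_subprocess` helper from the
sketch earlier in this mail):

    class Stream:

        def build(self, targets):
            # Runs in the scheduling process, as per the rest of the
            # proposal.
            return self._run_in_subprocess(self._build_body, targets)

        def shell(self, element_name, *args):
            # Special case: run directly in the front-end process, so
            # the shell can take over the terminal as it does today.
            return self._shell_body(element_name, *args)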



The frontend process cannot shell into the elements because in this
design (so far), there is no need to coordinate state which has been
resolved in the scheduler process back to any build graph in the
frontend process, in fact, Element instances live entirely inside the
scheduling process and never need to be known by the frontend.

This simplicity is one of the reasons that this approach is so
attractive; it is also the reason why the main process cannot shell
into anything, because the frontend doesn't know the cache keys which
have been resolved during the session, and it may not even know what an
Element is (we're missing data to launch a shell).

Maybe:

* We need to keep the initial loading process before the initial fork()

* At the time of an interactive shell (or at any time the frontend may
   ever *need* to know state for any other reason), we can have the
   frontend download the state of an element recursively from the
   scheduling process.

I think this might be the middle ground which allows us to move
forward.
This also seems like an avenue worth exploring, especially if the above is indeed impossible.
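
To sketch what that might look like, the scheduling process could
answer state requests over a simple request/response channel, something
like the following (the attribute names on Element are only
illustrative, and 'connection' would be one end of a
multiprocessing.Pipe()):

    def serve_element_state(connection, elements):
        # Runs in the scheduling process; answers requests for the
        # resolved state of an element and its dependencies.
        while True:
            name = connection.recv()
            if name is None:
                break
            element = elements[name]
            connection.send({
                "name": name,
                "cache_key": element.cache_key,
                "dependencies": [dep.name for dep in element.dependencies],
            })

    def frontend_fetch_element_state(connection, element_name):
        # Called in the front-end when it needs state, e.g. to shell
        # into a failed element.
        connection.send(element_name)
        return connection.recv()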

What do people think?

I think we need to resolve this last point before settling on a design;
the opposite design of synchronizing state into the frontend process
via state change messaging is not entirely horrible either (but I admit
that this approach is rather growing on me).

I agree that we need to explore this some more before coming to a final decision. As I said above, I wanted to make sure I had a reasonable grasp on what changes to make and why prior to starting any major experimentation. You've been very helpful with that, so thanks!


Cheers,

Phil


--
Phil Dawson, Software Engineer                          Codethink Ltd
https://www.codethink.co.uk/
We respect your privacy. See https://www.codethink.co.uk/privacy.html


