Re: [BuildStream] Proposal: Moving UI into a subprocess



Hi Phil,

On Thu, 2019-05-23 at 18:16 +0100, Phil Dawson wrote:
Hi Tristan,

I've replied inline.

[...]
# Signal handling

All signals will be caught by the front-end process. The front-end will
be responsible for terminating/suspending/resuming the `Stream` process
as appropriate. Although I'm still a bit hazy on the implementation
details here, I imagine this will work in a very similar way to how we
currently interact with scheduler jobs and will reuse much of the same
code.

It will not be similar, this is what I was trying to explain in our IRC
conversation, at this index:

   https://irclogs.baserock.org/buildstream/%23buildstream.2019-05-20.log.html#t2019-05-20T10:52:22

The nature of the scheduler process is that it runs an event loop,
waking up for very short intervals to process incoming job completion
events, updating state in the build graph, dispatching more jobs, and
goes to sleep as soon as possible.

The nature of job processes is that we run long standing, blocking
workloads, we cannot run an event loop there because we're running a
job.

Essentially this means we can have a much simpler interaction between
the "frontend <--> scheduler" than what we have between the
"scheduler <--> job".

I think I didn't do a good job of explaining myself here. I'll try again.

When I said it will work very similarly to how it does now, I meant that 
little to no changes would be required in the communication between the 
scheduling process and any running jobs when a signal is received. 

Indeed, that's not what I understood but it sounds about right.

[...]
# Interactive UI prompts

The front-end will provide a service which can be called to display
interactive prompts. I think this would work something like:

[... a lot of snip ...]

Please no, I hope nothing of the sort is needed.

There should be no sync calls, or "questions asked", to the frontend
from the scheduler process.

Perhaps at the very most, the scheduler might stop queuing new jobs for
a moment (depending on the "on-error" configuration) in the case of a
failed build, and await further instructions from the frontend on what
to do next.

In general, the current model need not change much here:

* The frontend receives messages over Context and Stream

   * It may be the frontend is receiving some job start/stop via the
     scheduler, but keep in mind that this was only because we didn't
     finish the work of sealing away the scheduler behind the Stream
     process.

     Start/Stop messages should be callbacks on the Stream
     object, not callbacks invoked via the scheduler.

* The frontend observes that a job has failed and that it is time to
   debug a failed build or ask the user what to do.

* The frontend, of its own accord, explicitly informs the scheduler to
   go to sleep, depending on the configuration of what
   to do "on-error".

* Depending on what the user informs the frontend, the frontend will
   then go on to instruct the scheduler what to do, because the
   frontend is in control at all times.

The distinction here is that the frontend is not "asked what to do",
the frontend barks out orders via the Stream() APIs and the scheduler
process obeys them.

It looks like I was overcomplicating things somewhat here.

Have I understood correctly that the only difference from how we're 
doing things now would be that the callbacks to the front-end would 
arrive via an IPC queue rather than a plain method call, as they do 
now?

Approximately yes.

Just to iron out some more specifics about this:

 * The frontend would finally no longer know about the scheduler at
   all, which has been desirable for a long time (the last part of the
   deep core which the frontend unfortunately had knowledge of).

 * The frontend would receive some of the callbacks from stream instead
   of from the scheduler directly

   Note that this part interestingly automatically buys us the Jobs
   abstraction discussed in this email which I already referred to in
   this thread:

   https://mail.gnome.org/archives/buildstream-list/2019-April/msg00063.html

   I.e. the Stream will still need to declare a data structure to hand
   over to the frontend in order to describe an ongoing "job", and the
   frontend app (logger and status bar) will no longer ever know
   anything about a job or scheduler.

   This would automatically be a nice step forward for improving
   logging consistency as described in that thread.

 * The frontend itself (as in the code living in the _frontend/
   directory), doesn't ever deal with the IPC, that is all abstracted
   away by Stream which provides convenient callbacks for the frontend.


In addition to the above, there is the important detail that you
pointed out about "what does the scheduling process do when a job
fails".

Interestingly, in a setup where the frontend and scheduling processes
are separate, there is no implied blocking here.

As I mentioned in the previous reply, probably the scheduler has to
clearly advertise its behavior in this case to the frontend, maybe it
stops queuing more jobs until instructed what to do by the frontend.

This is because we would want the frontend to have a chance to ask the
user what to do before spawning more jobs.

[...]
Besides this, I think that any "passing terminal control between
processes" ideas need to be proven possible before we can consider this
approach to be plausible.

Another approach would be to simply special case the `Stream.shell`
method, so that it does not run in a subprocess. The shell only
displays loading and staging messages and, as far as I can see, will
never be rendering a large number of messages. While I'm not really a
fan of special casing one of the `Stream` methods, this would have the
advantage of reducing the complexity of the implementation.

Right, this is the simple but impossible approach.

I'm not sure if I'm misunderstanding your point or if I didn't 
adequately articulate mine.

I'm struggling to see why this is impossible. If the scheduling process 
is never forked, then the shell will be launched in the main process 
and so will work almost exactly as it currently does, ignoring any 
changes to the details of message propagation from the rest of the 
proposal.

Ok so I may have been overestimating the challenges here, there are a
few approaches which could work now that I think further on it.

First, keep in mind that we are not talking about `bst shell` here, we
are talking about the scenario where:

  * A build fails
  * The user is asked what to do
  * The user decides to launch a shell to debug the failed build

In this case, the frontend naturally needs to know the up-to-date build
graph in order to launch that build shell.

However, in this case it may be possible to obtain only the minimal
information needed in order to launch a shell, which consists mostly
of:

  A) The resolved environment variables which are active for the
     failed element

  B) The element's resolved cache key

  C) A directory where the element is already staged

The thing about (C): as I understand it, this doesn't exist anymore.

We refactored this now that we cache failed builds: we reconstruct
that environment on demand instead of littering the filesystem with
failed build directories.

That said, maybe we could have something special in place for this,
like asking the scheduling process to set up the build directory for
us, and then launching the shell in the frontend?

[...]
Maybe:

* We need to keep the initial loading process before the initial fork()

* At the time of an interactive shell (or at any time the frontend may
   ever *need* to know state for any other reason), we can have the
   frontend download the state of an element recursively from the
   scheduling process.

I think this might be the middle ground which allows us to move
forward.

This also seems like an avenue worth exploring, especially if the above 
is indeed impossible.

Right, this is another way we could satisfy the frontend's need to
launch a shell mid-build. Either downloading all state, or getting the
scheduling process to do the work of preparing a staged build sandbox
on the frontend's behalf before shelling into it (as described above),
seems like a plausible avenue for this.

Cheers,
    -Tristan
