Re: Proposal: Run configure-commands on-demand for workspaced elements



Thanks for the feedback.

I agree that storing count of builds would offer us more flexibility and I
think BuildElement would still be able to decide whether or not to run the
configure-commands need to be run based on that information. As you say, build
counts might also be used in other ways by elements if needed. So, I'm
onboard with that idea. Instead of count, we can also simplify it to a
boolean - "has this been built in the same sandbox before". I'm not sure yet
which of these would be better.

Maybe if we communicated to the element "the amount of times it has
been built in the same sandbox" (and only *maybe* is that sandbox a
workspace... we dont really need to tell the Element api about
workspaces), then elements such as BuildElements can use that
information to deduce that setup operations when prepare the state of
their Sources need not be run a second time.

Amount of times it has been built in the same sandbox would work but we have
to be careful about how exactly we phrase it otherwise it can easily be taken
to mean something else. Regarding the name of the option, how do you feel about
something like `num-sandbox-builds` or `has-sandbox-builds` instead of
`needs-configure`?

BUT, I would say that both of these should be discarded for the initial
proposal, this is already possible by closing and reopening the
workspace on an existing directory - it is desirable to keep the change
as small and atomic as possible, and consider other UX improvements
separately.

Agreed, we can land the feature first and improve the UX separately.

For the concern of backwards compatibility, one need only consider the
absence of the attribute in the same way as a build count of 0, simple.

If I understand this change correctly, I'm afraid this might involve slightly
more disruption than that. The reason I say that is because the structure of
`.bst/workspaces.yml` might not be backward-compatible after this proposed
change.

Currently, the structure looks like this:

     path/to/my/element.bst:
       0: /path/to/workspace

As you can notice, each element basically has a dict that maps indices to their
paths which means that to introduce anything new here, this schema will have to
change. I don't think it's easy to extend the current schema to support other
options as it seems to couple elements very tightly with their workspaces. I
was thinking of putting all the sources in a `sources` section, something like:

     path/to/my/element.bst:
       sources:
         0: /path/to/workspace

This would mean that if you have old-style workspaces.yml file, it might not
parse correctly with the new-style syntax. It should still be possible to
support the old-format and convert them to the new style while writing. But
I was not sure if it would be a good idea to keep such code for a large amount
of time as it might become quite complicated very soon if keep on supporting
old formats.

Let me know what you think about this.

Cheers,
Chandan

On Tue, Nov 14, 2017 at 1:42 PM Tristan Van Berkom
<tristan vanberkom codethink co uk> wrote:

Thanks for the write up Chandan !

On Mon, 2017-11-13 at 19:18 +0000, Chandan Singh wrote:
Overview
========

While I was working on adding support for incremental builds [1], we realized
that incremental builds might not actually work for several projects if the
`configure-commands` of the build elements run each time [2]. This is because
running `configure-commands` might re-generate files such as `config.h` with
new timestamps, which will cause everything else to rebuild as well. Thereby,
eliminating any benefit that incremental builds might have provided.

Proposed Solution
=================

One way we can work around this problem is by not running configure-commands
by default for workspaced elements after the first time they are built. If
user wishes to run the configure-commands again, that can be done on-demand.
To add this functionality, we will need to store some state about whether the
configure-commands have been run yet.

Each time user tries to build an element that has any workspaces open,
BuildStream would use the local state to figure out if the configure-commands
have been run in past for this element. If yes, then the configure-commands
are excluded from the build pipeline. Otherwise, configure-commands are run
as usual but at the end of the build, we store the fact that these commands
have been run in local state.

Changes to .bst/workspaces.yml
-------------------------------

Since we already have a file that stores information about currently open
workspaces - `.bst/workspaces.yml` - it would make sense to use the same file
to the state of configure-commands being run. The current structure of the file
looks something like this:

    path/to/my/element.bst:
      0: /path/to/workspace

I was thinking to add a new `needs-configure` option to this file. With that
option, the config file will look something like this:

    path/to/my/element.bst:
      sources:
        0: /path/to/workspace
      needs-configure: false

Anytime a new workspace is opened for an element, we can set needs-configure
option for that element to true and each time a build finishes successfully,
we mark needs-configure to false for that element. If an element does not has
any workspaces open, then needs-configure is always assumed to be true and
configure-commands are run each time.

We have two options about where to store the information about
configure-commands being run - we can either store that information as a
property of the element or as a property of the workspace. In either case,
the structure of this file would change and whenever the new version of
BuildStream (with these changes) rolls out, it will break people's existing
workspaces, if they have any.

I was wondering what should be our strategy about this kind of change? Should
we have a temporary workaround in place that can convert old format of the file
to the new format, should we just print a helpful error message and expect
users to fix it themselves, or is something that's off-limits and we should
think of other ways of storing this information without breaking current
format?

I think maybe a build count might be more abstract and cover the same
effect, i.e. initially opening a workspace has a build count of 0 (who
knows, one day we might want to consider this metadata for other
reasons than configure commands, better to implement some recorded
workspace state that is abstract from it's initial purpose).

For the concern of backwards compatibility, one need only consider the
absence of the attribute in the same way as a build count of 0, simple.


Changes to bst command(s)
-------------------------

Since the configure-commands will not run by default for workspaced elements,
it will make sense to provide some way of allowing users to re-run those
commands. At present, I have two options in mind, both of which have their own
pros and cons:

Option 1 - `bst build --reconfigure`

This option would either throw an error or do nothing (another thing to decide)
for elements that do not have any workspaces open. For elements that do have
a workspace open, it is going to set the needs-configure option to true. Since
the re-running of configure-commands can be treated as part of the build, it
might be more intuitive for users.

Option 2 - `bst workspace reset --soft`

With this approach, the user would have to soft reset the workspace as a
preparatory step, which would set the needs-configure option in the
`.bst/workspaces.yml` and the subsequent build will run the configure-commands
again.

I'm personally leaning towards option 1 because it also has the advantage that
it is one-step process in case someone wants to re-run the configure-commands
but I'm very interested to hear how others feel about this.

I lean towards option 2 because it stays confined to workspace commands
and doesnt creep in too much in the overall api surface.

BUT, I would say that both of these should be discarded for the initial
proposal, this is already possible by closing and reopening the
workspace on an existing directory - it is desirable to keep the change
as small and atomic as possible, and consider other UX improvements
separately.


Finally, the most important part of this proposal is how we communicate
this state to Elements. Remember that BuildElement is not a special
core thing, it's there for convenience, this is the most sensitive part
of the change we need to be considering.

From what I read in the proposal so far:

  "we can set needs-configure option for that element to true and each
   time a build finishes successfully, we mark needs-configure to false
   for that element"

Well, maybe it's this, again "configure commands" is a thing that is
specific to BuildElement, not abstract enough in my opinion.

Maybe if we communicated to the element "the amount of times it has
been built in the same sandbox" (and only *maybe* is that sandbox a
workspace... we dont really need to tell the Element api about
workspaces), then elements such as BuildElements can use that
information to deduce that setup operations when prepare the state of
their Sources need not be run a second time.

In the case of a BuildElement, it would avoid running the configure
commands redundantly.


Cheers,
    -Tristan



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]