Re: [BuildStream] Allowing duplicate junctions [Was: Be explicit when overriding junction configuration, or else warn/error]

From: Tristan Van Berkom <tristan vanberkom codethink co uk>
To: Jürg Billeter <j bitron ch>, Chandan Singh <chandan chandansingh net>
Cc: buildstream-list gnome org
Subject: Re: [BuildStream] Allowing duplicate junctions [Was: Be explicit when overriding junction configuration, or else warn/error]
Date: Tue, 12 May 2020 20:39:10 +0900

Hi Jürg,

While replying, I've noticed some commonality in my replies to your
various ideas, so I'll prefix with that.

Essentially, I think that collecting the use cases was a good
exercise for determining what might occur in pipelines, but I would
really prefer not to design specifically for the use cases we've
identified; I think doing so runs the risk of dictating too strongly
what can and should be done with BuildStream.

More inline...

On Tue, 2020-05-12 at 11:47 +0200, Jürg Billeter wrote:

Hi Tristan,

On Fri, 2020-05-08 at 16:50 +0900, Tristan Van Berkom wrote:

[...]

    Cross architecture bootstrapping
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    When bootstrapping a runtime for a different architecture, it can
    be interesting to use the same toolchain project configured
    multiple times with different project options defining which host
    and target architectures to build libc/gcc under.

    When combining this ability with remote execution, we can
    streamline the process of bootstrapping a system under any
    architecture which we have runners for on the RE cluster.


A possible solution for this use case is to extend the key used for
conflict detection a bit. Instead of only using the project name as
key, we could include the values of (selected) project options as well.
E.g., the target architecture option would be sensible to include, in
my opinion.
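Here is a minimal sketch of this proposal. It is not BuildStream's loader code. The conflict key is extended from the bare project name to the name plus the values of a chosen set of options. `ProjectInstance`, `conflict_key`, `find_conflicts` and the `target_arch` option are hypothetical names used only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class ProjectInstance:
    """Hypothetical record for one loaded subproject instance."""
    name: str       # project name
    junction: str   # junction that instantiated it (illustrative)
    options: Dict[str, str] = field(default_factory=dict)


def conflict_key(inst: ProjectInstance, selected: Iterable[str]) -> Tuple:
    # Project name plus the values of the selected options, in stable order.
    return (inst.name,) + tuple(inst.options.get(o) for o in sorted(selected))


def find_conflicts(instances, selected=()):
    """Group instances by conflict key; keys shared by several are conflicts."""
    groups: Dict[Tuple, List[ProjectInstance]] = defaultdict(list)
    for inst in instances:
        groups[conflict_key(inst, selected)].append(inst)
    return {key: insts for key, insts in groups.items() if len(insts) > 1}


x86 = ProjectInstance("toolchain", "cross/x86_64.bst", {"target_arch": "x86_64"})
arm = ProjectInstance("toolchain", "cross/aarch64.bst", {"target_arch": "aarch64"})
print(len(find_conflicts([x86, arm])))                   # name only: 1
print(len(find_conflicts([x86, arm], ["target_arch"])))  # with option: 0
```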


I think, for one thing, we don't know what a target architecture
option is; we only know that an option is an "architecture" option.
It could be translated into a host/sandbox requirement, used in
compiler build instructions to define a target architecture, or used
for something completely orthogonal to the host requirement or to the
architecture on which compiled code is expected to run: I don't think
we have the right to know.

Aside from that, I think it's going to be undesirable to stray into
the territory of comparing junction/project configurations; there is
no reason why I should be allowed to configure the same project with
different project options but not with different source
configurations (different versions).

Note that we only currently distinguish:

  * This project was instantiated once (possibly the same instance was
    used multiple times by way of "overrides" or by way of the junction
    "target" feature).

  * This project was instantiated multiple times

We don't recognize equality of projects which are explicitly
instantiated multiple times; they might accidentally produce exactly
the same cache keys, but it is no less of an error if they do.

It would be good to preserve this simplicity I think.
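The distinction above can be sketched as a check on object identity rather than on configuration. This assumes a hypothetical loader in which every junction that instantiates a subproject creates one `Instance` object, while overrides and the junction "target" feature hand out a reference to an existing object. The class and function names are illustrative, not BuildStream API.

```python
from collections import defaultdict


class Instance:
    """Hypothetical: one loaded copy of a subproject, created by one junction."""

    def __init__(self, name, junction, config):
        self.name = name          # project name
        self.junction = junction  # junction element that instantiated it
        self.config = config      # options, source ref, etc.


def duplicated_projects(instances):
    """Return names of projects instantiated more than once.

    Instances are compared by identity, never by configuration: two
    junctions with identical configuration are still two instances.
    """
    seen = defaultdict(set)
    for inst in instances:
        seen[inst.name].add(id(inst))
    return sorted(name for name, ids in seen.items() if len(ids) > 1)


d = Instance("d", "b.bst:d.bst", {"ref": "v1"})
# (c) reuses the same instance of (d) via an override: a single instance.
print(duplicated_projects([d, d]))   # []
d2 = Instance("d", "c.bst:d.bst", {"ref": "v1"})
# Identical configuration, but a second instantiation: still a conflict.
print(duplicated_projects([d, d2]))  # ['d']
```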

    Auxiliary projects which provide static build-only dependencies
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    When one project depends on another project for some static data
    which will be consumed as build-only dependencies, the data
    from the junctioned project is consumed statically as is, and there
    is no concern of runtime dependencies being propagated forward to
    reverse dependency projects which might consume the same junction.


This also includes statically linked libraries where no runtime data is
required (or runtime data is in a private prefix).

"Isolated junctions" seems like a sensible solution for this use case.


Right, I specifically like this because it abstracts away some of the
problem from any reverse dependency projects, and it is not tied to any
specific use case; it only states:

  "I use this subproject internally to produce data which I consume
   verbatim, and it is an error if reverse dependency projects end up
   with runtime dependencies from this internal subproject"
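Here is a minimal sketch of what such an "isolated junction" check might verify. It assumes a simplified element graph with separate runtime and build-only dependency lists. The `Element` dataclass and `check_isolated` are hypothetical, not BuildStream's element API. The check walks only the runtime dependencies of the elements a project exposes and reports any that belong to an isolated subproject.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Element:
    """Hypothetical, simplified element: name, owning project, dependencies."""
    name: str
    project: str
    runtime_deps: List["Element"] = field(default_factory=list)
    build_deps: List["Element"] = field(default_factory=list)


def runtime_closure(element: Element) -> List[Element]:
    """Every element reachable from `element` through runtime deps only."""
    seen = {}
    stack = list(element.runtime_deps)
    while stack:
        dep = stack.pop()
        if id(dep) in seen:
            continue
        seen[id(dep)] = dep
        stack.extend(dep.runtime_deps)
    return list(seen.values())


def check_isolated(exposed: List[Element], isolated: Set[str]) -> List[Tuple[str, str]]:
    """Report (element, dependency) pairs where a reverse dependency of
    `element` would inherit a runtime dependency on an isolated project."""
    leaks = []
    for el in exposed:
        for dep in runtime_closure(el):
            if dep.project in isolated:
                leaks.append((el.name, dep.name))
    return leaks


compiler = Element("compiler.bst", "toolchain")
# Build-only use of the isolated toolchain: nothing leaks.
static_lib = Element("libfoo-static.bst", "app", build_deps=[compiler])
print(check_isolated([static_lib], {"toolchain"}))  # []
# A runtime dependency on the toolchain would leak to reverse dependencies.
app = Element("app.bst", "app", runtime_deps=[compiler])
print(check_isolated([app], {"toolchain"}))  # [('app.bst', 'compiler.bst')]
```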

    Separation of tooling and data
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    [...]


I think the ideal solution here would be for BuildStream core to know
when multiple dependency trees are configured to be staged in separate
prefixes/sysroots. BuildStream could then report a conflict if and only
if elements from different versions of the same project are used in the
same prefix/sysroot.

This would probably not be a simple change, though, so we don't want to
block the MR on this, but we could pick it up in the future. Another
question is whether we can perform such dependency-level checks early
on without a significant performance penalty.
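Roughly, the per-sysroot check could look like this sketch. It assumes the core already knows the staging location, project name and project version of every staged element. How that information would be obtained is exactly the open question here. The tuple layout and `sysroot_conflicts` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def sysroot_conflicts(
    staged: Iterable[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], List[str]]:
    """`staged` yields (location, project, version) for each staged element.

    A conflict is reported only when one location would receive elements
    from more than one version of the same project.
    """
    versions = defaultdict(set)
    for location, project, version in staged:
        versions[(location, project)].add(version)
    return {key: sorted(v) for key, v in versions.items() if len(v) > 1}


staged = [
    ("/", "base", "2.0"),         # tooling staged at the root
    ("/sysroot", "base", "1.0"),  # data staged into a separate sysroot
]
print(sysroot_conflicts(staged))  # {}
staged.append(("/sysroot", "base", "2.0"))
print(sysroot_conflicts(staged))  # {('/sysroot', 'base'): ['1.0', '2.0']}
```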


This is an interesting approach, but it does significantly increase the
cognitive effort involved in understanding the dependency and pipeline
structure - i.e. dependencies would effectively be painted with
"purpose", causing the entire graph to be more complex to visualize.

Moreover, the approach has a limiting aspect, in that it assumes:

  "Dependency graphs are staged into dedicated sysroots"

While this is mostly true of the projects we've seen so far, I'm not
sure I'm comfortable forcing this limitation on users' data sets (a
user might stage a bunch of different isolated artifacts into
different locations and do whatever they like with them using their
own custom plugins; I think this freedom has more value than stricter
guidelines would provide).

Deferring the check to assemble/staging time would likely be a lot
simpler and less of a performance risk; however, erroring out that
late is not ideal.

The dependency-based approach might work for the static build-only
dependency case as well, even without the need to explicitly mark
junctions as isolated.


Looking at this more and more, I wonder if we could find a way for
users to define their own assertions at the project level, at a
(buy-in) cost of load time overhead.

Given that only the project authors have a legitimate claim over the
meaning of their data and how it is intended to be used, maybe there
is a way to provide them with tooling to write their own assertions.

This is probably not 2.0 material, but something to consider, maybe.

[...]

  Looking at this email so far, I'm tempted to think that we might have
  both of these approaches (declaring a junction as 'isolated' can
  allow hiding a local junction and be more convenient, but failing
  this we can still whitelist junctions in reverse dependencies).


I suspect we may need more time to get to the best solution for each
use case. We don't want to block !1901 on that, though. Could it make
sense to change the error to a fatal warning to unblock !1901,
deferring more advanced configuration/checks?

I.e., allow projects to make the conflict warning/error non-fatal in
project.conf until we've implemented a better solution for each use
case? To be clear, configuring the conflict warning as non-fatal would
cause the relevant junctions/subprojects to be independent (no
automatic coalescing, unlike master). And if there are real file
conflicts, projects would still get overlap warnings/errors.
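The proposed decision logic can be sketched as follows. The sketch assumes the check runs in the project where the duplicate is detected, and that the warning is fatal unless that project opts out in project.conf. The `report_duplicates` function, `DuplicateProjectError`, and the boolean `fatal` switch are hypothetical. The real configuration key and warning name would be whatever !1901 ends up defining.

```python
class DuplicateProjectError(Exception):
    """Hypothetical load error raised when a duplicate is fatal."""


def report_duplicates(project, duplicates, fatal=True):
    """Check duplicates detected while loading `project`.

    `duplicates` holds names of subprojects instantiated more than once.
    If `fatal` is True (the proposed default), the first duplicate aborts
    the load; otherwise each one is returned as a warning message, and the
    caller is expected to keep the instances independent (no coalescing).
    """
    warnings = []
    for name in sorted(duplicates):
        message = f"{project}: project '{name}' is instantiated more than once"
        if fatal:
            raise DuplicateProjectError(message)
        warnings.append(message)
    return warnings


# Project (a) opted out: it gets a warning and two independent copies of (d).
for line in report_duplicates("a", {"d"}, fatal=False):
    print("WARNING:", line)

# Default behaviour: loading fails.
try:
    report_duplicates("a", {"d"})
except DuplicateProjectError as err:
    print("ERROR:", err)
```

The single `fatal` switch applies to every duplicated subproject at once.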


I like this approach for the short term; let's do a small thought
experiment to see what it would look like.


Would-be diamond example
~~~~~~~~~~~~~~~~~~~~~~~~

          a
         / \
        b   c
       /     \
     d(1)    d(2)

In the simple "would-be diamond" example, the warning must occur in
project (a); as such, it is project (a) which decides whether it is a
fatal error for (d) to exist twice.

Project (a) then has the choice to live with this error, or to turn
the graph into a diamond by using override features (and possibly
also the "target" feature).

Here I can already see a flaw, though: without project (a) having the
right to specifically whitelist project (d) for duplication at those
two specific junction points, it must make the same fatal/non-fatal
decision for all subprojects, period.
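To make the flaw concrete, here is a sketch comparing a single project-wide switch with a whitelist keyed on junction paths. The `b.bst:d.bst` path notation, the second duplicated project (e), and the `fatal_duplicates` helper are hypothetical illustrations and do not reflect an existing BuildStream option.

```python
from typing import Dict, FrozenSet, List


def fatal_duplicates(duplicates: Dict[str, List[str]],
                     allowed: Dict[str, FrozenSet[str]]) -> List[str]:
    """`duplicates` maps a project name to the junction paths that
    instantiate it; `allowed` maps a project name to the junction paths
    the top-level project explicitly accepts as separate instances.
    Returns the project names whose duplication remains fatal."""
    fatal = []
    for name, paths in sorted(duplicates.items()):
        if not set(paths) <= allowed.get(name, frozenset()):
            fatal.append(name)
    return fatal


dups = {
    "d": ["b.bst:d.bst", "c.bst:d.bst"],
    "e": ["b.bst:e.bst", "c.bst:e.bst"],  # hypothetical second duplicate
}
# Project (a) whitelists (d) at exactly these two junction points.
allowed = {"d": frozenset({"b.bst:d.bst", "c.bst:d.bst"})}
print(fatal_duplicates(dups, allowed))  # ['e']
```

With only a single boolean, project (a) could accept both duplicates or neither.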


My further examples were going to highlight what happens when you add
projects which depend on (a) and also indirectly depend on (d), and
try to work out which project makes which decision about the error
being fatal (i.e. "in which project does the fatal error occur?"), but
I think that the inability to whitelist is already a show stopper.


I realize this long email is not conclusive in any way, perhaps
bringing up more questions than answers, but I think we're making
important progress.

Cheers,
    -Tristan

