Re: [BuildStream] Allowing duplicate junctions [Was: Be explicit when overriding junction configuration, or else warn/error]



Hi all,

Today I had another idea for an approach to solve this, and briefly
brought it up with Jürg on IRC[0]. The idea is growing on me and I
wanted to share it with the list.

To put it simply: the idea is to allow renaming junctioned projects in
the context of your project, thus avoiding the name collision when
multiple instances of the same project appear in the same pipeline.
This renaming would also be inherited by reverse dependencies.


Format:
~~~~~~~
The way I would propose implementing this would be to add a new
junction configuration for project name aliasing, like so:


  # Example of a junction which does some aliasing
  #
  kind: junction

  sources:
  - kind: git
    url: example.com/foo-project.git

  config:

    # Rename some projects
    project-names:

      # This is a junction to the foo-project, let's rename it because
      # it's only used internally and we don't need to have reverse
      # dependency projects getting errors if they also use foo-project
      # in other ways.
      #
      foo-project: myproject-internal-foo-project

      # We've had an error because of a deeply nested bar-project which
      # two separate subprojects depend on separately.
      #
      # In this case, instead of using overrides to ensure they depend
      # on the same project, we've decided to rename the "bar-project"
      # in this junction so that the same project is loaded twice.
      #
      bar-project: second-bar-project
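
To make the intended semantics a bit more concrete, here is a rough
Python sketch of how the renames might be applied while loading. None
of these names exist in BuildStream; the function names, the exception
and the "innermost junction first" ordering are all assumptions made
for the sake of illustration.

```python
class DuplicateProjectError(Exception):
    """Raised when two project instances end up with the same name."""


def effective_name(declared_name, alias_chain):
    """Apply the 'project-names' mapping of each junction on the path.

    alias_chain is a list of dictionaries, ordered from the junction
    closest to the subproject up to the toplevel project (assumed
    ordering), so that outer projects inherit inner renames.
    """
    name = declared_name
    for aliases in alias_chain:
        name = aliases.get(name, name)
    return name


def register_projects(instances):
    """Map effective project names to junction paths.

    instances is an iterable of (junction_path, declared_name,
    alias_chain) tuples. Any name which shows up twice is an error.
    """
    seen = {}
    for junction_path, declared_name, alias_chain in instances:
        name = effective_name(declared_name, alias_chain)
        if name in seen:
            raise DuplicateProjectError(
                "Project '{}' loaded from both '{}' and '{}'".format(
                    name, seen[name], junction_path
                )
            )
        seen[name] = junction_path
    return seen
```

With this, two instances of "bar-project" only load side by side if one
of the junctions on the way renames it; otherwise we bail out early
with the error, as we would today.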


Advantages:
~~~~~~~~~~~
Some of these listed advantages could of course also be said of some of
the other discussed approaches.

  * It is a simple approach

    It is easy enough to implement and would not come with any
    performance impact, unlike some approaches which require validation.

  * It makes no presumptions about the user's intent

    I think this in itself makes the approach valuable, as this
    won't inadvertently impose limitations on how multiple instances
    of the same project can be used.

  * It gets the basics of the job done.

    This ensures the default behavior of bailing out early with an
    error whenever the same project is used twice, providing a course
    of action to resolve conflicts as they arise (using overrides to
    ensure the same project is shared, or using renames to allow the
    duplication).

  * It also provides the ability of hiding a subproject from reverse
    dependencies.

    I.e. this also covers the use case previously described as:

      "Auxiliary projects which provide static build-only dependencies"

  * Possibility of using the names in the UI.

    Instead of displaying strings such as:

        foo-junction.bst:bar-junction.bst:baz-junction.bst:element.bst

    For deeply nested projects, we could instead display:

        foo-project:element.bst
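
As a sketch of the UI idea in the last point, assuming the loader keeps
a table from full junction paths to the (unique) effective project
names, the display string could be derived like this. The function and
the shape of the table are hypothetical, not existing BuildStream API.

```python
def display_name(junction_path, element, project_names):
    """Return 'project:element' when the junction path has a known name.

    project_names maps a full junction path (a tuple of junction
    element names) to the effective project name. This table is an
    assumption of this sketch.
    """
    name = project_names.get(tuple(junction_path))
    if name is None:
        # No name known for this path, fall back to the full path
        return ":".join(list(junction_path) + [element])
    return "{}:{}".format(name, element)
```

Since project names would be unique within a pipeline, the shorter
string remains unambiguous.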


Caveats:
~~~~~~~~

  * This approach does not add any additional safeguards aside from
    ensuring the user is aware of coexisting projects.

    What is in one way an advantage can also be seen as a caveat in
    this case, because we wouldn't be doing any additional checks on
    whether dependencies cross project boundaries.

  * We give project authors the responsibility of naming subprojects
    in reasonable ways.

    For instance, if rational namespacing is not used by project
    authors, we might end up with unexpected name clashes.


Any thoughts on this idea ?

Cheers,
    -Tristan


[0]: https://irclogs.baserock.org/buildstream/%23buildstream.2020-05-23.log.html#t2020-05-23T07:47:57


On Tue, 2020-05-12 at 20:39 +0900, Tristan Van Berkom wrote:
Hi Jürg,

While replying, I've noticed some commonality in my replies to your
various ideas, so I'll prefix with that.

Essentially, I think that the collection of use cases was a good
exercise in order to determine what things might occur in
pipelines, but I would really prefer to not design specifically for
these use cases we've identified. I think doing so runs the risk of
dictating too strongly what can and should be done with BuildStream.

More inline...

On Tue, 2020-05-12 at 11:47 +0200, Jürg Billeter wrote:
Hi Tristan,

On Fri, 2020-05-08 at 16:50 +0900, Tristan Van Berkom wrote:
[...]

    Cross architecture bootstrapping
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    When bootstrapping a runtime for a different architecture, it can
    be interesting to use the same toolchain project configured
    multiple times with different project options defining which host
    and target architectures to build libc/gcc under.

    When combining this ability with remote execution, we can
    streamline the process of bootstrapping a system under any
    architecture which we have runners for on the RE cluster.

A possible solution for this use case is to extend the key used for
conflict detection a bit. Instead of only using the project name as
key, we could include the values of (selected) project options as well.
E.g., the target architecture option would be sensible to include, in
my opinion.

I think for one thing, we don't know what a target architecture option
is; we only know that an option is an "architecture" option. This
could be translated into a host/sandbox requirement, or used in
compiler build instructions to define a target architecture, or it
could be used for something completely orthogonal to the host
requirement or the architecture on which compiled code is expected to
run: I think we don't have the right to know.

Aside from that, I think it's going to be undesirable to stray into the
territory of comparing junction/project configurations. There is no
reason why I should be allowed to configure the same project with
different project options, but not with different source configurations
(different versions).

Note that we only currently distinguish:

  * This project was instantiated once (possibly the same instance was
    used multiple times by way of "overrides" or by way of the junction
    "target" feature).

  * This project was instantiated multiple times

We don't recognize equality of projects which are explicitly
instantiated multiple times; they might accidentally produce exactly
the same cache keys, but it is no less of an error if they do.

It would be good to preserve this simplicity I think.
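
To illustrate the difference between the two keying strategies being
discussed, here is a small Python sketch. Everything in it is made up
for illustration: the dictionaries standing in for project instances,
the "target_arch" option name and the helper functions are not
BuildStream API.

```python
from collections import Counter


def name_key(instance):
    """Conflict key made of the project name only."""
    return instance["name"]


def name_and_options_key(instance, selected=("target_arch",)):
    """Conflict key extended with the values of selected options."""
    options = instance.get("options", {})
    return (instance["name"],) + tuple(options.get(opt) for opt in selected)


def conflicts(instances, key):
    """Return the keys shared by more than one project instance."""
    counts = Counter(key(instance) for instance in instances)
    return [k for k, count in counts.items() if count > 1]
```

With the extended key, the same toolchain project configured for two
target architectures no longer conflicts, while the same project at
two different source versions with identical options still does, which
is the asymmetry I am uneasy about.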

    Auxiliary projects which provide static build-only dependencies
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    When one project depends on another project for some static data
    which will be consumed as build-only dependencies, the data
    from the junctioned project is consumed statically as is, and there
    is no concern of runtime dependencies being propagated forward to
    reverse dependency projects which might consume the same junction.

This also includes statically linked libraries where no runtime data is
required (or runtime data is in a private prefix).

"Isolated junctions" seems like a sensible solution for this use case.

Right, I specifically like this because it abstracts away some of the
problem from any reverse dependency projects, and it is not tied to any
specific use case, it only states:

  "I use this subproject internally to produce data which I consume
   verbatim, and it is an error if reverse dependency projects end up
   with runtime dependencies from this internal subproject"


    Separation of tooling and data
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    [...]

I think the ideal solution here would be for BuildStream core to know
when multiple dependency trees are configured to be staged in separate
prefixes/sysroots. BuildStream could then report a conflict if and only
if elements from different versions of the same project are used in the
same prefix/sysroot.

This would probably not be a simple change, though, so we don't want to
block the MR on this but maybe pick this up in the future. A question
is also whether we can perform such dependency-level checks early on
without significant performance penalty.

This is an interesting approach, but it does significantly increase the
cognitive effort involved in understanding the dependency and pipeline
structure - i.e. dependencies would effectively be painted with
"purpose", causing the entire graph to be more complex to visualize.

Moreover, the approach has a limiting aspect, which is to say that:

  "Dependency graphs are staged into dedicated sysroots"

While this is mostly true with the projects we've seen so far, I'm not
sure I'm comfortable forcing this limitation on users' data sets. A
user might stage a bunch of different isolated artifacts into different
locations and do whatever they like with them using their own custom
plugins, and I think this freedom has more value than stricter
guidelines would provide.

Deferring the check to assemble/staging time would likely be a lot
simpler and less of a performance risk, however, erroring out that late
is not ideal.
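
For reference, the per-sysroot check being discussed could look roughly
like the following Python sketch, wherever it ends up being run. It
assumes we can somehow obtain (sysroot, project name, instance)
triples for everything that gets staged; that input and the function
itself are hypothetical.

```python
from collections import defaultdict


def sysroot_conflicts(staged):
    """Find projects with more than one instance in the same sysroot.

    staged is an iterable of (sysroot, project_name, instance_id)
    tuples. Returns a dictionary mapping (sysroot, project_name) to
    the sorted list of conflicting instance ids.
    """
    instances = defaultdict(set)
    for sysroot, project, instance in staged:
        instances[(sysroot, project)].add(instance)
    return {
        key: sorted(ids) for key, ids in instances.items() if len(ids) > 1
    }
```

Two instances of the same project staged into different sysroots are
accepted here, and only reported once they meet in the same one.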

The dependency-based approach might work for the static build-only
dependency case as well, even without the need to explicitly mark
junctions as isolated.

Looking more and more at this, I wonder if we could find a way for
users to define their own assertions at the project level, at a (buy
in) cost of load time overhead.

Granted that only the project authors have a legitimate claim over the
meaning of their data and how it is intended to be used, maybe there is
a way to provide them with tooling to write their own assertions.

This is probably not 2.0 material, but something to consider, maybe.

[...]

  Looking at this email so far, I'm tempted to think that we might have
  both of these approaches (declaring a junction as 'isolated' can
  allow hiding a local junction and be more convenient, but failing
  this we can still whitelist junctions in reverse dependencies).

I suspect we may need more time to get to the best solution for each
use case. We don't want to block !1901 on that, though. Could it make
sense to change the error to a fatal warning to unblock !1901,
deferring more advanced configuration/checks?

I.e., allow projects to make the conflict warning/error non-fatal in
project.conf until we've implemented a better solution for each use
case? To be clear, configuring the conflict warning as non-fatal would
cause the relevant junctions/subprojects to be independent (no
automatic coalescing, unlike master). And if there are real file
conflicts, projects would still get overlap warnings/errors.

I am liking this approach for the short term, let's do a small thought
experiment to see what it would look like.


Would-be diamond example
~~~~~~~~~~~~~~~~~~~~~~~~

          a
         / \
        b   c
       /     \
     d(1)    d(2)

In the simple "would be diamond" example, the warning must occur in
project (a); as such, it is project (a) which decides whether it is a
fatal error for (d) to exist twice.

Project (a) subsequently has the choice to live with this error, or
to cause it to be a diamond by using override features (and possibly
also the "target" feature).

Here I can already see a flaw, though - without project (a) having the
right to specifically whitelist project (d) for duplication in those
specific two junction points, it must make the same fatal/non-fatal
decision for all subprojects, period.
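
The flaw can be shown with a short Python sketch. The "fatal" flag
stands in for the project-wide fatal/non-fatal setting, and "allowed"
for a per-junction whitelist which project (a) would not have under
this proposal. All names here are hypothetical, not BuildStream API.

```python
class DuplicateProjectError(Exception):
    """Raised when a duplicated project is treated as fatal."""


def check_duplicates(junction_paths, fatal=True, allowed=frozenset()):
    """Warn or fail for projects loaded through more than one junction.

    junction_paths maps a junction path to the project name it loads.
    allowed is a set of junction paths explicitly permitted to
    duplicate a project (the hypothetical whitelist).
    """
    by_project = {}
    for path, project in junction_paths.items():
        by_project.setdefault(project, []).append(path)

    warnings = []
    for project, paths in by_project.items():
        if len(paths) < 2 or set(paths) <= set(allowed):
            continue
        message = "Project '{}' is loaded more than once: {}".format(
            project, ", ".join(sorted(paths))
        )
        if fatal:
            raise DuplicateProjectError(message)
        warnings.append(message)
    return warnings
```

With only the flag, setting fatal=False to accept the two instances of
(d) also accepts every other duplicated subproject; only something
like the "allowed" set lets (a) accept (d) while still failing on the
rest.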


My further examples were going to highlight what happens when you add
projects which depend on (a), and then also indirectly depend on (d),
and try working out which project makes which decision about the error
being fatal (i.e. "in which project does the fatal error occur ?"), but
I think that the inability to whitelist is already a show stopper.


I realize this long email is not conclusive in any way, perhaps
bringing up more questions than answers, but I think we're making
important progress.

Cheers,
    -Tristan


_______________________________________________
buildstream-list mailing list
buildstream-list gnome org
https://mail.gnome.org/mailman/listinfo/buildstream-list



