Re: [BuildStream] Allowing duplicate junctions [Was: Be explicit when overriding junction configuration, or else warn/error]



Hi all,

  Yesterday Ben and I had a long conversation on this topic in general,
in which we conducted some thought experiments on the implications of
some of the suggested approaches, this can be reviewed in full detail
here[0].

We've come to a conclusion on how this should be implemented, which I
will summarize in this email.

Note that the exact semantics and names of keywords to use for the
implementation may change, naming things is hard to get right and we'd
like to make this as clear as possible.

For completeness, I will include the overrides feature which is already
implemented on !1901[1].


Problem Statement
=================
This whole subject revolves around consolidating situations where
multiple projects in the same pipeline might have a junction to the
same project, while also allowing users to explicitly use multiple
instances of the same project possibly configured differently within
the same build pipeline.

Currently, this is achieved by coalescing junction configurations using
the name of the junction element to match junctions, and deferring to
the highest level junction of a given name to have the last word[2].

This coalescing is not only confusing, it is error prone because you
can end up overriding the configuration of a junction in a subproject
without actually knowing that you are doing this. Note that the
documentation says:

  "It is recommended that the name of a junction is set to the same as
   the name of the linked project"

This means that in all likelyhood, if you start depending on a new
project which shares a dependency on another project you also depend
on, you will unknowingly override the subproject's junction;
consequently invalidating any guarantees that your subproject makes
about the artifacts it purports to provide.

The (partially already implemented) solution set forth in this email
prioritizes on the following objectives:

  * Ensure that you can explicitly override a junction configuration
    in any subproject.

    * Ensure that you cannot override a junction configuration without
      knowing it.

  * Ensure that you are allowed to have multiple instances of the same
    subproject, possibly configured differently, in the same build
    pipeline.

    * Ensure that this is only possible with explicit controls.

  * Reduce as much as possible any knowledge that a project needs to
    have about a subproject's implementation details.


Implementation Plan
===================
This implementation plan consist of various elements which come
together to form a solution to the above points.


Full element path addressing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Currently BuildStream only supports expressing subproject elements with
only one layer of depth, i.e.: "{subproject-junction.bst}:{element.bst}"

This is a product of the junction name coalescing which is going away,
as the assumption that you can always add a junction in the local
project to express a junction at any depth of the project dependency
graph becomes false, it would no longer be possible to express a deeply
nested element or junction without support for fullpaths.

Essentially this means that elements will now be expressable in the
following forms:

  - {element.bst}
  - {subproject-junction.bst}:{element.bst}
  - {subproject-junction.bst}:{subsubproject-junction.bst}:{element.bst}
  - and so on, without any limit on depth.

Furthermore, elements will be expressed in `bst show` output and in the
UI with full paths, because it would otherwise be impossible to
distinguish between elements which appear to be the same but are not
the same.


Overrides feature
~~~~~~~~~~~~~~~~~
Similar to what is already implemented in !1901[1] and discussed
previously, we will have an explicit "overrides" semantic which allows
a project to explicitly override a junction configuration in a
subproject.

This will be configured in project.conf as such:

  # Junction specific configurations here
  #
  junctions:

    # Override the "sub-subproject.bst" junction in the project
    # referenced by "subproject.bst" with the local junction
    # declaration "local-junction.bst"
    #
    overrides:
      "subproject.bst:sub-subproject.bst": local-junction.bst

Note that keys in the overrides dictionary can be expressed as full
element paths.

This feature replaces the current junction name coalescing entirely.


Error out on project name collisions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The loader will now produce a fatal error if the same project is ever
loaded twice in the same build pipeline.

This will be done by observing collisions of project names[3].

With the currently listed features in place, we have met the objective
of being able to override the configuration of any deeply nested
subproject, and we have certainty that we cannot ever accidentally
override a subproject.

The following features will introduce ways of explicitly avoiding the
fatal error of project name collisions at load time.


Allow duplication of projects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When a project encounters an error due to multiple instances of the
same project being loaded, they will already have the following
options:

  * Use "overrides" to override a subproject's junction, consequently
    modifying the subproject.

  * Use a "link" element[4] locally instead of a junction, consequently
    inheriting the subproject's junction configuration

  * Using a combination of both of the above, a project can link to
    a subproject's junction and use that local link to override another
    subproject's junction (effectively overriding one subproject's
    junction with another subproject's junction)

In addition to the above, we will allow this toplevel project to
"duplicate" one the conflicting junctions so that it is allowed to
coexist.

This will be configured in project.conf right beside the overrides:

  # Junction specific configurations here
  #
  junctions:

    # Specify junctions which are allowed to be loaded in parallel,
    # even though they were found to conflict with other subprojects
    # depending on the same projects which these refer to.
    #
    duplicate:
    - local-junction.bst
    - subproject.bst:sub-subproject.bst

This whitelisting allows the toplevel project to make the judgment call
that it is perfectly okay for the same project to be used twice, if for
example they serve different purposes and there will not be any risk of
separate instances of the same element being staged at the same
location in a given build sandbox (which would cause overlap errors in
the end).


Internal subprojects
~~~~~~~~~~~~~~~~~~~~
In various use cases of a subproject, the junctioning project has
knowledge that this given subproject is only used as a build
dependency, this can include cases such as:

  * Bootstrapping, where we might have a self-junction in place which
    builds the same project under a different machine architecture.

  * Depending on projects for compilers or other tooling which is used
    to produce the artifacts this project provides, but is not included
    in the project's output via runtime dependencies.

In cases such as these, it is not desirable to force consumers (reverse
dependency projects) to have knowledge of these internal details, by
forcing these consumers to mark junctions as duplicates if they have a
separate need for the same projects.

In these cases, the original junctioning project can mark a subproject
as "internal" in their junction declaration.

This will be expressed as a list in the project.conf in the same
section:

  # Junction specific configurations here
  #
  junctions:

    # List junctions which are to be considered internal to this 
    # project.
    #
    # Only local junctions are allowed here and nested paths are
    # rejected here.
    #
    internal:
    - compiler.bst
    - font-processor.bst

Wherever it makes sense, it will be preferable to declare a junction as
"internal" rather than whitelisting it as a "duplicate" in a reverse
dependency project, as the overall burden of knowledge is reduced.

NOTE: The reason we should not place this configuration in the junction
      itself, is because junctions can be overridden as previously
      described, or they can be linked[4], so it is important to tie
      this information to the project where the active junction
      configuration resides, and not in the junction declaration
      itself.

      This information pertains to the relationship of a project and
      it's junction, and not the declaration of the junction itself.


Potential error checking for internal subprojects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The expectation is that the artifacts from an "internal" project are
only used in the construction of artifacts in the local project, and as
such there is no danger of any conflict whatsoever with consumers
(reverse dependency projects).

Interestingly, this rule can potentially be enforced with an error in
the case that runtime dependencies bleed into consumer projects,
consider an error message such as:

  "foo.bst: Dependency on element `bar.bst` from junction `bar-junction.bst` internal to project `pony`"

To illustrate this a bit further.

   ------------------------------

      A   (toplevel project)
      |
   ------------------------------
      |
      B   (subproject)
      |
   ------------------------------
      |
      C   (internal subproject)

   ------------------------------

 * In the case that the B->C dependency is only a build dependency,
   then the rule is not broken, and the toplevel project does not
   have any dependency on subproject's internal sub-subproject.

 * In the case that the B->C dependency is a runtime dependency,
   then the toplevel project can receive an error when resolving
   the dependencies of element A (at load time).

I'm not sure at this time that implementing such load time checks would
introduce a significant performance overhead, and I'm not entirely sure
it is a hard requirement to have these checks implemented, I would
expect to land the branch in advance of this error checking and
investigate adding these errors later on.


Summary
=======
We have investigated various angles to the problem, while the renaming
of projects appears to be simpler at surface value, in depth thought
experiments show that this leads to many other changes which don't
necessarily make sense (again, see our IRC conversation[0] for a full
log of our thoughts on that).

This solution is simple enough to implement and less disruptive.

Thoughts ?


Cheers,
    -Tristan


[0]: https://irclogs.baserock.org/buildstream/%23buildstream.2020-06-03.log.html
[1]: https://gitlab.com/BuildStream/buildstream/-/merge_requests/1901
[2]: https://docs.buildstream.build/1.4.3/elements/junction.html#nested-junctions
     (link to documentation in a stable release posterity's sake)
[3]: https://docs.buildstream.build/master/format_project.html#project-name
[4]: https://docs.buildstream.build/master/elements/link.html



On Wed, 2020-06-03 at 08:38 +0000, Benjamin Schubert wrote:
Hey Tristan,

I must say I find this thread hard to understand. I'm going to try to
summarize it, both to ensure I understood everything and potentially
help
others follow.

TLDR: I like the idea of explicitely having junctions users have to
configure
      it instead of having to have knowledge of some specific use
cases.
      I however am not 100% sure about the descibed format.

I also have a few comments inline.


So here is my understanding:

Problem: We want to allow projects to depend on another project
multiple times,
         and accross junctions.

Why it can't be done today: We ensure that the `name` of the project
is unique
in the pipeline, and thus, we won't be able to have multiple times
the same
project.

There are two ways we want to be able to do this:

- Allowing a parent project to tell a junction to use another element
as the
  junction, for example, if we build a project over gnome-build-meta, 
and we
  also have a freedesktop-sdk junction, we might wand gnome-build-
meta to use
  our exact junction of the project, not theirs.

- Allowing a project to rename the project names of its junction, or
their
  junctions, thus removing the name clashes and stating "this is a
different
  project". This would be useful, for example, for "build-only"
junction
  dependencies.

For this, we want to introduce two new constructs in a `junction`
configuration:

- `overrides`, which takes a mapping of element -> element, which
replaces every
  reference to the `key` element by the `value` element, thus solving
problem 1.

- `project-names`, which takes a mapping of name -> name, which
replaces the name
  of the `key` by the `value`, in the context of the junction, thus
solving
  problem 2.

Now my personal opinion:

I find it weird that for `overrides`, we use elements as keys, and
for names, we use
names. I would have expected us to either always use names, or
elements.
I think we should use one and only one of them.

- Names would be more flexible, as we would require less knownledge
about the junctioned
project (where does it store the junction?).

- Elements are more rigid but less error prone and completely
explicit.


I also am not sure how this would work with deeply nested junctions:

- If we want to override, would we specify something like:

```
kind: junction

overrides:
  "myjunction.bst:subjunction.bst": my_own_junction.bst
```

Or how would we construct this?

- Same for names, if a junction has two junctions on the same
project, and renames one of
  them. How would I rename each of them? Would I be targeting the
name in the junction I'm
  overriding?

So if A has a junction on B, which has two junctions on C, namely C
and C2, would I do:

```
kind: junction

config:
  project-names:
    c2: my-own-c
```

?

In summary, I think we need to either use junction names for keys, or
elements, but not
mix them. I personally lean towards elements, as they are more
explicit.

Apart from that, I'm happy with the proposal.

More inline.


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, 12 May 2020 12:39, Tristan Van Berkom <
tristan vanberkom codethink co uk> wrote:

[...]
On Tue, 2020-05-12 at 11:47 +0200, Jürg Billeter wrote:

Hi Tristan,
On Fri, 2020-05-08 at 16:50 +0900, Tristan Van Berkom wrote:

[...]

    Cross architecture bootstrapping
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    When bootstrapping a runtime for a different architecture,
it can
    be interesting to use the same toolchain project configured
    multiple times with different project options defining
which host
    and target architectures to build libc/gcc under.

    When combining this ability with remote execution, we can
    streamline the process of bootstrapping a system under any
    architecture which we have runners for on the RE cluster.


A possible solution for this use case is to extend the key used
for
conflict detection a bit. Instead of only using the project name
as
key, we could include the values of (selected) project options as
well.
E.g., the target architecture option would be sensible to
include, in
my opinion.

I think for one thing, we don't know what a target architecture
option
is; we only know that an option is an "architecture" option, but
this
could be translated into a host/sandbox requirement, or used in
compiler build instructions to define a target architecture, it it
could be used for something completely orthogonal to the host
requirement or the architecture on which compiled code is expected
to
run: I think we don't have the right to know.

Aside from that, I think it's going to be undesirable to stray into
the
territory of comparing junction/project configurations, there is no
reason why I should be allowed to configure the same project with
different project options, but not with different source
configurations
(different versions).


I do agree there, we should not restrict _how_ we can inherit from
junctions,
and only target/arch is not the only way we might want to change
that.


Note that we only currently distinguish:

-   This project was instantiated once (possibly the same instance
was
    used multiple times by way of "overrides" or by way of the
junction
    "target" feature).

-   This project was instantiated multiple times

    We don't recognize equality of projects which are explicitly
    instantiated multiple times, they might accidentally produce
exactly
    the same cache keys, but it is no less of an error if they do.

    It would be good to preserve this simplicity I think.


    Auxiliary projects which provide static build-only
dependencies
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~
    When one project depends on another project for some static
data
    which will be consumed as build-only dependencies, the data
    from the junctioned project is consumed statically as is,
and there
    is no concern of runtime dependencies being propagated
forward to
    reverse dependency projects which might consume the same
junction.


This also includes statically linked libraries where no runtime
data is
required (or runtime data is in a private prefix).
"Isolated junctions" seems like a sensible solution for this use
case.

Right, I specifically like this because it abstracts away some of
the
problem from any reverse dependency projects, and it is not tied to
any
specific use case, it only states:

"I use this subproject internally to produce data which I consume
verbatim, and it is an error if reverse dependency projects end up
with runtime dependencies from this internal subproject"


I can see some value in that, a `private: true` in the `config`
section
forbidding runtime dependencies on the junction might be enough?

Or we could go differently, and state that `private: true`, forbids a
sub-project
from accessing the junction?

But that might be better in another ML thread.

Cheers,

Ben





[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]