Re: [BuildStream] Protect against plugin modifications of artifacts



One more round...

On Tue, 2020-06-23 at 11:12 +0100, William Salmon wrote:

[...]
Of course you need to stage the tooling which is used to create output;
that is because BuildStream intentionally enforces that you always use
deterministic input, and the exact build/copy of the binaries used to
create output is a part of the input.

  > The fact that generating data in python is problematic is not a
reason to avoid fixing illegal writing to the sandbox (which probably
mostly happens *due* to the latter problem), it's a reason to *also*
ensure that it doesn’t happen.
  >

While the addition may have been an error in your eyes, if writing to the
sandbox was not public API I would have been arguing that it should be,
for the reasons above.

It was most definitely an error.

The plugins need to have a way to stage files in locations of their
choosing; before the vdir abstraction was in place, the only way to do
this was by providing a directory argument.

This was abused, and has now led to artifact data which is generated
non-deterministically, without controlling the inputs of this output;
controlling those inputs is at the heart of BuildStream's promise.

I feel you are conflating two things: are plugins `non-deterministic`,
and should we let plugins alter what's in the sandbox. The answer to
"should plugins be deterministic" should clearly be yes, and the answer to
"should plugins affect what happens in the sandbox" seems clearly yes. If
plugins are non-deterministic then we need to fix that; plugins affect
the build in a number of ways, and if we can't trust plugins then we can't
trust anything about bst, as every element is built with a plugin.

Plugins must be trustable, or the whole concept of cache keys falls
apart. Once they are, your whole argument for why we can't put things in
the sandbox falls apart too: given that we must fix plugins so they are
trustable, I fail to see an issue with plugins putting things into the
sandbox.

There are two things you appear to be conflating: the stability and
correctness of cache keys, and the reproducibility of build artifacts.


This whole discussion is about reproducibility, not about cache key
composition (although it was raised as an orthogonal concern, it is not
central to why we don't use plugins to create output).

For reproducibility, the plugins cannot reasonably be trusted to
compose data reproducibly. The premise of creating reproducible output
in artifacts is to use deterministic inputs, and the host version of
installed Python, plus the versions of any Python libraries which are
running, are not deterministic.

BuildStream ensures, as a promise, that output is based on deterministic
inputs. Do not conflate this with the calculation of cache keys, which
BuildStream uses to identify the inputs.
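
To make the distinction concrete, here is a minimal sketch. Everything in
it is hypothetical: cache_key(), output_digest(), build() and the input
values are made up for illustration, and this is not how BuildStream
actually computes cache keys. It only shows that two builds can share a
key (same declared inputs) and still differ in output when something
undeclared leaks in from the host.

```python
import hashlib
import json


def cache_key(inputs: dict) -> str:
    """Identify a build by its declared inputs only.

    Hypothetical helper, NOT BuildStream's real cache key algorithm.
    """
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def output_digest(artifact: bytes) -> str:
    """Identify what a build actually produced."""
    return hashlib.sha256(artifact).hexdigest()


def build(inputs: dict, host_env: str) -> bytes:
    """Toy build step: an undeclared host detail leaks into the output."""
    return f"{inputs['source']}|built-with:{host_env}".encode("utf-8")


# Made-up declared inputs; the host environment is deliberately not among them.
inputs = {"source": "hello.c@abc123", "toolchain": "gcc@deadbeef"}

key_a = cache_key(inputs)
key_b = cache_key(inputs)

out_a = build(inputs, host_env="python-3.6")
out_b = build(inputs, host_env="python-3.8")

same_key = key_a == key_b
same_output = output_digest(out_a) == output_digest(out_b)

print("same cache key:", same_key)
print("same output:   ", same_output)
```

The key stays stable because it only sees what was declared, while the
output changes with the host. A correct key does not, by itself, make
the artifact reproducible.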

I've already demonstrated, in my first reply to you, an example of how
the output of collect_manifest is already non-reproducible, precisely
*because* the host version of Python will have an effect on its
output. I emphasized that this was an *example*, not an invitation to
haggle over whether 'dict' can be trusted in the future (there is no
guarantee that it can). It was a single example to demonstrate that
host Python *matters* when constructing output.
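
As a rough illustration of that kind of problem (this is not the actual
collect_manifest code, and the function names and package data below are
made up), the sketch simulates two interpreters that iterate the same
mapping in a different order by inserting the keys in a different order.
The bytes written out differ even though the logical content is
identical.

```python
import hashlib
import json


def serialize_as_iterated(manifest: dict) -> bytes:
    """Write keys in whatever order the dict happens to yield them."""
    return json.dumps(manifest).encode("utf-8")


def serialize_canonical(manifest: dict) -> bytes:
    """Write keys in sorted order, independent of iteration order."""
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Same logical content; the differing insertion order stands in for
# (assumed) iteration-order differences between host interpreters.
manifest_a = {"zlib": "1.2.11", "bash": "5.0"}
manifest_b = {"bash": "5.0", "zlib": "1.2.11"}

iterated_differs = (
    digest(serialize_as_iterated(manifest_a))
    != digest(serialize_as_iterated(manifest_b))
)
canonical_matches = (
    digest(serialize_canonical(manifest_a))
    == digest(serialize_canonical(manifest_b))
)

print("as-iterated output differs:", iterated_differs)
print("canonical output matches:  ", canonical_matches)
```

Sorting the keys removes this one source of variation only. It says
nothing about the other ways the host interpreter and its libraries can
influence what gets written.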

Keep the following things in mind:

  * An important set of BuildStream users will be striving to produce
    reproducible output.

    If they achieve reproducibility today, they should be able to take
    the exact same project state, and use the latest version of
    BuildStream 2, 10 years from now... and they should be able
    to produce the exact same output, bit-for-bit.

  * While BuildStream all by itself cannot guarantee reproducible
    output, our role is to guarantee that even after upgrading your
    host BuildStream installation 10 years from now, BuildStream
    will repeat the build in *exactly the same way*.

The goal of deterministic building is entirely central to the
BuildStream mission, and cannot be derailed by a desire to have things
just a little bit more convenient.


So, please keep in mind: plugins are not there to create output. There
is no way we can make plugins trustable for creating output, because we
cannot be in control of the environment in which plugins run.

Cheers,
    -Tristan



