Re: [BuildStream] Protect against plugin modifications of artifacts





On 22/06/2020 18:28, Tristan Van Berkom wrote:
Hi,

On Mon, 2020-06-22 at 18:01 +0100, William Salmon wrote:
[...]
Definitely agree.

The thing is, both are highly problematic.


This is my issue, creating PLUGIN_GENERATED_CONTENT is some times highly
desirable / necessary*

So we should A) make it as easy as possible to make sure
PLUGIN_GENERATED_CONTENT is unique to a cache key, B) minimize the risk
of PLUGIN_GENERATED_CONTENT not getting correctly translated in to the
sand box.

Writing via the vdir API to the cas is a much better way to get
PLUGIN_GENERATED_CONTENT in to the sandbox, IMO because it dose not have
to be escaped to get through sh and cat,

But this is the whole point, data which ends up in artifacts should be
created within the sandbox, where the input of that data is
deterministic (i.e. the data _and_ tooling used to create that output).

If it is generated with scripts residing in the sandbox as everything
else is generated as usual, then there is no extra escaping to to be
done.

Further, as mentioned in a previous mail here, the shell invocations
also need not be shell, we can run python or JS or anything in the
sandbox as well, or stage scripts from sources to run in the sandbox,
there is a lot of leeway here.

  dose not need extra deps and
dose not needs manipulation of configuration commands etc.

This indeed.

You are focusing on the convenience of breaking the promise which
BuildStream makes, this is where I cannot get on board.

Yes I am focused on using bst, plugins are a key part of why bst is usable, but i do also care about it being reproducible and having trust in the cache keys. Things get in to the sandbox by a variety of ways, by variable, config etc often as arguments on a programs cli but i really fail to see the difference in providing something as a config file or as part of somethings cli.


Of course you need to stage the tooling which is used to create output,
that is because BuildStream intentionally enforces that you always use
deterministic input, and the exact build/copy of the binaries used to
create output, is a part of the input.

  > The fact that generating data in python is problematic is not a
reason to avoid fixing illegal writing to the sandbox (which probably
mostly happens *due* to the latter problem), its a reason to ensure that
we *also* ensure that doesn’t happen.
  >

While the addition may have been a error in your eyes, If writing to the
sandbox was not public API I would have be arguing that it should be.
For the reasons above.

It was most definitely an error.

The plugins need to have a way to stage files in locations of their
choosing, before the vdir abstraction was in place, the only way to do
this was by providing a directory argument.

This was abused, and has now lead to generation of artifact data which
is generated non-deterministically, without controlling the inputs of
this output, which is at the heart of BuildStream's promise.

I feel you are conflating two things, are plugins `non-deterministic` and should we let plugins alter whats in the sandbox. The answer to should plugins be deterministic should clearly be yes and the answer to should plugins effect what happens in the sandbox seem clearly yes. If plugins are non deterministic then we need to fix that, plugins effect the build in a number of ways and if we cant trust plugins then we cant trust anything about bst as every element is build with a plugin.

Once plugins are trust able, and they must be or the hole concept of cache keys falls apart. Then your hole argument for why we cant put things in the sandbox falls apart, given that we must fix plugins so they are trust able then I fail to see a issue with plugins putting things in to the sandbox.


Cheers,
     -Tristan





[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]