Re: [BuildStream] Protect against plugin modifications of artifacts

From: Tristan Van Berkom <tristan vanberkom codethink co uk>
To: William Salmon <will salmon codethink co uk>, dev buildstream apache org, buildstream-list gnome org
Subject: Re: [BuildStream] Protect against plugin modifications of artifacts
Date: Mon, 22 Jun 2020 20:24:39 +0900

Hi William,

Thanks for engaging in this.

On Mon, 2020-06-22 at 11:02 +0100, William Salmon wrote:

Reply in line.

[...]

There are already a number of plugins that put files in the sandbox but 
that do it slightly differently. They run commands that then get 
executed in the sand box but the effect is the same, files get added 
that did not come from a source plugin.

It is very common to use cat and some variable to place files in the 
sandbox that do not come from a "source plugin", see x86image [4] which 
Tristan often mentions. I appreciate that by creating the file in this 
way, the action is recorded in the config but the config can be 
manipulated by the plugin. I have written rather convoluted plugins to 
try and make the x86image much more generic. This made assembling the 
cat command very complex and ultimately lead to the plugin not being 
successful.

To me the issue is, do changes in the artifact have changes in the cache 
key. It is the authors job to make sure that this is true with tools 
such as get_unique_key.

A author could cause changes to a artifact by changing the way they use 
the `blessed` api eg `layout_add`.

Using cat is actually rather tricky and I think we could make much 
better plugins by just adding files to the sandbox directly. The 
x86image is very ridged and so i have had mixed success with a plugin 
wrapped around genimage, freedesktop-sdk use this approach but not the 
plugin a lot but as there configuration files are added with cat they 
end up having to have loads of very different elements.

I think a good thing would be for use to have a flexible image creation 
plugin that has a nice way to create configs and assemble the sandbox. I 
was planing to use the API that Tristan is planning to outlaw to do this 
as I think it is a the best way. As noted above plugins anthers who are 
very familiar with bst2 internals have chose to do this for plugins like 
OCI.

Looking at the new version of some of the plugins you have talked about 
[7] [8] I am confused by the problem. The new api has all file 
manipulation in nice context handlers that control how the new data is 
added in to the cas. I can see that maybe we could change when you can 
do this so that the new files can be baked in to the cache key. Maybe at 
configure time, but I think that would add time to bst show.

As discussed on IRC adding a cleaner API for changing permissions etc 
would be nice but I don't really see why having to include a build dep 
to bring chown/chmod in to a sandbox and coupling this with some config 
is so much better. Or why we can not found a way to make sure that 
tweaking the sandbox can not be done safely.


The main differences here are:

  A.) The guarantees we can provide about running safe sandboxed code
      compared with untrusted local python interpreter code.

      This would lead us down a path of providing ever more carefully
      crafted APIs in the core in order to produce reproducible output
      in the artifact, all the while not being certain what python APIs
      a plugin might use to produce output.

  B.) Simply having a "second way" of doing things complicates things
      for us, as the question of which way to choose arises for a
      plugin author, when the safe way should be the only choice.

  C.) I wasn't thinking of this but indeed, as you pointed out: we
      leave open the chance for human error by implying that the plugin
      author should rev their Plugin.get_unique_key() implementation at
      any time that their code changes.

      Seeing how third party plugins have been maintained in the past
      (i.e. rather sloppily, just getting the job done, fire and
      forget), I don't expect this level of diligence from plugin
      authors... and to be clear: I think this is a *good thing*.

If we take a simple example of even the current collect_manifest
plugin, take a look at the lines where we construct dictionaries:

   
https://gitlab.com/BuildStream/bst-plugins-experimental/-/blob/master/src/bst_plugins_experimental/elements/collect_manifest.py#L191
   
https://gitlab.com/BuildStream/bst-plugins-experimental/-/blob/master/src/bst_plugins_experimental/elements/collect_manifest.py#L309
   
https://gitlab.com/BuildStream/bst-plugins-experimental/-/blob/master/src/bst_plugins_experimental/elements/collect_manifest.py#L322

Here we can see that the output is clearly not reproducible, while it
might be reproducible in more recent versions of python, I'm not
certain we can be sure how long this will last.

If we were running this python code which constructs dictionaries in
the sandbox, at the very least we could set PYTHONHASHSEED=0 in the
environment for projects which use such unstable python code (note that
this behavior of dicts has flipflopped over the years, early on it was
accidentally stable, then intentionally randomized, and now finally
forced to behave like the OrderedDict() type).

This is not an invitation to a debate about whether we can trust python
dictionaries or not in the future, it's just an example of a class of
problem we should not be held accountable for: We simply provide the
sandbox for elements to perform permutations within, and provide some
controls to allow for projects to achieve maximal reproducibility.


To answer another point in the above paragraph: `cat` need not be the
only way to construct data in the sandbox.

Currently, the BuildElement only offers support for running commands
provided as text supplied in it's `config` as shell scripts, but this
is really only because nobody has cared enough to give it more
capability yet, see:

    https://gitlab.com/BuildStream/buildstream/-/blob/master/src/buildstream/buildelement.py#L309

There's no reason why BuildElements (or other elements) need to
restrict themselves to running shell script (it could equally pass text
to /usr/bin/python or /usr/bin/node or whatever).

Indeed, to take a page from rpm spec files, they have had such support
in their %pre / %post scripts for a very long time and I've been hoping
to add it to BuildElement too but haven't found a personal itch that it
would scratch as of yet, still there's nothing preventing this afaics.

But in short, the authors of these plugins should be doing the work to
get their features supported properly upstream, and BuildStream 2.0
should not be held hostage to provide alternative support for a feature
which was never a feature in the first place.


BST should not be held hostage to any one of course but given that I 
have found and helped to fix 8 bugs in the last 2 mounts from trying to 
get bst2 to work with freedesktop-sdk. Several of these bugs have been 
HUGE eg the handling of variables being broken or bst not being able to 
pull a project the size of freedesktop-sdk it seems that freedesktop-sdk 
and buildstream working together is giving both projects a lot of benefit.

It would seem rather prudent to me to check that bst2 can be used with 
some real projects before the release. Freedesktop-sdk seems like the 
most obvious candidate as everyone one can see it and work on it. But I 
don't expect us to need to block on freedesktop-sdk specifically, if we 
have a alternative approach to validate bst before release?


This is a very good point.

To be honest, I raised the issue of blocking here as a matter of
principle, mostly because I've noticed that a lot of effort has been
(mis)placed into adding support for something I believe we shouldn't
even be supporting in the first place, and I expect this is due to a
thinking that we should block progress and refactors on ensuring
plugins still work.

For the most part I agree with this, but I think it simple went
overlooked that in this case, we never really explicitly supported this
behavior, and instead we are blocking on supporting an exploitation of
the `directory` argument, lending more ability to plugins than it
should.

For collect_manifest, there should be ways to work around this without
too much effort, and for the purpose of ensuring BuildStream works in
the meantime, surely we can build freedesktop-sdk without involving the
collect_manifest plugin and still know that it works.


Cheers,
    -Tristan


So, what are your thoughts ?

Best regards,
     -Tristan


[0]: https://gitlab.com/BuildStream/buildstream/-/blob/master/src/buildstream/storage/directory.py
[1]: https://gitlab.com/BuildStream/buildstream/-/issues/235
[2]: https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/issues/55
[3]: https://mail.gnome.org/archives/buildstream-list/2018-August/msg00013.html
[4]: https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/issues/376
[5]: https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/merge_requests/735
[6]: https://gitlab.com/BuildStream/bst-plugins-experimental/-/issues/2

[7]: 
https://gitlab.com/BuildStream/bst-plugins-experimental/-/blob/master/src/bst_plugins_experimental/elements/x86image.yaml#L48
[8]: 
https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/blob/dwinship/bst2-fv/use-virtual-directories-in-plugins/plugins/elements/url_manifest.py#L174
[9]: 
https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/blob/dwinship/bst2-fv/use-virtual-directories-in-plugins/plugins/elements/collect_initial_scripts.py#L50

_______________________________________________
buildstream-list mailing list
buildstream-list gnome org
https://mail.gnome.org/mailman/listinfo/buildstream-list

Follow-Ups:
- Re: [BuildStream] Protect against plugin modifications of artifacts
  - From: William Salmon

References:
- [BuildStream] Protect against plugin modifications of artifacts
  - From: Tristan Van Berkom
- Re: [BuildStream] Protect against plugin modifications of artifacts
  - From: William Salmon

[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]