Re: [BuildStream] Protect against plugin modifications of artifacts
- From: Tristan Van Berkom <tristan vanberkom codethink co uk>
- To: William Salmon <will salmon codethink co uk>, dev buildstream apache org, buildstream-list gnome org
- Subject: Re: [BuildStream] Protect against plugin modifications of artifacts
- Date: Mon, 22 Jun 2020 20:24:39 +0900
Hi William,
Thanks for engaging in this.
On Mon, 2020-06-22 at 11:02 +0100, William Salmon wrote:
Reply in line.
[...]
There are already a number of plugins that put files in the sandbox but
that do it slightly differently. They run commands that then get
executed in the sand box but the effect is the same, files get added
that did not come from a source plugin.
It is very common to use cat and some variable to place files in the
sandbox that do not come from a "source plugin", see x86image [4] which
Tristan often mentions. I appreciate that by creating the file in this
way, the action is recorded in the config but the config can be
manipulated by the plugin. I have written rather convoluted plugins to
try and make the x86image much more generic. This made assembling the
cat command very complex and ultimately lead to the plugin not being
successful.
To me the issue is, do changes in the artifact have changes in the cache
key. It is the authors job to make sure that this is true with tools
such as get_unique_key.
A author could cause changes to a artifact by changing the way they use
the `blessed` api eg `layout_add`.
Using cat is actually rather tricky and I think we could make much
better plugins by just adding files to the sandbox directly. The
x86image is very ridged and so i have had mixed success with a plugin
wrapped around genimage, freedesktop-sdk use this approach but not the
plugin a lot but as there configuration files are added with cat they
end up having to have loads of very different elements.
I think a good thing would be for use to have a flexible image creation
plugin that has a nice way to create configs and assemble the sandbox. I
was planing to use the API that Tristan is planning to outlaw to do this
as I think it is a the best way. As noted above plugins anthers who are
very familiar with bst2 internals have chose to do this for plugins like
OCI.
Looking at the new version of some of the plugins you have talked about
[7] [8] I am confused by the problem. The new api has all file
manipulation in nice context handlers that control how the new data is
added in to the cas. I can see that maybe we could change when you can
do this so that the new files can be baked in to the cache key. Maybe at
configure time, but I think that would add time to bst show.
As discussed on IRC adding a cleaner API for changing permissions etc
would be nice but I don't really see why having to include a build dep
to bring chown/chmod in to a sandbox and coupling this with some config
is so much better. Or why we can not found a way to make sure that
tweaking the sandbox can not be done safely.
The main differences here are:
A.) The guarantees we can provide about running safe sandboxed code
compared with untrusted local python interpreter code.
This would lead us down a path of providing ever more carefully
crafted APIs in the core in order to produce reproducible output
in the artifact, all the while not being certain what python APIs
a plugin might use to produce output.
B.) Simply having a "second way" of doing things complicates things
for us, as the question of which way to choose arises for a
plugin author, when the safe way should be the only choice.
C.) I wasn't thinking of this but indeed, as you pointed out: we
leave open the chance for human error by implying that the plugin
author should rev their Plugin.get_unique_key() implementation at
any time that their code changes.
Seeing how third party plugins have been maintained in the past
(i.e. rather sloppily, just getting the job done, fire and
forget), I don't expect this level of diligence from plugin
authors... and to be clear: I think this is a *good thing*.
If we take a simple example of even the current collect_manifest
plugin, take a look at the lines where we construct dictionaries:
https://gitlab.com/BuildStream/bst-plugins-experimental/-/blob/master/src/bst_plugins_experimental/elements/collect_manifest.py#L191
https://gitlab.com/BuildStream/bst-plugins-experimental/-/blob/master/src/bst_plugins_experimental/elements/collect_manifest.py#L309
https://gitlab.com/BuildStream/bst-plugins-experimental/-/blob/master/src/bst_plugins_experimental/elements/collect_manifest.py#L322
Here we can see that the output is clearly not reproducible, while it
might be reproducible in more recent versions of python, I'm not
certain we can be sure how long this will last.
If we were running this python code which constructs dictionaries in
the sandbox, at the very least we could set PYTHONHASHSEED=0 in the
environment for projects which use such unstable python code (note that
this behavior of dicts has flipflopped over the years, early on it was
accidentally stable, then intentionally randomized, and now finally
forced to behave like the OrderedDict() type).
This is not an invitation to a debate about whether we can trust python
dictionaries or not in the future, it's just an example of a class of
problem we should not be held accountable for: We simply provide the
sandbox for elements to perform permutations within, and provide some
controls to allow for projects to achieve maximal reproducibility.
To answer another point in the above paragraph: `cat` need not be the
only way to construct data in the sandbox.
Currently, the BuildElement only offers support for running commands
provided as text supplied in it's `config` as shell scripts, but this
is really only because nobody has cared enough to give it more
capability yet, see:
https://gitlab.com/BuildStream/buildstream/-/blob/master/src/buildstream/buildelement.py#L309
There's no reason why BuildElements (or other elements) need to
restrict themselves to running shell script (it could equally pass text
to /usr/bin/python or /usr/bin/node or whatever).
Indeed, to take a page from rpm spec files, they have had such support
in their %pre / %post scripts for a very long time and I've been hoping
to add it to BuildElement too but haven't found a personal itch that it
would scratch as of yet, still there's nothing preventing this afaics.
But in short, the authors of these plugins should be doing the work to
get their features supported properly upstream, and BuildStream 2.0
should not be held hostage to provide alternative support for a feature
which was never a feature in the first place.
BST should not be held hostage to any one of course but given that I
have found and helped to fix 8 bugs in the last 2 mounts from trying to
get bst2 to work with freedesktop-sdk. Several of these bugs have been
HUGE eg the handling of variables being broken or bst not being able to
pull a project the size of freedesktop-sdk it seems that freedesktop-sdk
and buildstream working together is giving both projects a lot of benefit.
It would seem rather prudent to me to check that bst2 can be used with
some real projects before the release. Freedesktop-sdk seems like the
most obvious candidate as everyone one can see it and work on it. But I
don't expect us to need to block on freedesktop-sdk specifically, if we
have a alternative approach to validate bst before release?
This is a very good point.
To be honest, I raised the issue of blocking here as a matter of
principle, mostly because I've noticed that a lot of effort has been
(mis)placed into adding support for something I believe we shouldn't
even be supporting in the first place, and I expect this is due to a
thinking that we should block progress and refactors on ensuring
plugins still work.
For the most part I agree with this, but I think it simple went
overlooked that in this case, we never really explicitly supported this
behavior, and instead we are blocking on supporting an exploitation of
the `directory` argument, lending more ability to plugins than it
should.
For collect_manifest, there should be ways to work around this without
too much effort, and for the purpose of ensuring BuildStream works in
the meantime, surely we can build freedesktop-sdk without involving the
collect_manifest plugin and still know that it works.
Cheers,
-Tristan
So, what are your thoughts ?
Best regards,
-Tristan
[0]: https://gitlab.com/BuildStream/buildstream/-/blob/master/src/buildstream/storage/directory.py
[1]: https://gitlab.com/BuildStream/buildstream/-/issues/235
[2]: https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/issues/55
[3]: https://mail.gnome.org/archives/buildstream-list/2018-August/msg00013.html
[4]: https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/issues/376
[5]: https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/merge_requests/735
[6]: https://gitlab.com/BuildStream/bst-plugins-experimental/-/issues/2
[7]:
https://gitlab.com/BuildStream/bst-plugins-experimental/-/blob/master/src/bst_plugins_experimental/elements/x86image.yaml#L48
[8]:
https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/blob/dwinship/bst2-fv/use-virtual-directories-in-plugins/plugins/elements/url_manifest.py#L174
[9]:
https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/blob/dwinship/bst2-fv/use-virtual-directories-in-plugins/plugins/elements/collect_initial_scripts.py#L50
_______________________________________________
buildstream-list mailing list
buildstream-list gnome org
https://mail.gnome.org/mailman/listinfo/buildstream-list
[
Date Prev][
Date Next] [
Thread Prev][
Thread Next]
[
Thread Index]
[
Date Index]
[
Author Index]