Re: [BuildStream] [Summary] Plugin fragmentation / Treating Plugins as Sources
- From: Tristan Van Berkom <tristan vanberkom codethink co uk>
- To: Angelos Evripiotis <angelos evripiotis gmail com>, Thomas Coldrick <othko97 gmail com>
- Cc: Adam Coldrick <adam coldrick codethink co uk>, buildstream-list gnome org
- Date: Thu, 18 Apr 2019 20:05:37 +0900
Hi !
On Wed, 2019-04-17 at 15:57 +0100, Angelos Evripiotis wrote:
> Dear list,
I think Angelos is right that this deserves a summary, even if the
clear decisions which often come with a summary are not necessarily
there yet.
It's a bit challenging to summarize all of these details, so I'll start
by re-stating my initial objectives, and then I'll try to summarize the
pros and cons of both suggested approaches.
Original objectives
~~~~~~~~~~~~~~~~~~~
We have expressed our intention of fragmenting our plugins out of the
core into separate repositories, and encountered some pushback from
downstream package maintainers, who would not be happy maintaining
many different plugin packages, and who also pointed out that this is
detrimental to the value of BuildStream, which provides a good set of
plugins that is sufficient for many Linux-based projects.
I feel that for this reason, it is not sensible to fragment the blessed
plugins into a handful of repositories in the current situation; and
if we want to start fragmenting plugin repositories, we should have a
solution which:
* Avoids any need for distro packaging of plugin files
* Avoids project users needing to know what plugins a project
requires and needing to explicitly install those plugins
in any way
Further than this, I feel that if we *do* have a nice automated
solution for obtaining the plugins, then we should take advantage of
that and fragment our upstream maintained plugins maximally, instead of
fragmenting them into only 4 or 5 repos.
The reasons why I would prefer maximal fragmentation are described well
enough, I think, in this reply to Adam Coldrick[0].
The solution
~~~~~~~~~~~~
I'll try to summarize here what would be common to both `git` and
`venv` solutions. From what I understand, I think we all agree at this
high level.
* Plugins would be obtained at project load time (if not already
obtained), in an automated fashion, similar to how we currently
fetch junctions in an automated fashion.
* Projects would have the ability to reference an exact version
of a given plugin (or a given set of plugins).
This is particularly relevant for plugins which are not API
stable, where it becomes important for a given project to specify
exactly which version of a plugin they know to work.
Please refer to my initial reply to Chandan[1] (near the top) for
an explanation of why exactly it is dangerous to use the `pip`
plugin origin for any set of plugins that is not maintained in an
API stable way.
NOTE: My initial reply to Chandan misrepresents his `venv`
counter proposal. I did not understand that his proposal
intended the `venv`s to be isolated on a per plugin basis,
because I did not believe that to be possible.
* BuildStream would support a `bst plugins track` command.
In retrospect, I think that supporting this is a little bit
costly and much less important than the rest of the mechanics,
so I would be equally happy to just not support this.
So far, I think we can easily agree that the above grants us both a
safer experience with regards to plugins, and it would all around be a
desirable thing (correct me if I'm wrong :)).
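To make the common ground above a bit more concrete, here is a minimal
sketch of "obtain at load time, if not already obtained, at an exact
version". Everything in it is hypothetical and for illustration only:
`PluginDeclaration`, `PluginCache` and `load_project_plugins` are names I
am making up here, not existing BuildStream API, and the `fetch` callable
stands in for whichever mechanism (`git` or `venv`) ends up doing the
real work.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class PluginDeclaration:
    """Hypothetical project-level declaration of a plugin and its exact version."""
    name: str
    url: str
    ref: str  # exact version the project is known to work with


class PluginCache:
    """Obtains each (url, ref) pair at most once, on demand."""

    def __init__(self, fetch: Callable[[str, str], str]):
        # fetch(url, ref) returns a local location for the plugin;
        # it is an assumption standing in for the real origin backend.
        self._fetch = fetch
        self._obtained: Dict[Tuple[str, str], str] = {}

    def ensure(self, decl: PluginDeclaration) -> str:
        key = (decl.url, decl.ref)
        if key not in self._obtained:
            self._obtained[key] = self._fetch(decl.url, decl.ref)
        return self._obtained[key]


def load_project_plugins(declarations: Iterable[PluginDeclaration],
                         cache: PluginCache) -> Dict[str, str]:
    """Resolve every declared plugin to a local location at project load time."""
    return {decl.name: cache.ensure(decl) for decl in declarations}
```

The point of keying on the exact ref is that two projects pinning
different versions of the same plugin never see each other's copy.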
The `git` plugin origin
~~~~~~~~~~~~~~~~~~~~~~~
Essentially, this approach implements the above solution by simply
leveraging the existing code we have to fetch source code, and applying
that to also fetch plugins.
Please refer to the original proposal[2] for a detailed implementation
plan of what exactly this would look like.
Here I will enumerate the negative points raised:
* This proposal forces plugins to be hosted in git repositories.
* This proposal downloads unnecessary git history of plugin
repositories, and adds unnecessary load to upstream git
repositories which host these plugins.
I believe this particular point is moot when you consider that
it is the same for regular source code from git, and that we
already have SourceCache as a mitigation for this.
* This proposal only automates the downloads of plugin files,
and does not automate the installation of third party
python library dependencies.
This is the most relevant part of the discussion, because
we observed that it is in fact not really even safe at this
time for plugins to *have* arbitrary third party library
dependencies at all (it is currently only safe for external
python libraries which we know to really be API stable, so for
instance, a plugin package currently should never be allowed
to "pin" the required version of a dependency).
I raised this detail in my initial reply to Chandan[1]; you
can search that email for my point entitled:
"Plugins have external python library dependencies"
Interestingly, Chandan's venv proposal would solve this problem
by actually making it safe for plugins to have third party
python library dependencies.
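As a rough illustration of why pinning is unsafe when all plugins share
one interpreter, the sketch below looks for dependencies that different
plugins pin to different versions. The function name and the shape of
the input are my own assumptions for the example; nothing like this
exists in BuildStream today.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_pin_conflicts(
    plugin_pins: Dict[str, Dict[str, str]]
) -> Dict[str, List[Tuple[str, str]]]:
    """Return the dependencies which are pinned to more than one version.

    plugin_pins maps plugin name -> {dependency name: pinned version}.
    With a single shared interpreter only one version of each dependency
    can be importable, so any entry returned here is a clash.
    """
    by_dependency = defaultdict(list)
    for plugin, pins in sorted(plugin_pins.items()):
        for dependency, version in sorted(pins.items()):
            by_dependency[dependency].append((plugin, version))

    return {
        dependency: users
        for dependency, users in by_dependency.items()
        if len({version for _, version in users}) > 1
    }
```

With per plugin isolation this check becomes unnecessary, since each
plugin would resolve its pins independently of the others.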
The `venv` plugin origin
~~~~~~~~~~~~~~~~~~~~~~~~
This approach would aim to implement "the solution" described above by
installing pythonic packages instead of files.
Each plugin "package" would be installed into a venv which would be
managed by BuildStream, and BuildStream would have to load the plugins
in such a way that each plugin would basically link only to its own
individual venv.
I will start by enumerating the advantages I can see with this
solution:
* Plugins would have the freedom to import and depend on any external
library they like, and BuildStream would ensure that multiple
plugins with conflicting version requirements on the same
dependency do not clash.
* If we additionally converted the majority of our Source plugins
to never use host tools, but to always prefer a pure python
implementation wherever possible, this would very much improve
the initial ramp up time for any user using BuildStream, as
they would almost never need to install any host tool.
* We would not need to bless any specific technique for hosting
the plugins, people could have the option to publish plugins
on PyPI, or use their preferred VCS so long as `pip` has support
for installing from that VCS.
And the negative points:
* Most importantly, Angelos has done some research into this
and (as I rather suspected), a solution of this nature does
not appear to be all that viable or trustworthy.
See his posts here[3] and here[4].
* This would imply a lot more bookkeeping to be implemented
in BuildStream.
- Bookkeeping of hashed venvs for any given plugin ref
- Taking the python interpreter version into account in
such hashing, so that we can recreate a new venv when the
user upgrades their host python from python3.6 to python3.7,
for example.
- We would probably want to avoid downloading the same dependency
twice for two separate plugins which require a common dependency,
which might (eventually) mean a local PyPI mirror/cache of sorts.
Essentially this would be a lot of local state to ensure is
always up to date, not to mention we would have to write a lot
of new code when compared to my original `git` origin proposal.
* The plugin tracking implementation would probably be futile.
We would want to support not only hosting of plugins on PyPI,
but also allow use of any VCS which pip supports (one of the
advantages of this proposal), but this would be very complicated
to support for tracking purposes.
I don't think this is a deal breaker because I don't think
tracking of plugins is a very important part of the solution
anyway.
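For the bookkeeping point above, the hashing part could look roughly
like the following. This is only a sketch under my own assumptions: the
function name `venv_cache_key` and the choice of inputs (origin, ref and
the interpreter's major.minor version) are illustrative, and a real
implementation would likely need to consider more state than this.

```python
import hashlib
import sys
from typing import Optional, Tuple


def venv_cache_key(plugin_origin: str, plugin_ref: str,
                   python_version: Optional[Tuple[int, int]] = None) -> str:
    """Derive a directory name for the venv holding one plugin ref.

    The interpreter's major.minor version is part of the key, so
    upgrading the host python (say from 3.6 to 3.7) yields a different
    key and therefore a freshly created venv.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    major, minor = python_version
    material = "\n".join(
        [plugin_origin, plugin_ref, "python{}.{}".format(major, minor)]
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Computing the key is the cheap part; the cost I am worried about is
keeping the venvs behind those keys valid and up to date.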
Why use external dependencies ?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I think the context of why a plugin *would* want an external python
library dependency is important to consider in this conversation.
I have a pretty thorough writeup in my reply to Adam Coldrick[5]
for more details, but to summarize my opinions from that reply:
* Really, the only reason is for Sources to talk to external
tooling and implement Source.fetch() and Source.track().
* Elements create output from input, they should not cause
any side effects, so there is not much reason to desire
an external library here.
* If the venv proposal were to work, it would be great
to maximize use of third party libraries (for the easier
ramp-up time I described above).
* In general, we prefer even that Plugins should always call
BuildStream provided API, even over the stable, standard
python library.
If plugins mostly call only BuildStream crafted APIs, that
gives BuildStream more freedom to add features and support
additional platforms, by reimplementing the functions outlined
by BuildStream's API contract, and Plugins are more portable
this way.
An example of this is the virtual directory APIs we've been
adding: if we never gave the plugin the rights to access
host file paths with the standard library at all, then we
would have more freedom to change the core and implement
interesting things like remote execution.
As Sam Thursfield pointed out[6], the dependency on the `requests`
library could instead be satisfied by BuildStream itself providing a
more efficient API for fetching files, so BuildStream would instead
depend on the `requests` library, and have the freedom to later
redefine how this is implemented in the future (I quite like this idea
in fact).
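A minimal sketch of what that could look like from the plugin's side is
below. The names `CoreDownloader`, `fetch_file` and `ExampleTarSource`
are hypothetical and do not exist in BuildStream; the backend callable is
an assumption standing in for whatever the core uses underneath
(`requests` today, possibly something else later).

```python
from typing import Callable


class CoreDownloader:
    """Hypothetical core-side helper which plugins call instead of
    importing an HTTP library themselves."""

    def __init__(self, backend: Callable[[str], bytes]):
        # The backend is the only place that knows how bytes are
        # obtained, so the core can replace it without touching plugins.
        self._backend = backend

    def fetch_file(self, url: str) -> bytes:
        return self._backend(url)


class ExampleTarSource:
    """Hypothetical source plugin which depends only on the core helper."""

    def __init__(self, url: str, downloader: CoreDownloader):
        self.url = url
        self._downloader = downloader

    def fetch(self) -> bytes:
        # No third party import here: the plugin has no library
        # dependency of its own to install or pin.
        return self._downloader.fetch_file(self.url)
```

The plugin is the same whether the backend hits the network or reads
from a local cache, which is the freedom I was describing above.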
Summary of summary
~~~~~~~~~~~~~~~~~~
So I'd like to hear what people think.
I personally think the venv approach is overkill for our needs, but
have remained open to it because it also presents advantages, as long
as it is actually feasible (which I did not believe at all at the
beginning of this thread; I was optimistic for a time, but am now
doubtful again after Angelos's assessments).
Cheers,
-Tristan
---
[0]: https://mail.gnome.org/archives/buildstream-list/2019-April/msg00039.html
[1]: https://mail.gnome.org/archives/buildstream-list/2019-April/msg00026.html
[2]: https://mail.gnome.org/archives/buildstream-list/2019-April/msg00022.html
[3]: https://mail.gnome.org/archives/buildstream-list/2019-April/msg00047.html
[4]: https://mail.gnome.org/archives/buildstream-list/2019-April/msg00051.html
[5]: https://mail.gnome.org/archives/buildstream-list/2019-April/msg00041.html
[6]: https://mail.gnome.org/archives/buildstream-list/2019-April/msg00050.html