Re: [BuildStream] [Summary] Plugin fragmentation / Treating Plugins as Sources



Hi !

On Wed, 2019-04-17 at 15:57 +0100, Angelos Evripiotis wrote:
Dear list,


I think Angelos is right that this deserves a summary, even if the
clear decisions which often come with a summary are not necessarily
there yet.

It's a bit challenging to summarize all of these details, so I'll start
by re-stating my initial objectives, and then I'll try to summarize the
pros and cons of both suggested approaches.


Original objectives
~~~~~~~~~~~~~~~~~~~
We have expressed our intention of fragmenting our plugins out of the
core into separate repositories, and encountered some pushback from
downstream package maintainers, who would not be happy maintaining
many different plugin packages. They also pointed out that this is
detrimental to the value of BuildStream, which provides a good set of
plugins that is sufficient for many Linux-based projects.

I feel that for this reason, it is not sensible to fragment the blessed
plugins into a handful of repositories in the current situation; and
if we want to start fragmenting plugin repositories, we should have a
solution which:

  * Avoids any need for distro packaging of plugin files
  * Avoids project users needing to know what plugins a project
    requires and needing to explicitly install those plugins
    in any way

Further than this, I feel that if we *do* have a nice automated
solution for obtaining the plugins, then we should take advantage of
that and fragment our upstream maintained plugins maximally, instead of
fragmenting them into like 4 or 5 repos.

The reasons why I would prefer maximal fragmentation are described well
enough I think in this reply to Adam Coldrick[0].


The solution
~~~~~~~~~~~~
I'll try to summarize here what would be common to both `git` and
`venv` solutions. From what I understand, I think we all agree at this
high level.

  * Plugins would be obtained at project load time (if not already
    obtained), in an automated fashion, similar to how we currently
    fetch junctions in an automated fashion.

  * Projects would have the ability to reference an exact version
    of a given plugin (or a given set of plugins).

    This is particularly relevant for plugins which are not API
    stable, where it becomes important for a given project to specify
    exactly which version of a plugin they know to work.

    Please refer to my initial reply to Chandan[1] (near the top) for
    an explanation of why exactly it is dangerous to use the `pip`
    plugin origin for any set of plugins that is not maintained in an
    API stable way.

    NOTE: My initial reply to Chandan misrepresents his `venv`
          counter proposal. I did not understand that his proposal
          intended the `venv`s to be isolated on a per plugin basis,
          because I did not believe that to be possible.

  * BuildStream would support a `bst plugins track` command.

    In retrospect, I think that supporting this is a little bit
    costly, and much less important than the rest of the mechanics;
    I would be equally happy to just not support this.

So far, I think we can easily agree that the above grants us a safer
experience with regard to plugins, and that it would be a desirable
thing all around (correct me if I'm wrong :)).
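
To make the "if not already obtained" step a bit more concrete, here is
a minimal sketch in Python. The names `ensure_plugin`, `cache_dir` and
the `fetch` callback are made up for illustration and are not
BuildStream API; it also assumes that the ref is filesystem-safe (for
example a commit sha).

```python
import os
import tempfile


def ensure_plugin(cache_dir, name, ref, fetch):
    """Return the local directory holding plugin `name` at exactly `ref`.

    Hypothetical helper: `fetch(name, ref, dest)` is assumed to populate
    the directory `dest`, and is only called when the ref is not cached.
    """
    target = os.path.join(cache_dir, name, ref)
    if os.path.isdir(target):
        # Already obtained by an earlier project load, nothing to do
        return target

    parent = os.path.dirname(target)
    os.makedirs(parent, exist_ok=True)

    # Fetch into a staging directory first, so that an interrupted
    # fetch never leaves a half populated directory at `target`
    staging = tempfile.mkdtemp(dir=parent)
    fetch(name, ref, staging)
    os.rename(staging, target)
    return target
```

Because the cache is keyed on the exact ref, loading the same project
twice only fetches once, and two projects which reference different
versions of the same plugin do not interfere with each other.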


The `git` plugin origin
~~~~~~~~~~~~~~~~~~~~~~~
Essentially, this approach implements the above solution by simply
leveraging the existing code we have to fetch source code, and applying
that to also fetch plugins.

Please refer to the original proposal[2] for a detailed implementation
plan of what exactly this would look like.

Here I will enumerate the negative points raised:

  * This proposal forces plugins to be hosted in git repositories.

  * This proposal downloads unnecessary git history of plugin
    repositories, and adds unnecessary load to upstream git
    repositories which host these plugins.

    I believe this particular point is moot when you consider that
    it is the same for regular source code from git, and that we
    already have SourceCache as a mitigation for this.

  * This proposal only automates the downloads of plugin files,
    and does not automate the installation of third party
    python library dependencies.

    This is the most relevant part of the discussion, because
    we observed that it is in fact not really even safe at this
    time for plugins to *have* arbitrary third party library
    dependencies at all (it is currently only safe for external
    python libraries which we know to really be API stable, so for
    instance, a plugin package currently should never be allowed
    to "pin" the required version of a dependency).

    I raised this detail in my initial reply to Chandan[1]; you
    can search that email for my point entitled:

       "Plugins have external python library dependencies"

    Interestingly, Chandan's venv proposal would solve this problem
    by actually making it safe for plugins to have third party
    python library dependencies.


The `venv` plugin origin
~~~~~~~~~~~~~~~~~~~~~~~~
This approach would aim to implement "the solution" described above by
installing Python packages instead of files.

Each plugin "package" would be installed into a venv which would be
managed by BuildStream, and BuildStream would have to load the plugins
in such a way that each plugin would basically link only to its own
individual venv.

I will start by enumerating the advantages I can see with this
solution:

  * Plugins would have freedom to import and depend on any external
    library they like, and BuildStream would ensure that multiple
    plugins with conflicting version requirements on the same
    dependency do not clash.

  * If we additionally converted the majority of our Source plugins
    to never use host tools, but to always prefer a pure python
    implementation wherever possible, this would very much improve
    the initial ramp up time for any user using BuildStream, as
    they would almost never need to install any host tool.

  * We would not need to bless any specific technique for hosting
    the plugins, people could have the option to publish plugins
    on PyPI, or use their preferred VCS so long as `pip` has support
    for installing from that VCS.

And the negative points:

  * Most importantly, Angelos has done some research into this
    and (as I rather suspected), a solution of this nature does
    not appear to be all that viable or trustworthy.

    See his posts here[3] and here[4].

  * This would imply a lot more bookkeeping to be implemented
    in BuildStream.

    - Bookkeeping of hashed venvs for any given plugin ref

    - Taking the python interpreter version into account in
      such hashing, so that we can recreate a new venv when the
      user upgrades their host python from python3.6 to python3.7,
      for example.

    - We would probably want to avoid downloading the same dependency
      twice for two separate plugins which require a common dependency,
      which might (eventually) mean a local PyPI mirror/cache of sorts.

    Essentially this would be a lot of local state to keep up to
    date, not to mention we would have to write a lot of new code
    when compared to my original `git` origin proposal.

  * The plugin tracking implementation would probably be futile.

    We would want to support not only hosting of plugins on PyPI,
    but also allow use of any VCS which pip supports (one of the
    advantages of this proposal), but this would be very complicated
    to support for tracking purposes.

    I don't think this is a deal breaker because I don't think
    tracking of plugins is a very important part of the solution
    anyway.
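
To give an idea of the "hashed venvs" bookkeeping mentioned above,
here is a rough sketch of how a venv could be keyed on both the plugin
ref and the interpreter version. The function name `venv_cache_key`
and the format of `plugin_ref` are assumptions made for illustration,
not anything BuildStream provides; using only the major.minor version
is also an assumption.

```python
import hashlib
import sys


def venv_cache_key(plugin_ref, python_version=None):
    """Return a stable key identifying the venv for one plugin ref.

    Hypothetical helper: `plugin_ref` is any string which pins the
    plugin exactly (e.g. a package name plus version, or a VCS url
    plus commit).
    """
    if python_version is None:
        # Assumption: only major.minor is significant for the key
        python_version = "{}.{}".format(*sys.version_info[:2])

    payload = "python={}\nref={}".format(python_version, plugin_ref)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With a key like this, upgrading the host python from 3.6 to 3.7 yields
a different key for the same ref, so a new venv would be created
instead of reusing a stale one.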


Why use external dependencies ?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I think the context of why a plugin *would* want an external python
library dependency is important to consider in this conversation.

There is a pretty thorough writeup in my reply to Adam Coldrick[5]
for more details, but to summarize my opinions from it:

  * Really, the only reason is for Sources to talk to external
    tooling and implement Source.fetch() and Source.track().

  * Elements create output from input, they should not cause
    any side effects, so there is not much reason to desire
    an external library here.

  * If the venv proposal were to work, it would be great
    to maximize use of third party libraries (for the easier
    ramp-up time I described above).

  * In general, we would even prefer that Plugins always call
    BuildStream provided APIs, even over the stable, standard
    python library.

    If plugins mostly call only BuildStream crafted APIs, that
    gives BuildStream more freedom to add features and support
    additional platforms, by reimplementing the functions outlined
    by BuildStream's API contract, and Plugins are more portable
    this way.

    An example of this is the virtual directory APIs we've been
    adding: if we never give the plugin the right to access
    host file paths with the standard library at all, then we
    have more freedom to change the core and implement interesting
    things like remote execution.

As Sam Thursfield pointed out[6], the dependency on the `requests`
library could instead be satisfied by BuildStream itself providing a
more efficient API for fetching files, so BuildStream would instead
depend on the `requests` library, and have the freedom to redefine how
this is implemented in the future (I quite like this idea in fact).
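
As a rough illustration of that idea, the sketch below shows a core
side helper which hides the HTTP library behind a small API. The
`Downloader` class and its `backend` parameter are hypothetical names
invented for this example, not existing BuildStream API, and the
standard library's `urllib` is used here only as a stand-in for
whatever the core would really use.

```python
import hashlib
import urllib.request


def _urllib_backend(url):
    # One possible backend; the core could later swap this for
    # `requests` or anything else without plugins noticing
    with urllib.request.urlopen(url) as response:
        return response.read()


class Downloader:
    """Hypothetical core helper: plugins never import an HTTP library."""

    def __init__(self, backend=_urllib_backend):
        self._backend = backend

    def fetch(self, url):
        """Return a (data, sha256 hex digest) tuple for `url`."""
        data = self._backend(url)
        return data, hashlib.sha256(data).hexdigest()
```

A Source plugin would then only ever call `Downloader.fetch()`, and the
choice of HTTP library would remain a private detail of the core.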


Summary of summary
~~~~~~~~~~~~~~~~~~
So I'd like to hear what people think.

I personally think the venv approach is overkill for our needs, but
have remained open to it because it also presents advantages, as long
as it is actually feasible (I did not believe it was at the beginning
of this thread, then was optimistic for a time, but am now doubtful
again after Angelos's assessments).

Cheers,
    -Tristan


---
[0]: https://mail.gnome.org/archives/buildstream-list/2019-April/msg00039.html
[1]: https://mail.gnome.org/archives/buildstream-list/2019-April/msg00026.html
[2]: https://mail.gnome.org/archives/buildstream-list/2019-April/msg00022.html
[3]: https://mail.gnome.org/archives/buildstream-list/2019-April/msg00047.html
[4]: https://mail.gnome.org/archives/buildstream-list/2019-April/msg00051.html
[5]: https://mail.gnome.org/archives/buildstream-list/2019-April/msg00041.html
[6]: https://mail.gnome.org/archives/buildstream-list/2019-April/msg00050.html


