[BuildStream] [Proposal] Plugin fragmentation / Treating Plugins as Sources



Hi all,

I'm afraid this is going to be a long and beefy proposal, so I have
included a couple of TLDRs inline for those who want to skim it.

This is a beefier and more fully specified version of the casual
proposal I raised last November, to leverage the same mechanisms we use
for sources in order to obtain and use plugins:

    https://mail.gnome.org/archives/buildstream-list/2018-November/msg00065.html

The main motivation for this proposal stems from apparent friction
around fragmenting of plugins. On the one hand, it is convenient for
some to have more fragmentation, and on the other hand, it is more
convenient for others to have all the blessed plugins in a central
location such that they are easier to package and discover.

This proposal aims primarily to eliminate the technical problems
surrounding fragmentation of plugin repositories, thus allowing us to
maintain plugins in many separate repositories without causing any
inconvenience.

A very detailed proposal follows below.

Cheers,
    -Tristan


Problem statement(s)
--------------------

Fragmented vs not fragmented
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TLDR: There are advantages to maximally fragmented plugins across
      various repositories, and also advantages to centrally located
      plugin repositories (e.g. bst-plugins-good).

      This proposal attempts to reconcile the above by stating that:
      * Plugins should be split up into domain specific repositories
      * Plugins should not be packaged at all
      * Plugins are source code, and BuildStream will automatically
        obtain them in the same way that we track and fetch any other
        source code, so that it is a painless experience for the user.

In our meeting way back in October, we started talking about removing
plugins from BuildStream core.

One of the main motivators for this was that by having plugins in core,
we create an ecosystem where it is perceived that a plugin should
eventually land in the upstream BuildStream git repository - and that
would symbolize the plugin as being "blessed" and maintained in
BuildStream.

We discussed a few approaches to how we could split plugins, and at the
time we decided that if we were to have a single, separate repository
where BuildStream-maintained plugins would reside, it would only move
the same problem of perception to a different repository.

While we did not end up with a concrete plan, general consensus was
that we should fragment plugins into separate domain specific
repositories.

When we finally started discussing how we would fragment plugins,
Mathieu Bridon raised some valid concerns which we had not considered
at the time:

    https://mail.gnome.org/archives/buildstream-list/2019-February/msg00059.html

Essentially we would be making life harder both for downstream package
maintainers and also for users who deserve something that "just works",
without needing to know what various plugin packages one might need to
install before running `bst build`.

My aim with this proposal is to change the landscape such that this
friction is unneeded, and that we can have as much fragmentation of
plugins as we like, without causing any unnecessary pain to downstream
packagers or users.


Stable vs unstable
~~~~~~~~~~~~~~~~~~
TLDR: There are reasons to support both stable and unstable plugins;
      stable plugins allow long term stability of cache keys, and
      unstable plugins allow more freedom of movement.

      This proposal attempts to address the above by stating that:

      * A project can always pin the exact version of a plugin by
        retaining its 'ref'

      * A project can track new versions of the plugin in the same way
        it tracks source refs

      * A project can choose which plugins are "trusted" to be stable,
        and decide whether to trust the plugin's BST_ARTIFACT_VERSION,
        or whether the plugin's 'ref' should be used instead

For the majority of the lifetime of the bst-external repository, we
have been recommending that it should be used as a 'local' plugin and
added to a BuildStream project as a git submodule.

Since these plugins were not stable, it would not be safe to install
them in a common location in advance of having stable API guarantees.

But people hate submodules, and even before bst-external was
accidentally declared stable, downstream projects had already been
recommending that people install the plugins on their system:

    https://gitlab.com/freedesktop-sdk/freedesktop-sdk/blob/master/project.conf#L256
    https://gitlab.gnome.org/GNOME/gnome-build-meta/blob/master/project.conf#L302

Further, unstable plugins have not necessarily been following the rules
described here:

    https://docs.buildstream.build/buildstream.plugin.html#buildstream.plugin.Plugin.BST_FORMAT_VERSION
    https://docs.buildstream.build/buildstream.element.html#buildstream.element.Element.BST_ARTIFACT_VERSION

NOTE: For BuildStream core, we don't increment these versions either;
      since they are distributed with the core we deem it safe to only
      control these versions at a global level instead of individually
      for each plugin.

Of course, this has resulted in some unexpected changes in build output
for the same cache key (depending on what version of bst-external was
installed). This in turn led to a recent feature request for stricter
control, allowing one to declare that if a given plugin changes in any
way, the change should be considered in the cache key:

   https://gitlab.com/BuildStream/buildstream/issues/953


Proposal
--------
As this proposal has multiple problem statements, the proposed changes
are equally multi-pronged.


Project facing API
~~~~~~~~~~~~~~~~~~
In order for a project author to use a plugin with this method, we
would introduce a new "plugin origin", as those are described here:

    https://docs.buildstream.build/format_project.html#external-plugins

Entries in the plugin list using this new "origin" would be expressed
much like a git source, and additionally allow specification of some
other details such as whether the plugins in the given origin are
"trusted".

Unlike other origins, I hope we would not need to support the
dictionary which defines which format version number of each plugin is
required; I think this information is redundant if we already control
the version which will be used via a git 'ref' of the plugin
repository.

We could call this origin the "git" origin:

Example:

  plugins:

  # The declaration of a single git origin
  - origin: git

    # Regular git related fields
    url: https://foo.org/bar.git
    track: master
    ref: ....

    # Where the plugins are located inside the repo.
    #
    # These could default to 'elements' and 'sources',
    # such that users need not actually specify them.
    #
    element-path: elements
    source-path: sources

    # Whether plugins from this repo are "trusted"
    trusted: True
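
To make the defaults described in the comments above a bit more
concrete, here is a minimal Python sketch of how one such entry might
be validated once the YAML has been parsed into a dictionary. This is
illustration only: `GitOriginSpec` and `parse_git_origin` are
hypothetical names and not existing BuildStream API, and defaulting
'trusted' to False is my assumption, since the proposal does not state
a default.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class GitOriginSpec:
    """Hypothetical in-memory form of one 'git' plugin origin entry."""
    url: str
    track: Optional[str] = None
    ref: Optional[str] = None
    element_path: str = "elements"   # default suggested in the example
    source_path: str = "sources"     # default suggested in the example
    trusted: bool = False            # assumed default, not specified


def parse_git_origin(entry: Mapping[str, Any]) -> GitOriginSpec:
    """Validate one already-parsed entry of the 'plugins' list."""
    if entry.get("origin") != "git":
        raise ValueError("not a 'git' origin: {!r}".format(entry.get("origin")))
    if "url" not in entry:
        raise ValueError("a 'git' origin requires a 'url'")
    return GitOriginSpec(
        url=entry["url"],
        track=entry.get("track"),
        ref=entry.get("ref"),
        element_path=entry.get("element-path", "elements"),
        source_path=entry.get("source-path", "sources"),
        trusted=bool(entry.get("trusted", False)),
    )
```

With this, an entry which only specifies 'url' and 'track' would end up
with 'elements' and 'sources' as its plugin locations.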

For the purposes of tracking, I think it is important to perform
tracking of plugins completely separately from elements and junctions.

This would mean we would need to introduce a new command:

   bst source track-plugins

or even just:

   bst track-plugins

We could extend this to allow tracking of individual plugin repos, but
this would require naming the git origins described in the project.conf
above, which does seem a bit more wordy than necessary.

If people prefer that each origin be named, and that we can track them
individually, I can do that.

For projects which use `project.refs` storage; plugin refs would be
stored separately in `plugin.refs`, similarly to how junction refs are
stored in `junction.refs`.
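
As a rough sketch of what a plugin tracking pass could compute, the
following Python collects a new ref for every git origin which declares
a 'track' branch. Everything here is an assumption for illustration:
`track_plugin_origins` is a hypothetical name, `resolve_ref` stands in
for whatever actually asks the remote for the latest commit, and keying
the result by URL is only a stopgap for the fact that origins are
unnamed. The on-disk layout of `plugin.refs` is deliberately not
addressed.

```python
from typing import Any, Callable, Dict, Iterable, Mapping


def track_plugin_origins(
    origins: Iterable[Mapping[str, Any]],
    resolve_ref: Callable[[str, str], str],
) -> Dict[str, str]:
    """Return a mapping of origin URL to newly tracked ref.

    Only 'git' origins with a 'track' value are considered; elements
    and junctions are not touched at all by this pass.
    """
    refs = {}
    for origin in origins:
        if origin.get("origin") != "git":
            continue
        track = origin.get("track")
        if track is None:
            continue
        refs[origin["url"]] = resolve_ref(origin["url"], track)
    return refs
```

The returned mapping is the information which would need to be written
either back into project.conf or into `plugin.refs`, depending on the
ref storage the project uses.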


Fragmentation and hosting of plugins
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section does not relate to the implementation, but rather the
wider scope plan of how we intend to split up plugins.

If we go ahead with this proposal, I would prefer to have a maximum
fragmentation of plugins.

This would have the following benefits:

  * Projects have finer grained control of which version to use for
    each plugin they do use.

  * Plugin developers have more freedom: they can do more experimental
    things in branches and test them out in projects without needing to
    synchronize with other plugins in the same repository.

  * Clearer documentation story.

    I would imagine that upstream BuildStream maintains only a list of
    all plugin repositories and those are easily identifiable by their
    name.

    The list should link to the documentation of each individual plugin
    and hopefully resemble the existing lists here:

      https://docs.buildstream.build/core_plugins.html

I would imagine that the existing plugins we do currently maintain as
"core BuildStream" would be placed in subgroups of the BuildStream
group on gitlab.

Something like the following would be pretty intuitive:

    https://gitlab.com/BuildStream/source-plugins/ostree
    https://gitlab.com/BuildStream/source-plugins/bzr
    https://gitlab.com/BuildStream/source-plugins/tar
    ...

    https://gitlab.com/BuildStream/element-plugins/autotools
    https://gitlab.com/BuildStream/element-plugins/meson
    ...

Note that, contrary to some interpretations of my casual proposal last
November, there is no infrastructure or registry or anything of the
sort to be maintained; we just download plugins in the same way we
download sources.

The only thing to maintain is documentation which pertains mostly to
discoverability of existing plugins.

Further, I would propose that plugins which exist in these
'source-plugins' and 'element-plugins' subgroups represent the
"blessed" set of upstream plugins which upstream has agreed to
maintain.

For the documentation page, I would propose a second section of
non-blessed plugins where we have a very lax policy, i.e. anyone who
wants us to link from our documentation to a plugin they have developed
will be accepted; we just add a link to their repository in the
"external plugins" section.


GitOrigin
~~~~~~~~~
For starters, this will involve creating a Source implementation to
back the above API. Let's call this object a "GitOrigin" as it fits
with the API above.

This Source will of course be something like a `git` source, and will
derive most functionality from `_gitsourcebase.py` (this means that
even if we do move the `git` source itself outside of BuildStream, we
need to keep `_gitsourcebase.py` inside BuildStream core).

Instead of loading GitOrigin through the regular plugin mechanics, the
BuildStream core will just instantiate these sources directly.


SourceCache
~~~~~~~~~~~
I have not given much consideration to how this plays with the new
SourceCache.

My intent is to have GitOrigin actually be a Source and behave in the
same way as much as possible; it will fetch into the regular 'sources'
location on the local filesystem.

I think it is ideal that SourceCache is also used for GitOrigin
plugins, but non-essential, as the plugins themselves are not needed in
a remote execution context (there is no need to share them with remote
workers in a shared CAS).

As the above is equally true for junctions as it is for plugins, I
think an initial implementation will behave the same as junctions do.


Cache Keys and "trusted" attributes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Plugins which have been loaded via a GitOrigin will be handed the
"trusted" and "ref" attributes of the creating GitOrigin at
instantiation time, allowing the base plugin classes to contribute to
the cache key differently.

I.e. they can consider the "ref" instead of the BST_ARTIFACT_VERSION in
the case of an untrusted/unstable plugin.
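
To illustrate the idea, here is a small Python sketch of how the
plugin's contribution to a cache key could switch between the two. This
is not how BuildStream computes cache keys today; the function names,
the dictionary layout and the use of JSON plus SHA-256 are all
assumptions made purely so the example is self-contained.

```python
import hashlib
import json
from typing import Optional


def plugin_key_component(artifact_version: int,
                         ref: Optional[str],
                         trusted: bool) -> dict:
    """Choose what the plugin contributes to an element's cache key."""
    if trusted:
        # Trusted plugin: rely on its declared BST_ARTIFACT_VERSION.
        return {"artifact-version": artifact_version}
    if ref is None:
        raise ValueError("an untrusted plugin needs a ref for its cache key")
    # Untrusted plugin: any change of the plugin's ref changes the key.
    return {"ref": ref}


def cache_key(element_config: dict, plugin_component: dict) -> str:
    """Hash the element configuration together with the plugin component."""
    payload = json.dumps(
        {"config": element_config, "plugin": plugin_component},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

In this sketch, bumping the ref of a trusted plugin leaves the key
unchanged unless the artifact version also changes, while bumping the
ref of an untrusted plugin always produces a new key.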


Program Flow
~~~~~~~~~~~~
The behavior will be similar to that of the loading of junctions.

I think there are already some inconsistencies as to when we do or
don't fetch junctions at startup time. I would hope to reconcile the
behavior of fetching junctions to be more consistent (behave in the
same way for every command if possible), and I would then make the
fetching of plugins have the same behavior.

The fetching and staging of GitOrigin plugins will occur in the first
pass of the project load, after resolving any aliases and such which
might be used in the origin itself, and before loading any elements or
attempting to search for junctions.

Also similarly to junctions, the plugins will be staged in the
project's local state directory under a directory named by the cache
key of the GitOrigin itself, at:

    .bst/staged-plugins/${CACHE_KEY}/
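
As a sketch of the directory naming only, the following Python derives
a key for an origin and builds the staging path from it. Which fields
actually feed the GitOrigin's cache key is not decided in this
proposal, so the set used here (url, ref and the two paths), as well as
the function names, are assumptions for the sake of illustration.

```python
import hashlib
import json
import os


def origin_cache_key(url, ref, element_path="elements", source_path="sources"):
    """Derive a stable key from the fields assumed to identify an origin."""
    fields = {
        "url": url,
        "ref": ref,
        "element-path": element_path,
        "source-path": source_path,
    }
    encoded = json.dumps(fields, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def staging_directory(project_dir, key):
    """Return the per-origin staging directory under the project's .bst/."""
    return os.path.join(project_dir, ".bst", "staged-plugins", key)
```

Since the ref is part of the key, two different refs of the same
plugin repository would be staged in two different directories and
would not overwrite each other.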

It would seem to be nice if one were able to YAML include '(@)' the
GitOrigin across junction boundaries, but I don't think this is easily
possible as it would be a rather circular exercise: we at least need
the source plugins in order to fetch a junction which provides the file
to include defining where that source plugin comes from.

I think by extension of the same logic, and due to current
implementation of the load cycle, it may also not be possible to use
conditional statements of project options around loading of the
plugins. This seems conceptually possible and if so, then it will just
work; but I don't think it is important to support for an initial
implementation and I am not considering this as a priority.


