Re: BuildStream as a Library



Hi Sander,

Ok so, this will require some more thought...

On Tue, 2018-04-24 at 13:08 +0000, Sander Striker wrote:
Hi,

On Tue, Apr 24, 2018 at 8:53 AM Tristan Van Berkom <tristan vanberkom codethink co uk> wrote:
On Mon, 2018-04-23 at 16:17 +0000, Benjamin Schubert wrote:
Hey Everyone,

Hi Benjamin,

We were wondering if BuildStream is, or will ever be, considered usable as a
library? Do you provide a stable API other than plugins and config files,
or would it be possible to help design an API that could be used by another
Python program?

[...] 
First of all I'd like to say that I'm enthusiastic about some of the
things which could be done with BuildStream as a library, and have
designed things such that the frontend code is fairly isolated and
separate from the core.

However, I have to say that I'm highly doubtful that we can offer a
stable API in the short term; maybe this is something we can consider
for 1.4 (the next, next stable series). There is a fair amount of
refactoring I want to get done, especially around _pipeline.py and the
initialization codepaths; I also think we need the freedom to churn the
internals for some time.

I think that is fine.  For now it is enough to have an understanding
of what the API surface would be; the burden is on the authors of this
external tool to deal with API instability for the time being.  This
can be achieved by pinning to BuildStream releases or by actively
tracking changes and keeping up to date.

Looking forward, the last piece of mess to untangle is the Pipeline,
which unfortunately is the main calling interface for executing
commands at the same time as being the code where pipelines (lists of
elements) are constructed.

Untangling this into separate pieces is the first step towards any sane
longer term public API surface.

That said, I'd like to highlight a few things:

* Your tooling needs to get the original tracking URLs and
  repositories related to open workspaces.

* Open workspaces risk diverging from the BuildStream project
  definitions, i.e. a user may very well update their project
  through the VCS and still have open workspaces, in which case
  the branches etc. related to the open workspace can change.

This is actually expected for this tool; the branch will be used to generate
a pull request from, with the target either being the tracking branch or
the default branch of the repository.

* The `bst workspace list` command is probably one of the fastest
  running codepaths, as it does not require loading the project.
  It also intentionally reports parsable YAML, which is a stable
  interface (in contrast with the private .bst/workspaces.yml file,
  which is not a stable interface).

Seeing as I feel it's unrealistic to open up more public-facing Python
API at least for a fair amount of time, do you think it would suit your
purposes instead to record and track more information in the workspace
metadata, such that it can be reported through `bst workspace list`?

Do you suggest copying the information that the Source plugins have
on the element into workspaces.yml?  I'm not sure if making that
information part of the public API of `bst workspace list` is
desirable.  I'm also not sure if that would open us up to the
information going stale.

This of course depends on the perspective. The reason I was suggesting
this approach is entirely that I expect the tracking information which
was used at `bst workspace open` to be more relevant than whatever data
happens to be in the project at a given time.

Of course, tracking branches can also have been updated in the
workspace independently of the project, so the relevant information
here really depends on what you mean to do with it - in the end, you
might find yourself wanting both and presenting the user with a choice
when things diverge.

Would we include all tracking data, including refs?

Right, one way would be to serialize the original source definition
YAML *as is*.

One thing that will likely be an obstacle for this is that BuildStream
itself has no idea what the tracking information *is*; we only delegate
the activity of tracking to Source implementations, and recommend, as a
matter of consistency, that Source implementations use the word "track"
in their own custom YAML configurations to represent a tracking branch.

This may boil down to introducing an ability, from a library
perspective, to report the original Source configuration YAML as loaded
by BuildStream. The alternative would be to force plugins to report
symbolic tracking information as we do with "refs" (which we do
purely in order to have the ability to save them where we want to after
performing a delegated tracking operation).

In this way, we could potentially record the layout of sources in a
workspace, their upstream URLs and tracking metadata at the time
someone opens a workspace, such that it could be reported in a stable
way; and also have the benefit of being guaranteed to be the
information which was in use when the workspace was opened.
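
As a rough illustration of the idea above (not something BuildStream
implements today), here is a minimal Python sketch. It assumes the
workspace metadata could carry a verbatim copy of each source's
configuration taken when the workspace is opened. The names
`WorkspaceRecord`, `snapshot_sources` and `diverged_sources` are
hypothetical, and the "kind", "url" and "track" keys are example data
only; the core does not interpret any of them.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class WorkspaceRecord:
    """Hypothetical workspace metadata entry; not a BuildStream type."""
    element: str
    path: str
    # Verbatim copies of the source configs as loaded at open time.
    sources: list = field(default_factory=list)


def snapshot_sources(element, path, source_configs):
    """Record the source configuration *as is*, without interpreting it."""
    return WorkspaceRecord(element, path, copy.deepcopy(source_configs))


def diverged_sources(record, current_configs):
    """Return indices of sources whose current config differs from the snapshot."""
    diverged = []
    for index in range(max(len(record.sources), len(current_configs))):
        recorded = record.sources[index] if index < len(record.sources) else None
        current = current_configs[index] if index < len(current_configs) else None
        if recorded != current:
            diverged.append(index)
    return diverged


# Example data: the 'kind', 'url' and 'track' keys are illustrative only.
opened_with = [{"kind": "git", "url": "upstream:hello.git", "track": "master"}]
record = snapshot_sources("hello.bst", "/home/user/hello", opened_with)

# Later, the project is updated through the VCS and the tracking branch changes.
project_now = [{"kind": "git", "url": "upstream:hello.git", "track": "stable-1.2"}]
changed = diverged_sources(record, project_now)
```

Because the snapshot is a deep copy compared by plain equality, the
core never needs to know which key holds the tracking information; a
tool could use the returned indices to offer the user a choice when
the two diverge.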

I think I'm more +0 than +1 on this approach.


Yes I see.

Another thing that we are currently lacking in BuildStream is a `bst
show` semantic to report information about Sources.

Since this doesn't really fit into the `bst show` API (i.e. 1:N mapping
of elements:sources makes the API weird), it may require a separate
command for showing details about sources.

This might be another alternative avenue for extracting the data you
want from BuildStream with a stable API sooner.
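
To make the 1:N point concrete, here is a small Python sketch of how a
separate, source-oriented listing could flatten elements into one row
per source. Everything in it is hypothetical: `iter_source_rows` is not
a BuildStream function, and the input is a plain dict standing in for
a loaded pipeline.

```python
def iter_source_rows(elements):
    """Yield one flat row per (element, source) pair.

    `elements` is a plain dict mapping an element name to its list of
    source configs; it is a stand-in for a loaded pipeline, not a real
    BuildStream object.
    """
    for element_name, sources in elements.items():
        for index, config in enumerate(sources):
            yield {
                "element": element_name,
                "source": index,
                "kind": config.get("kind"),
            }


# Illustrative data only: one element with two sources, one with none.
pipeline = {
    "hello.bst": [{"kind": "git"}, {"kind": "patch"}],
    "stack.bst": [],
}
rows = list(iter_source_rows(pipeline))
```

In this sketch an element without sources produces no rows at all,
which hints at why the data sits awkwardly in a per-element report.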

To be honest, use cases I can imagine to be more interesting for having
the BuildStream engine exposed in library form might include:

* Arbitrary different frontends, perhaps a graphical UI frontend to
  running BuildStream

* CRUD tools, UIs for modeling pipelines

* Other analytical purposes, anything which might require iterating
  over a loaded pipeline and possibly reading annotations in public
  data for whichever reason

Cheers,
    -Tristan


