Re: [BuildStream] Proposal: BuildStream manifest file generation



Hi Josh,

Thanks for taking the time to write this up to the list; I think it
will help clear some things up regarding #235.

On Thu, 2018-08-02 at 16:40 +0100, Josh Smith via Buildstream-list
wrote:
Hi all,

This is a proposal to allow `bst build` or `bst checkout` to
optionally produce a manifest file describing the collection of
sources used to build the output. This will allow CI to be configured
to produce a build manifest which can be referenced and compared
against for releases.

This comes from a downstream issue in freedesktop and is also
documented as buildstream issue #235.

I think it would be important to understand more precisely what
information they are interested in. As it stands, issue #235 does not
itself specifically call for any new feature from BuildStream, and
could be closed without implementing anything new, i.e.:

    https://gitlab.com/BuildStream/buildstream/issues/235#note_92260502

Problem Statement
~~~~~~~~~~~~~~~~~
BuildStream currently collects a collection of .bst files to
configure and build a collection of artifacts. On a release, project
maintainers may wish to provide a manifest of build sources, which
currently means raking through a collection of .bst files for
sources.

Interestingly, I don't think we understand the same thing by the word
"manifest", although I can see how both interpretations can be
interesting.

  Output manifest
  ~~~~~~~~~~~~~~~
  The list of files produced in a given artifact, or in a set of
  artifacts in a dependency tree (i.e. `--deps run` is interesting
  for a target, as it includes everything which is not a build-only
  dependency).

  Input manifest
  ~~~~~~~~~~~~~~
  A list of inputs which were used to create the project. This
  is mostly covered by the `bst show` invocation I outlined in #235;
  digging deeper than just the bst files is a bit problematic (more
  below).

Proposed Solution
~~~~~~~~~~~~~~~~~
When `bst build` is supplied with an option "--build-manifest" it
will produce a YAML dictionary containing the date/time of the build,
the version of buildstream used, a collection of elements and their
sources (name, url, ref). 

Here is where you hit a technical challenge.

  o A source does not have a name, although it does have a position
    in a list, inside a named target element (.bst file).

  o It is currently impossible for BuildStream core to identify what
    is the URL associated with a given Source.

    The parsing of the URL, ref and such is in the domain of the
    plugins themselves. While extending the API is possible, it is
    not possible to stop supporting plugins which do not implement a
    given API, so the core must be able to fall back gracefully, and
    our functionality is limited by what the plugins in use happen to
    implement (or alternatively, the core can be made to abort
    gracefully when encountering plugins which do not implement
    functionality that is asked for by a given BuildStream invocation).

    Plugins are guaranteed to implement only the original set of APIs;
    what is guaranteed can be gleaned by observing what is optional in
    the plugin facing documentation:

        http://buildstream.gitlab.io/buildstream/buildstream.source.html
        http://buildstream.gitlab.io/buildstream/buildstream.plugin.html
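
To illustrate the fallback problem, here is a rough sketch of how a core
could query an optional API and degrade (or abort) gracefully. Everything
in it is hypothetical: the `Source` class is a stand-in rather than
BuildStream's real base class, and `get_urls()` is an invented optional
method that no existing plugin is known to implement.

```python
class Source:
    """Hypothetical stand-in for a source plugin base class."""

    def get_ref(self):
        raise NotImplementedError

    def get_urls(self):
        # Invented optional method: older plugins never override it.
        raise NotImplementedError


class UrlAwareSource(Source):
    """A plugin written after the optional API was introduced."""

    def __init__(self, url, ref):
        self._url = url
        self._ref = ref

    def get_ref(self):
        return self._ref

    def get_urls(self):
        return [self._url]


class LegacySource(Source):
    """A plugin which only implements the original API."""

    def __init__(self, ref):
        self._ref = ref

    def get_ref(self):
        return self._ref


def describe_sources(element_name, sources, strict=False):
    """Report what can be known about each source of one element.

    Sources have no name, so each is identified by its position in the
    element's source list. With strict=True the core aborts instead of
    reporting a source whose URLs cannot be determined.
    """
    entries = []
    for index, source in enumerate(sources):
        try:
            urls = source.get_urls()
        except NotImplementedError:
            if strict:
                raise RuntimeError(
                    "{} source {} cannot report its URLs".format(element_name, index)
                )
            urls = None  # Unknown, which is not the same as "no URLs"
        entries.append(
            {"element": element_name, "index": index,
             "ref": source.get_ref(), "urls": urls}
        )
    return entries
```

Note that the result distinguishes "unknown" (`None`) from "no URLs" (an
empty list), which is exactly the kind of ambiguity any consumer of such
a report would have to deal with.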

The fact that Sources mostly happen to use the key "ref" to load what
is returned and set by "Source.get_ref()" and "Source.set_ref()" from
the YAML, and that they normally use the key "url" to load the URL from
where something is downloaded, is mostly a matter of following
established precedent, but this cannot be relied upon by the core.

Some sources have no URL, some have more than one URL; git has
optionally configurable extra URLs which allow overriding of the URLs
from whence to obtain submodules... the waters are muddy around here.

That said, the ability for the core to aggregate and report things
about sources, such as their refs and URLs, has been requested before.
Usually this has been discussed in the context of additional `bst show`
functionality, or a separate `bst show`-like command specifically for
sources (since the `bst show` CLI interface is not very amenable to
this, a separate command might make more sense).


This feature will be opt-in and therefore will not change the default
behaviour of buildstream while still adding a useful feature for
those users who choose to use it.

I will say right away that I am opposed to baking this kind of
additional functionality into `bst build`.

  o This would set a precedent for lumping whatever people want into
    the `bst build` command, when it comes to introspecting any
    information they might want to know about a built pipeline (or a
    pipeline they are about to build) - this information should be
    readily available through other bst commands.

  o Whenever implementing a new feature, we should be getting the most
    bang for our buck.

    This is to say that, if you want this information after a build,
    this does not mean that nobody will ever want this information at
    any other given time.

    Implementing this through another codepath helps us provide a
    good set of scriptable bst commands which can accommodate every
    user's needs, without implementing various separate codepaths
    to support these in different corner cases inside BuildStream.


Alternatives
~~~~~~~~~~~~
Alternatively to this solution, this functionality could be added to
the proposed `bst artifact` subcommand as a way of listing the
current artifacts within your project. 
However I believe that producing a manifest file from a build is more
about having a reference of the contents of the build, what sources
were used and what artifacts have been produced.

When I spoke of this, I was very much talking about an artifact
manifest, which I think should be implemented as:

     `bst artifact list-content`

However, I think this proposal is more about aggregating the refs and
URLs for Sources used by elements when building the project, rather
than computing a manifest of what output was produced (which the above
should be able to achieve).
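
For the output side, the kind of listing I have in mind can be
approximated today by walking a directory produced by `bst checkout`.
The sketch below is only an illustration under that assumption: the
function name `list_content` and the choice of SHA-256 digests are mine,
and none of this reflects how a real `bst artifact list-content` would
be implemented.

```python
import hashlib
import os


def list_content(checkout_dir):
    """Return sorted (relative path, description) pairs for a checkout.

    Regular files are described by their SHA-256 digest, symlinks by
    their target, so two checkouts can be compared line by line.
    """
    entries = []
    for root, dirs, files in os.walk(checkout_dir):
        dirs.sort()  # Make the traversal order deterministic
        for name in sorted(files):
            path = os.path.join(root, name)
            rel = os.path.relpath(path, checkout_dir)
            if os.path.islink(path):
                entries.append((rel, "symlink:" + os.readlink(path)))
                continue
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            entries.append((rel, digest))
    return sorted(entries)
```

Printing one `path digest` pair per line gives something which can be
diffed between two releases.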

If this is about URLs and refs from sources, I think this needs to go
back to the drawing board.


On the off chance you were hoping for a hybrid of both, baked into the
same interface, I again would much prefer to offer flexible tools which
allow you to achieve the same thing, by scripting your calls to the
BuildStream CLI interface.

I.e. just because you might want a list of artifact content as a part
of the file you want to generate does not mean that nobody will ever
want to view artifact content for any other reason. We should be
implementing the means for you to generate this file, by providing a
useful set of tools, but we should not be generating the whole thing
for you.

This would be yet another stable API surface to maintain, and the
situation is worsened by trying to cover everybody's different use
cases for such a file in the same API. Maybe one person will want it in
JSON while another wants XML; maybe one person wants to add data that
others are not interested in, causing everyone's metadata to grow by
orders of magnitude with data they didn't ask for. All of this can
be avoided by just providing the means for you to generate the file you
want.
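
As a rough sketch of what such scripting could look like, the snippet
below wraps a `bst show` call and turns its output into a JSON document
of your own design. The exact command line and format string
(`--deps all` with `%{name}|%{full-key}|%{state}`) are my assumption of
a workable invocation and should be checked against the `bst show`
documentation for your version; the helper names and the manifest layout
are made up for illustration.

```python
import json
import subprocess

# Assumed format string: one "name|key|state" line per element.
SHOW_FORMAT = "%{name}|%{full-key}|%{state}"


def run_bst_show(target):
    """Run `bst show` for a target and return its standard output."""
    result = subprocess.run(
        ["bst", "show", "--deps", "all", "--format", SHOW_FORMAT, target],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout


def parse_show_output(text):
    """Turn "name|key|state" lines into a list of dictionaries.

    Lines which do not have exactly three fields are skipped.
    """
    elements = []
    for line in text.splitlines():
        parts = line.strip().split("|")
        if len(parts) != 3:
            continue
        name, key, state = parts
        elements.append({"name": name, "key": key, "state": state})
    return elements


def to_json_manifest(elements):
    """Serialize the element list in whatever layout the caller prefers."""
    return json.dumps({"elements": elements}, indent=2, sort_keys=True)
```

Someone who prefers XML or YAML only has to swap out the last function,
and nothing in BuildStream needs to know about it.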

Cheers,
    -Tristan


