Re: [BuildStream] Proposal: BuildStream manifest file generation



Hi Tristan,

Thanks for the input. I've addressed your points below and had a few thoughts about improving this proposal.

Proposed Solution 2.0
~~~~~~~~~~~~~~~~~~~~~
Project configuration will provide a `manifest:` configuration for selecting what data will be included in the manifest. (The `manifest` key may need to be renamed for clarity.)

    manifest:
        elements:
            junction:
            - options
        sources:
            git:
            - ref
            - url
            tar:
            - url
            - ref

With this configuration, every git and tar source would report its ref and url in the manifest, and every junction would report its options.
Regarding source names, I think it is best to identify sources by index and kind, since sources have no field that identifies them.
The ability to optionally name sources could potentially be quite useful for this feature.
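
To illustrate the selection, here is a minimal sketch in plain Python. It assumes the manifest configuration and the sources have already been loaded into dictionaries; `source_manifest` and the example data are hypothetical names for illustration, not existing BuildStream API.

```python
# Hypothetical sketch only: none of these names exist in BuildStream.
MANIFEST_CONFIG = {
    "elements": {"junction": ["options"]},
    "sources": {"git": ["ref", "url"], "tar": ["url", "ref"]},
}


def source_manifest(sources, manifest_config):
    """Return one manifest entry per selected source.

    Sources have no name, so each entry is identified by its position
    in the element's source list and by its kind.
    """
    wanted = manifest_config.get("sources", {})
    entries = []
    for index, source in enumerate(sources):
        kind = source["kind"]
        keys = wanted.get(kind)
        if keys is None:
            continue  # this kind was not selected in the manifest configuration
        entry = {"index": index, "kind": kind}
        for key in keys:
            if key in source:
                entry[key] = source[key]
        entries.append(entry)
    return entries


# Made-up sources of one element, in the order they appear in the .bst file.
example_sources = [
    {"kind": "git", "url": "upstream:foo.git", "ref": "abc123", "track": "master"},
    {"kind": "local", "path": "files/foo"},
    {"kind": "tar", "url": "upstream:foo.tar.gz", "ref": "def456"},
]
entries = source_manifest(example_sources, MANIFEST_CONFIG)
```

Here the local source is skipped because its kind is not listed, and `track` is dropped from the git entry because only ref and url were requested.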


Providing a manifest configuration will cause a manifest to be produced during a build (or maybe a checkout?). The manifest could most likely also be
generated through other means, such as passing `--build-manifest` to commands such as show.

Element() and Source() will get BuildStream-private functions to build their own manifest section using the
manifest configuration available through their associated project.

Side Note: Currently, config validation is performed within Configure because each Plugin knows which config keys are valid.
If we could make the set of valid keys a constant defined by each plugin, then validation could be performed both for the manifest configuration
and before we call `Configure()`. (We could preserve backwards compatibility by only performing these validation checks
when the constant is actually populated.)
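
A rough sketch of what I mean, assuming a hypothetical class-level constant called `VALID_KEYS` (the constant name, the classes and the key lists are all made up for illustration and are not existing BuildStream API):

```python
# Hypothetical sketch only: VALID_KEYS and these classes are not BuildStream API.
class ManifestConfigError(Exception):
    """Raised when a requested key is not declared by the plugin."""


class GitSourceSketch:
    # Illustrative key list; a real plugin would declare its own.
    VALID_KEYS = ("url", "ref", "track", "submodules")


class LegacySourceSketch:
    # Empty constant: the plugin has not opted in, so nothing is validated.
    VALID_KEYS = ()


def validate_keys(plugin_cls, requested_keys):
    """Check requested keys against the plugin's declared constant."""
    valid = getattr(plugin_cls, "VALID_KEYS", ())
    if not valid:
        return  # constant not populated; keep the existing behaviour
    unknown = sorted(set(requested_keys) - set(valid))
    if unknown:
        raise ManifestConfigError(
            "{}: unknown keys {}".format(plugin_cls.__name__, unknown)
        )


validate_keys(GitSourceSketch, ["ref", "url"])      # passes
validate_keys(LegacySourceSketch, ["anything"])     # skipped, no constant
```

The same helper could be called once for the manifest configuration and once before configuring the plugin, which is the point of moving the key list out of the method body.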


On 03/08/18 08:09, Tristan Van Berkom wrote:
Problem Statement
~~~~~~~~~~~~~~~~~
BuildStream currently collects a collection of .bst files to
configure and build a collection of artifacts. On a release, project
maintainers may wish to provide a manifest of build sources, which
currently means raking through a collection of .bst files for
sources.
Interestingly, I don't think we understand the same thing by the word
"manifest", although I can see how both interpretations can be
interesting.

  Output manifest
  ~~~~~~~~~~~~~~~
  The list of files produced in a given artifact, or in a set of
  artifacts in a dependency tree (i.e. `--deps run` is interesting
  for a target, as it includes everything which is not a build-only
  dependency).

  Input manifest
  ~~~~~~~~~~~~~~
  A list of inputs which were used to create the project, this
  is mostly covered by the `bst show` invocation I outlined in #235,
  digging deeper than just the bst files is a bit problematic, more
  below.
I agree this is definitely worth clarifying; perhaps we need a clearer name for any potential feature in this area.
As you point out, this feature mostly relates to the `Input manifest`, in terms of defining which sources were
used and their 'urls', 'refs' etc.

Proposed Solution
~~~~~~~~~~~~~~~~~
When `bst build` is supplied with an option "--build-manifest" it
will produce a YAML dictionary containing the date/time of the build,
the version of buildstream used, a collection of elements and their
sources (name, url, ref). 
Here is where you hit a technical challenge.

  o A source does not have a name, although it does have a position
    in a list, inside a named target element (.bst file).

  o It is currently impossible for BuildStream core to identify what
    is the URL associated with a given Source.

    The parsing of the URL and ref and such, are in the domain of the
    plugins themselves, and while extending the API is possible, it is
    not possible to stop supporting plugins which do not implement a
    given API, the core must be able to fallback gracefully, and our
    functionality is limited by what the plugins in use happen to
    implement (or alternatively, the core can be made to abort
    gracefully when encountering plugins which do not implement
    functionality that is asked for by a given BuildStream invocation).

    Plugins are guaranteed to implement only the original set of APIs,
    what is guaranteed can be gleaned by observing what is optional in
    the plugin facing documentation:

        http://buildstream.gitlab.io/buildstream/buildstream.source.html
        http://buildstream.gitlab.io/buildstream/buildstream.plugin.html

The fact that Sources mostly happen to use the key "ref" to load what
is returned and set by "Source.get_ref()" and "Source.set_ref()" from
the YAML, and that they normally use the key "url" to load the URL from
where something is downloaded, is mostly a matter of following
established precedent, but this cannot be relied upon by the core.

Some sources have no URL, some have more than one URL; git has
optionally configurable extra URLs which allow overriding of the URLs
from whence to obtain submodules... the waters are muddy around here.

That said, the ability for the core to aggregate and report things
about sources, such as their refs and URLs, has been requested before,
usually this has been discussed in the context of additional `bst show`
functionality, or a separate `bst show` like command specifically for
sources (since the `bst show` CLI interface is not very amenable to
this, a separate command might make more sense).
I definitely see this as requiring an API change. In my branch, I hard-coded the url/ref keywords within BuildStream
core to get a quick working example, but this is not how I envisage the final implementation.

I think that we can implement some form of API within Element and Source so that each provides its own 'input manifest'.
This would be implemented by all core BuildStream elements and sources, and skipped for plugins that do not implement it
(with a warning that can be configured as fatal). Realistically, this will leave manifests missing some core sources
for some time, until plugins are updated to support this new API.

Existing functionality will be fully backwards compatible and this new feature will be available to those projects using updated plugins.

An API addition was my initial idea. This could still work, but it would require more work on the side of plugin developers.
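
As a minimal sketch of the fallback behaviour described above (the method name `get_manifest`, the classes and the `fatal` switch are hypothetical, chosen only to illustrate the idea; this is not how the final API would necessarily look):

```python
import warnings


# Hypothetical sketch only: none of these names are BuildStream API.
class ManifestUnsupported(Exception):
    """Raised when a plugin lacks manifest support and warnings are fatal."""


class SourceSketch:
    def get_manifest(self):
        # Default for plugins which predate the optional API.
        raise NotImplementedError


class GitSketch(SourceSketch):
    def __init__(self, url, ref):
        self.url = url
        self.ref = ref

    def get_manifest(self):
        return {"kind": "git", "url": self.url, "ref": self.ref}


class OldPluginSketch(SourceSketch):
    pass  # does not implement the optional API


def collect_manifest(sources, fatal=False):
    """Gather manifest entries, warning (or aborting) on unsupported plugins."""
    entries = []
    for index, source in enumerate(sources):
        try:
            entry = source.get_manifest()
        except NotImplementedError:
            message = "source {} ({}) does not provide a manifest".format(
                index, type(source).__name__
            )
            if fatal:
                raise ManifestUnsupported(message)
            warnings.warn(message)
            continue
        entries.append(dict(entry, index=index))
    return entries
```

With `fatal=False` the old plugin is simply left out of the manifest with a warning; with `fatal=True` the whole operation aborts, which matches the "warning that can be configured as fatal" idea.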

This feature will be opt-in and therefore will not change the default
behaviour of buildstream while still adding a useful feature for
those users who choose to use it.
I will say right away that I am opposed to baking this kind of
additional functionality into `bst build`.

  o This would set a precedent for lumping whatever people want into
    the `bst build` command, when it comes to introspecting any
    information they might want to know about a built pipeline (or a
    pipeline they are about to build) - this information should be
    readily available through other bst commands.

  o Whenever implementing a new feature, we should be getting the most
    bang for our buck.

    This is to say that, if you want this information after a build,
    this does not mean that nobody will ever want this information at
    any other given time.

    Implementing this through another codepath, helps us provide a
    good set of scriptable bst commands which can accommodate every
    user's needs, without implementing various separate codepaths
    to support these in different corner cases inside BuildStream.

I agree, this should be implemented in a generic form so that various bst commands can make use of this in one
way or another. I suggested `bst build` as it is where I initially saw a primary use case for producing an
input manifest during builds.

This would be yet another stable API surface to maintain, and the
situation is worsened by trying to cover everybody's different use
cases for such a file in the same API. Maybe one person will want it in
JSON while another wants XML; maybe one person wants to add data that
others are not interested in, causing everyone's metadata to grow by
orders of magnitude with more data they didn't ask for, all of this can
be avoided by just providing the means for you to generate the file you
want.
I realise that adding to the element/source API increases the amount of work needed to maintain said API, which is why I agree that we
should try to minimise this. Regarding file types, I don't think that writing to a variety of different file types is a particularly large workload;
from a dictionary, we can get to writing each of these file types in a fairly straightforward manner.
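
For example, a rough sketch using only the Python standard library, assuming the manifest is held as a dictionary mapping element names to lists of source entries (that layout, the function names and the sample data are my assumptions for illustration, not a settled format):

```python
import json
import xml.etree.ElementTree as ET


def to_json(manifest):
    """Serialise the manifest dictionary as JSON."""
    return json.dumps(manifest, indent=2, sort_keys=True)


def to_xml(manifest):
    """Serialise the same dictionary as a small XML document."""
    root = ET.Element("manifest")
    for name, sources in manifest.items():
        element = ET.SubElement(root, "element", name=name)
        for source in sources:
            # XML attributes must be strings, so convert every value.
            ET.SubElement(element, "source", {k: str(v) for k, v in source.items()})
    return ET.tostring(root, encoding="unicode")


# Made-up manifest data for illustration.
manifest = {
    "foo.bst": [
        {"index": 0, "kind": "git", "url": "upstream:foo.git", "ref": "abc123"},
    ],
}
json_text = to_json(manifest)
xml_text = to_xml(manifest)
```

Each extra format is one small function over the same dictionary, though every format we ship would of course still be something to keep stable.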





