Manifests and metadata



Hi all

I had a go at building a minimal VM image with BuildStream and I found
that my split rules aren't working particularly well -- lots of junk
ends up in the image that shouldn't be there.

I also found we don't have any good way of debugging the split rules. I
can `bst checkout` my 'compose' element and see what it contains, but I
can't easily reference a given file back to the artifact that produced
it or the split-rules domain that led to it being there.  There needs to
be some way of getting hold of a manifest that contains this
information.

Manifests are in fact already proposed:

  https://gitlab.com/BuildStream/buildstream/issues/82

The exact form isn't set, but hopefully something like this:

    files:
    - filename: filename1
    - filename: filename2

This proposal won't tell me where the files came from though, just
what files exist.  So I'd like to propose extending this to be
something like:

    files:
    - filename: filename1
      artifact: gcc
      domain: docs

    - filename: filename2
      artifact: gcc
      domain: devel
      integration: true     # The integration step changed/added this

    - filename: filename3
      artifact: gcc
      orphan: true          # No split rule claimed this file
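
To make the idea concrete, here is a minimal Python sketch of how such
entries could be derived from a set of split rules. It is not
BuildStream code: `build_manifest`, the example rules and the file list
are all hypothetical, the "first matching domain wins" policy is an
assumption, and `fnmatchcase` only approximates real split-rule globs
(its `*` also matches `/`).

```python
from fnmatch import fnmatchcase


def build_manifest(artifact, files, split_rules):
    """Return a manifest dict recording which domain claimed each file.

    split_rules maps a domain name to a list of glob patterns.
    """
    entries = []
    for filename in sorted(files):
        entry = {"filename": filename, "artifact": artifact}
        # Assumption of this sketch: the first domain with a matching
        # pattern claims the file.
        domain = next(
            (name for name, patterns in split_rules.items()
             if any(fnmatchcase(filename, p) for p in patterns)),
            None,
        )
        if domain is None:
            entry["orphan"] = True   # no split rule claimed this file
        else:
            entry["domain"] = domain
        entries.append(entry)
    return {"files": entries}


# Hypothetical split rules and file list, for illustration only.
split_rules = {
    "docs": ["/usr/share/doc/*", "/usr/share/man/*"],
    "devel": ["/usr/include/*", "/usr/lib/*.a"],
}
files = [
    "/usr/share/doc/gcc/README",
    "/usr/include/gcc/plugin.h",
    "/usr/bin/gcc",
]

manifest = build_manifest("gcc", files, split_rules)
for entry in manifest["files"]:
    print(entry)
```

The `integration` flag is left out here, since detecting it would need
a comparison of the staged files before and after the integration step.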

I did a very rough prototype of this approach and it works fine, and
already helped me fix my split rules a bit.
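
For debugging split rules, the useful queries are "which files did no
rule claim?" and "where did this file come from?". A small Python
sketch of both, run against an in-memory manifest with the fields
proposed above. The helper names (`orphans`, `count_by_domain`,
`origin`) and the sample data are made up for illustration.

```python
from collections import Counter

# Sample data mirroring the proposed manifest layout.
manifest = {
    "files": [
        {"filename": "filename1", "artifact": "gcc", "domain": "docs"},
        {"filename": "filename2", "artifact": "gcc", "domain": "devel",
         "integration": True},
        {"filename": "filename3", "artifact": "gcc", "orphan": True},
    ]
}


def orphans(manifest):
    """List files that no split rule claimed."""
    return [e["filename"] for e in manifest["files"] if e.get("orphan")]


def count_by_domain(manifest):
    """Count files per split-rules domain; orphans get their own bucket."""
    return Counter(e.get("domain", "(orphan)") for e in manifest["files"])


def origin(manifest, filename):
    """Return (artifact, domain) for a file, or None if it is not listed."""
    for e in manifest["files"]:
        if e["filename"] == filename:
            return e["artifact"], e.get("domain")
    return None


print(orphans(manifest))
print(count_by_domain(manifest))
print(origin(manifest, "filename2"))
```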

However, there are a few complications:

  * Should this info only be provided for compose elements? It's
    largely superfluous for other kinds of artifact.
  * How are end users supposed to access this info?
  * The manifest.yaml will be multiple megabytes in size.
  * Loading it may be slow (see existing research in issue 82).

In terms of accessing the info, we have previously discussed extending
`bst checkout` to check out artifact metadata as well as contents. I
prototyped that and it works OK:

  https://gitlab.com/BuildStream/buildstream/merge_requests/139

Alternatively, we could extend `bst show` to print the manifest. I am a
little uneasy about this because it's a lot of info, whereas at present
`bst show` mostly deals with one-liners.

Also, this might be the first --format variable that requires the
element to already be built.

The size issue could be mitigated by compressing the yaml, although
this would also require benchmarking.
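
A starting point for that benchmark could look like the Python sketch
below. It builds a synthetic manifest (the entry count and paths are
invented, not measured from a real build), gzips it, and reports sizes
and timings. It only covers compression and decompression; YAML parsing
time, which is what the research in issue 82 looked at, would need to
be measured separately.

```python
import gzip
import time


def synthetic_manifest(n_files):
    """Generate manifest-like YAML text with n_files entries, as bytes."""
    lines = ["files:"]
    for i in range(n_files):
        lines.append(f"- filename: /usr/share/doc/pkg{i % 50}/file{i}.txt")
        lines.append(f"  artifact: pkg{i % 50}")
        lines.append("  domain: docs")
    return "\n".join(lines).encode("utf-8")


raw = synthetic_manifest(20000)   # entry count is an arbitrary choice

start = time.perf_counter()
packed = gzip.compress(raw)
compress_seconds = time.perf_counter() - start

start = time.perf_counter()
unpacked = gzip.decompress(packed)
decompress_seconds = time.perf_counter() - start

print(f"raw:        {len(raw)} bytes")
print(f"compressed: {len(packed)} bytes")
print(f"compress:   {compress_seconds:.4f} s")
print(f"decompress: {decompress_seconds:.4f} s")
```

Real manifests will compress differently from this synthetic one, so
the numbers are only meaningful when run against actual build output.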

Alternatively, we could avoid storing this info in the artifact and
instead have a `bst manifest` or `bst show --format=%{manifest}`
command that generates it on request.  This could be extended to do more
cool stuff too and might be really useful for tracking provenance of
deployed artifacts.

Thoughts?

Sam

--
Sam Thursfield, Codethink Ltd.
Office telephone: +44 161 236 5575

