[BuildStream] Reboot discussion: bst artifact subcommand group



Hi all,

As Sander pointed out the other day[0], it seems like this one fell
through the cracks and it would be good to get a summary here, and
touch base one more time before filing some issues and get some tasks
started to attack this.

So I've spend some hours sifting through the discussion and refreshing
my memory on this, here goes with the summary (or reboot of the
discussion).


Commonality
-----------
This thread went into individual commands which should go into the
group, but a lot of the discussion was what behaviors should be common
to all of the `bst artifact` subgroup commands, lets first take a look
at what we discussed at this high level.

Unless there are arguments, I think we can assume consensus that:

  - The way an artifact argument is expressed on the command line
    should be consistent across all artifact commands.

  - The `--deps` argument should also be supported for artifact
    subgroup commands.

More details on these follow...


Specifying artifacts on the command line
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All the commands in the subgroup should accept the same format
for specifying artifacts, this includes:

  - An element name: If element(s) are specified, they need to
    be loaded and have their cache keys resolved, specifying an
    element implies that we want to specify the artifact with
    the corresponding cache key.

  - Artifact name, which is:

       "%{project-name}/%{element-name}/%{cache-key}"

  - Glob pattern matching would be important, in order to support
    commands where we can observe all of the artifacts for a given
    element, e.g.: `bst artifact show project/element/*`

    This could be an augmentation but rather easy to implement given
    the rest of the machinery.

  - Tab completion should be supported for artifact names

  - As Sander pointed out[1], it would also be good to be able to
    address artifacts with *only* a cache key.

      This spawned a bit of a side discussion about artifact
      reusability in the case of project or element renames.

      Summary of this is that it is certainly agreed that it would be
      desirable to allow addressing artifacts by their cache keys, and
      it is a mistake that this detail was overlooked, as the
      implication is that we cannot reuse artifacts for renamed
      elements.

      Caveat: since the fully qualified source URLs reside in the
      project.conf specified aliases, one crux if this issue is that
      project name should probably be tied to cache keys, otherwise
      the assumption is that source refs alone are strong enough to
      uniquely identify sources, without the URLs identifying where
      they came from.

      This conversation is related to the recently filed issue #1034[2]

    I still believe that while BuildStream should internally use only
    cache keys for dependency relationships internally, it is a much
    better user experience to use fully qualified artifact names on
    the command line[3], so I think both should be supported.

    I think the right thing to do here is to:

      (A) Fix buildstream internally to be able to address artifacts
          as only cache keys but also preserve artifact names for
          user convenience, solving #1034 first, and allowing better
          reuse of artifacts with renamed element names.

      (B) Once #1034 is solved, then we can additionally support cache
          keys on the command line for the artifact commands.

    This should not block work on the artifact commands.


Specifying artifacts recursively: `--deps`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
As with the regular commands, we should have symmetry with the `--deps`
arguments for `bst artifact` commands.

This implies that we need some algorithm to reconstruct dependency
graphs from the artifact metadata instead of from the project files.

NOTE: The `bst artifact checkout` command was implmenented in the
      knowledge that this was unfinished. As such the `--deps run`
      option for checking out artifacts is only supported when
      an element is specified, but not yet supported when an artifact
      is specified.

      This is the origins of the ArtifactElement in BuildStream, as is
      it will be important to have a less than fully qualified element
      structure. It was discussed that we could potentially load real
      elements from artifacts but this could cause trouble especially
      as you might not have the plugins locally for the artifacts you
      might want to manipulate.

This is useful for pretty much any conceivable artifact subgroup
commands, such as:

  - pull

    After pulling one artifact, read it's metadata and derive what else
    needs to be pulled as we go along, populating a local cache with
    exactly only the artifacts needed for the dependencies of that
    specified artifact.

  - push

    Construct the graph from local artifact metadata and push to
    a remote cache, only exactly the artifacts for a specific build

  - show

    Show artifacts recursively by reconstructing the build graph
    from artifact metadata

  - log

    Show the log files one by one in a pager (if connected to
    the terminal, otherwise directly to stdout).


Commands
--------
There was not much discussion on the individual commands proposed.

I think there is pretty much automatic consensus on:

  - push
  - pull
  - checkout
  - log

Other commands originally proposed were, but I couldn't really say
there is consensus as it was not discussed very much:

  - diff

    Show the added/removed/changed files between two artifacts

  - list-content

    List the content of an artifact, optionally showing sizes
    of individual files (think something like `tar -tf file.tar`
    but for artifacts)

  - list

    list some artifacts specified on the command line, showing
    artifact creation date and overall size and any interesting
    information about the artifact

Later in the discussion, it was pointed out that a lot of the `list`
stuff would be more appropriate and symetrical with `bst show` if
implemented as a separate `bst artifact show` command[3], we didn't go
too much into detail but I think it's a no brainer that we should have
this symmetry.

One suggestion I might for a potential change here, is that I think we
can drop `bst artifact log` completely. The log is essentially a
component of an artifact anyway, and we should be able to achieve the
same with `bst artifact show --format %{log} <artifact>`, also this is
more naturally scriptable when one wants to collect all of the logs of
all of the artifacts for a given build and generate a report.

Cheers,
    -Tristan


[0]: https://mail.gnome.org/archives/buildstream-list/2019-June/msg00003.html
[1]: https://mail.gnome.org/archives/buildstream-list/2018-September/msg00028.html
[2]: https://gitlab.com/BuildStream/buildstream/issues/1034
[3]: https://mail.gnome.org/archives/buildstream-list/2018-September/msg00030.html



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]