[BuildStream] Machine readable output [WAS: Proposal: Workspace related DX features & design]

From: Tristan Van Berkom <tristan vanberkom codethink co uk>
To: Sander Striker <s striker striker nl>
Cc: William Salmon <will salmon codethink co uk>, buildstream-list gnome org
Subject: [BuildStream] Machine readable output [WAS: Proposal: Workspace related DX features & design]
Date: Fri, 07 Sep 2018 15:58:19 +0900

Hi all,

On Thu, 2018-09-06 at 13:01 +0200, Sander Striker wrote:

Hi,

[...]
On Thu, Sep 6, 2018 at 12:42 PM Tristan Van Berkom <tristan vanberkom codethink co uk> wrote:

On Thu, 2018-09-06 at 11:42 +0200, Sander Striker wrote:

I don't think you missed the discussion.  I missed the mention of it
in the paragraph above, or at least didn't consciously pick it up.  I
think the topic is much broader in scope than the workspace UI.


Agreed,

It only overlaps because of `bst workspace list`, and I would hope to
get all plausible API breaks out of the way for peace of mind if that
is alright :)


That's fine, but we should probably split the thread, as people that
have muted it because they don't care as much about the workspace UI
might lose out.


You are absolutely right, I should have changed the topic, and have
done so in this reply to hopefully get some more eyes on this.

For anyone jumping into this topic right now, I recommend going back
and reading my opening message 2 emails back, which is here:

    https://mail.gnome.org/archives/buildstream-list/2018-September/msg00014.html

[...]

I think introduction of a --non-interactive option is almost the
equivalent of requiring the specification of --format when the
default output is considered unstable.  The biggest difference is
that we now no longer have a common default for machine readable
output, so everyone writing a script needs to start with looking what
to put in the format string, rather than looking at a reasonable
default.


I have always viewed the `bst show` formatting option as mandatory
for scriptability, but this is admittedly an unspoken view.

Mostly I viewed it this way because parsing the default output is
impractical. As such, we have changed the defaults here without much
consideration, which is my fault.

For a use case of post processing captured output, we then require
coordination on what format string to use.  I could see this in the
case of the bst artifact family, which you raise above, where it is
expected that scripts exist to provide additional data based on the
parsed output.  I don't necessarily think that we would need to rerun
the same bst command multiple times just for different format
strings, if there is no real need other than a baked in --format.


We should not have to re-run the same `bst` command multiple times; the
`--format` argument as implemented in `bst show` allows for digging out
everything possible for a given "record" (or "element" more
specifically, in the case of `bst show`).


Right, but there is no 'machine readable default'.  So one script
might like things in one order, another in a different order.  As
long as they incorporate calling bst show, things are fine.  But
scripts that you run based on captured output, need to all agree on
the format.  Which is where a default is useful.

The introduction of a --non-interactive flag allows for tweaks that
benefit humans, without having to worry about interfering with
machines.


Here things get a bit more interesting:

  * We already have the --interactive/--no-interactive switch

  * This switch is turned on automatically if connected to a tty,
    otherwise it is turned off.

  * Running a command like `bst show element.bst | cat`,
    or running the following in a script:

      element_data=$(bst show element.bst)

    Effectively disconnects `bst` from the tty, regardless of whether
    the command was actually launched by a user from a shell.
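
To illustrate the tty point above, here is a minimal sketch in Python.
`default_interactive` is a hypothetical helper name used for
illustration only; it is not BuildStream's own code, it just shows the
kind of check that makes a pipe or a `$(...)` capture look
non-interactive:

```python
import io
import sys


def default_interactive(stream=None):
    """Guess whether a program should behave interactively.

    Hypothetical helper: returns True only when the stream is attached
    to a terminal, mirroring the automatic switch described above.
    """
    stream = sys.stdout if stream is None else stream
    return stream.isatty()


# A pipe or $(...) capture replaces stdout with a non-tty stream;
# an in-memory buffer stands in for that here.
captured = io.StringIO()
print(default_interactive(captured))  # prints False
```

Run directly in a terminal, `default_interactive()` would be expected
to return True; piped through `cat` it returns False, whoever launched
it.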

If we wanted to provide API stability for the default formatting of
commands like `bst show` or tentatively `bst artifact show`, while
retaining the liberty of changing the defaults for the sake of users
only (is this what you are getting at?), then we could effectively
do this with the existing "interactive" switch, without even requiring
that scripts ever specify `--no-interactive` in their invocations.


That was what I was getting at.

But, considering that usually we would expect a script to `--format`
their invocations to "show" commands for their own convenience anyway
(even if only for the sake of separating outputs with a `::` separator
or suchlike), would it make sense to keep API stability for the default
outputs at all ?

I feel like if we mandate scripts to specify `--format` if they want
API stable results, it is just as onerous as requiring a different
flag, while they will probably want to specify `--format` anyway, so
maybe it's better to only advertise API stability for invocations which
specify `--format` ?


The only downside I see is the lack of a machine readable default. 
Other than that I think this is fine, and we should just document
that the default is not considered stable.


Then we can agree on not changing the current approach, which I like :)

While it is possible to parse the default (unstable) output of
`bst show`, or of similar source level or artifact level commands
(which I think should follow the same API pattern), probably with
`awk`, I feel that parsing the output of these commands without having
specified the `--format` option is unwieldy; the `--format` option is
there to make it easier for scripting, amongst other things.
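
As a rough sketch of the kind of script meant here, the following
builds a `bst show` invocation with an explicit `--format` and parses
the captured output on a `::` separator. The helper names
(`show_command`, `parse_records`) and the sample output are made up
for illustration, and the sketch assumes no field value contains the
separator:

```python
def show_command(target, fields, sep="::"):
    """Build an argv list for `bst show` with an explicit --format."""
    fmt = sep.join("%{" + field + "}" for field in fields)
    return ["bst", "show", "--deps", "all", "--format", fmt, target]


def parse_records(output, fields, sep="::"):
    """Split captured output into one dict per element."""
    records = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        values = line.split(sep)
        if len(values) != len(fields):
            raise ValueError(f"unexpected record: {line!r}")
        records.append(dict(zip(fields, values)))
    return records


FIELDS = ["name", "full-key", "state"]

# Invented sample data standing in for real captured `bst show` output.
sample = "base.bst::aaa111::cached\ntarget.bst::bbb222::buildable\n"

print(show_command("target.bst", FIELDS))
print(parse_records(sample, FIELDS))
```

Since the script picks its own field order and separator, it does not
depend on whatever the default output happens to look like.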

For humans, it might make sense on the other hand to provide more API
unstable options, like `-l/--long` output for instance.

This would allow extracting more output for a given record in a human
readable way, without requiring specification of a complex format
string.

An alternative way of thinking about it is to have a --yaml or --json 
option for structured output, which scripts could use to extract data
without specifying format.


I rather dislike this option because it implies an all or nothing data
extraction, where some of the fields can be more expensive to extract
than others.

For instance, if we are not interested in the 'state' string from
`bst show`, then we need not interrogate the source cache or artifact
cache.
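
To make the cost argument concrete, here is a small sketch of a
formatter which only computes the fields a format string actually
refers to. Everything in it (`render`, the provider functions, the
returned values) is hypothetical and only illustrates the idea; it is
not how BuildStream is implemented:

```python
import re

# Matches %{field} and captures the field name up to the first "}".
FIELD = re.compile(r"%\{([^}]+)\}")


def render(fmt, providers):
    """Fill in fmt, calling only the providers the format refers to."""
    wanted = set(FIELD.findall(fmt))
    values = {name: providers[name]() for name in wanted}
    return FIELD.sub(lambda m: values[m.group(1)], fmt)


calls = []  # records which providers were actually invoked


def cheap_name():
    calls.append("name")
    return "target.bst"


def expensive_state():
    # Stand-in for interrogating the source and artifact caches.
    calls.append("state")
    return "cached"


providers = {"name": cheap_name, "state": expensive_state}

print(render("%{name}", providers))  # the state provider is never called
print(calls)
```

An all-or-nothing structured dump would have to call every provider
for every record, whether or not the caller wants the result.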

In addition to the above, I am hopeful to push any additional formats
and data representations (including their revisioning, stability, etc)
out of scope for BuildStream itself, and just sleep soundly in the
knowledge that anyone can generate these for their own purposes quite
easily with a script.
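
For example, a script along these lines could turn `::` separated
`bst show --format` output into JSON. This is only a sketch: the
function name and the sample data are invented, and it assumes no
field value contains the separator:

```python
import json


def records_to_json(output, fields, sep="::"):
    """Convert separator-delimited lines into a JSON array of objects."""
    records = [
        dict(zip(fields, line.split(sep)))
        for line in output.splitlines()
        if line.strip()
    ]
    return json.dumps(records, indent=2)


# Invented sample standing in for output captured with
# --format '%{name}::%{full-key}'.
sample = "base.bst::aaa111\ntarget.bst::bbb222\n"

print(records_to_json(sample, ["name", "key"]))
```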

Even without a script, it should be quite possible to generate JSON or
YAML output with `bst show` already, consider for example:

  bst show --deps all \
           --format '{ "name": "%{name}", "key": "%{full-key}" },' \
           target.bst

Note that the above currently causes a crash, which I just discovered
while experimenting with it. It looks like the regex for variable
substitution needs to be fixed to ensure we capture the first closing
`}` instead of the last one; this should be simple enough to fix,
though.
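
To show what capturing the first versus the last closing `}` means,
here is a small Python sketch. The two patterns are my own
illustration and are not copied from the BuildStream source; the
values substituted in are invented:

```python
import re

FORMAT = '{ "name": "%{name}", "key": "%{full-key}" },'
VALUES = {"name": "target.bst", "full-key": "abc123"}  # invented values

# Greedy: ".*" runs on to the LAST "}" on the line.
GREEDY = re.compile(r"%\{(.*)\}")

# Stops at the FIRST "}" because the name may not contain one.
FIRST_CLOSE = re.compile(r"%\{([^}]+)\}")


def captured(pattern, text):
    """Return the variable names a pattern would extract from text."""
    return [m.group(1) for m in pattern.finditer(text)]


def substitute(text, values):
    """Replace each %{name} with its value, using the fixed pattern."""
    return FIRST_CLOSE.sub(lambda m: values[m.group(1)], text)


print(captured(GREEDY, FORMAT))       # one bogus "name" spanning the line
print(captured(FIRST_CLOSE, FORMAT))  # ['name', 'full-key']
print(substitute(FORMAT, VALUES))
```

With the greedy pattern the single captured "name" is not a known
variable, so looking it up would fail; that is one plausible way such
a format string could lead to a crash.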

Cheers,
    -Tristan

