Re: [BuildStream] bst artifact show - PROPOSAL

Hi Tristan,

Thanks for taking the time to reply. I'll leave some replies inline.

On 2019-08-05 18:57, Tristan Van Berkom wrote:
Hi James,

On Mon, 2019-08-05 at 08:38 +0100, James Ennis via buildstream-list wrote:
Hello list,

This post is in response to Ben's proposal of the `bst artifact show`
command [0] and the replies from myself, Tristan and Sander [1].

[...]


Just a few points:

  * You say that the command takes a list of artifact refs and element
    names as targets.

    I wonder if it should be allowed for these to be interchangeable,
    or if it will result in a difficult-to-load build graph, with
    elements loaded from the active working tree interspersed with
    artifacts built from the same tree in a different state (the same
    element appearing more than once in memory with different cache
    keys).

    We'll probably discover any limitations here in implementation.

  * There was no mention here of the `--deps` argument.

    I think it's important to have the `--deps` argument for this
    command, and just pointing out that this is going to require some
    work to load ArtifactElement objects recursively in memory.

It seems that your first two points are closely related, so I'll reply
to them both here.

Most of my inspiration here came from trying to be consistent with our
current artifact subcommands. However, those subcommands are not even
consistent with one another! The ones I was aiming to match were `bst
artifact log/delete`, which do not support a --deps option; they never
load a dependency graph, regardless of the arguments.

In hindsight, I agree that a --deps option is important, so we should
make sure to support it.

As things currently stand, we do not support loading a dependency graph
from an artifact ref (or a list of refs), but we do support it from an
Element name (or a list of Element names).

The above is the exact reason why `bst artifact push/pull/checkout` do
not support taking artifact refs as arguments.

Way back when Richard was working on this (!1008 [0]) there was some
(ugly) logic in the frontend which bailed out of `bst artifact checkout`
if `--deps` was not None AND a ref was present [1].

So we could implement something like this for `bst artifact show`, which
should (partially) address your first two points. But it seems to me
that until we support loading a dependency graph from an artifact ref,
there isn't a "cleaner" way around this (unless I'm missing
something!).
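
To make the idea concrete, here is a rough sketch of what such a
frontend guard might look like. The function name, the ref-detection
heuristic and the "none" sentinel are all invented for illustration;
the real frontend code in !1008 is structured differently:

```python
# Hypothetical sketch: bail out when --deps is combined with artifact
# refs, since we cannot load a dependency graph from a ref.
def validate_artifact_show_args(targets, deps):
    # Crude heuristic for this sketch only: an artifact ref looks like
    # "project/element/cachekey", so count path separators and exclude
    # element names ending in ".bst".
    refs = [t for t in targets
            if t.count("/") >= 2 and not t.endswith(".bst")]
    if deps != "none" and refs:
        raise ValueError(
            "--deps is not supported with artifact refs: {}"
            .format(", ".join(refs)))
    return refs
```

With `--deps none` the refs are simply passed through; any other
--deps value in combination with a ref would abort early, mirroring
the behaviour described above.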


  * Re-raising again that there was discussion around it being
    desirable to also address artifacts only by cache key (without
    needing to use the full ref, making the full ref name mostly
    just for user convenience).

Ah, yes. I vaguely remember reading this. I agree with what you say
about leaving this out as a separate topic. I'll look for/file an issue
and mark it as a follow up to this work once this discussion has
settled.


It's also not entirely clear to me how you intend to implement querying
of remotes, for instance, is there a shallow "pull queue" which ensures
we download just the shallow artifacts (metadata only) which runs first
before actually showing anything ?

Apologies, I should have covered this in the first email.

In short, no, I was not thinking of using anything like a shallow pull
queue. Instead, I was thinking of splitting our current
ArtifactCache._pull_artifact() method [2] into two methods, since an
availability check is already *kind of* performed there. These would be
_check_available() and _pull_artifact(), where the second is only
called if the first returns True. This split would land in one of the
preparatory commits.

It would probably then make sense to add an Element.check_if_in_remote()
method (or similar), which calls ArtifactCache._check_available() (or we
could make the artifact cache method public?). A similar method would
exist for ArtifactElements.

I would then imagine Stream.artifact_show() to do something like this:
1. Load project (including the initialisation of remotes)
2. Load the targets to obtain a list of Elements and/or ArtifactElements
3. Loop over the list of remotes (or simply take the specified remote),
   then, for each remote, call Element/ArtifactElement.check_if_in_remote()
   on each loaded object
4. Each time this returns True, append a tuple containing the artifact
   ref and the remote URL to a results list.
5. We then output this list to the user in a pretty way (described in
   the last email).

Note that 1. and 2. are already handled by Stream._load().

Also note that in the Stream we can obtain a sorted list representing
the dependency graph, so I don't think the --deps option changes much
here (apart from the list being much larger).

I believe this should achieve what I've previously described.
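
Steps 3 to 5 above could be sketched roughly as follows. This is
illustrative only: the collect_artifact_locations() helper and the dict
shapes are invented, and the check_in_remote callable stands in for the
proposed Element/ArtifactElement.check_if_in_remote() method:

```python
# Hypothetical sketch of the query loop in Stream.artifact_show():
# for every remote and every loaded element, record (ref, remote URL)
# pairs for each artifact the remote reports as available.
def collect_artifact_locations(remotes, elements, check_in_remote):
    found = []
    for remote in remotes:
        for element in elements:
            if check_in_remote(element, remote):
                found.append((element["ref"], remote["url"]))
    return found
```

The resulting list of tuples is then what gets pretty-printed to the
user in step 5.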


Is the downloaded artifact metadata persisted in the local artifact
cache so that a subsequent show command need not re-download the
shallow artifacts ?

I like this idea, but what if, much later, the artifact in the remote
has expired? A subsequent show command might then report stale
information.



Cheers,
    -Tristan

[0] https://gitlab.com/BuildStream/buildstream/merge_requests/1008
[1] https://gitlab.com/BuildStream/buildstream/merge_requests/1008/diffs#1c09d8ba42b5e54906be5cb0a4042b329f2b8104_1060_1141
[2] https://gitlab.com/BuildStream/buildstream/blob/master/src/buildstream/_artifactcache.py#L613
