Re: [BuildStream] Proposal: bst source delete command



On Sat, 2020-02-22 at 10:02 +0900, Tristan Van Berkom via buildstream-
list wrote:
Hi Chandan,

On Feb 22, 2020, at 6:49 AM, Chandan Singh <chandan chandansingh net> wrote:

Hi all,

BuildStream currently offers no way of systematically deleting locally cached
sources. I think it should however do so - not only to maintain symmetry with
the `artifact delete` command but also because sources can be substantial in
size in many cases.

In this message, I propose to add a new `source delete` command.

Not really against this, but I’d like to point out that this is not a
usage pattern I’d want to require of normal users. The artifact
delete command was created mostly for the convenience of developers,
e.g to be able to easily rerun tasks in their plugin while developing
it.

The user on the other hand should simply be given the convenience of
BuildStream respecting the users decided cache quotas, and never be
pestered to manually type a command or asked to know which things
they’d like to delete.

From this perspective, I’d hope to see this as a power user feature
(which is fine to have I think).

I generally agree. However, one issue is that we only have a cache
quota for CAS. There is no expiry mechanism for the plugin-controlled
cache directories. To some extent this is intentional as, depending on
the source and the internet connection, fetching may be slow (and
sometimes not reliable).

`bst source delete` would indeed not be a solution to this, though. For
users that mostly fetch sources from a CAS-based remote source cache
(or Remote Asset server in the future), this may also not be a
significant issue. However, we probably still need a solution for this
at some point.


Unlike its `artifact delete` counterpart, this is slightly more nuanced as we
need to delete the sources from two places:

1. From the local CAS cache
2. From the plugin's cache directories

(1) is simple, but (2) would require cooperation from source plugins. To that
end, I propose to add a new abstract method for source plugins, like
`Source.delete()`.

I’m not convinced about this part.

This will require some API change support from the plugin, but
instead of adding more to plugins, I feel like we might get away with
being more strict with the plugin’s cache directories.

If we instead decide that plugins need to ask for a directory while
specifying the name of the source which can later be given to `bst
source delete`, then BuildStream retains ownership of the directory
and do what it pleases with it later, without asking the plugin to do
anything.

This sounds sensible. We already have `get_mirror_directory()` but we
currently leave the structure within that directory completely up to
the plugin. In most cases the plugin uses a subdirectory of the form
`utils.url_directory_name(url)`. One option could be to add an abstract
method `Source.get_url()` and then let BuildStream core construct the
full mirror path. This additional method may also be needed for the
second part of the Remote Asset API support. Details TBD.


Open questions
--------------

1. Since source plugins do not know about how many times a a given source is
  used, they can't do reference counting of sources. So, imagine a scenario
  where elements A and B both use a git repository G. Using this approach,
  `bst source delete A` would end up deleting it, as we have no way of knowing
  that B is also using it. This may not be too bad but the user may have
  expected the source to persist if another element is still using it.

This is a good question, what exactly is `A` in `bst source delete A`
?

I wasn’t expecting that we’d be passing element names here, but how
can we address sources directly ?

I think for the power user feature it's fine to pass element names here
and delete the sources regardless of whether they are used by other
elements as well (they may also be used by other projects). As I see
it, the main purpose is to make it possible to force a refetch (for
debugging, development) or manually free up some space.

2. What to do with unreachable sources? How do we delete them? It is not
  inconceivable to imagine that certain URLs are no longer referenced. But,
  again, we don't have any way of ascertaining that.

This is related to my comment that we'll need a solution for the
missing expiry mechanism of the plugin-controlled cache. I don't think
`bst source delete` can solve that issue.

We could theoretically add a command to delete all unreferenced
sources. However, I can't think of a sensible interface for this when
considering that many users will work with multiple projects. Requiring
users to specify all project directories whose sources they wish to
keep doesn't seem ideal. That said, it may make sense to add this as
another power user feature.

In any case, this should indeed not block `bst source delete`.

Cheers,
Jürg



[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]