[Notes] [Git][BuildStream/buildstream][tristan/element-processing-order]

Tristan Van Berkom pushed to branch tristan/element-processing-order at BuildStream / buildstream

Commits:

4b544555

by Chandan Singh at 2019-01-15T21:28:40Z

.gitlab-ci.yml: Add tests for python 3.7

We already have tests for python 3.5 and 3.6 but not 3.7.

Fixes https://gitlab.com/BuildStream/buildstream/issues/838.

80fe0d9a

by Javier Jardón at 2019-01-15T22:48:22Z

Merge branch 'chandan/python37-tests' into 'master'

.gitlab-ci.yml: Add tests for python 3.7

Closes #838

See merge request BuildStream/buildstream!1074

c91784ab

by Chandan Singh at 2019-01-15T22:49:01Z

conftest.py: Don't use deprecated get_marker() function

Starting from `pytest` version 4.1.0, `Node.get_marker()` has been
removed, and hence our tests break when running with newer versions of
`pytest`. It had been deprecated for a while, but has only recently been
removed completely. Use `get_closest_marker()` as a replacement, as
suggested in the changelog; it seems to work fine for our use case.

See https://github.com/pytest-dev/pytest/pull/4564 for more context on
the upstream issue.

One way to verify this change: it should fix the recently added
`tests-fedora-update-deps` job, which was failing before due to
this issue.
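
For illustration only, a minimal sketch of the replacement call. The `has_marker` helper and the `integration` marker name are hypothetical and not taken from the commit; the only pytest API assumed is `Node.get_closest_marker(name)`, which returns the nearest matching marker or `None`.

```python
def has_marker(item, name):
    """Return True if the collected test item carries the named marker.

    Relies on Node.get_closest_marker(), the replacement for the
    get_marker() method that was removed in pytest 4.1.0.
    """
    return item.get_closest_marker(name) is not None


class FakeItem:
    """Hypothetical stand-in for a pytest item, used only in this sketch."""

    def __init__(self, markers):
        self._markers = markers

    def get_closest_marker(self, name):
        # Mirrors the real behaviour: the marker object, or None if absent.
        return self._markers.get(name)


marked = FakeItem({"integration": object()})
unmarked = FakeItem({})
print(has_marker(marked, "integration"), has_marker(unmarked, "integration"))
```

In a real `conftest.py` the same check would sit inside a hook such as `pytest_runtest_setup(item)`, with `item` supplied by pytest.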

ecae4d73

by Javier Jardón at 2019-01-15T23:23:41Z

Merge branch 'chandan/fix-pytest-get-marker' into 'master'

conftest.py: Don't use deprecated get_marker() function

See merge request BuildStream/buildstream!1073

0eac4008

by Valentin David at 2019-01-16T10:04:57Z

buildstream/_gitsourcebase.py: Reduce git history for git describe.

Found during #833.

`git rev-list --boundary tag..HEAD` unfortunately gives boundaries
that are deeper than they should be when there is a merge commit between
`tag` and `HEAD`. The common ancestor of the two parents of the merge is
a boundary instead of the parent of the branch that is not traversed.

`--ancestry-path` fixes this issue by preventing `git` from traversing
those branches.

a405e08f

by Valentin David at 2019-01-16T10:04:57Z

buildstream/_gitsourcebase.py: Fix case where HEAD is tagged

`git rev-list --boundary HEAD..HEAD` does not return any boundary.
So in this case we need to manually tag the HEAD as a boundary.
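
The two fixes above can be sketched roughly as follows. This is not the code from `_gitsourcebase.py`; the function names `rev_list_args` and `boundary_commits` are hypothetical. It relies only on the documented behaviour that `git rev-list --boundary` prints boundary commits prefixed with `-`.

```python
def rev_list_args(tag, head="HEAD"):
    """Build the rev-list command line for the range tag..head.

    --ancestry-path restricts the walk to commits that are both
    descendants of `tag` and ancestors of `head`, so side branches
    merged in between do not contribute deeper boundaries.
    """
    return ["git", "rev-list", "--boundary", "--ancestry-path",
            "{}..{}".format(tag, head)]


def boundary_commits(rev_list_output, head_commit, tag_commit):
    """Extract boundary commit ids from `git rev-list --boundary` output."""
    if tag_commit == head_commit:
        # For HEAD..HEAD the command prints nothing, so the tagged
        # HEAD has to be recorded as the boundary by hand.
        return [head_commit]
    # Boundary commits are the lines prefixed with '-'.
    return [line[1:] for line in rev_list_output.splitlines()
            if line.startswith("-")]


# Made-up commit ids, purely for demonstration.
sample_output = "c3\nc2\n-c1\n"
print(boundary_commits(sample_output, "c3", "c1"))
print(boundary_commits("", "c3", "c3"))
```

Keeping the parsing separate from the `git` invocation makes the tagged-`HEAD` special case easy to exercise without a repository.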

c0631d48

by Valentin David at 2019-01-16T10:33:52Z

Merge branch 'valentindavid/git-reduced-history' into 'master'

buildstream/_gitsourcebase.py: Reduce git history for git describe.

See merge request BuildStream/buildstream!1069

b608ac86

by James Ennis at 2019-01-16T11:05:44Z

_context.py: Doc fix: get_toplevel_project() returns object not list

97a3beb6

by James Ennis at 2019-01-16T11:05:44Z

_context.py: Add documentation to get_workspaces() command

d6587aa0

by James Ennis at 2019-01-16T11:33:21Z

Merge branch 'jennis/doc_fixes_in_context' into 'master'

Small documentation/comment fixes in context.py

See merge request BuildStream/buildstream!1072

2683f98a

by Raoul Hidalgo Charman at 2019-01-16T11:55:07Z

_cas: Rename artifactcache folder and move that to a root module

Other components will start to rely on the cas modules rather than the
artifact cache modules, so the layout should be organized to reflect this.

All relevant imports have been changed.

Part of #802

d2cc4798

by Raoul Hidalgo Charman at 2019-01-16T11:55:07Z

casremote.py: Move remote CAS classes into their own file

Part of #802

76f67483

by Raoul Hidalgo Charman at 2019-01-16T11:55:07Z

cas: move remote only functions to CASRemote

List of methods moved
* Initialization check: made it a class method that is run in a subprocess, for
  when checking in the main buildstream process.
* fetch_blobs
* send_blobs
* verify_digest_on_remote
* push_message

Part of #802

6c428bc9

by Jürg Billeter at 2019-01-16T12:56:38Z

Merge branch 'raoul/cas-refactor' into 'master'

Cas refactor

See merge request BuildStream/buildstream!1071

dd8f5634

by Tristan Van Berkom at 2019-01-16T14:06:46Z

_scheduler: Refactor of queues and resources.

This branch makes the following changes:

  * jobs/job.py: No longer stores any interested resource list

    Jobs are ephemeral again, they only ever exist while they
    are running.

  * queues/queue.py: Revert to only handling lists of elements

    Elements pass through the queues, Queue.harvest_jobs()
    replaces Queue.pop_ready_jobs() and now the Queue stops
    creating jobs as soon as there are not enough resources
    for the job.

    Also removed unused `prepare()` abstract method.

  * queues/buildqueue.py: Adapt the part where we launch a job

    This part needs to be reworked anyway, just touch it up for
    now so that it doesn't break with the surrounding changes.

  * jobs/{cachesize,cleanup}job.py: Expose uniform complete callback

    Allows the scheduler to manage resource deallocation for these
    two job completions as a custom thing, at the same phase
    that the Queues take care of their own resource deallocation.

  * resources.py: No longer has knowledge of the job

    Since jobs are ephemeral, they are not a suitable place
    to store the resource identifiers, these must be provided
    by the callers wherever needed.

    Now the main Resources object is owned by the Scheduler
    but shared with the Queues; each takes care of managing the
    resources of the jobs it creates through the same
    resource API.

  * scheduler.py: Reverted to only creating jobs on demand

    This changes the flow of the scheduler such that whenever
    jobs complete, the queues are interrogated for as many
    jobs as can run at the moment, but no more; this
    completely removes the waiting list.

    For the internal cache management jobs, we handle this
    with a little state instead of having a waiting list
    and only launch when the resources permit it.

By abolishing the scheduler waiting list and creating jobs
on demand, we fix the order of element processing and consequently
fix issue #712.
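
As a rough illustration of the on-demand flow described above (not the actual `_scheduler` code), the sketch below shows a queue that hands out elements in order and stops as soon as a shared resource pool is exhausted. `Resources`, `Queue` and `harvest_jobs()` are named after the commit message; `reserve()`, `release()`, `enqueue()` and `job_done()` are hypothetical helpers invented for this example.

```python
from collections import deque


class Resources:
    """Simplified pool counting free slots per resource type."""

    def __init__(self, limits):
        self._free = dict(limits)

    def reserve(self, needed):
        # All-or-nothing: never hold a partial set of resources.
        if any(self._free[name] < 1 for name in needed):
            return False
        for name in needed:
            self._free[name] -= 1
        return True

    def release(self, needed):
        for name in needed:
            self._free[name] += 1


class Queue:
    """Holds elements only; jobs exist just while they are running."""

    def __init__(self, resources, needed):
        self._resources = resources   # shared with the scheduler
        self._needed = needed         # resource names each job requires
        self._ready = deque()         # elements in processing order

    def enqueue(self, elements):
        self._ready.extend(elements)

    def harvest_jobs(self):
        jobs = []
        # Stop creating jobs as soon as resources run out, so nothing
        # accumulates in a separate waiting list.
        while self._ready and self._resources.reserve(self._needed):
            jobs.append(self._ready.popleft())
        return jobs

    def job_done(self):
        # The creator of the job releases what it reserved.
        self._resources.release(self._needed)


resources = Resources({"process": 2})
queue = Queue(resources, ["process"])
queue.enqueue(["a.bst", "b.bst", "c.bst"])
first = queue.harvest_jobs()     # only two process slots are free
queue.job_done()                 # one job completes, freeing a slot
second = queue.harvest_jobs()
print(first, second)
```

Because elements stay in the queue until a slot is actually free, they start in the order they were enqueued, which is the property the ordering fix depends on.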

d8e2b8a5

by Tristan Van Berkom at 2019-01-16T14:06:46Z

tests/frontend/order.py: Enable the test for build and fix the fetch tests

With the scheduler changes, fetch jobs get skipped automatically,
so the output has changed. Using a separate repo for each element
fixes the test so that every fetch gets a job launched.

d59294cf

by Tristan Van Berkom at 2019-01-16T14:06:46Z

_artifactcache/artifactcache.py: Rephrase failure message

The message said "There is not enough space to build the given element.",
which suggests that the error is associated with a specific element, but
that does not make sense in a cleanup task.

Instead say "There is not enough space to complete the build.", which
should make it clearer that even after cleaning up there is not enough
space.

27 changed files:

.gitlab-ci.yml
buildstream/_artifactcache/artifactcache.py → buildstream/_artifactcache.py
buildstream/_artifactcache/__init__.py → buildstream/_cas/__init__.py
buildstream/_artifactcache/cascache.py → buildstream/_cas/cascache.py
+ buildstream/_cas/casremote.py
buildstream/_artifactcache/casserver.py → buildstream/_cas/casserver.py
buildstream/_context.py
buildstream/_exceptions.py
buildstream/_gitsourcebase.py
buildstream/_scheduler/jobs/cachesizejob.py
buildstream/_scheduler/jobs/cleanupjob.py
buildstream/_scheduler/jobs/job.py
buildstream/_scheduler/queues/buildqueue.py
buildstream/_scheduler/queues/queue.py
buildstream/_scheduler/resources.py
buildstream/_scheduler/scheduler.py
buildstream/sandbox/_sandboxremote.py
conftest.py
doc/source/using_configuring_artifact_server.rst
tests/artifactcache/config.py
tests/artifactcache/expiry.py
tests/frontend/order.py
tests/sandboxes/storage-tests.py
tests/sources/git.py
tests/storage/virtual_directory_import.py
tests/testutils/artifactshare.py
tests/utils/misc.py

Changes:

.gitlab-ci.yml

@@ -60,6 +60,16 @@ tests-ubuntu-18.04:
    image: buildstream/testsuite-ubuntu:18.04-5da27168-32c47d1c
    <<: *tests
 +tests-python-3.7-stretch:
 +  image: buildstream/testsuite-python:3.7-stretch-a60f0c39
 +  <<: *tests
++
 +  variables:
 +    # Note that we explicitly specify TOXENV in this case because this
 +    # image has both 3.6 and 3.7 versions. python3.6 cannot be removed because
 +    # some of our base dependencies declare it as their runtime dependency.
 +    TOXENV: py37
++
  overnight-fedora-28-aarch64:
    image: buildstream/testsuite-fedora:aarch64-28-5da27168-32c47d1c
    tags:

buildstream/_artifactcache/artifactcache.py → buildstream/_artifactcache.py

@@ -19,18 +19,16 @@
  import multiprocessing
  import os
 -import signal
  import string
  from collections.abc import Mapping
 -from ..types import _KeyStrength
 -from .._exceptions import ArtifactError, CASError, LoadError, LoadErrorReason
 -from .._message import Message, MessageType
 -from .. import _signals
 -from .. import utils
 -from .. import _yaml
 +from .types import _KeyStrength
 +from ._exceptions import ArtifactError, CASError, LoadError, LoadErrorReason
 +from ._message import Message, MessageType
 +from . import utils
 +from . import _yaml
 -from .cascache import CASRemote, CASRemoteSpec
 +from ._cas import CASRemote, CASRemoteSpec
  CACHE_SIZE_FILE = "cache_size"
@@ -249,7 +247,7 @@ class ArtifactCache():
                  # FIXME: Asking the user what to do may be neater
                  default_conf = os.path.join(os.environ['XDG_CONFIG_HOME'],
                                              'buildstream.conf')
 -                detail = ("There is not enough space to build the given element.\n"
 +                detail = ("There is not enough space to complete the build.\n"
                            "Please increase the cache-quota in {}."
                            .format(self.context.config_origin or default_conf))
@@ -375,20 +373,8 @@ class ArtifactCache():
          remotes = {}
          q = multiprocessing.Queue()
          for remote_spec in remote_specs:
 -            # Use subprocess to avoid creation of gRPC threads in main BuildStream process
 -            # See https://github.com/grpc/grpc/blob/master/doc/fork_support.md for details
 -            p = multiprocessing.Process(target=self.cas.initialize_remote, args=(remote_spec, q))
 -            try:
 -                # Keep SIGINT blocked in the child process
 -                with _signals.blocked([signal.SIGINT], ignore=False):
 -                    p.start()
+-
 -                error = q.get()
 -                p.join()
 -            except KeyboardInterrupt:
 -                utils._kill_process_tree(p.pid)
 -                raise
 +            error = CASRemote.check_remote(remote_spec, q)
              if error and on_failure:
                  on_failure(remote_spec.url, error)
@@ -747,7 +733,7 @@ class ArtifactCache():
                                  "servers are configured as push remotes.")
          for remote in push_remotes:
 -            message_digest = self.cas.push_message(remote, message)
 +            message_digest = remote.push_message(message)
          return message_digest

buildstream/_artifactcache/__init__.py → buildstream/_cas/__init__.py

@@ -17,4 +17,5 @@
  #  Authors:
  #        Tristan Van Berkom <tristan vanberkom codethink co uk>
 -from .artifactcache import ArtifactCache, ArtifactCacheSpec, CACHE_SIZE_FILE
 +from .cascache import CASCache
 +from .casremote import CASRemote, CASRemoteSpec

buildstream/_artifactcache/cascache.py → buildstream/_cas/cascache.py

@@ -17,85 +17,23 @@
  #  Authors:
  #        Jürg Billeter <juerg billeter codethink co uk>
 -from collections import namedtuple
  import hashlib
  import itertools
 -import io
  import os
  import stat
  import tempfile
  import uuid
  import contextlib
 -from urllib.parse import urlparse
  import grpc
 -from .._protos.google.rpc import code_pb2
 -from .._protos.google.bytestream import bytestream_pb2, bytestream_pb2_grpc
 -from .._protos.build.bazel.remote.execution.v2 import remote_execution_pb2, remote_execution_pb2_grpc
 -from .._protos.buildstream.v2 import buildstream_pb2, buildstream_pb2_grpc
 +from .._protos.build.bazel.remote.execution.v2 import remote_execution_pb2
 +from .._protos.buildstream.v2 import buildstream_pb2
  from .. import utils
 -from .._exceptions import CASError, LoadError, LoadErrorReason
 -from .. import _yaml
 +from .._exceptions import CASCacheError
+-
 -# The default limit for gRPC messages is 4 MiB.
 -# Limit payload to 1 MiB to leave sufficient headroom for metadata.
 -_MAX_PAYLOAD_BYTES = 1024 * 1024
+-
+-
 -class CASRemoteSpec(namedtuple('CASRemoteSpec', 'url push server_cert client_key client_cert instance_name')):
+-
 -    # _new_from_config_node
 -    #
 -    # Creates an CASRemoteSpec() from a YAML loaded node
 -    #
 -    @staticmethod
 -    def _new_from_config_node(spec_node, basedir=None):
 -        _yaml.node_validate(spec_node, ['url', 'push', 'server-cert', 'client-key', 'client-cert', 'instance-name'])
 -        url = _yaml.node_get(spec_node, str, 'url')
 -        push = _yaml.node_get(spec_node, bool, 'push', default_value=False)
 -        if not url:
 -            provenance = _yaml.node_get_provenance(spec_node, 'url')
 -            raise LoadError(LoadErrorReason.INVALID_DATA,
 -                            "{}: empty artifact cache URL".format(provenance))
+-
 -        instance_name = _yaml.node_get(spec_node, str, 'instance-name', default_value=None)
+-
 -        server_cert = _yaml.node_get(spec_node, str, 'server-cert', default_value=None)
 -        if server_cert and basedir:
 -            server_cert = os.path.join(basedir, server_cert)
+-
 -        client_key = _yaml.node_get(spec_node, str, 'client-key', default_value=None)
 -        if client_key and basedir:
 -            client_key = os.path.join(basedir, client_key)
+-
 -        client_cert = _yaml.node_get(spec_node, str, 'client-cert', default_value=None)
 -        if client_cert and basedir:
 -            client_cert = os.path.join(basedir, client_cert)
+-
 -        if client_key and not client_cert:
 -            provenance = _yaml.node_get_provenance(spec_node, 'client-key')
 -            raise LoadError(LoadErrorReason.INVALID_DATA,
 -                            "{}: 'client-key' was specified without 'client-cert'".format(provenance))
+-
 -        if client_cert and not client_key:
 -            provenance = _yaml.node_get_provenance(spec_node, 'client-cert')
 -            raise LoadError(LoadErrorReason.INVALID_DATA,
 -                            "{}: 'client-cert' was specified without 'client-key'".format(provenance))
+-
 -        return CASRemoteSpec(url, push, server_cert, client_key, client_cert, instance_name)
+-
+-
 -CASRemoteSpec.__new__.__defaults__ = (None, None, None, None)
+-
+-
 -class BlobNotFound(CASError):
+-
 -    def __init__(self, blob, msg):
 -        self.blob = blob
 -        super().__init__(msg)
 +from .casremote import BlobNotFound, _CASBatchRead, _CASBatchUpdate
  # A CASCache manages a CAS repository as specified in the Remote Execution API.
@@ -120,7 +58,7 @@ class CASCache():
          headdir = os.path.join(self.casdir, 'refs', 'heads')
          objdir = os.path.join(self.casdir, 'objects')
          if not (os.path.isdir(headdir) and os.path.isdir(objdir)):
 -            raise CASError("CAS repository check failed for '{}'".format(self.casdir))
 +            raise CASCacheError("CAS repository check failed for '{}'".format(self.casdir))
      # contains():
+     #
@@ -169,7 +107,7 @@ class CASCache():
      #     subdir (str): Optional specific dir to extract
+     #
      # Raises:
 -    #     CASError: In cases there was an OSError, or if the ref did not exist.
 +    #     CASCacheError: In cases there was an OSError, or if the ref did not exist.
+     #
      # Returns: path to extracted directory
+     #
@@ -201,7 +139,7 @@ class CASCache():
                  # Another process beat us to rename
                  pass
              except OSError as e:
 -                raise CASError("Failed to extract directory for ref '{}': {}".format(ref, e)) from e
 +                raise CASCacheError("Failed to extract directory for ref '{}': {}".format(ref, e)) from e
          return originaldest
@@ -245,29 +183,6 @@ class CASCache():
          return modified, removed, added
 -    def initialize_remote(self, remote_spec, q):
 -        try:
 -            remote = CASRemote(remote_spec)
 -            remote.init()
+-
 -            request = buildstream_pb2.StatusRequest(instance_name=remote_spec.instance_name)
 -            response = remote.ref_storage.Status(request)
+-
 -            if remote_spec.push and not response.allow_updates:
 -                q.put('CAS server does not allow push')
 -            else:
 -                # No error
 -                q.put(None)
+-
 -        except grpc.RpcError as e:
 -            # str(e) is too verbose for errors reported to the user
 -            q.put(e.details())
+-
 -        except Exception as e:               # pylint: disable=broad-except
 -            # Whatever happens, we need to return it to the calling process
 -            #
 -            q.put(str(e))
+-
      # pull():
+     #
      # Pull a ref from a remote repository.
@@ -306,7 +221,7 @@ class CASCache():
              return True
          except grpc.RpcError as e:
              if e.code() != grpc.StatusCode.NOT_FOUND:
 -                raise CASError("Failed to pull ref {}: {}".format(ref, e)) from e
 +                raise CASCacheError("Failed to pull ref {}: {}".format(ref, e)) from e
              else:
                  return False
          except BlobNotFound as e:
@@ -360,7 +275,7 @@ class CASCache():
      #   (bool): True if any remote was updated, False if no pushes were required
+     #
      # Raises:
 -    #   (CASError): if there was an error
 +    #   (CASCacheError): if there was an error
+     #
      def push(self, refs, remote):
          skipped_remote = True
@@ -395,7 +310,7 @@ class CASCache():
                  skipped_remote = False
          except grpc.RpcError as e:
              if e.code() != grpc.StatusCode.RESOURCE_EXHAUSTED:
 -                raise CASError("Failed to push ref {}: {}".format(refs, e), temporary=True) from e
 +                raise CASCacheError("Failed to push ref {}: {}".format(refs, e), temporary=True) from e
          return not skipped_remote
@@ -408,57 +323,13 @@ class CASCache():
      #     directory (Directory): A virtual directory object to push.
+     #
      # Raises:
 -    #     (CASError): if there was an error
 +    #     (CASCacheError): if there was an error
+     #
      def push_directory(self, remote, directory):
          remote.init()
          self._send_directory(remote, directory.ref)
 -    # push_message():
 -    #
 -    # Push the given protobuf message to a remote.
 -    #
 -    # Args:
 -    #     remote (CASRemote): The remote to push to
 -    #     message (Message): A protobuf message to push.
 -    #
 -    # Raises:
 -    #     (CASError): if there was an error
 -    #
 -    def push_message(self, remote, message):
+-
 -        message_buffer = message.SerializeToString()
 -        message_digest = utils._message_digest(message_buffer)
+-
 -        remote.init()
+-
 -        with io.BytesIO(message_buffer) as b:
 -            self._send_blob(remote, message_digest, b)
+-
 -        return message_digest
+-
 -    # verify_digest_on_remote():
 -    #
 -    # Check whether the object is already on the server in which case
 -    # there is no need to upload it.
 -    #
 -    # Args:
 -    #     remote (CASRemote): The remote to check
 -    #     digest (Digest): The object digest.
 -    #
 -    def verify_digest_on_remote(self, remote, digest):
 -        remote.init()
+-
 -        request = remote_execution_pb2.FindMissingBlobsRequest(instance_name=remote.spec.instance_name)
 -        request.blob_digests.extend([digest])
+-
 -        response = remote.cas.FindMissingBlobs(request)
 -        if digest in response.missing_blob_digests:
 -            return False
+-
 -        return True
+-
      # objpath():
+     #
      # Return the path of an object based on its digest.
@@ -531,7 +402,7 @@ class CASCache():
              pass
          except OSError as e:
 -            raise CASError("Failed to hash object: {}".format(e)) from e
 +            raise CASCacheError("Failed to hash object: {}".format(e)) from e
          return digest
@@ -572,7 +443,7 @@ class CASCache():
                  return digest
          except FileNotFoundError as e:
 -            raise CASError("Attempt to access unavailable ref: {}".format(e)) from e
 +            raise CASCacheError("Attempt to access unavailable ref: {}".format(e)) from e
      # update_mtime()
+     #
@@ -585,7 +456,7 @@ class CASCache():
          try:
              os.utime(self._refpath(ref))
          except FileNotFoundError as e:
 -            raise CASError("Attempt to access unavailable ref: {}".format(e)) from e
 +            raise CASCacheError("Attempt to access unavailable ref: {}".format(e)) from e
      # calculate_cache_size()
+     #
@@ -676,7 +547,7 @@ class CASCache():
          # Remove cache ref
          refpath = self._refpath(ref)
          if not os.path.exists(refpath):
 -            raise CASError("Could not find ref '{}'".format(ref))
 +            raise CASCacheError("Could not find ref '{}'".format(ref))
          os.unlink(refpath)
@@ -792,7 +663,7 @@ class CASCache():
                  # The process serving the socket can't be cached anyway
                  pass
              else:
 -                raise CASError("Unsupported file type for {}".format(full_path))
 +                raise CASCacheError("Unsupported file type for {}".format(full_path))
          return self.add_object(digest=dir_digest,
                                 buffer=directory.SerializeToString())
@@ -811,7 +682,7 @@ class CASCache():
              if dirnode.name == name:
                  return dirnode.digest
 -        raise CASError("Subdirectory {} not found".format(name))
 +        raise CASCacheError("Subdirectory {} not found".format(name))
      def _diff_trees(self, tree_a, tree_b, *, added, removed, modified, path=""):
          dir_a = remote_execution_pb2.Directory()
@@ -909,23 +780,6 @@ class CASCache():
          for dirnode in directory.directories:
              yield from self._required_blobs(dirnode.digest)
 -    def _fetch_blob(self, remote, digest, stream):
 -        resource_name_components = ['blobs', digest.hash, str(digest.size_bytes)]
+-
 -        if remote.spec.instance_name:
 -            resource_name_components.insert(0, remote.spec.instance_name)
+-
 -        resource_name = '/'.join(resource_name_components)
+-
 -        request = bytestream_pb2.ReadRequest()
 -        request.resource_name = resource_name
 -        request.read_offset = 0
 -        for response in remote.bytestream.Read(request):
 -            stream.write(response.data)
 -        stream.flush()
+-
 -        assert digest.size_bytes == os.fstat(stream.fileno()).st_size
+-
      # _ensure_blob():
+     #
      # Fetch and add blob if it's not already local.
@@ -944,7 +798,7 @@ class CASCache():
              return objpath
          with tempfile.NamedTemporaryFile(dir=self.tmpdir) as f:
 -            self._fetch_blob(remote, digest, f)
 +            remote._fetch_blob(digest, f)
              added_digest = self.add_object(path=f.name, link_directly=True)
              assert added_digest.hash == digest.hash
@@ -1051,7 +905,7 @@ class CASCache():
      def _fetch_tree(self, remote, digest):
          # download but do not store the Tree object
          with tempfile.NamedTemporaryFile(dir=self.tmpdir) as out:
 -            self._fetch_blob(remote, digest, out)
 +            remote._fetch_blob(digest, out)
              tree = remote_execution_pb2.Tree()
@@ -1071,39 +925,6 @@ class CASCache():
          return dirdigest
 -    def _send_blob(self, remote, digest, stream, u_uid=uuid.uuid4()):
 -        resource_name_components = ['uploads', str(u_uid), 'blobs',
 -                                    digest.hash, str(digest.size_bytes)]
+-
 -        if remote.spec.instance_name:
 -            resource_name_components.insert(0, remote.spec.instance_name)
+-
 -        resource_name = '/'.join(resource_name_components)
+-
 -        def request_stream(resname, instream):
 -            offset = 0
 -            finished = False
 -            remaining = digest.size_bytes
 -            while not finished:
 -                chunk_size = min(remaining, _MAX_PAYLOAD_BYTES)
 -                remaining -= chunk_size
+-
 -                request = bytestream_pb2.WriteRequest()
 -                request.write_offset = offset
 -                # max. _MAX_PAYLOAD_BYTES chunks
 -                request.data = instream.read(chunk_size)
 -                request.resource_name = resname
 -                request.finish_write = remaining <= 0
+-
 -                yield request
+-
 -                offset += chunk_size
 -                finished = request.finish_write
+-
 -        response = remote.bytestream.Write(request_stream(resource_name, stream))
+-
 -        assert response.committed_size == digest.size_bytes
+-
      def _send_directory(self, remote, digest, u_uid=uuid.uuid4()):
          required_blobs = self._required_blobs(digest)
@@ -1137,7 +958,7 @@ class CASCache():
                  if (digest.size_bytes >= remote.max_batch_total_size_bytes or
                          not remote.batch_update_supported):
                      # Too large for batch request, upload in independent request.
 -                    self._send_blob(remote, digest, f, u_uid=u_uid)
 +                    remote._send_blob(digest, f, u_uid=u_uid)
                  else:
                      if not batch.add(digest, f):
                          # Not enough space left in batch request.
@@ -1150,183 +971,6 @@ class CASCache():
          batch.send()
 -# Represents a single remote CAS cache.
 -#
 -class CASRemote():
 -    def __init__(self, spec):
 -        self.spec = spec
 -        self._initialized = False
 -        self.channel = None
 -        self.bytestream = None
 -        self.cas = None
 -        self.ref_storage = None
 -        self.batch_update_supported = None
 -        self.batch_read_supported = None
 -        self.capabilities = None
 -        self.max_batch_total_size_bytes = None
+-
 -    def init(self):
 -        if not self._initialized:
 -            url = urlparse(self.spec.url)
 -            if url.scheme == 'http':
 -                port = url.port or 80
 -                self.channel = grpc.insecure_channel('{}:{}'.format(url.hostname, port))
 -            elif url.scheme == 'https':
 -                port = url.port or 443
+-
 -                if self.spec.server_cert:
 -                    with open(self.spec.server_cert, 'rb') as f:
 -                        server_cert_bytes = f.read()
 -                else:
 -                    server_cert_bytes = None
+-
 -                if self.spec.client_key:
 -                    with open(self.spec.client_key, 'rb') as f:
 -                        client_key_bytes = f.read()
 -                else:
 -                    client_key_bytes = None
+-
 -                if self.spec.client_cert:
 -                    with open(self.spec.client_cert, 'rb') as f:
 -                        client_cert_bytes = f.read()
 -                else:
 -                    client_cert_bytes = None
+-
 -                credentials = grpc.ssl_channel_credentials(root_certificates=server_cert_bytes,
 -                                                           private_key=client_key_bytes,
 -                                                           certificate_chain=client_cert_bytes)
 -                self.channel = grpc.secure_channel('{}:{}'.format(url.hostname, port), credentials)
 -            else:
 -                raise CASError("Unsupported URL: {}".format(self.spec.url))
+-
 -            self.bytestream = bytestream_pb2_grpc.ByteStreamStub(self.channel)
 -            self.cas = remote_execution_pb2_grpc.ContentAddressableStorageStub(self.channel)
 -            self.capabilities = remote_execution_pb2_grpc.CapabilitiesStub(self.channel)
 -            self.ref_storage = buildstream_pb2_grpc.ReferenceStorageStub(self.channel)
+-
 -            self.max_batch_total_size_bytes = _MAX_PAYLOAD_BYTES
 -            try:
 -                request = remote_execution_pb2.GetCapabilitiesRequest(instance_name=self.spec.instance_name)
 -                response = self.capabilities.GetCapabilities(request)
 -                server_max_batch_total_size_bytes = response.cache_capabilities.max_batch_total_size_bytes
 -                if 0 < server_max_batch_total_size_bytes < self.max_batch_total_size_bytes:
 -                    self.max_batch_total_size_bytes = server_max_batch_total_size_bytes
 -            except grpc.RpcError as e:
 -                # Simply use the defaults for servers that don't implement GetCapabilities()
 -                if e.code() != grpc.StatusCode.UNIMPLEMENTED:
 -                    raise
+-
 -            # Check whether the server supports BatchReadBlobs()
 -            self.batch_read_supported = False
 -            try:
 -                request = remote_execution_pb2.BatchReadBlobsRequest(instance_name=self.spec.instance_name)
 -                response = self.cas.BatchReadBlobs(request)
 -                self.batch_read_supported = True
 -            except grpc.RpcError as e:
 -                if e.code() != grpc.StatusCode.UNIMPLEMENTED:
 -                    raise
+-
 -            # Check whether the server supports BatchUpdateBlobs()
 -            self.batch_update_supported = False
 -            try:
 -                request = remote_execution_pb2.BatchUpdateBlobsRequest(instance_name=self.spec.instance_name)
 -                response = self.cas.BatchUpdateBlobs(request)
 -                self.batch_update_supported = True
 -            except grpc.RpcError as e:
 -                if (e.code() != grpc.StatusCode.UNIMPLEMENTED and
 -                        e.code() != grpc.StatusCode.PERMISSION_DENIED):
 -                    raise
+-
 -            self._initialized = True
+-
+-
 -# Represents a batch of blobs queued for fetching.
 -#
 -class _CASBatchRead():
 -    def __init__(self, remote):
 -        self._remote = remote
 -        self._max_total_size_bytes = remote.max_batch_total_size_bytes
 -        self._request = remote_execution_pb2.BatchReadBlobsRequest(instance_name=remote.spec.instance_name)
 -        self._size = 0
 -        self._sent = False
+-
 -    def add(self, digest):
 -        assert not self._sent
+-
 -        new_batch_size = self._size + digest.size_bytes
 -        if new_batch_size > self._max_total_size_bytes:
 -            # Not enough space left in current batch
 -            return False
+-
 -        request_digest = self._request.digests.add()
 -        request_digest.hash = digest.hash
 -        request_digest.size_bytes = digest.size_bytes
 -        self._size = new_batch_size
 -        return True
+-
 -    def send(self):
 -        assert not self._sent
 -        self._sent = True
+-
 -        if not self._request.digests:
 -            return
+-
 -        batch_response = self._remote.cas.BatchReadBlobs(self._request)
+-
 -        for response in batch_response.responses:
 -            if response.status.code == code_pb2.NOT_FOUND:
 -                raise BlobNotFound(response.digest.hash, "Failed to download blob {}: {}".format(
 -                    response.digest.hash, response.status.code))
 -            if response.status.code != code_pb2.OK:
 -                raise CASError("Failed to download blob {}: {}".format(
 -                    response.digest.hash, response.status.code))
 -            if response.digest.size_bytes != len(response.data):
 -                raise CASError("Failed to download blob {}: expected {} bytes, received {} bytes".format(
 -                    response.digest.hash, response.digest.size_bytes, len(response.data)))
+-
 -            yield (response.digest, response.data)
+-
+-
 -# Represents a batch of blobs queued for upload.
 -#
 -class _CASBatchUpdate():
 -    def __init__(self, remote):
 -        self._remote = remote
 -        self._max_total_size_bytes = remote.max_batch_total_size_bytes
 -        self._request = remote_execution_pb2.BatchUpdateBlobsRequest(instance_name=remote.spec.instance_name)
 -        self._size = 0
 -        self._sent = False
+-
 -    def add(self, digest, stream):
 -        assert not self._sent
+-
 -        new_batch_size = self._size + digest.size_bytes
 -        if new_batch_size > self._max_total_size_bytes:
 -            # Not enough space left in current batch
 -            return False
+-
 -        blob_request = self._request.requests.add()
 -        blob_request.digest.hash = digest.hash
 -        blob_request.digest.size_bytes = digest.size_bytes
 -        blob_request.data = stream.read(digest.size_bytes)
 -        self._size = new_batch_size
 -        return True
+-
 -    def send(self):
 -        assert not self._sent
 -        self._sent = True
+-
 -        if not self._request.requests:
 -            return
+-
 -        batch_response = self._remote.cas.BatchUpdateBlobs(self._request)
+-
 -        for response in batch_response.responses:
 -            if response.status.code != code_pb2.OK:
 -                raise CASError("Failed to upload blob {}: {}".format(
 -                    response.digest.hash, response.status.code))
+-
+-
  def _grouper(iterable, n):
      while True:
          try:

buildstream/_cas/casremote.py

 +from collections import namedtuple
 +import io
 +import os
 +import multiprocessing
 +import signal
 +from urllib.parse import urlparse
 +import uuid
++
 +import grpc
++
 +from .. import _yaml
 +from .._protos.google.rpc import code_pb2
 +from .._protos.google.bytestream import bytestream_pb2, bytestream_pb2_grpc
 +from .._protos.build.bazel.remote.execution.v2 import remote_execution_pb2, remote_execution_pb2_grpc
 +from .._protos.buildstream.v2 import buildstream_pb2, buildstream_pb2_grpc
++
 +from .._exceptions import CASRemoteError, LoadError, LoadErrorReason
 +from .. import _signals
 +from .. import utils
++
 +# The default limit for gRPC messages is 4 MiB.
 +# Limit payload to 1 MiB to leave sufficient headroom for metadata.
 +_MAX_PAYLOAD_BYTES = 1024 * 1024
++
++
 +class CASRemoteSpec(namedtuple('CASRemoteSpec', 'url push server_cert client_key client_cert instance_name')):
++
 +    # _new_from_config_node
 +    #
 +    # Creates a CASRemoteSpec() from a YAML loaded node
 +    #
 +    @staticmethod
 +    def _new_from_config_node(spec_node, basedir=None):
 +        _yaml.node_validate(spec_node, ['url', 'push', 'server-cert', 'client-key', 'client-cert', 'instance_name'])
 +        url = _yaml.node_get(spec_node, str, 'url')
 +        push = _yaml.node_get(spec_node, bool, 'push', default_value=False)
 +        if not url:
 +            provenance = _yaml.node_get_provenance(spec_node, 'url')
 +            raise LoadError(LoadErrorReason.INVALID_DATA,
 +                            "{}: empty artifact cache URL".format(provenance))
++
 +        instance_name = _yaml.node_get(spec_node, str, 'instance_name', default_value=None)
++
 +        server_cert = _yaml.node_get(spec_node, str, 'server-cert', default_value=None)
 +        if server_cert and basedir:
 +            server_cert = os.path.join(basedir, server_cert)
++
 +        client_key = _yaml.node_get(spec_node, str, 'client-key', default_value=None)
 +        if client_key and basedir:
 +            client_key = os.path.join(basedir, client_key)
++
 +        client_cert = _yaml.node_get(spec_node, str, 'client-cert', default_value=None)
 +        if client_cert and basedir:
 +            client_cert = os.path.join(basedir, client_cert)
++
 +        if client_key and not client_cert:
 +            provenance = _yaml.node_get_provenance(spec_node, 'client-key')
 +            raise LoadError(LoadErrorReason.INVALID_DATA,
 +                            "{}: 'client-key' was specified without 'client-cert'".format(provenance))
++
 +        if client_cert and not client_key:
 +            provenance = _yaml.node_get_provenance(spec_node, 'client-cert')
 +            raise LoadError(LoadErrorReason.INVALID_DATA,
 +                            "{}: 'client-cert' was specified without 'client-key'".format(provenance))
++
 +        return CASRemoteSpec(url, push, server_cert, client_key, client_cert, instance_name)
++
++
 +CASRemoteSpec.__new__.__defaults__ = (None, None, None, None)
++
++
 +class BlobNotFound(CASRemoteError):
++
 +    def __init__(self, blob, msg):
 +        self.blob = blob
 +        super().__init__(msg)
++
++
 +# Represents a single remote CAS cache.
 +#
 +class CASRemote():
 +    def __init__(self, spec):
 +        self.spec = spec
 +        self._initialized = False
 +        self.channel = None
 +        self.bytestream = None
 +        self.cas = None
 +        self.ref_storage = None
 +        self.batch_update_supported = None
 +        self.batch_read_supported = None
 +        self.capabilities = None
 +        self.max_batch_total_size_bytes = None
++
 +    def init(self):
 +        if not self._initialized:
 +            url = urlparse(self.spec.url)
 +            if url.scheme == 'http':
 +                port = url.port or 80
 +                self.channel = grpc.insecure_channel('{}:{}'.format(url.hostname, port))
 +            elif url.scheme == 'https':
 +                port = url.port or 443
++
 +                if self.spec.server_cert:
 +                    with open(self.spec.server_cert, 'rb') as f:
 +                        server_cert_bytes = f.read()
 +                else:
 +                    server_cert_bytes = None
++
 +                if self.spec.client_key:
 +                    with open(self.spec.client_key, 'rb') as f:
 +                        client_key_bytes = f.read()
 +                else:
 +                    client_key_bytes = None
++
 +                if self.spec.client_cert:
 +                    with open(self.spec.client_cert, 'rb') as f:
 +                        client_cert_bytes = f.read()
 +                else:
 +                    client_cert_bytes = None
++
 +                credentials = grpc.ssl_channel_credentials(root_certificates=server_cert_bytes,
 +                                                           private_key=client_key_bytes,
 +                                                           certificate_chain=client_cert_bytes)
 +                self.channel = grpc.secure_channel('{}:{}'.format(url.hostname, port), credentials)
 +            else:
 +                raise CASRemoteError("Unsupported URL: {}".format(self.spec.url))
++
 +            self.bytestream = bytestream_pb2_grpc.ByteStreamStub(self.channel)
 +            self.cas = remote_execution_pb2_grpc.ContentAddressableStorageStub(self.channel)
 +            self.capabilities = remote_execution_pb2_grpc.CapabilitiesStub(self.channel)
 +            self.ref_storage = buildstream_pb2_grpc.ReferenceStorageStub(self.channel)
++
 +            self.max_batch_total_size_bytes = _MAX_PAYLOAD_BYTES
 +            try:
 +                request = remote_execution_pb2.GetCapabilitiesRequest()
 +                response = self.capabilities.GetCapabilities(request)
 +                server_max_batch_total_size_bytes = response.cache_capabilities.max_batch_total_size_bytes
 +                if 0 < server_max_batch_total_size_bytes < self.max_batch_total_size_bytes:
 +                    self.max_batch_total_size_bytes = server_max_batch_total_size_bytes
 +            except grpc.RpcError as e:
 +                # Simply use the defaults for servers that don't implement GetCapabilities()
 +                if e.code() != grpc.StatusCode.UNIMPLEMENTED:
 +                    raise
++
 +            # Check whether the server supports BatchReadBlobs()
 +            self.batch_read_supported = False
 +            try:
 +                request = remote_execution_pb2.BatchReadBlobsRequest()
 +                response = self.cas.BatchReadBlobs(request)
 +                self.batch_read_supported = True
 +            except grpc.RpcError as e:
 +                if e.code() != grpc.StatusCode.UNIMPLEMENTED:
 +                    raise
++
 +            # Check whether the server supports BatchUpdateBlobs()
 +            self.batch_update_supported = False
 +            try:
 +                request = remote_execution_pb2.BatchUpdateBlobsRequest()
 +                response = self.cas.BatchUpdateBlobs(request)
 +                self.batch_update_supported = True
 +            except grpc.RpcError as e:
 +                if (e.code() != grpc.StatusCode.UNIMPLEMENTED and
 +                        e.code() != grpc.StatusCode.PERMISSION_DENIED):
 +                    raise
++
 +            self._initialized = True
++
 +    # check_remote
 +    #
 +    # Used when checking whether remote_specs work in the buildstream main
 +    # thread, runs this in a separate process to avoid creation of gRPC threads
 +    # in the main BuildStream process
 +    # See https://github.com/grpc/grpc/blob/master/doc/fork_support.md for details
 +    @classmethod
 +    def check_remote(cls, remote_spec, q):
++
 +        def __check_remote():
 +            try:
 +                remote = cls(remote_spec)
 +                remote.init()
++
 +                request = buildstream_pb2.StatusRequest()
 +                response = remote.ref_storage.Status(request)
++
 +                if remote_spec.push and not response.allow_updates:
 +                    q.put('CAS server does not allow push')
 +                else:
 +                    # No error
 +                    q.put(None)
++
 +            except grpc.RpcError as e:
 +                # str(e) is too verbose for errors reported to the user
 +                q.put(e.details())
++
 +            except Exception as e:               # pylint: disable=broad-except
 +                # Whatever happens, we need to return it to the calling process
 +                #
 +                q.put(str(e))
++
 +        p = multiprocessing.Process(target=__check_remote)
++
 +        try:
 +            # Keep SIGINT blocked in the child process
 +            with _signals.blocked([signal.SIGINT], ignore=False):
 +                p.start()
++
 +            error = q.get()
 +            p.join()
 +        except KeyboardInterrupt:
 +            utils._kill_process_tree(p.pid)
 +            raise
++
 +        return error
++
 +    # verify_digest_on_remote():
 +    #
 +    # Check whether the object is already on the server in which case
 +    # there is no need to upload it.
 +    #
 +    # Args:
 +    #     digest (Digest): The object digest.
 +    #
 +    def verify_digest_on_remote(self, digest):
 +        self.init()
++
 +        request = remote_execution_pb2.FindMissingBlobsRequest()
 +        request.blob_digests.extend([digest])
++
 +        response = self.cas.FindMissingBlobs(request)
 +        if digest in response.missing_blob_digests:
 +            return False
++
 +        return True
++
 +    # push_message():
 +    #
 +    # Push the given protobuf message to a remote.
 +    #
 +    # Args:
 +    #     message (Message): A protobuf message to push.
 +    #
 +    # Raises:
 +    #     (CASRemoteError): if there was an error
 +    #
 +    def push_message(self, message):
++
 +        message_buffer = message.SerializeToString()
 +        message_digest = utils._message_digest(message_buffer)
++
 +        self.init()
++
 +        with io.BytesIO(message_buffer) as b:
 +            self._send_blob(message_digest, b)
++
 +        return message_digest
++
 +    ################################################
 +    #             Local Private Methods            #
 +    ################################################
 +    def _fetch_blob(self, digest, stream):
 +        resource_name = '/'.join(['blobs', digest.hash, str(digest.size_bytes)])
 +        request = bytestream_pb2.ReadRequest()
 +        request.resource_name = resource_name
 +        request.read_offset = 0
 +        for response in self.bytestream.Read(request):
 +            stream.write(response.data)
 +        stream.flush()
++
 +        assert digest.size_bytes == os.fstat(stream.fileno()).st_size
++
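 +    def _send_blob(self, digest, stream, u_uid=None):
 +        # A default of uuid.uuid4() in the signature is evaluated only once,
 +        # at definition time, so generate a fresh upload UUID per call instead.
 +        if u_uid is None:
 +            u_uid = uuid.uuid4()
 +        resource_name = '/'.join(['uploads', str(u_uid), 'blobs',
 +                                  digest.hash, str(digest.size_bytes)])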
++
 +        def request_stream(resname, instream):
 +            offset = 0
 +            finished = False
 +            remaining = digest.size_bytes
 +            while not finished:
 +                chunk_size = min(remaining, _MAX_PAYLOAD_BYTES)
 +                remaining -= chunk_size
++
 +                request = bytestream_pb2.WriteRequest()
 +                request.write_offset = offset
 +                # max. _MAX_PAYLOAD_BYTES chunks
 +                request.data = instream.read(chunk_size)
 +                request.resource_name = resname
 +                request.finish_write = remaining <= 0
++
 +                yield request
++
 +                offset += chunk_size
 +                finished = request.finish_write
++
 +        response = self.bytestream.Write(request_stream(resource_name, stream))
++
 +        assert response.committed_size == digest.size_bytes
++
++
 +# Represents a batch of blobs queued for fetching.
 +#
 +class _CASBatchRead():
 +    def __init__(self, remote):
 +        self._remote = remote
 +        self._max_total_size_bytes = remote.max_batch_total_size_bytes
 +        self._request = remote_execution_pb2.BatchReadBlobsRequest()
 +        self._size = 0
 +        self._sent = False
++
 +    def add(self, digest):
 +        assert not self._sent
++
 +        new_batch_size = self._size + digest.size_bytes
 +        if new_batch_size > self._max_total_size_bytes:
 +            # Not enough space left in current batch
 +            return False
++
 +        request_digest = self._request.digests.add()
 +        request_digest.hash = digest.hash
 +        request_digest.size_bytes = digest.size_bytes
 +        self._size = new_batch_size
 +        return True
++
 +    def send(self):
 +        assert not self._sent
 +        self._sent = True
++
 +        if not self._request.digests:
 +            return
++
 +        batch_response = self._remote.cas.BatchReadBlobs(self._request)
++
 +        for response in batch_response.responses:
 +            if response.status.code == code_pb2.NOT_FOUND:
 +                raise BlobNotFound(response.digest.hash, "Failed to download blob {}: {}".format(
 +                    response.digest.hash, response.status.code))
 +            if response.status.code != code_pb2.OK:
 +                raise CASRemoteError("Failed to download blob {}: {}".format(
 +                    response.digest.hash, response.status.code))
 +            if response.digest.size_bytes != len(response.data):
 +                raise CASRemoteError("Failed to download blob {}: expected {} bytes, received {} bytes".format(
 +                    response.digest.hash, response.digest.size_bytes, len(response.data)))
++
 +            yield (response.digest, response.data)
++
++
 +# Represents a batch of blobs queued for upload.
 +#
 +class _CASBatchUpdate():
 +    def __init__(self, remote):
 +        self._remote = remote
 +        self._max_total_size_bytes = remote.max_batch_total_size_bytes
 +        self._request = remote_execution_pb2.BatchUpdateBlobsRequest()
 +        self._size = 0
 +        self._sent = False
++
 +    def add(self, digest, stream):
 +        assert not self._sent
++
 +        new_batch_size = self._size + digest.size_bytes
 +        if new_batch_size > self._max_total_size_bytes:
 +            # Not enough space left in current batch
 +            return False
++
 +        blob_request = self._request.requests.add()
 +        blob_request.digest.hash = digest.hash
 +        blob_request.digest.size_bytes = digest.size_bytes
 +        blob_request.data = stream.read(digest.size_bytes)
 +        self._size = new_batch_size
 +        return True
++
 +    def send(self):
 +        assert not self._sent
 +        self._sent = True
++
 +        if not self._request.requests:
 +            return
++
 +        batch_response = self._remote.cas.BatchUpdateBlobs(self._request)
++
 +        for response in batch_response.responses:
 +            if response.status.code != code_pb2.OK:
 +                raise CASRemoteError("Failed to upload blob {}: {}".format(
 +                    response.digest.hash, response.status.code))
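
Note on the batch classes above: `add()` returns False once the next blob would push the batch past `max_batch_total_size_bytes`, so the caller has to send the current batch and open a new one. A minimal sketch of that calling pattern follows. The names `Digest`, `Batch` and `pack` are hypothetical stand-ins, not BuildStream API, and the sketch simply raises for a blob that can never fit; a real caller would need some other path for those.

```python
from collections import namedtuple

# Hypothetical stand-in for a CAS digest; only size_bytes matters here.
Digest = namedtuple('Digest', 'hash size_bytes')


class Batch:
    """Collects digests until a total size limit would be exceeded."""

    def __init__(self, max_total_size_bytes):
        self._max = max_total_size_bytes
        self._size = 0
        self.digests = []

    def add(self, digest):
        new_size = self._size + digest.size_bytes
        if new_size > self._max:
            # Not enough space left in the current batch
            return False
        self.digests.append(digest)
        self._size = new_size
        return True


def pack(digests, max_total_size_bytes):
    """Split digests into consecutive batches that respect the limit."""
    batches = []
    batch = Batch(max_total_size_bytes)
    for digest in digests:
        if batch.add(digest):
            continue
        # Current batch is full: close it (if non-empty) and start a new one.
        if batch.digests:
            batches.append(batch)
        batch = Batch(max_total_size_bytes)
        if not batch.add(digest):
            # Assumption of this sketch: oversized blobs are rejected here.
            raise ValueError("blob {} exceeds the batch limit".format(digest.hash))
    if batch.digests:
        batches.append(batch)
    return batches
```

With a limit of 10 bytes, blobs of 4, 4, 4 and 10 bytes end up in three batches, because the third 4-byte blob does not fit next to the first two and the 10-byte blob fills a batch on its own.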

buildstream/_artifactcache/casserver.py → buildstream/_cas/casserver.py

buildstream/_context.py

@@ -31,7 +31,7 @@ from ._exceptions import LoadError, LoadErrorReason, BstError
  from ._message import Message, MessageType
  from ._profile import Topics, profile_start, profile_end
  from ._artifactcache import ArtifactCache
 -from ._artifactcache.cascache import CASCache
 +from ._cas import CASCache
  from ._workspaces import Workspaces, WorkspaceProjectCache, WORKSPACE_PROJECT_FILE
  from .plugin import _plugin_lookup
  from .sandbox import SandboxRemote
@@ -317,11 +317,18 @@ class Context():
      # invoked with as opposed to a junctioned subproject.
+     #
      # Returns:
 -    #    (list): The list of projects
 +    #    (Project): The Project object
+     #
      def get_toplevel_project(self):
          return self._projects[0]
 +    # get_workspaces():
 +    #
 +    # Return a Workspaces object containing a list of workspaces.
 +    #
 +    # Returns:
 +    #    (Workspaces): The Workspaces object
 +    #
      def get_workspaces(self):
          return self._workspaces

buildstream/_exceptions.py

@@ -284,6 +284,21 @@ class CASError(BstError):
          super().__init__(message, detail=detail, domain=ErrorDomain.CAS, reason=reason, temporary=True)
 +# CASRemoteError
 +#
 +# Raised when errors are encountered in the remote CAS
 +class CASRemoteError(CASError):
 +    pass
++
++
 +# CASCacheError
 +#
 +# Raised when errors are encountered in the local CAS cache
 +#
 +class CASCacheError(CASError):
 +    pass
++
++
  # PipelineError
+ #
  # Raised from pipeline operations
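
Note on the new exception classes: both derive from `CASError`, so existing `except CASError` handlers keep catching them, while new code can tell remote failures from local ones. A small sketch under simplified assumptions: the classes below are bare stand-ins without the `BstError` base or its `detail`/`reason` arguments, and `describe()` is a hypothetical helper.

```python
# Simplified stand-ins for the classes in buildstream/_exceptions.py.
class CASError(Exception):
    pass


class CASRemoteError(CASError):
    pass


class CASCacheError(CASError):
    pass


def describe(exc):
    """Classify an exception by the most specific handler that matches."""
    try:
        raise exc
    except CASRemoteError:
        return "remote"
    except CASCacheError:
        return "local"
    except CASError:
        # Still reached by plain CASError instances raised by older code.
        return "generic"
```

The order of the handlers matters: the subclass handlers must come before `except CASError`, otherwise the base-class handler would swallow everything.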

buildstream/_gitsourcebase.py

@@ -296,18 +296,24 @@ class GitMirror(SourceFetcher):
              shallow = set()
              for _, commit_ref, _ in self.tags:
 -                _, out = self.source.check_output([self.source.host_git, 'rev-list',
 -                                                   '--boundary', '{}..{}'.format(commit_ref, self.ref)],
 -                                                  fail="Failed to get git history {}..{} in directory: {}"
 -                                                  .format(commit_ref, self.ref, fullpath),
 -                                                  fail_temporarily=True,
 -                                                  cwd=self.mirror)
 -                for line in out.splitlines():
 -                    rev = line.lstrip('-')
 -                    if line[0] == '-':
 -                        shallow.add(rev)
 -                    else:
 -                        included.add(rev)
 +                if commit_ref == self.ref:
 +                    # rev-list does not work in case of same rev
 +                    shallow.add(self.ref)
 +                else:
 +                    _, out = self.source.check_output([self.source.host_git, 'rev-list',
 +                                                       '--ancestry-path', '--boundary',
 +                                                       '{}..{}'.format(commit_ref, self.ref)],
 +                                                      fail="Failed to get git history {}..{} in directory: {}"
 +                                                      .format(commit_ref, self.ref, fullpath),
 +                                                      fail_temporarily=True,
 +                                                      cwd=self.mirror)
 +                    self.source.warn("refs {}..{}: {}".format(commit_ref, self.ref, out.splitlines()))
 +                    for line in out.splitlines():
 +                        rev = line.lstrip('-')
 +                        if line[0] == '-':
 +                            shallow.add(rev)
 +                        else:
 +                            included.add(rev)
              shallow -= included
              included |= shallow
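
Note on the loop above: `git rev-list --boundary` prints boundary commits with a leading `-`, and the code sorts each output line into `shallow` or `included` on that prefix. The parsing step can be sketched in isolation as below. `split_boundary` is a hypothetical helper name and the sample revisions are made up; no git invocation is involved.

```python
def split_boundary(out):
    """Split `git rev-list --boundary` output into (shallow, included) sets.

    Lines starting with '-' are boundary commits and go to `shallow`;
    all other lines are commits inside the range and go to `included`.
    """
    shallow = set()
    included = set()
    for line in out.splitlines():
        if not line:
            continue
        rev = line.lstrip('-')
        if line.startswith('-'):
            shallow.add(rev)
        else:
            included.add(rev)
    return shallow, included


# Made-up output: two commits in the range and one boundary commit.
sample = "ccc333\nbbb222\n-aaa111\n"
shallow, included = split_boundary(sample)

# Same post-processing as in the diff: a commit that is included via one
# tag is not a shallow boundary, and boundaries themselves are kept.
shallow -= included
included |= shallow
```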

buildstream/_scheduler/jobs/cachesizejob.py

@@ -34,8 +34,8 @@ class CacheSizeJob(Job):
          if status == JobStatus.OK:
              self._artifacts.set_cache_size(result)
 -            if self._complete_cb:
 -                self._complete_cb(result)
 +        if self._complete_cb:
 +            self._complete_cb(status, result)
      def child_process_data(self):
          return {}

buildstream/_scheduler/jobs/cleanupjob.py

@@ -20,8 +20,9 @@ from .job import Job, JobStatus
  class CleanupJob(Job):
 -    def __init__(self, *args, **kwargs):
 +    def __init__(self, *args, complete_cb, **kwargs):
          super().__init__(*args, **kwargs)
 +        self._complete_cb = complete_cb
          context = self._scheduler.context
          self._artifacts = context.artifactcache
@@ -32,3 +33,6 @@ class CleanupJob(Job):
      def parent_complete(self, status, result):
          if status == JobStatus.OK:
              self._artifacts.set_cache_size(result)
++
 +        if self._complete_cb:
 +            self._complete_cb(status, result)

buildstream/_scheduler/jobs/job.py

@@ -85,28 +85,11 @@ class Process(multiprocessing.Process):
  #    action_name (str): The queue action name
  #    logfile (str): A template string that points to the logfile
  #                   that should be used - should contain {pid}.
 -#    resources (iter(ResourceType)) - A set of resources this job
 -#                                     wants to use.
 -#    exclusive_resources (iter(ResourceType)) - A set of resources
 -#                                               this job wants to use
 -#                                               exclusively.
  #    max_retries (int): The maximum number of retries
+ #
  class Job():
 -    def __init__(self, scheduler, action_name, logfile, *,
 -                 resources=None, exclusive_resources=None, max_retries=0):
+-
 -        if resources is None:
 -            resources = set()
 -        else:
 -            resources = set(resources)
 -        if exclusive_resources is None:
 -            exclusive_resources = set()
 -        else:
 -            exclusive_resources = set(resources)
+-
 -        assert exclusive_resources <= resources, "All exclusive resources must also be resources!"
 +    def __init__(self, scheduler, action_name, logfile, *, max_retries=0):
+         #
          # Public members
@@ -114,12 +97,6 @@ class Job():
          self.action_name = action_name   # The action name for the Queue
          self.child_data = None           # Data to be sent to the main process
 -        # The resources this job wants to access
 -        self.resources = resources
 -        # Resources this job needs to access exclusively, i.e., no
 -        # other job should be allowed to access them
 -        self.exclusive_resources = exclusive_resources
+-
+         #
          # Private members
+         #

buildstream/_scheduler/queues/buildqueue.py

@@ -57,11 +57,10 @@ class BuildQueue(Queue):
                            logfile=logfile)
              job = ElementJob(self._scheduler, self.action_name,
                               logfile, element=element, queue=self,
 -                             resources=self.resources,
                               action_cb=self.process,
                               complete_cb=self._job_done,
                               max_retries=self._max_retries)
 -            self._done_queue.append(job)
 +            self._done_queue.append(element)
              self.failed_elements.append(element)
              self._scheduler._job_complete_callback(job, False)

buildstream/_scheduler/queues/queue.py

@@ -72,8 +72,9 @@ class Queue():
          # Private members
+         #
          self._scheduler = scheduler
 -        self._wait_queue = deque()
 -        self._done_queue = deque()
 +        self._resources = scheduler.resources  # Shared resource pool
 +        self._wait_queue = deque()             # Ready / Waiting elements
 +        self._done_queue = deque()             # Processed / Skipped elements
          self._max_retries = 0
          # Assert the subclass has setup class data
@@ -115,16 +116,6 @@ class Queue():
      def status(self, element):
          return QueueStatus.READY
 -    # prepare()
 -    #
 -    # Abstract method for handling job preparation in the main process.
 -    #
 -    # Args:
 -    #    element (Element): The element which is scheduled
 -    #
 -    def prepare(self, element):
 -        pass
+-
      # done()
+     #
      # Abstract method for handling a successful job completion.
@@ -153,26 +144,18 @@ class Queue():
          if not elts:
              return
 -        # Note: The internal lists work with jobs. This is not
 -        #       reflected in any external methods (except
 -        #       pop/peek_ready_jobs).
 -        def create_job(element):
 -            logfile = self._element_log_path(element)
 -            return ElementJob(self._scheduler, self.action_name,
 -                              logfile, element=element, queue=self,
 -                              resources=self.resources,
 -                              action_cb=self.process,
 -                              complete_cb=self._job_done,
 -                              max_retries=self._max_retries)
+-
 -        # Place skipped elements directly on the done queue
 -        jobs = [create_job(elt) for elt in elts]
 -        skip = [job for job in jobs if self.status(job.element) == QueueStatus.SKIP]
 -        wait = [job for job in jobs if job not in skip]
+-
 -        self.skipped_elements.extend([job.element for job in skip])
 -        self._wait_queue.extend(wait)
 -        self._done_queue.extend(skip)
 +        # Place skipped elements on the done queue right away.
 +        #
 +        # The remaining ready and waiting elements must remain in the
 +        # same queue, and ready status must be determined at the moment
 +        # the scheduler asks for the next job.
 +        #
 +        skip = [elt for elt in elts if self.status(elt) == QueueStatus.SKIP]
 +        wait = [elt for elt in elts if elt not in skip]
++
 +        self.skipped_elements.extend(skip)  # Public record of skipped elements
 +        self._done_queue.extend(skip)       # Elements eligible to be dequeued
 +        self._wait_queue.extend(wait)       # Elements to be processed
      # dequeue()
+     #
@@ -184,69 +167,59 @@ class Queue():
+     #
      def dequeue(self):
          while self._done_queue:
 -            yield self._done_queue.popleft().element
 +            yield self._done_queue.popleft()
      # dequeue_ready()
+     #
 -    # Reports whether there are any elements to dequeue
 +    # Reports whether any elements can be promoted to other queues
+     #
      # Returns:
 -    #    (bool): Whether there are elements to dequeue
 +    #    (bool): Whether there are elements ready
+     #
      def dequeue_ready(self):
          return any(self._done_queue)
 -    # pop_ready_jobs()
 -    #
 -    # Returns:
 -    #     ([Job]): A list of jobs to run
 +    # harvest_jobs()
+     #
      # Process elements in the queue, moving elements which were enqueued
 -    # into the dequeue pool, and processing them if necessary.
 -    #
 -    # This will have different results for elements depending
 -    # on the Queue.status() implementation.
 -    #
 -    #   o Elements which are QueueStatus.WAIT will not be affected
 +    # into the dequeue pool, and creating as many jobs as resources
 +    # can be reserved for.
+     #
 -    #   o Elements which are QueueStatus.SKIP will move directly
 -    #     to the dequeue pool
 -    #
 -    #   o For Elements which are QueueStatus.READY a Job will be
 -    #     created and returned to the caller, given that the scheduler
 -    #     allows the Queue enough resources for the given job
 +    # Returns:
 +    #     ([Job]): A list of jobs which can be run now
+     #
 -    def pop_ready_jobs(self):
 +    def harvest_jobs(self):
          unready = []
          ready = []
          while self._wait_queue:
 -            job = self._wait_queue.popleft()
 -            element = job.element
 +            if not self._resources.reserve(self.resources, peek=True):
 +                break
 +            element = self._wait_queue.popleft()
              status = self.status(element)
++
              if status == QueueStatus.WAIT:
 -                unready.append(job)
 -                continue
 +                unready.append(element)
              elif status == QueueStatus.SKIP:
 -                self._done_queue.append(job)
 +                self._done_queue.append(element)
                  self.skipped_elements.append(element)
 -                continue
+-
 -            self.prepare(element)
 -            ready.append(job)
 +            else:
 +                reserved = self._resources.reserve(self.resources)
 +                assert reserved
 +                ready.append(element)
 -        # These were not ready but were in the beginning, give em
 -        # first priority again next time around
          self._wait_queue.extendleft(unready)
 -        return ready
+-
 -    def peek_ready_jobs(self):
 -        def ready(job):
 -            return self.status(job.element) == QueueStatus.READY
+-
 -        yield from (job for job in self._wait_queue if ready(job))
 +        return [
 +            ElementJob(self._scheduler, self.action_name,
 +                       self._element_log_path(element),
 +                       element=element, queue=self,
 +                       action_cb=self.process,
 +                       complete_cb=self._job_done,
 +                       max_retries=self._max_retries)
 +            for element in ready
 +        ]
      #####################################################
      #                 Private Methods                   #
@@ -292,6 +265,10 @@ class Queue():
+     #
      def _job_done(self, job, element, status, result):
 +        # Now release the resources we reserved
 +        #
 +        self._resources.release(self.resources)
++
          # Update values that need to be synchronized in the main task
          # before calling any queue implementation
          self._update_workspaces(element, job)
@@ -324,12 +301,8 @@ class Queue():
                            detail=traceback.format_exc())
              self.failed_elements.append(element)
          else:
 -            #
 -            # No exception occured in post processing
 -            #
+-
 -            # All jobs get placed on the done queue for later processing.
 -            self._done_queue.append(job)
 +            # All elements get placed on the done queue for later processing.
 +            self._done_queue.append(element)
              # These lists are for bookkeeping purposes for the UI and logging.
              if status == JobStatus.SKIPPED:
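
Note on `harvest_jobs()`: the loop first peeks at the resource pool, only then pops an element, and reserves for real only when the element turns out to be ready. A toy model of that flow follows. `Pool`, `harvest` and the string statuses are hypothetical simplifications: the pool is a plain slot counter and has none of the exclusive-resource handling of `Resources.reserve()`. Unlike the diff, this sketch re-queues waiting elements with `reversed()` so that they keep their original order.

```python
from collections import deque

WAIT, READY, SKIP = 'wait', 'ready', 'skip'


class Pool:
    """Toy resource pool with a fixed number of slots."""

    def __init__(self, slots):
        self._free = slots

    def reserve(self, *, peek=False):
        if self._free <= 0:
            return False
        if not peek:
            self._free -= 1
        return True

    def release(self):
        self._free += 1


def harvest(wait_queue, done_queue, pool, status):
    """Return the elements for which a slot could be reserved right now."""
    unready = []
    ready = []
    while wait_queue:
        # Stop early if nothing could be reserved anyway.
        if not pool.reserve(peek=True):
            break
        element = wait_queue.popleft()
        state = status(element)
        if state == WAIT:
            unready.append(element)
        elif state == SKIP:
            done_queue.append(element)
        else:
            reserved = pool.reserve()
            assert reserved
            ready.append(element)
    # Put waiting elements back at the front, preserving their order.
    wait_queue.extendleft(reversed(unready))
    return ready
```

Each harvested element holds one slot until the corresponding `release()`, which mirrors the `self._resources.release(self.resources)` call added to `_job_done()`.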

buildstream/_scheduler/resources.py

@@ -34,28 +34,25 @@ class Resources():
              ResourceType.UPLOAD: set()
+         }
 -    def clear_job_resources(self, job):
 -        for resource in job.exclusive_resources:
 -            self._exclusive_resources[resource].remove(hash(job))
 +    # reserve()
 +    #
 +    # Reserves a set of resources
 +    #
 +    # Args:
 +    #    resources (set): A set of ResourceTypes
 +    #    exclusive (set): Another set of ResourceTypes
 +    #    peek (bool): Whether to only peek at whether the resource is available
 +    #
 +    # Returns:
 +    #    (bool): True if the resources could be reserved
 +    #
 +    def reserve(self, resources, exclusive=None, *, peek=False):
 +        if exclusive is None:
 +            exclusive = set()
 -        for resource in job.resources:
 -            self._used_resources[resource] -= 1
+-
 -    def reserve_exclusive_resources(self, job):
 -        exclusive = job.exclusive_resources
+-
 -        # The very first thing we do is to register any exclusive
 -        # resources this job may want. Even if the job is not yet
 -        # allowed to run (because another job is holding the resource
 -        # it wants), we can still set this - it just means that any
 -        # job *currently* using these resources has to finish first,
 -        # and no new jobs wanting these can be launched (except other
 -        # exclusive-access jobs).
 -        #
 -        for resource in exclusive:
 -            self._exclusive_resources[resource].add(hash(job))
 +        resources = set(resources)
 +        exclusive = set(exclusive)
 -    def reserve_job_resources(self, job):
          # First, we check if the job wants to access a resource that
          # another job wants exclusive access to. If so, it cannot be
          # scheduled.
@@ -68,7 +65,8 @@ class Resources():
          #        is currently not possible, but may be worth thinking
          #        about.
+         #
 -        for resource in job.resources - job.exclusive_resources:
 +        for resource in resources - exclusive:
++
              # If our job wants this resource exclusively, we never
              # check this, so we can get away with not (temporarily)
              # removing it from the set.
@@ -84,14 +82,14 @@ class Resources():
          # at a time, despite being allowed to be part of the exclusive
          # set.
+         #
 -        for exclusive in job.exclusive_resources:
 -            if self._used_resources[exclusive] != 0:
 +        for resource in exclusive:
 +            if self._used_resources[resource] != 0:
                  return False
          # Finally, we check if we have enough of each resource
          # available. If we don't have enough, the job cannot be
          # scheduled.
 -        for resource in job.resources:
 +        for resource in resources:
              if (self._max_resources[resource] > 0 and
                      self._used_resources[resource] >= self._max_resources[resource]):
                  return False
@@ -99,7 +97,70 @@ class Resources():
          # Now we register the fact that our job is using the resources
          # it asked for, and tell the scheduler that it is allowed to
          # continue.
 -        for resource in job.resources:
 -            self._used_resources[resource] += 1
 +        if not peek:
 +            for resource in resources:
 +                self._used_resources[resource] += 1
          return True
++
 +    # release()
 +    #
 +    # Release resources previously reserved with Resources.reserve()
 +    #
 +    # Args:
 +    #    resources (set): A set of resources to release
 +    #
 +    def release(self, resources):
 +        for resource in resources:
 +            assert self._used_resources[resource] > 0, "Scheduler resource imbalance"
 +            self._used_resources[resource] -= 1
++
 +    # register_exclusive_interest()
 +    #
 +    # Inform the resources pool that `source` has an interest in
 +    # reserving this resource exclusively.
 +    #
 +    # The source parameter is used to identify the caller; it
 +    # must be unique for as long as the interest is registered.
 +    #
 +    # This function may be called multiple times, and subsequent
 +    # calls will simply have no effect until
 +    # unregister_exclusive_interest() is used to clear the interest.
 +    #
 +    # This must be called in advance of reserve()
 +    #
 +    # Args:
 +    #    resources (set): Set of resources to reserve exclusively
 +    #    source (any): Source identifier, to be used again when unregistering
 +    #                  the interest.
 +    #
 +    def register_exclusive_interest(self, resources, source):
++
 +        # The very first thing we do is to register any exclusive
 +        # resources this job may want. Even if the job is not yet
 +        # allowed to run (because another job is holding the resource
 +        # it wants), we can still set this - it just means that any
 +        # job *currently* using these resources has to finish first,
 +        # and no new jobs wanting these can be launched (except other
 +        # exclusive-access jobs).
 +        #
 +        for resource in resources:
 +            self._exclusive_resources[resource].add(source)
++
 +    # unregister_exclusive_interest()
 +    #
 +    # Clear the exclusive interest in these resources.
 +    #
 +    # This should be called by the given source which registered
 +    # an exclusive interest.
 +    #
 +    # Args:
 +    #    resources (set): Set of resources to clear the exclusive interest in
 +    #    source (any): Source identifier, the same one which was used when
 +    #                  registering the interest.
 +    #
 +    def unregister_exclusive_interest(self, resources, source):
++
 +        for resource in resources:
 +            self._exclusive_resources[resource].remove(source)
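
As a reading aid for the resources.py change above, here is a minimal standalone sketch of the reserve / release / exclusive-interest protocol. It is not the BuildStream implementation: the class name `ResourcePool`, its constructor and the two-member `ResourceType` enum are hypothetical, and the check that blocks non-exclusive users while an exclusive interest is registered is an assumption based on the comments in the diff, since that part of `reserve()` is not shown in the hunks.

```python
from enum import Enum


class ResourceType(Enum):
    # Hypothetical subset, for illustration only
    CACHE = 0
    PROCESS = 1


class ResourcePool:
    def __init__(self, limits):
        # limits maps ResourceType -> max concurrent users (0 means unlimited)
        self._max = dict(limits)
        self._used = {r: 0 for r in limits}
        self._exclusive = {r: set() for r in limits}

    def register_exclusive_interest(self, resources, source):
        # Adding to a set makes repeated registration a no-op
        for r in resources:
            self._exclusive[r].add(source)

    def unregister_exclusive_interest(self, resources, source):
        for r in resources:
            self._exclusive[r].remove(source)

    def reserve(self, resources, exclusive=None, *, peek=False):
        resources = set(resources)
        exclusive = set(exclusive or ())

        # Assumption: a non-exclusive user is refused while anyone has
        # registered an exclusive interest in the resource.
        for r in resources - exclusive:
            if self._exclusive[r]:
                return False

        # An exclusive user must wait until the resource is idle.
        for r in exclusive:
            if self._used[r] != 0:
                return False

        # Respect the per-resource limit, where one is set.
        for r in resources:
            if self._max[r] > 0 and self._used[r] >= self._max[r]:
                return False

        # With peek=True we only report availability.
        if not peek:
            for r in resources:
                self._used[r] += 1
        return True

    def release(self, resources):
        for r in resources:
            assert self._used[r] > 0, "resource imbalance"
            self._used[r] -= 1


pool = ResourcePool({ResourceType.CACHE: 0, ResourceType.PROCESS: 4})
both = [ResourceType.CACHE, ResourceType.PROCESS]

size_job_ok = pool.reserve(both)                          # e.g. a cache size job
pool.register_exclusive_interest([ResourceType.CACHE], 'cache-cleanup')
cleanup_early = pool.reserve(both, [ResourceType.CACHE])  # CACHE still in use
other_blocked = pool.reserve(both)                        # blocked by the interest
pool.release(both)                                        # cache size job finishes
cleanup_ok = pool.reserve(both, [ResourceType.CACHE])     # now exclusive access
```

The point of registering the interest before reserving is visible in the last three calls: once the interest is registered, no new non-exclusive user can start, so the exclusive user is guaranteed to get its turn as soon as the current users release.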

buildstream/_scheduler/scheduler.py

@@ -28,7 +28,7 @@ from contextlib import contextmanager
  # Local imports
  from .resources import Resources, ResourceType
 -from .jobs import CacheSizeJob, CleanupJob
 +from .jobs import JobStatus, CacheSizeJob, CleanupJob
  # A decent return code for Scheduler.run()
@@ -38,14 +38,10 @@ class SchedStatus():
      TERMINATED = 1
 -# Our _REDUNDANT_EXCLUSIVE_ACTIONS jobs are special ones
 -# which we launch dynamically, they have the property of being
 -# meaningless to queue if one is already queued, and it also
 -# doesnt make sense to run them in parallel
 +# Some action names for the internal jobs we launch
+ #
  _ACTION_NAME_CLEANUP = 'cleanup'
  _ACTION_NAME_CACHE_SIZE = 'cache_size'
 -_REDUNDANT_EXCLUSIVE_ACTIONS = [_ACTION_NAME_CLEANUP, _ACTION_NAME_CACHE_SIZE]
  # Scheduler()
@@ -81,8 +77,6 @@ class Scheduler():
+         #
          # Public members
+         #
 -        self.active_jobs = []       # Jobs currently being run in the scheduler
 -        self.waiting_jobs = []      # Jobs waiting for resources
          self.queues = None          # Exposed for the frontend to print summaries
          self.context = context      # The Context object shared with Queues
          self.terminated = False     # Whether the scheduler was asked to terminate or has terminated
@@ -95,15 +89,23 @@ class Scheduler():
+         #
          # Private members
+         #
 +        self._active_jobs = []                # Jobs currently being run in the scheduler
 +        self._starttime = start_time          # Initial application start time
 +        self._suspendtime = None              # Session time compensation for suspended state
 +        self._queue_jobs = True               # Whether we should continue to queue jobs
++
 +        # State of cache management related jobs
 +        self._cache_size_scheduled = False    # Whether we have a cache size job scheduled
 +        self._cache_size_running = None       # A running CacheSizeJob, or None
 +        self._cleanup_scheduled = False       # Whether we have a cleanup job scheduled
 +        self._cleanup_running = None          # A running CleanupJob, or None
++
 +        # Callbacks to report back to the Scheduler owner
          self._interrupt_callback = interrupt_callback
          self._ticker_callback = ticker_callback
          self._job_start_callback = job_start_callback
          self._job_complete_callback = job_complete_callback
 -        self._starttime = start_time
 -        self._suspendtime = None
 -        self._queue_jobs = True      # Whether we should continue to queue jobs
+-
          # Whether our exclusive jobs, like 'cleanup' are currently already
          # waiting or active.
+         #
@@ -113,9 +115,9 @@ class Scheduler():
          self._exclusive_waiting = set()
          self._exclusive_active = set()
 -        self._resources = Resources(context.sched_builders,
 -                                    context.sched_fetchers,
 -                                    context.sched_pushers)
 +        self.resources = Resources(context.sched_builders,
 +                                   context.sched_fetchers,
 +                                   context.sched_pushers)
      # run()
+     #
@@ -150,7 +152,7 @@ class Scheduler():
          self._connect_signals()
          # Run the queues
 -        self._schedule_queue_jobs()
 +        self._sched()
          self.loop.run_forever()
          self.loop.close()
@@ -240,12 +242,14 @@ class Scheduler():
      #    status (JobStatus): The status of the completed job
+     #
      def job_completed(self, job, status):
 -        self._resources.clear_job_resources(job)
 -        self.active_jobs.remove(job)
 -        if job.action_name in _REDUNDANT_EXCLUSIVE_ACTIONS:
 -            self._exclusive_active.remove(job.action_name)
++
 +        # Remove from the active jobs list
 +        self._active_jobs.remove(job)
++
 +        # Scheduler owner facing callback
          self._job_complete_callback(job, status)
 -        self._schedule_queue_jobs()
++
 +        # Now check for more jobs
          self._sched()
      # check_cache_size():
@@ -255,78 +259,104 @@ class Scheduler():
      # if needed.
+     #
      def check_cache_size(self):
 -        job = CacheSizeJob(self, _ACTION_NAME_CACHE_SIZE,
 -                           'cache_size/cache_size',
 -                           resources=[ResourceType.CACHE,
 -                                      ResourceType.PROCESS],
 -                           complete_cb=self._run_cleanup)
 -        self._schedule_jobs([job])
++
 +        # Here we assume we are called in response to a job
 +        # completion callback, or before entering the scheduler.
 +        #
 +        # As such there is no need to call `_sched()` from here,
 +        # and we prefer to run it once at the last moment.
 +        #
 +        self._cache_size_scheduled = True
      #######################################################
      #                  Local Private Methods              #
      #######################################################
 -    # _sched()
 +    # _spawn_job()
+     #
 -    # The main driving function of the scheduler, it will be called
 -    # automatically when Scheduler.run() is called initially,
 +    # Spawns a job
+     #
 -    def _sched(self):
 -        for job in self.waiting_jobs:
 -            self._resources.reserve_exclusive_resources(job)
 +    # Args:
 +    #    job (Job): The job to spawn
 +    #
 +    def _spawn_job(self, job):
 +        job.spawn()
 +        self._active_jobs.append(job)
 +        if self._job_start_callback:
 +            self._job_start_callback(job)
 -        for job in self.waiting_jobs:
 -            if not self._resources.reserve_job_resources(job):
 -                continue
 +    # Callback for the cache size job
 +    def _cache_size_job_complete(self, status, cache_size):
 -            # Postpone these jobs if one is already running
 -            if job.action_name in _REDUNDANT_EXCLUSIVE_ACTIONS and \
 -               job.action_name in self._exclusive_active:
 -                continue
 +        # Deallocate cache size job resources
 +        self._cache_size_running = None
 +        self.resources.release([ResourceType.CACHE, ResourceType.PROCESS])
 -            job.spawn()
 -            self.waiting_jobs.remove(job)
 -            self.active_jobs.append(job)
 +        # Schedule a cleanup job if we've hit the threshold
 +        if status != JobStatus.OK:
 +            return
 -            if job.action_name in _REDUNDANT_EXCLUSIVE_ACTIONS:
 -                self._exclusive_waiting.remove(job.action_name)
 -                self._exclusive_active.add(job.action_name)
 +        context = self.context
 +        artifacts = context.artifactcache
 -            if self._job_start_callback:
 -                self._job_start_callback(job)
 +        if artifacts.has_quota_exceeded():
 +            self._cleanup_scheduled = True
 -        # If nothings ticking, time to bail out
 -        if not self.active_jobs and not self.waiting_jobs:
 -            self.loop.stop()
 +    # Callback for the cleanup job
 +    def _cleanup_job_complete(self, status, cache_size):
 -    # _schedule_jobs()
 -    #
 -    # The main entry point for jobs to be scheduled.
 -    #
 -    # This is called either as a result of scanning the queues
 -    # in _schedule_queue_jobs(), or directly by the Scheduler
 -    # to insert special jobs like cleanups.
 +        # Deallocate cleanup job resources
 +        self._cleanup_running = None
 +        self.resources.release([ResourceType.CACHE, ResourceType.PROCESS])
++
 +        # Unregister the exclusive interest when we're done with it
 +        if not self._cleanup_scheduled:
 +            self.resources.unregister_exclusive_interest(
 +                [ResourceType.CACHE], 'cache-cleanup'
 +            )
++
 +    # _sched_cleanup_job()
+     #
 -    # Args:
 -    #     jobs ([Job]): A list of jobs to schedule
 +    # Runs a cleanup job if one is scheduled to run now and
 +    # sufficient resources are available.
+     #
 -    def _schedule_jobs(self, jobs):
 -        for job in jobs:
 +    def _sched_cleanup_job(self):
 -            # Special treatment of our redundant exclusive jobs
 -            #
 -            if job.action_name in _REDUNDANT_EXCLUSIVE_ACTIONS:
 +        if self._cleanup_scheduled and self._cleanup_running is None:
++
 +            # Ensure we have an exclusive interest in the resources
 +            self.resources.register_exclusive_interest(
 +                [ResourceType.CACHE], 'cache-cleanup'
 +            )
++
 +            if self.resources.reserve([ResourceType.CACHE, ResourceType.PROCESS],
 +                                      [ResourceType.CACHE]):
 -                # Drop the job if one is already queued
 -                if job.action_name in self._exclusive_waiting:
 -                    continue
 +                # Update state and launch
 +                self._cleanup_scheduled = False
 +                self._cleanup_running = \
 +                    CleanupJob(self, _ACTION_NAME_CLEANUP, 'cleanup/cleanup',
 +                               complete_cb=self._cleanup_job_complete)
 +                self._spawn_job(self._cleanup_running)
 -                # Mark this action type as queued
 -                self._exclusive_waiting.add(job.action_name)
 +    # _sched_cache_size_job()
 +    #
 +    # Runs a cache size job if one is scheduled to run now and
 +    # sufficient resources are available.
 +    #
 +    def _sched_cache_size_job(self):
++
 +        if self._cache_size_scheduled and not self._cache_size_running:
 -            self.waiting_jobs.append(job)
 +            if self.resources.reserve([ResourceType.CACHE, ResourceType.PROCESS]):
 +                self._cache_size_scheduled = False
 +                self._cache_size_running = \
 +                    CacheSizeJob(self, _ACTION_NAME_CACHE_SIZE,
 +                                 'cache_size/cache_size',
 +                                 complete_cb=self._cache_size_job_complete)
 +                self._spawn_job(self._cache_size_running)
 -    # _schedule_queue_jobs()
 +    # _sched_queue_jobs()
+     #
      # Ask the queues what jobs they want to schedule and schedule
      # them. This is done here so we can ask for new jobs when jobs
@@ -335,7 +365,7 @@ class Scheduler():
      # This will process the Queues, pull elements through the Queues
      # and process anything that is ready.
+     #
 -    def _schedule_queue_jobs(self):
 +    def _sched_queue_jobs(self):
          ready = []
          process_queues = True
@@ -344,10 +374,7 @@ class Scheduler():
              # Pull elements forward through queues
              elements = []
              for queue in self.queues:
 -                # Enqueue elements complete from the last queue
                  queue.enqueue(elements)
+-
 -                # Dequeue processed elements for the next queue
                  elements = list(queue.dequeue())
              # Kickoff whatever processes can be processed at this time
@@ -362,41 +389,51 @@ class Scheduler():
              # thus need all the pulls to complete before ever starting
              # a build
              ready.extend(chain.from_iterable(
 -                queue.pop_ready_jobs() for queue in reversed(self.queues)
 +                q.harvest_jobs() for q in reversed(self.queues)
              ))
 -            # pop_ready_jobs() may have skipped jobs, adding them to
 -            # the done_queue.  Pull these skipped elements forward to
 -            # the next queue and process them.
 +            # harvest_jobs() may have decided to skip some jobs, making
 +            # them eligible for promotion to the next queue as a side effect.
 +            #
 +            # If that happens, do another round.
              process_queues = any(q.dequeue_ready() for q in self.queues)
 -        self._schedule_jobs(ready)
 -        self._sched()
 +        # Spawn the jobs
 +        #
 +        for job in ready:
 +            self._spawn_job(job)
 -    # _run_cleanup()
 -    #
 -    # Schedules the cache cleanup job if the passed size
 -    # exceeds the cache quota.
 +    # _sched()
+     #
 -    # Args:
 -    #    cache_size (int): The calculated cache size (ignored)
 +    # Run any jobs which are ready to run, or quit the main loop
 +    # when nothing is running or is ready to run.
+     #
 -    # NOTE: This runs in response to completion of the cache size
 -    #       calculation job lauched by Scheduler.check_cache_size(),
 -    #       which will report the calculated cache size.
 +    # This is the main driving function of the scheduler. It is called
 +    # initially when we enter Scheduler.run(), and again whenever any
 +    # job completes, after any business logic has occurred and before
 +    # going back to sleep.
+     #
 -    def _run_cleanup(self, cache_size):
 -        context = self.context
 -        artifacts = context.artifactcache
 +    def _sched(self):
 -        if not artifacts.has_quota_exceeded():
 -            return
 +        if not self.terminated:
++
 +            #
 +            # Try the cache management jobs
 +            #
 +            self._sched_cleanup_job()
 +            self._sched_cache_size_job()
++
 +            #
 +            # Run as many jobs as the queues can handle for the
 +            # available resources
 +            #
 +            self._sched_queue_jobs()
 -        job = CleanupJob(self, _ACTION_NAME_CLEANUP, 'cleanup/cleanup',
 -                         resources=[ResourceType.CACHE,
 -                                    ResourceType.PROCESS],
 -                         exclusive_resources=[ResourceType.CACHE])
 -        self._schedule_jobs([job])
 +        #
 +        # If nothing is ticking then bail out
 +        #
 +        if not self._active_jobs:
 +            self.loop.stop()
      # _suspend_jobs()
+     #
@@ -406,7 +443,7 @@ class Scheduler():
          if not self.suspended:
              self._suspendtime = datetime.datetime.now()
              self.suspended = True
 -            for job in self.active_jobs:
 +            for job in self._active_jobs:
                  job.suspend()
      # _resume_jobs()
@@ -415,7 +452,7 @@ class Scheduler():
+     #
      def _resume_jobs(self):
          if self.suspended:
 -            for job in self.active_jobs:
 +            for job in self._active_jobs:
                  job.resume()
              self.suspended = False
              self._starttime += (datetime.datetime.now() - self._suspendtime)
@@ -488,19 +525,16 @@ class Scheduler():
          wait_limit = 20.0
          # First tell all jobs to terminate
 -        for job in self.active_jobs:
 +        for job in self._active_jobs:
              job.terminate()
          # Now wait for them to really terminate
 -        for job in self.active_jobs:
 +        for job in self._active_jobs:
              elapsed = datetime.datetime.now() - wait_start
              timeout = max(wait_limit - elapsed.total_seconds(), 0.0)
              if not job.terminate_wait(timeout):
                  job.kill()
 -        # Clear out the waiting jobs
 -        self.waiting_jobs = []
+-
      # Regular timeout for driving status in the UI
      def _tick(self):
          elapsed = self.elapsed_time()
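
The scheduler.py change above replaces the waiting-jobs list with "scheduled" and "running" state for the internal cache jobs. The toy model below sketches why that makes redundant requests collapse into a single job. The class `MiniScheduler` and all of its attribute names are hypothetical, jobs are plain strings rather than real `Job` objects, and resource reservation is left out entirely, so treat it as an illustration of the pattern rather than of the actual code.

```python
class MiniScheduler:
    """Toy model of flag-driven scheduling for one internal job type."""

    def __init__(self):
        self.active_jobs = []      # Names of jobs currently "running"
        self.scheduled = False     # Whether a cache size job was requested
        self.running = None        # The running cache size job, or None
        self.launch_count = 0      # How many jobs were actually launched
        self.loop_stopped = False  # Stands in for loop.stop()

    def check_cache_size(self):
        # Only record the request; sched() decides when to launch.
        self.scheduled = True

    def sched(self):
        # Launch at most one job, and only if none is running already.
        if self.scheduled and self.running is None:
            self.scheduled = False
            self.launch_count += 1
            self.running = "cache_size-{}".format(self.launch_count)
            self.active_jobs.append(self.running)

        # If nothing is ticking then bail out.
        if not self.active_jobs:
            self.loop_stopped = True

    def job_completed(self, job):
        self.active_jobs.remove(job)
        if job == self.running:
            self.running = None
        # Check for more work at the last moment.
        self.sched()


sched = MiniScheduler()

# Three requests before the scheduler runs collapse into one launch.
for _ in range(3):
    sched.check_cache_size()
sched.sched()
first = sched.running

# A request arriving while a job is running is deferred, not dropped.
sched.check_cache_size()
sched.sched()
launches_while_running = sched.launch_count

sched.job_completed(first)   # launches the deferred job
second = sched.running
sched.job_completed(second)  # nothing left to do, the loop stops
```

Because a request is a boolean rather than a queued job object, there is nothing to deduplicate: asking twice is the same as asking once, and a request made while a job is running simply triggers one more run afterwards.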

buildstream/sandbox/_sandboxremote.py

@@ -38,7 +38,7 @@ from .._protos.google.rpc import code_pb2
  from .._exceptions import SandboxError
  from .. import _yaml
  from .._protos.google.longrunning import operations_pb2, operations_pb2_grpc
 -from .._artifactcache.cascache import CASRemote, CASRemoteSpec
 +from .._cas import CASRemote, CASRemoteSpec
  class RemoteExecutionSpec(namedtuple('RemoteExecutionSpec', 'exec_service storage_service action_service')):
@@ -348,17 +348,17 @@ class SandboxRemote(Sandbox):
              except grpc.RpcError as e:
                  raise SandboxError("Failed to push source directory to remote: {}".format(e)) from e
 -            if not cascache.verify_digest_on_remote(casremote, upload_vdir.ref):
 +            if not casremote.verify_digest_on_remote(upload_vdir.ref):
                  raise SandboxError("Failed to verify that source has been pushed to the remote artifact cache.")
              # Push command and action
              try:
 -                cascache.push_message(casremote, command_proto)
 +                casremote.push_message(command_proto)
              except grpc.RpcError as e:
                  raise SandboxError("Failed to push command to remote: {}".format(e))
              try:
 -                cascache.push_message(casremote, action)
 +                casremote.push_message(action)
              except grpc.RpcError as e:
                  raise SandboxError("Failed to push action to remote: {}".format(e))

conftest.py

@@ -32,7 +32,7 @@ def pytest_addoption(parser):
  def pytest_runtest_setup(item):
 -    if item.get_marker('integration') and not item.config.getvalue('integration'):
 +    if item.get_closest_marker('integration') and not item.config.getvalue('integration'):
          pytest.skip('skipping integration test')

doc/source/using_configuring_artifact_server.rst

@@ -94,7 +94,7 @@ requiring BuildStream's more exigent dependencies by setting the
  Command reference
  ~~~~~~~~~~~~~~~~~
 -.. click:: buildstream._artifactcache.casserver:server_main
 +.. click:: buildstream._cas.casserver:server_main
     :prog: bst-artifact-server

tests/artifactcache/config.py

@@ -3,8 +3,7 @@ import pytest
  import itertools
  import os
 -from buildstream._artifactcache import ArtifactCacheSpec
 -from buildstream._artifactcache.artifactcache import _configured_remote_artifact_cache_specs
 +from buildstream._artifactcache import ArtifactCacheSpec, _configured_remote_artifact_cache_specs
  from buildstream._context import Context
  from buildstream._project import Project
  from buildstream.utils import _deduplicate

tests/artifactcache/expiry.py

@@ -342,13 +342,13 @@ def test_invalid_cache_quota(cli, datafiles, tmpdir, quota, success):
          total_space = 10000
      volume_space_patch = mock.patch(
 -        "buildstream._artifactcache.artifactcache.ArtifactCache._get_volume_space_info_for",
 +        "buildstream._artifactcache.ArtifactCache._get_volume_space_info_for",
          autospec=True,
          return_value=(free_space, total_space),
+     )
      cache_size_patch = mock.patch(
 -        "buildstream._artifactcache.artifactcache.ArtifactCache.get_cache_size",
 +        "buildstream._artifactcache.ArtifactCache.get_cache_size",
          autospec=True,
          return_value=0,
+     )

tests/frontend/order.py

@@ -12,7 +12,21 @@ DATA_DIR = os.path.join(
+ )
 -def create_element(repo, name, path, dependencies, ref=None):
 +# create_element()
 +#
 +# Args:
 +#    project (str): The project directory where testing is happening
 +#    name (str): The element name to create
 +#    dependencies (list): The list of dependencies to dump into YAML format
 +#
 +# Returns:
 +#    (Repo): The corresponding git repository created for the element
 +def create_element(project, name, dependencies):
 +    dev_files_path = os.path.join(project, 'files', 'dev-files')
 +    element_path = os.path.join(project, 'elements')
 +    repo = create_repo('git', project, "{}-repo".format(name))
 +    ref = repo.create(dev_files_path)
++
      element = {
          'kind': 'import',
          'sources': [
@@ -20,7 +34,9 @@ def create_element(repo, name, path, dependencies, ref=None):
          ],
          'depends': dependencies
+     }
 -    _yaml.dump(element, os.path.join(path, name))
 +    _yaml.dump(element, os.path.join(element_path, name))
++
 +    return repo
  # This tests a variety of scenarios and checks that the order in
@@ -59,18 +75,6 @@ def create_element(repo, name, path, dependencies, ref=None):
  @pytest.mark.parametrize("operation", [('show'), ('fetch'), ('build')])
  def test_order(cli, datafiles, tmpdir, operation, target, template, expected):
      project = os.path.join(datafiles.dirname, datafiles.basename)
 -    dev_files_path = os.path.join(project, 'files', 'dev-files')
 -    element_path = os.path.join(project, 'elements')
+-
 -    # FIXME: Remove this when the test passes reliably.
 -    #
 -    #        There is no reason why the order should not
 -    #        be preserved when the builders is set to 1,
 -    #        the scheduler queue processing still seems to
 -    #        be losing the order.
 -    #
 -    if operation == 'build':
 -        pytest.skip("FIXME: This still only sometimes passes")
      # Configure to only allow one fetcher at a time, make it easy to
      # determine what is being planned in what order.
@@ -84,11 +88,8 @@ def test_order(cli, datafiles, tmpdir, operation, target, template, expected):
      # Build the project from the template, make import elements
      # all with the same repo
+     #
 -    repo = create_repo('git', str(tmpdir))
 -    ref = repo.create(dev_files_path)
      for element, dependencies in template.items():
 -        create_element(repo, element, element_path, dependencies, ref=ref)
 -        repo.add_commit()
 +        create_element(project, element, dependencies)
      # Run test and collect results
      if operation == 'show':

tests/sandboxes/storage-tests.py

@@ -3,7 +3,7 @@ import pytest
  from buildstream._exceptions import ErrorDomain
 -from buildstream._artifactcache.cascache import CASCache
 +from buildstream._cas import CASCache
  from buildstream.storage._casbaseddirectory import CasBasedDirectory
  from buildstream.storage._filebaseddirectory import FileBasedDirectory

tests/sources/git.py

@@ -883,6 +883,195 @@ def test_git_describe(cli, tmpdir, datafiles, ref_storage, tag_type):
      assert p.returncode != 0
 +@pytest.mark.skipif(HAVE_GIT is False, reason="git is not available")
 +@pytest.mark.datafiles(os.path.join(DATA_DIR, 'template'))
 +@pytest.mark.parametrize("ref_storage", [('inline'), ('project.refs')])
 +@pytest.mark.parametrize("tag_type", [('annotated'), ('lightweight')])
 +def test_git_describe_head_is_tagged(cli, tmpdir, datafiles, ref_storage, tag_type):
 +    project = str(datafiles)
++
 +    project_config = _yaml.load(os.path.join(project, 'project.conf'))
 +    project_config['ref-storage'] = ref_storage
 +    _yaml.dump(_yaml.node_sanitize(project_config), os.path.join(project, 'project.conf'))
++
 +    repofiles = os.path.join(str(tmpdir), 'repofiles')
 +    os.makedirs(repofiles, exist_ok=True)
 +    file0 = os.path.join(repofiles, 'file0')
 +    with open(file0, 'w') as f:
 +        f.write('test\n')
++
 +    repo = create_repo('git', str(tmpdir))
++
 +    def tag(name):
 +        if tag_type == 'annotated':
 +            repo.add_annotated_tag(name, name)
 +        else:
 +            repo.add_tag(name)
++
 +    ref = repo.create(repofiles)
 +    tag('uselesstag')
++
 +    file1 = os.path.join(str(tmpdir), 'file1')
 +    with open(file1, 'w') as f:
 +        f.write('test\n')
 +    repo.add_file(file1)
++
 +    file2 = os.path.join(str(tmpdir), 'file2')
 +    with open(file2, 'w') as f:
 +        f.write('test\n')
 +    repo.branch('branch2')
 +    repo.add_file(file2)
++
 +    repo.checkout('master')
 +    file3 = os.path.join(str(tmpdir), 'file3')
 +    with open(file3, 'w') as f:
 +        f.write('test\n')
 +    repo.add_file(file3)
++
 +    tagged_ref = repo.merge('branch2')
 +    tag('tag')
++
 +    config = repo.source_config()
 +    config['track'] = repo.latest_commit()
 +    config['track-tags'] = True
++
 +    # Write out our test target
 +    element = {
 +        'kind': 'import',
 +        'sources': [
 +            config
 +        ],
 +    }
 +    element_path = os.path.join(project, 'target.bst')
 +    _yaml.dump(element, element_path)
++
 +    if ref_storage == 'inline':
 +        result = cli.run(project=project, args=['source', 'track', 'target.bst'])
 +        result.assert_success()
 +    else:
 +        result = cli.run(project=project, args=['source', 'track', 'target.bst', '--deps', 'all'])
 +        result.assert_success()
++
 +    if ref_storage == 'inline':
 +        element = _yaml.load(element_path)
 +        tags = _yaml.node_sanitize(element['sources'][0]['tags'])
 +        assert len(tags) == 1
 +        for tag in tags:
 +            assert 'tag' in tag
 +            assert 'commit' in tag
 +            assert 'annotated' in tag
 +            assert tag['annotated'] == (tag_type == 'annotated')
++
 +        assert set([(tag['tag'], tag['commit']) for tag in tags]) == set([('tag', repo.rev_parse('tag^{commit}'))])
++
 +    checkout = os.path.join(str(tmpdir), 'checkout')
++
 +    result = cli.run(project=project, args=['build', 'target.bst'])
 +    result.assert_success()
 +    result = cli.run(project=project, args=['checkout', 'target.bst', checkout])
 +    result.assert_success()
++
 +    if tag_type == 'annotated':
 +        options = []
 +    else:
 +        options = ['--tags']
 +    describe = subprocess.check_output(['git', 'describe'] + options,
 +                                       cwd=checkout).decode('ascii')
 +    assert describe.startswith('tag')
++
 +    tags = subprocess.check_output(['git', 'tag'],
 +                                   cwd=checkout).decode('ascii')
 +    tags = set(tags.splitlines())
 +    assert tags == set(['tag'])
++
 +    rev_list = subprocess.check_output(['git', 'rev-list', '--all'],
 +                                       cwd=checkout).decode('ascii')
++
 +    assert set(rev_list.splitlines()) == set([tagged_ref])
++
 +    p = subprocess.run(['git', 'log', repo.rev_parse('uselesstag')],
 +                       cwd=checkout)
 +    assert p.returncode != 0
++
++
 +@pytest.mark.skipif(HAVE_GIT is False, reason="git is not available")
 +@pytest.mark.datafiles(os.path.join(DATA_DIR, 'template'))
 +def test_git_describe_relevant_history(cli, tmpdir, datafiles):
 +    project = str(datafiles)
++
 +    project_config = _yaml.load(os.path.join(project, 'project.conf'))
 +    project_config['ref-storage'] = 'project.refs'
 +    _yaml.dump(_yaml.node_sanitize(project_config), os.path.join(project, 'project.conf'))
++
 +    repofiles = os.path.join(str(tmpdir), 'repofiles')
 +    os.makedirs(repofiles, exist_ok=True)
 +    file0 = os.path.join(repofiles, 'file0')
 +    with open(file0, 'w') as f:
 +        f.write('test\n')
++
 +    repo = create_repo('git', str(tmpdir))
 +    repo.create(repofiles)
++
 +    file1 = os.path.join(str(tmpdir), 'file1')
 +    with open(file1, 'w') as f:
 +        f.write('test\n')
 +    repo.add_file(file1)
 +    repo.branch('branch')
 +    repo.checkout('master')
++
 +    file2 = os.path.join(str(tmpdir), 'file2')
 +    with open(file2, 'w') as f:
 +        f.write('test\n')
 +    repo.add_file(file2)
++
 +    file3 = os.path.join(str(tmpdir), 'file3')
 +    with open(file3, 'w') as f:
 +        f.write('test\n')
 +    branch_boundary = repo.add_file(file3)
++
 +    repo.checkout('branch')
 +    file4 = os.path.join(str(tmpdir), 'file4')
 +    with open(file4, 'w') as f:
 +        f.write('test\n')
 +    tagged_ref = repo.add_file(file4)
 +    repo.add_annotated_tag('tag1', 'tag1')
++
 +    head = repo.merge('master')
++
 +    config = repo.source_config()
 +    config['track'] = head
 +    config['track-tags'] = True
++
 +    # Write out our test target
 +    element = {
 +        'kind': 'import',
 +        'sources': [
 +            config
 +        ],
 +    }
 +    element_path = os.path.join(project, 'target.bst')
 +    _yaml.dump(element, element_path)
++
 +    result = cli.run(project=project, args=['source', 'track', 'target.bst', '--deps', 'all'])
 +    result.assert_success()
++
 +    checkout = os.path.join(str(tmpdir), 'checkout')
++
 +    result = cli.run(project=project, args=['build', 'target.bst'])
 +    result.assert_success()
 +    result = cli.run(project=project, args=['checkout', 'target.bst', checkout])
 +    result.assert_success()
++
 +    describe = subprocess.check_output(['git', 'describe'],
 +                                       cwd=checkout).decode('ascii')
 +    assert describe.startswith('tag1-2-')
++
 +    rev_list = subprocess.check_output(['git', 'rev-list', '--all'],
 +                                       cwd=checkout).decode('ascii')
++
 +    assert set(rev_list.splitlines()) == set([head, tagged_ref, branch_boundary])
++
++
  @pytest.mark.skipif(HAVE_GIT is False, reason="git is not available")
  @pytest.mark.datafiles(os.path.join(DATA_DIR, 'template'))
  def test_default_do_not_track_tags(cli, tmpdir, datafiles):

tests/storage/virtual_directory_import.py

@@ -8,7 +8,7 @@ from tests.testutils import cli
  from buildstream.storage._casbaseddirectory import CasBasedDirectory
  from buildstream.storage._filebaseddirectory import FileBasedDirectory
  from buildstream._artifactcache import ArtifactCache
 -from buildstream._artifactcache.cascache import CASCache
 +from buildstream._cas import CASCache
  from buildstream import utils

tests/testutils/artifactshare.py

@@ -11,8 +11,8 @@ from multiprocessing import Process, Queue
  import pytest_cov
  from buildstream import _yaml
 -from buildstream._artifactcache.cascache import CASCache
 -from buildstream._artifactcache.casserver import create_server
 +from buildstream._cas import CASCache
 +from buildstream._cas.casserver import create_server
  from buildstream._exceptions import CASError
  from buildstream._protos.build.bazel.remote.execution.v2 import remote_execution_pb2

tests/utils/misc.py

@@ -23,7 +23,7 @@ def test_parse_size_over_1024T(cli, tmpdir):
      _yaml.dump({'name': 'main'}, str(project.join("project.conf")))
      volume_space_patch = mock.patch(
 -        "buildstream._artifactcache.artifactcache.ArtifactCache._get_volume_space_info_for",
 +        "buildstream._artifactcache.ArtifactCache._get_volume_space_info_for",
          autospec=True,
          return_value=(1025 * TiB, 1025 * TiB)
      )
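
Note on the hunk above: the patch target changes only because the module path moved; the mocking technique itself is unchanged. A minimal, self-contained sketch of that technique follows. The `ArtifactCache` class here is a hypothetical stand-in, not BuildStream's real class, and `volume_info()` and `demo()` are illustrative names. It uses `mock.patch.object` rather than a dotted-string target so that it runs without BuildStream installed.

```python
from unittest import mock

TiB = 1024 ** 4


class ArtifactCache:
    """Hypothetical stand-in for illustration; not BuildStream's real class."""

    def _get_volume_space_info_for(self, volume):
        # The real method would query the filesystem; here it must never run.
        raise RuntimeError("unpatched call")

    def volume_info(self, volume):
        # Hypothetical helper that simply forwards the private method's result.
        return self._get_volume_space_info_for(volume)


def demo():
    # Patch the method on the class, keeping its signature via autospec.
    with mock.patch.object(
        ArtifactCache,
        "_get_volume_space_info_for",
        autospec=True,
        return_value=(1025 * TiB, 1025 * TiB),
    ) as patched:
        cache = ArtifactCache()
        info = cache.volume_info("/")
        # With autospec, the instance is recorded as the first argument.
        patched.assert_called_once_with(cache, "/")
    return info
```

Calling `demo()` returns the patched pair; once the `with` block exits, the original method is restored and would raise again.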

[Notes] [Git][BuildStream/buildstream][tristan/element-processing-order] 17 commits: .gitlab-ci.yml: Add tests for python 3.7


27 changed files:

Changes: