[Notes] [Git][BuildStream/buildstream][valentindavid/cache_server_fill_up] 11 commits: contributing: add guidance on unit tests

Valentin David pushed to branch valentindavid/cache_server_fill_up at BuildStream / buildstream

Commits:

082b3811

by Angelos Evripiotis at 2018-11-15T19:48:12Z

contributing: add guidance on unit tests

Decrease uncertainty around whether unit tests are welcome in the
project or not.

b5b79056

by Jürg Billeter at 2018-11-15T20:24:58Z

Merge branch 'aevri/unit_tests' into 'master'

contributing: add guidance on unit tests

See merge request BuildStream/buildstream!943

5b839b3d

by Valentin David at 2018-11-16T20:59:03Z

Use f_bavail to query available space. Not f_bfree.

Blocks counted in f_bfree might not be usable by the server. In practice
we see failures on big disks because f_bfree is over 2GB while f_bavail
is 0, and writing to the disk then fails with ENOSPC.
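
As a rough illustration of the difference described above, here is a minimal sketch that reads both counters through `os.statvfs`. The helper name `disk_space` is hypothetical. It multiplies by `f_frsize`, the unit POSIX specifies for these block counts, whereas the diff in this push multiplies by `f_bsize`; on many Linux filesystems the two values are equal.

```python
import os


def disk_space(path):
    """Return (available, free) bytes for the filesystem holding `path`."""
    stats = os.statvfs(path)
    # f_bavail: blocks an unprivileged process may still allocate.
    available = stats.f_bavail * stats.f_frsize
    # f_bfree: all free blocks, including any reserved for privileged users.
    free = stats.f_bfree * stats.f_frsize
    return available, free


available, free = disk_space(".")
print("available: {} bytes, free: {} bytes".format(available, free))
```

When the two numbers differ, the gap is space that a server running as an ordinary user cannot write to, which is consistent with the ENOSPC failures mentioned in the commit message.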

89bd9b98

by Valentin David at 2018-11-16T20:59:03Z

Make cache clients not fail when a blob is not available.

We plan to allow the cache to be incomplete, that is, some blobs may be
missing. In most cases we will delete references when they are requested
if they are incomplete. But there will be corner cases where objects are
removed after the reference is requested.

f96606a5

by Valentin David at 2018-11-16T20:59:03Z

Fix type error in RPC messages

f1e54c28

by Valentin David at 2018-11-16T20:59:03Z

Avoid copying temporary file when adding object to CAS in server.

The file is already a temporary file and does not need copy.  ENOSPC
is thrown during that copy in issue #609.

Fixes #678.
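
The idea of linking instead of copying can be sketched as follows. This is a simplified stand-in for what `add_object(..., link_directly=True)` does in the diff, not the actual method: the function name `link_into_store` and the directory layout argument are assumptions made for illustration.

```python
import hashlib
import os
import tempfile


def link_into_store(tmp_path, objects_dir):
    """Hash an existing temporary file and hard-link it into the store."""
    h = hashlib.sha256()
    with open(tmp_path, 'rb') as f:
        for chunk in iter(lambda: f.read(4096), b""):
            h.update(chunk)
    digest = h.hexdigest()

    # Same two-level layout as the test helper in this diff: xx/yyyy...
    objpath = os.path.join(objects_dir, digest[:2], digest[2:])
    os.makedirs(os.path.dirname(objpath), exist_ok=True)
    try:
        # A hard link adds a directory entry only; no data blocks are written.
        os.link(tmp_path, objpath)
    except FileExistsError:
        # The object is already in the store, nothing to do.
        pass
    return objpath
```

Because no second copy of the data is written, this step needs no additional data blocks. The temporary file and the store must live on the same filesystem for `os.link` to succeed.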

b2557ba6

by Valentin David at 2018-11-16T20:59:03Z

"Fallocate" object disk space to avoid getting ENOSPC error when writing

This locks the temporary object file so that cleanup does not need to
be done for every write.
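
A minimal sketch of reserving space up front, under stated assumptions: it uses Python's `os.posix_fallocate` rather than the `ctypes` wrapper around libc `fallocate` that the diff introduces, and the names `reserve_space`, `reserve_with_cleanup` and the bounded `max_attempts` are hypothetical (the diff retries in an unbounded loop).

```python
import errno
import os
import tempfile


def reserve_space(fd, size):
    """Return True if `size` bytes were reserved, False if the disk is full."""
    if size == 0:
        return True  # posix_fallocate rejects a zero length
    try:
        os.posix_fallocate(fd, 0, size)
        return True
    except OSError as e:
        if e.errno != errno.ENOSPC:
            raise
        return False


def reserve_with_cleanup(fd, size, clean_up, max_attempts=5):
    """Alternate cleanup and reservation until the space is secured."""
    if size == 0:
        return True
    for _ in range(max_attempts):
        clean_up(size)  # caller-supplied callback that frees disk space
        if reserve_space(fd, size):
            return True
        # Another upload may have taken the freed space; try again.
    return False
```

Once the reservation succeeds, later writes within the reserved range should not fail for lack of space, so cleanup does not have to run before every chunk.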

5cb96d0f

by Valentin David at 2018-11-16T20:59:03Z

Update mtimes of objects for requested references.

This also removes references when some objects are missing. This is in
preparation for the move from reference to object garbage collection.
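
The mtime update can be pictured with the sketch below. It is a simplification that takes a flat list of paths, whereas the diff walks the directory tree recursively in `_reachable_refs_dir`; the name `touch_objects` is hypothetical.

```python
import os
import tempfile


def touch_objects(paths):
    """Set the mtime of every path to now; return False if one is missing."""
    try:
        for path in paths:
            os.utime(path)  # with no times argument, uses the current time
    except FileNotFoundError:
        # A missing object means the reference is incomplete.
        return False
    return True
```

A `False` result corresponds to the case where the server in the diff removes the reference and answers `NOT_FOUND`.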

fe32945e

by Valentin David at 2018-11-16T20:59:03Z

Move cas server from ref-based to object-based garbage collection.

72aae91a

by Valentin David at 2018-11-16T20:59:03Z

Clean up cache in cas server more aggressively

When there is less than 2GB left, it cleans up until 10GB are available.
These values are configurable.
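
The two thresholds can be read as a low and a high water mark. The sketch below works through the arithmetic with pure functions; `bytes_to_free` and `pick_victims` are hypothetical names, and the defaults mirror the 2GB and 10GB values from the commit message.

```python
def bytes_to_free(available, object_size,
                  min_head=int(2e9), max_head=int(10e9)):
    """How many bytes must be removed before storing `object_size` bytes."""
    # Enough room while keeping the minimum head room: no cleanup needed.
    if object_size <= available - min_head:
        return 0
    # Otherwise free enough to end up with the maximum head room.
    return max(0, object_size + max_head - available)


def pick_victims(objects, needed):
    """Choose least recently modified objects until `needed` bytes are freed.

    `objects` is an iterable of (mtime, size, name) tuples.
    """
    victims, freed = [], 0
    for _mtime, size, name in sorted(objects):
        if freed >= needed:
            break
        victims.append(name)
        freed += size
    return victims
```

For example, with 2.5GB available and a 1GB upload, the minimum head room is violated, so the sketch asks for 8.5GB to be freed rather than just the 0.5GB shortfall. Cleaning past the trigger point this way means the next uploads do not each start another cleanup.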

f23fb826

by Valentin David at 2018-11-16T20:59:03Z

Lock cache cleanup in cas server

Cleaning up in parallel might slow down the cleaning process.
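
The locking pattern used for this (check, take the lock, check again) can be sketched in isolation. The class name `LockedCleaner` and its two callbacks are hypothetical stand-ins for the space check and the actual eviction.

```python
import threading


class LockedCleaner:
    """Run at most one cleanup at a time, skipping it when no longer needed."""

    def __init__(self, has_space, do_cleanup):
        self._lock = threading.Lock()
        self._has_space = has_space
        self._do_cleanup = do_cleanup

    def clean_up(self):
        # Fast path: no lock is taken when there is already enough space.
        if self._has_space():
            return False
        with self._lock:
            # Re-check: another thread may have cleaned up while we waited.
            if self._has_space():
                return False
            self._do_cleanup()
            return True
```

Threads that arrive while a cleanup is running wait on the lock and then usually find that the work has already been done.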

Changes:

CONTRIBUTING.rst

@@ -1547,6 +1547,24 @@ Tests that run a sandbox should be decorated with::
  and use the integration cli helper.
 +You should first aim to write tests that exercise your changes from the cli.
 +This is so that the testing is end-to-end, and the changes are guaranteed to
 +work for the end-user. The cli is considered stable, and so tests written in
 +terms of it are unlikely to require updating as the internals of the software
 +change over time.
 +
 +It may be impractical to sufficiently examine some changes this way. For
 +example, the number of cases to test and the running time of each test may be
 +too high. It may also be difficult to contrive circumstances to cover every
 +line of the change. If this is the case, next you can consider also writing
 +unit tests that work more directly on the changes.
 +
 +It is important to write unit tests in such a way that they do not break due to
 +changes unrelated to what they are meant to test. For example, if the test
 +relies on a lot of BuildStream internals, a large refactoring will likely
 +require the test to be rewritten. Pure functions that only rely on the Python
 +Standard Library are excellent candidates for unit testing.
 +
  Measuring performance
  ---------------------

buildstream/_artifactcache/cascache.py

@@ -25,6 +25,7 @@ import stat
  import tempfile
  import uuid
  import errno
 +import contextlib
  from urllib.parse import urlparse
  import grpc
@@ -43,6 +44,13 @@ from .._exceptions import CASError
  _MAX_PAYLOAD_BYTES = 1024 * 1024
 +class BlobNotFound(CASError):
 +
 +    def __init__(self, blob, msg):
 +        self.blob = blob
 +        super().__init__(msg)
 +
 +
  # A CASCache manages a CAS repository as specified in the Remote Execution API.
  #
  # Args:
@@ -219,6 +227,8 @@ class CASCache():
                  raise CASError("Failed to pull ref {}: {}".format(ref, e)) from e
              else:
                  return False
 +        except BlobNotFound as e:
 +            return False
      # pull_tree():
      #
@@ -391,13 +401,14 @@ class CASCache():
      #     digest (Digest): An optional Digest object to populate
      #     path (str): Path to file to add
      #     buffer (bytes): Byte buffer to add
 +    #     link_directly (bool): Whether file given by path can be linked
      #
      # Returns:
      #     (Digest): The digest of the added object
      #
      # Either `path` or `buffer` must be passed, but not both.
      #
 -    def add_object(self, *, digest=None, path=None, buffer=None):
 +    def add_object(self, *, digest=None, path=None, buffer=None, link_directly=False):
          # Exactly one of the two parameters has to be specified
          assert (path is None) != (buffer is None)
@@ -407,28 +418,34 @@ class CASCache():
          try:
              h = hashlib.sha256()
              # Always write out new file to avoid corruption if input file is modified
 -            with tempfile.NamedTemporaryFile(dir=self.tmpdir) as out:
 -                # Set mode bits to 0644
 -                os.chmod(out.name, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH)
 -
 -                if path:
 -                    with open(path, 'rb') as f:
 -                        for chunk in iter(lambda: f.read(4096), b""):
 -                            h.update(chunk)
 -                            out.write(chunk)
 +            with contextlib.ExitStack() as stack:
 +                if path is not None and link_directly:
 +                    tmp = stack.enter_context(open(path, 'rb'))
 +                    for chunk in iter(lambda: tmp.read(4096), b""):
 +                        h.update(chunk)
                  else:
 -                    h.update(buffer)
 -                    out.write(buffer)
 +                    tmp = stack.enter_context(tempfile.NamedTemporaryFile(dir=self.tmpdir))
 +                    # Set mode bits to 0644
 +                    os.chmod(tmp.name, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH)
 -                out.flush()
 +                    if path:
 +                        with open(path, 'rb') as f:
 +                            for chunk in iter(lambda: f.read(4096), b""):
 +                                h.update(chunk)
 +                                tmp.write(chunk)
 +                    else:
 +                        h.update(buffer)
 +                        tmp.write(buffer)
 +
 +                    tmp.flush()
                  digest.hash = h.hexdigest()
 -                digest.size_bytes = os.fstat(out.fileno()).st_size
 +                digest.size_bytes = os.fstat(tmp.fileno()).st_size
                  # Place file at final location
                  objpath = self.objpath(digest)
                  os.makedirs(os.path.dirname(objpath), exist_ok=True)
 -                os.link(out.name, objpath)
 +                os.link(tmp.name, objpath)
          except FileExistsError as e:
              # We can ignore the failed link() if the object is already in the repo.
@@ -526,6 +543,41 @@ class CASCache():
          # first ref of this list will be the file modified earliest.
          return [ref for _, ref in sorted(zip(mtimes, refs))]
 +    # list_objects():
 +    #
 +    # List cached objects in Least Recently Modified (LRM) order.
 +    #
 +    # Returns:
 +    #     (list) - A list of objects and timestamps in LRM order
 +    #
 +    def list_objects(self):
 +        objs = []
 +        mtimes = []
 +
 +        for root, _, files in os.walk(os.path.join(self.casdir, 'objects')):
 +            for filename in files:
 +                obj_path = os.path.join(root, filename)
 +                try:
 +                    mtimes.append(os.path.getmtime(obj_path))
 +                except FileNotFoundError:
 +                    pass
 +                else:
 +                    objs.append(obj_path)
 +
 +        # NOTE: Sorted will sort from earliest to latest, thus the
 +        # first element of this list will be the file modified earliest.
 +        return sorted(zip(mtimes, objs))
 +
 +    def clean_up_refs_until(self, time):
 +        ref_heads = os.path.join(self.casdir, 'refs', 'heads')
 +
 +        for root, _, files in os.walk(ref_heads):
 +            for filename in files:
 +                ref_path = os.path.join(root, filename)
 +                # Obtain the mtime (the time a file was last modified)
 +                if os.path.getmtime(ref_path) < time:
 +                    os.unlink(ref_path)
 +
      # remove():
      #
      # Removes the given symbolic ref from the repo.
@@ -585,6 +637,10 @@ class CASCache():
          return pruned
 +    def update_tree_mtime(self, tree):
 +        reachable = set()
 +        self._reachable_refs_dir(reachable, tree, update_mtime=True)
 +
      ################################################
      #             Local Private Methods            #
      ################################################
@@ -729,10 +785,13 @@ class CASCache():
                  a += 1
                  b += 1
 -    def _reachable_refs_dir(self, reachable, tree):
 +    def _reachable_refs_dir(self, reachable, tree, update_mtime=False):
          if tree.hash in reachable:
              return
 +        if update_mtime:
 +            os.utime(self.objpath(tree))
 +
          reachable.add(tree.hash)
          directory = remote_execution_pb2.Directory()
@@ -741,10 +800,12 @@ class CASCache():
              directory.ParseFromString(f.read())
          for filenode in directory.files:
 +            if update_mtime:
 +                os.utime(self.objpath(filenode.digest))
              reachable.add(filenode.digest.hash)
          for dirnode in directory.directories:
 -            self._reachable_refs_dir(reachable, dirnode.digest)
 +            self._reachable_refs_dir(reachable, dirnode.digest, update_mtime=update_mtime)
      def _required_blobs(self, directory_digest):
          # parse directory, and recursively add blobs
@@ -798,7 +859,7 @@ class CASCache():
          with tempfile.NamedTemporaryFile(dir=self.tmpdir) as f:
              self._fetch_blob(remote, digest, f)
 -            added_digest = self.add_object(path=f.name)
 +            added_digest = self.add_object(path=f.name, link_directly=True)
              assert added_digest.hash == digest.hash
          return objpath
@@ -809,7 +870,7 @@ class CASCache():
                  f.write(data)
                  f.flush()
 -                added_digest = self.add_object(path=f.name)
 +                added_digest = self.add_object(path=f.name, link_directly=True)
                  assert added_digest.hash == digest.hash
      # Helper function for _fetch_directory().
@@ -1113,6 +1174,9 @@ class _CASBatchRead():
          batch_response = self._remote.cas.BatchReadBlobs(self._request)
          for response in batch_response.responses:
 +            if response.status.code == code_pb2.NOT_FOUND:
 +                raise BlobNotFound(response.digest.hash, "Failed to download blob {}: {}".format(
 +                    response.digest.hash, response.status.code))
              if response.status.code != code_pb2.OK:
                  raise CASError("Failed to download blob {}: {}".format(
                      response.digest.hash, response.status.code))

buildstream/_artifactcache/casserver.py

@@ -24,6 +24,9 @@ import signal
  import sys
  import tempfile
  import uuid
 +import errno
 +import ctypes
 +import threading
  import click
  import grpc
@@ -31,6 +34,7 @@ import grpc
  from .._protos.build.bazel.remote.execution.v2 import remote_execution_pb2, remote_execution_pb2_grpc
  from .._protos.google.bytestream import bytestream_pb2, bytestream_pb2_grpc
  from .._protos.buildstream.v2 import buildstream_pb2, buildstream_pb2_grpc
 +from .._protos.google.rpc import code_pb2
  from .._exceptions import CASError
@@ -55,18 +59,22 @@ class ArtifactTooLargeException(Exception):
  #     repo (str): Path to CAS repository
  #     enable_push (bool): Whether to allow blob uploads and artifact updates
  #
 -def create_server(repo, *, enable_push):
 +def create_server(repo, *, enable_push,
 +                  max_head_size=int(10e9),
 +                  min_head_size=int(2e9)):
      cas = CASCache(os.path.abspath(repo))
      # Use max_workers default from Python 3.5+
      max_workers = (os.cpu_count() or 1) * 5
      server = grpc.server(futures.ThreadPoolExecutor(max_workers))
 +    cache_cleaner = _CacheCleaner(cas, max_head_size, min_head_size)
 +
      bytestream_pb2_grpc.add_ByteStreamServicer_to_server(
 -        _ByteStreamServicer(cas, enable_push=enable_push), server)
 +        _ByteStreamServicer(cas, cache_cleaner, enable_push=enable_push), server)
      remote_execution_pb2_grpc.add_ContentAddressableStorageServicer_to_server(
 -        _ContentAddressableStorageServicer(cas, enable_push=enable_push), server)
 +        _ContentAddressableStorageServicer(cas, cache_cleaner, enable_push=enable_push), server)
      remote_execution_pb2_grpc.add_CapabilitiesServicer_to_server(
          _CapabilitiesServicer(), server)
@@ -84,9 +92,19 @@ def create_server(repo, *, enable_push):
  @click.option('--client-certs', help="Public client certificates for TLS (PEM-encoded)")
  @click.option('--enable-push', default=False, is_flag=True,
                help="Allow clients to upload blobs and update artifact cache")
 +@click.option('--head-room-min', type=click.INT,
 +              help="Disk head room minimum in bytes",
 +              default=2e9)
 +@click.option('--head-room-max', type=click.INT,
 +              help="Disk head room maximum in bytes",
 +              default=10e9)
  @click.argument('repo')
 -def server_main(repo, port, server_key, server_cert, client_certs, enable_push):
 -    server = create_server(repo, enable_push=enable_push)
 +def server_main(repo, port, server_key, server_cert, client_certs, enable_push,
 +                head_room_min, head_room_max):
 +    server = create_server(repo,
 +                           max_head_size=head_room_max,
 +                           min_head_size=head_room_min,
 +                           enable_push=enable_push)
      use_tls = bool(server_key)
@@ -127,11 +145,43 @@ def server_main(repo, port, server_key, server_cert, client_certs, enable_push):
          server.stop(0)
 +class _FallocateCall:
 +
 +    FALLOC_FL_KEEP_SIZE = 1
 +    FALLOC_FL_PUNCH_HOLE = 2
 +    FALLOC_FL_NO_HIDE_STALE = 4
 +    FALLOC_FL_COLLAPSE_RANGE = 8
 +    FALLOC_FL_ZERO_RANGE = 16
 +    FALLOC_FL_INSERT_RANGE = 32
 +    FALLOC_FL_UNSHARE_RANGE = 64
 +
 +    def __init__(self):
 +        self.libc = ctypes.CDLL("libc.so.6", use_errno=True)
 +        try:
 +            self.fallocate64 = self.libc.fallocate64
 +        except AttributeError:
 +            self.fallocate = self.libc.fallocate
 +
 +    def __call__(self, fd, mode, offset, length):
 +        if hasattr(self, 'fallocate64'):
 +            ret = self.fallocate64(ctypes.c_int(fd), ctypes.c_int(mode),
 +                                   ctypes.c_int64(offset), ctypes.c_int64(length))
 +        else:
 +            ret = self.fallocate(ctypes.c_int(fd), ctypes.c_int(mode),
 +                                 ctypes.c_int(offset), ctypes.c_int(length))
 +        if ret == -1:
 +            err = ctypes.get_errno()
 +            raise OSError(err, os.strerror(err))
 +        return ret
 +
 +
  class _ByteStreamServicer(bytestream_pb2_grpc.ByteStreamServicer):
 -    def __init__(self, cas, *, enable_push):
 +    def __init__(self, cas, cache_cleaner, *, enable_push):
          super().__init__()
          self.cas = cas
          self.enable_push = enable_push
 +        self.fallocate = _FallocateCall()
 +        self.cache_cleaner = cache_cleaner
      def Read(self, request, context):
          resource_name = request.resource_name
@@ -189,17 +239,34 @@ class _ByteStreamServicer(bytestream_pb2_grpc.ByteStreamServicer):
                          context.set_code(grpc.StatusCode.NOT_FOUND)
                          return response
 -                    try:
 -                        _clean_up_cache(self.cas, client_digest.size_bytes)
 -                    except ArtifactTooLargeException as e:
 -                        context.set_code(grpc.StatusCode.RESOURCE_EXHAUSTED)
 -                        context.set_details(str(e))
 -                        return response
 +                    while True:
 +                        if client_digest.size_bytes == 0:
 +                            break
 +                        try:
 +                            self.cache_cleaner.clean_up(client_digest.size_bytes)
 +                        except ArtifactTooLargeException as e:
 +                            context.set_code(grpc.StatusCode.RESOURCE_EXHAUSTED)
 +                            context.set_details(str(e))
 +                            return response
 +
 +                        try:
 +                            self.fallocate(out.fileno(), 0, 0, client_digest.size_bytes)
 +                            break
 +                        except OSError as e:
 +                            # Multiple uploads can happen at the same time
 +                            if e.errno != errno.ENOSPC:
 +                                raise
 +
                  elif request.resource_name:
                      # If it is set on subsequent calls, it **must** match the value of the first request.
                      if request.resource_name != resource_name:
                          context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
                          return response
 +
 +                if (offset + len(request.data)) > client_digest.size_bytes:
 +                    context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
 +                    return response
 +
                  out.write(request.data)
                  offset += len(request.data)
                  if request.finish_write:
@@ -207,7 +274,7 @@ class _ByteStreamServicer(bytestream_pb2_grpc.ByteStreamServicer):
                          context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
                          return response
                      out.flush()
 -                    digest = self.cas.add_object(path=out.name)
 +                    digest = self.cas.add_object(path=out.name, link_directly=True)
                      if digest.hash != client_digest.hash:
                          context.set_code(grpc.StatusCode.FAILED_PRECONDITION)
                          return response
@@ -220,18 +287,26 @@ class _ByteStreamServicer(bytestream_pb2_grpc.ByteStreamServicer):
  class _ContentAddressableStorageServicer(remote_execution_pb2_grpc.ContentAddressableStorageServicer):
 -    def __init__(self, cas, *, enable_push):
 +    def __init__(self, cas, cache_cleaner, *, enable_push):
          super().__init__()
          self.cas = cas
          self.enable_push = enable_push
 +        self.cache_cleaner = cache_cleaner
      def FindMissingBlobs(self, request, context):
          response = remote_execution_pb2.FindMissingBlobsResponse()
          for digest in request.blob_digests:
 -            if not _has_object(self.cas, digest):
 -                d = response.missing_blob_digests.add()
 -                d.hash = digest.hash
 -                d.size_bytes = digest.size_bytes
 +            objpath = self.cas.objpath(digest)
 +            try:
 +                os.utime(objpath)
 +            except OSError as e:
 +                if e.errno != errno.ENOENT:
 +                    raise
 +                else:
 +                    d = response.missing_blob_digests.add()
 +                    d.hash = digest.hash
 +                    d.size_bytes = digest.size_bytes
 +
          return response
      def BatchReadBlobs(self, request, context):
@@ -250,12 +325,12 @@ class _ContentAddressableStorageServicer(remote_execution_pb2_grpc.ContentAddres
              try:
                  with open(self.cas.objpath(digest), 'rb') as f:
                      if os.fstat(f.fileno()).st_size != digest.size_bytes:
 -                        blob_response.status.code = grpc.StatusCode.NOT_FOUND
 +                        blob_response.status.code = code_pb2.NOT_FOUND
                          continue
                      blob_response.data = f.read(digest.size_bytes)
              except FileNotFoundError:
 -                blob_response.status.code = grpc.StatusCode.NOT_FOUND
 +                blob_response.status.code = code_pb2.NOT_FOUND
          return response
@@ -285,7 +360,7 @@ class _ContentAddressableStorageServicer(remote_execution_pb2_grpc.ContentAddres
                  continue
              try:
 -                _clean_up_cache(self.cas, digest.size_bytes)
 +                self.cache_cleaner.clean_up(digest.size_bytes)
                  with tempfile.NamedTemporaryFile(dir=self.cas.tmpdir) as out:
                      out.write(blob_request.data)
@@ -328,6 +403,12 @@ class _ReferenceStorageServicer(buildstream_pb2_grpc.ReferenceStorageServicer):
          try:
              tree = self.cas.resolve_ref(request.key, update_mtime=True)
 +            try:
 +                self.cas.update_tree_mtime(tree)
 +            except FileNotFoundError:
 +                self.cas.remove(request.key, defer_prune=True)
 +                context.set_code(grpc.StatusCode.NOT_FOUND)
 +                return response
              response.digest.hash = tree.hash
              response.digest.size_bytes = tree.size_bytes
@@ -400,60 +481,79 @@ def _digest_from_upload_resource_name(resource_name):
          return None
 -def _has_object(cas, digest):
 -    objpath = cas.objpath(digest)
 -    return os.path.exists(objpath)
 +class _CacheCleaner:
 +    __cleanup_cache_lock = threading.Lock()
 -# _clean_up_cache()
 -#
 -# Keep removing Least Recently Pushed (LRP) artifacts in a cache until there
 -# is enough space for the incoming artifact
 -#
 -# Args:
 -#   cas: CASCache object
 -#   object_size: The size of the object being received in bytes
 -#
 -# Returns:
 -#   int: The total bytes removed on the filesystem
 -#
 -def _clean_up_cache(cas, object_size):
 -    # Determine the available disk space, in bytes, of the file system
 -    # which mounts the repo
 -    stats = os.statvfs(cas.casdir)
 -    buffer_ = int(2e9)                # Add a 2 GB buffer
 -    free_disk_space = (stats.f_bfree * stats.f_bsize) - buffer_
 -    total_disk_space = (stats.f_blocks * stats.f_bsize) - buffer_
 -
 -    if object_size > total_disk_space:
 -        raise ArtifactTooLargeException("Artifact of size: {} is too large for "
 -                                        "the filesystem which mounts the remote "
 -                                        "cache".format(object_size))
 -
 -    if object_size <= free_disk_space:
 -        # No need to clean up
 -        return 0
 -
 -    # obtain a list of LRP artifacts
 -    LRP_artifacts = cas.list_refs()
 -
 -    removed_size = 0  # in bytes
 -    while object_size - removed_size > free_disk_space:
 -        try:
 -            to_remove = LRP_artifacts.pop(0)  # The first element in the list is the LRP artifact
 -        except IndexError:
 -            # This exception is caught if there are no more artifacts in the list
 -            # LRP_artifacts. This means the the artifact is too large for the filesystem
 -            # so we abort the process
 -            raise ArtifactTooLargeException("Artifact of size {} is too large for "
 -                                            "the filesystem which mounts the remote "
 -                                            "cache".format(object_size))
 +    def __init__(self, cas, max_head_size, min_head_size=int(2e9)):
 +        self.__cas = cas
 +        self.__max_head_size = max_head_size
 +        self.__min_head_size = min_head_size
 -        removed_size += cas.remove(to_remove, defer_prune=False)
 +    def __has_space(self, object_size):
 +        stats = os.statvfs(self.__cas.casdir)
 +        free_disk_space = (stats.f_bavail * stats.f_bsize) - self.__min_head_size
 +        total_disk_space = (stats.f_blocks * stats.f_bsize) - self.__min_head_size
 -    if removed_size > 0:
 -        logging.info("Successfully removed {} bytes from the cache".format(removed_size))
 -    else:
 -        logging.info("No artifacts were removed from the cache.")
 +        if object_size > total_disk_space:
 +            raise ArtifactTooLargeException("Artifact of size: {} is too large for "
 +                                            "the filesystem which mounts the remote "
 +                                            "cache".format(object_size))
 -    return removed_size
 +        return object_size <= free_disk_space
 +
 +    # clean_up()
 +    #
 +    # Keep removing Least Recently Pushed (LRP) artifacts in a cache until there
 +    # is enough space for the incoming artifact
 +    #
 +    # Args:
 +    #   object_size: The size of the object being received in bytes
 +    #
 +    # Returns:
 +    #   int: The total bytes removed on the filesystem
 +    #
 +    def clean_up(self, object_size):
 +        if self.__has_space(object_size):
 +            return 0
 +
 +        with _CacheCleaner.__cleanup_cache_lock:
 +            if self.__has_space(object_size):
 +                # Another thread has done the cleanup for us
 +                return 0
 +
 +            stats = os.statvfs(self.__cas.casdir)
 +            target_disk_space = (stats.f_bavail * stats.f_bsize) - self.__max_head_size
 +
 +            # obtain a list of LRP artifacts
 +            LRP_objects = self.__cas.list_objects()
 +
 +            removed_size = 0  # in bytes
 +            last_mtime = 0
 +
 +            while object_size - removed_size > target_disk_space:
 +                try:
 +                    last_mtime, to_remove = LRP_objects.pop(0)  # The first element in the list is the LRP artifact
 +                except IndexError:
 +                    # This exception is caught if there are no more objects in the list
 +                    # LRP_objects. This means the artifact is too large for the filesystem
 +                    # so we abort the process
 +                    raise ArtifactTooLargeException("Artifact of size {} is too large for "
 +                                                    "the filesystem which mounts the remote "
 +                                                    "cache".format(object_size))
 +
 +                try:
 +                    size = os.stat(to_remove).st_size
 +                    os.unlink(to_remove)
 +                    removed_size += size
 +                except FileNotFoundError:
 +                    pass
 +
 +            self.__cas.clean_up_refs_until(last_mtime)
 +
 +            if removed_size > 0:
 +                logging.info("Successfully removed {} bytes from the cache".format(removed_size))
 +            else:
 +                logging.info("No artifacts were removed from the cache.")
 +
 +            return removed_size

tests/frontend/push.py

@@ -230,6 +230,8 @@ def test_artifact_expires(cli, datafiles, tmpdir):
      # Create an artifact share (remote artifact cache) in the tmpdir/artifactshare
      # Mock a file system with 12 MB free disk space
      with create_artifact_share(os.path.join(str(tmpdir), 'artifactshare'),
 +                               min_head_size=int(2e9),
 +                               max_head_size=int(2e9),
                                 total_space=int(10e9), free_space=(int(12e6) + int(2e9))) as share:
          # Configure bst to push to the cache
@@ -313,6 +315,8 @@ def test_recently_pulled_artifact_does_not_expire(cli, datafiles, tmpdir):
      # Create an artifact share (remote cache) in tmpdir/artifactshare
      # Mock a file system with 12 MB free disk space
      with create_artifact_share(os.path.join(str(tmpdir), 'artifactshare'),
 +                               min_head_size=int(2e9),
 +                               max_head_size=int(2e9),
                                 total_space=int(10e9), free_space=(int(12e6) + int(2e9))) as share:
          # Configure bst to push to the cache

tests/testutils/artifactshare.py

@@ -29,7 +29,11 @@ from buildstream._protos.build.bazel.remote.execution.v2 import remote_execution
  #
  class ArtifactShare():
 -    def __init__(self, directory, *, total_space=None, free_space=None):
 +    def __init__(self, directory, *,
 +                 total_space=None,
 +                 free_space=None,
 +                 min_head_size=int(2e9),
 +                 max_head_size=int(10e9)):
          # The working directory for the artifact share (in case it
          # needs to do something outside of its backend's storage folder).
@@ -53,6 +57,9 @@ class ArtifactShare():
          self.total_space = total_space
          self.free_space = free_space
 +        self.max_head_size = max_head_size
 +        self.min_head_size = min_head_size
 +
          q = Queue()
          self.process = Process(target=self.run, args=(q,))
@@ -76,7 +83,10 @@ class ArtifactShare():
                  self.free_space = self.total_space
              os.statvfs = self._mock_statvfs
 -        server = create_server(self.repodir, enable_push=True)
 +        server = create_server(self.repodir,
 +                               max_head_size=self.max_head_size,
 +                               min_head_size=self.min_head_size,
 +                               enable_push=True)
          port = server.add_insecure_port('localhost:0')
          server.start()
@@ -134,6 +144,15 @@ class ArtifactShare():
          try:
              tree = self.cas.resolve_ref(artifact_key)
 +            reachable = set()
 +            try:
 +                self.cas._reachable_refs_dir(reachable, tree, update_mtime=False)
 +            except FileNotFoundError:
 +                return False
 +            for digest in reachable:
 +                object_name = os.path.join(self.cas.casdir, 'objects', digest[:2], digest[2:])
 +                if not os.path.exists(object_name):
 +                    return False
              return True
          except CASError:
              return False
@@ -165,8 +184,11 @@ class ArtifactShare():
  # Create an ArtifactShare for use in a test case
  #
  @contextmanager
 -def create_artifact_share(directory, *, total_space=None, free_space=None):
 -    share = ArtifactShare(directory, total_space=total_space, free_space=free_space)
 +def create_artifact_share(directory, *, total_space=None, free_space=None,
 +                          min_head_size=int(2e9),
 +                          max_head_size=int(10e9)):
 +    share = ArtifactShare(directory, total_space=total_space, free_space=free_space,
 +                          min_head_size=min_head_size, max_head_size=max_head_size)
      try:
          yield share
      finally:
