[Notes] [Git][BuildStream/buildstream][danielsilverstone-ct/further-optimisations] 12 commits: contributing: WIP again to be kind to reviewers

Daniel Silverstone pushed to branch danielsilverstone-ct/further-optimisations at BuildStream / buildstream

Commits:

7 changed files:

Changes:

  • CONTRIBUTING.rst
@@ -97,7 +97,13 @@ a new merge request. You can also `create a merge request for an existing branch
 You may open merge requests for the branches you create before you are ready
 to have them reviewed and considered for inclusion if you like. Until your merge
 request is ready for review, the merge request title must be prefixed with the
-``WIP:`` identifier.
+``WIP:`` identifier. GitLab `treats this specially
+<https://docs.gitlab.com/ee/user/project/merge_requests/work_in_progress_merge_requests.html>`_,
+which helps reviewers.
+
+Consider marking a merge request as WIP again if you are taking a while to
+address a review point. This signals that the next action is on you, and it
+won't appear in a reviewer's search for non-WIP merge requests to review.
 
 
 Organized commits
@@ -122,6 +128,12 @@ If a commit in your branch modifies behavior such that a test must also
 be changed to match the new behavior, then the tests should be updated
 with the same commit, so that every commit passes its own tests.
 
+These principles apply whenever a branch is non-WIP. So for example, don't push
+'fixup!' commits when addressing review comments, instead amend the commits
+directly before pushing. GitLab has `good support
+<https://docs.gitlab.com/ee/user/project/merge_requests/versions.html>`_ for
+diffing between pushes, so 'fixup!' commits are not necessary for reviewers.
+
 
 Commit messages
 ~~~~~~~~~~~~~~~
@@ -144,6 +156,16 @@ number must be referenced in the commit message.
 
   Fixes #123
 
+Note that the 'why' of a change is as important as the 'what'.
+
+When reviewing this, folks can suggest better alternatives when they know the
+'why'. Perhaps there are other ways to avoid an error when things are not
+frobnicated.
+
+When folks modify this code, there may be uncertainty around whether the foos
+should always be frobnicated. The comments, the commit message, and issue #123
+should shed some light on that.
+
 In the case that you have a commit which necessarily modifies multiple
 components, then the summary line should still mention generally what
 changed (if possible), followed by a colon and a brief summary.

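To make the 'why' guidance above concrete, here is a hypothetical commit
message in that spirit (the component, description, and issue number are
purely illustrative):

    element.py: Frobnicate foos before assembly

    Unfrobnicated foos can make assembly raise an unnecessary error.
    Frobnicating them first avoids the error; if there is a better way to
    handle unfrobnicated foos, the alternatives are discussed in the issue.

    Fixes #123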
  • buildstream/_artifactcache/artifactcache.py
@@ -937,15 +937,22 @@ class ArtifactCache():
                             "Invalid cache quota ({}): ".format(utils._pretty_size(cache_quota)) +
                             "BuildStream requires a minimum cache quota of 2G.")
        elif cache_quota > cache_size + available_space:  # Check maximum
+            if '%' in self.context.config_cache_quota:
+                available = (available_space / (stat.f_blocks * stat.f_bsize)) * 100
+                available = '{}% of total disk space'.format(round(available, 1))
+            else:
+                available = utils._pretty_size(available_space)
+
            raise LoadError(LoadErrorReason.INVALID_DATA,
                            ("Your system does not have enough available " +
                             "space to support the cache quota specified.\n" +
-                             "You currently have:\n" +
-                             "- {used} of cache in use at {local_cache_path}\n" +
-                             "- {available} of available system storage").format(
-                                 used=utils._pretty_size(cache_size),
-                                 local_cache_path=self.context.artifactdir,
-                                 available=utils._pretty_size(available_space)))
+                             "\nYou have specified a quota of {quota} total disk space.\n" +
+                             "- The filesystem containing {local_cache_path} only " +
+                             "has: {available_size} available.")
+                            .format(
+                                quota=self.context.config_cache_quota,
+                                local_cache_path=self.context.artifactdir,
+                                available_size=available))
 
        # Place a slight headroom (2e9 (2GB) on the cache_quota) into
        # cache_quota to try and avoid exceptions.

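The percentage branch above reports availability as a share of the whole
filesystem rather than as an absolute size. A minimal standalone sketch of
that calculation, assuming POSIX os.statvfs() semantics (the home directory
stands in for the cache path here, which is an assumption for illustration):

    import os

    def available_as_percentage(path=os.path.expanduser('~')):
        # f_bavail * f_bsize: bytes still available to unprivileged users;
        # f_blocks * f_bsize: total size of the filesystem containing 'path'.
        stat = os.statvfs(path)
        available_space = stat.f_bavail * stat.f_bsize
        total_size = stat.f_blocks * stat.f_bsize
        return '{}% of total disk space'.format(round(available_space / total_size * 100, 1))

    print(available_as_percentage())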
  • buildstream/_cachekey.py
@@ -40,3 +40,20 @@ def generate_key(value):
    ordered = _yaml.node_sanitize(value)
    string = pickle.dumps(ordered)
    return hashlib.sha256(string).hexdigest()
+
+
+# generate_key_pre_sanitized()
+#
+# Generate an sha256 hex digest from the given value. The value
+# must be (a) compatible with generate_key() and (b) already have
+# been passed through _yaml.node_sanitize()
+#
+# Args:
+#    value: A sanitized value to get a key for
+#
+# Returns:
+#    (str): An sha256 hex digest of the given value
+#
+def generate_key_pre_sanitized(value):
+    string = pickle.dumps(value)
+    return hashlib.sha256(string).hexdigest()

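The point of generate_key_pre_sanitized() is that callers who have already
paid for _yaml.node_sanitize() need not pay for it again. A self-contained
sketch of that contract, with a toy sanitize() standing in for
_yaml.node_sanitize():

    import hashlib
    import pickle
    from collections import OrderedDict

    def sanitize(value):
        # Toy stand-in for _yaml.node_sanitize(): order mapping keys so
        # that pickling gives a stable byte stream.
        if isinstance(value, dict):
            return OrderedDict((key, sanitize(value[key])) for key in sorted(value))
        if isinstance(value, list):
            return [sanitize(element) for element in value]
        return value

    def generate_key(value):
        return hashlib.sha256(pickle.dumps(sanitize(value))).hexdigest()

    def generate_key_pre_sanitized(value):
        return hashlib.sha256(pickle.dumps(value)).hexdigest()

    node = {'kind': 'autotools', 'config': {'b': 1, 'a': 2}}
    pre_sanitized = sanitize(node)

    # Sanitizing once up front yields the same digest as sanitizing inside.
    assert generate_key(node) == generate_key_pre_sanitized(pre_sanitized)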
  • buildstream/_yaml.py
@@ -363,8 +363,8 @@ _sentinel = object()
 #
 def node_get(node, expected_type, key, indices=None, default_value=_sentinel):
     value = node.get(key, default_value)
-    provenance = node_get_provenance(node)
     if value is _sentinel:
+        provenance = node_get_provenance(node)
         raise LoadError(LoadErrorReason.INVALID_DATA,
                         "{}: Dictionary did not contain expected key '{}'".format(provenance, key))
 
@@ -914,9 +914,20 @@ RoundTripRepresenter.add_representer(SanitizedDict,
 # Only dicts are ordered, list elements are left in order.
 #
 def node_sanitize(node):
+    # Short-circuit None which occurs ca. twice per element
+    if node is None:
+        return node
+
+    node_type = type(node)
+    # Next short-circuit integers, floats, strings, booleans, and tuples
+    if node_type in (int, float, str, bool, tuple):
+        return node
+    # Now short-circuit lists
+    elif node_type is list:
+        return [node_sanitize(elt) for elt in node]
 
-    if isinstance(node, collections.Mapping):
-
+    # Finally ChainMap and dict, and other Mappings need special handling
+    if node_type in (dict, ChainMap) or isinstance(node, collections.Mapping):
         result = SanitizedDict()
 
         key_list = [key for key, _ in node_items(node)]
@@ -924,10 +935,10 @@ def node_sanitize(node):
             result[key] = node_sanitize(node[key])
 
         return result
-
     elif isinstance(node, list):
         return [node_sanitize(elt) for elt in node]
 
+    # Everything else (such as commented scalars) just gets returned as-is.
    return node
 
 
@@ -1055,16 +1066,52 @@ class ChainMap(collections.ChainMap):
         except KeyError:
             return default
 
+# Node copying
+#
+# Unfortunately we copy nodes a *lot* and `isinstance()` is super-slow when
+# things from collections.abc get involved.  The result is the following
+# intricate but substantially faster group of tuples and the use of `in`.
+#
+# If any of the {node,list}_{chain_,}_copy routines raise a ValueError
+# then it's likely additional types need adding to these tuples.
+
+# String types are directly copied in a lot of places
+__string_types = (str,
+                  yaml.scalarstring.PreservedScalarString,
+                  yaml.scalarstring.SingleQuotedScalarString,
+                  yaml.scalarstring.DoubleQuotedScalarString)
+
+# When chaining a copy, these types are skipped since the ChainMap will
+# retrieve them from the source node when needed.
+__chain_skipped_types = (str, bool,
+                         yaml.scalarstring.PreservedScalarString,
+                         yaml.scalarstring.SingleQuotedScalarString,
+                         yaml.scalarstring.DoubleQuotedScalarString)
+
+# These types have to be iterated like a dictionary
+__dict_types = (dict, ChainMap, yaml.comments.CommentedMap)
+
+# These types have to be iterated like a list
+__list_types = (list, yaml.comments.CommentedSeq)
+
+# These are the provenance types, which have to be cloned rather than any other
+# copying tactic.
+__provenance_types = (Provenance, DictProvenance, MemberProvenance, ElementProvenance)
 
 def node_chain_copy(source):
     copy = ChainMap({}, source)
     for key, value in source.items():
-        if isinstance(value, collections.Mapping):
+        value_type = type(value)
+        if value_type in __dict_types:
             copy[key] = node_chain_copy(value)
-        elif isinstance(value, list):
+        elif value_type in __list_types:
             copy[key] = list_chain_copy(value)
-        elif isinstance(value, Provenance):
+        elif value_type in __provenance_types:
             copy[key] = value.clone()
+        elif value_type in __chain_skipped_types:
+            pass  # No need to copy these, the chainmap deals with it
+        else:
+            raise ValueError("Unable to be quick about node_chain_copy of {}".format(value_type))
 
     return copy
 
@@ -1072,14 +1119,17 @@ def node_chain_copy(source):
 def list_chain_copy(source):
     copy = []
     for item in source:
-        if isinstance(item, collections.Mapping):
+        item_type = type(item)
+        if item_type in __dict_types:
             copy.append(node_chain_copy(item))
-        elif isinstance(item, list):
+        elif item_type in __list_types:
             copy.append(list_chain_copy(item))
-        elif isinstance(item, Provenance):
+        elif item_type in __provenance_types:
             copy.append(item.clone())
-        else:
+        elif item_type in __string_types:
             copy.append(item)
+        else:  # Fallback
+            raise ValueError("Unable to be quick about list_chain_copy of {}".format(item_type))
 
     return copy
 
@@ -1087,14 +1137,17 @@ def list_chain_copy(source):
 def node_copy(source):
     copy = {}
     for key, value in source.items():
-        if isinstance(value, collections.Mapping):
+        value_type = type(value)
+        if value_type in __dict_types:
             copy[key] = node_copy(value)
-        elif isinstance(value, list):
+        elif value_type in __list_types:
             copy[key] = list_copy(value)
-        elif isinstance(value, Provenance):
+        elif value_type in __provenance_types:
             copy[key] = value.clone()
-        else:
+        elif value_type in __string_types:
             copy[key] = value
+        else:
+            raise ValueError("Unable to be quick about node_copy of {}".format(value_type))
 
     ensure_provenance(copy)
 
@@ -1104,14 +1157,17 @@ def node_copy(source):
 def list_copy(source):
     copy = []
     for item in source:
-        if isinstance(item, collections.Mapping):
+        item_type = type(item)
+        if item_type in __dict_types:
             copy.append(node_copy(item))
-        elif isinstance(item, list):
+        elif item_type in __list_types:
             copy.append(list_copy(item))
-        elif isinstance(item, Provenance):
+        elif item_type in __provenance_types:
             copy.append(item.clone())
-        else:
+        elif item_type in __string_types:
             copy.append(item)
+        else:
+            raise ValueError("Unable to be quick about list_copy of {}".format(item_type))
 
     return copy
 

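The comment block above claims that isinstance() against collections.abc
ABCs is slow compared with an exact type() membership test. A quick,
standalone way to check that claim (timings vary by machine and Python
version; collections.abc.Mapping is the modern spelling of the
collections.Mapping used in the old code):

    import timeit
    from collections.abc import Mapping

    node = {'key': 'value'}

    # One million checks each way, against a plain dict.
    abc_check = timeit.timeit(lambda: isinstance(node, Mapping), number=1000000)
    exact_check = timeit.timeit(lambda: type(node) in (dict,), number=1000000)

    print('isinstance() against Mapping ABC: {:.3f}s'.format(abc_check))
    print('type() in tuple of exact types:   {:.3f}s'.format(exact_check))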
  • buildstream/element.py
@@ -2053,11 +2053,14 @@ class Element(Plugin):
             }
 
             self.__cache_key_dict['fatal-warnings'] = sorted(project._fatal_warnings)
+            self.__cache_key_dict['dependencies'] = []
+            self.__cache_key_dict = _yaml.node_sanitize(self.__cache_key_dict)
 
-        cache_key_dict = self.__cache_key_dict.copy()
-        cache_key_dict['dependencies'] = dependencies
+        # This replacement is safe since OrderedDict replaces the value,
+        # leaving its location in the dictionary alone.
+        self.__cache_key_dict['dependencies'] = _yaml.node_sanitize(dependencies)
 
-        return _cachekey.generate_key(cache_key_dict)
+        return _cachekey.generate_key_pre_sanitized(self.__cache_key_dict)
 
     # __can_build_incrementally()
     #

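The element.py change ties the earlier pieces together: the stable part of
the cache-key dictionary is sanitized once and cached, and only the per-call
'dependencies' value is refreshed before hashing with
generate_key_pre_sanitized(). A self-contained sketch of that pattern (the
names are illustrative, not BuildStream's internals; a plain dict, ordered
since Python 3.7, stands in for the sanitized OrderedDict):

    import hashlib
    import pickle

    class Keyed:
        def __init__(self, static_data):
            self._static_data = static_data
            self._key_dict = None

        def key_for(self, dependencies):
            if self._key_dict is None:
                # One-time setup: in the real code this is where
                # _yaml.node_sanitize() is paid exactly once.
                self._key_dict = dict(self._static_data, dependencies=[])
            # Replacing a value keeps its slot in the ordered dict, so the
            # pickled bytes (and hence the digest) depend only on the
            # values, not on when 'dependencies' was last assigned.
            self._key_dict['dependencies'] = list(dependencies)
            return hashlib.sha256(pickle.dumps(self._key_dict)).hexdigest()

    element = Keyed({'kind': 'autotools', 'fatal-warnings': []})
    print(element.key_for(['sha-of-dep-1', 'sha-of-dep-2']))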
  • doc/source/using_config.rst
@@ -147,6 +147,44 @@ The default mirror is defined by its name, e.g.
    ``--default-mirror`` command-line option.
 
 
+Local cache expiry
+~~~~~~~~~~~~~~~~~~
+BuildStream locally caches artifacts, build trees, log files and sources within a
+cache located at ``~/.cache/buildstream`` (unless a $XDG_CACHE_HOME environment
+variable exists). When building large projects, this cache can get very large,
+thus BuildStream will attempt to clean up the cache automatically by expiring the least
+recently *used* artifacts.
+
+By default, cache expiry will begin once the file system which contains the cache
+approaches maximum usage. However, it is also possible to impose a quota on the local
+cache in the user configuration. This can be done in two ways:
+
+1. By restricting the maximum size of the cache directory itself.
+
+For example, to ensure that BuildStream's cache does not grow beyond 100 GB,
+simply declare the following in your user configuration (``~/.config/buildstream.conf``):
+
+.. code:: yaml
+
+  cache:
+    quota: 100G
+
+This quota defines the maximum size of the artifact cache in bytes.
+Other accepted values are: K, M, G or T (or you can simply declare the value in bytes, without the suffix).
+This uses the same format as systemd's
+`resource-control <https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html>`_.
+
+2. By expiring artifacts once the file system which contains the cache exceeds a specified usage.
+
+To ensure that we start cleaning the cache once we've used 80% of local disk space (on the file system
+which mounts the cache):
+
+.. code:: yaml
+
+  cache:
+    quota: 80%
+
+
 Default configuration
 ---------------------
 The default BuildStream configuration is specified here for reference:

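For readers who want to reason about what '100G' or '80%' means in bytes,
here is an illustrative normalisation of the two quota forms documented
above. This is a sketch, not BuildStream's actual parser, and the home
directory stands in for the cache path:

    import os

    _UNITS = {'K': 2 ** 10, 'M': 2 ** 20, 'G': 2 ** 30, 'T': 2 ** 40}

    def quota_in_bytes(quota, path=os.path.expanduser('~')):
        quota = quota.strip()
        if quota.endswith('%'):
            # Percentage of the filesystem that holds the cache.
            stat = os.statvfs(path)
            total_size = stat.f_blocks * stat.f_bsize
            return int(total_size * float(quota[:-1]) / 100)
        if quota[-1] in _UNITS:
            return int(quota[:-1]) * _UNITS[quota[-1]]
        return int(quota)  # plain byte count, no suffix

    print(quota_in_bytes('100G'))  # 107374182400
    print(quota_in_bytes('80%'))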
  • tests/utils/misc.py
@@ -27,4 +27,5 @@ def test_parse_size_over_1024T(cli, tmpdir):
     patched_statvfs = mock_os.mock_statvfs(f_bavail=bavail, f_bsize=BLOCK_SIZE)
     with mock_os.monkey_patch("statvfs", patched_statvfs):
         result = cli.run(project, args=["build", "file.bst"])
-        assert "1025T of available system storage" in result.stderr
+        failure_msg = 'Your system does not have enough available space to support the cache quota specified.'
+        assert failure_msg in result.stderr


