Daniel Silverstone pushed to branch danielsilverstone-ct/further-optimisations at BuildStream / buildstream
Commits:

- c0a8bb66 by Angelos Evripiotis at 2018-11-08T15:49:16Z
- f7231e90 by Angelos Evripiotis at 2018-11-08T15:49:16Z
- f7643440 by Angelos Evripiotis at 2018-11-08T15:49:16Z
- dd5e7b04 by Angelos Evripiotis at 2018-11-08T16:17:04Z
- fe33e328 by James Ennis at 2018-11-08T17:54:18Z
- 09faf002 by James Ennis at 2018-11-08T17:54:18Z
- d153453c by Javier Jardón at 2018-11-08T18:22:54Z
- 7b6c4a8c by Daniel Silverstone at 2018-11-09T11:07:20Z
- 3043c8b7 by Daniel Silverstone at 2018-11-09T11:07:20Z
- b1c4e727 by Daniel Silverstone at 2018-11-09T11:07:20Z
- 6a8abb31 by Daniel Silverstone at 2018-11-09T11:07:20Z
- 10312be5 by Daniel Silverstone at 2018-11-09T11:07:20Z

7 changed files:
- CONTRIBUTING.rst
- buildstream/_artifactcache/artifactcache.py
- buildstream/_cachekey.py
- buildstream/_yaml.py
- buildstream/element.py
- doc/source/using_config.rst
- tests/utils/misc.py
Changes:
CONTRIBUTING.rst:

@@ -97,7 +97,13 @@ a new merge request. You can also `create a merge request for an existing branch
 You may open merge requests for the branches you create before you are ready
 to have them reviewed and considered for inclusion if you like. Until your merge
 request is ready for review, the merge request title must be prefixed with the
-``WIP:`` identifier.
+``WIP:`` identifier. GitLab `treats this specially
+<https://docs.gitlab.com/ee/user/project/merge_requests/work_in_progress_merge_requests.html>`_,
+which helps reviewers.
+
+Consider marking a merge request as WIP again if you are taking a while to
+address a review point. This signals that the next action is on you, and it
+won't appear in a reviewer's search for non-WIP merge requests to review.


 Organized commits

@@ -122,6 +128,12 @@ If a commit in your branch modifies behavior such that a test must also
 be changed to match the new behavior, then the tests should be updated
 with the same commit, so that every commit passes its own tests.

+These principles apply whenever a branch is non-WIP. So, for example, don't push
+'fixup!' commits when addressing review comments; instead, amend the commits
+directly before pushing. GitLab has `good support
+<https://docs.gitlab.com/ee/user/project/merge_requests/versions.html>`_ for
+diffing between pushes, so 'fixup!' commits are not necessary for reviewers.
+

 Commit messages
 ~~~~~~~~~~~~~~~

@@ -144,6 +156,16 @@ number must be referenced in the commit message.

    Fixes #123

+Note that the 'why' of a change is as important as the 'what'.
+
+When reviewing this, folks can suggest better alternatives when they know the
+'why'. Perhaps there are other ways to avoid an error when things are not
+frobnicated.
+
+When folks modify this code, there may be uncertainty around whether the foos
+should always be frobnicated. The comments, the commit message, and issue #123
+should shed some light on that.
+
 In the case that you have a commit which necessarily modifies multiple
 components, then the summary line should still mention generally what
 changed (if possible), followed by a colon and a brief summary.

buildstream/_artifactcache/artifactcache.py:

@@ -937,15 +937,22 @@ class ArtifactCache():
                             "Invalid cache quota ({}): ".format(utils._pretty_size(cache_quota)) +
                             "BuildStream requires a minimum cache quota of 2G.")
         elif cache_quota > cache_size + available_space:  # Check maximum
+            if '%' in self.context.config_cache_quota:
+                available = (available_space / (stat.f_blocks * stat.f_bsize)) * 100
+                available = '{}% of total disk space'.format(round(available, 1))
+            else:
+                available = utils._pretty_size(available_space)
+
             raise LoadError(LoadErrorReason.INVALID_DATA,
                             ("Your system does not have enough available " +
                              "space to support the cache quota specified.\n" +
-                             "You currently have:\n" +
-                             "- {used} of cache in use at {local_cache_path}\n" +
-                             "- {available} of available system storage").format(
-                                 used=utils._pretty_size(cache_size),
-                                 local_cache_path=self.context.artifactdir,
-                                 available=utils._pretty_size(available_space)))
+                             "\nYou have specified a quota of {quota} total disk space.\n" +
+                             "- The filesystem containing {local_cache_path} only " +
+                             "has: {available_size} available.")
+                            .format(
+                                quota=self.context.config_cache_quota,
+                                local_cache_path=self.context.artifactdir,
+                                available_size=available))

         # Place a slight headroom (2e9 (2GB) on the cache_quota) into
         # cache_quota to try and avoid exceptions.

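A note on the percentage branch above: it reports free space relative to the
total size of the filesystem holding the cache. A standalone sketch of the
same calculation, assuming a POSIX system; the function name is made up, and
in the patch `stat` and `available_space` are computed elsewhere in the class:

    import os

    def available_percent(path):
        # Free space on the filesystem containing `path`, expressed as a
        # percentage of that filesystem's total size (mirrors the patch,
        # which uses f_bsize for both terms).
        stat = os.statvfs(path)
        available_space = stat.f_bavail * stat.f_bsize  # bytes available to us
        total_size = stat.f_blocks * stat.f_bsize       # total filesystem bytes
        return round((available_space / total_size) * 100, 1)

    print('{}% of total disk space'.format(available_percent(os.path.expanduser('~/.cache'))))
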
buildstream/_cachekey.py:

@@ -40,3 +40,20 @@ def generate_key(value):
     ordered = _yaml.node_sanitize(value)
     string = pickle.dumps(ordered)
     return hashlib.sha256(string).hexdigest()
+
+
+# generate_key_pre_sanitized()
+#
+# Generate an sha256 hex digest from the given value. The value
+# must be (a) compatible with generate_key() and (b) already have
+# been passed through _yaml.node_sanitize()
+#
+# Args:
+#    value: A sanitized value to get a key for
+#
+# Returns:
+#    (str): An sha256 hex digest of the given value
+#
+def generate_key_pre_sanitized(value):
+    string = pickle.dumps(value)
+    return hashlib.sha256(string).hexdigest()

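The contract here is that keying an already-sanitized value must give the same
digest as generate_key() on the raw value, which is what lets callers sanitize
once and key many times. A self-contained sketch of that equivalence;
node_sanitize() below is a toy stand-in (the real one in _yaml.py also handles
YAML node types and provenance):

    import hashlib
    import pickle
    from collections import OrderedDict

    def node_sanitize(value):
        # Toy stand-in: recursively rebuild mappings with sorted keys.
        if isinstance(value, dict):
            return OrderedDict((k, node_sanitize(value[k])) for k in sorted(value))
        if isinstance(value, list):
            return [node_sanitize(v) for v in value]
        return value

    def generate_key(value):
        return hashlib.sha256(pickle.dumps(node_sanitize(value))).hexdigest()

    def generate_key_pre_sanitized(value):
        # Caller guarantees node_sanitize() already happened.
        return hashlib.sha256(pickle.dumps(value)).hexdigest()

    node = {'b': 1, 'a': [2, 3]}
    sanitized = node_sanitize(node)
    assert generate_key(node) == generate_key_pre_sanitized(sanitized)
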
buildstream/_yaml.py:

@@ -363,8 +363,8 @@ _sentinel = object()
 #
 def node_get(node, expected_type, key, indices=None, default_value=_sentinel):
     value = node.get(key, default_value)
-    provenance = node_get_provenance(node)
     if value is _sentinel:
+        provenance = node_get_provenance(node)
         raise LoadError(LoadErrorReason.INVALID_DATA,
                         "{}: Dictionary did not contain expected key '{}'".format(provenance, key))

@@ -914,9 +914,20 @@ RoundTripRepresenter.add_representer(SanitizedDict,
 # Only dicts are ordered, list elements are left in order.
 #
 def node_sanitize(node):
+    # Short-circuit None which occurs ca. twice per element
+    if node is None:
+        return node
+
+    node_type = type(node)
+    # Next short-circuit integers, floats, strings, booleans, and tuples
+    if node_type in (int, float, str, bool, tuple):
+        return node
+    # Now short-circuit lists
+    elif node_type is list:
+        return [node_sanitize(elt) for elt in node]

-    if isinstance(node, collections.Mapping):
-
+    # Finally ChainMap and dict, and other Mappings need special handling
+    if node_type in (dict, ChainMap) or isinstance(node, collections.Mapping):
         result = SanitizedDict()

         key_list = [key for key, _ in node_items(node)]

@@ -924,10 +935,10 @@ def node_sanitize(node):
             result[key] = node_sanitize(node[key])

         return result
-
     elif isinstance(node, list):
         return [node_sanitize(elt) for elt in node]

+    # Everything else (such as commented scalars) just gets returned as-is.
     return node

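The short-circuits above replace isinstance() checks against abstract base
classes with exact type tests. A rough, self-contained illustration of why
that matters; absolute timings will vary by machine and Python version, and
collections.Mapping in this code is the pre-3.10 alias of
collections.abc.Mapping:

    import collections.abc
    import timeit

    d = {'a': 1}
    n = 1000000

    # Dispatch through the ABC machinery invokes __instancecheck__.
    abc_time = timeit.timeit(lambda: isinstance(d, collections.abc.Mapping), number=n)
    # An exact type test is a plain tuple membership check.
    type_time = timeit.timeit(lambda: type(d) in (dict,), number=n)

    print('isinstance vs ABC: {:.2f}s; type in tuple: {:.2f}s'.format(abc_time, type_time))
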
@@ -1055,16 +1066,52 @@ class ChainMap(collections.ChainMap):
         except KeyError:
             return default

+# Node copying
+#
+# Unfortunately we copy nodes a *lot* and `isinstance()` is super-slow when
+# things from collections.abc get involved. The result is the following
+# intricate but substantially faster group of tuples and the use of `in`.
+#
+# If any of the {node,list}_{chain_,}_copy routines raise a ValueError
+# then it's likely additional types need adding to these tuples.
+
+# String types are directly copied in a lot of places
+__string_types = (str,
+                  yaml.scalarstring.PreservedScalarString,
+                  yaml.scalarstring.SingleQuotedScalarString,
+                  yaml.scalarstring.DoubleQuotedScalarString)
+
+# When chaining a copy, these types are skipped since the ChainMap will
+# retrieve them from the source node when needed.
+__chain_skipped_types = (str, bool,
+                         yaml.scalarstring.PreservedScalarString,
+                         yaml.scalarstring.SingleQuotedScalarString,
+                         yaml.scalarstring.DoubleQuotedScalarString)
+
+# These types have to be iterated like a dictionary
+__dict_types = (dict, ChainMap, yaml.comments.CommentedMap)
+
+# These types have to be iterated like a list
+__list_types = (list, yaml.comments.CommentedSeq)
+
+# These are the provenance types, which have to be cloned rather than any other
+# copying tactic.
+__provenance_types = (Provenance, DictProvenance, MemberProvenance, ElementProvenance)

 def node_chain_copy(source):
     copy = ChainMap({}, source)
     for key, value in source.items():
-        if isinstance(value, collections.Mapping):
+        value_type = type(value)
+        if value_type in __dict_types:
             copy[key] = node_chain_copy(value)
-        elif isinstance(value, list):
+        elif value_type in __list_types:
             copy[key] = list_chain_copy(value)
-        elif isinstance(value, Provenance):
+        elif value_type in __provenance_types:
             copy[key] = value.clone()
+        elif value_type in __chain_skipped_types:
+            pass  # No need to copy these, the chainmap deals with it
+        else:
+            raise ValueError("Unable to be quick about node_chain_copy of {}".format(value_type))

     return copy

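The reason scalar values can be skipped entirely in node_chain_copy() above is
the copy-on-write behaviour of ChainMap: reads fall through to the source
mapping, while writes land in the fresh front layer. A minimal illustration
using the standard library's ChainMap (BuildStream subclasses it; the keys
here are made up):

    from collections import ChainMap

    source = {'kind': 'autotools', 'depends': ['base.bst']}
    copy = ChainMap({}, source)

    assert copy['kind'] == 'autotools'    # read falls through to the source
    copy['kind'] = 'manual'               # write goes only to the front dict
    assert source['kind'] == 'autotools'  # the source node is untouched
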
@@ -1072,14 +1119,17 @@ def node_chain_copy(source):
 def list_chain_copy(source):
     copy = []
     for item in source:
-        if isinstance(item, collections.Mapping):
+        item_type = type(item)
+        if item_type in __dict_types:
             copy.append(node_chain_copy(item))
-        elif isinstance(item, list):
+        elif item_type in __list_types:
             copy.append(list_chain_copy(item))
-        elif isinstance(item, Provenance):
+        elif item_type in __provenance_types:
             copy.append(item.clone())
-        else:
+        elif item_type in __string_types:
             copy.append(item)
+        else:  # Fallback
+            raise ValueError("Unable to be quick about list_chain_copy of {}".format(item_type))

     return copy

1087 | 1137 |
def node_copy(source):
|
1088 | 1138 |
copy = {}
|
1089 | 1139 |
for key, value in source.items():
|
1090 |
- if isinstance(value, collections.Mapping):
|
|
1140 |
+ value_type = type(value)
|
|
1141 |
+ if value_type in __dict_types:
|
|
1091 | 1142 |
copy[key] = node_copy(value)
|
1092 |
- elif isinstance(value, list):
|
|
1143 |
+ elif value_type in __list_types:
|
|
1093 | 1144 |
copy[key] = list_copy(value)
|
1094 |
- elif isinstance(value, Provenance):
|
|
1145 |
+ elif value_type in __provenance_types:
|
|
1095 | 1146 |
copy[key] = value.clone()
|
1096 |
- else:
|
|
1147 |
+ elif value_type in __string_types:
|
|
1097 | 1148 |
copy[key] = value
|
1149 |
+ else:
|
|
1150 |
+ raise ValueError("Unable to be quick about node_copy of {}".format(value_type))
|
|
1098 | 1151 |
|
1099 | 1152 |
ensure_provenance(copy)
|
1100 | 1153 |
|
@@ -1104,14 +1157,17 @@ def node_copy(source):
 def list_copy(source):
     copy = []
     for item in source:
-        if isinstance(item, collections.Mapping):
+        item_type = type(item)
+        if item_type in __dict_types:
             copy.append(node_copy(item))
-        elif isinstance(item, list):
+        elif item_type in __list_types:
             copy.append(list_copy(item))
-        elif isinstance(item, Provenance):
+        elif item_type in __provenance_types:
             copy.append(item.clone())
-        else:
+        elif item_type in __string_types:
             copy.append(item)
+        else:
+            raise ValueError("Unable to be quick about list_copy of {}".format(item_type))

     return copy

buildstream/element.py:

@@ -2053,11 +2053,14 @@ class Element(Plugin):
             }

             self.__cache_key_dict['fatal-warnings'] = sorted(project._fatal_warnings)
+            self.__cache_key_dict['dependencies'] = []
+            self.__cache_key_dict = _yaml.node_sanitize(self.__cache_key_dict)

-        cache_key_dict = self.__cache_key_dict.copy()
-        cache_key_dict['dependencies'] = dependencies
+        # This replacement is safe since OrderedDict replaces the value,
+        # leaving its location in the dictionary alone.
+        self.__cache_key_dict['dependencies'] = _yaml.node_sanitize(dependencies)

-        return _cachekey.generate_key(cache_key_dict)
+        return _cachekey.generate_key_pre_sanitized(self.__cache_key_dict)

     # __can_build_incrementally()
     #

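The safety comment above leans on a documented property of ordered
dictionaries (node_sanitize() returns a SanitizedDict, an ordered mapping):
assigning to an existing key replaces the value without moving the key, so
the pickled key order, and therefore the generated cache key, stays stable.
A quick check of that property:

    from collections import OrderedDict

    d = OrderedDict([('kind', 'autotools'), ('dependencies', []), ('fatal-warnings', [])])
    d['dependencies'] = ['base.bst']  # replace the value of an existing key

    # The key keeps its original position; only the value changed.
    assert list(d) == ['kind', 'dependencies', 'fatal-warnings']
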
doc/source/using_config.rst:

@@ -147,6 +147,44 @@ The default mirror is defined by its name, e.g.
 ``--default-mirror`` command-line option.


+Local cache expiry
+~~~~~~~~~~~~~~~~~~
+BuildStream locally caches artifacts, build trees, log files and sources within a
+cache located at ``~/.cache/buildstream`` (unless a $XDG_CACHE_HOME environment
+variable exists). When building large projects, this cache can get very large,
+thus BuildStream will attempt to clean up the cache automatically by expiring the least
+recently *used* artifacts.
+
+By default, cache expiry will begin once the file system which contains the cache
+approaches maximum usage. However, it is also possible to impose a quota on the local
+cache in the user configuration. This can be done in two ways:
+
+1. By restricting the maximum size of the cache directory itself.
+
+For example, to ensure that BuildStream's cache does not grow beyond 100 GB,
+simply declare the following in your user configuration (``~/.config/buildstream.conf``):
+
+.. code:: yaml
+
+  cache:
+    quota: 100G
+
+This quota defines the maximum size of the artifact cache in bytes.
+Other accepted values are: K, M, G or T (or you can simply declare the value in bytes, without the suffix).
+This uses the same format as systemd's
+`resource-control <https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html>`_.
+
+2. By expiring artifacts once the file system which contains the cache exceeds a specified usage.
+
+To ensure that we start cleaning the cache once we've used 80% of local disk space (on the file system
+which mounts the cache):
+
+.. code:: yaml
+
+  cache:
+    quota: 80%
+
+
 Default configuration
 ---------------------
 The default BuildStream configuration is specified here for reference:

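For intuition, a hypothetical sketch of how quota strings like the ones above
could be resolved to a byte count; BuildStream's real parser lives in its
utils module and may differ in detail, and the base-1024 suffixes follow the
systemd convention referenced above:

    _SUFFIXES = {'K': 2 ** 10, 'M': 2 ** 20, 'G': 2 ** 30, 'T': 2 ** 40}

    def parse_quota(spec, fs_total_bytes):
        # '80%' is a fraction of the filesystem holding the cache;
        # '100G'-style values use binary suffixes; bare numbers are bytes.
        spec = spec.strip()
        if spec.endswith('%'):
            return int(fs_total_bytes * float(spec[:-1]) / 100)
        if spec[-1] in _SUFFIXES:
            return int(float(spec[:-1]) * _SUFFIXES[spec[-1]])
        return int(spec)

    assert parse_quota('100G', 0) == 100 * 2 ** 30
    assert parse_quota('80%', 1000) == 800
    assert parse_quota('2048', 0) == 2048
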
tests/utils/misc.py:

@@ -27,4 +27,5 @@ def test_parse_size_over_1024T(cli, tmpdir):
     patched_statvfs = mock_os.mock_statvfs(f_bavail=bavail, f_bsize=BLOCK_SIZE)
     with mock_os.monkey_patch("statvfs", patched_statvfs):
         result = cli.run(project, args=["build", "file.bst"])
-    assert "1025T of available system storage" in result.stderr
+    failure_msg = 'Your system does not have enough available space to support the cache quota specified.'
+    assert failure_msg in result.stderr