Re: Initial indicators for a build tool and how to get more detailed ones
- From: Agustín Benito Bethencourt <agustin benito codethink co uk>
- To: buildstream-list gnome org
- Subject: Re: Initial indicators for a build tool and how to get more detailed ones
- Date: Tue, 17 Apr 2018 13:44:38 +0200
Hi,
On Monday, 16 April 2018 17:35:47 CEST Jim MacArthur wrote:
> Hi Agustín.
> To try and summarise your email, it looks to me like what you are asking
> for is:
> * A record of build successes and failures of the trunk over time
> * Aggregate statistics of the time needed to build, as suggested -
> median and standard deviation (over all builds, or the last 10, or last
> week, for example?)
> * Separate timings for builds from scratch and builds which use existing
> artifact caches
It would be enough at first to expose the above information in a way that is easy for
external tools to process and consume. Then, if BuildStream wants to provide a
simple way to process the data into values for the indicators, or even
graph them, cool. I do not see it as a core feature though. I would focus first
on exposing the right information.
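To make "easy to process/consume" concrete, here is a rough sketch of the kind of record and aggregation I have in mind. The file name, field names and record format are purely my own assumptions for illustration; they are not anything BuildStream produces today.

```python
# Hypothetical sketch only: the file name and field names are assumptions,
# not an existing BuildStream output format.
import json
import statistics

# Imagine one JSON object appended per completed build, e.g.
# {"success": true, "wall_time_s": 812.4, "from_scratch": false, "timestamp": 1523955878}
with open("build-history.jsonl") as f:
    records = [json.loads(line) for line in f]

def summarise(builds):
    """Aggregate the indicators discussed above: median, deviation, failure rate."""
    times = [b["wall_time_s"] for b in builds]
    return {
        "count": len(builds),
        "median_s": statistics.median(times),
        "stdev_s": statistics.stdev(times) if len(times) > 1 else 0.0,
        "failure_rate": sum(1 for b in builds if not b["success"]) / len(builds),
    }

# Builds from scratch and builds using an existing artifact cache are
# aggregated separately, as suggested above.
print("scratch:", summarise([b for b in records if b.get("from_scratch")]))
print("cached :", summarise([b for b in records if not b.get("from_scratch")]))
```

With something like this exposed, the graphing and dashboarding can live entirely in external tools.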
> BuildStream doesn't (so far as I know) recommend any particular style of
> project management or continuous integration, so these would seem to be
> recommendations outside the scope of BuildStream.
This sentence worries me.
There are plenty of tools out there in the same space as BuildStream. Who is
BuildStream competing against? What is the main target?
BuildStream is a tool. Tools live in a production context. I recommend
selecting one context in order to meet somebody's expectations early on. Choosing
the wrong one is better than not choosing one at all.
> Where we can do things
> to support those activities, though, we should do. Building from
> scratch, for example, can be done unofficially at the moment but we
> don't have an explicit method for doing it.
My intention was not to provide requirements but to provide input at what I
believe is an early stage with respect to metrics. I assume there are other
priorities at this point.
> With respect to records of trunk builds, ideally, if your source has a
> single version (i.e. one repository) then there should be no build
> failures on trunk. Branches which fail testing shouldn't be merged at
> all. Things get more complicated if you have several repositories, or if
> you have known failures in your test set. In the latter case, you'll
> need more detail than simple pass/fail since you'll probably want to
> record the progress of going from 100 failures to 5.
I would differentiate two phases when analysing delivery performance (throughput
and stability, initially):
* Studying the indicators I provided will allow managers and development teams
to ask the right questions. What happened here? It is about identifying
potentially wrong patterns or events.
* A second phase is the inspection, where delivery teams (integrators in this
case) ask themselves why a certain pattern is taking place.
Why has the failure rate increased in March compared to October, assuming that in
both months the development stage was the same (2 months prior to the
release) and the product has not dramatically changed?
These questions will lead later on to the definition of further metrics.
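As an illustration of that first phase, the same hypothetical build records sketched earlier are enough to surface this kind of month-to-month question:

```python
# Illustrative only: group the hypothetical build records by month and
# compare failure rates between periods (e.g. October vs March).
import json
from collections import defaultdict
from datetime import datetime, timezone

with open("build-history.jsonl") as f:
    records = [json.loads(line) for line in f]

by_month = defaultdict(list)
for b in records:
    month = datetime.fromtimestamp(b["timestamp"], tz=timezone.utc).strftime("%Y-%m")
    by_month[month].append(b["success"])

for month, results in sorted(by_month.items()):
    failure_rate = 1 - sum(results) / len(results)
    print(f"{month}: {len(results)} builds, failure rate {failure_rate:.1%}")
```

The why behind any jump in that table is the second phase, and that is where the delivery team's inspection comes in.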
Arguing that the initial set of indicators (the what) is not enough because you
need the second one (the why) might be true at a small scale where the delivery team
is on top of everything: because they fully understand how the system
works, because the product is immature so very few metrics matter, because
of the introduction of a new technology with high impact on a specific area of
the product, etc.
It will not be true though when a team of 20 is managing a product portfolio
with 20 changes per day on each product and there are several development
teams from a variety of business units or providers feeding the pipeline. At
least not in my experience.
The question is: who are you developing the tool for?
> Recording the time
> between trunk builds is also possible, but that alone wouldn't seem to
> get you any more information than is in the git commit log, assuming you
> run tests when branches are pushed, as the current BuildStream CI does.
The commit stage has its own metrics. Again, looking at delivery as a whole,
the performance (stability and throughput) indicators should be analogous to
those I described for the build tool, but applied to trunk/master.
The idea is to have the same indicators at each stage of the delivery, to be
able to measure them end to end. The real value out of any indicator comes
when it is measured end to end. This is a similar approach to the one you follow
when creating the delivery pipeline (skeleton) (*).
> Perhaps you have a previous scenario in mind where these statistics
> would have been useful - if so, can you share any details of it?
Performance indicators applied to git (lead times and intervals to evaluate
throughput and stability) target development patterns, allowing you to ask the
questions that might lead to the detection of misbehaviour and constraints:
branching policy inefficiencies, out-of-sync feature maturity across teams
when deadlines are approaching, capacity issues in development teams,
infrastructure/service deficiencies, etc.
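A minimal sketch of what I mean, using nothing but the commit log; the branch name and the use of commit timestamps as the measuring point are assumptions:

```python
# Throughput/stability indicators taken from git itself: the intervals
# between commits landing on trunk. The branch name "master" is an assumption.
import statistics
import subprocess

log = subprocess.run(
    ["git", "log", "--first-parent", "--format=%ct", "master"],
    capture_output=True, text=True, check=True,
).stdout.split()

timestamps = sorted(int(t) for t in log)
intervals_h = [(b - a) / 3600 for a, b in zip(timestamps, timestamps[1:])]

print("commits on trunk:       ", len(timestamps))
print("median interval (hours):", round(statistics.median(intervals_h), 2))
print("stdev interval (hours): ", round(statistics.stdev(intervals_h), 2))
```

Spikes or a growing deviation in those intervals are what prompt the questions above.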
Another interesting consideration is how we build a complex system. We
cannot assume that the act of building a product takes place in a single step.
In fact, as products get bigger and more complex, especially when stability,
for whatever reason, is not uniform across the entire product, the building
stage might/should take place in two or more steps. In such a case, trunk/master
and build performance indicators cannot be mistaken for one another. You need both.
## Examples
A deadline approaches, so the change rate increases and code maturity decreases,
together with the time integrators have available to analyse complex
issues. The strategy then is to structure the system in such a way that you
can stage the build (build in two steps) by creating a core system that every
feature should build against. Changes are rejected automatically
when certain conditions are not met during the initial step of the building
process, which happens in isolation (base system plus the new feature),
reducing the number of issues integrators need to inspect as well as the
number of variables to analyse for each issue.
Yesterday I heard a manager from Codethink talk about Machiavelli strategies.
This is one of them. You need to design against natural human inertia :-)
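To make the two-step build concrete, here is a minimal sketch of such an automatic gate. The `build-tool` command and the element names are placeholders, not real BuildStream invocations:

```python
# Sketch of a two-step (staged) build gate. Commands and element names are
# placeholders; the point is the shape of the process, not the tooling.
import subprocess
import sys

def build(target: str) -> bool:
    """Run the build tool for one target and report success."""
    return subprocess.run(["build-tool", "build", target]).returncode == 0

feature = sys.argv[1]  # e.g. "feature-x.bst"

# Step 1: build the new feature in isolation against the frozen core system.
# A failure here rejects the change automatically, before any integrator
# has to look at it.
if not build("core-system.bst"):
    sys.exit("core system itself is broken; nothing to integrate against")
if not build(feature):
    sys.exit(f"{feature} rejected: it does not build against the core system")

# Step 2: only changes that passed step 1 are built into the full product,
# so integrators inspect fewer issues, each with fewer variables.
if not build("full-product.bst"):
    sys.exit("integration build failed; inspect with the reduced variable set")
```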
The analysis of the proposed indicators should take place for each build step
separately and as a whole, since what is executed at build time during each
step is different, even without considering the tests. So, in general, master/
trunk indicators cannot stand in for the build stage ones.
An automaker has a portfolio of Linux-based systems that share most of the
base system components and dependencies. In such a case, that base system is
treated as a separate product and becomes the core system of a scenario like
the one described above. You then need to isolate the indicators
related to building that base system (base product) so they can be taken into
consideration when this base system is built together with additional
components in a different pipeline (product or deployable unit) belonging to
the same portfolio, managed by the same delivery team.
Another interesting case is when transparency is limited for business reasons,
for example across different providers, so you need to analyse the behaviour of
that provider's changes in an isolated environment first, and then together with
the changes from other providers.
Another case is when some of the tests or checks run during the build stage
come from a third party or a team of testers (a different set of tests, or
different teams responsible for different tests/checks, unrelated to the
developers who created the code being tested). Then the build stage
might be decomposed into several steps. License compliance will soon be a
popular case. In the same way that some license checks were moved to the
package stage, additional ones will be moved to the build stage, since
dependencies might be meaningful to license compliance in some cases.
> So far for build time metrics, we've gone down the road of making a tool
> which can be given various versions of the source and recreate the test
> results and performance data, which is the main mode of operation for
> the current 'benchmarks' repository. I would much rather keep historical
> testing results, at least until we have strong evidence that
> retroactively running benchmarks produces the same results, but this
> hasn't been popular so far. It also requires people to keep their own
> database of results, as GitLab's CI (for example) will not store results
> indefinitely.
This is an example of a popular trap: prioritising the logs over the
indicators (metrics). When dealing with systems at scale, you need to think
the other way around, because there is only a limited amount of non-
contextualised information you can analyse. The rest becomes useless. When
lost at sea, birds are saviours. Look for them, not for the coast itself.
But let's not get into this "endless discussion".
My overall point is that if you have to choose indicators related to a tool
that is just part of a bigger process, choose first those that are coherent across
the whole process rather than those specific to a single stage. Coherent
end-to-end metrics maximise the ratio between value provided and effort.
(*) Approaches: walking skeleton vs dancing skeleton vs skeleton on crutches
Best Regards
--
Agustín Benito Bethencourt
Principal Consultant
Codethink Ltd