Initial indicators for a build tool and how to get more detailed ones



Hi,

here goes my high-level approach to the metrics question.

In a mature software delivery environment, where release candidates are 
produced on a regular basis, the initial requirements for the delivery team are:

1.- The production of Release Candidates is not interrupted.

2.- When it is interrupted, the system goes back to the previous state as 
fast as possible.

To define the initial metrics, I suggest treating the build tool as a black 
box. When thinking about metrics, I recommend focusing first on those that 
target the above use case, which has a big impact business-wise.

The first requirement is tracked by using the Build Failure Rate as an 
indicator, which could be measured using time as a base, since speed is our 
main driver. A percentage over time of successful trunk builds is the obvious 
candidate.
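
To make this concrete, here is a minimal sketch of how such a rate could be 
computed; the (finished_at, succeeded) record format is my own assumption, 
not something Buildstream provides today:

    # Minimal sketch: failure rate of trunk builds inside a time window.
    # The (finished_at, succeeded) record format is an assumption.
    from datetime import datetime

    builds = [
        (datetime(2017, 3, 1, 9, 30), True),
        (datetime(2017, 3, 1, 14, 10), False),
        (datetime(2017, 3, 2, 10, 5), True),
        (datetime(2017, 3, 3, 11, 45), True),
    ]

    def failure_rate(builds, start, end):
        """Percentage of failed builds finished within [start, end)."""
        window = [ok for finished_at, ok in builds if start <= finished_at < end]
        if not window:
            return None
        return 100.0 * window.count(False) / len(window)

    print(failure_rate(builds, datetime(2017, 3, 1), datetime(2017, 3, 8)))  # 25.0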

The time to recover the build can also be measured using time as a base. In 
this case, keep it simple and measure the median and the standard deviation.
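
A sketch of that, using the same assumed records: the recovery time is taken 
as the gap between a failing build and the next successful one.

    # Minimal sketch: recovery times, summarised with median and standard
    # deviation. The record format is the same assumption as above.
    from datetime import datetime
    from statistics import median, pstdev

    builds = [
        (datetime(2017, 3, 1, 9, 30), True),
        (datetime(2017, 3, 1, 14, 10), False),
        (datetime(2017, 3, 1, 16, 40), True),   # recovered after 2.5 hours
        (datetime(2017, 3, 3, 11, 0), False),
        (datetime(2017, 3, 3, 12, 0), True),    # recovered after 1 hour
    ]

    recoveries = []
    broken_since = None
    for finished_at, succeeded in sorted(builds):
        if not succeeded and broken_since is None:
            broken_since = finished_at
        elif succeeded and broken_since is not None:
            recoveries.append((finished_at - broken_since).total_seconds())
            broken_since = None

    if recoveries:
        print("median (s):", median(recoveries))
        print("stddev (s):", pstdev(recoveries))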

These two indicators give you an idea of reliability, recently referred to by 
Steve Smith as stability indicators.

So a requirement for Buildstream would be to provide a simple way to collect 
data about when the build of trunk (or any other repo(**)) fails and succeeds, 
including the time. I think this is a simple case, but a very useful one with 
a big impact for delivery teams.
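
For illustration only, the kind of record I have in mind is one line per 
build; the field names are hypothetical, not an existing Buildstream format:

    # Hypothetical one-line-per-build record appended to a log file.
    import json, time

    record = {
        "ref": "trunk",
        "started_at": time.time() - 312.0,
        "finished_at": time.time(),
        "result": "success",          # or "failure"
    }
    with open("build-events.jsonl", "a") as log:
        log.write(json.dumps(record) + "\n")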

These two indicators, studied over time, can help you identify team 
misbehavior, hint at bad practices of development teams, reveal poor 
coordination of teams across different time zones leading to consistently 
long recovery times, etc. They are easy to graph and easy for managers and 
dev teams to understand when explaining misbehavior.

Once software delivery is stable, the focus shifts to speeding up the 
production of release candidates. Continuous Delivery practitioners talk 
about throughput, focusing on reducing feedback loops.

Again, if we think of Buildstream as a black box, the obvious indicators are:
* How long the build takes, or Build lead time(*), which I would call 
build lapse (I like Latin); see the sketch after this list.
* The time between two consecutive builds, or build interval.
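
A minimal sketch of both indicators, assuming per-build start and finish 
timestamps are available (my assumption, not a current Buildstream feature):

    # Build lapse: how long each build takes.
    # Build interval: time between the starts of consecutive builds.
    from datetime import datetime

    builds = [
        # (started_at, finished_at) for consecutive trunk builds
        (datetime(2017, 3, 1, 9, 0),   datetime(2017, 3, 1, 9, 25)),
        (datetime(2017, 3, 1, 13, 40), datetime(2017, 3, 1, 14, 2)),
        (datetime(2017, 3, 2, 10, 0),  datetime(2017, 3, 2, 10, 31)),
    ]

    lapses = [(end - start).total_seconds() for start, end in builds]
    intervals = [(builds[i + 1][0] - builds[i][0]).total_seconds()
                 for i in range(len(builds) - 1)]

    print("build lapse (s):   ", lapses)
    print("build interval (s):", intervals)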

The first throughput indicator seems obvious. It has a dramatic impact on 
the overall lead time. Sadly, it is often the only metric delivery teams use 
to measure how good a build tool is, comparing the same successful build.

I would draw attention to the definition of a build being done/finished. When 
we talk about caches, distributed builds, and artifact creation and storage, 
for instance, the definition might not be so obvious. I recommend again taking 
the perspective of Buildstream as a black box as a starting point when 
defining done. 

I would also point out how important the second measure is. Evaluated over 
time, it can say a lot about how the development/packaging/staging projects 
are going, when teams are committing, when heavy development is taking place, 
etc.

To graph build lapse, the suggestion is to start with the median and the 
standard deviation. The same applies to the build interval.
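
A short sketch of that summary; the sample values are made up:

    # Summarise build lapse and build interval samples (in seconds) with the
    # median and the standard deviation, ready to graph over time.
    from statistics import median, pstdev

    lapses = [1500.0, 1320.0, 1860.0, 1410.0]        # made-up build lapses
    intervals = [16800.0, 73200.0, 5400.0, 28800.0]  # made-up build intervals

    for name, values in (("build lapse", lapses), ("build interval", intervals)):
        print(name, "median (s):", median(values), "stddev (s):", pstdev(values))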

This information should be easy to collect for anybody consuming Buildstream. 
We could go wild providing tons of data, but if making graphs of the above 
becomes trivial, we have a big win, especially if it can also be done easily 
by tools designed to crunch data, draw graphs and generate reports.

Once these indicators are provided, the strategy for defining further metrics 
for integrators, more focused on the integration stage itself, is to split the 
tool into several smaller sequential black boxes and evaluate the indicators 
that provide simple information about stability and throughput for each of them.

As an example, bearing in mind how limited my knowledge of the Buildstream 
internals is, I would suggest dividing it according to how source code and 
definitions are managed, how the build is created, and finally how the build 
outputs and artifacts are managed and made available. The process of defining 
the initial metrics for integrators is then the same as for the delivery team, 
but at a smaller scale. 
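
As an illustration of that split, a hypothetical per-stage record could look 
like this; the stage names are my own, chosen only to mirror the division 
suggested above:

    # Hypothetical per-stage timings for one build, mirroring the suggested
    # split: sources/definitions, build, artifact handling.
    import json

    event = {
        "ref": "trunk",
        "stages": {
            "sources-and-definitions": {"seconds": 42.0, "result": "success"},
            "build":                   {"seconds": 1280.0, "result": "success"},
            "artifact-handling":       {"seconds": 35.0, "result": "success"},
        },
    }
    print(json.dumps(event, indent=2))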

Why these indicators?

Because they have a huge impact on the business, they are simple, and teams 
can learn a lot from the questions raised when checking the data/graphs. New 
interesting indicators will arise out of that analysis process, and they will 
have more impact on the business than defining lots of technology-centric 
metrics up front.

(*) I have mentioned before how sensitive the concept of "lead time" is, 
because people use it in many different ways. I recommend using it only for 
measuring the time for a commit to become deployable or deployed, depending on 
the level of control you have over the deployment stage. In any case, it 
refers to the whole pipeline. 

(**) Please, please, do trunk-based development.

Note: Steve Smith (@SteveSmithCD) is writing a book about these topics, which 
should be finished sometime this year.

Best Regards
-- 
Agustín Benito Bethencourt
Principal Consultant
Codethink Ltd
