Re: Initial indicators for a build tool and how to get more detailed ones



Hi,

On Monday, 16 April 2018 17:35:47 CEST Jim MacArthur wrote:
> Hi Agustín.

> To try and summarise your email, it looks to me like what you are asking
> for is:
>
> * A record of build successes and failures of the trunk over time
> * Aggregate statistics of the time needed to build, as suggested -
> median and standard deviation (over all builds, or the last 10, or last
> week, for example?)
> * Separate timings for builds from scratch and builds which use existing
> artifact caches

It would be enough, at first, to expose the above information in a way that is 
easy for external tools to process and consume. Then, if BuildStream wants to 
provide a simple way to process the data to produce values for the indicators, 
or even graph them, cool. I do not see it as a core feature though. I would 
focus first on exposing the right information.
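
Purely as a sketch of what "easy to process" could mean (the log format and 
field names below are assumptions for illustration, not anything BuildStream 
emits today), an external tool could compute the suggested statistics from one 
machine-readable record per build:

```python
# Sketch only: assumes a hypothetical build log where each line is a JSON
# record such as:
#   {"duration_s": 512.3, "result": "success", "from_scratch": true}
# Field names are invented for illustration; they are not BuildStream output.
import json
import statistics

def summarise(log_path):
    with open(log_path) as f:
        records = [json.loads(line) for line in f if line.strip()]
    for scratch, label in ((True, "from scratch"), (False, "with cache")):
        durations = [r["duration_s"] for r in records
                     if r["from_scratch"] == scratch and r["result"] == "success"]
        failures = sum(1 for r in records
                       if r["from_scratch"] == scratch and r["result"] != "success")
        if durations:
            print(f"{label}: median {statistics.median(durations):.1f}s, "
                  f"stdev {statistics.pstdev(durations):.1f}s, "
                  f"failures: {failures}")

summarise("build-log.jsonl")
```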


> BuildStream doesn't (so far as I know) recommend any particular style of
> project management or continuous integration, so these would seem to be
> recommendations outside the scope of BuildStream.

This sentence worries me.

There are plenty of tools out there in the same space BuildStream is in. Who is 
BuildStream competing against? What is the main target?

BuildStream is a tool. Tools live in a production context. I recommend 
selecting one context in order to meet somebody's expectations early on. 
Choosing the wrong one is better than not choosing one at all.

> Where we can do things
> to support those activities, though, we should do. Building from
> scratch, for example, can be done unofficially at the moment but we
> don't have an explicit method for doing it.

My intention was not to provide requirements but to provide input at what I 
believe is an earlier stage in relation to metrics. I assume there are other 
priorities at this point.


> With respect to records of trunk builds, ideally, if your source has a
> single version (i.e. one repository) then there should be no build
> failures on trunk. Branches which fail testing shouldn't be merged at
> all. Things get more complicated if you have several repositories, or if
> you have known failures in your test set. In the latter case, you'll
> need more detail than simple pass/fail since you'll probably want to
> record the progress of going from 100 failures to 5.

I would differentiate two phases when analysing delivery performance 
(initially, throughput and stability):
* Studying the indicators I provided will allow managers and development teams 
to ask the right questions. What happened here? It is about identifying 
potentially wrong patterns or events.
* A second phase is the inspection, where delivery teams (integrators in this 
case) ask themselves why a certain pattern is taking place.

Why has the failure rate increased in March compared to October, assuming that 
in both months the development stage was the same (two months prior to the 
release) and the product has not changed dramatically?

These questions will later lead to the definition of further metrics.

Arguing that the initial set of indicators (the what) is not enough because you 
need the second one (the why) might be true at small scale, where the delivery 
team is on top of everything: either because they fully understand how the 
system works, or because the product is immature so very few metrics matter, or 
because of the introduction of a new technology with high impact on a specific 
area of the product, etc.

It will not be true though when a team of 20 is managing a product portfolio 
with 20 changes per day on each product and there are several development 
teams from a variety of business units or providers feeding the pipeline. At 
least not in my experience.

The question is who you are developing the tool for.

> Recording the time
> between trunk builds is also possible, but that alone wouldn't seem to
> get you any more information than is in the git commit log, assuming you
> run tests when branches are pushed, as the current BuildStream CI does.

The commit stage has its own metrics. Again, looking at delivery as a whole, 
the performance (stability and throughput) indicators should be analogous to 
those I described for the build tool, but applied to trunk/master. 

The idea is to have the same indicators at each stage of the delivery so they 
can be measured end to end. The real value of any indicator comes when it is 
measured end to end. This is similar to the approach you follow when creating 
the delivery pipeline skeleton (*).
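
As an illustration of "the same indicator at every stage", here is a small 
sketch. It assumes each change carries per-stage start/finish timestamps from 
some data source; the stage names and data layout are invented for the example.

```python
# Sketch: the same lead-time indicator computed per stage and end to end.
# The stage names and timestamps below are invented example data.
from datetime import datetime

change = [  # (stage, started, finished) for one change flowing through delivery
    ("commit", "2018-04-16T09:00", "2018-04-16T09:05"),
    ("build",  "2018-04-16T09:05", "2018-04-16T09:40"),
    ("deploy", "2018-04-16T09:40", "2018-04-16T09:55"),
]

def minutes(start, end):
    fmt = datetime.fromisoformat
    return (fmt(end) - fmt(start)).total_seconds() / 60

per_stage = {stage: minutes(start, end) for stage, start, end in change}
end_to_end = minutes(change[0][1], change[-1][2])
print(per_stage)   # {'commit': 5.0, 'build': 35.0, 'deploy': 15.0}
print(end_to_end)  # 55.0, the number that matters when measured end to end
```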

> Perhaps you have a previous scenario in mind where these statistics
> would have been useful - if so, can you share any details of it?

Performance indicators applied to git (lead times and intervals to evaluate 
throughput and stability) target development patterns, allowing you to ask the 
questions that might lead to the detection of misbehaviour and constraints: 
branching policy inefficiencies, out-of-sync feature maturity across teams 
when deadlines are approaching, capacity issues in development teams, 
infrastructure/service deficiencies, etc.
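
As one small example of such an indicator pulled straight from git (only a 
sketch; the choice of branch and units is arbitrary), the interval between 
changes landing on master:

```python
# Sketch: intervals between commits reaching master, one possible throughput
# indicator derived directly from git history (run inside a git checkout).
import statistics
import subprocess

log = subprocess.run(
    ["git", "log", "--first-parent", "master", "--pretty=format:%ct"],
    capture_output=True, text=True, check=True).stdout
times = sorted(int(t) for t in log.split())
intervals = [(b - a) / 3600 for a, b in zip(times, times[1:])]  # hours
if intervals:
    print(f"median interval: {statistics.median(intervals):.1f}h, "
          f"stdev: {statistics.pstdev(intervals):.1f}h")
```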

Another interesting consideration is how we build a complex system. We cannot 
assume that the act of building a product takes place in a single step. In 
fact, as products get bigger and more complex, especially when stability, for 
whatever reason, is not uniform across the entire product, the building stage 
might or should take place in two or more steps. In such a case, trunk/master 
and build performance indicators cannot be mistaken for one another. You need 
both.

## Examples

A deadline approaches, so the change rate increases and code maturity 
decreases, together with the time integrators have available to analyse 
complex issues. The strategy then is to structure the system in such a way 
that you can stage the build (build in two steps) by creating a core system 
every feature should build against. The rejection of changes happens 
automatically when certain conditions are not met during the initial step of 
the building process, which happens in isolation (base system plus the new 
feature), reducing the number of issues integrators need to inspect as well as 
the number of variables to analyse for each issue.

Yesterday I heard a manager from Codethink talk about Machiavelli strategies. 
This is one of them. You need to design against natural human inertia :-)
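
A minimal sketch of that two-step gate, with invented placeholder functions 
(nothing here corresponds to BuildStream's actual API): the feature is built 
against the frozen core in isolation, and rejection happens before an 
integrator ever looks at it.

```python
# Sketch of the staged-build gate described above. The build command and the
# acceptance conditions are hypothetical placeholders, not BuildStream APIs.
import subprocess

def build_in_isolation(core_ref, feature_branch):
    """Step 1: build the frozen core plus exactly one feature."""
    # Stand-in for the real isolated build invocation.
    result = subprocess.run(["true"])
    return result.returncode == 0

def gate(core_ref, feature_branch):
    if not build_in_isolation(core_ref, feature_branch):
        # Rejected automatically: integrators never see this change.
        return "rejected"
    # Step 2: only changes that built in isolation reach full integration,
    # so integrators inspect fewer issues, each with fewer variables.
    return "queued for full integration"

print(gate("core-1.2", "feature/example"))
```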

The analysis of the proposed indicators should take place for each build step 
separately and as a whole, since what is executed at build time during each 
step is different, even without considering the tests. So, in general, master/
trunk indicators cannot serve as a proxy for the building stage.

An automaker has a portfolio of Linux-based systems that share most of the 
base system components and dependencies. In such a case, that base system is 
treated as a different product and becomes the core system of a scenario like 
the one described above. You then need to isolate the indicators related to 
building that base system (base product) so they can be taken into 
consideration when this base system is built together with additional 
components in a different pipeline (product or deployable unit) belonging to 
the same portfolio, managed by the same delivery team.

Another interesting case is when transparency is limited for business reasons, 
for instance across different providers, so you need to analyse the behaviour 
of that provider's changes in an isolated environment first, and then together 
with the changes from other providers.

Another case is when some of the tests or checks run during the build stage 
come from a third party or a team of testers (a different set of tests, or 
different teams responsible for different tests/checks, unrelated to the 
developers who created the code being tested). Then the build stage might be 
decomposed into several steps. License compliance will soon be a popular case. 
In the same way that some license checks were moved to the package stage, 
additional ones will be moved to the build stage, since dependencies might be 
meaningful for license compliance in some cases.


> So far for build time metrics, we've gone down the road of making a tool
> which can be given various versions of the source and recreate the test
> results and performance data, which is the main mode of operation for
> the current 'benchmarks' repository. I would much rather keep historical
> testing results, at least until we have strong evidence that
> retroactively running benchmarks produces the same results, but this
> hasn't been popular so far. It also requires people to keep their own
> database of results, as GitLab's CI (for example) will not store results
> indefinitely.

This is an example of a popular trap: prioritising the logs over the 
indicators (metrics). When dealing with systems at scale, you need to think 
the other way around, because there is only a limited amount of 
non-contextualised information you can analyse. The rest becomes useless. When 
lost at sea, birds are saviours. Look for them, not for the coast itself.

But let's not get into this "endless discussion".

My overall point is that if you have to choose indicators related to a tool 
that is just part of a bigger process, choose first those that are coherent 
across the whole process rather than those specific to a single stage. 
Coherent end-to-end metrics maximise the relation between value provided and 
effort.

(*) Approaches: walking skeleton vs dancing skeleton vs skeleton on crutches 

Best Regards
-- 
Agustín Benito Bethencourt
Principal Consultant
Codethink Ltd

