Re: [BuildStream] Exploratory effort related with metrics in BuildStream: are we growing?



Hi,

In an effort to open a discussion about the conclusions of the report, I include here the initial questions and 
some of the most obvious conclusions:

## Questions

There are further motivations for doing this exercise and for choosing the driver question above:
    I. Is there an increasing number of people participating in the project? Past, present and future 
processes might be affected by the answer to this question.
    II. Is the code base growing? If so, can the growth be characterised?
    III. We have no data from our user base, so we cannot determine directly whether it is growing. We might, 
however, get interesting information about the interactions between the project and our user base that leads us 
to find out whether the user base is growing and how. Can we identify such effects?
    IV. The project is putting effort into a release process. Does it have any influence on our potential 
growth?
    V. Can we identify actions taken that had an effect on the project's growth? Which effect?
    VI. Which metrics have shown clear potential to characterise growth? What additional metrics and actions 
can be identified to potentially improve the analysis?
    VII. Are there elements or metrics that lead us to further interesting questions?

## Conclusions

The following conclusions refer to those questions.
    I. There is an increasing number of people participating in the project. The nature of that growth 
indicates the following:
        A. The accumulated unique issue authors show a behaviour that suggests a limited number of contributors 
are using the ticketing system intensively (at least creating issues), as opposed to an increasing number of 
contributors interacting with the ticketing system in a more extensive way.
            1. Which metrics can help to answer/confirm this idea?
            2. These questions might help to characterise the growth in issue creators:
                a) Bug reporters vs enhancement/request/requirement reporters.
                b) Users (reporters/testers) vs developers (see below).
        B. The growth of unique authors is not linear, which would be a sign of organic growth.
    II. The code base is growing. The number of tests and comments is growing too.
        A. The data reflects a significant increase in the effort put into test coverage (unit tests).
            1. The data suggests that something triggered a change in January 2018 that is reflected in 
February 2018. Was it the BuildStream v1.0 release?
            2. Is the effort required to manage the tests increasing? Is it a problem today?
        B. No correlation was detected between the lines of code and the lines of technical documentation 
included in merged commits.
        C. The technical documentation graph does not fit any model described by a simple function.
            1. This might be a sign of the lack of a well-defined, known and followed process linking 
development and the creation of technical documentation.
    III. There are no trends or data that tell us anything significant about the growth of our user 
base.
        A. The number of bugs created is growing, yet bugs are closed at such a rate that the growth rate of 
the remaining open bugs (accumulated created minus closed) is small. This can be a sign that:
            1. The overall effort in bug fixing is outstanding.
            2. Most bugs are identified during the development/delivery process, which is robust.
            3. Bugs from users that land in BuildStream are already processed.
            4. There is a flaw in the process between an issue being opened and it being identified as a bug.
            5. There might be other reasons worth investigating.
        B. The conclusions described in (I), together with the fact that the numbers of issue authors and 
commit authors are very similar, suggest that the number of reporters is small compared to the number of 
developers (who are also reporters), which might be a sign that:
            1. BuildStream has a very limited number of users.
                a) Users report through contributors instead of directly, so any indirect evaluation has to 
be done through these contributors.
            2. Both of the above are true.
            3. BuildStream is disconnected from its user base.
    IV. When it comes to analysing issues in general and bugs in particular, the effect of the releases is 
clearly perceptible.
        A. BuildStream releases push the creation of issues and bugs.
            1. Is this because the tracked batches of work are smaller, because the amount of effort 
increases, or because there is a change in behaviour when it comes to tracking efforts as deadlines approach?
        B. BuildStream releases increase the rate of issue and bug resolution.
            1. Is this because contributors pay more attention to already open bugs, because the overall 
testing effort increases or is it a change in the behaviour when it comes to closing bugs?
        C. In the month after a release, the increase in the number of lines of tests is significant.
            1. There seems to be a link with the increase in bugs closed that might indicate a direct 
connection. It will require confirmation in the coming release.
    V. Several actions or elements have been identified as having an effect on the project's 
growth:
        A. Christmas vacations have a clear effect on the project: activity drops. This might be a sign 
of the nature of the contributors: professionals appointed by organizations to contribute vs. volunteers.
        B. Releases have been mentioned already.
        C. Answer required: What happened in September with the number of committers?
        D. Answer required: What happened in Jan’17, Jul’17 and Apr’18 that made commits grow so much?
        E. Answer required: What triggered the increase in lines of tests in Feb 2018? Was it the release?
        F. Answer required: What triggered the increase in lines of documentation in May 2018?
    VI. It seems that the questions asked led us to some meaningful metrics that provide information we 
can use to answer the driver question. Those metrics are:
        A. Issues (a minimal extraction sketch follows this list):
            1. Monthly number of issues created, closed and open.
            2. Monthly number of bugs created, closed and open.
            3. Unique issue authors.
            4. Accumulated data of the above metrics.
            5. Issues per milestone.
        B. Git commits (a similar git-log-based sketch follows the scripts link below):
            1. Number of commits, lines of code, comments and tests.
            2. Unique authors of commits.
            3. Accumulated data of the above metrics.
            4. Metrics related to files have led to interesting analyses but not to meaningful conclusions 
compared to the above ones. This does not mean they are useless, just that correlation with further metrics 
is probably required.
        C. There are some additional metrics that might be useful for digging into the driver question:
            1. Out of the issues, a subset comes from requests or requirements. It would be interesting to 
analyse these.
            2. Extending the study to participants (not just authors) in both commits and issues would provide 
additional information to answer the driver question.
            3. Extending the study related to commits to other repositories.
            4. Analysing the activity in the BuildStream community mailing list and the IRC channel following 
an approach similar to the one used to analyse the git repository and issue trackers.
        D. It would be interesting to dig into the effect that adding more full-time paid developers has on 
the project's growth vs. getting contributions from testers and users.
    VII. Based on this experience, once the study of the project's growth is consolidated, the following 
studies could be performed:
        A. Software production process metrics that can tell us how the software is produced and whether the 
overall process provides high levels of stability and throughput at once. This study would allow us to move 
faster towards continuous delivery/deployment. The initial metrics to evaluate could be:
            1. Code (master) throughput indicator.
            2. Integration stability indicator.
            3. Integration throughput indicator.
        B. Static code analysis that can provide information about the architecture, code style and other 
aspects. This study can provide information towards improving code readability and quality, among other 
characteristics.
        C. Characterisation of BuildStream contributors and users to better understand who is contributing to 
BuildStream, who is using it and who is serving as an interface between users and contributors. This analysis 
can lead us to implement better policies and take actions to increase the number of contributors and users 
as well as to reduce the gap between those two groups.
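
As an illustration of the issue metrics in (VI.A), here is a minimal sketch of how the monthly counts and 
unique issue authors could be extracted. It assumes the python-gitlab library and anonymous access to the 
public BuildStream/buildstream project; it is not the tooling used for the report:

```python
# Minimal sketch: monthly issue metrics via python-gitlab (pip install python-gitlab).
# The project path "BuildStream/buildstream" is an assumption; not the report's own scripts.
from collections import Counter, defaultdict

import gitlab

gl = gitlab.Gitlab("https://gitlab.com")  # anonymous read access to a public project
project = gl.projects.get("BuildStream/buildstream")

created = Counter()          # issues created per month
closed = Counter()           # issues closed per month
authors = defaultdict(set)   # unique issue authors per month

for issue in project.issues.list(state="all", all=True):
    month = issue.created_at[:7]   # ISO timestamps, e.g. "2018-11-30T..."
    created[month] += 1
    authors[month].add(issue.author["username"])
    if issue.closed_at:            # None while the issue is still open
        closed[issue.closed_at[:7]] += 1

for month in sorted(created):
    print(month, created[month], closed[month], len(authors[month]))
```

Accumulating the created minus closed counts over the months gives the "remaining open" series discussed 
in (III.A).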

I look forward to your comments.

Links to the report and additional information:
* Link to the full report (.pdf) including tables and graphs (download): 
https://gitlab.com/BuildStream/nosoftware/communication/raw/master/statistics-reports/buildstream_statistics_report_nov_2018/BuildStream_exploratory_data_analysis_report_nov_2018_full.pdf?inline=false
 
* Link to the report document only (.pdf), which includes links to the tables and graphs:   
https://gitlab.com/BuildStream/nosoftware/communication/blob/master/statistics-reports/buildstream_statistics_report_nov_2018/BuildStream_exploratory_data_analysis_report_nov_2018_document_only.pdf
* Link to the report folder: 
https://gitlab.com/BuildStream/nosoftware/communication/tree/master/statistics-reports/buildstream_statistics_report_nov_2018
* Link to the scripts used to extract the data from git: 
https://gitlab.com/BuildStream/nosoftware/communication/tree/master/statistics-tools
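
For the commit metrics in (VI.B), here is a minimal sketch of this kind of extraction, run inside a local 
clone of the repository. It is illustrative only, not the statistics-tools scripts linked above:

```python
# Minimal sketch: monthly commit counts and unique commit authors from a local clone.
# Illustrative only; not the statistics-tools scripts linked above.
import subprocess
from collections import Counter, defaultdict

# One "YYYY-MM|author-email" line per commit (needs git >= 2.6 for --date=format:).
log = subprocess.run(
    ["git", "log", "--format=%ad|%ae", "--date=format:%Y-%m"],
    capture_output=True, text=True, check=True,
).stdout

commits = Counter()          # commits per month
authors = defaultdict(set)   # unique author emails per month

for line in log.splitlines():
    month, email = line.split("|", 1)
    commits[month] += 1
    authors[month].add(email)

for month in sorted(commits):
    print(month, commits[month], len(authors[month]))
```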

I will write a blog post about this effort in a few days, so I can include references to the discussion 
derived from it.

A few days ago I posted a blog post summarising the information included in the report about how it was done:

https://toscalix.com/2018/12/17/buildstream-metrics-exploration/

 
Best Regards


-- 
Agustín Benito Bethencourt
Principal Consultant
Codethink Ltd
We respect your privacy.   See https://www.codethink.co.uk/privacy.html

