Re: Automated benchmarks of BuildStream





On 2018-03-31 09:20, Tristan Van Berkom wrote:
I made some comments on:

    https://gitlab.com/BuildStream/benchmarks/merge_requests/7

But that is really the wrong place, and to be clear: I don't want my
discussion there to be perceived as blocking the landing of that patch.

That said, I think this is important, and I'd like to reiterate that
comment in this thread a bit more clearly.

While I'm very happy with your response in general, I would really like
to see rendered output be a first-class citizen of this repository and
not just an exercise left to the user.

I'll just reply to some of the things from your mail inline, too:

On Mon, 2018-03-26 at 12:05 +0100, Jim MacArthur wrote:
[...]
I don't think output presentation is being considered at the moment; we
intend to produce data in a convenient manner and you can display that
with Excel, R or gnuplot as you see fit. There will be some graphs
produced as part of the CI process, but I consider those to be an
example of what can be done with the benchmarks tool, rather than a
standard.

[...]
In my experience almost all performance analysis requires custom
benchmarks - even on previous projects, when we had 1/3rd of the team
dedicated to benchmarking and about 10 bare-metal servers per engineer
running tests continuously, it was rare that the CI system would tell
us anything useful for analysis. In most cases we'd have to alter the
code under test to add extra instrumentation. So I wouldn't expect the
benchmarks run as continuous integration to identify bottlenecks or
point at ways to improve things; they should be there as a guard
against unexpected performance changes.

OK, so the more I think about the perspective shown in the above two
points, the more I perceive it as a problem.

One problem is that if the benchmarks are mostly data, and graphs are
just an exercise left to the user, then the distance between running
benchmarks and viewing the results is just too great - in other words,
nobody is going to notice performance differences in the various
previously analyzed places unless they go ahead and write the code to
plot the graph themselves - which just won't happen until a problem is
observed and investigated.

Another problem with this is that we are writing this with the
expectation of doing "throwaway" work - which I don't like.

I fully intend to produce some graphs automatically. However, I've never considered graphs anything other than indicative. In many ways they're the opposite of automation - taking data which can be used to automatically flag performance regressions and turning it into something only human-readable.
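
To make the automated side concrete, a comparison against a stored baseline is roughly what I have in mind - something like the sketch below. The file paths, the column names ("test", "median_seconds") and the 10% threshold are only illustrative, not the actual output format of the benchmarks tool:

    # Hypothetical sketch: flag regressions by comparing a current results CSV
    # against a stored baseline CSV. Paths, column names and the threshold are
    # illustrative only, not the real benchmarks output format.
    import csv
    import sys

    THRESHOLD = 1.10  # flag anything more than 10% slower than the baseline

    def load(path):
        with open(path, newline="") as f:
            return {row["test"]: float(row["median_seconds"])
                    for row in csv.DictReader(f)}

    def main(baseline_path, current_path):
        baseline = load(baseline_path)
        current = load(current_path)
        regressions = [(name, baseline[name], value)
                       for name, value in current.items()
                       if name in baseline and value > baseline[name] * THRESHOLD]
        for name, old, new in regressions:
            print(f"REGRESSION {name}: {old:.2f}s -> {new:.2f}s")
        return 1 if regressions else 0

    if __name__ == "__main__":
        sys.exit(main(sys.argv[1], sys.argv[2]))

Run as, say, "python flag_regressions.py baseline.csv current.csv"; a non-zero exit status could then fail a CI job without any human looking at a graph.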

We can add any type of graph later if anyone asks for one, and in the meantime we can produce the types which have already been suggested. The work to create a graph from a CSV table is about 30 seconds - and if anyone is doing that regularly, I'm happy to automate it. I also think anyone should be able to add new *metrics* (as you describe later), but I'd like to be more selective about graphs: 1-5 graphs are useful, 100 graphs aren't.
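
For illustration, the "30 seconds" looks roughly like this with matplotlib, assuming a results CSV with "commit" and "seconds" columns (that column layout is an assumption on my part, not the real schema):

    # Minimal sketch: plot a results CSV with matplotlib. The "commit" and
    # "seconds" column names are assumptions, not the real output schema.
    import csv
    import matplotlib.pyplot as plt

    with open("results.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    commits = [row["commit"] for row in rows]
    seconds = [float(row["seconds"]) for row in rows]

    plt.plot(commits, seconds, marker="o")
    plt.xlabel("BuildStream commit")
    plt.ylabel("wall-clock time (s)")
    plt.xticks(rotation=45, ha="right")
    plt.tight_layout()
    plt.savefig("benchmark.png")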


Where you say:

  "In most cases we'd have to alter the code under test to add extra 
   instrumentation."

I fully agree; what I'd like to see here, though, is:

  o A strategy for upstreaming new log messages to support new
    analysis, landed in both BuildStream and the benchmarks repo.

  o The benchmark plotting should just ignore versions of
    BuildStream which do not yet have support for a given measurement.

  o The ability to fine-tune which kinds of messages BuildStream will
    emit, such that one can run benchmarks against BuildStream with
    only certain messages turned on (in case benchmarking of some
    micro activities slows the whole process down significantly and
    skews other results).

While the above is not going to be possible for 100% of analysis, I
expect that it will come very, very close (otherwise, we are straying
from benchmarking territory, and moving into profiling territory).

It would be a shame, I think, if one were to do the analysis of:

  "The time it takes to run integration commands in compose elements
   in the case where a file is moved into a new directory as a result
   of the integration command - benchmarking whether time is
   exponential depending on the number of new directories created"

... and the result of that analysis were not integrated into a "benchmark suite".

Without first thinking about a policy for extending the suite, the
analysis in this scenario would be done only once, and after that point
the exercise is simply lost - that would really be lacking in foresight.

Rather, once we have done this analysis the first time, we should have
a policy in place for how to add it to the suite, such that we continue
to see these results in rendered graphs every time the full benchmarks
are run, and keep seeing them in 3, 5, or 10 years' time.

Does this make sense?

Yes. As above, adding new metrics to the benchmark suite should be an easy process. Adding more log entries or more instrumentation to BuildStream will have to be done with some care to avoid causing performance problems and code bloat. We'll probably want either a finer-grained log level or some toggles which can show or hide different kinds of output.
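
As a rough illustration of the toggle idea - none of these names exist in BuildStream today, it's only a sketch of the shape - instrumentation could be grouped by category and stay silent unless that category is being measured:

    # Hypothetical sketch only - none of these names exist in BuildStream.
    # The idea: timing messages are grouped by category, and a category that
    # is not being measured emits nothing and costs almost nothing.
    import time
    from contextlib import contextmanager

    ENABLED_CATEGORIES = {"integration-commands"}  # e.g. from a CLI option

    @contextmanager
    def timed(category, message):
        """Emit a timing message only if its category is enabled."""
        if category not in ENABLED_CATEGORIES:
            yield
            return
        start = time.monotonic()
        try:
            yield
        finally:
            print(f"[{category}] {message}: {time.monotonic() - start:.3f}s")

    # Usage at an instrumentation point:
    with timed("integration-commands", "ldconfig in compose element"):
        pass  # ... run the integration command ...

That keeps heavy instrumentation from skewing unrelated measurements, which is the concern you raise in your third point above.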

As for the benchmarks repository, I think we can put any number of tests or analysis scripts into the repository as long as they're curated well. We'll need some mechanism to prioritise tests, as the time available to run them will sometimes be limited. If we keep all the metrics and analysis we've ever done to avoid duplicating work, then we will need to review them periodically and archive older tests or adjust their priorities. Perhaps we'll do this every time BuildStream is released.
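
A minimal sketch of what prioritisation under a time budget could look like - the field names, test names and durations here are made up for illustration:

    # Hypothetical sketch of prioritising tests under a time budget; the
    # field names, test names and durations are illustrative only.
    TESTS = [
        {"name": "startup-time", "priority": 1, "estimated_seconds": 60},
        {"name": "build-alpine", "priority": 2, "estimated_seconds": 1800},
        {"name": "compose-integration-scaling", "priority": 3, "estimated_seconds": 3600},
    ]

    def select_tests(tests, budget_seconds):
        """Pick the highest-priority tests that fit in the time available."""
        selected, remaining = [], budget_seconds
        for test in sorted(tests, key=lambda t: t["priority"]):
            if test["estimated_seconds"] <= remaining:
                selected.append(test["name"])
                remaining -= test["estimated_seconds"]
        return selected

    print(select_tests(TESTS, budget_seconds=2000))
    # -> ['startup-time', 'build-alpine']

The periodic review you mention could then be as simple as adjusting priorities rather than deleting tests outright.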



Cheers,
    -Tristan

