Re: Automated benchmarks of BuildStream



I made some comments on:

    https://gitlab.com/BuildStream/benchmarks/merge_requests/7

But that is really the wrong place, and to be clear: I don't want my
discussion there to be perceived as blocking the landing of that patch.

That said, I think this is important, and I'd like to restate that
comment in this thread a bit more clearly.

While I'm very happy with your response in general, I would really like
to see rendered output as a first-class citizen of this repository and
not just an exercise left to the user.

I'll just reply to some of the things from your mail inline, too:

On Mon, 2018-03-26 at 12:05 +0100, Jim MacArthur wrote:
> [...]
> We'll break down the results in terms of time per record and have some
> means of observing linearity. Some people, however, will only be
> concerned with how long it takes to build one project, e.g. freedesktop-sdk.

This is definitely interesting as a macro perspective on performance;
I'll just state the obvious and mention that this kind of benchmark
should use a frozen version of freedesktop-sdk as the sample.
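
To illustrate, something as simple as this on the benchmarks side would
be enough (the ref name is a placeholder and the helper is hypothetical,
I have no opinion yet on how the sample definitions should actually
look):

    import subprocess

    # Benchmark sample pinned to an exact, never-moving ref, so that the
    # macro benchmark measures BuildStream rather than a moving target.
    SAMPLE_REPO = "https://gitlab.com/freedesktop-sdk/freedesktop-sdk.git"
    SAMPLE_REF = "some-frozen-tag-or-sha"  # placeholder for whatever we agree on

    def checkout_frozen_sample(workdir):
        # Clone the sample project and hard-pin it to the frozen ref.
        subprocess.run(["git", "clone", SAMPLE_REPO, workdir], check=True)
        subprocess.run(["git", "-C", workdir, "checkout", SAMPLE_REF], check=True)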

> [...]
> I don't think output presentation is being considered at the moment; we
> intend to produce data in a convenient manner and you can display that
> with Excel, R or gnuplot as you see fit. There will be some graphs
> produced as part of the CI process, but I consider those to be an
> example of what can be done with the benchmarks tool, rather than a
> standard.

> [...]
> In my experience almost all performance analysis
> requires custom benchmarks - even on previous projects when we've had
> 1/3rd of the team dedicated to benchmarking and about 10 bare-metal
> servers per engineer running tests continuously, it was rare that the CI
> system would tell us anything useful for analysis. In most cases we'd
> have to alter the code under test to add extra instrumentation. So I
> wouldn't expect the benchmarks run as continuous integration to identify
> bottlenecks or point at ways to improve things; they should be there as
> a guard against unexpected performance changes.

Ok, so the more I think about the perspective shown in the above two
points, the more I perceive it to be a problem.

One problem is that if the benchmarks are mostly data, and graphs are
just an exercise left to the user, then the distance between running
benchmarks and viewing the results is just too far - in other words,
nobody is going to notice performance differences in the various
previously analyzed places unless they go ahead and write the code to
plot the graphs themselves, which just won't happen until a problem is
already observed and being investigated.
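
To put the above in perspective: assuming the tool dumps its results as
JSON (I'm guessing at the shape of that data here, the keys below are
hypothetical), going from data to a rendered graph as part of every run
is only a handful of lines, so making rendered output a first-class
citizen is really not asking for much:

    import json
    import matplotlib
    matplotlib.use("Agg")            # render to files, no display needed in CI
    import matplotlib.pyplot as plt

    def render(results_path, output_path):
        # Assumed shape: {"test name": [{"version": "1.0.1", "seconds": 2.3}, ...]}
        with open(results_path) as f:
            results = json.load(f)
        for test_name, samples in results.items():
            versions = [s["version"] for s in samples]
            seconds = [s["seconds"] for s in samples]
            plt.plot(versions, seconds, marker="o", label=test_name)
        plt.xlabel("BuildStream version")
        plt.ylabel("seconds")
        plt.legend()
        plt.savefig(output_path)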

Another problem is that we would be writing this with the expectation
of doing "throw away" work - which I don't like.

Where you say:

  "In most cases we'd have to alter the code under test to add extra 
   instrumentation."

I fully agree; what I'd like to see here, though, is:

  o A strategy for upstreaming the new log messages which support a new
    analysis, with the changes landing in both BuildStream and the
    benchmarks repo.

  o The benchmark plotting should just ignore versions of BuildStream
    which do not yet have support for a given measurement.

  o The ability to fine-tune which kinds of messages BuildStream will
    emit, such that one can run benchmarks against BuildStream with
    only certain messages turned on (in case benchmarking some micro
    activities slows down the whole process significantly and ends up
    skewing other results). A rough sketch of these last two points
    follows the list.
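
Something like the following is what I have in mind for those last two
points (the version table and the BST_BENCHMARK_MESSAGES variable are
entirely made up, this is only a sketch of the policy, not a proposal
for an actual interface):

    import os

    # Hypothetical table of the BuildStream version in which each measured
    # log message first appeared; older versions are skipped by the
    # plotting side instead of producing holes or errors in the graphs.
    MEASUREMENT_SINCE = {
        "cache-lookup-time": (1, 2),
        "integration-command-time": (1, 4),
    }

    def supported(measurement, bst_version):
        return bst_version >= MEASUREMENT_SINCE.get(measurement, (0, 0))

    def benchmark_env(enabled_measurements):
        # Only turn on the message categories this run actually needs, so
        # that chatty micro measurements do not skew the other results.
        env = dict(os.environ)
        env["BST_BENCHMARK_MESSAGES"] = ",".join(enabled_measurements)
        return env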

While the above is not going to be possible for 100% of analysis, I
expect that it will come very, very close (otherwise, we are straying
from benchmarking territory, and moving into profiling territory).

It would be a shame, I think, if one were to do an analysis like:

  "The time it takes to run integration commands in compose elements
   in the case where a file is moved into a new directory as a result
   of the integration command - benchmarking whether the time grows
   exponentially with the number of new directories created"

... and the result of that analysis were never integrated into a
"benchmark suite".

Without first thinking about a policy for extending the suite, the
analysis in this scenario would be done only once, and after that point
the exercise is simply lost - that would really be a lack of foresight.

Rather, once we have done this analysis the first time, we should have
a policy in place for how to add it to the suite, such that we continue
to see these results in rendered graphs every time the full benchmarks
are run, and keep seeing them in 3, 5, or 10 years' time.
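
Concretely, the sort of policy I'm imagining is that landing such an
analysis also means landing a small declarative entry in the suite,
roughly along these lines (all of the field names are invented purely
for the sake of illustration):

    # Hypothetical suite entry, added once and then rendered on every full run.
    COMPOSE_INTEGRATION_BENCHMARK = {
        "name": "compose-integration-new-directories",
        "description": "Time to run integration commands in compose elements "
                       "as a function of the number of new directories created",
        "measurements": ["integration-command-time"],
        "since-version": (1, 4),   # skip BuildStream versions older than this
        "plot": {
            "x": "number of new directories",
            "y": "seconds",
        },
    }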

Does this make sense?

Cheers,
    -Tristan


