Re: Automated benchmarks of BuildStream
- From: Jim MacArthur <jim macarthur codethink co uk>
- To: Tristan Van Berkom <tristan vanberkom codethink co uk>, buildstream-list gnome org
- Subject: Re: Automated benchmarks of BuildStream
- Date: Mon, 02 Apr 2018 16:39:22 +0100
On 2018-03-31 09:20, Tristan Van Berkom wrote:
> I made some comments on:
>     https://gitlab.com/BuildStream/benchmarks/merge_requests/7
>
> But that is really the wrong place, and to be clear: I don't want my
> discussion there to be perceived as blocking the landing of that patch.
> That said, I think this is important and I'd like to reiterate on that
> comment in this thread a bit more clearly.
>
> While I'm very happy with your response in general, I would really like
> to see rendered output as a first class citizen of this repository and
> not just an exercise left to the user.
>
> I'll just reply to some of the things from your mail inline, too:
> On Mon, 2018-03-26 at 12:05 +0100, Jim MacArthur wrote:
> [...]
> > I don't think output presentation is being considered at the moment;
> > we intend to produce data in a convenient manner and you can display
> > that with Excel, R or gnuplot as you see fit. There will be some
> > graphs produced as part of the CI process, but I consider those to be
> > an example of what can be done with the benchmarks tool, rather than
> > a standard.
> [...]
> > In my experience almost all performance analysis requires custom
> > benchmarks - even on previous projects when we've had 1/3rd of the
> > team dedicated to benchmarking and about 10 bare-metal servers per
> > engineer running tests continuously, it was rare that the CI system
> > would tell us anything useful for analysis. In most cases we'd have
> > to alter the code under test to add extra instrumentation. So I
> > wouldn't expect the benchmarks run as continuous integration to
> > identify bottlenecks or point at ways to improve things; they should
> > be there as a guard against unexpected performance changes.
> Ok so the more I think of the perspective shown in the above two
> points, the more I perceive this to be a problem.
>
> One problem is, if the benchmarks are mostly data, and graphs are just
> an exercise left to the user, then the distance between running
> benchmarks and viewing the results is just too far - in other words,
> nobody is going to notice performance differences in various previously
> analyzed places - unless they go ahead and write the code to plot the
> graph themselves - which just won't happen until a problem is observed
> and investigated.
>
> Another problem with this is that we are writing this with the
> expectation of doing "throw away" work - which I don't like.
I fully intend to produce some graphs automatically. However, I've never
considered graphs anything other than indicative. In many ways they're
the opposite of automation - taking data which can be used to
automatically flag performance reductions and turning it into something
only human-readable.
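
To illustrate the kind of automatic flagging I mean, here is a rough
sketch in Python; the CSV layout and the 5% threshold are just
assumptions for illustration, not anything the benchmarks repo defines
today:

    # Hypothetical sketch: flag regressions by comparing the latest run
    # against a baseline CSV. Column names and the 5% tolerance are
    # illustrative only.
    import csv

    def load_timings(path):
        """Map test name -> wall-clock seconds from a results CSV."""
        with open(path, newline="") as f:
            return {row["test"]: float(row["seconds"])
                    for row in csv.DictReader(f)}

    def find_regressions(baseline_csv, current_csv, tolerance=0.05):
        baseline = load_timings(baseline_csv)
        current = load_timings(current_csv)
        for test, seconds in current.items():
            old = baseline.get(test)
            if old is not None and seconds > old * (1 + tolerance):
                yield test, old, seconds

    if __name__ == "__main__":
        for test, old, new in find_regressions("baseline.csv", "latest.csv"):
            print(f"REGRESSION: {test}: {old:.2f}s -> {new:.2f}s")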
We can add any type of graph later if anyone asks for one, and in the
meantime we can produce the types which have already been suggested. The
work to create a graph from a CSV table is about 30 seconds - and if
anyone is doing that regularly, I'm happy to automate it. I also think
we should accept new *metrics* that anyone can create (as you describe
later), but I'd like to be more selective about graphs: 1-5 graphs are
useful, 100 graphs aren't.
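
For example, the sort of 30-second job I have in mind might look
roughly like this with matplotlib; the file name and column names below
are placeholders, not the benchmarks tool's actual output format:

    # Hypothetical quick plot of a benchmark results CSV.
    # "show-times.csv" with columns "version" and "seconds" is assumed.
    import csv
    import matplotlib.pyplot as plt

    versions, seconds = [], []
    with open("show-times.csv", newline="") as f:
        for row in csv.DictReader(f):
            versions.append(row["version"])
            seconds.append(float(row["seconds"]))

    plt.plot(versions, seconds, marker="o")
    plt.xlabel("BuildStream version")
    plt.ylabel("wall-clock time (seconds)")
    plt.title("Example: benchmark time per version")
    plt.savefig("show-times.png")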
> Where you say:
>
>   "In most cases we'd have to alter the code under test to add extra
>    instrumentation."
>
> I fully agree, what I'd like to see here though is:
>
>   o A strategy for upstreaming of new log messages to support new
>     analysis being landed in both BuildStream and the benchmarks repo.
>
>   o The benchmark plotting should just ignore versions of BuildStream
>     which do not yet have support for a given measurement.
>
>   o Ability to fine-tune which kinds of messages BuildStream will be
>     emitting, such that one can run benchmarks against BuildStream
>     with only certain messages turned on (in the case that some
>     benchmarking of micro activities slows down the whole process
>     significantly and results in skewing of other results).
>
> While the above is not going to be possible for 100% of analysis, I
> expect that it will come very, very close (otherwise, we are straying
> from benchmarking territory, and moving into profiling territory).
>
> It would be a shame, I think, if one does the analysis of:
>
>   "The time it takes to run integration commands in compose elements
>    in the case where a file is moved into a new directory as a result
>    of the integration command - benchmarking whether time is
>    exponential depending on the number of new directories created"
>
> ... and the result of this is not integrated into a "benchmark suite".
>
> Without thinking first about policy for extending the suite, the
> analysis in this scenario would be done only once; and after that point
> the exercise is just lost - this would really be lacking foresight.
>
> Rather, once we do the analysis of this the first time, we should have
> policy in place for how to add this to a suite, such that we continue
> to see these results in rendered graphs every time the full benchmarks
> are run, and keep seeing these results in 3, 5, or 10 years' time.
>
> Does this make sense?
Yes. As above, adding new metrics to the benchmark suite should be an
easy process. Adding more log entries or more instrumentation to
BuildStream will have to be done with some care to avoid causing
performance problems and code bloat. We'll probably want either a more
finely-grained log level or some toggles which can show or hide
different output.
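
As a very rough sketch of the toggle idea, something along these lines
could work with Python's standard logging module; note that the topic
names below are invented for illustration and don't correspond to
BuildStream's real message categories:

    # Sketch of per-topic message toggles using the logging module.
    # The topics ("scheduler", "cache", "integration") are hypothetical.
    import logging

    TOPICS = ("scheduler", "cache", "integration")

    def configure_benchmark_logging(enabled_topics):
        """Enable detailed output only for the topics a benchmark needs."""
        logging.basicConfig(level=logging.INFO)
        for topic in TOPICS:
            logger = logging.getLogger(f"buildstream.{topic}")
            logger.setLevel(logging.DEBUG if topic in enabled_topics
                            else logging.WARNING)

    # A benchmark that only measures integration commands could then run with:
    configure_benchmark_logging({"integration"})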
As for the benchmarks repository, I think we can put any number of tests
or analysis scripts into the repository as long as they're curated well.
We'll need some mechanism to prioritise tests, as the time available to
run them will sometimes be limited. If we keep all the metrics and analysis
we've ever done to avoid duplicating work, then we will need to
periodically review them and archive or adjust priorities of older
tests. Perhaps we'll do this every time BuildStream is released.
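
Purely as a sketch of how that prioritisation might look (this schema is
invented for illustration, not the benchmarks repo's actual
configuration format):

    # Hypothetical priority metadata for curating the suite; the real
    # benchmarks configuration may look nothing like this.
    SUITE = [
        {"name": "bst-show-large-project", "priority": 1},    # every pipeline
        {"name": "compose-integration-dirs", "priority": 2},  # nightly
        {"name": "artifact-cache-churn", "priority": 3},      # before releases
    ]

    def select_benchmarks(max_priority):
        """Pick the benchmarks that fit the time budget for this run."""
        return [b["name"] for b in SUITE if b["priority"] <= max_priority]

    print(select_benchmarks(max_priority=2))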
> Cheers,
>     -Tristan