Re: Automated benchmarks of BuildStream



On Mon, 2018-04-02 at 16:39 +0100, Jim MacArthur wrote:
[...]
> > One problem is, if the benchmarks are mostly data, and graphs are just
> > an exercise left to the user, then the distance between running
> > benchmarks and viewing the results is just too far - in other words,
> > nobody is going to notice performance differences in various previously
> > analyzed places - unless they go ahead and write the code to plot the
> > graph themselves - which just won't happen until a problem is observed
> > and investigated.
> > 
> > Another problem with this is that we are writing this with the
> > expectation of doing "throw away" work - which I don't like.

> I fully intend to produce some graphs automatically. However, I've never
> considered graphs anything other than indicative. In many ways they're
> the opposite of automation - taking data which can be used to
> automatically flag performance reductions and turning them into something
> only human-readable.

I don't put much stock in automated flagging and reporting of
performance changes; if that was ever a goal for this project, I was
never made aware of it.

Performance degradation in one specific area can often be justified by
a performance gain in a different area. I'm skeptical that having an
automated system tell me that some performance aspect has degraded
will be more helpful than annoying, except perhaps for a few very
global metrics (like the time it takes to "build" a very specific
sample project).

Further, we're going to have to run these in an extremely controlled
environment for any automated system to report anything useful (gitlab
is almost certainly out of the question here).

If you want to work on automated reporting and flagging of potential
issues, that sounds like an interesting experiment, but producing
human-readable graphs is an area where I am certain we can add real
value.

> We can add any type of graph later if anyone asks for one, and in the
> meantime we can produce the types which have already been suggested. The
> work to create a graph from a CSV table is about 30 seconds - and if
> anyone is doing that regularly, I'm happy to automate it.

First, I certainly don't share your optimism that the work involved in
generating a graph from a CSV table is around 30 seconds for a human:
you have to assume that developers who want to see the output have
never looked at the benchmarks repo code base at all, nor do they have
any desire to unless they are doing new investigative work.

I would expect a minimum ramp-up of 30 minutes for a newcomer to the
project just to understand how things work, and then 30 seconds to
generate the graph they want.
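
To be concrete about what that 30 second step looks like once someone
already knows where everything is, it is roughly the sketch below - with
the caveat that "results.csv" and its column names are hypothetical
placeholders here, not necessarily what the benchmarks actually emit:

    # Rough sketch: plot one metric from a benchmarks CSV.
    # "results.csv" and its column names are hypothetical placeholders.
    import csv

    import matplotlib.pyplot as plt

    versions = []
    build_times = []
    with open("results.csv", newline="") as f:
        for row in csv.DictReader(f):
            versions.append(row["version"])
            build_times.append(float(row["build_time"]))

    plt.plot(versions, build_times, marker="o")
    plt.xlabel("BuildStream version")
    plt.ylabel("Build time (s)")
    plt.title("Build time for the sample project")
    plt.savefig("build-time.png")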

Secondly, the reason to generate human-readable output should not be
driven by the idea that people are currently doing it repetitively; if
we don't have the results in human-readable form, then people simply
will not view them, and nobody will bother except for a very select
few.

We should instead be focusing on a policy that:

  o If someone needed to do investigative work once

    o Which might involve adding new metrics
    o Or might involve using existing metrics to produce a new view

  o Then we should have a procedure for that work to be preserved,
    such that anyone can later look at and compare that new metric/view
    in the future, without ever needing to understand the benchmarks
    repo in any real detail, beyond how to run the benchmarks or view
    the results on a web page (a sketch of what I mean follows this
    list).
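
To make the idea of a preserved "view" a little more concrete, here is
a purely hypothetical sketch of the kind of declarative description an
investigation could leave behind, which the graph generation step would
then pick up and re-render on every run (none of these names exist in
the benchmarks repo today):

    # Purely hypothetical sketch of a preserved "view" definition; a graph
    # generation step could re-render every such view on every benchmark
    # run, so that one-off investigative work is not thrown away.
    FILTER_ELEMENT_VIEWS = [
        {
            "title": "bst show time with many filter elements",
            "metric": "show-time",        # metric name as recorded in the results
            "variants": ["filter-100"],   # benchmark variants to plot
            "output": "filter/show-time.png",
        },
    ]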


> I also think we should add new *metrics* anyone can create (as you
> describe later), but I'd like to be more selective about graphs. 1-5
> graphs are useful, 100 graphs aren't.

Again, I disagree.

100 graphs can be useful if sorted correctly; if I want to see a page
showing me performance aspects strictly related to `filter` elements or
`compose` elements, these should be easily located.

I do agree however that a directory with 100 differently named PNG
files is not very useful; a minimal HTML document produced as part of
the generation process might do well to help us sort things into
something more humanly consumable as the number of outputs increases
over time.
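
As a sketch of what I have in mind there - the "graphs" directory and
the filename convention are made up for the sake of the example -
something as simple as this would probably be enough:

    # Sketch: write a minimal index.html grouping the generated PNGs by a
    # prefix in their filename (e.g. "filter-...", "compose-...").
    import html
    import os
    from collections import defaultdict

    groups = defaultdict(list)
    for name in sorted(os.listdir("graphs")):
        if name.endswith(".png"):
            groups[name.split("-", 1)[0]].append(name)

    with open("graphs/index.html", "w") as out:
        out.write("<html><body>\n")
        for prefix, names in sorted(groups.items()):
            out.write("<h2>{}</h2>\n".format(html.escape(prefix)))
            for name in names:
                out.write('<p><img src="{0}" alt="{0}"></p>\n'.format(html.escape(name)))
        out.write("</body></html>\n")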


In any case, it looks like you are focused on one thing while I am
focused on another, but the two are not mutually exclusive.

To clarify, I am not asking for a lot of graphs to be produced on day
one; I am only asking for the initial work to consider a policy for
extensibility and preservation of future work, such that we can always
see the human-readable output of an exercise which has already been
done once and was interesting to someone.

Cheers,
    -Tristan


