Re: Benchmarking BuildStream commands from user point of view



Great!

On Thu, 2017-11-02 at 16:50 +0000, Angelos Evripiotis wrote:
Here's an early look at how I think the benchmarking can work; I'm looking
for feedback to steer this work in progress.

I've put together a Python script here:
https://gitlab.com/BuildStream/buildstream/merge_requests/136

It outputs delimiter-separated values for easy handling on the command line.
It's not CSV just now, as the number of fields is variable.

Here you can see that compared to starting Python, BuildStream has a lot of
work to do, running time in seconds:

    python,0.03
    python_import_buildstream,0.57
    help,0.69 ('bst help')

Personally I find this start-up pause to be quite noticeable when working
with 'bst'.

Ok so we're off to a good start.

I haven't looked at the script, but I can say that we want to shoot for
maintainability and longevity here; in other projects it's taken me a
long time to get benchmarks off the ground again after they have
bit-rotted for a year or so.

I *think* that it probably makes sense to maintain this as a separate
git repository beside buildstream in the buildstream 'group' - because
one thing we'll certainly want to do is run the benchmarks against
multiple versions of buildstream and plot the results together so that
we can easily observe performance improvements, regressions and
tradeoffs.
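
For example, a driver in that separate repository could loop over a few
BuildStream installs and tag every result row with the version, so the
results can be plotted together later. This is just a rough sketch - the
virtualenv paths are made up and only 'bst help' is timed:

    # sketch: run the same benchmark against several BuildStream installs
    # (the venv paths below are hypothetical) and tag each row with the version.
    import subprocess
    import time

    INSTALLS = {
        "1.0": "/opt/bst-venvs/1.0/bin/bst",
        "master": "/opt/bst-venvs/master/bin/bst",
    }

    for version, bst in INSTALLS.items():
        start = time.perf_counter()
        subprocess.run([bst, "help"], stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        elapsed = time.perf_counter() - start
        print("{},help,{:.2f}".format(version, elapsed))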

While this is not extremely helpful, I feel like I should point to the
benchmarks I worked on back with Openismus; the code itself is not useful
for this, but the same concepts apply:

git repo: https://gitorious.org/openismus-playground/phonebook-benchmarks
sample blog post: https://blogs.gnome.org/tvb/2013/12/03/smart-fast-addressbooks/

So the first thing I notice in your graph is that you are measuring time
per session, not time per record - if, for example, we are benchmarking
BuildStream's parse and initial load performance (it's a great place to
start!), we should be measuring the time *per element* (a rough sketch
follows the list below); this way we can easily observe:

  o The actual time it takes to load a single element
  o Whether the algorithm is linear or not; ideally we want it to take
    approximately the same time to load a single element in a 1 element
    pipeline as it takes to load a single element in a 100,000 element
    pipeline.
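
As a rough illustration of what I mean (the numbers below are entirely
made up), dividing the session load time by the element count gives the
per-element figure we want to watch; ideally that column stays roughly
flat as the pipeline grows:

    # sketch: normalise session load time by element count so we can see
    # whether per-element load time stays flat as the pipeline grows.
    sessions = [
        # (element_count, load_time_seconds) - made-up numbers
        (1, 0.7),
        (100, 1.4),
        (100000, 85.0),
    ]

    for count, seconds in sessions:
        print("elements={:>6}  per-element={:.5f}s".format(count, seconds / count))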

Also, I notice that your graph seems exponential at first and then
becomes linear; this is because we are counting the time to load the
BuildStream program itself into the Python interpreter (I think you
mentioned this already).

To fix this, I would be willing to expose some internal API entry
points in the BuildStream core, allowing us to measure things more
precisely, i.e. in this case it will be useful for a benchmarking
program to know:

  o When buildstream starts to actually run
  o When buildstream has internally completed the load of the pipeline

If you can come up with something simple enough - perhaps some kind of
handle that buildstream can call into to mark significant events, which
your benchmarking program can then observe - this will get us on the
right track, I think.
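
Something along these lines would probably be enough - to be clear, none
of this exists in BuildStream today, the function name and the
environment variable are only illustrative:

    # sketch of a hypothetical marker hook inside the BuildStream core:
    # when BST_BENCHMARK_LOG is set, append "<event>,<timestamp>" lines
    # that an external benchmarking program can read back afterwards.
    import os
    import time

    def mark_event(name):
        log = os.environ.get("BST_BENCHMARK_LOG")
        if not log:
            return
        with open(log, "a") as f:
            f.write("{},{:.6f}\n".format(name, time.monotonic()))

    # the core would then call, for example:
    #   mark_event("startup")
    #   mark_event("pipeline-loaded")

The benchmarking program would then just set the variable, run 'bst',
and diff the recorded timestamps.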

Also, this gets a bit more complex with Source plugins which exercise
host tooling - it should be noted that if we run the benchmarks in 2017
and then a year later on a different host, the performance may improve
or regress depending on developments which have occurred in third party
tooling. I don't think it's important to set up an identical environment
for benchmarking though; just that we run the full benchmarking suite
against every interesting version of BuildStream on the same host setup,
and keep in mind that changes in external tooling might affect performance.

For load time performance specifically, it is interesting to exercise
the benchmarks with different initial element states and source
consistency states (a quick sketch follows the list below), i.e.:

  o Is the element cached, or must it be built? This will affect
    codepaths determining such at load time
  o Is the source code available, or must it be fetched? This
    will also affect initial interrogation of element state
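
To make that concrete, the benchmark could enumerate those combinations
up front and run the load measurement once per scenario; this is purely
illustrative, the state names are not anything the script defines:

    # sketch: enumerate the element/source state combinations that the
    # load-time benchmark should cover.
    from itertools import product

    artifact_states = ["cached", "needs-build"]
    source_states = ["fetched", "needs-fetch"]

    for artifact, source in product(artifact_states, source_states):
        print("scenario: artifact={}, source={}".format(artifact, source))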

I have noticed a regression in load performance in recent months; it
seems to happen specifically when running a command on a pipeline where
nothing is locally cached and sources need to be downloaded. To be fair,
after diagnosing it, it was not a load time regression - rather, in that
case cache keys seem to be redundantly re-calculated *after* the load of
the whole pipeline completes, while displaying the summary (e.g. when
doing `bst show`).

Looks like a fun project, I'm excited about this :)

Cheers,
    -Tristan
