Re: Automated benchmarks of BuildStream



On 27/02/18 18:15, Sam Thursfield wrote:

We are keen to make BuildStream as fast as possible. An important
prerequisite for that is having a way of measuring how well it performs
in the important use cases.

This has been discussed since November last year[1] and we now have
some initial benchmarks that run nightly. You can view the benchmark
pipelines here:

    https://gitlab.com/BuildStream/benchmarks/pipelines

To view the results of the latest benchmark run, click on the menu
button on the far right next to a job (the one with a cloud and
downwards-arrow icon) and select 'Download benchmarks artifacts'. This
gets you a .zip file containing a `results.json` file, which holds the
raw benchmarking results.

Dominic Brown is currently working on post-processing the results so
that you get a nicely rendered table instead of an unordered JSON dump
of the data. For now, you just have to read the JSON but it's pretty
simple:

 * the 'versions' section lists the different versions of BuildStream
   that were tested against each other.

 * each block in the 'tests' section has "name" and "results" fields;
   the "results" section has a block per version that was tested, and
   within each of those blocks you see the actual measurements from
   each test run.
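
Putting those together, the overall shape of a results.json is roughly
this (the values here are made up and the internals of the 'versions'
blocks are omitted for brevity):

    {
        "versions": [ ... one block per BuildStream version under test ... ],
        "tests": [
            {
                "name": "Startup time",
                "results": [
                    {
                        "version": "master",
                        "repeats": 3,
                        "measurements": [
                            {"total-time": 0.27, "max-rss-kb": 21496},
                            ...
                        ]
                    },
                    ...
                ]
            },
            ...
        ]
    }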

The tests run on a dedicated laptop here in the Codethink office
(pictured here[3]). More runners would be welcome; these need to be
dedicated bare-metal machines of some sort so that we get meaningful
numbers.

Currently we only run two tests. One is called "Startup time" and just
runs `bst --help`; a nice touch is that you can see the improvements
from fixing issue #172 reflected in the results. The other test is
called "Build of Baserock stage1-binutils for x86_64", and runs a very
simple build pipeline. (This test uses the Baserock definitions
project[2] as a testcase because that project is still buildable with
BuildStream 1.0.0.)
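
As an aside, the measurements themselves are nothing exotic: total time
and peak RSS of the BuildStream process. A minimal sketch of how such a
measurement can be taken on Linux (purely illustrative, not the actual
implementation in the benchmark runner) is:

    import resource
    import subprocess
    import time

    # Illustrative only: time `bst --help` and record the peak RSS of the
    # child process.  On Linux, ru_maxrss from getrusage() is in kilobytes.
    start = time.monotonic()
    subprocess.run(["bst", "--help"], check=True, stdout=subprocess.DEVNULL)
    total_time = time.monotonic() - start
    max_rss_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss

    print("total-time: %.2fs  max-rss-kb: %d" % (total_time, max_rss_kb))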

The test runner is a simple Python program that lives here:

    https://gitlab.com/BuildStream/benchmarks/

It's intended to be usable from the command line as well as from CI;
see the README.md for instructions on how to run it.

The tests are defined in that repo too, in this file:


https://gitlab.com/BuildStream/benchmarks/blob/master/bst_benchmarks/default.benchmark

The format is also partially documented in the benchmarks.git README.
Currently the tests are pretty simplistic. We aim to make the results
much more granular, as sketched out in this spreadsheet[4]. To do this
we need to start parsing BuildStream's log output, and Jim MacArthur
will be looking into that.
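
To give a rough idea of what that could look like (the actual log
format to parse and the approach are still to be worked out, so the
line format below is entirely hypothetical):

    import re
    from collections import defaultdict

    # Hypothetical log line format: "[HH:MM:SS] <element> START|SUCCESS".
    # The real BuildStream log format needs to be checked before this can
    # be more than a sketch.
    LINE = re.compile(r"\[(\d+):(\d+):(\d+)\]\s+(?P<element>\S+)\s+(?P<phase>START|SUCCESS)")

    def element_durations(log_lines):
        """Return seconds spent per element, from START to SUCCESS."""
        starts = {}
        durations = defaultdict(float)
        for line in log_lines:
            m = LINE.search(line)
            if not m:
                continue
            t = int(m.group(1)) * 3600 + int(m.group(2)) * 60 + int(m.group(3))
            if m.group("phase") == "START":
                starts[m.group("element")] = t
            elif m.group("element") in starts:
                durations[m.group("element")] += t - starts.pop(m.group("element"))
        return durations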

Another step is to have a project generation script to which we can
say "give me a project with 100,000 elements" and such things, so that
we can, for example, test real scalability limits. Work on this is in
progress at Bloomberg.
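
To make that concrete, a first cut of such a generator could be as
simple as the sketch below, which writes a project.conf and a linear
chain of 'stack' elements (the layout and field names here are just an
illustration based on the BuildStream 1.x project format, not a design
for the actual script):

    import os
    import sys

    def generate_project(directory, num_elements):
        """Write a synthetic project with a linear chain of stack elements."""
        os.makedirs(os.path.join(directory, "elements"), exist_ok=True)

        # Minimal project.conf; field names follow the BuildStream 1.x format.
        with open(os.path.join(directory, "project.conf"), "w") as f:
            f.write("name: generated-benchmark-project\n")
            f.write("element-path: elements\n")

        # Each element depends on the previous one, giving BuildStream a deep
        # dependency graph to resolve and schedule.
        for i in range(num_elements):
            path = os.path.join(directory, "elements", "element-%d.bst" % i)
            with open(path, "w") as f:
                f.write("kind: stack\n")
                if i > 0:
                    f.write("depends:\n- element-%d.bst\n" % (i - 1))

    if __name__ == "__main__":
        generate_project(sys.argv[1], int(sys.argv[2]))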

All that work is tracked under
<https://gitlab.com/BuildStream/buildstream/issues/205> so subscribe
there for updates. Future work should be tracked as issues against
the 'benchmarks' project.

So that is the current state of our performance improvements. Once the
benchmarking tool is up to scratch, we can get back to the performance
roadmap[5] and do some optimizing :-)

Sam

[1].
https://mail.gnome.org/archives/buildstream-list/2017-November/msg00001.html

[2]. https://gitlab.com/baserock/definitions/

[3].
https://wiki.gnome.org/Projects/BuildStream/Infrastructure?action=AttachFile&do=view&target=benchmark-laptop.jpg

[4].
https://docs.google.com/spreadsheets/d/1fDNXX1roADg-GbhpRbvWSla0Y2J_jv9GLU3oz38ATbw/edit#gid=574863445

[5]. https://wiki.gnome.org/Projects/BuildStream/Roadmaps/Performance2018



Hey,

To add to this, I have been working on converting the current JSON output
from the BuildStream benchmarks into something more human-readable.
Currently the JSON files look something like this:

{
    "start_timestamp": 1520986910.445044,
    "tests": [
        {
            "name": "Startup time",
            "results": [
                {
                    "measurements": [
                        {
                            "max-rss-kb": 21496,
                            "total-time": 0.27
                        },
                        {
                            "max-rss-kb": 21596,
                            "total-time": 0.31
                        },
                        {
                            "max-rss-kb": 21584,
                            "total-time": 0.28
                        }
                    ],
                    "repeats": 3,
                    "version": "master"


This block is repeated for each of the different BuildStream versions. I
have condensed the results down to something that looks like this in CSV
format:

Name,Version,Total Time,Max RSS
Startup time,master,0.2866666666666667,21558.666666666668
Startup time,1.1.0,0.19666666666666666,21368
Startup time,1.0.1,0.45,31501.333333333332

Attached: the results.csv

If you prefer, you can open the .csv in some spreadsheet software
(e.g. LibreOffice Calc) and it should look like a nice table.

The results.csv currently shows the average over the three runs of each
test, for each version of BuildStream.
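
In case anyone wants to reproduce this, the reduction itself is
straightforward; a minimal sketch (not the exact script I am using, just
an illustration against the JSON structure shown above) looks like:

    import csv
    import json
    import sys

    # Reduce results.json to one averaged row per (test, version), matching
    # the CSV columns above.
    with open(sys.argv[1]) as f:
        data = json.load(f)

    with open("results.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Version", "Total Time", "Max RSS"])
        for test in data["tests"]:
            for result in test["results"]:
                ms = result["measurements"]
                writer.writerow([
                    test["name"],
                    result["version"],
                    sum(m["total-time"] for m in ms) / len(ms),
                    sum(m["max-rss-kb"] for m in ms) / len(ms),
                ])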

If anyone has any thoughts or changes they think should be made, please
let me know by replying to this email or pinging me on #buildstream; my
IRC nick is "dominic".

Thanks,
Dom



