Re: [BuildStream] Weekly performance update - Regenerated data for a different project




On 01/04/2019 16:28, James Ennis via buildstream-list wrote:
Hello,

This week's performance update is more verbose than previous posts: I'd like to take the time to describe the changes we've made to the project we use to generate the benchmark data, and I will also mention some of the interesting trends and observations.

So, we have made a tweak to the Debian-like [0] project. The master branch of this repo remains unchanged: ~75,000 *local* import elements whose dependency relationships reflect those of the source and binary packages in Debian stretch. However, there is now a branch, "jennis/use_remote_file" [1], which uses "remote" imports (with file:// URLs) of uniquely generated files (each 4.0K in size and located in the project's files/ directory). These files are generated by the ./prepare_project.sh script on that branch.
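
For anyone who wants to picture what the branch contains, here is a rough Python sketch of the kind of thing prepare_project.sh generates. The file layout, element naming and use of the core "remote" source plugin (with a url and a sha256 ref) are my assumptions for illustration, not a copy of the actual script:

    #!/usr/bin/env python3
    # Sketch: write one unique ~4.0K file per element under files/, plus an
    # import element that pulls it back in through a file:// URL.
    import hashlib
    import os

    def make_element(project_dir, name):
        payload = ((name + "\n") * 512)[:4096].encode()   # unique ~4.0K payload

        file_path = os.path.join(project_dir, "files", name + ".txt")
        os.makedirs(os.path.dirname(file_path), exist_ok=True)
        with open(file_path, "wb") as f:
            f.write(payload)

        element_path = os.path.join(project_dir, "elements", name + ".bst")
        os.makedirs(os.path.dirname(element_path), exist_ok=True)
        with open(element_path, "w") as f:
            f.write("kind: import\n"
                    "sources:\n"
                    "- kind: remote\n"
                    "  url: file://{}\n"
                    "  ref: {}\n".format(os.path.abspath(file_path),
                                         hashlib.sha256(payload).hexdigest()))

    make_element(".", "base-files/base-files")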

Note that we are still testing a subset of this project - base-files/base-files.bst is the target which produces a pipeline of ~6000 elements.

We regenerated all of the data for an altered project because Raoul's source-cache MR (!1124) [2] reminded us that local imports were not fetched and staged into the local cache before !1124, so we saw a spike in build times when it landed. Therefore, to more accurately simulate a 'real-world' project, we altered the Debian-like project to use 'remote' file imports.

In addition to this, we are now also measuring the build times and peak memory usage for 4, 8 and 12 builders.
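
For reference, the measurements are simply wall-clock time and max-rss around a bst invocation. A minimal sketch of how one might capture these for a given builder count follows; the exact bst command line (the --builders option) and the target path are assumptions here, and our CI job actually drives this through the benchmarks tooling:

    #!/usr/bin/env python3
    # Sketch: time a build and capture peak memory (max-rss) of the bst
    # process tree for one builder count. Run once per configuration so
    # getrusage() is not polluted by an earlier, larger run.
    import resource
    import subprocess
    import sys
    import time

    builders = int(sys.argv[1]) if len(sys.argv) > 1 else 4
    target = "base-files/base-files.bst"

    start = time.monotonic()
    subprocess.run(["bst", "--builders", str(builders), "build", target], check=True)
    elapsed = time.monotonic() - start

    # On Linux, ru_maxrss for waited-for children is reported in kilobytes.
    max_rss_mb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss / 1024
    print("{} builders: {:.2f}s, max-rss {:.0f}M".format(builders, elapsed, max_rss_mb))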

-----

All results were obtained on the same hardware (a Codethink dev machine configured to be a gitlab runner - not being used for anything else), and the results files are being appended to by a CI job. Specs as follows:

* Linux (Debian stable)
* x86_64
* 16G RAM
* 500GB SSD
* Intel i7-3770
* 8 cores @ 3.40 GHz

-----

Now, bearing in mind that all of the results have been completely regenerated, here are the time and max-rss (peak memory usage) results for a `bst show` of base-files/base-files.bst in the Debian-like project (jennis/use_remote_file branch) [1]. (Note that I've set the table width to 72 columns in the hope that it wraps nicely for everyone.)

┌─────┬─────────────────┬─────────────────┬───────────────────────────┐
│     │      Show       │ YAML cache show │      Show once built      │
├─────┼───────┬─────────┼───────┬─────────┼─────────────────┬─────────┤
│Dates│Time   │Memory   │Time   │Memory   │Time             │Memory   │
│     │(s)    │(M)      │(s)    │(M)      │(s)              │(M)      │
├─────┼───────┼─────────┼───────┼─────────┼─────────────────┼─────────┤
│01/14│49.23  │1089     │27.17  │1115     │32.82            │1117     │
│01/20│       │         │       │         │                 │         │
├─────┼───────┼─────────┼───────┼─────────┼─────────────────┼─────────┤
│01/21│48.85  │1089     │27.35  │1115     │32.84            │1117     │
│01/27│       │         │       │         │                 │         │
├─────┼───────┼─────────┼───────┼─────────┼─────────────────┼─────────┤
│01/28│49.50  │1088     │27.17  │1115     │32.76            │1117     │
│02/03│       │         │       │         │                 │         │
├─────┼───────┼─────────┼───────┼─────────┼─────────────────┼─────────┤
│02/04│48.08  │561      │26.68  │588      │31.90            │589      │
│02/10│       │         │       │         │                 │         │
├─────┼───────┼─────────┼───────┼─────────┼─────────────────┼─────────┤
│02/11│45.46  │408      │23.75  │435      │29.50            │437      │
│02/17│       │         │       │         │                 │         │
├─────┼───────┼─────────┼───────┼─────────┼─────────────────┼─────────┤
│02/18│43.17  │417      │21.65  │443      │27.28            │445      │
│02/24│       │         │       │         │                 │         │
├─────┼───────┼─────────┼───────┼─────────┼─────────────────┼─────────┤
│02/25│41.75  │493      │19.43  │519      │25.69            │625      │
│03/03│       │         │       │         │                 │         │
├─────┼───────┼─────────┼───────┼─────────┼─────────────────┼─────────┤
│03/04│40.96  │517      │18.81  │543      │25.20            │696      │
│03/10│       │         │       │         │                 │         │
├─────┼───────┼─────────┼───────┼─────────┼─────────────────┼─────────┤
│03/11│40.44  │519      │18.06  │545      │24.50            │698      │
│03/17│       │         │       │         │                 │         │
├─────┼───────┼─────────┼───────┼─────────┼─────────────────┼─────────┤
│03/18│38.87  │521      │16.74  │547      │23.19            │696      │
│03/24│       │         │       │         │                 │         │
├─────┼───────┼─────────┼───────┼─────────┼─────────────────┼─────────┤
│03/25│22.63  │422      │13.40  │433      │17.36            │563      │
│03/31│       │         │       │         │                 │         │
└─────┴───────┴─────────┴───────┴─────────┴─────────────────┴─────────┘


# Interesting observations

* On the whole, times and peak memory usage are decreasing.

* Our peak memory has not been decreasing consistently. This is because we have made a couple of time vs. memory trade-offs, favouring a reduction in `show` time. The most notable culprit here is !1164: Altering the behavior of the garbage collector - in favor of a faster show. (A short illustrative sketch of this kind of trade-off follows this list.)
     - This is very noticeable in the attached 'all_show_memory_results.png': there is a spike in memory near the beginning of March.
     - This corresponds to a dip in the time taken in the 'all_show_times_results.png' attachment.

* Please do not be alarmed that the most recent `show` time is much higher than the most recent `show with YAML cache` time, even though we no longer support the YAML cache: YAML New World Order (!1257) [3] landed mid-week last week, so the average has been calculated with results from before !1257.

* Our peak memory usage for `bst show` once the artifact cache is hot is much higher than for all other shows.
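
To illustrate the kind of trade mentioned above (this is illustrative only; the exact mechanism in !1164 may well differ), making Python's cyclic garbage collector run far less often while a ~6000-element pipeline is loaded speeds up `show` at the cost of a higher peak RSS:

    # Illustrative only: trading memory for `show` time by making Python's
    # cyclic GC run much less often while the pipeline is loaded/resolved.
    import gc

    # The defaults are roughly (700, 10, 10); far larger thresholds mean
    # far fewer collections while hundreds of thousands of objects are alive.
    gc.set_threshold(100000, 100, 100)

    # ... load and resolve the pipeline here ...

    # Collect once at the end so the memory is eventually given back.
    gc.collect()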

-----

Here are the times and max-rss results for a `bst build` of base-files/base-files.bst in the Debian-like project [1]:

┌─────┬─────────────────┬─────────────────┬───────────────────────────┐
│     │   4 Builders    │   8 Builders    │        12 Builders        │
├─────┼───────┬─────────┼───────┬─────────┼─────────────────┬─────────┤
│Date │Time   │Memory   │Time   │Memory   │Time             │Memory   │
│     │(s)    │(M)      │(s)    │(M)      │(s)              │(M)      │
├─────┼───────┼─────────┼───────┼─────────┼─────────────────┼─────────┤
│01/14│1383.67│1119     │1078.76│1119     │735.41           │1119     │
│01/20│       │         │       │         │                 │         │
├─────┼───────┼─────────┼───────┼─────────┼─────────────────┼─────────┤
│01/21│561.12 │1117     │602.20 │1117     │680.10           │1117     │
│01/27│       │         │       │         │                 │         │
├─────┼───────┼─────────┼───────┼─────────┼─────────────────┼─────────┤
│01/28│560.7  │1117     │601.57 │1117     │678.03           │1117     │
│02/03│       │         │       │         │                 │         │
├─────┼───────┼─────────┼───────┼─────────┼─────────────────┼─────────┤
│02/04│373.29 │590      │425.09 │589      │500.71           │590      │
│02/10│       │         │       │         │                 │         │
├─────┼───────┼─────────┼───────┼─────────┼─────────────────┼─────────┤
│02/11│308.75 │436      │357.69 │436      │412.90           │436      │
│02/17│       │         │       │         │                 │         │
├─────┼───────┼─────────┼───────┼─────────┼─────────────────┼─────────┤
│02/18│292.78 │445      │350.70 │445      │398.30           │445      │
│02/24│       │         │       │         │                 │         │
├─────┼───────┼─────────┼───────┼─────────┼─────────────────┼─────────┤
│02/25│342.28 │521      │387.28 │521      │436.51           │521      │
│03/03│       │         │       │         │                 │         │
├─────┼───────┼─────────┼───────┼─────────┼─────────────────┼─────────┤
│03/04│354.64 │544      │400.00 │544      │441.61           │544      │
│03/10│       │         │       │         │                 │         │
├─────┼───────┼─────────┼───────┼─────────┼─────────────────┼─────────┤
│03/11│352.95 │546      │395.50 │546      │442.52           │546      │
│03/17│       │         │       │         │                 │         │
├─────┼───────┼─────────┼───────┼─────────┼─────────────────┼─────────┤
│03/18│348.53 │549      │383.93 │549      │431.18           │549      │
│03/24│       │         │       │         │                 │         │
├─────┼───────┼─────────┼───────┼─────────┼─────────────────┼─────────┤
│03/25│286.67 │435      │302.77 │435      │332.87           │435      │
│03/31│       │         │       │         │                 │         │
└─────┴───────┴─────────┴───────┴─────────┴─────────────────┴─────────┘

# Interesting observations

* Both peak memory usage and time have decreased dramatically since mid-January.

* From late February/early March, our build times increased again by ~60 seconds and our peak memory by ~100M. Upon investigation, we found that !1164 (Altering the behavior of the garbage collector - in favor of a faster show) was responsible for this.
    - There is an obvious spike showing this in the 'all_build_time_results.png' attachment.
    - There is a corresponding spike in memory usage; see the 'all_build_memory_results.png' attachment.
    - So we need to discuss whether this change is worth reverting.

* The most notable observation is from mid-January, when our build times were very high but behaved as one might expect: 4 builders were slower than 8 builders, which were slower than 12 builders. Then *all* of the build times were heavily reduced by Tristan's change to the scheduler: !1067 - Fix element processing order [4]. Rather interestingly, all times are now significantly lower, but 12 builders are consistently slower than 8 builders, which are consistently slower than 4 builders. So the behavior has flipped, and is perhaps now counter-intuitive.
    - See the all_build_time_results.png attachment for clarity.
    - This suggests that if I tried 24 builders, for example, this patch would likely have made my build slower.

Given that this is an import project, you would expect that if the scheduler got very fast, the project's build time would eventually be bound by I/O, and so adding more workers would just slow the project down as more and more I/O tasks fought each other.

That being said, do we have an estimate of the number of copy operations this project has to perform, and how long they take, so that we can use this as an absolute limit on the performance of this project? I think that would give us much more interesting context for the numbers.
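
As a very rough way to get that limit, one could simply time copying each of the generated 4.0K files once, which is approximately the minimum staging work the import elements imply. The source directory below is an assumption based on the layout described earlier:

    #!/usr/bin/env python3
    # Back-of-the-envelope floor: how long does it take merely to copy
    # every generated 4.0K file once?
    import shutil
    import time
    from pathlib import Path

    src = Path("files")
    dst = Path("/tmp/copy-floor")
    dst.mkdir(parents=True, exist_ok=True)

    files = [p for p in src.rglob("*") if p.is_file()]
    start = time.monotonic()
    for i, p in enumerate(files):
        shutil.copy2(p, dst / "{}-{}".format(i, p.name))
    elapsed = time.monotonic() - start

    print("{} copies in {:.2f}s ({:.3f} ms/copy)".format(
        len(files), elapsed, 1000 * elapsed / max(len(files), 1)))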

Another important factor to consider is that we have been focusing on the scheduler, and rightly so. But do we have a measure of how much time the scheduler takes versus the elements? Or indeed, does adding workers affect the build time per element? I presume snakeviz can tell us this, or does it get confused by multiple cores?

On your 8-core box I would expect 4 builders to be faster than 8 only if the task was not I/O-bound, and I would expect the 12-builder runs to be slower as the builders fight for resources. However, the big deltas suggest the scheduler is to blame rather than the workers.

That said, as we have tools in bst-benchmarking, like https://gitlab.com/BuildStream/benchmarks/merge_requests/14, that could very easily be tweaked to give the min, max and average element build time, it seems strange not to calculate this information when it would be so easy to do. If anything it may well highlight how badly the scheduler performs, but it might also give some interesting, less expected results that would be worth knowing.
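
To make that concrete, here is a sketch of the statistic I mean, assuming we can already extract one build duration per element from the logs (the extraction itself is left out, and the numbers below are dummies): given those durations and the total wall-clock time, the gap between the two is roughly the scheduler/staging overhead.

    # Sketch only: report min/max/mean per-element build time and compare
    # the ideal wall-clock time (perfect packing across N builders) with
    # what we actually measured. Input numbers are dummies, not real data.
    from statistics import mean

    def element_stats(durations, total_wall_clock, builders):
        ideal = sum(durations) / builders
        return {
            "elements": len(durations),
            "min": min(durations),
            "max": max(durations),
            "mean": mean(durations),
            "ideal_wall_clock": ideal,
            "scheduler_and_staging_overhead": total_wall_clock - ideal,
        }

    print(element_stats([0.8, 1.2, 0.9, 4.7], total_wall_clock=10.0, builders=4))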

For these very large jobs, where the only chance of building them in a realistic time (if they had realistic build times) would be to use remote execution, would we be better off benchmarking builds with a sleep but many, many workers? That way we would not be confused by I/O performance, and the total time should be dominated by (sleep time) * (elements on the critical path), which would be more deterministic. Would that give us a much more real-world test case? I would think it would highlight the inadequacies of the scheduler even more starkly, or not.
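
As a sketch of the bound I have in mind: with a fixed sleep per element and effectively unlimited workers, the best possible build time is just the sleep multiplied by the number of elements on the longest dependency chain, which is easy to compute from the graph. The toy graph below is an assumption, not the Debian-like project:

    # Sketch: lower bound for a "sleep build" = sleep time * length of the
    # longest dependency chain. The toy dependency dict stands in for the
    # real graph, which would be derived from the project's .bst files.
    from functools import lru_cache

    deps = {
        "app": ["lib-a", "lib-b"],
        "lib-a": ["base"],
        "lib-b": ["base"],
        "base": [],
    }

    @lru_cache(maxsize=None)
    def chain_length(element):
        # Number of elements on the longest chain ending at `element`.
        return 1 + max((chain_length(d) for d in deps[element]), default=0)

    SLEEP_SECONDS = 5.0
    critical_path = max(chain_length(e) for e in deps)
    print("critical path: {} elements -> lower bound {:.1f}s".format(
        critical_path, critical_path * SLEEP_SECONDS))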



-----

As usual, the notebook displaying the results for last week's data has been attached. Please see my previous email [5] if you're unable to open this. If you're comparing this to previous weeks, just bear in mind that the data has been regenerated. I've also attached a compressed tarball that contains all of the results. This might be of more interest.

I hope that you found this brief overview of these benchmarks helpful.
It's great to see that, in general, we seem to be heavily improving BuildStream's performance!

Thanks,
James

[0] https://gitlab.com/jennis/debian-stretch-bst
[1] https://gitlab.com/jennis/debian-stretch-bst/tree/jennis/use_remote_file
[2] https://gitlab.com/BuildStream/buildstream/merge_requests/1124
[3] https://gitlab.com/BuildStream/buildstream/merge_requests/1257
[4] https://gitlab.com/BuildStream/buildstream/merge_requests/1067
[5] https://mail.gnome.org/archives/buildstream-list/2019-February/msg00046.html

_______________________________________________
buildstream-list mailing list
buildstream-list@gnome.org
https://mail.gnome.org/mailman/listinfo/buildstream-list

