[BuildStream] Optimising BuildStream - Semi-finals.



Hi all,

As some of you already know, Ben Schubert, a few others, and I have recently
been working on optimising the CPU and memory performance of BuildStream.

We've managed to make some significant gains, particularly in the realm of
memory consumption on large projects, and we've eliminated a large number of
the obvious bottlenecks in CPU performance.  Sadly we're starting to run out of
obvious low-hanging fruit so I thought I'd give a bit of a summary of potential
optimisations I see left (in case anyone else wants to take them on) and then a
summary of how I think we proceed from here.

My focus has been on CPU performance optimisation, so the following is all from
that point of view.  My hope is that Ben might be able to follow up with some
notes about memory optimisation, either shortly or in a week or two.

Low hanging fruit remaining
----

This is not an exhaustive list, merely some of the things I see which are at
least 5% of runtime of `bst show debian-stack.bst` from the performance repo
I published earlier this year...

* `Element.dependencies()` - This is called in a number of places and somehow
  manages to be 10.22% of the runtime.  It shows up a lot via
  `Element._update_state()`.

* `_yaml.py::node_sanitize` - This is called from all over BuildStream when we
  need sanitised content, though its primary use is for cache keys (and I wonder
  if we might eliminate its use there by means of ujson's in-built sorting).
  If so, we could recoup around 6% of runtime.

* `_yaml.py::node_get_project_path` - This is important to ensure that paths for
  things like import elements do not stray outside of a project's base directory.
  Sadly it consumes 5.5% of runtime.
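
A minimal sketch of the `node_sanitize` idea above: if the serialiser sorts
mapping keys itself, a separate sanitising pass may not be needed just to get
a stable cache key.  This uses the standard library `json` module as a
stand-in so that it runs anywhere; ujson's `dumps` is understood to accept a
similar `sort_keys` option, but that should be checked before relying on it.
The `cache_key` helper, the SHA-256 digest and the example nodes are all
assumptions made for illustration, not BuildStream code.

```python
import hashlib
import json


def cache_key(node):
    # Hypothetical helper, not BuildStream's implementation.
    # sort_keys=True orders mapping keys at every nesting level, so two
    # dicts with the same content serialise identically regardless of
    # the order in which their keys were inserted.
    payload = json.dumps(node, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same content, different insertion order (made-up example data).
a = {"kind": "import", "config": {"source": "/", "target": "/usr"}}
b = {"config": {"target": "/usr", "source": "/"}, "kind": "import"}

same = cache_key(a) == cache_key(b)
```

Note that this only covers key ordering.  Anything else the sanitising step
does (for example converting custom node types to plain dicts) would still
need handling, and any change to the serialised form would change existing
cache keys.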

Other avenues to proceed down
----

Some of what we have left can only be resolved by refactoring, or sometimes by
entirely rearchitecting, parts of the codebase.  For example, I don't think I
can make `_yaml.py::node_chain_copy` that much faster, but if we can find a
way to call it fewer times (currently we average about nine times per element)
we might be able to knock it down from its current 12% of total runtime.

* `_yaml.py::node_chain_copy` - see above.  I wonder if we might find a cheaper
  abstraction for some of its uses.

* `_yamlcache.py` - We currently spend around 40 seconds unpickling YAML
  (nearly 7% of total runtime).  I don't doubt that this is currently the
  cheapest way we can do this, but perhaps there's a way to reduce the
  complexity of the data objects it has to unpickle?  Also, 2.5% of runtime
  there is spent in `persistent_load`, where some optimisation might be
  possible.

* `Element._update_state` is a well known "problem" in terms of cost.  I'm honestly
  not sure if there's much we can do to improve each individual part of it, but
  perhaps we can break it down and be smarter over when we do stuff to reduce the
  total amount of work we actually do, or even just when we do it.  This function
  represents 20% of the runtime of the test case.
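
One way to read "be smarter over when we do stuff" in the `_update_state`
point above is to mark state as stale when something changes and only
recompute it when it is next read.  The sketch below shows that general
pattern under stated assumptions: `LazyState`, `invalidate` and
`_compute_state` are hypothetical names, and nothing here reflects how
`Element` is actually structured.

```python
class LazyState:
    """Hypothetical stand-in for an element; not BuildStream's Element."""

    def __init__(self, name):
        self.name = name
        self._dirty = True
        self._state = None
        self.recomputations = 0  # counts how often the expensive work runs

    def invalidate(self):
        # Cheap: only record that the cached state is out of date.
        self._dirty = True

    def _compute_state(self):
        # Placeholder for the expensive state calculation.
        self.recomputations += 1
        return "state-of-" + self.name

    @property
    def state(self):
        # Recompute only when the state is read after a change.
        if self._dirty:
            self._state = self._compute_state()
            self._dirty = False
        return self._state


element = LazyState("base.bst")
for _ in range(5):
    element.invalidate()  # five changes in a row...
first = element.state     # ...cost a single recomputation on first read
second = element.state    # served from the cached value
```

Here five invalidations followed by two reads run the expensive step once.
Whether this helps in practice depends on how often state is read between
changes, which would need measuring.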

Final words
----

I'm sure there's more that can be done, but this is where my thinking has led me so
far.  I'd love to see people contribute to this thread with ideas of further ways to
improve performance, or indeed memory consumption.  If anyone wants to take on
any of the above (particularly the remaining possibly low hanging fruit) then I'm
very happy to discuss and mentor.  Find me by email or as 'Kinnison' on IRC.

D.

-- 
Daniel Silverstone                          https://www.codethink.co.uk/
Solutions Architect               GPG 4096/R Key Id: 3CCE BABE 206C 3B69

