Re: [BuildStream] Optimising BuildStream - Semi-finals.



On Fri, Feb 15, 2019 at 13:29:22 +0000, Daniel Silverstone via BuildStream-list wrote:
Low hanging fruit remaining
----

Further micro-optimisations have been looked at:

* `Loader._valid_chars_name` might be faster if a precompiled regex is used
  (perhaps 2.5% of runtime).

* `OptionPool.process_node` might have a number of `isinstance` checks which
  could be cleared out.

* Ditto `Includes._process_value` (maybe 0.5% ?)

* `_yamlcache.py::persistent_load` - if we can avoid using path join, it might
  be up to 1% faster.

* `_yamlcache.py::_get_filepath` uses `relpath` which we might be able to
  remove if we're careful for another 1%.

* `_variables.py::_expand_expstr` - if there's a faster way to combine the
  string elements then that'd be nice.  String joining here is nearly 5% of
  the runtime of `bst show`.

* During `Element.dependencies` we end up hashing Enum instances a *LOT* - if
  there were a way to reduce that we could reclaim another 2% of runtime
  perhaps.
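
As an illustration of the first item, here is a minimal sketch of the
precompiled-regex idea.  The function name mirrors `_valid_chars_name`, but the
body and the allowed character set are assumptions made for the example, not
the actual rules in the Loader:

```python
import re

# Compiled once at import time, so each call only pays for the match itself.
# ASSUMPTION: this character set is hypothetical; the real rules live in the
# BuildStream Loader and may differ.
_VALID_NAME = re.compile(r"[A-Za-z0-9_./-]+")


def valid_chars_name(name):
    # fullmatch() succeeds only if the entire string consists of allowed
    # characters; an empty string does not match because of the '+'.
    return _VALID_NAME.fullmatch(name) is not None
```

Whether this actually beats the current implementation would need profiling
against a real project before being relied upon.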

If we got all of the speedups listed there, that would potentially be another
10 to 12 percent of pre-scheduler runtime.


Other avenues to proceed down
----
* `_yaml.py::node_chain_copy` - see above.  I wonder if we might find a cheaper
  abstraction for some of its uses.

The cost here is almost entirely provenance cloning.  So not creating
provenance data at all in the normal case, where no error needs to be
reported, might be sufficient.
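
One possible shape for that idea, purely as a sketch: keep only the raw
coordinates on each node and build a provenance object on demand when an error
has to be reported.  The `Node` and `Provenance` classes below are hypothetical
stand-ins and do not reflect the real `_yaml.py` data structures:

```python
class Provenance:
    """Hypothetical, simplified stand-in for a provenance record."""

    def __init__(self, filename, line, col):
        self.filename = filename
        self.line = line
        self.col = col

    def __str__(self):
        return "{} [line {} column {}]".format(self.filename, self.line, self.col)


class Node:
    """Hypothetical node which stores raw coordinates, not a Provenance."""

    __slots__ = ("value", "_where")

    def __init__(self, value, filename, line, col):
        self.value = value
        self._where = (filename, line, col)

    def copy(self):
        # The coordinate tuple is immutable, so the copy can share it
        # rather than cloning anything.
        new = Node.__new__(Node)
        new.value = self.value
        new._where = self._where
        return new

    def provenance(self):
        # Only built when somebody actually needs to report an error.
        return Provenance(*self._where)
```

With this arrangement copying a node never clones provenance, and the cost of
constructing it is paid only on the error path.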
 
* `_yamlcache.py` - We currently spend around 40 seconds unpickling YAML
  (nearly 7% total runtime).  I don't doubt that this is currently the cheapest
  way we can do this but perhaps there's a way to reduce the complexity of the
  data objects it has to unpickle?  Also 2.5% of runtime there is `persistent_load`
  where there might be some optimisation possible?

I looked hard at this and got a chunk of the time back (though nowhere near all
of it) by disabling the GC (see MR 1164).
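
For anyone unfamiliar with the technique, the general idea of pausing the
garbage collector around a large unpickle looks roughly like the sketch below.
This is not necessarily how MR 1164 does it; `gc_paused` and `load_cache` are
names invented for the example:

```python
import gc
import pickle
from contextlib import contextmanager


@contextmanager
def gc_paused():
    # Remember the previous state so that we never enable the collector
    # if the caller had already disabled it.
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


def load_cache(fileobj):
    # Unpickling allocates a very large number of container objects, each
    # of which can trigger the cyclic collector.  Pausing it for the
    # duration of the load avoids those repeated collections.
    with gc_paused():
        return pickle.load(fileobj)
```

Reference counting still frees objects while the cyclic collector is paused,
so this only defers the detection of reference cycles until it is re-enabled.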


In addition, James Ennis and I, building on work started by Gökçen and Tristan
(M), are working on possible ways to improve YAML handling even further,
perhaps even eliminating provenance entirely in the general case, though this
will take a lot more thought and design consideration.  I hope we'll be able to
post about this next week some time.

Again, if anyone wants to tackle any of the smaller optimisations listed above,
please feel free to rope me into helping / reviewing by contacting me by email or
on IRC.

D.

-- 
Daniel Silverstone                          https://www.codethink.co.uk/
Solutions Architect               GPG 4096/R Key Id: 3CCE BABE 206C 3B69

