Re: [BuildStream] Speeding up buildstream init



On 2018-08-16 13:50, Sander Striker wrote:
Hi Jonathan,

First of all, thanks for the detailed write up.

Loading the pipeline - parsing yaml from files
==============================================

Problem
-------

The yaml parser we use (ruamel) is slow, but we have been unable to find
a faster one that is capable of doing round-trips (i.e. read yaml from
file, make changes to it, write out yaml with mostly the same structure).

Do we need the round tripping in the common case (bst build)?  The
reason I ask is because I assume the number of reads is going to well
outweigh the number of writes.  And only in the case of writes do we
want to do the actual round tripping.  What are the cases other than
track where we are doing round tripping?

Using two ways to deal with the yaml files may introduce unwanted
complexity; I'm trying to understand how much of that we would
actually face.  Or whether it would be relatively contained.
However, going into detail on that will only make sense if we have a
yaml parser that is significantly faster than ruamel.
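
A minimal sketch of how candidate parsers could be compared before deciding whether a second code path is worth it. The names time_loader and compare_loaders are hypothetical, and json.loads is only a stand-in so the sketch runs without extra dependencies. In practice the callables would wrap ruamel's round-trip loader and whichever faster parser is being evaluated.

```python
import json
import time


def time_loader(load_fn, documents, repeats=3):
    """Return the best wall-clock time (seconds) for load_fn to parse all documents."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for text in documents:
            load_fn(text)
        best = min(best, time.perf_counter() - start)
    return best


def compare_loaders(loaders, documents):
    """Map each loader name to its best time; loaders is {name: callable}."""
    return {name: time_loader(fn, documents) for name, fn in loaders.items()}


# Stand-in input and parser (assumption): replace with real .bst contents
# and the yaml loaders under comparison.
documents = ['{"kind": "autotools", "depends": ["base.bst"]}'] * 1000
results = compare_loaders({"stand-in": json.loads}, documents)
```

Taking the best of several repeats reduces noise from other processes; the absolute numbers matter less than the ratio between loaders on the same input.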

We spend a lot of time loading yaml. In the simple pipeline of 10'000
elements that took 262 seconds, 51 of those were spent loading yaml.

That's a lot (~19%).  What happens to that time with a pipeline that
is an order of magnitude bigger?

I tried with 100'000 elements, but the answer was "it takes up to 12.5GB, the computer locks up and the OOM killer goes wild" :)
I'm going to have to try it on some better hardware first.


Extracting the elements' environment, variables, public data and sandbox config
===============================================================================

Problem
-------

This is the next-largest time-sink, and in a simple pipeline of 10'000
elements that took 262 seconds, 44 of those were spent on extracting
those fields for the element.

Unfortunately, there doesn't seem to be a simple solution - the majority
of it appears to be string and dict manipulation, and caching the result
would be complicated by the sheer number of ways that it can be affected.

I think that in the case of an edit-compile cycle, neither the structure of
the .bsts nor the configuration in project.conf will vary much.  The
churn will be in the code (a workspaced element).  I think that even
caching for the case where nothing has changed would result in a
benefit, as consecutive runs of bst build would start up quickly.  Am
I missing something there?


Yep, it'll help, and in the absence of finding a better approach I'll be figuring out how to do it in the next iteration.
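
A rough sketch of what caching for the "nothing has changed" case could look like, assuming the cache is keyed on a hash of the contents of every input file (the .bst files plus project.conf). The names cache_key and load_cached and the JSON on-disk format are hypothetical and not part of BuildStream. Other inputs that affect the result, such as command-line options, would also need to go into the key.

```python
import hashlib
import json
import os


def cache_key(paths):
    """Hash the path and contents of every input file, in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(path.encode("utf-8"))
        with open(path, "rb") as f:
            # Hash each file separately so file boundaries stay unambiguous.
            digest.update(hashlib.sha256(f.read()).digest())
    return digest.hexdigest()


def load_cached(paths, cache_dir, compute):
    """Return (result, hit). compute(paths) runs only on a cache miss.

    The result must be JSON-serialisable in this sketch.
    """
    cache_file = os.path.join(cache_dir, cache_key(paths) + ".json")
    if os.path.exists(cache_file):
        with open(cache_file) as f:
            return json.load(f), True
    result = compute(paths)
    with open(cache_file, "w") as f:
        json.dump(result, f)
    return result, False
```

Any edit to an input file changes the key, so a stale entry is never returned; it is simply left unused. The cost is that every run still reads and hashes all inputs, which should be far cheaper than parsing and resolving them.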

--
Jonathan Maw, Software Engineer, Codethink Ltd.
Codethink privacy policy: https://www.codethink.co.uk/privacy.html

