Re: Improving staging performance



On Wed, 2017-11-15 at 13:20 +0000, Angelos Evripiotis wrote:
> Hello!
>
> In this message I hope to present a convincing case that staging delay is
> significant for a reasonable amount of files, and that for rebuilds it can be
> drastically reduced.
>
> I ran an experiment locally with an alpine sysroot and 100k files. I think this
> is a reasonable amount of files because it's roughly the same number I work
> with on another project, which is DPKG-based.
>
> Similar to the results in previous posts on performance, I see that staging
> takes around 20 seconds. This is a significant speed bump in the workflow.

Have we tried at least caching the results of the filesystem walks
locally, and reusing the cached data structure to stage the hard links?
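
A minimal sketch of that idea, for illustration only and not BuildStream
code: it assumes the extracted artifact directory never changes once it is
cached, so the walk result can be stored as JSON and replayed. The names
build_manifest, save_manifest, load_manifest and stage_from_manifest are
hypothetical. This only saves the directory walk; each file still costs
one link() call.

```python
import json
import os


def build_manifest(root):
    """Walk root once and record every entry as [relative_path, kind, target]."""
    entries = []
    # os.walk is top-down, so a directory is always recorded before its contents.
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.normpath(os.path.join(rel_dir, name))
            if os.path.islink(full):
                entries.append([rel, "symlink", os.readlink(full)])
            elif os.path.isdir(full):
                entries.append([rel, "dir", None])
            else:
                entries.append([rel, "file", None])
    return entries


def save_manifest(path, entries):
    with open(path, "w") as f:
        json.dump(entries, f)


def load_manifest(path):
    with open(path) as f:
        return json.load(f)


def stage_from_manifest(root, dest, entries):
    """Recreate the tree under dest without walking root again."""
    os.makedirs(dest, exist_ok=True)
    for rel, kind, target in entries:
        dst = os.path.join(dest, rel)
        if kind == "dir":
            os.makedirs(dst, exist_ok=True)
        elif kind == "symlink":
            os.symlink(target, dst)
        else:
            # Hard links require root and dest to be on the same filesystem.
            os.link(os.path.join(root, rel), dst)
```

The manifest would have to be keyed on something that identifies the
artifact contents, otherwise a stale manifest would stage the wrong tree.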


> I found that with a trivial hack, it's possible to cache the results of
> staging. This means that for rebuilds, or builds that use the same
> dependencies, we don't pay the cost of linking files into the sandbox. With the
> hack applied, the trivial re-build feels near-instant.

I don't like the idea of leaving old staged directories around to get
stale. Even without intentionally leaving these around, I have run into
the "too many hardlinks" scenario multiple times, when I forgot to
remove failed build directories for around a month while building
regularly. It is much cleaner to clean up as soon as possible, and as
much as possible.
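
For illustration, a small sketch of how that limit shows up from Python.
The helper names link_count and try_link are hypothetical; the only
assumption is that the filesystem reports EMLINK once an inode has
reached its maximum number of hard links.

```python
import errno
import os


def link_count(path):
    """Return how many directory entries currently point at this inode."""
    return os.stat(path).st_nlink


def try_link(src, dst):
    """Hard-link src to dst; return False if the inode's link limit is hit."""
    try:
        os.link(src, dst)
        return True
    except OSError as e:
        if e.errno == errno.EMLINK:
            # "Too many links": every stale staged copy counts towards this.
            return False
        raise
```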

Also, how often can it really happen in real life that we can correctly
use staged dependencies from a previous round?

Only for the build sandbox of the most recently changed element?

And all the reverse dependencies of the most recently changed element
must create fresh sandboxes anyway, so this seems like a lot of
contorting of the code base for a single case which does not happen
often.


> I think a lot of work would need to be done to make the approach
> production-ready.
>
> My hope is that this message will start a conversation to explore the
> possibilities for speeding up staging :)

I don't really have the space for it right now, but I hope people will
join in and keep this conversation alive.

I think Sander's idea of a stage-on-demand FUSE layer might be the most
productive so far, but it might take some "actual real work" to get
that done, and to get it optimized to a level where the Python layer
does not add enough overhead to cancel out the optimization.

Cheers,
    -Tristan


