Re: Large base environments and delays during staging deps



On Tue, 2017-10-24 at 21:00 +0100, Angelos Evripiotis wrote:
> Hi!
>
> I'm looking for some advice and a discussion about BuildStream performance
> with large base environments.


I am staging ~1GB runtimes in *much* less time, which leads me to
ask... Are you at *least* using an SSD? Does anyone try to build huge
things on an HDD anymore?

Not that this is a reason to ignore performance altogether, but disk
I/O will still be a bottleneck for staging under any scenario.

I would suggest at least looking into fixing this:

   https://gitlab.com/BuildStream/buildstream/issues/82

I think that will tremendously reduce the loop time for traversing
lists of files, and is worth doing before getting too far into the idea
of a stage-on-demand fuse layer.
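
For illustration only, here is a rough sketch of the kind of traversal
speedup available, assuming the hot loop currently pays one lstat() per
file (the function name is made up; this is not BuildStream API):

    import os

    # Hypothetical sketch, not BuildStream code: os.scandir() yields
    # DirEntry objects whose file type is cached from reading the
    # directory itself, so entry.is_dir() usually costs no extra
    # lstat() syscall per file.
    def walk_relative(rootdir, prefix=""):
        with os.scandir(rootdir) as entries:
            for entry in entries:
                relpath = os.path.join(prefix, entry.name)
                yield relpath
                if entry.is_dir(follow_symlinks=False):
                    yield from walk_relative(entry.path, relpath)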

I'm reluctant to go to such extravagant lengths to optimize builds;
there should be ways of doing things better without making BuildStream
complex (I think if we stick to fuse, we will never become the hellishly
crazy, incomprehensible/unexplainable, ergo unmaintainable thing which
was scratchbox2 - but we *have* to be careful if we're going to steer
clear of that snake pit).

One thing I have to wonder; even with a system that has 10GB of dev
headers and static libraries and suchlike:

  o Do you need *all* dependencies to be staged at once?

  o Do you need every part of each dependency staged?

I think you already gloss over this in your mail, which I read last
night (sorry, sort of running a race right now)...

One thing to keep in mind is that there will always be a point where
the workload is too much for the tooling, even if the tooling is
perfect. When you get into needing a ridiculously large number of
dependencies staged at the same time, I have the feeling that making
BuildStream more complex to cater to that need is working around a bug
that is in fact in the design of your dependency layout (which must be
a very, very complex dependency layout).

So, is it worth adding burden to tooling which needs to be stable and
reliable, in order to work around inherent problems which already exist
in your dependency model?

For instance, let's explore which static libraries you need for a
given element, and ensure they are neatly available as build-only
dependencies, only ever staged when you *need* to statically link
against that thing. Something like the sketch below.
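
As a rough sketch, with made-up element names, a build-only dependency
is just a matter of declaring type: build in the element:

    kind: autotools
    description: Needs libfoo only at build time, to statically link it

    depends:
    - filename: base/platform.bst
    - filename: libs/libfoo-static.bst
      type: build

That way libfoo-static.bst is staged when building this element, but is
never dragged into the runtime of things which depend on it.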


Sorry for the long mail... no time to trim it down... just a few more
mountains to move this week...

Best Regards,
    -Tristan

> I've been finding that for my use-case for BuildStream, I've ended up with a
> fairly large 'platform.bst', which most other .bsts depend on. This contains
> the basic operating system and the necessary build tools.
>
> This platform.bst has about 70k files in it, and is around 2GB in size. It
> takes 30-40 seconds to stage for a build on my relatively powerful laptop. My
> initial tests suggest this is due more to the number of files than to their
> size. It is very noticeable when building small .bsts, and becomes the main
> cost.

> This is noticeable both when building and when invoking 'bst shell' to run
> tests after building.
>
> I haven't seen this mentioned as a concern in GitLab issues or on the mailing
> list, so I thought I'd raise it here now. I plan to raise an issue on the
> repo after we make some progress here.
>
> I found a good write-up by Sam on using Alpine as a base in BuildStream, with
> similar motivations and concerns [1].
>
> Following these steps I've seen that staging times can become very reasonable
> indeed. I'm also a fan of using Alpine in Docker; smaller is better in many
> ways!
>
> Unfortunately, for my use-case I don't think I will be able to achieve such
> a minimal image; in fact it seems destined to grow over time. Also, in a
> number of cases I will need to pull in >100,000 supporting files to do
> builds.

> I wonder the following:
>
>     o Might we be able to use something like OverlayFS to achieve better
>       staging performance? Given that this would need root permissions and
>       there are some incompatibilities [2], perhaps it could be an opt-in
>       thing. Sander hints at using fuse to achieve something similar in [3],
>       though there it is for avoiding surplus rebuilds. (A rough sketch of
>       such a mount follows this list.)
>
>     o It seems that for my use-case BuildStream will hardlink everything
>       twice during staging: first during ostree-checkout, and then again
>       into the staging directory. During the second linking, we cater to
>       splits and determine if any overlaps have occurred. I haven't
>       knowingly used the splits feature yet, and I think we can compute
>       overlaps another way. Do you think it might be possible to optimize a
>       bit here for non-split cases? (See the single-pass sketch after this
>       list.)
>
>     o Can you think of other ways that we might be able to reduce staging
>       times?
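
On the OverlayFS idea above, a rough sketch of what such a mount could
look like, with made-up paths, assuming root (or a user namespace) and
a kernel with overlayfs support:

    # Hypothetical example: the artifact checkout is the read-only
    # lower layer, writes go to a scratch upper layer, and the merged
    # view becomes the build root.
    mount -t overlay overlay \
          -o lowerdir=/artifacts/checkout,upperdir=/tmp/upper,workdir=/tmp/work \
          /buildstream/build-root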
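
And on the double-hardlink point, a minimal single-pass sketch in
Python - not BuildStream's actual staging code - assuming the artifact
checkouts already live on the same filesystem as the staging directory.
Overlap detection becomes dict bookkeeping instead of a second full
linking pass (file/directory type conflicts are ignored for brevity):

    import os

    # Hypothetical sketch: hardlink each artifact checkout into the
    # staging directory once, recording which artifact owns each path
    # so overlaps can be reported without a second pass.
    def stage_artifacts(artifacts, stagedir):
        owners = {}    # relative path -> artifact that staged it
        overlaps = {}  # relative path -> all artifacts that collided
        os.makedirs(stagedir, exist_ok=True)
        for name, checkout in artifacts:
            for dirpath, dirnames, filenames in os.walk(checkout):
                relroot = os.path.relpath(dirpath, checkout)
                for d in dirnames:
                    os.makedirs(os.path.join(stagedir, relroot, d),
                                exist_ok=True)
                for f in filenames:
                    relpath = os.path.normpath(os.path.join(relroot, f))
                    target = os.path.join(stagedir, relpath)
                    if relpath in owners:
                        overlaps.setdefault(
                            relpath, [owners[relpath]]).append(name)
                        os.unlink(target)  # the later artifact wins
                    os.link(os.path.join(dirpath, f), target)
                    owners[relpath] = name
        return overlaps

Something like stage_artifacts([('platform', '/artifacts/checkout')],
'/buildstream/build-root') would then stage and report overlaps in a
single walk.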

> Cheers!
> Angelos
>
> [1]: https://samthursfield.wordpress.com/2017/06/19/buildstream-and-host-tools/
> [2]: https://docs.docker.com/engine/userguide/storagedriver/overlayfs-driver/#limitations-on-overlayfs-compatibility
> [3]: https://gitlab.com/BuildStream/buildstream/issues/56