Large base environments and delays during staging deps



Hi!

I'm looking for some advice and a discussion about BuildStream performance with
large base environments.

For my BuildStream use case, I've ended up with a fairly large 'platform.bst'
that most other .bst files depend on. It contains the basic operating system
and the necessary build tools.

This platform.bst has about 70k files and is around 2GB in size. It takes
30-40 seconds to stage for a build on my relatively powerful laptop. My initial
tests suggest this is driven more by the number of files than by their total
size. It is very noticeable when building small .bst files, where staging
becomes the main cost.
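
For anyone who wants to check the "file count rather than size" observation
outside BuildStream, a rough micro-benchmark is to hardlink two synthetic trees
with the same total payload: one made of many small files, one of a few large
ones. This is only a sketch under my own assumptions: `make_tree`,
`stage_by_hardlink` and `time_staging` are hypothetical helpers, not BuildStream
code, and the sketch does not involve ostree at all.

```python
import os
import tempfile
import time


def make_tree(root: str, n_files: int, size: int) -> None:
    """Create n_files files of `size` bytes each, spread over up to 100 subdirectories."""
    payload = b"x" * size
    for i in range(n_files):
        subdir = os.path.join(root, f"d{i % 100}")
        os.makedirs(subdir, exist_ok=True)
        with open(os.path.join(subdir, f"f{i}"), "wb") as f:
            f.write(payload)


def stage_by_hardlink(src: str, dest: str) -> int:
    """Recreate src's directory layout under dest, hardlink every file, return the file count."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.normpath(os.path.join(dest, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            # A hardlink copies no data, so the cost is per-file metadata work.
            os.link(os.path.join(dirpath, name), os.path.join(target_dir, name))
            count += 1
    return count


def time_staging(n_files: int, size: int) -> float:
    """Build a synthetic tree in a temp dir and return the seconds spent hardlinking it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        dest = os.path.join(tmp, "dest")
        make_tree(src, n_files, size)
        start = time.perf_counter()
        stage_by_hardlink(src, dest)
        return time.perf_counter() - start


if __name__ == "__main__":
    # Roughly the same total payload (~10 MiB), split very differently.
    print("10000 x 1 KiB:", time_staging(10000, 1024))
    print("10 x 1 MiB:   ", time_staging(10, 1024 * 1024))
```

If the many-small-files case dominates, that points at per-file syscalls
rather than data volume as the thing to reduce.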

This is noticeable both when building and when invoking 'bst shell' to run
tests after building.

I haven't seen this mentioned as a concern in GitLab issues or on the mailing
list, so I thought I'd raise it here now. I plan to raise an issue on the repo
after we make some progress here.

I found a good write-up by Sam on using Alpine as a base in BuildStream, with
similar motivations and concerns [1].

Following these steps, I've seen that staging times can become very reasonable
indeed. I'm also a fan of using Alpine in Docker; smaller is better in many
ways!

Unfortunately, I don't think I will be able to achieve such a minimal image for
my use case; in fact, it seems destined to grow over time. Also, in a number of
cases I will need to pull in more than 100,000 supporting files to do builds.

I'm wondering about the following:

    o Might we be able to use something like OverlayFS to achieve better
      staging performance? Given that this would need root permissions and
      there are some incompatibilities [2], perhaps it could be an opt-in
      feature. Sander hints at using FUSE for something similar in [3],
      although there the goal is avoiding surplus rebuilds.

    o It seems that for my use-case BuildStream will hardlink everything twice
      during staging. First during ostree-checkout and then again into the
      staging directory. During the second linking, we cater to splits and
      determine if any overlaps have occurred. I haven't knowingly used the
      splits feature yet, and I think we can compute overlaps another way. Do
      you think it might be possible to optimize a bit here for non-split
      cases?

    o Can you think of other ways that we might be able to reduce staging
      times?
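
On the second point, overlap detection could in principle be done on file
lists alone rather than as a side effect of the second linking pass. A minimal
sketch, assuming each artifact's manifest (its set of relative file paths) is
available up front and that later-staged elements overwrite earlier ones;
`find_overlaps` and the manifest format are hypothetical, not BuildStream's
actual API:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def find_overlaps(manifests: Iterable[Tuple[str, Set[str]]]) -> Dict[str, List[str]]:
    """Map each path provided by more than one element to those elements, in staging order.

    Assumption: elements are given in staging order, so the last element in
    each list is the one whose file would end up in the staging directory.
    """
    providers: Dict[str, List[str]] = defaultdict(list)
    for element, paths in manifests:
        for path in paths:
            providers[path].append(element)
    # Keep only paths that more than one element provides.
    return {path: elems for path, elems in providers.items() if len(elems) > 1}


if __name__ == "__main__":
    manifests = [
        ("platform.bst", {"usr/bin/sh", "usr/lib/libc.so.6", "etc/os-release"}),
        ("toolchain.bst", {"usr/bin/gcc", "usr/lib/libc.so.6"}),
        ("app.bst", {"usr/bin/app"}),
    ]
    print(find_overlaps(manifests))
    # {'usr/lib/libc.so.6': ['platform.bst', 'toolchain.bst']}
```

This trades per-file link and stat calls for an in-memory pass over path
strings; it still needs each artifact's file list, so whether it wins depends
on how cheaply that list can be obtained.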

Cheers!
Angelos

[1]: https://samthursfield.wordpress.com/2017/06/19/buildstream-and-host-tools/
[2]: https://docs.docker.com/engine/userguide/storagedriver/overlayfs-driver/#limitations-on-overlayfs-compatibility
[3]: https://gitlab.com/BuildStream/buildstream/issues/56

