Re: BuildStream, Distributed Builds, Bazel Build Farm



Hi Sander!

This is indeed exciting. I will have to set some time aside to properly
digest these linked papers in advance of FOSDEM.

For a preliminary round, I wanted to share some thoughts ahead of time.
These are not so much about the technical details, which I have yet to
explore in depth, but rather a taste of the kind of points I want to
keep in consideration throughout this endeavor - in the hope this can
help us be productive and start off on the right foot.


On Wed, 2018-01-17 at 13:06 +0000, Sander Striker wrote:
> Hi,
>
> I've been following both BuildStream and Bazel, and am seeing some
> potential overlaps where we could look into leveraging the collective
> experience and brain power, rather than solve it individually.
>
> I'm interested in distributed builds in BuildStream.  Recent
> developments around Bazel (http://bazel.build/), specifically Build
> Farm (https://github.com/bazelbuild/bazel-buildfarm/) lead me to
> believe there are some common patterns.

So, distributed builds have been a recurring and hot topic, and there
are some mixed opinions floating around: I think Emmet is usually in
favor of leveraging something external to do it, while I have been in
favor of (mostly) rolling our own.

My objective is to find the best middle ground which doesn't result in
a needlessly complex system - where complexity here is measured as the
total count of possible points of failure, not as lines of code in the
BuildStream repo. I think you and I already mostly agree on this.

I also like the way you have raised this; there is a clearly defined
goal which we can design a solution for while carefully weighing the
cost against the benefits of various approaches.

At the risk of getting too technical too soon, I'll try to lay out some
of my initial thoughts so that we have a head start - but before that I
should outline a very simplistic and abstract draft for context:

  BuildStream already has an artifact cache server and a scheduler.

  The least complex machinery I can think of is to just allow the
  BuildStream CLI to act both as a master and as a slave; where slaves
  can be run remotely with permission to access the same artifact
  cache. The master need only assert that dependencies of a given
  element be satisfied in the remote cache atomically, before
  dispatching a job to an available slave.

  Slave implementation is as simple as running `bst` to build the
  desired element; since dependencies are already cached, no time is
  wasted downloading sources that are not needed. Logging needs to
  forward messages back to the master in order to aggregate a nice
  master session log.

  Individual logs for failed builds are trickier, *but* we already
  have issue #76 on the roadmap, "Cache Failed Builds", which, once
  implemented, makes this detail simple again anyway.
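
To make the draft above a little more concrete, here is a minimal
sketch of the master-side check only. It is not BuildStream code: the
function name, the set standing in for the remote artifact cache, and
the worker names are all hypothetical, and it ignores the atomicity
requirement (nothing stops an artifact from expiring between the check
and the dispatch).

```python
from collections import deque


def dispatch_ready(elements, deps, remote_cache, idle_workers):
    """Pair buildable elements with idle workers (hypothetical helper).

    elements:     element names, in the order the scheduler prefers
    deps:         dict mapping an element to its build dependencies
    remote_cache: set of elements whose artifacts are in the shared cache
    idle_workers: names of remote `bst` instances waiting for a job
    """
    assignments = []
    idle = deque(idle_workers)
    for element in elements:
        if not idle:
            break  # no worker left to take another job
        if element in remote_cache:
            continue  # already built, nothing to dispatch
        # Only dispatch once every dependency is available remotely,
        # so the worker can start immediately from cached artifacts.
        if all(dep in remote_cache for dep in deps.get(element, ())):
            assignments.append((element, idle.popleft()))
    return assignments


deps = {"base": [], "lib": ["base"], "app": ["lib", "base"]}
jobs = dispatch_ready(["base", "lib", "app"], deps, {"base"}, ["w1", "w2"])
```

With only "base" in the cache, "lib" is handed to the first worker and
"app" has to wait until "lib" has been built and pushed.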

Seeing as scheduling builds on remote machines is staggeringly similar
to what we already do with local processes, you can understand my
reluctance to make the whole process doubly complex by using an
off-the-shelf turnkey solution (something I think Emmet has been a
proponent of thus far).

That said, other solutions which externalize the problem of distributed
building entirely can also be interesting as a way to set up barriers
against feature creep - we should still keep both avenues in mind.


Now for some sample thoughts to chew on, in advance of a more thorough
conversation:


  Remote execution API
  ~~~~~~~~~~~~~~~~~~~~
  Here is something that we clearly lack in the above picture,
  so this *seems* to be a clear win should this be a suitable
  external dependency (see further below for an elaboration on
  this).

  In general, I am optimistic that the software you propose will
  probably check these boxes nicely, and feel that it's desirable
  to use something external for this.


  Content Addressable Storage
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~
  Here we already have an artifact cache and server, so I would avoid
  adding yet another one unless it either satisfies some real use
  case, or runs well on multiple platforms, potentially allowing us
  to replace the multiple artifact caches we have with a single one.

  The artifact storage facility is an entirely private detail of
  BuildStream and can have its implementation swapped without issue.

  Of course, the same concerns about the suitability of a dependency
  need to be examined as with anything external.

  There may be some benefit to adopting this part, but it should
  be demonstrated and in this case it should probably supersede
  something that we already have.

  This could be any of the following, but ideally all three:

    o Local ostree cache for Linux.

    o Local tarball cache for non-Linux fallback platforms.

    o Artifact cache remote server. Needless to say, if this is
      a remote service thing, it should still have a compatible license
      such that anyone can build it and install it on their own
      hardware.
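
As a rough illustration of why swapping the storage implementation is
cheap when it sits behind a private interface, here is a small sketch.
None of this is BuildStream's or Bazel's actual API; `ArtifactStore`
and `InMemoryCAS` are hypothetical names, and SHA-256 is an assumed
choice of digest.

```python
import hashlib
from abc import ABC, abstractmethod


class ArtifactStore(ABC):
    """Hypothetical private interface the rest of the tool would use."""

    @abstractmethod
    def put(self, data: bytes) -> str:
        """Store a blob and return the key needed to fetch it again."""

    @abstractmethod
    def get(self, key: str) -> bytes:
        """Return the blob stored under key."""


class InMemoryCAS(ArtifactStore):
    """Toy content addressable store: the key is the digest of the data."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Identical content always maps to the same key, so storing the
        # same blob twice does not take any extra space.
        self._blobs[digest] = data
        return digest

    def get(self, key: str) -> bytes:
        return self._blobs[key]


store = InMemoryCAS()
key = store.put(b"build output")
same_key = store.put(b"build output")
```

An ostree backed, tarball backed or remote implementation could sit
behind the same two methods without callers noticing the difference.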


Dependency Suitability
~~~~~~~~~~~~~~~~~~~~~~
In the above I referred vaguely to dependency suitability. To clarify,
some aspects I would weigh when judging how suitable a dependency is
include:

  o LGPLv2 Compatible License.
  o Has reasonably small set of dependencies itself; contributing to
    overall repeatability of your setup on a modern distro in 10
    years time.
  o Has a very stable API, being a responsible citizen towards
    its downstream consumers.
  o Is an isolated piece of software which does one thing well.
  o Requires no additional setup by the user (BuildStream can
    configure it using API, but the user need only install and
    configure one thing).


Cheers,
    -Tristan


