Re: Fwd: wip/new-model review

On Fri, Jul 19, 2013 at 1:14 PM, Colin Walters <walters verbum org> wrote:
On Thu, 2013-07-18 at 15:43 -0400, Jasper St. Pierre wrote:

>
> I think this is wrong. Every single task, including from resolve to
> zdiff, knows about a general concept of a "build".

Ok, that's possible.

> The thing that I want to do is remove this central lookaside of
> "builds" that everything pulls from, and require the developer
> commands to specify which build they're working on.

I guess I'd envisioned entirely separate directories for multiple builds.
Something like:

~/builds/ostbuild/buildmaster/builds/20130719.0/
~/builds/ostbuild/buildmaster/builds/20130719.1/
~/builds/ostbuild/3.10/builds/20130710.5/
~/builds/ostbuild/tmp-bug728719/builds/20130719.0/

I want to clarify again: the concept of a "build" is a well-defined directory with very specific files in it (the most important one being the snapshot). In the implementation I'm using, the autobuilder names its builds based on the time and date, like "20130719.5", but that's just a naming scheme the autobuilder uses.
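
For concreteness, a single build directory under this model would look roughly like this (just a sketch; the exact contents are the inputs and outputs described further down):

    20130719.5/
        snapshot.json   # the complete description of what to build
        last-build/     # symlink to a previous build; purely an optimization
        src/            # symlink to the resolved source repos
        repo/           # symlink to the shared ostree repo (the output)
        <log files, etc.>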

We can certainly make the autobuilder or something else have layers on top of this so it can track multiple branches at a time and put them in whatever directory structure we want.

And then we have one explicit toplevel cache directory:
~/builds/ostbuild-cache/

And one explicit toplevel source directory:
~/builds/ostbuild-src

This is like how OpenEmbedded's bitbake works; the cache is SSTATE_DIR,
and the src is DL_DIR.

One neat thing that bitbake can do (as well as Baserock) is pull cached
binaries from a network server.
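
For reference, in bitbake that's controlled by the SSTATE_MIRRORS variable; something like this in local.conf, with a made-up server URL:

    SSTATE_MIRRORS ?= "file://.* http://sstate.example.com/sstate-cache/PATH"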

Yeah, "convention vs. configuration" and all.

Now at least bitbake will default to using SSTATE_DIR/DL_DIR inside your
build directory.  Maybe this is wrong though, and we should emphasize
branching a bit more.  So the developer flow could look like this:

$ ostbuild make-build-directory ~/build/ostbuild
  # makes ~/build/ostbuild/{src,cache,repo}
$ cd ~/build/ostbuild
$ ostbuild make-build buildmaster
  # Copies in or symlinks manifest.json

Then as much as you want:

$ cd ~/build/ostbuild/buildmaster
$ ostbuild build
  # Reads/writes to both ~/build/ostbuild/{src,cache,repo} as well
  # as ~/build/ostbuild/buildmaster.
$ <hack hack hack>
repeat

The fundamental difference between this and your proposal is that you'd
have me type:

$ ostbuild build --from=buildmaster
$ <hack hack hack>
repeat.

Uh, well, not really. The build process takes inputs: snapshot.json, last-build/, src/, and produces binaries which are put into repo/, along with log files, etc. I see "last-build/" as an optimization that avoids wasting CPU on things we've already compiled. It's not strictly necessary; the build will just take a lot longer without it.

"--from=" was just one way I'd imagine you specifying what last-build/ points to. That was just one suggestion for high-level UI. Since I think we're focusing on this way too much, I'm going to refrain from using the "ostree" command in my command lines from here on out, and simply focus on the low-level way of creating a build. Of course there should be high-level UI for this, but I need to make sure the core concepts come across correctly.

If I had a completed build, let's say gnome-3.9.4/, and I wanted to modify gnome-settings-daemon to fix a bug, I'd create a new build. At a low level, this would be:

  # Here's our "build" directory. Doesn't matter where we put it.
  $ mkdir gnome-3.9.4-fixed/

  # So the build task knows what to build
  $ cp snapshot-with-fixed-gnome-settings-daemon.json gnome-3.9.4-fixed/snapshot.json

  # So we don't rebuild the entire world, and only rebuild what's changed
  $ ln -s ../gnome-3.9.4 gnome-3.9.4-fixed/last-build

  # Our source repos, all up to date and resolved
  $ ln -s /path/to/src/repos gnome-3.9.4-fixed/src

  # Our ostree repo, the "output" of the build process
  $ ln -s /path/to/ostree/repo gnome-3.9.4-fixed/repo

And that's it. Now I can point the "build" task at that directory; it will run "make" and everything else, eventually writing binaries to the repo along with some log files.

I want to make sure you understand that. The autobuilder, of course, does all of this automatically when it sets up a build, and we can make the high-level UI do whatever we want. We can pass the build directory in on the command line, work from the cwd, or even enforce a directory structure.

It might seem a bit flippant to say this, but I really don't care what the high-level UI is. We can figure out later whether we want a git-like one based on the CWD, or one based on command-line arguments, config files, or envvars.

Getting the right low-level plumbing is important to me.

But typing the --from seems like it'd get annoying fast...

> I want to basically take "build management" out of the tasks and into
> the invoker -- in this case, the autobuilder has its own idea of how
> to manage these builds (with the builds/, tasks/, results/ structure
> we talked about in-person, which I should probably write up somewhere)

Yeah, we need that written down.  Specifically, what happens to, say, the
repo?  If we split that into separate directories, then we lose the
advantage of content-addressed storage between the builds, and branches
would no longer be cheap.

This directory structure is only about how the autobuilder manages and arranges its builds. It's not relevant to the core build process, which is based around the builds themselves. It's possible we might want to steal some of this for a local workflow, but again, we can tack on high-level UI later and decide then whether to enforce a directory structure like this for build management.

Anyway, here's the directory structure:

 * builds/ contains all the builds the autobuilder will ever do

Browsing from the web, you'll see a directory that has:

   20130720.2/
   20130720.1/
   20130720.0/
   20130719.1/
   20130719.0/
   20130718.2/
   20130718.1/
   20130718.0/

... all the way back through time.

That's it. Each of these directories contains everything we need to build. You could, if you wanted to, reproduce one of these builds locally.

    $ mkdir my-gnome-build/
    $ mv snapshot.json my-gnome-build/
    # Assuming src repos are resolved, then just do the steps above
    # to symlink the inputs into the build dir, and then go for it.
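    # Illustrative paths, mirroring the gnome-3.9.4-fixed example above:
    $ ln -s /path/to/builds/20130718.0 my-gnome-build/last-build
    $ ln -s /path/to/src/repos my-gnome-build/src
    $ ln -s /path/to/ostree/repo my-gnome-build/repo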

Of course, you'd have to rebuild the world if you can't use the cached builds, and you might have to fetch the src/ repos, but you should (in an ideal world) end up with the same build the autobuilder created.

Right now, we keep the last 5 successful and last 5 failed results for each task. I want to change that so that builds/ contains an archive of builds. We don't need to keep disk images or screenshots or anything that's "huge", but I think keeping around snapshots and smoke/integration test results is important for archival purposes.

 * tasks/ contains symlinks to the current build for each task.

Browsing from the web, you'll see a directory that has:

    build/ -> ../builds/20130720.6/
    builddisks/ -> ../builds/20130720.2/
    integrationtest/ -> ../builds/20130720.2/
    smoketest/ -> ../builds/20130720.2/
    ...

When a task starts, it immediately resolves the symlink for its own task using realpath and works on that build exclusively. When a task finishes, the taskmaster notices and updates the next task's working directory. In this example, build finished, so we update builddisks/:

    # Build finished, so update the next task's working directory
    $ ln -sfn "$(readlink -f build)" builddisks

This is used to keep each task informed about what it's supposed to be building. Let's imagine a world where builddisks takes three hours, and a build takes half an hour. In this world, when builddisks starts, six more builds may finish before it does.

We don't want builddisks to get backed up, so whenever build finishes, we swap out the build, and thus the intermediate builds become *abandoned*. They'll clutter up the build server, but I don't think that's too important.

One of my immediate TODOs is to have a status.json for each build, which contains what status the overall build is (successful, failed, abandoned, in-progress), and the status for each task.
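
A sketch of what that status.json might look like (the field names here are just a first guess):

    {
        "status": "failed",
        "tasks": {
            "build": "successful",
            "builddisks": "successful",
            "smoketest": "failed",
            "integrationtest": "abandoned"
        }
    }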

For simplicity of web browsing, we should probably have an empty file called something like "STATUS: ABANDONED" so it's easy to tell at a glance what state a build is in :)

 * results/ is just a convenience for web browsing and API querying. It's not at all necessary for proper building. My initial branch doesn't contain this at all. We can change this as we want, too.

Browsing from the web, you'll see a directory that has:

    successful/
        index.json
        last-completed/ -> ../../builds/20130720.5/
        20130720.5/ -> ../../builds/20130720.5/
        20130720.2/ -> ../../builds/20130720.2/
        ...
    failed/
        index.json
        last-completed/ -> ../../builds/20130720.4/
        20130720.4/ -> ../../builds/20130720.4/
        20130720.3/ -> ../../builds/20130720.3/
        ...
    current/ -> ../builds/20130720.6/
    last-completed/ -> ../builds/20130720.5/

We can put whatever we imagine in here. It's just symlinks to the builds/ directory. I'm imagining that the web UI would simply query "last-completed/status.json" to know whether the last build was successful or failed, and how it failed.
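
For example (hypothetical server URL), that query from the web UI is just a static file fetch:

    $ curl -s http://build.example.com/results/last-completed/status.json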

> The issue with "resolve" is that it's not really part of a build. The
> only essential part of a build is the snapshot, and in the new model,
> it's the "id" of a build. If you have the snapshot, you have a
> complete description of a build, and can remake the rest of everything
> at any time. (ignoring things like network status, repositories moved,
> build dates in binaries, etc.)

Right.

> Right now, the autobuilder creates a skeleton dir, runs resolve, and
> if resolve says nothing changed, we remove the directory and don't
> bump the build number.

I definitely like that.

> Additionally, for disk space / network optimization reasons, we keep
> the git repos outside of a build, and symlink them into the build when
> the autobuilder creates it, along with the other build skeleton files
> (last build, ostree repo).

Ok right, so I guess what you're saying here is things are shared, but
it's kind of hidden because we symlink them.

But the problem with that is it doesn't solve the concurrency issue.
Like what happens if two "builddisks" try to run at the same time and
follow the symlink to the last-build qcow2 disk?  Then you have two
kernels writing to the same ext4 filesystem, and disaster will result.

Well, the issue is that the disk image is shared across builds, as an "optimization" to avoid copying it. It's impossible to use the same disk image for two different builds simultaneously and have them both produce correct results.

That said, we might be able to do some clever tricks. First, we need to understand that an ostree repo, like git, is content-addressed. If we ignore the "checkout" of the filesystem, then we're guaranteed that the same object path never refers to different contents.

This means that we can have one service that takes writes to a repo and imports them into a disk image. When we prune the repo, we prune the disk image, etc.

When we want to boot the disk image, we use libguestfs and make our ostree checkouts. Since this is entirely hardlinks, I assume this is very, very cheap, and will allow us to use qcow2 effectively.
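
For reference, a checkout from an ostree repo onto the same filesystem is mostly hardlinks into the repo's object store; at a low level it's something like:

    $ ostree --repo=/path/to/ostree/repo checkout gnome-3.9.4 /path/to/checkout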

Now we can boot a system, and then throw away the hardlinks overlay and test results overlays afterwards.

This means we can have concurrent writes, simply by serializing them through one writer that takes commands to synchronize the repo.
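
A crude low-level version of that, without a dedicated writer service, would be to just serialize the repo imports with a lock (paths illustrative):

    # One writer at a time imports a finished build's repo into the shared repo
    $ flock /path/to/shared-repo/.lock \
        ostree --repo=/path/to/shared-repo pull-local /path/to/build/repo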

(I sort of wonder if we can't push this further, and set up a jail for qemu where our native filesystem has read-only hardlinks to the repo contents on our own disk, and then we don't have to create a disk image at *all*)

Now OSTree is robust against both concurrent readers and concurrent
"adders".  However, to delete data from a repo (ostree prune), all
readers and writers need to be offline for that moment.

To handle this correctly, something like:

$ ostbuild make-build bug741328

would look in ~/ostbuild/cache/disks, lock out any readers, and do a
*copy* of the file.  Likewise we'd need to atomically "cp -al" the
buildroots and other cache data.

Or hmm.  Maybe we have a model where starting a build "steals" disks
from the cache, and then puts them back when done.
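
A minimal sketch of that steal-and-return flow (paths made up; mv is only atomic within one filesystem):

    # "Steal" the cached disk so no other build can touch it
    $ mv ~/ostbuild/cache/disks/current.qcow2 ./disk.qcow2
    # ... run the build against ./disk.qcow2 ...
    # Put the updated disk back
    $ mv ./disk.qcow2 ~/ostbuild/cache/disks/current.qcow2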

> I thought passing "last-build" in was fine, but for the other two, we
> should probably just load everything from a config file, similar to
> jhbuildrc (~/.ostbuildrc?)

Mmmm...that's the way it used to be, since I was modeling ostbuild after
jhbuild, but I switched to the bitbake model of "build directory is
context".  It's really been a lot saner.

Sure, we can make that decision too.

--
  Jasper

