gnome-continuous: speeding up "builddisks"



[ This mail is somewhat specific to gnome-continuous, but CC'ing the
  ostree mailing list, since the question of how to efficiently deploy
  ostree content into a VM is general ]

Speeding up builddisks
======================

The "builddisks" step of gnome-continuous is really slow. We optimize
it by reusing the last set of disk images and pulling the new
repository content into them, but it still takes a long time to create
the disks - about 20 minutes on build.gnome.org. I don't know how long
it takes to create them from scratch, but from local timings I'd
estimate somewhere around 100 minutes.

And yet this is at least 10 times slower than it needs to be. The
basic problem is that the 'guestfs' FUSE filesystem we use to export
the image to the host, so that we can run ostree commands on it, is
very slow at the sort of operations that 'ostree pull' and
'ostree deploy' do.
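
For concreteness, the current flow has roughly the shape of the sketch
below. This is only an illustration, assuming a guestmount-style FUSE
export - the disk path, partition, ref and repository locations are
made up rather than what the build code actually does:

    import subprocess

    disk = "disk-x86_64-runtime.qcow2"   # qcow2 the test VMs boot from
    mnt = "/tmp/continuous-mnt"          # FUSE mount point on the host
    ref = "gnome-continuous/buildmaster/x86_64-runtime"

    # Export the guest root filesystem to the host via FUSE.
    subprocess.check_call(["guestmount", "-a", disk, "-m", "/dev/sda3", mnt])
    try:
        # Pull the new commit into the repo inside the image, then deploy
        # it.  Both steps do an enormous number of small-file operations,
        # and every one of them round-trips through the FUSE layer.
        subprocess.check_call(["ostree", "--repo=" + mnt + "/ostree/repo",
                               "pull-local", "build-repo", ref])
        subprocess.check_call(["ostree", "admin", "--sysroot=" + mnt,
                               "deploy", "--os=gnome-continuous", ref])
    finally:
        subprocess.check_call(["guestunmount", mnt])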

The basic idea behind the current approach, as Colin explained it to me
at one point, is that he wanted to avoid loopback mounts on the host
because they couldn't be safely allowed for unprivileged users, since
filesystem code has not been well audited against untrusted
filesystems. (Obviously, we trust the filesystem code to handle USB
thumbdrives, but at least there we're restricting potential exploits
to local users.)

So this means that we should be accessing the image filesystem through
a VM, which also has the advantage of letting us work directly with a
qcow2 image rather than having to do a conversion step later. If the
guestfs approach doesn't work, what options are there?
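
For reference, the conversion step being avoided here is essentially a
raw-to-qcow2 copy at the end of the build - roughly the following,
with illustrative file names:

    import subprocess

    # If we manipulated a raw image on the host (e.g. through a loopback
    # mount), we would still have to convert it into the qcow2 that the
    # test VMs actually boot from:
    subprocess.check_call(["qemu-img", "convert", "-f", "raw", "-O", "qcow2",
                           "disk-x86_64-runtime.img",
                           "disk-x86_64-runtime.qcow2"])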

 1) Do the ostree operations outside the VM on the host
    filesystem, then copy the entire result into the disk image.

    The main problem with this is one of filesystem ownership: as
    non-root, we can't create files with the ownership that will be
    needed inside the VM. It would be possible to enhance ostree to
    write ownership as xattrs and then convert when copying into the
    VM, or even to have ostree directly stream a tarfile to be unpacked
    in the VM, but either would be significant work.

    A secondary issue is that this only applies to the initial
    creation of the VM, and doesn't allow for optimizing updates.

 2) Do the ostree operations inside a VM with the image mounted.

    The issue here is that we need an operating system with ostree
    in it to do the deployment - a bootstrapping problem: we need
    an operating system to make an operating system. Using a fixed
    non-GNOME-continuous operating system image for this would be
    possible, though with some potential for ostree version skew.

    But the approach that I prototyped is creating a
    "bootstrap" image by checking out the destination ostree,
    copying it into the VM at / instead of as a proper ostree
    deployment, then doing the necessary fixups to make that boot.
    (E.g., adding an fstab that mounts a tmpfs on /var; see the
    checkout sketch after this list.) This is a bit hacky, but
    not too bad.

    (Another approach would be to do this from the initramfs, but
    getting ostree into the initramfs would require a gigantic
    pile of libraries.)

    Once you have that bootstrap image, it can pull from the repository
    on the host over HTTP, or via a 9p export of the repository (see
    the 9p sketch after this list).
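
To make the bootstrap-image fixups concrete, here is a rough sketch of
the checkout-and-fixup step. The repo path, ref and fstab contents are
illustrative, and the part where the tree is copied into a bootable
disk image with a kernel and syslinux is omitted:

    import os
    import subprocess

    repo = "build-repo"
    ref = "gnome-continuous/buildmaster/x86_64-runtime"
    rootfs = "bootstrap-root"

    # Check the target tree out as a plain directory tree; -U ("user
    # mode") so this works unprivileged - exact ownership matters little
    # for a throwaway bootstrap image that only needs to run ostree.
    subprocess.check_call(["ostree", "--repo=" + repo,
                           "checkout", "-U", ref, rootfs])

    # Fix the plain checkout up so it can boot from /, e.g. an fstab
    # that mounts a tmpfs on /var.
    with open(os.path.join(rootfs, "etc/fstab"), "w") as f:
        f.write("tmpfs  /var  tmpfs  defaults  0 0\n")

    # ... then copy rootfs into the bootstrap disk image at / and set
    # up syslinux so the image boots (omitted).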
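
And for the 9p variant: the host boots the bootstrap image with the
build repository exported over virtio-9p, and the guest mounts the
export and pulls from it locally. Again only a sketch - the mount tag,
device names, paths and the rest of the qemu command line are
illustrative:

    import subprocess

    # Host side: boot the bootstrap image with the target disk attached
    # and the build repo exported as a virtio-9p share tagged "buildrepo".
    subprocess.check_call([
        "qemu-system-x86_64", "-enable-kvm", "-m", "1024", "-nographic",
        "-drive", "file=bootstrap.qcow2,if=virtio",
        "-drive", "file=disk-x86_64-runtime.qcow2,if=virtio",
        "-virtfs", "local,path=build-repo,mount_tag=buildrepo,"
                   "security_model=none",
        # ... plus whatever arranges for the deploy script below to run
        # inside the guest (serial console, kernel command line, etc.).
    ])

    # Guest side, as run by the bootstrap image's deploy script:
    #
    #   mount /dev/vdb3 /sysroot
    #   mount -t 9p -o trans=virtio,version=9p2000.L buildrepo /mnt/repo
    #   ostree --repo=/sysroot/ostree/repo pull-local /mnt/repo \
    #       gnome-continuous/buildmaster/x86_64-runtime
    #   ostree admin --sysroot=/sysroot deploy --os=gnome-continuous \
    #       gnome-continuous/buildmaster/x86_64-runtime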

Timings below are for just the x86_64-runtime image. (Tested on my Core
i5 laptop, with the I/O going to an external SSD over USB 3.0.)

 Current method:
  Initial creation: 40m
  Small update (modified 45 files): 7m50s

 Operations outside, copy in (hacking around the file ownership issues):
  Initial creation: 1m41s
  Small update: N/A

 Operations inside, pull via 9p:
  Initial creation: 2m1s
  Small update: 37s

 Operations inside, pull via http (lighttpd, bridged networking):
  Initial creation: 2m0s
  Small update: 35s

Creating the bootstrap image from scratch takes 24s; there's no real
reason to recreate it except for changes to ostree or syslinux, though
not recreating at all would risk other aspects of it bit-rotting.

Other than the current method, everything is pretty close. Pulling via
9p is significantly easier to set up than the HTTP version, since it
avoids the complications of setting up networking - with user-space
networking instead of bridged, I wouldn't expect the HTTP timings to
be competitive, though I didn't actually measure that.

Does this seem like something that is worth pursuing? Do people have
any other ideas for improvements?

Thanks,
Owen

