Re: How do we store/install apps?



On Fri, 10.10.14 13:52, Alexander Larsson (alexl redhat com) wrote:

So, I've got some kind of initial runtime going, and it's now time to
look at how we want to package these runtimes/apps. There are a few
requirements, and a bunch of nice-to-haves.

This is what we absolutely require:

* Some kind of format for an application that is delivered over the
  network. This will contain metadata + content (a set of files).

* A format for the application when installed on a system. This has to
  be done in such a way that we can access content via the normal
  kernel fs syscalls.

I am pretty sure these two formats need to be very close to each
other; otherwise things like signatures that are checked on access
are really hard to do.

Also note that I want to keep an eye on the big picture. I want the
same delivery for the OS itself, as well as for OS containers. To me the
delivery of apps and their runtimes/frameworks is just one use case of
the scheme.

* Install that does not require root. It would be nice if a user could just
  download an app and not require root to be able to run it.

I am not really convinced that this is necessary, or even
desirable. I think installation of apps by normal users should be
permitted, but I am also very sure we should not completely open this
up. More specifically, I think having some PolicyKit-style check when
an app is installed or removed is a *good* thing.

Moreover, we need to think about updating schemes as well, and I think
those are better done in a single system service than individually by
unprivileged user code (which would be really nasty on multi-user
systems with many users). Hence: app
installation/uninstallation/update under privileged control is a good
thing, not a bad thing.
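To make the "single privileged service plus polkit check" idea a bit more concrete, here is a minimal Python sketch of the authorization step such a service might run before installing, updating or removing an app. It shells out to polkit's `pkcheck` tool, which exits with status 0 when the subject is authorized. The action ID `org.example.apps.install` and all function names are hypothetical, and how the service learns the caller's PID (for example from its D-Bus peer credentials) is left out.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical polkit action ID; a real service would define its own
# actions in a .policy file shipped with it.
INSTALL_ACTION = "org.example.apps.install"

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def _run(argv: Sequence[str]) -> subprocess.CompletedProcess:
    return subprocess.run(list(argv), capture_output=True, text=True)


def pkcheck_argv(pid: int, app_id: str) -> list[str]:
    """Ask polkit whether process `pid` may install `app_id`."""
    return [
        "pkcheck",
        "--action-id", INSTALL_ACTION,
        "--process", str(pid),
        # Extra key/value shown to the user in the authentication dialog.
        "--detail", "app_id", app_id,
        "--allow-user-interaction",
    ]


def authorize_install(pid: int, app_id: str, run: Runner = _run) -> bool:
    """Return True only if polkit authorizes the request (fail closed)."""
    try:
        result = run(pkcheck_argv(pid, app_id))
    except FileNotFoundError:
        # pkcheck not installed: deny rather than allow.
        return False
    return result.returncode == 0
```

The point of routing this through one service is that the policy (admin only, any local active user, and so on) lives in polkit configuration instead of being hard-coded into setuid binaries.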

* Minimal need for setuid helpers.

Well, true. But I think having a polkit-enabled privileged service is
a much better design than setuid helpers anyway, at least for the
installation. For execution I fear some minimal setuid code is not
avoidable though.

* Don't pass untrusted data to the kernel. For instance, it is risky
  to download raw filesystem data and then mount that, or mount a
  loopback file that the user can modify. The raw filesystem data is
  directly parsed by the kernel and weird data there can cause kernel
  panics.

Well, this is unavoidable if we ever want to allow fully signed
systems. I mean, again, I would not isolate the problem of app images
so much from the problem of OS images. I want to solve this at the
same time, as the problems with verification, distribution and so on
are pretty much the same. 

I also really don't believe that the kernel would be any worse at
verifying the structural integrity of images than userspace code...

* Regular directory

  We require an install phase that explodes the app bundle into
  separate files.

  For multi-version storage we can use hardlinks which results in
  sharing both disk and page cache between versions at a file-granular
  level.

  Install and mounting is doable as non-root, doesn't pass untrusted
  data to the kernel and once done allows easy access to exported files.

  However, installation is not atomic, and there is no lazy checking
  of checksums or signatures.

Also, the hardlink farms are certainly not pretty.
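For reference, a hardlink farm of the kind described above might look roughly like this minimal Python sketch. It assumes a hypothetical content-addressed layout, `store/objects/<sha256>`, where each installed version is an exploded directory whose files are hardlinks into the store, so identical files across versions share one inode. `LazyVerifier` is an equally hypothetical userspace approximation of "lazy checking": it re-hashes a file the first time it is opened. Nothing here is enforced by the kernel, which is precisely the gap noted above.

```python
import hashlib
import os
import shutil
from pathlib import Path


def _sha256(path: Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def _store_object(src: Path, objects: Path) -> Path:
    """Copy src into the object store under its SHA-256 name (once)."""
    obj = objects / _sha256(src)
    if not obj.exists():
        tmp = obj.with_name(obj.name + ".tmp")
        shutil.copyfile(src, tmp)
        # Read-only: a write through any one link would change every
        # installed version that shares this inode.
        os.chmod(tmp, 0o444)
        os.replace(tmp, obj)
    return obj


def install_version(build_dir: Path, store: Path, dest: Path) -> None:
    """Explode build_dir into a fresh dest directory. Every regular file
    becomes a hardlink to its object, so identical files in different
    versions share one inode (and therefore disk and page cache)."""
    objects = store / "objects"
    objects.mkdir(parents=True, exist_ok=True)
    for src in sorted(build_dir.rglob("*")):
        target = dest / src.relative_to(build_dir)
        if src.is_dir():
            target.mkdir(parents=True, exist_ok=True)
        elif src.is_file():
            obj = _store_object(src, objects)
            target.parent.mkdir(parents=True, exist_ok=True)
            os.link(obj, target)


class LazyVerifier:
    """Userspace stand-in for lazy checking: hash a file the first time it
    is opened and require it to still be the object named by that hash."""

    def __init__(self, store: Path):
        self.objects = store / "objects"
        self.verified: set[tuple[int, int]] = set()  # (st_dev, st_ino)

    def open(self, path: Path):
        st = path.stat()
        key = (st.st_dev, st.st_ino)
        if key not in self.verified:
            obj = self.objects / _sha256(path)
            if not obj.exists() or obj.stat().st_ino != st.st_ino:
                raise ValueError(f"content of {path} does not match the store")
            self.verified.add(key)
        return open(path, "rb")
```

Note that sharing happens only at whole-file granularity: a one-byte change produces a completely new object.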

* Download filesystem images and loopback mount

  In this model the app is a single file containing both metadata and
  a filesystem image. The filesystem image can be mounted as loopback
  directly from the app file, given just the offset and the length.

  Installation/removal is atomic, so the app is never in a partially
  installed state, and removal/replacement of the file doesn't bother
  actively running instances, as the inode will not be removed until
  the final mount is removed.

  However, you have to be root to do the loopback mount (or a setuid
  helper), and loading an untrusted fs image into a kernel is pretty
  risky.

  In a naive approach there is no sharing of data between different
  installed versions of the same thing, but there are approaches
  like devicemapper or btrfs loopback images with snapshots that
  can give you disk-space sharing (but not page cache sharing).

Oh god, devicemapper!
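The single-file model described above (metadata plus a filesystem image, mounted via a loop device at a given offset and length, installed atomically) can be sketched in Python as follows. The bundle layout (magic, two little-endian u64 lengths, metadata, image) is invented purely for illustration. `losetup_argv` only builds the command line; running it still needs root, as noted above. `install_bundle` shows why replacement is atomic and harmless to running instances: the new file is renamed over the old one, and anything still holding the old inode keeps it alive.

```python
import os
import struct
import tempfile
from pathlib import Path

# Hypothetical bundle layout (not an existing format):
#   8 bytes magic, u64 metadata length M, u64 image length N,
#   then M bytes of metadata, then N bytes of filesystem image.
MAGIC = b"APPBNDL1"
HEADER = struct.Struct("<8sQQ")


def pack_bundle(metadata: bytes, image: bytes) -> bytes:
    return HEADER.pack(MAGIC, len(metadata), len(image)) + metadata + image


def image_extent(bundle: Path) -> tuple[int, int]:
    """Return (offset, length) of the filesystem image inside the bundle."""
    with open(bundle, "rb") as f:
        magic, meta_len, img_len = HEADER.unpack(f.read(HEADER.size))
    if magic != MAGIC:
        raise ValueError("not an app bundle")
    offset = HEADER.size + meta_len
    if offset + img_len != bundle.stat().st_size:
        raise ValueError("truncated or oversized bundle")
    return offset, img_len


def losetup_argv(bundle: Path) -> list[str]:
    """Loop device covering only the image part (must be run as root)."""
    offset, length = image_extent(bundle)
    return ["losetup", "--find", "--show", "--read-only",
            "--offset", str(offset), "--sizelimit", str(length), str(bundle)]


def install_bundle(data: bytes, dest: Path) -> None:
    """Write a temp file next to dest, fsync it, then rename it over dest.
    Readers (or loop devices) holding the old file keep the old inode."""
    fd, tmp = tempfile.mkstemp(dir=dest.parent, prefix=".install-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, dest)
    except BaseException:
        os.unlink(tmp)
        raise
```

The header is 24 bytes, so for a 7-byte metadata blob the image starts at offset 31; that number, together with the image length, is all the loop setup needs.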

* btrfs volumes

  If the filesystem where we're installing the app is btrfs (either natively
  or via a loopback mounted file) we can install the apps in subvolumes.
  If the root is btrfs this is easy, but the loopback mounted case is pretty
  tricky, as it requires resizing the loopback when needed, etc.

  This is similar to exploding the files, but we can use the subvolume
  to share data between different versions of an app. This will share
  disk space, but not page cache.

  Removal of apps is atomic, although you can't remove a btrfs volume
  until it's not mounted anymore (i.e. the app is not in use anymore).

  Also, btrfs volume removal requires root rights, as does mounting a
  loopback btrfs image, so some level of setuid helper is needed.

  btrfs also has an interesting feature where you can btrfs-send a
  subvolume, which creates a file describing the diff between the parent
  volume and the subvolume. This can then be applied with
  btrfs-receive, which is a userspace app that applies a set of file
  ops to convert the parent to the new child state. This is, imho, not
  super interesting for our use case. btrfs-send is rarely what you
  want anyway, as a newly built version of an app is built from scratch
  and not based on the previous version. One can use rsync to
  create a new subvolume based on the old one, but then you're using
  rsync, not btrfs-send, to generate the diffs.

I absolutely disagree. Kay and I have been discussing this stuff with
the btrfs folks. The thing is that we want the signatures for the
files to be transferred in-line. While the signature stuff doesn't exist
for btrfs right now, the guys working on it are ensuring that the
signatures can be serialized from btrfs as part of the btrfs send/recv
image, and then deserialized again on the destination, while staying
fully valid.

Harald has been playing around with some build logic that makes sure
that rebuilt app updates are efficiently shipped as btrfs send/recv
streams, with stable inode numbers and stuff.

You know, this is explicitly something where we shouldn't reinvent the
wheel. It's quite frankly crazy to come up with a new serialization
format that contains per-file verification data and that then somehow
can be deserialized on some destination system back into the fs
layer...
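Harald's build logic isn't shown in this thread. As a rough illustration only, here is a Python sketch of how an update pipeline of this shape could be put together with the stock btrfs and rsync tools. It assumes the previous version already exists as a read-only subvolume. The function names and paths are hypothetical. The functions only build the command lists, which must run as root on a btrfs filesystem. The signature serialization discussed above does not exist yet and is not covered.

```python
import subprocess
from pathlib import Path


def build_update_cmds(store: Path, old: str, new: str,
                      build_dir: Path, delta: Path) -> list[list[str]]:
    """Turn read-only subvolume store/old into a new read-only subvolume
    store/new holding build_dir's contents, then write an incremental
    send stream relative to store/old into the file `delta`."""
    old_sv, new_sv = store / old, store / new
    return [
        # Writable snapshot of the previous version; shares all extents.
        ["btrfs", "subvolume", "snapshot", str(old_sv), str(new_sv)],
        # --inplace updates existing files without replacing their inodes,
        # keeping inode numbers stable; --no-whole-file forces rsync's delta
        # algorithm even locally, so fewer unchanged blocks get rewritten.
        ["rsync", "-a", "--delete", "--inplace", "--no-whole-file",
         f"{build_dir}/", f"{new_sv}/"],
        # btrfs send only accepts read-only subvolumes.
        ["btrfs", "property", "set", str(new_sv), "ro", "true"],
        # Incremental stream containing only the changes relative to the parent.
        ["btrfs", "send", "-p", str(old_sv), "-f", str(delta), str(new_sv)],
    ]


def apply_update_cmds(delta: Path, dest_store: Path) -> list[list[str]]:
    """Receiver side; the parent subvolume must already exist there."""
    return [["btrfs", "receive", "-f", str(delta), str(dest_store)]]


def run_all(cmds: list[list[str]]) -> None:
    """Execute the commands in order, stopping at the first failure."""
    for argv in cmds:
        subprocess.run(argv, check=True)
```

Starting from a snapshot of the old version, rather than from an empty subvolume, is what makes the send stream small. Files whose content didn't change keep their shared extents, so only the real differences end up in `delta`.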

What do people think of the various approaches here? Did I miss any
interesting option? I will probably start looking into a more detailed
proposal for how an "explode-files-on-install" approach could work,
including how it looks when delivered as a file (full and
incremental).

Keeping the big picture in mind I don't think any but the btrfs
approach (including btrfs send/recv) even comes close to what we want.

btrfs will not deliver from day #1 what we want (the signature stuff
is currently vaporware), but the path towards it is clear and somewhat
clean, and the guys hacking on it are friendly and helpful.

I know that the Red Hat fs crew hates btrfs like it was the devil, and
loves LVM/DM like it was a healthy project. But yuck, just yuck!

Lennart

-- 
Lennart Poettering, Red Hat

