How do we store/install apps?

From: Alexander Larsson <alexl redhat com>
To: gnome-os-list <gnome-os-list gnome org>
Subject: How do we store/install apps?
Date: Fri, 10 Oct 2014 13:52:05 +0200
So, I've got some kind of initial runtime going, and it's now time to
look at how we want to package these runtimes/apps. There are a few
requirements, and a bunch of nice-to-haves.

This is what we absolutely require:

* Some kind of format for an application that is delivered over the
  network. This will contain metadata + content (a set of files).

* A format for the application when installed on a system. This has to
  be done in such a way that we can access content via the normal
  kernel fs syscalls.

There are also various things that would be nice to have. Depending on
the implementation we may be able to fulfil a subset of these.

* An efficient delta format for updates that share lots of data.

* Efficient multi-version storage in installed form. e.g. multiple
  installed versions of the same app would share disk/ram for shared
  files.

* Atomic install/uninstall. If the installed form of the app is a single
  file, then installation is a single write and removal is a single rm.
  Additionally, if the app is in use, removing the file will keep the
  in-use version of the app running (since the inode is not freed until
  its last use). This is a very useful feature because we never end up
  with half-installed apps.

* Installation that does not require root. It would be nice if a user
  could just download an app and run it without needing root.

* Minimal need for setuid helpers.

* Don't pass untrusted data to the kernel. For instance, it is risky
  to download raw filesystem data and then mount that, or mount a
  loopback file that the user can modify. The raw filesystem data is
  directly parsed by the kernel and weird data there can cause kernel
  panics.

* Lazy integrity checks. If we download a file we can always run a checksum
  on it to verify that the download is ok, and that nothing modified the
  data. However, this is costly to do up-front. Some filesystems verify
  checksums as each file is read, which avoids a large check initially.

* Trusted signature checks. This is similar to the integrity checks, but even
  more powerful, as it verifies not only integrity, but also trust. If we
  enroll some kind of key in the BIOS, we can then securely inherit it all
  the way down to the app and have the kernel verify the file is trusted.
  This is not only more efficient than signature verification up-front, but
  also more secure, as it detects changes to the files post-install.

* Easy to export files to the host for integration. For instance, if
  an installed app includes a desktop file and icon we need to be able
  to read those files from the desktop. If this is an easy operation
  that doesn't require mounting a filesystem or parsing a specialized
  file format that is a plus.
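To make the atomic install/uninstall item concrete, here is a minimal Python sketch, assuming a POSIX system and a single-file bundle that is published with the usual write-to-a-temp-file-then-rename(2) pattern. The function names and the bundle filename are hypothetical. It also demonstrates the point about in-use apps: a process that still holds the old file open can keep reading it after the file has been replaced and removed.

```python
import os
import tempfile


def install_bundle(data: bytes, dest: str) -> None:
    """Atomically install (or replace) a single-file app bundle at dest (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(dest))
    # Write to a temp file in the same directory: rename(2) is only atomic
    # within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".install-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # contents are on disk before we publish them
        # os.replace() maps to rename(2): other processes see either the old
        # bundle or the new one, never a half-written file.
        os.replace(tmp_path, dest)
    except BaseException:
        os.unlink(tmp_path)
        raise


def uninstall_bundle(dest: str) -> None:
    # A single unlink(2). Processes that still have the file open keep the
    # inode alive until their last reference is closed.
    os.unlink(dest)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "org.example.App.bundle")  # hypothetical name
        install_bundle(b"version 1", path)
        running = open(path, "rb")          # stands in for a running instance
        install_bundle(b"version 2", path)  # atomic upgrade
        uninstall_bundle(path)              # atomic removal
        print(running.read())               # b'version 1': old inode still readable
        running.close()
```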

For the network format we're not very constrained, so I think the
design here revolves mostly around how an app looks in installed form,
and the network form will follow naturally from that. So, what are the
alternatives here?

* Regular directory

  We require an install phase that explodes the app bundle into
  separate files.

  For multi-version storage we can use hardlinks which results in
  sharing both disk and page cache between versions at a file-granular
  level.

  Installing and mounting are doable as non-root, don't pass untrusted
  data to the kernel and, once done, allow easy access to exported files.

  However, installation is not atomic, and there is no lazy checking
  of checksums or signatures.

* Download filesystem images and loopback mount

  In this model the app is a single file containing both metadata and
  a filesystem image. The filesystem image can be mounted as loopback
  directly from the app file, given just the offset and the length.

  Installation/removal is atomic, so the app is never in a partially
  installed state, and removal/replacement of the file doesn't bother
  actively running instances, as the inode will not be freed until
  the last mount goes away.

  However, you have to be root to do the loopback mount (or use a setuid
  helper), and loading an untrusted fs image into a kernel is pretty
  risky.

  In a naive approach there is no sharing of data between different
  installed versions of the same app, but there are approaches
  like device-mapper or btrfs loopback images with snapshots that
  can give you disk-space sharing (but not page cache sharing).

  If the filesystem used supports integrity checking (like btrfs), that
  can be used as well.

  Exporting files to the host requires either that all installed
  apps are mounted, or that we explode files from the filesystem
  at install time.

* Create filesystem images locally

  This is similar to the above approach, except we create the filesystem
  from the data files at install time rather than using a pre-created
  filesystem image. This can be done easily in userspace with e.g.
  the squashfs tools to create a filesystem.

  This requires an extra step, but on the other hand it lowers the risk
  of passing untrusted data to the kernel. That said, it still requires
  trust in the filesystem creation tool, and that the user doesn't modify
  the filesystem image once created.

* btrfs volumes

  If the filesystem where we're installing the app is btrfs (either natively
  or via a loopback mounted file) we can install the apps in subvolumes.
  If the root is btrfs this is easy, but the loopback mounted case is pretty
  tricky, as it requires resizing the loopback when needed, etc.

  This is similar to exploding the files, but we can use subvolume
  snapshots to share data between different versions of an app. This
  will share disk space, but not page cache.

  Removal of apps is atomic, although you can't remove a btrfs volume
  until it's no longer mounted (i.e. the app is not in use anymore).

  Also, btrfs volume removal requires root rights, as does mounting a
  loopback btrfs image, so some level of setuid helper is needed.

  btrfs also has an interesting feature where you can btrfs-send a
  subvolume, which creates a file describing the diff between the parent
  volume and the subvolume. This can then be applied with
  btrfs-receive, which is a userspace app that applies a set of file
  ops to convert the parent to the new child state. This is, imho, not
  super interesting for our use case. Btrfs-send is rarely what you
  want anyway, as a newly built version of an app is built from scratch
  and not based on the previous version. One can use rsync to
  create a new subvolume based on the old one, but then you're using
  rsync, not btrfs-send, to generate the diffs.
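For the loopback-mounted single-file bundle, locating the filesystem image only needs an offset and a length. The Python sketch below assumes a made-up layout (a 4-byte magic, two little-endian 64-bit lengths, JSON metadata, zero padding up to a 4096-byte boundary, then the image); none of this is an existing format, and all names are hypothetical. It only builds the util-linux losetup command rather than running it, since attaching the loop device still needs root or a setuid helper.

```python
import json
import struct
from typing import List, Tuple

MAGIC = b"APPB"                  # hypothetical 4-byte magic
HEADER = struct.Struct("<4sQQ")  # magic, metadata length, image length
ALIGN = 4096                     # image start padded to this boundary (assumption)


def pack_bundle(metadata: dict, image: bytes) -> bytes:
    """Build a bundle in the hypothetical layout: header, metadata, padding, image."""
    meta = json.dumps(metadata, sort_keys=True).encode()
    head = HEADER.pack(MAGIC, len(meta), len(image)) + meta
    pad = (-len(head)) % ALIGN
    return head + b"\0" * pad + image


def locate_image(bundle: bytes) -> Tuple[dict, int, int]:
    """Return (metadata, image offset, image length) for a packed bundle."""
    magic, meta_len, image_len = HEADER.unpack_from(bundle, 0)
    if magic != MAGIC:
        raise ValueError("not an app bundle")
    meta_start = HEADER.size
    metadata = json.loads(bundle[meta_start:meta_start + meta_len])
    offset = meta_start + meta_len
    offset += (-offset) % ALIGN  # same padding rule as pack_bundle
    if offset + image_len > len(bundle):
        raise ValueError("truncated bundle")
    return metadata, offset, image_len


def losetup_command(bundle_path: str, offset: int, length: int) -> List[str]:
    # util-linux losetup: attach only the image slice of the bundle, read-only.
    return ["losetup", "--find", "--show", "--read-only",
            f"--offset={offset}", f"--sizelimit={length}", bundle_path]


if __name__ == "__main__":
    bundle = pack_bundle({"id": "org.example.App"}, b"\0" * 8192)  # fake image
    meta, offset, length = locate_image(bundle)
    print(meta["id"], offset, length)  # org.example.App 4096 8192
    print(" ".join(losetup_command("org.example.App.bundle", offset, length)))
```

A real implementation would read the header from the file rather than loading the whole bundle into memory; bytes are used here only to keep the sketch short.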

I personally very much appreciate the atomicity of the loopback
mounted single-file app bundle, but given what I've written above,
especially the risks of pushing untrustworthy data to the
kernel, I feel that the simple approach of just exploding the files
is probably the best.

Even that approach has several options. For instance, one could have a
common repository with files that have filenames based on (say) the
sha1 hash of the content, and then each app could hardlink from
those. Or one could have completely separate trees for each app which
are only hardlinked when we do an incremental update and keep the old
version. I'm probably favouring the latter, as there is the remote risk
of hash collisions that could let you attack another app, and because
there is unlikely to be much sharing between unrelated apps anyway,
so it won't give you much.
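A rough Python sketch of the separate-trees option: each version gets its own tree, and on an incremental update any file whose content is identical to the same path in the previous version is hardlinked instead of written again. It assumes new_files is an in-memory mapping from relative path to content, a hypothetical stand-in for an unpacked update. Comparing bytes at the same path, rather than looking files up by hash in a shared repository, sidesteps the collision concern. Hardlinked files must be treated as read-only, since writing to one changes every version that shares it.

```python
import os
import tempfile
from typing import Dict, Optional


def _same_content(path: str, content: bytes) -> bool:
    # Byte-for-byte comparison; no hashes involved, so no collision risk.
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        return f.read() == content


def install_version(new_files: Dict[str, bytes], new_root: str,
                    old_root: Optional[str] = None) -> int:
    """Populate new_root from new_files (hypothetical unpacked update).

    Files identical to the same path under old_root are hardlinked, so both
    versions share disk blocks and page cache. Returns how many were linked.
    """
    linked = 0
    for rel, content in new_files.items():
        dest = os.path.join(new_root, rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        old = os.path.join(old_root, rel) if old_root is not None else None
        if old is not None and _same_content(old, content):
            os.link(old, dest)  # same inode as the previous version's file
            linked += 1
        else:
            with open(dest, "wb") as f:
                f.write(content)
    return linked


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        v1, v2 = os.path.join(d, "1.0"), os.path.join(d, "1.1")
        install_version({"bin/app": b"v1 code", "share/icon.svg": b"<svg/>"}, v1)
        n = install_version({"bin/app": b"v1.1 code", "share/icon.svg": b"<svg/>"}, v2, v1)
        print(n, "file(s) shared")  # 1 file(s) shared
```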

What do people think of the various approaches here? Did I miss any
interesting options? I will probably start looking into a more detailed
proposal for how an "explode-files-on-install" approach could work,
including how it looks when delivered as a file (full and
incremental).
Follow-Ups:
- Re: How do we store/install apps?
  - From: Greg KH
- Re: How do we store/install apps?
  - From: Colin Walters
- Re: How do we store/install apps?
  - From: Lennart Poettering