Re: linux-user-chroot: Mounting more stuff inside the chroot

On 04/06/15 21:42, Colin Walters wrote:
On the linux-user-chroot side of things, this means making it possible
to mount more stuff inside the chroot. Previously we would call 'mount'
in the build tool, but that obviously requires 'root'. My first step was
to allow mounting a tmpfs at /dev/shm. Attached is a patch that adds a
--mount-tmpfs option, similar to --mount-proc.

Do you gain anything here over:

tempdir=$(mktemp -d /run/user/$(id -u)/l-u-c.shm.XXXXXXXXXX)
linux-user-chroot --mount-bind "$tempdir" /dev/shm

The advantage of this approach is that the storage lives in
space already writable by the user.  I'm not aware right now
of any support for quotas on tmpfs, but were it to be implemented,
it would transparently Just Work with the logged in session, whereas
if l-u-c mounted a tmpfs, the accounting to the uid would be lost.

We could take the code to do the mkdtemp inside l-u-c, but since
it can be done outside of it, it seems like something better to just

Good idea, I hadn't thought of that. I'm actually wrapping linux-user-chroot in a little Python library called sandboxlib (I know, "solving" problems by adding new abstraction layers...), so I can put this mktemp code in there.
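
A minimal sketch of what that could look like in a Python wrapper. The function names (make_shm_dir, chroot_argv) are hypothetical and not part of sandboxlib's actual API; the fallback to the system temp directory when /run/user/<uid> is missing is also an assumption. Only the --mount-bind SOURCE DEST option discussed above is used, and the code only builds the command line rather than running it.

```python
import os
import tempfile


def make_shm_dir(base=None):
    """Create a private, user-owned directory to stand in for /dev/shm.

    By default this mirrors the mktemp call above and uses the per-user
    runtime directory. Falling back to the system temp directory is an
    assumption for hosts that have no /run/user/<uid>.
    """
    if base is None:
        base = os.path.join("/run/user", str(os.getuid()))
    if not os.path.isdir(base):
        base = tempfile.gettempdir()
    # mkdtemp creates the directory with mode 0700 and a random suffix.
    return tempfile.mkdtemp(prefix="l-u-c.shm.", dir=base)


def chroot_argv(rootdir, command, shm_dir):
    """Build (but do not run) a linux-user-chroot command line."""
    return (
        ["linux-user-chroot", "--mount-bind", shm_dir, "/dev/shm", rootdir]
        + list(command)
    )
```

The caller would pass the resulting list to subprocess and remove the directory once the build finishes, so the cleanup stays outside linux-user-chroot as well.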

One option is to bind-mount /dev from the host, which is already
possible with --mount-bind. This seems like a hole in build
reproducibility, though: a ./configure script could conceivably change
its behaviour based on the presence of a file in /dev, or something. I
really want to control what is in /dev more tightly.

Continuous uses the host's /dev; I'm a bit sceptical that there's
any (reasonable) program that would be attempting to access the host physical
devices to *build*.  Aside from /dev/{u,}random of course, which
you already have to think about how to handle for reproducibility.

The mindset we've always had for reproducibility is "everything must always be reproducible", i.e. try to force it for the unreasonable programs as well. But it's only now that we've actually started to focus on having things built bit-for-bit the same each time. It seems to me that unless we run each build in an identical VM where the clock doesn't tick, there'll always be *some* way of introducing uncertainty into a build... but we'll see where our current efforts end up :)

One idea I had was to add a new option like '--standard-mounts' or
'--mount-systemd-container-interface' to linux-user-chroot that would do
exactly what systemd-nspawn does. It seems good to be aligned with an
existing spec.

The above said, I'd definitely take a patch for the above, it should
be pretty simple and safe.  I *do* like the idea that I could cut off
access to e.g. the GPU from software I'm building, even if udev
has granted my id access to `/dev/dri/card0`.

OK, good to know. I'll see if I can sell this as a good idea to everyone else!
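
Until such an option exists, a wrapper could approximate a restricted /dev with the existing --mount-bind option by exposing only a short whitelist of device nodes. The sketch below rests on several assumptions: the whitelist is illustrative rather than a copy of what systemd-nspawn sets up, dev_mount_args is a hypothetical helper, and it presumes that --mount-bind accepts individual device nodes as targets and that matching mount points already exist inside the sandbox root.

```python
import posixpath

# Illustrative whitelist only; anything not listed (e.g. /dev/dri/card0)
# would simply never be bound into the sandbox.
DEVICE_WHITELIST = ("null", "zero", "full", "random", "urandom", "tty")


def dev_mount_args(devices=DEVICE_WHITELIST, host_dev="/dev"):
    """Return --mount-bind arguments exposing only the listed devices."""
    args = []
    for name in devices:
        source = posixpath.join(host_dev, name)
        dest = posixpath.join("/dev", name)
        args += ["--mount-bind", source, dest]
    return args
```

Note that this still passes the host's /dev/random and /dev/urandom through, so it narrows what a build can see without solving the randomness question raised above.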

I'd like to hear thoughts on the changes I've proposed above, especially
which changes you think would be generally useful if added to
linux-user-chroot, and which changes sound like hacks that we might as
well keep to ourselves.

Though before we add *too* much more we should probably look
around and see whether there's any other projects that are doing
something similar that we could use instead.

There's a ton of container tooling out there.

Most other container frameworks like systemd-nspawn, Docker, and
rocket all support running code as uid 0, which dramatically
increases the attack surface. and xdg-app are conceptually closer to l-u-c,
although they both do a lot in particular
has a ton of functionality, and it's not really designed for
use by build systems.

One thing both of those do is use seccomp, which would also
make a lot of sense for l-u-c.

I should probably talk to Alex about sharing some code with
xdg-app for things like the seccomp blacklist.

So...I guess the conclusion is let's keep hacking on l-u-c but
keep pretty conservative about feature additions?

Great, I agree with that.

As I said above I've moved our existing sandboxing code into a standalone Python library: <>. This adds a new level of abstraction, but that means we can move away from linux-user-chroot if something better appears (it also means we can fall back to 'chroot' on non-Linux). So I'll only want to add code to linux-user-chroot if that's the best place for it.
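
To illustrate the kind of fallback meant here, a rough sketch of backend selection follows. It is not sandboxlib's actual code: pick_backend and the returned backend names are hypothetical, and the `which` and `platform` parameters exist only so the choice can be exercised without touching the real system.

```python
import shutil
import sys


def pick_backend(which=shutil.which, platform=sys.platform):
    """Choose a sandboxing backend for the current host.

    Prefer linux-user-chroot when running on Linux and the binary is on
    PATH; otherwise fall back to plain 'chroot', which needs root and
    provides much weaker isolation.
    """
    if platform.startswith("linux") and which("linux-user-chroot"):
        return "linux-user-chroot"
    return "chroot"
```

Keeping the decision in one place means that swapping in a different tool later only touches this function and the code that builds the command line for each backend.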

I've done a bit of digging around other tools as well. You're right that it's hard to find a container tool that doesn't require 'root' at present. Seems weird that nobody in the has visibly attacked that problem yet, but I guess it's quite intractable with all the networking and namespace setup that needs to be done. I do think the App Container spec has promise as it's so simple. I was happy to find out that 'rkt' wraps 'systemd-nspawn' as well...

I confess I've punted looking closely at xdg-app and so far. I had in my head that SECCOMP would be too limiting for a build system, but now I see there's a trick where you have a privileged thread communicating with the SECCOMP sandbox.

I've been collecting a list of 'related sandboxing projects' at <>. I'll be very surprised if 'sandboxlib' somehow grows to usefully wrap all possible sandboxing mechanisms, but hopefully the list will be useful to people. (I shall add a link to <>).

As I said above, probably the only 'perfect' solution for build sandboxing is to virtualise the entire machine. <> describes some work Intel have done with really fast-loading KVM containers. But that's x86 only, and it seems quite a way off being ready (I couldn't actually get the "Clear Containers" demo to work at all).

Thank you for your feedback!


Sam Thursfield, Codethink Ltd.
Office telephone: +44 161 236 5575
