I'm done with O_CLOEXEC



karaj,

For those unfamiliar with the issue: when a process is created on UNIX
via naive fork() and exec(), the default is that the process will
inherit all of the file descriptors of the parent.  This makes a lot of
sense for stdin, stdout and stderr, but almost nothing else.

This has been the cause of a lot of strange problems over the years. 
The typical example: a process will open a listener socket at some point
and sometime later will call a library function that does a naive
fork()/exec() of a helper process that hangs around past the lifetime of
the original program.  When you try to restart the first program, the
socket is still being held open by the helper and the new instance can't
bind it again.

There are two fixes to this problem.

The first one, which we have been pursuing during the past several
years, is to try to mark every file descriptor that we create as
O_CLOEXEC.  This is particularly fun in multi-threaded programs because
it means that we have a race between the creation of a file descriptor
and marking it O_CLOEXEC vs. a fork() that may be happening in another
thread.  This has led to the creation of a whole bunch of new syscalls
to allow creation of file descriptors that already have O_CLOEXEC set
from the start, thus avoiding the race.  We have tried to use these
syscalls where possible, but they usually are not part of POSIX. 
Somethings they are completely unavailable, even in Linux, or when they
are available, they have other annoying limitations.

The other fix to the problem is one that we have had in place for a long
time in the g_spawn_*() family of APIs, and also in the new GSubprocess
API.  The trick involves close()ing all fds (except stdin/out/err) each
time we do a fork()/exec() pair.

Assuming it is practised universally, only one of these fixes is
necessary.

Today I am suggesting that we completely abandon our attempts to follow
the first approach.  I'm done with O_CLOEXEC.

What led me to this was the dup3() system call.  This is a variant of
dup2() that was added (as part of the efforts mentioned above) to avoid
the O_CLOEXEC race:

  int dup3(int oldfd, int newfd, int flags);

unfortunately:

  dup3(0, -1, 0)                          = -1 EBADF (Bad file
  descriptor)

which means that using this as a stand-in for dup() is a no-go.  I could
probably work around that by creating a new eventfd() or unbound UNIX
socket in order to get a new fd number (while being careful to mark it
as O_CLOEXEC as well) before using dup3().  We could probably also get
this fixed in Linux, but dup3() has already been widely copied and we
would then have to go about detecting which implementations are working
and which aren't, and include a fallback (which would have to be
implemented using the same dirty hacks mentioned above).  I've had
enough with these games, and this isn't really about dup3() anyway.

O_CLOEXEC is useless.

Okay.  O_CLOEXEC is useful for one thing: when spawning a new process
using fork()/exec(), you may want to know if exec() worked.  An old
trick for this is to create a pipe and mark the writer end O_CLOEXEC. 
The reader end will read EOF (due to the close of the writer) once
exec() has succeeded.  Otherwise, you can indicate the error by sending
some other data through the pipe and calling exit().

Aside from that, O_CLOEXEC is useless.

So: starting today I'm going to stop worrying about O_CLOEXEC being set
on every file descriptor that GLib creates.  I'm not going to go and
retroactively tear things out where they are already working, unless it
would provide a substantial cleanup or fixes an actual bug.  I'm not
just going to go around looking for #ifdefs to remove.

I believe this is justified for a few reasons:

 - during the GSubprocess discussion, I originally held the opposite
 opinion, but eventually became convinced (by Colin) to see the
 inherit-by-default behaviour of exec() as nothing more than a
 questionable implementation detail of the underlying OS.  Consequently,
 at the high level, GSubprocess provides an API that gives the caller
 direct control over what is inherited and what is not, and that's just
 the way that it should be.

 - this behaviour is not limited to GSubprocess.  Closing all fds before
 calling exec() is a common practice in modern libraries and runtimes,
 and for good reason.

 - fixing the few places that we spawn other programs is massively
 preferable to fixing the hundreds or thousands of places that we create
 new file descriptors

 - in the world of D-Bus activation, direct spawning of long-lived
 helper processes is just not something that we do anymore anyway.  fds
 are not the only thing we have to worry about here.  How about which
 cgroup the helper ends up in?

 - it's 2015 and I'm wondering why system() hasn't been deprecated
 already


Unfortunately, the drawback: it will no longer be safe, in general, to
use GLib with broken old libraries that don't close() fds before exec().

I'm sure others will have an opinion about this.  Did I miss something
important in my argument?

Cheers


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]