Making ostree robust wrt concurrent use



I've been looking at making ostree robust for concurrent operations on
a shared (local) repository, as this will be needed for xdg-app. What
I mean by this is multiple processes doing read and/or write
operations to a repo at the same time on the same machine.

We want to make this safe, in the sense that no operation will fail
unexpectedly and we never end up with objects in the repo that are
broken or that reference objects not in the repo. As an exception to
this, we *do* allow commits to refer to parents that are not in the
repo, as this is very useful for partial local history.

At the same time, we want to allow as much parallelism as possible in
the individual operations. In particular, we never want slow operations
like network downloads (pull) to block a local operation. We achieve
this with lock-less approaches: fsyncs, atomic filesystem operations
and careful ordering of operations. Where that is not enough, we fall
back to filesystem locks. In particular, all slow operations are
structured as a transaction where everything is staged in a temporary
directory and not moved into the repository proper until everything is
available and properly synced to disk.
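
To make the lock-less pattern concrete, here is a minimal sketch in C
(illustrative only, not the actual ostree code) of staging a single
object and publishing it atomically:

  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  static int
  stage_and_publish (const char *tmp_path, const char *final_path,
                     const void *data, size_t len)
  {
    int fd = open (tmp_path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
      return -1;
    /* Write and fsync so the contents are durable *before* the
     * object becomes visible under its final name. */
    if (write (fd, data, len) != (ssize_t) len || fsync (fd) != 0)
      {
        close (fd);
        unlink (tmp_path);
        return -1;
      }
    close (fd);
    /* rename() is atomic within a filesystem: concurrent readers see
     * either no object or the complete object, never a partial one. */
    return rename (tmp_path, final_path);
  }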

So, what are the risks we have to be aware of?

* Concurrent pulls can step on each other's toes if they contain the
  same object and we use a shared temporary space for downloads.

* Unclean shutdown can leave partially written files in temporary areas.

* Aborted (clean or unclean) moves into the repository can cause the
  repository to contain (correct) objects that refer to non-existent
  objects. (Because moving multiple files is not an atomic operation.)

* A prune/gc operation can be working on stale information about the
  liveness of objects if new objects are moved into the repo or refs
  are modified after the prune operation was started.

* A prune/gc operation (after a ref was removed or changed or the
  prune dropped history) can remove an object that an ongoing
  operation such as checkout, commit or pull depends on.

All these races can currently happen with ostree. Here is a proposal
on how to fix this.

First of all, we move to a non-shared per-transaction tmp dir. We still
want to support resumes, so we use a lock file while the transaction
is ongoing and look for unlocked transaction dirs when starting up, as
proposed by Colin in:
  https://bugzilla.gnome.org/show_bug.cgi?id=757611#c1
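
A minimal sketch of that scheme, assuming a lock file inside each
transaction dir (the file name "lock" is made up here): the owning
process holds a flock() on it, so a resuming process can tell live
transactions from abandoned ones with a non-blocking lock attempt:

  #include <fcntl.h>
  #include <sys/file.h>
  #include <unistd.h>

  /* Try to claim a transaction dir by locking its lock file.
   * Returns an fd that holds the lock, or -1 if the transaction is
   * still owned by a live process (or on error).  The kernel drops
   * the flock automatically if the owner dies, so an unlocked dir
   * is always safe to resume or clean up. */
  static int
  try_claim_transaction (const char *lock_path)
  {
    int fd = open (lock_path, O_RDWR | O_CREAT | O_CLOEXEC, 0644);
    if (fd < 0)
      return -1;
    if (flock (fd, LOCK_EX | LOCK_NB) != 0)
      {
        close (fd);
        return -1;
      }
    return fd;
  }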

The temp dir name should also contain the current boot-id, like the
current one does, so we never accidentally pick up a tmp dir from a
transaction that was uncleanly shut down, which could leave behind
partial files.
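
For reference, the boot id can be read from
/proc/sys/kernel/random/boot_id; the exact tmp dir naming is an
implementation detail, so this sketch only shows the lookup:

  #include <stdio.h>
  #include <string.h>

  /* Read the kernel's per-boot identifier; a staging dir whose name
   * embeds a different boot id may hold files that were never synced
   * before an unclean shutdown, so it must not be resumed. */
  static int
  get_boot_id (char *buf, size_t buflen)
  {
    FILE *f = fopen ("/proc/sys/kernel/random/boot_id", "r");
    if (f == NULL)
      return -1;
    if (fgets (buf, (int) buflen, f) == NULL)
      {
        fclose (f);
        return -1;
      }
    fclose (f);
    buf[strcspn (buf, "\n")] = '\0';  /* drop trailing newline */
    return 0;
  }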

Once a pull or commit transaction is staged (and fsynced) and ready
to be moved into the repository, we need to make sure that we never
move an object into the repo unless all the objects it references are
already in the repo. We do this by moving all objects in stages:

1) Move all leaf objects (FILE, DIR_META, COMMIT_META and
   COMMIT_TOMBSTONE) into the repo.

2) Fsync all object dirs that were touched so we know the filenames
   are stable.

3) During the operation we naturally build up a topological sorting of
   all the objects involved. For instance, during a pull we do a
   recursive discovery of all objects that are not available in the
   local repo. We save this order, and then use the reverse order when
   moving the non-leaf (DIR_TREE & COMMIT) objects into the repo,
   doing an fsync of the object directories whenever all children of
   some object have been moved (see the sketch after this list).
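
Here is a rough sketch of that two-phase publish. The StagedObject
type and the per-object fsync are illustrative; a real implementation
would deduplicate the directory fsyncs:

  #include <fcntl.h>
  #include <libgen.h>
  #include <limits.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>

  typedef struct {
    const char *tmp_path;   /* staged location in the transaction dir */
    const char *repo_path;  /* final location under objects/ */
    int is_leaf;            /* FILE, DIR_META, COMMIT_META, TOMBSTONE */
  } StagedObject;

  /* fsync the containing directory so renames into it are durable. */
  static int
  fsync_parent_dir (const char *path)
  {
    char buf[PATH_MAX];
    snprintf (buf, sizeof buf, "%s", path);
    int fd = open (dirname (buf), O_RDONLY | O_DIRECTORY);
    if (fd < 0)
      return -1;
    int r = fsync (fd);
    close (fd);
    return r;
  }

  /* objs[] is in discovery order (referrers before referred-to), so
   * iterating backwards moves children before their parents. */
  static int
  publish_objects (StagedObject *objs, size_t n)
  {
    /* Phase 1: leaves, in any order. */
    for (size_t i = 0; i < n; i++)
      if (objs[i].is_leaf &&
          rename (objs[i].tmp_path, objs[i].repo_path) != 0)
        return -1;
    /* Phase 2: fsync so the leaf filenames are stable. */
    for (size_t i = 0; i < n; i++)
      if (objs[i].is_leaf && fsync_parent_dir (objs[i].repo_path) != 0)
        return -1;

    /* Phase 3: non-leaves (DIR_TREE, COMMIT) children-first; each
     * object dir is fsynced before anything referring to the object
     * is moved in. */
    for (size_t i = n; i-- > 0; )
      if (!objs[i].is_leaf)
        {
          if (rename (objs[i].tmp_path, objs[i].repo_path) != 0)
            return -1;
          if (fsync_parent_dir (objs[i].repo_path) != 0)
            return -1;
        }
    return 0;
  }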

Since all moves into the repo are last-one-wins replacements of
immutable files, I believe this is safe for the case of concurrent
pull/commit/checkout operations (with the exception of commitmetas,
which are not technically 100% immutable, but I think this is OK in
practice).

It is not safe wrt prune operations though.

To make that safe, we introduce a per-repo lock file. During a prune
operation (all the time from starting to scan for live objects to the
end) we take an exclusive lock. During the transaction commit phase of
pull/commit, during the entirety of read-only operations such as
checkout/cat/ls/refs, and during ref-modifying operations like reset
or remove, we take a shared lock.
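
A sketch of that repo lock with flock() (the lock file name is an
assumption):

  #include <fcntl.h>
  #include <sys/file.h>
  #include <unistd.h>

  /* Take the per-repo lock: exclusive for prune, shared for
   * everything else.  flock() allows any number of LOCK_SH holders
   * concurrently, so normal operations do not serialize against each
   * other; they only exclude (and are excluded by) a LOCK_EX-holding
   * prune. */
  static int
  take_repo_lock (int repo_dfd, int exclusive)
  {
    int fd = openat (repo_dfd, "lock", O_RDWR | O_CREAT | O_CLOEXEC, 0644);
    if (fd < 0)
      return -1;
    if (flock (fd, exclusive ? LOCK_EX : LOCK_SH) != 0)
      {
        close (fd);
        return -1;
      }
    return fd;  /* close()ing the fd releases the lock */
  }

Since flock() locks are dropped by the kernel when the holder exits,
a crashed process can never permanently wedge the repo.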

This will make the prune operation safe wrt the other operations, and
it will make the non-transactional operations safe wrt prune. However,
it will not make the transactional operations safe wrt a prune that
happens during a transaction, before we take the lock. Therefore, all
transactional operations must record which objects in the repo they
rely on during the scan phase, and verify them after they have taken
the repo lock. If some object the transaction relied on is no longer
available, the transaction has to be aborted and restarted.
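
In outline, the commit path then becomes a verify-then-publish loop
under the shared lock. The helper names here are hypothetical,
standing in for the dependency tracking and the staged move sketched
earlier:

  #include <unistd.h>

  typedef struct Transaction Transaction;

  /* Hypothetical helpers. */
  extern int  all_required_objects_present (int repo_dfd, Transaction *txn);
  extern int  publish_objects_locked       (int repo_dfd, Transaction *txn);
  extern void rescan_transaction           (Transaction *txn);
  extern int  take_repo_lock               (int repo_dfd, int exclusive);

  static int
  commit_transaction (int repo_dfd, Transaction *txn)
  {
    for (;;)
      {
        int lock_fd = take_repo_lock (repo_dfd, 0 /* shared */);
        if (lock_fd < 0)
          return -1;
        /* Re-verify every repo object the scan phase relied on: a
         * prune that ran before we took the lock may have removed
         * some of them. */
        if (all_required_objects_present (repo_dfd, txn))
          {
            int r = publish_objects_locked (repo_dfd, txn);
            close (lock_fd);
            return r;
          }
        /* Abort and restart: rediscover the now-missing objects,
         * then try again. */
        close (lock_fd);
        rescan_transaction (txn);
      }
  }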

Additionally, we add a config_lock file which is taken in exclusive
mode during the read-modify-write cycle when modifying the config file
(like when adding a remote). Also, after that lock is taken we check
the config file mtime to see if we need to re-read the config before
modifying it.
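
A sketch of that config lock plus the mtime check (the file names
"config_lock" and "config" relative to the repo dir are assumptions
about the layout):

  #include <fcntl.h>
  #include <sys/file.h>
  #include <sys/stat.h>
  #include <time.h>
  #include <unistd.h>

  /* Take config_lock exclusively, then report whether the config
   * file changed since we cached it at 'cached_mtime', in which case
   * the caller must re-read it before applying its modification. */
  static int
  lock_config_for_update (int repo_dfd, struct timespec cached_mtime,
                          int *out_needs_reread)
  {
    int fd = openat (repo_dfd, "config_lock",
                     O_RDWR | O_CREAT | O_CLOEXEC, 0644);
    if (fd < 0)
      return -1;
    if (flock (fd, LOCK_EX) != 0)
      {
        close (fd);
        return -1;
      }
    struct stat st;
    if (fstatat (repo_dfd, "config", &st, 0) != 0)
      {
        close (fd);
        return -1;
      }
    *out_needs_reread = (st.st_mtim.tv_sec  != cached_mtime.tv_sec ||
                         st.st_mtim.tv_nsec != cached_mtime.tv_nsec);
    return fd;  /* held for the whole read-modify-write cycle */
  }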

Does this seem safe? Am I missing some race condition?

Anyway, I'll start working on these.


-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Alexander Larsson                                            Red Hat, Inc 
       alexl redhat com            alexander larsson gmail com 
He's a short-sighted small-town astronaut with a winning smile and a way 
with the ladies. She's a hard-bitten junkie magician's assistant with a 
flame-thrower. They fight crime! 



