Re: How do we store/install apps?



On tis, 2014-10-21 at 13:31 +0200, Lennart Poettering wrote:
On Mon, 13.10.14 08:44, Alexander Larsson (alexl redhat com) wrote:

In some sense it is unavoidable. We have to tie the exact file data to
the signature. However, does this mean we have to shove random bits at
the kernel rather than going through the syscall interface?

btrfs-receive is a userspace tool that uses the regular userspace i/o
syscalls to do its modifications. How does this propose to handle the
signatures? If it can do it, why would it not be possible to do
ourselves?

Sure, it's possible to implement our own btrfs send/recv
implementation in userspace. 

At LPC we sat down with Chris Mason about this, and it's certainly an
option for us, the code for serializing/deserializing things is
supposedly not that difficult.

btrfs-send is a kernel-space tool (a syscall), but all it generates is
an array of "op + data" tuples which btrfs-receive applies using the
normal syscalls (i.e. op=write, data=file,offset,content, or op=rename,
data=src,dest).
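To make that op/data model concrete, here is a toy interpreter in Python that applies such a tuple stream using ordinary syscalls, the way btrfs-receive does. The op names and the data layout are made up for illustration; this is not the real send-stream wire format:

```python
import os
import tempfile

def apply_ops(root, ops):
    """Apply a list of (op, data) tuples with ordinary syscalls,
    btrfs-receive style. Toy model: op names are illustrative."""
    for op, data in ops:
        if op == "mkfile":
            open(os.path.join(root, data["path"]), "wb").close()
        elif op == "write":
            with open(os.path.join(root, data["path"]), "r+b") as f:
                f.seek(data["offset"])
                f.write(data["content"])
        elif op == "rename":
            os.rename(os.path.join(root, data["src"]),
                      os.path.join(root, data["dest"]))

root = tempfile.mkdtemp()
apply_ops(root, [
    ("mkfile", {"path": "app.tmp"}),
    ("write",  {"path": "app.tmp", "offset": 0, "content": b"v2 binary"}),
    ("rename", {"src": "app.tmp", "dest": "app"}),
])
```

The point being: nothing in the stream itself needs kernel privileges to apply, it is all plain file i/o.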

This is very nice for things like e.g. an incremental backup of a
database where only some blocks of the file changed. However, for an app
upgrade you generally rebuild from scratch, you don't actually modify
the previous release. The delta must then be generated by a userspace
tool, e.g. by rsyncing the new release over the old, so the use of
btrfs-send is really just a way to encode the output of rsync.
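For illustration, the rsync-over-the-old-release idea in miniature (a hypothetical file-level sketch; real rsync additionally computes block-level deltas within files):

```python
import hashlib
import os
import tempfile

def tree_delta(old, new):
    """File-level delta between two trees: which paths must be
    (re)written or removed to turn `old` into `new`. Sketch only."""
    def digests(root):
        out = {}
        for dirpath, _, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    rel = os.path.relpath(path, root)
                    out[rel] = hashlib.sha256(f.read()).hexdigest()
        return out
    o, n = digests(old), digests(new)
    writes = sorted(p for p in n if o.get(p) != n[p])
    removes = sorted(p for p in o if p not in n)
    return writes, removes

old, new = tempfile.mkdtemp(), tempfile.mkdtemp()
for root, name, data in [(old, "libfoo.so", "v1"), (new, "libfoo.so", "v1"),
                         (old, "app", "old"), (new, "app", "new")]:
    with open(os.path.join(root, name), "w") as f:
        f.write(data)
writes, removes = tree_delta(old, new)
```

Only the changed file shows up in the delta; the unchanged library does not need to be shipped again.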

Also, the hardlink farms are certainly not pretty.

They are not pretty, sure. However they are very widely available, and
the *only* solution that allows page-cache sharing between images, and
"trivial" deduplication between unrelated images. I don't think we
should too easily dismiss it.
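For illustration, the kind of dedup a hardlink farm gives you, sketched in Python (a toy content-hash pass; a real implementation would also have to care about permissions, xattrs and races):

```python
import hashlib
import os
import tempfile

def dedup_by_hardlink(paths):
    """Replace byte-identical files with hard links to one copy, so
    unrelated images share a single inode, and hence a single copy in
    the page cache. Toy sketch."""
    by_hash = {}
    for p in paths:
        with open(p, "rb") as f:
            h = hashlib.sha256(f.read()).hexdigest()
        if h in by_hash:
            os.remove(p)
            os.link(by_hash[h], p)
        else:
            by_hash[h] = p

d = tempfile.mkdtemp()
a = os.path.join(d, "image-a.lib")
b = os.path.join(d, "image-b.lib")
for p in (a, b):
    with open(p, "w") as f:
        f.write("identical library contents")
dedup_by_hardlink([a, b])
```

After the pass, both images point at the same inode, which is exactly what makes the page-cache sharing work.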

So, we asked Chris about dedup. He basically said that online dedup is
there, and will hence be done implicitly when you do btrfs recv. In
other words, dedup is really nothing we need to actively think about
if we use btrfs, it's just there.

That is only dedup of the parent<->child though, not between unrelated
images. And the dedup is only on the disk, not in page cache.

Harald has been playing around with some build logic that makes sure
that rebuilt app updates are efficiently shipped as btrfs send/recv,
with stable inode numbers and stuff.

How exactly do you envision this would work in practice for updates? Say
you have an application that receives regular updates (major and minor).
At any time the user comes in and does a fetch-from-scratch, or an update
between two essentially "random" versions. What does the server store?
A copy of each full image? Only for major versions? A delta between each
consecutive pair of images? A delta between each possible image pair?

Well, it could certainly generate the diffs on the fly, by looking at
the actual btrfs volumes with their subvolumes. However, I'd assume
we'd pre-generate relevant deltas in advance, maybe at logarithmically
increasing distances.
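For illustration, one way such logarithmically spaced delta endpoints could be picked (a sketch; the exact spacing is an open design choice):

```python
def delta_sources(version):
    """Earlier versions to pre-generate deltas from for `version`, at
    logarithmically increasing distances (1, 2, 4, 8, ... back). A
    client on any older version then needs only O(log n) delta hops.
    Sketch of the idea, not a fixed scheme."""
    sources, step = [], 1
    while version - step >= 0:
        sources.append(version - step)
        step *= 2
    return sources
```

So for version 10 you would pre-generate deltas from versions 9, 8, 6 and 2, rather than from every prior version.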

You don't want the servers to be doing "smart" things; they are
typically very dumb mirroring systems that just store and deliver plain
files. So, in the btrfs case one would minimally have to store the
initial version and all incremental deltas, and then, to decrease the
amount users have to download, you have to start duplicating this by
adding various kinds of deltas and full versions, plus some kind of
indexing system for these so you know what is available. Not
impossible, but it's not trivial either, and you'll have to duplicate a
lot of data to make the initial download of a "random" (i.e. not the
first) version fast.
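To make that concrete, here is a toy planner for the client side against such a dumb mirror: given an index of available full images and deltas, pick the shortest chain of files to fetch. All names are hypothetical, for illustration only:

```python
from collections import deque

def plan_download(have, target, fulls, deltas):
    """Shortest chain of files a client must fetch from a dumb mirror
    to reach `target`: start from a version it already has, or from any
    full image on the mirror, then follow available deltas. BFS sketch
    over a made-up index format."""
    best = None
    starts = [(v, []) for v in have] + [(v, [("full", v)]) for v in fulls]
    for start, fetched in starts:
        seen, queue = {start}, deque([(start, fetched)])
        while queue:
            version, path = queue.popleft()
            if version == target:
                if best is None or len(path) < len(best):
                    best = path
                break
            for src, dst in deltas:
                if src == version and dst not in seen:
                    seen.add(dst)
                    queue.append((dst, path + [("delta", src, dst)]))
    return best
```

With only the initial full image and consecutive deltas on the mirror, a fresh install of a late version degenerates into fetching the whole chain; that is exactly the duplication-vs-download tradeoff described above.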


