Re: How do we store/install apps?



On Mon, 13.10.14 08:44, Alexander Larsson (alexl redhat com) wrote:

In some sense it is unavoidable. We have to tie the exact file data to
the signature. However, does this mean we have to shove random bits at
the kernel rather than going through the syscall interface?

btrfs-receive is a userspace tool that uses the regular userspace i/o
syscalls to do its modifications. How does this propose to handle the
signatures? If it can do it, why would it not be possible to do
ourselves?

Sure, it's possible to implement our own btrfs send/recv
implementation in userspace. 

At LPC we sat down with Chris Mason about this, and it's certainly an
option for us. The code for serializing/deserializing things is
supposedly not that difficult.
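
To give a feeling for how small the reading side could be, here is a minimal sketch in Python that only walks the framing of a send stream. It assumes the layout as I understand it from the kernel's send.h: a 13-byte magic "btrfs-stream\0" followed by a little-endian 32-bit version, then a sequence of commands, each with a 10-byte header (32-bit payload length, 16-bit command id, 32-bit crc32c). The function name is made up for illustration, the checksum is not verified (crc32c is not in the Python standard library), and the payloads are left as opaque bytes.

```python
import struct

# Assumed on-disk framing, little-endian, no padding.
MAGIC = b"btrfs-stream\x00"            # 13 bytes including the trailing NUL
STREAM_HDR = struct.Struct("<13sI")    # magic, stream version
CMD_HDR = struct.Struct("<IHI")        # payload length, command id, crc32c


def parse_send_stream(data):
    """Return (version, [(cmd, crc, payload), ...]) for a send stream.

    Hypothetical helper: it only splits the stream into commands and
    does not interpret the attributes inside each payload.
    """
    if len(data) < STREAM_HDR.size:
        raise ValueError("truncated stream header")
    magic, version = STREAM_HDR.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not a btrfs send stream")

    commands = []
    offset = STREAM_HDR.size
    while offset < len(data):
        if len(data) - offset < CMD_HDR.size:
            raise ValueError("truncated command header")
        length, cmd, crc = CMD_HDR.unpack_from(data, offset)
        offset += CMD_HDR.size
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated command payload")
        offset += length
        commands.append((cmd, crc, payload))
    return version, commands
```

A real receiver would additionally decode the attributes in each payload and replay them as regular file system syscalls, which is the part btrfs-receive does today.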

Also, the hardlink farms are certainly not pretty.

They are not pretty, sure. However they are very widely available, and
the *only* solution that allows page-cache sharing between images, and
"trivial" deduplication between unrelated images. I don't think we
should too easily dismiss it.
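
As a rough illustration of what such a hardlink farm amounts to, here is a minimal Python sketch, assuming a content-addressed object directory keyed by SHA-256. The function name, the directory layout and the in-memory "image" dictionaries are made up for the example and are not taken from any existing tool. Identical files in unrelated images end up as the same inode, which is what gives the on-disk deduplication and the shared page cache.

```python
import hashlib
import os
import tempfile


def checkout(image_files, store_dir, target_dir):
    """Materialise an image as hardlinks into a content-addressed store.

    image_files maps a relative path to the file content (bytes).
    Hypothetical helper, for illustration only.
    """
    os.makedirs(store_dir, exist_ok=True)
    for rel_path, data in image_files.items():
        digest = hashlib.sha256(data).hexdigest()
        obj = os.path.join(store_dir, digest)
        # Store each distinct content exactly once.
        if not os.path.exists(obj):
            with open(obj, "wb") as f:
                f.write(data)
        dest = os.path.join(target_dir, rel_path)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        # The checkout is just another name for the stored object.
        os.link(obj, dest)


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    store = os.path.join(root, "objects")
    shared = b"same library bytes"
    checkout({"usr/lib/libfoo.so": shared, "bin/a": b"app a"},
             store, os.path.join(root, "image-a"))
    checkout({"usr/lib/libfoo.so": shared, "bin/b": b"app b"},
             store, os.path.join(root, "image-b"))
    a = os.stat(os.path.join(root, "image-a", "usr/lib/libfoo.so"))
    b = os.stat(os.path.join(root, "image-b", "usr/lib/libfoo.so"))
    print(a.st_ino == b.st_ino)
```

Note that this says nothing about verification: a checkout built this way is writable through any of its links unless something else protects it.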

So, we asked Chris about dedup. He basically said that online dedup is
there, and will hence be done implicitly when you do btrfs recv. Or in
other words, dedup is really nothing we need to actively think about
if we use btrfs, it's just there.

Harald has been playing around with some build logic that makes sure
that rebuilt app updates are efficiently shipped as btrfs send/recv,
with stable inode numbers and stuff.

How exactly do you envision this would work in practice for updates? Say
you have an application that receives regular updates (major and minor).
At any time the user comes in and does a fetch-from-scratch, or an update
between two essentially "random" versions.  What does the server store?
A copy of each full image? Only for major versions? Deltas between each
pair of consecutive images? Deltas between each possible image pair?

Well, it could certainly generate the diffs on the fly, by looking at
the actual btrfs volumes with their subvolumes. However, I'd assume
we'd pre-generate relevant deltas in advance, maybe at logarithmically
increasing distances.
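
To make the logarithmic idea a bit more concrete, here is a minimal sketch in Python. It assumes versions are simply numbered 0, 1, 2, ... in build order, and that for each version N the server keeps deltas from N-1, N-2, N-4, N-8 and so on. Both function names and the numbering scheme are made up for illustration, nothing like this exists yet.

```python
def delta_sources(version):
    """Older versions from which a delta to `version` is pre-generated.

    Distances double each time: version-1, version-2, version-4, ...
    """
    sources = []
    step = 1
    while version - step >= 0:
        sources.append(version - step)
        step *= 2
    return sources


def update_path(have, want):
    """Chain of pre-generated deltas leading from `have` to `want`.

    Greedy: always take the largest power-of-two jump that does not
    overshoot. Each hop (a, b) has b - a equal to a power of two, so
    a is in delta_sources(b) and the delta exists on the server.
    """
    path = []
    cur = have
    while cur < want:
        step = 1
        while step * 2 <= want - cur:
            step *= 2
        path.append((cur, cur + step))
        cur += step
    return path


if __name__ == "__main__":
    print(delta_sources(16))
    print(update_path(3, 16))
```

With this scheme the server stores roughly log2(N) deltas per version, and a client that is d versions behind needs at most about log2(d) hops, at the price of downloading somewhat more than a single direct delta would carry.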

You know, this is explicitly something where we shouldn't reinvent the
wheel. It's quite frankly crazy to come up with a new serialization
format, that contains per-file verification data, that then somehow
can be deserialized on some destination system again back into the fs
layer...

The hard part is obviously having the kernel verify the signatures. That
requires deep kernel FS work, which doesn't exist yet, and which only the
btrfs people are working on. However, when they come up with something
it could very well be that it can be used for other things than
btrfs-receive (as btrfs-receive is essentially just a stream of syscalls).
Are the design discussions on this happening in the open somewhere?

Yes, we had a couple of phone calls with Chris in the past and met
with him at LPC about this. But we are not involved in the actual
implementation of this, we just make sure we are in sync regarding our
requirements. 

Facebook's requirements and ours are thankfully not too far off. While
they only care for the verified OS, we also want to solve things more
generically.

I know that the Red Hat fs crew hates btrfs like it was the devil, and
loves LVM/DM like it was a healthy project. But yuck, just yuck!

I'm not particularly fond of a device-mapper approach either, but I was
listing all options, so it needed to be in there. That said, I'm also a
btrfs user on all my development machines, and I can't say my experience
with it has been exactly stellar...

Well, true. But again, this won't change unless we actually push it
out to people. And I am very sure that doing it this way is a pretty
nice approach, since we will initially only store redundant data in it
that we pretty much only access for reading. btrfs really should handle
that, and even if it didn't we can easily reconstruct everything by
downloading the image again.

Lennart

-- 
Lennart Poettering, Red Hat

