rsyncing repositories



[Splitting out this rsync topic]

On Fri, Oct 7, 2016, at 05:12 PM, Dan Nicholson wrote:

Oh, I thought the issues here were well known.

We should have a bug/issue to reference; otherwise they may
get forgotten.  I'm sure we've talked about this in the past,
but let's try to make sure issues get filed.
 
Now, two things. First, I've created:

https://github.com/ostreedev/ostree-releng-scripts

And I've submitted a PR:

https://github.com/ostreedev/ostree-releng-scripts/pull/2

I've lightly tested this script, and I'd like to support rsyncing
repositories, even though I think we can get sshfs to perform
better (https://bugzilla.gnome.org/show_bug.cgi?id=756540).

1. It works entirely by chance: objects sorts before refs, which
sorts before summary, and rsync publishes in alphabetical order.

Yes, but that also isn't a bug =)
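The ordering in item 1 can be illustrated with a plain sort of the top-level entries of a repo. This is a minimal sketch: the entry list is an assumed, non-exhaustive archive-repo layout, and rsync's real transfer order is only approximated by a byte-wise sort.

```python
# Assumed top-level entries of an archive-mode ostree repo (illustrative, not exhaustive).
REPO_ENTRIES = ["summary", "refs", "objects", "config", "summary.sig"]


def transfer_order(entries):
    """Approximate rsync's per-directory order with a plain byte-wise sort."""
    return sorted(entries)


def is_safe_publish_order(order):
    """Objects must land before the refs that point at them, refs before the summary."""
    return order.index("objects") < order.index("refs") < order.index("summary")


order = transfer_order(REPO_ENTRIES)
print(order)                         # ['config', 'objects', 'refs', 'summary', 'summary.sig']
print(is_safe_publish_order(order))  # True
```

The safe order falls out of the names alone; nothing else in the sync enforces it.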

2. Objects are pushed directly into place. If there's a crash or
network interruption in the middle of the sync, you have a possibly
corrupt repo. If you smartly turn on the --delay-updates option so
that files are uploaded to a temporary name and renamed into place,
you might now leave behind a bunch of hidden file cruft if there's an
interruption.

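This doesn't seem true to me; by default rsync writes each file to a
temporary name and renames it into place, so objects are only written
directly into place if --inplace is specified.  Were you doing that?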

3. If you want to support removing commits, then you have to use
--delete. Because of the ordering above, objects get removed before
the refs that point to them, so the repository is invalid for anyone
who pulls during that window.

This is indeed a serious issue, and my rsync PR above doesn't address
it, but I want to lay the groundwork there first and then handle --delete
afterwards.
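One way to close the --delete window in item 3 is to split the sync into ordered passes so that deletions of objects run last. Below is a minimal sketch that only builds the rsync command lines; the pass structure and paths are assumptions for illustration, not what the PR above does, and static deltas and config are left out for brevity.

```python
import shlex


def phased_rsync_commands(src, dest, delete=False):
    """Build rsync argv lists that publish an ostree repo in dependency order.

    Hypothetical pass structure (an assumption, not the PR's implementation):
      1. objects/ with no --delete, so new content exists before anything refers to it
      2. refs/, so refs only ever point at objects already on the destination
      3. the top-level summary files, so they describe refs that already exist
      4. (only with delete=True) objects/ again with --delete, so unreferenced
         objects disappear only after the refs that used them are gone
    """
    src = src.rstrip("/") + "/"
    dest = dest.rstrip("/") + "/"
    delete_flag = ["--delete"] if delete else []

    cmds = [
        ["rsync", "-a", src + "objects/", dest + "objects/"],
        ["rsync", "-a", *delete_flag, src + "refs/", dest + "refs/"],
        # Copy only summary and summary.sig from the top level; the trailing
        # --exclude keeps every other top-level entry out of this pass.
        ["rsync", "-a", "--include=summary*", "--exclude=*", src, dest],
    ]
    if delete:
        cmds.append(["rsync", "-a", "--delete", src + "objects/", dest + "objects/"])
    return cmds


if __name__ == "__main__":
    # Example paths are placeholders.
    for cmd in phased_rsync_commands("/srv/internal-repo", "host:/srv/public-repo", delete=True):
        # Only prints; subprocess.run(cmd, check=True) would run each pass in order.
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

The cost is walking objects/ twice, in exchange for never having a published ref that points at a deleted object.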

4. Since there's no locking of the ostree repo on the source end, you
can publish broken commits. This would happen if a source ref got
updated after rsync had completed part of the objects sync (this has
definitely happened to us before).

Yes, although I think of this as a pipeline: you have multiple internal
workers generating content into an internal repo, and that repo
is locked while publishing.

We definitely need locking to prune the repo.  One way to
implement this would simply be to snapshot the repo into a
temporary repo via pull-local, then prune and rsync the snapshot.
That means the "base repo" will keep accumulating space, but it
then becomes possible to replace it at any point with the "public repo".
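The snapshot idea above could look roughly like this. It is a sketch under assumptions: the function name, the paths, and the prune depth are hypothetical, and it only builds the commands. The final rsync is shown as a single plain pass for brevity; the ordering concerns from item 3 still apply to it.

```python
def snapshot_publish_commands(base_repo, snapshot_repo, public_dest, depth=0):
    """Build commands for a hypothetical snapshot + prune + rsync publish.

    Only the pull-local step reads base_repo, so that is the only window in
    which the base repo would need to be locked; pruning and publishing then
    operate on the private snapshot.
    """
    repo = f"--repo={snapshot_repo}"
    return [
        # Fresh archive-mode repo to hold the snapshot.
        ["ostree", repo, "init", "--mode=archive"],
        # With no refs given, pull-local copies every ref from the base repo.
        ["ostree", repo, "pull-local", base_repo],
        # Keep only `depth` parents of each ref tip; everything else is pruned.
        ["ostree", repo, "prune", "--refs-only", f"--depth={depth}"],
        # Regenerate the summary so it matches the pruned refs.
        ["ostree", repo, "summary", "-u"],
        ["rsync", "-a", snapshot_repo.rstrip("/") + "/", public_dest.rstrip("/") + "/"],
    ]
```

Keeping the prune on a throwaway copy means the base repo never loses history, which is what lets it be rebuilt from, or swapped with, the public repo later.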

You can probably think of lots more. It's really only safe if you know
both sides won't be accessed during the sync.

Let's leave aside concurrency on the write+sync side for the
moment, and focus on these steps:

1) Implement an rsync script without delete support
2) Enhance the script to do deletes
3) Create a higher-level script that does snapshotting into
   clones, or something like what's sketched out above (I think
   it'll work, but baby steps first)

The other idea I've been thinking about is another round of the
ostree-push script, where you use ssh to tunnel a local HTTP port to
the remote and use pull. I haven't had time to play with that,
though.
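The ssh-tunnel idea might be sketched as below. Everything here is an assumption for illustration: the function and remote names are hypothetical, the local repo is served with Python's built-in http.server on loopback only, the remote is added without GPG verification, and `pull --mirror` with no refs relies on the served repo having an up-to-date summary.

```python
import shlex


def tunneled_pull_commands(local_repo, host, remote_repo, port=8000, remote_name="tunnel"):
    """Build the two halves of a hypothetical ssh-tunnelled push.

    Returns (serve, ssh): `serve` exposes local_repo over HTTP on loopback only;
    `ssh` forwards the remote host's localhost:port back to it with -R and runs
    ostree pull on the remote side against that forwarded URL.
    """
    serve = ["python3", "-m", "http.server", str(port),
             "--bind", "127.0.0.1", "--directory", local_repo]
    url = f"http://localhost:{port}/"
    repo = f"--repo={remote_repo}"
    remote_steps = [
        # Register the forwarded URL as a remote (unsigned, for brevity).
        ["ostree", repo, "remote", "add", "--if-not-exists", "--no-gpg-verify", remote_name, url],
        # --mirror with no refs listed pulls every ref in the served summary.
        ["ostree", repo, "pull", "--mirror", remote_name],
    ]
    remote_cmd = " && ".join(" ".join(shlex.quote(a) for a in step) for step in remote_steps)
    ssh = ["ssh", "-R", f"{port}:localhost:{port}", host, remote_cmd]
    return serve, ssh
```

In use, `serve` would run in the background locally while `ssh` runs; because the remote side writes objects through ostree's own pull, it gets pull's handling of partial transfers rather than rsync's.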

What do you think about merging your push script into
ostree-releng-scripts?  Actually, I'm uncertain: are you using
ostree-push right now?  How are you thinking of handling deletions
with it?  Or, to flip this around, is the push script worth doing if
we can enhance the rsync wrapper and/or sshfs enough?

