Re: [PATCH] Add --disable-fsync option to both commit and pull (non-local) commands.



On Tue, Jun 3, 2014, at 10:21 PM, James Antill wrote:

 I can't think why it would be a per-remote config. 

Right, we could just use the global repo config.  Something
like the attached patch?
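
As a rough sketch of the read side (not the attached patch itself; the
key and group names are just my guess from the patch title), reading a
boolean "fsync" key from the [core] group of the repo's keyfile config
could look like:

#include <glib.h>

/* Hypothetical helper, not the attached patch: read a boolean "fsync"
 * key from the [core] group of the repo's keyfile config, defaulting
 * to TRUE (fsync enabled) when the key is absent. */
static gboolean
repo_get_fsync_enabled (GKeyFile *config)
{
  GError *local_error = NULL;
  gboolean fsync_enabled =
    g_key_file_get_boolean (config, "core", "fsync", &local_error);

  if (local_error != NULL)
    {
      g_clear_error (&local_error);
      return TRUE;  /* key missing or malformed: stay durable by default */
    }
  return fsync_enabled;
}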

 Wow. ... that is faster :).

Investing in a good SSD is *really* worth the money =)  Also
I think we're seeing some XFS-versus-ext4 differences here.

 This is about doing an fsync() in another context while you continue to
append to the file ... AFAIK in the path we are on, it's a pure write
once and then fsync(). So we should be good.

True, though I think in at least some cases fsync() can force a flush
of "unrelated" content.
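
To make the pattern we're talking about concrete, here's a minimal
sketch (not ostree's actual code path) of the write-once-then-fsync-
then-rename sequence, with the fsync made skippable:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Illustrative only: write an object's full contents to a temporary
 * path, optionally fsync it, then rename it into place.  Skipping the
 * fsync trades durability for speed. */
static int
write_object (const char *tmppath, const char *finalpath,
              const void *buf, size_t len, int do_fsync)
{
  int fd = open (tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return -1;
  if (write (fd, buf, len) != (ssize_t) len)
    { close (fd); return -1; }
  if (do_fsync && fsync (fd) < 0)
    { close (fd); return -1; }
  if (close (fd) < 0)
    return -1;
  return rename (tmppath, finalpath);
}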

 Right, with the batching and the dir. syncs. I was hoping that the
kernel would get the message and start writing all the data to disk as
soon as we called fsync on anything; alas, no :).
 With the forking bit, I was just more explicit with my prodding of the
kernel (and it seems to get the message then ;). Alas. AFAIK in 2014 we
still can't do aio much better than this (have you used the asynchronous
part of gio?).

There are two paths here in ostree for writing to the repo.  One is
"ostree pull", which is asynchronous (using threads); we parallelize
downloading via HTTP with checksumming/fsync.
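
Roughly speaking (illustrative names, not ostree's real API), the
threaded side amounts to handing the checksum/fsync work to a GTask
worker so the main loop can keep driving downloads:

#include <errno.h>
#include <unistd.h>
#include <gio/gio.h>

/* Illustrative sketch: fsync a downloaded object's fd on a worker
 * thread via GTask; checksumming of the content would happen in the
 * same worker.  Function names are made up for this example. */
static void
fsync_worker (GTask *task, gpointer source_object,
              gpointer task_data, GCancellable *cancellable)
{
  int fd = GPOINTER_TO_INT (task_data);

  if (fsync (fd) < 0)
    g_task_return_new_error (task, G_IO_ERROR, g_io_error_from_errno (errno),
                             "fsync: %s", g_strerror (errno));
  else
    g_task_return_boolean (task, TRUE);
}

static void
queue_object_fsync (int fd, GAsyncReadyCallback callback, gpointer user_data)
{
  GTask *task = g_task_new (NULL, NULL, callback, user_data);
  g_task_set_task_data (task, GINT_TO_POINTER (fd), NULL);
  g_task_run_in_thread (task, fsync_worker);
  g_object_unref (task);
}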

"ostree commit" is totally serial right now, though the API would allow
a consuming app to do its own parallelization.  But I've been meaning
to make commit multithreaded/async as well.

 No need to be that fancy. As soon as you can group a lot of objects
into a single file, like git's pack files, then one fsync for the file
should solve the problem well enough.

We can't do pack files for bare repositories - the whole point is
to be able to hard link content.  However, we could do metadata
pack files; I think I have a bug open for that.
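
For comparison, the single-fsync pack idea from your mail boils down to
something like this (purely illustrative, not a real ostree pack
format):

#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

/* Illustrative only: append many objects into one pack-like file and
 * issue a single fsync at the end instead of one per object. */
static int
write_pack (const char *path, const struct iovec *objects, int n_objects)
{
  int fd = open (path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return -1;
  for (int i = 0; i < n_objects; i++)
    {
      if (write (fd, objects[i].iov_base, objects[i].iov_len)
          != (ssize_t) objects[i].iov_len)
        { close (fd); return -1; }
    }
  if (fsync (fd) < 0)  /* one fsync covers everything appended above */
    { close (fd); return -1; }
  return close (fd);
}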

 But, yeh, git certainly takes the approach of "don't fsync, as we can
always fsck/pull again later".

I've actually lost work after "git commit" and then a kernel crash.  But
for the local developer case it's fine; I didn't lose too much work.
Though I think you *could* lose an entire repository if you lost power
right after a pack file was written and the loose objects were deleted.

And if you were implementing the server side of git, I think you'd want
to commit to some level of durability before acknowledging completion.
This type of thing is why Google wrote custom Subversion and
git backends, I think backed by GFS.


See attached patch.

Attachment: 0001-repo-Support-fsync-false-configuration.patch
Description: Text Data


