Re: fsync in glib/gio



On Thu, 2009-03-12 at 21:27 +0000, Michael Meeks wrote:
> On Thu, 2009-03-12 at 21:11 +0100, Alexander Larsson wrote:
> > With all the recent yahoo about ext4 data loss and fsync I felt I had to
> > look at glib and make sure we're doing this right.
> 
> 	Hmm; is this not just a database guy ? ;-) presumably if -all- file I/O
> should be synchronous, the kernel would do this for us ?

If you want to, you can make all I/O synchronous by mounting the
filesystem with the sync option. But that's of course really slow.
Generally the gio file write operations are used for saving files, and
people sort of expect that when save returns the file is ok on disk.

And to make matters worse, it's perfectly ok for a filesystem (ext4, xfs
and btrfs do this atm) to reorder the metadata and the data writes such
that writing to a temp file, renaming over the target file and then
crashing can end up with an empty file. This happens if the metadata was
saved but not the new file contents, and the window for this is about a
minute, so it's not a small race condition.
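
To make the ordering concrete, here is a minimal sketch of that save
sequence with the fsync added, in plain POSIX C. This is not the attached
patch and not glib code: save_with_fsync() and the ".XXXXXX" temp-file
naming are made up for illustration, and error handling is kept to the
bare minimum.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper, not the glib patch: replace `path` with `len`
 * bytes from `data`. Returns 0 on success, -1 on error. */
int
save_with_fsync (const char *path, const char *data, size_t len)
{
  char tmp[4096];
  int fd, n;

  assert (path != NULL && data != NULL);

  /* Create the temp file next to the target so that rename() stays
   * within one filesystem. */
  n = snprintf (tmp, sizeof tmp, "%s.XXXXXX", path);
  if (n < 0 || (size_t) n >= sizeof tmp)
    return -1;
  fd = mkstemp (tmp);
  if (fd < 0)
    return -1;

  while (len > 0)
    {
      ssize_t written = write (fd, data, len);
      if (written < 0)
        {
          if (errno == EINTR)
            continue;
          goto fail;
        }
      data += written;
      len -= (size_t) written;
    }

  /* The step under discussion: get the contents on disk before the
   * rename makes this file the target. */
  if (fsync (fd) < 0)
    goto fail;
  if (close (fd) < 0)
    {
      unlink (tmp);
      return -1;
    }

  if (rename (tmp, path) < 0)
    {
      unlink (tmp);
      return -1;
    }
  return 0;

fail:
  close (fd);
  unlink (tmp);
  return -1;
}
```

Without the fsync() call, the rename can reach the disk before the data
does, which is exactly the empty-file case described above.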

So, you save and the system hangs 10 seconds later. What do you expect?
Ideally the new file; less ideally, the old file. But if you're left
with *no* version of the file I'd be pretty pissed off.

> > Attached is a patch that makes sure we fsync before closing in the gio
> > file saving code and in g_file_set_contents().
> 
> 	Isn't it the case that with ext3 and below fsync is an impossibly
> expensive operation that gums up the whole system - by taking some
> obscure kernel lock on some other piece of somethingummy and causes
> everything to grind to a halt, your audio to skip, and instant hair
> loss ? ;-)

With the data=ordered setting in ext3 (the default), any fsync will
result in all dirty data being flushed, not just the data in that file.
This can be pretty expensive if there is a lot of outstanding I/O.
However, this is only a problem if such an operation happens often, and
file saving is just not that common. And if something is constantly
saving, that is a problem for several other reasons too and should be
fixed.

Of course, not all file writes are saves. For example, it could be
nautilus copying 10000 files. This is why I added the ASYNC_WRITE flag
and used it in the file copy case.
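
Roughly, the idea is that the writer syncs before close unless the caller
opts out. The sketch below is only an illustration of that decision: the
ASYNC_WRITE name comes from the patch, but the WRITE_FLAGS_* enum,
wants_fsync() and finish_write() are hypothetical names, not the actual
gio API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/* Hypothetical flag set; only the ASYNC_WRITE name is taken from this
 * mail, the rest is made up for illustration. */
enum
{
  WRITE_FLAGS_NONE = 0,
  /* Caller does not need the data to be on disk when close returns. */
  WRITE_FLAGS_ASYNC_WRITE = 1 << 0
};

/* Returns nonzero when the writer should fsync() before close(). */
int
wants_fsync (unsigned int flags)
{
  return (flags & WRITE_FLAGS_ASYNC_WRITE) == 0;
}

/* Finish writing to fd: sync unless the caller opted out, then close.
 * Returns 0 on success, -1 if either step failed. */
int
finish_write (int fd, unsigned int flags)
{
  int res = 0;

  assert (fd >= 0);

  if (wants_fsync (flags) && fsync (fd) < 0)
    res = -1;
  if (close (fd) < 0)
    res = -1;
  return res;
}
```

A save path would pass WRITE_FLAGS_NONE and pay for one fsync per save;
a bulk copy would pass WRITE_FLAGS_ASYNC_WRITE and skip it for each of
the 10000 files.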

> 	I believe they fixed this for ext4, which is nice for them; but ... for
> everyone else ? What data-loss case are we really trying to protect
> against ? of course, if you hard yank the power, bad things can happen;
> but how often does that occur ?

It occurs often enough that there were several people in the ubuntu
ext4-eats-my-data bug that had it happen to them multiple times.

> 	AFAIR we spent some cycles in evolution recently to reduce the
> ridiculous number of fsync's that sqlite was injecting into each
> transaction to make the message store perform reasonably and not grind
> the whole system to a halt ;-) at 10ms per fsync, that makes some sense.

The sqlite case is slightly different, basically the same as the firefox
case:
http://shaver.off.net/diary/2008/05/25/fsyncers-and-curveballs/

Basically, once you're syncing the database regularly we're talking
about constantly syncing, not just syncing when you're finished saving.
So, the problem is much worse. I think in the firefox case it synced for
every key you pressed in the awesome bar.



