Re: fsync in glib/gio

From: Mark Mielke <mark mark mielke cc>
To: Alexander Larsson <alexl redhat com>
Cc: gtk-devel-list gnome org
Subject: Re: fsync in glib/gio
Date: Sat, 14 Mar 2009 13:38:54 -0400

Alexander Larsson wrote:

> 2) such filesystems are broken
>
> Clearly the answer to 1 is yes. Anything else would be a disservice to
> our users' data. However, that doesn't mean such filesystems aren't
> broken, in the sense that I would never let a filesystem like that near
> any of my data.
>
> For instance, any script doing sed -i s/foo/bar/ file.conf on such a
> filesystem risks ending up with a zero-byte file.conf. (sed uses rename
> but doesn't fsync.) Is this what users expect? Should that script be
> rewritten in C so it can use fsync? Should sed fsync? That kind of
> reasoning will lead to all apps implementing fsync-on-close manually,
> and we're then worse off than if the fs just guaranteed
> data-before-metadata-on-rename.

Broken file system or not, all portable applications that require the file system to be in a particular state before continuing should use fsync(). Using rename() to achieve atomic-change-in-place is exactly the sort of scenario where you want to guarantee a particular state before continuing, and it is exactly the sort of place where fsync() should be performed explicitly. This holds regardless of whatever general safety the file system provides. It is wrong to conclude that fsync() is unnecessary. Should sed -i use fsync()? If it promises atomic-change-in-place, then it certainly should.
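To make the pattern concrete, here is a minimal sketch of atomic-change-in-place with the explicit fsync() argued for above, assuming a POSIX system (Linux in particular). The function name `replace_file_atomically` and the `.name.XXXXXX` temporary naming are hypothetical, not taken from sed or GIO. The sketch writes a temporary file in the same directory, fsync()s it before the rename(), and then fsync()s the directory so the rename itself is durable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

enum { PATH_BUF = 4096 };

/* Write all of buf to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: replace dir/name with data so that, after a crash,
 * the file holds either the complete old or the complete new contents.
 * Returns 0 on success, -1 on error with errno set. */
int replace_file_atomically(const char *dir, const char *name,
                            const char *data, size_t len)
{
    char tmp[PATH_BUF], dst[PATH_BUF];
    int n1 = snprintf(tmp, sizeof tmp, "%s/.%s.XXXXXX", dir, name);
    int n2 = snprintf(dst, sizeof dst, "%s/%s", dir, name);
    if (n1 < 0 || n2 < 0 || (size_t)n1 >= sizeof tmp || (size_t)n2 >= sizeof dst) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* Temp file in the SAME directory, so rename() stays on one file system.
     * mkstemp() creates it with mode 0600; a real tool such as sed -i would
     * also copy the original file's permissions. */
    int fd = mkstemp(tmp);
    if (fd < 0)
        return -1;

    /* 1. Force the new contents to disk BEFORE rename() makes them visible. */
    if (write_all(fd, data, len) < 0 || fsync(fd) < 0) {
        int saved = errno;
        close(fd);
        unlink(tmp);
        errno = saved;
        return -1;
    }

    /* 2. Atomically swap the new file into place. */
    if (close(fd) < 0 || rename(tmp, dst) < 0) {
        int saved = errno;
        unlink(tmp);
        errno = saved;
        return -1;
    }

    /* 3. fsync() the directory so the rename itself survives a crash.
     * This assumes the platform accepts fsync() on a directory fd. */
    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    int saved = errno;
    close(dfd);
    errno = saved;
    return rc;
}
```

Without step 1, a crash after the rename() can leave the new name pointing at a file whose data never reached the disk, which is the zero-byte file.conf case quoted above.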

Also, if users choose a file system that makes fewer guarantees during a "pull the plug" test, they should be willing to live with their choice. In the ext2 and FAT16/32 days, the effects could be very bad.

This thread has focused on the rename() case, which is often used to get the atomic-change-in-place effect. There are other cases that even your favourite file system mode may not "protect" you from. As I listed before, most file systems won't guarantee the order in which write()s reach the disk. Heck, even if you write() two 512-byte blocks in a row, you are not guaranteed that the first block will be written before the second. The system probably tunes for sequential writes to improve performance, but that is not a guarantee; and if you had written the second block before the first, you might still find the first reaching the disk before the second. This is probably worse in the mmap() case, where pages are dirtied: which page will be flushed to disk first? The only way to know for sure is with barriers like fsync(), which promises not to return until the state has been sent to disk.
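A minimal sketch of using fsync() as the ordering barrier described above, assuming a POSIX system and a file descriptor opened for writing. The names `write_blocks_in_order` and `BLOCK_SIZE` are hypothetical. The point is that the second block is not even handed to the kernel until fsync() has returned for the first. For the mmap() case, the corresponding POSIX call is msync() with MS_SYNC on the dirtied range.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 512  /* matches the 512-byte example in the text */

/* Write one full block at the given offset, retrying on short writes. */
static int pwrite_block(int fd, const char *block, off_t off)
{
    size_t done = 0;
    while (done < BLOCK_SIZE) {
        ssize_t n = pwrite(fd, block + done, BLOCK_SIZE - done,
                           off + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: write `first` at offset 0 and `second` right after
 * it, ensuring `first` has been flushed before `second` is submitted.
 * Two back-to-back write() calls alone give no such ordering guarantee. */
int write_blocks_in_order(int fd, const char *first, const char *second)
{
    if (pwrite_block(fd, first, 0) < 0)
        return -1;
    if (fsync(fd) < 0)      /* barrier: returns only after block 0 is flushed */
        return -1;
    if (pwrite_block(fd, second, BLOCK_SIZE) < 0)
        return -1;
    return fsync(fd);       /* flush block 1 as well */
}
```

The cost of this ordering is an extra synchronous flush per barrier, which is why it belongs only where the application really depends on the on-disk order.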

File system journalling was introduced to improve the speed and accuracy of file system recovery. It was not introduced to provide the ACID guarantees associated with database systems. There is room for it to move in that direction, but the expectation that it must is unrealistic. If a lot of code out there happens to be buggy (for example, close()/rename() for atomic-change-in-place without an fsync()), then the file system can try to work around these bugs, but I think it's wrong to say that it must.


Cheers,
mark

--
Mark Mielke <mark mielke cc>

