Re: fsync in glib/gio
- From: Mark Mielke <mark mark mielke cc>
- To: Alexander Larsson <alexl redhat com>
- Cc: gtk-devel-list gnome org
- Subject: Re: fsync in glib/gio
- Date: Sat, 14 Mar 2009 13:38:54 -0400
Alexander Larsson wrote:
2) such filesystems are broken
Clearly the answer to 1 is yes. Anything else would be a disservice to
our users' data. However, that doesn't mean such filesystems aren't
broken, in the sense that I would never let a filesystem like that near
any of my data.
For instance, any script doing sed -i s/foo/bar/ file.conf on such a
filesystem risks ending up with a zero byte file.conf. (sed uses rename
but doesn't fsync.) Is this what users expect? Should that script be
rewritten in C so it can use fsync? Should sed fsync? That kind of
reasoning will lead to all apps implementing fsync-on-close manually,
and we're then worse off than if the fs just guaranteed
data-before-metadata-on-rename.
Broken file system or not - all portable applications that require the
file system to be in a particular state before continuing should use
fsync(). rename() to accomplish the effect of atomic-change-in-place is
exactly the sort of scenario where you want to guarantee a particular
state before continuing, and it is exactly the sort of place where
fsync() should be performed explicitly. This is without regard to
whatever general safety is provided by the file system. It is wrong to
conclude that fsync() is unnecessary. Should sed -i use fsync()? If it
is promising atomic-change-in-place, then it certainly should.
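
To make the atomic-change-in-place pattern concrete, here is a minimal sketch in Python, whose os module wraps the same POSIX calls. The helper name atomic_replace is made up for illustration, and the sketch assumes a POSIX system where a directory can be opened and fsync()ed. It is not what sed or gio actually do, only the shape of the fsync-before-rename idea.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace the contents of `path` with `data` (bytes).

    Hypothetical helper: the new contents are flushed to disk
    before the rename makes them visible under the final name.
    """
    dirname = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so that the
    # rename stays within one file system.
    # Note: mkstemp creates the file with mode 0600; the permissions
    # of the original file are not preserved by this sketch.
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # push Python's buffer to the kernel
            os.fsync(f.fileno())   # ask the kernel to push it to disk
        os.replace(tmp, path)      # rename over the old file
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
    # Also fsync the directory so the rename itself is on disk
    # (assumption: POSIX; opening a directory this way fails on Windows).
    dfd = os.open(dirname, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

A call such as atomic_replace("file.conf", b"bar\n") would then leave either the old or the new contents under that name, as far as the file system and hardware honour fsync().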
Also, if the user chooses a file system which makes fewer guarantees
during a "pull the plug" test, they should be willing to live with their
choices. In ext2 days and FAT16/32 days, the effects could be very bad.
This thread has focused on the rename() case, often used to have the
atomic-change-in-place effect. There are other cases that even your most
favourite file system mode may not "protect" you from. Most file systems
won't guarantee a write() order to disk, as I listed before. Heck, even
if you write() 2 x 512-byte blocks in a row - you are not guaranteed
that the first block will be written before the second. The system
probably tunes for sequential writes to improve performance, but it's
not a guarantee, and if you wrote the second before the first - you
might find that the first still writes before the second. This is
probably worse in the mmap() case where pages are dirtied. Which page
will be flushed to disk first? The only way you know for sure is with
barriers like fsync(). This promises that it will not proceed until the
state has been sent to disk.
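
As a rough illustration of using fsync() as a barrier between two writes, the sketch below forces the first block to disk before the second one is even handed to the kernel. The function names write_all and write_in_order are invented for this example, and it assumes the usual POSIX behaviour of fsync().

```python
import os


def write_all(fd, data):
    """Write all of `data` to `fd`, looping over partial writes."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def write_in_order(path, first, second):
    """Write `first` then `second` to `path`, with an fsync() barrier
    after each so that `first` is on disk before `second` is issued.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        write_all(fd, first)
        os.fsync(fd)    # barrier: do not continue until `first` is flushed
        write_all(fd, second)
        os.fsync(fd)    # flush `second` before reporting success
    finally:
        os.close(fd)
```

Without the first os.fsync() call, nothing in the code would constrain which of the two blocks reaches the disk first. For pages dirtied through mmap(), the corresponding POSIX call is msync().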
File system journalling was introduced to improve file system recovery
speed and accuracy. It was not introduced to provide the ACID guarantees
associated with database systems. There is room for it to take this
direction - but the expectation that it must is unrealistic. If a lot of
code out there happens to be buggy (for example - close()/rename() for
atomic-change-in-place), then the file system can try to work around
these bugs, but I think it's wrong to say that it must.
Cheers,
mark
--
Mark Mielke <mark mielke cc>