Re: fsync in glib/gio



Alexander Larsson wrote:
2) such filesystems are broken

Clearly the answer to 1 is yes. Anything else would be a disservice to
our users' data. However, that doesn't mean such filesystems aren't
broken, in the sense that I would never let a filesystem like that near
any of my data.

For instance, any script doing sed -i s/foo/bar/ file.conf on such a
filesystem risks ending up with a zero byte file.conf. (sed uses rename
but doesn't fsync.) Is this what users expect? Should that script be
rewritten in C so it can use fsync? Should sed fsync? That kind of
reasoning will lead to all apps implementing fsync-on-close manually,
and we're then worse off than if the fs just guaranteed
data-before-metadata-on-rename.

Broken file system or not - all portable applications that require the file system to be in a particular state before continuing should use fsync(). Using rename() to get the effect of atomic-change-in-place is exactly the sort of scenario where you want to guarantee a particular state before continuing, and it is exactly the sort of place where fsync() should be performed explicitly. This holds regardless of whatever general safety the file system provides. It is wrong to conclude that fsync() is unnecessary. Should sed -i use fsync()? If it is promising atomic-change-in-place, then it certainly should.
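
To make the sequence concrete, here is a minimal sketch of write / fsync() / rename() for atomic-change-in-place. The names replace_file_atomically() and write_all(), and the caller-supplied temporary path, are made up for illustration; this is not what sed or glib actually do, and it assumes the temporary file lives on the same file system as the target.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write all of buf, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: replace `path` with `data`, going through `tmp_path`.
 * Returns 0 on success, -1 on failure with errno set. */
int replace_file_atomically(const char *path, const char *tmp_path,
                            const char *data, size_t len)
{
    int saved;
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* The explicit fsync() is the point: the new contents are sent to
     * disk before the rename makes them visible under the old name. */
    if (write_all(fd, data, len) < 0 || fsync(fd) < 0) {
        saved = errno;
        close(fd);
        unlink(tmp_path);
        errno = saved;
        return -1;
    }
    if (close(fd) < 0) {
        saved = errno;
        unlink(tmp_path);
        errno = saved;
        return -1;
    }
    if (rename(tmp_path, path) < 0) {
        saved = errno;
        unlink(tmp_path);
        errno = saved;
        return -1;
    }
    return 0;
}
```

A caller would use it as replace_file_atomically("file.conf", "file.conf.tmp", buf, len). If any step fails, the original file.conf is left untouched.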

Also, if the user chooses a file system which makes fewer guarantees during a "pull the plug" test, they should be willing to live with their choices. In ext2 days and FAT16/32 days, the effects could be very bad.

This thread has focused on the rename() case, often used to have the atomic-change-in-place effect. There are other cases that even your most favourite file system mode may not "protect" you from. Most file systems won't guarantee a write() order to disk, as I listed before. Heck, even if you write() 2 x 512-byte blocks in a row - you are not guaranteed that the first block will be written before the second. The system probably tunes for sequential writes to improve performance, but it's not a guarantee, and if you wrote the second before the first - you might find that the first still reaches the disk before the second. This is probably worse in the mmap() case where pages are dirtied. Which page will be flushed to disk first? The only way you know for sure is with barriers like fsync(). fsync() promises that it will not return until the state has been sent to disk.

File system journalling was introduced to improve file system recovery speed and accuracy. It was not introduced to provide the ACID guarantees associated with database systems. There is room for it to take this direction - but the expectation that it must is unrealistic. If a lot of code out there happens to be buggy (for example - close()/rename() for atomic-change-in-place), then the file system can try to work around these bugs, but I think it's wrong to say that it must.

Cheers,
mark

--
Mark Mielke <mark mielke cc>


