Re: fsync in glib/gio
- From: Mark Mielke <mark mark mielke cc>
- To: Behdad Esfahbod <behdad behdad org>
- Cc: Federico Mena Quintero <federico novell com>, gtk-devel-list gnome org, Alexander Larsson <alexl redhat com>, Morten Welinder <mortenw gnome org>
- Subject: Re: fsync in glib/gio
- Date: Fri, 13 Mar 2009 17:35:58 -0400
Behdad Esfahbod wrote:
> > It's well explained in the various discussions about this. Essentially,
> > the metadata for the rename is written to disk, but the data in the file
> > is not (yet, due to delayed allocation) and then the system crashes. On
> > fsck we discover the file is broken (no data) and set the file size to
> > 0.
>
> That's clearly a broken filesystem (screw standards. If it doesn't do
> what users expect, it's broken). Why work around it in gio? Have the
> filesystem guys "fix" it for whatever that means.
> All we need is the few major distros handling it properly.
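For reference, the write-then-rename pattern under discussion can be sketched roughly as follows. This is only an illustration, not the gio implementation: Python's os module is used as a thin wrapper over the POSIX calls, and the function name replace_file and its sync parameter are made up for the example. With sync=False, the rename may reach disk before the file data does, which is the zero-length-file scenario described above.

```python
import os
import tempfile


def replace_file(path, data, sync=True):
    """Write data to a temporary file next to path, then rename it over path.

    replace_file and the sync flag are hypothetical names for this sketch.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the rename stays
    # within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()                  # push Python's buffer to the kernel
            if sync:
                os.fsync(tmp.fileno())   # ask the kernel to write the data out
        # rename(2) semantics on POSIX: atomically replaces any existing file.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Whether the fsync() call belongs inside a helper like this, or should be left to the caller, is exactly what is being argued in this thread.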
There are different definitions of broken. I might consider it "broken"
if my glib/gio application, which writes out thousands of little files,
suddenly starts taking twice as long as my Perl program. fsync() has
a real, measurable cost.
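Anyone who wants to measure that cost on their own file system could use something like the sketch below. It is an assumption-laden illustration, not a benchmark: the function name write_small_files, the file count, and the 128-byte payload are arbitrary choices for the example, and the numbers will vary widely with the file system, journal mode and hardware.

```python
import os
import tempfile
import time


def write_small_files(directory, count, sync):
    """Write `count` small files; return the elapsed wall-clock seconds.

    write_small_files is a hypothetical helper for this sketch.
    """
    start = time.perf_counter()
    for i in range(count):
        fd = os.open(os.path.join(directory, f"f{i:05d}"),
                     os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            os.write(fd, b"x" * 128)
            if sync:
                os.fsync(fd)   # wait for this file's data to be written out
        finally:
            os.close(fd)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Compare the same workload with and without fsync() before close().
    for sync in (False, True):
        with tempfile.TemporaryDirectory() as d:
            elapsed = write_small_files(d, 500, sync)
            print(f"fsync={sync}: {elapsed:.3f}s")
```

Run it on the file system and mount options you actually care about; a tmpfs or a disk with a volatile write cache will hide most of the difference.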
The documentation for the file systems is usually quite clear, although
people may not understand it. When a file system such as ext3 offers
different journal modes, it's presumed that the user understands the
effect of their choice. If the user must be absolutely safe, they
should use 'journal' mode - but this may be slow, as all data is written
to disk at least twice. If 'ordered' mode is used, this means they are
willing to accept lesser guarantees for increased performance. The ext3
'ordered' mode works pretty well - data before metadata, and metadata
updates are ordered. But - it's not perfect! What order is the data
written in?
Do you intend to patch glib/gio if somebody reports that glib/gio used
write() to one part of a file, then another part of a file, then pulls
the plug, and is able to prove to you that it is possible for the second
write to finish while the first write hasn't started? Will you call it
absolutely broken and demand a file system fix?
The ext3 'writeback' mode provides even fewer guarantees. Metadata is
ordered, data is not. The behaviour you are talking about right now
seems to be 'writeback' mode (not sure - I guess ext4 is doing
'writeback' mode by default otherwise I don't understand the complaint?).
Tell your users to use 'journal' mode if they expect the right thing.
glib/gio cannot and should not be second guessing the file system choice
of the user. Taking this argument to its extreme, you may as well run
fsync() after every single I/O operation that performs a modification.
This would be horrible for performance, and the user has this capability
already by defining the file system as completely journalled and using
synchronous writes. They don't need glib/gio to simulate this.
My opinion is that glib/gio shouldn't be doing this stuff. The problem
is not with glib/gio. glib/gio should offer an fsync() wrapper (not sure
if it does or not - I don't use it), so that applications with special
requirements, such as a database application, can use fsync() at
strategic points where the application wishes to make greater promises
than the file system. A database file applies here. For databases
specifically, fsync() on close() is not good enough. fsync() needs to be
done at any point that the data needs to be consistent and written to
disk before the application continues to do another write(). glib/gio
cannot guess where these points are.
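To make the database case concrete, here is a rough sketch of what "fsync() at strategic points" might look like for an append-only log. Everything in it is invented for illustration: the record layout (length, payload, then a CRC32 commit marker) and the names append_record and read_committed do not come from any real database. The point is only that the application, not the library, knows the record must be on disk before the marker is written.

```python
import os
import struct
import zlib


def append_record(fd, payload):
    """Append one record, then its commit marker, with fsync() after each.

    Sketch only: os.write() may in principle write fewer bytes than asked,
    which a real implementation would have to handle.
    """
    os.write(fd, struct.pack("<I", len(payload)) + payload)
    os.fsync(fd)   # the record must be on disk before the marker is written
    os.write(fd, struct.pack("<I", zlib.crc32(payload)))
    os.fsync(fd)   # the marker must be on disk before we report success


def read_committed(path):
    """Return payloads of records whose commit marker is present and valid."""
    with open(path, "rb") as f:
        blob = f.read()
    records = []
    pos = 0
    while pos + 4 <= len(blob):
        (length,) = struct.unpack_from("<I", blob, pos)
        end = pos + 4 + length
        if end + 4 > len(blob):
            break   # torn record or missing commit marker
        payload = blob[pos + 4:end]
        (crc,) = struct.unpack_from("<I", blob, end)
        if crc != zlib.crc32(payload):
            break   # marker does not match the data
        records.append(payload)
        pos = end + 4
    return records
```

An fsync() only at close() would not give this ordering: both writes could still be sitting in the page cache, and the file system would be free to write them out in either order.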
Putting fsync() on close() is a hack.
Just my opinion. :-)
Cheers,
mark
--
Mark Mielke <mark mielke cc>