Re: fsync in glib/gio

From: Mark Mielke <mark mark mielke cc>
To: Behdad Esfahbod <behdad behdad org>
Cc: Federico Mena Quintero <federico novell com>, gtk-devel-list gnome org, Alexander Larsson <alexl redhat com>, Morten Welinder <mortenw gnome org>
Subject: Re: fsync in glib/gio
Date: Fri, 13 Mar 2009 17:35:58 -0400

Behdad Esfahbod wrote:

> > It's well explained in the various discussions about this. Essentially,
> > the metadata for the rename is written to disk, but the data in the file
> > is not (yet, due to delayed allocation) and then the system crashes. On
> > fsck we discover the file is broken (no data) and set the file size to
> > 0.
>
> That's clearly a broken filesystem (screw standards. If it doesn't do
> what users expect, it's broken). Why work around it in gio? Have the
> filesystem guys "fix" it, for whatever that means.
> All we need is the few major distros handling it properly.
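The crash scenario quoted above comes from the common "write a temporary file, then rename() it over the original" pattern. The following is a minimal POSIX sketch of that pattern with the contested fsync() placed before the rename. The helper name `replace_file_contents` and the `.tmp` suffix are hypothetical, chosen only for illustration. Making the rename itself durable would additionally require an fsync() of the containing directory, which is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: replace `path` with `len` bytes of `data`.
 * Returns 0 on success, -1 on error (errno is set by the failing call). */
static int replace_file_contents(const char *path, const void *data, size_t len)
{
    char tmp[4096];
    /* Use a sibling temporary file so rename() stays within one file system. */
    int r = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (r < 0 || (size_t) r >= sizeof tmp)
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }

    /* The step under debate: push the new data to stable storage before
     * rename() makes it visible under the final name, so that after a
     * crash the name refers to the old or the complete new contents,
     * not to an empty file. */
    if (fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }
    /* rename() atomically replaces the old directory entry. */
    if (rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

Dropping the fsync() call gives the cheaper variant whose crash behaviour under delayed allocation is what the quoted messages are arguing about.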

There are different definitions of broken. I might consider it "broken" if my glib/gio application, which writes out thousands of little files, suddenly starts taking twice as long as my Perl program. fsync() has a real, measurable cost.
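One way to observe that cost on a particular machine is a micro-benchmark along the following lines. It is a hypothetical sketch: `write_small_files` is an illustrative name, the file naming is arbitrary, and the timing uses the POSIX monotonic clock. It creates many tiny files, optionally calling fsync() on each one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical micro-benchmark: create `count` small files in `dir`,
 * calling fsync() on each one if `do_fsync` is non-zero. Returns the
 * elapsed wall-clock time in seconds, or -1.0 on error. */
static double write_small_files(const char *dir, int count, int do_fsync)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (int i = 0; i < count; i++) {
        char path[4096];
        snprintf(path, sizeof path, "%s/file-%d", dir, i);
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1.0;
        /* A two-byte payload: the cost being measured is per-file overhead. */
        if (write(fd, "x\n", 2) != 2 || (do_fsync && fsync(fd) != 0)) {
            close(fd);
            return -1.0;
        }
        close(fd);
    }

    clock_gettime(CLOCK_MONOTONIC, &end);
    return (double) (end.tv_sec - start.tv_sec)
         + (double) (end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Running it once with `do_fsync` set to 0 and once with 1 (each in a fresh directory, e.g. from mkdtemp()) shows the difference. The ratio depends heavily on the storage device and file system, so no particular figure is implied.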

The documentation for the file systems is usually quite clear, although people may not understand it. When a file system such as ext3 offers different journal modes, it's presumed that the user understands the effect of their choice. If the user must be absolutely safe, they should use 'journal' mode, but this may be slow, as all data is written to disk at least twice. If 'ordered' mode is used, this means they are willing to accept lesser guarantees for increased performance. The ext3 'ordered' mode works pretty well: data is written before metadata, and metadata updates are ordered. But it's not perfect! What order is the data written in?

Do you intend to patch glib/gio if somebody reports that glib/gio used write() on one part of a file, then on another part of the same file, then pulled the plug, and is able to prove to you that the second write can reach the disk while the first one hasn't even started? Will you call it absolutely broken and demand a file system fix?
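If an application really does need the first write to reach the disk before the second one, it can enforce that itself. A minimal sketch, assuming POSIX pwrite() and fdatasync(), places a flush between the two writes. The helper name `ordered_pwrite_pair` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write `a` at offset `off_a`, then `b` at `off_b`,
 * making sure `a` is on stable storage before `b` is even issued.
 * A short write is treated as an error. Returns 0 on success, -1 on error. */
static int ordered_pwrite_pair(int fd,
                               const void *a, size_t len_a, off_t off_a,
                               const void *b, size_t len_b, off_t off_b)
{
    if (pwrite(fd, a, len_a, off_a) != (ssize_t) len_a)
        return -1;
    /* Without this flush, the kernel may write back the two dirty regions
     * in either order, so a power cut could persist `b` but not `a`. */
    if (fdatasync(fd) != 0)
        return -1;
    if (pwrite(fd, b, len_b, off_b) != (ssize_t) len_b)
        return -1;
    return 0;
}
```

Only the application knows that these two particular writes depend on each other. A generic library layer sees two unrelated write() calls.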

The ext3 'writeback' mode provides even fewer guarantees: metadata is ordered, data is not. The behaviour you are talking about right now seems to be 'writeback' mode (I'm not sure; I guess ext4 is doing 'writeback' mode by default, otherwise I don't understand the complaint?).

Tell your users that if they expect the right thing, they should use 'journal' mode. glib/gio cannot and should not second-guess the user's choice of file system. Taking this argument to its extreme, you may as well run fsync() after every single I/O operation that performs a modification. This would be horrible for performance, and the user already has this capability by configuring the file system as fully journalled and using synchronous writes. They don't need glib/gio to simulate this.
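For a single file, the closest application-level equivalent of "synchronous writes" is probably the POSIX `O_SYNC` open flag. With this flag, each write() returns only after the data and the metadata needed to retrieve it have been transferred to storage. This is effectively the "fsync() after every modification" extreme described above. The helper `append_line_sync` is a hypothetical illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: append `line` to `path` with synchronous I/O.
 * Because of O_SYNC, write() does not return until the data (and the
 * metadata required to read it back) has reached the underlying storage,
 * so every call pays roughly the cost of a write() plus an fsync(). */
static int append_line_sync(const char *path, const char *line)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
    if (fd < 0)
        return -1;
    size_t len = strlen(line);
    ssize_t n = write(fd, line, len);
    int ok = (n == (ssize_t) len);
    if (close(fd) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```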

My opinion is that glib/gio shouldn't be doing this stuff. The problem is not with glib/gio. glib/gio should offer an fsync() wrapper (not sure if it does or not; I don't use it), so that applications with special requirements, such as a database application, can call fsync() at strategic points where the application wishes to make greater promises than the file system does. A database file is the typical case. For databases specifically, fsync() on close() is not good enough: fsync() needs to be done at any point where the data must be consistent and written to disk before the application continues with another write(). glib/gio cannot guess where these points are.
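The sketch below illustrates why a close()-time fsync() is too late for a database-like workload. It assumes a hypothetical append-only log (`struct app_log` and its functions are illustrative names, not a glib/gio API). The descriptor stays open across many transactions, and the application calls fdatasync() at the commit points it chooses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical append-only log. Its descriptor stays open for the whole
 * run, so an fsync() on close() would protect none of the transactions
 * committed along the way. */
struct app_log {
    int fd;
};

static int app_log_open(struct app_log *al, const char *path)
{
    al->fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    return al->fd < 0 ? -1 : 0;
}

/* Append via the page cache: cheap, but not yet durable. */
static int app_log_append(struct app_log *al, const char *record)
{
    size_t len = strlen(record);
    return write(al->fd, record, len) == (ssize_t) len ? 0 : -1;
}

/* Commit point chosen by the application: everything appended so far
 * must be on stable storage before the transaction is acknowledged or
 * any dependent write is issued. */
static int app_log_commit(struct app_log *al)
{
    return fdatasync(al->fd);
}

static int app_log_close(struct app_log *al)
{
    return close(al->fd);
}
```

Appends between commits stay cheap. Durability is paid for only at the points the application knows matter.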


Putting fsync() on close() is a hack.

Just my opinion. :-)

Cheers,
mark

--
Mark Mielke <mark mielke cc>
