Re: fsync in glib/gio



Alexander Larsson wrote:
On Fri, 2009-03-13 at 11:11 -0700, Brian J. Tarricone wrote:
Sven Neumann wrote:

It seems wrong to work around broken file-systems on the application
level. That only takes away pressure from the file-system developers to
address the problem properly.

How is the file system broken? Read the man page for write(). If you want to guarantee that file data will hit disk (or at least the disk's HW buffer) by a certain time, you need to call fsync() (or fdatasync(), where available).
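To make the write()/fsync() distinction concrete, here is a minimal sketch, in Python rather than C for brevity. The helper name write_durably is hypothetical; os.write() and os.fsync() are thin wrappers around the POSIX calls of the same names.

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Write data to path and ask the kernel to flush it before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # write() may accept fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # A successful write() only means the kernel accepted the data.
        # fsync() is the call that asks for it to be pushed to the device.
        os.fsync(fd)
    finally:
        os.close(fd)
```

Dropping the os.fsync() line gives the plain open/write/close sequence, which leaves the timing of the flush entirely up to the kernel and file system.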

The fact that it's documented doesn't make it not broken. If you read the
POSIX specs you'll see that it's within the specification for the
implementation of fsync() to be empty.

That's not the point I'm trying to make. It may be 'stupid' behavior, but it's at least specified. Saying "the filesystem guys should fix their filesystems to be less lame" just doesn't work, as they're compliant with the spec. So either the app developer can write their save routines to be robust *in the face of the spec*, or they can 'hope' that every new FS adopts a restriction on behavior that isn't specified anywhere, and every old FS is modified and updated to follow this fantasy restriction. Doesn't that sound a bit like unreasonable wishful thinking?

Now, we don't actually need the data to be on the disk at a
certain time. On the contrary, it's really fine if it's delayed. But what
we want is either the old file in place, or the new file in place, not
the old file deleted, the metadata for the new file written, and the new
file being empty. That's what is broken, even if it's allowed by POSIX.

Sure, but that's just a special case. So you (as the app developer) recognise this, understand how the spec interacts with your use-case, and write robust code accordingly.

Or, you take the "the spec/kernel/FS is broken" approach, and try to get a guarantee specified for the special case, something like "in the case where a file is renamed over top of an existing file, the source file must be flushed to disk before the rename takes place." And then the app developer doesn't have to worry about it, because the implementation should do the right thing.

Your patch to gio takes the first approach, which is fine, I think, if unfortunate in the sense that it forces behavior that may not be desired. A user of g_file_set_contents() may be writing a temp file or something that they don't care all that much about, and forcing an fsync() there arguably reduces performance. Of course, g_file_set_contents() is a decently high-level abstraction, so one could argue that people who want finer control over how the file gets written should use gio or open/write/close directly.
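As an illustration of the first approach (not the actual gio patch), here is a minimal sketch, again in Python rather than C: write the new contents to a temporary file in the same directory, fsync() it, and only then rename it over the old file. The name replace_contents is hypothetical.

```python
import os
import tempfile


def replace_contents(path: str, data: bytes) -> None:
    """Replace path so that a crash leaves either the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file has to be on the same file system as the target,
    # otherwise the rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # drain Python's userspace buffer
            os.fsync(tmp.fileno())  # flush the file data before the rename
        os.replace(tmp_path, path)  # rename() over the old file
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

The fsync() before the rename is the part that costs performance, and it is exactly what a caller writing a throwaway file might not want. Note that this sketch does not preserve the permissions of the file it replaces.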

This isn't a Linux idiosyncrasy, even.  POSIX specifies this.

The only thing that's actually broken IIRC is ext3, in that an fsync() effectively acts as a full-FS sync() (see the Firefox 3.0/sqlite fiasco[1]), which is ridiculous. If anything should be fixed, *that* should be... as well as naive applications that think that open() -> write() -> close() is sufficient to get data to disk in a known amount of time.

Broken is a wider concept than you think. Things that are fully up to
some well documented spec can also be broken from the point of view of
common sense.

Yeah, I'd totally agree. But in the absence of an ability to change the spec, it's best to try to make things work as well as they can within the spec, no? It seems like some people are advocating "well, today everyone uses ext3, and there's no problem, so we shouldn't do this because it'll reduce performance there." And of course, a year from now (or less! obviously some already are), I'm sure most desktop distros will be shipping with ext4 as the default. (And I could be wrong, but it seems to me that ext3 is the only FS that, by coincidence, will usually be immune to this problem and, also coincidentally, is one of the only FSes that has crappy fsync() performance.)

I dunno... my vote/opinion would be to have a _SYNC flag, leave async as the default, and force _SYNC for g_file_set_contents() (maybe?) and for cases in gio where we know a rename is going to overwrite an existing file (if it's possible to know that without a perf hit).
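A rough sketch of how such a flag might behave, in Python rather than C. Everything here is hypothetical: WriteFlags, its SYNC member, and set_contents are illustrative names, not GLib or gio API. The default stays async, and the sync is forced only when the rename is about to overwrite an existing file, which is detected with a single lstat().

```python
import enum
import os
import tempfile


class WriteFlags(enum.Flag):
    """Hypothetical flag set; not part of GLib or gio."""
    NONE = 0
    SYNC = 1


def set_contents(path: str, data: bytes,
                 flags: WriteFlags = WriteFlags.NONE) -> bool:
    """Write data to path via a temp file and rename.

    Returns True if fsync() was called, False otherwise.
    """
    # Force the sync when the rename would replace an existing file.
    do_sync = bool(flags & WriteFlags.SYNC) or os.path.lexists(path)
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            if do_sync:
                os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file if anything went wrong before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    return do_sync
```

With this shape, a caller writing a brand-new scratch file pays nothing extra, while a caller overwriting existing data gets the sync whether or not it asked for one.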

	-brian

