On Tue, 2015-02-10 at 11:20 +0100, Lennart Poettering wrote:
On Mon, 09.02.15 10:47, Philip Withnall (philip tecnocode co uk) wrote:
Hi all,
From some feedback from real-world users of GLib/GIO (see the ‘Feedback from downstreams’ thread), and just from looking at our own code in GNOME, it seems that people persistently use sync versions of APIs in the main thread when they should be using async ones.
[...]
There are definitely legitimate uses of sync APIs, but not from the main thread, ignoring trivial command line utilities (and even then, if they want to handle signals properly, they should be running a main loop).
Uh, oh. This is really oversimplifying things. Note that on Linux disk IO is generally synchronous, and it's good that way, and you cannot really avoid it. I mean, never forget that your executable and its shared libraries are all mapped into memory and access to their pages results in synchronous IO. Even if you wanted you couldn't make that async...
The difference there is that you cannot do _anything_ until the code you want to execute is paged in. For I/O operations on files, you can be redrawing the UI, accessing other files, etc.
I am pretty sure if you do async IO like gio does for every single file access you'll just complicate your program and make it substantially slower. For small files, normal synchronous disk access is a ton faster than dispatching things to background threads, and back...
The problem is that GIO can’t know which accesses are to small, local files, and which aren’t. It already optimises reads from pollable streams (sockets) by keeping them in the main thread and adding them into the main poll() call. How about using the distinction between GIO and gstdio.h? Functions like g_file_get_contents(), or g_open() + read() + g_close(), which can safely be used on small, local files, could continue to be called from the main thread. That would be fine for system utilities which _know_ they will operate on a local file system. For typical desktop applications, though, the home directory could be an NFS share and all the ~/.config files could be hidden behind noticeable latency. For those applications, I think GIO should continue to be used, and used asynchronously.
Also, GLib has wrappers for making mmapping available to programs, to make access to seldom-accessed sparse databases efficient. Do you want to prohibit that too?
No, mmap() is clearly a tool for a different kind of problem. If you’re accessing an mmap()ed file, you need to be sure it’s local anyway, I think? GMappedFile doesn’t have async versions of its methods, presumably for this reason.
Moreover, on Linux, file systems like /proc, /sys, /run and /tmp are known not to be backed by slow physical IO, hence it's really pointless accessing them via async IO...
I suggest gstdio.h + normal POSIX read() (as above).
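A minimal sketch of what that plain synchronous path might look like for a small file on a file system like /proc. On Unix, g_open() and g_close() from gstdio.h are thin wrappers around open() and close(), so raw POSIX calls are used here to keep the sketch dependency-free. read_small_file is a hypothetical helper name, and the fixed-size buffer is an assumption that only suits files known to be small.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: synchronously read at most cap - 1 bytes of a
 * small file into buf and NUL-terminate the result.
 * Returns the number of bytes read, or -1 with errno set on failure. */
static ssize_t read_small_file(const char *path, char *buf, size_t cap)
{
    if (cap == 0) {
        errno = EINVAL;
        return -1;
    }

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    size_t total = 0;
    while (total < cap - 1) {
        ssize_t n = read(fd, buf + total, cap - 1 - total);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            int saved = errno;  /* close() may clobber errno */
            close(fd);
            errno = saved;
            return -1;
        }
        if (n == 0)
            break;              /* end of file */
        total += (size_t)n;
    }

    close(fd);
    buf[total] = '\0';
    return (ssize_t)total;
}
```

A caller that knows it is reading something like /proc/self/status can call this directly from the main thread; content beyond the buffer size is silently truncated, which is part of why this only fits small control files.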
Then, during start-up of your app, when you need to read some config file or so before you can do your first steps, why bother with async stuff? You need to wait for reading/parsing that file anyway before you can proceed.
This seems to be the only use case where sync I/O calls still seem, to me, to be reasonable. But in my opinion we could suffer the loss of that convenience if doing so means we can easily detect other sync calls from the main thread which _will_ cause problems.
Hence, my recommendation would be to draw the line somewhere between: "potentially unbounded user payload" data and "configuration/control" data. For the former it would be wise to encourage async IO but for the latter certainly not. If you follow what I want to say...
As above, how about making that line the distinction between calling functions from gstdio.h and using GIO? In the former case, you know you’re operating on local files. In the latter, you could be operating on files from the moon.
Philip