I just had a chat with Alex on IRC. (I have attached the log if anyone wants to read it.) Some of my ideas have changed after this discussion, so I might contradict my previous mail now. Bear with me.

I've come to understand that Alex's and my ideas of what a stream is and how it should work are very different because we have very different use cases. Alex comes from the file manager world. For him GVFS is primarily a way to interact with "files" all over the world, rename them, tag them, and access their contents. For me it's all about getting a stream to/from somewhere and writing/reading data to/from it. I'm wondering if there are use cases for an I/O library that are different from those two.

During the discussion the question of how to write an async version of g_file_get_contents came up. (That's the paste link in the log.) I have attached my quick'n'dirty showcase of how I'd imagine that would work, too. The git repo from today contains an implementation from Alex. That one actually works.

So here are some follow-up comments. I'd like to note that what I care about most, because it's important for a nice API, is point 7 below.
> 1b) Cancelling operations from another thread doesn't look like good
> design to me. I've learnt (both in theory and in practice) that
> threads are supposed to be independent and not call into each other.

Eh? How else would you cancel a blocking i/o call? From within the blocked thread? This is just bogus, cancellation is all about cross-thread operations.

I'd argue that if you do a blocking call, you're aware that it's blocking and don't want to cancel it. Otherwise you'd use async I/O with a proper cancellation mechanism such as g_main_context_wakeup(). You've told me that its main use case is transactions like calling the equivalent of gnome_vfs_xfer_async (), which you'd want to implement in a thread by calling lots of sync operations one after another. I can see the use case, and even though I still don't like it, I can't come up with a better model.
> 2) GVFS seems to avoid the glib main loop for asynchronous operations
> in favour of relying on threads. I'm not sure this provides any
> benefits besides making apps way harder to debug. Is there a reason
> for that?

I wouldn't say it does. It gives you both sync (useful if you do threads) and async operations (useful in your typical GUI app). Almost all GUI apps are single-threaded and async by design, and integrating glib-style async i/o into such apps is generally *much* easier than manually mucking about with threads. However, the default async operations are implemented using threads, so in fact apps are often using threads already when using gvfs. Just in a way that fits well with a mainloop-based GUI app. And when doing async is *much* easier than using sync i/o we can avoid the thread cost by having a custom async implementation. (For instance the gvfs daemon protocol is very easy to do async without using threads.)
Yeah, I missed those implementations as I was only grepping for GMainContext which you don't use. I think it's a good idea to make the main context customizable so gvfs can be used from other threads.
There is a default implementation of async operations using the sync operations and threads. I don't think that means the model is only sync i/o. I think it gives equal weight to both sync and async models.
I'd love it if GVFS advocated the async model over the sync model but provided both. So g_input_stream_read would be async and g_input_stream_read_sync would exist, too. The reason for this is that I think in most cases you want the async behaviour and it helps to tell lazy programmers that this is the right way to go. It's purely psychological.
In fact, I agree with Linus (the syslet/threadlet thread on lkml is very interesting with respect to this argument) that there are many operations that just fit very badly with the asynchronous model, and things like pathname lookup and complicated filename operations are examples of that.
The relevant email can be found at http://lkml.org/lkml/2007/2/25/177 - the followup and previous mails might be interesting, too. I need to digest this before commenting on it though.
There are two basic models when it comes to async i/o. I call them "push" and "pull". The "push" model is when you initiate a transfer, and then you get a new event each time there is more data available. The "pull" model is that you initiate each read and then get an event when data for that read is available. The main differences between the two models are:

* flow control
  The push model doesn't give you much flow control. If you're copying from a fast source to a slow destination you will fill up memory with the data while waiting for the destination to accept your first block of data. You might also not really want to read the whole file (in for instance filetype sniffing), and this gets more complicated with push.

* pipelining
  The pull model means you have to initiate a pull before actually getting any data, and this makes it harder to pipeline reads and writes when copying from one source to another. A naive implementation will not request new data until the last block is written to the destination.

* buffer handling
  The pull model allows you to specify the target buffer before i/o happens, whereas the push model allocates buffers itself. This means that if we want the data to be in a specific place the push model requires an extra memory copy.

Now, in theory both models are basically equivalent. You can add a flow control operation to the push model (pause()/unpause()) and you can add automatic readahead to the pull model to fix up pipelining.
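To make the difference between the two models concrete, here is a minimal sketch in plain C (no GLib), with an in-memory source standing in for a real stream. Every name in it (MemSource, mem_source_pull, mem_source_push, PushCallback, sniff_first_block) is made up for illustration and is not gvfs API. It only shows who owns the buffer and where the flow control sits; it is synchronous and does no real i/o.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory source; stands in for a real stream. */
typedef struct {
  const char *data;
  size_t      len;
  size_t      pos;
  int         paused;   /* push model only: flow control flag */
} MemSource;

/* Pull model: the caller supplies the target buffer and asks for each
 * block. Returns the number of bytes copied, 0 at end of data. */
static size_t
mem_source_pull (MemSource *src, char *buf, size_t count)
{
  size_t n = src->len - src->pos;

  if (n > count)
    n = count;
  memcpy (buf, src->data + src->pos, n);
  src->pos += n;
  return n;
}

/* Push model: the source owns the block and hands it to a callback. */
typedef void (*PushCallback) (MemSource *src, const char *block,
                              size_t n, void *user_data);

/* Delivers blocks until the data runs out or the callback pauses the
 * source. Returns the number of blocks delivered by this call. */
static size_t
mem_source_push (MemSource *src, size_t block_size,
                 PushCallback cb, void *user_data)
{
  size_t blocks = 0;

  assert (block_size > 0);
  src->paused = 0;   /* calling push again acts as unpause() */
  while (!src->paused && src->pos < src->len) {
    size_t n = src->len - src->pos;

    if (n > block_size)
      n = block_size;
    src->pos += n;
    cb (src, src->data + src->pos - n, n, user_data);
    blocks++;
  }
  return blocks;
}

/* Example consumer that only wants one block at a time (think filetype
 * sniffing) and uses the pause flag as its flow control. */
static void
sniff_first_block (MemSource *src, const char *block,
                   size_t n, void *user_data)
{
  size_t *seen = user_data;

  (void) block;
  *seen += n;
  src->paused = 1;
}
```

With pull, the consumer decides where the data lands and when the next block is requested. With push, the consumer has to pause the source explicitly if it wants to stop early, and it gets the data wherever the source happened to put it.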
I think that solution is fine. However, there is one thing I am missing: The read_all_available_data_right_now() function call. Or at least a read_x_bytes_if_available() call. This is interesting because in some cases I want to avoid calling back into the main loop. It's an issue in Swfdec with gnome-vfs where I'm supposed to display how many percent of the file is loaded and that display gets updated via the main loop. So after every read () of DEFAULT_SIZE I get my display updated. So loading seems slow even though it isn't, just because every read goes via the main loop.
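Roughly what I have in mind, as a sketch in plain C. BufferedStream, read_if_available and drain_available are hypothetical names, not anything that exists in gvfs or gnome-vfs, and the "stream" is just a chunk of memory of which some bytes have already arrived. The point is only that all data that is ready gets consumed in one go, and the progress display is updated once afterwards instead of once per read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stream: `arrived` bytes of `data` have come in so far. */
typedef struct {
  const char *data;
  size_t      arrived;
  size_t      consumed;
} BufferedStream;

/* Hypothetical non-blocking read: copies at most `count` bytes that have
 * already arrived and returns how many were copied. Returns 0 if nothing
 * is ready. It never waits and never goes through the main loop. */
static size_t
read_if_available (BufferedStream *s, char *buf, size_t count)
{
  size_t n = s->arrived - s->consumed;

  if (n > count)
    n = count;
  memcpy (buf, s->data + s->consumed, n);
  s->consumed += n;
  return n;
}

/* Drains everything that is ready right now. Returns the total number of
 * bytes consumed and stores the number of reads it took in *n_reads. The
 * caller would update its progress display once, after this returns. */
static size_t
drain_available (BufferedStream *s, size_t chunk_size, size_t *n_reads)
{
  char chunk[64];
  size_t total = 0;
  size_t n;

  assert (chunk_size > 0 && chunk_size <= sizeof chunk);
  *n_reads = 0;
  while ((n = read_if_available (s, chunk, chunk_size)) > 0) {
    total += n;
    (*n_reads)++;
  }
  return total;
}
```

Only when drain_available() returns 0 would the caller go back to the main loop and wait for the next "data available" notification.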
> 6) Have there been thoughts about using a buffer-like struct? I seem to
> remember having talked about that last Guadec, but don't remember.
> To elaborate: A buffer structure is basically a refcounted memory
> region. All the multimedia libs make use of a structure like it. Some
> examples are at [1] and [2]. It allows some neat features like
> avoiding memcpys to reference the data and subbuffers.

I'm not sure exactly what this would buy us. We can already avoid extra memcpys. Care to give some more concrete examples of what this could be used for in gvfs?
I know David Schleef is much better at explaining the advantages of refcounted memory regions to others than I am. He was the one that explained it to me after all. ;) I'll just cc him and hope he does it here again.
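Until then, here is a rough sketch of the kind of structure I mean, in plain C. This is not the API of any of the multimedia libs, all names (Buffer, buffer_new, buffer_ref, buffer_unref, buffer_new_subbuffer) are invented for this mail, and the refcounting is not thread-safe. It just shows a subbuffer referencing part of its parent's memory without a memcpy while keeping the parent alive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct Buffer Buffer;
struct Buffer {
  int            refcount;
  unsigned char *data;    /* points into the parent's memory for subbuffers */
  size_t         length;
  Buffer        *parent;  /* NULL if this buffer owns `data` */
};

/* Creates a buffer that owns `length` bytes of uninitialized memory. */
static Buffer *
buffer_new (size_t length)
{
  Buffer *b = malloc (sizeof *b);

  assert (b != NULL);
  b->refcount = 1;
  b->data = malloc (length ? length : 1);
  assert (b->data != NULL);
  b->length = length;
  b->parent = NULL;
  return b;
}

static Buffer *
buffer_ref (Buffer *b)
{
  b->refcount++;
  return b;
}

/* Drops one reference. The memory is freed with the last reference; a
 * subbuffer releases its reference on the parent instead. */
static void
buffer_unref (Buffer *b)
{
  if (--b->refcount > 0)
    return;
  if (b->parent)
    buffer_unref (b->parent);
  else
    free (b->data);
  free (b);
}

/* Creates a view on part of `parent`. No data is copied; the subbuffer
 * holds a reference on the parent so the memory stays valid. */
static Buffer *
buffer_new_subbuffer (Buffer *parent, size_t offset, size_t length)
{
  Buffer *b;

  assert (offset <= parent->length && length <= parent->length - offset);
  b = malloc (sizeof *b);
  assert (b != NULL);
  b->refcount = 1;
  b->data = parent->data + offset;
  b->length = length;
  b->parent = buffer_ref (parent);
  return b;
}
```

A reader could hand out one big buffer per read and let parsers slice it into subbuffers, and whoever holds the last reference frees the memory.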
> 7) The error handling in gvfs seems to be oriented towards the "old"
> and in glib still common error model of having people supply a GError
> to every function. I much prefer the cairo model, where basically the
> object keeps its own GError and has a function like cairo_status [3]
> that gives you the last error that occurred. This has multiple
> advantages such as that I don't have to check for errors in every
> function, I can get the error on request and don't have to keep it
> around manually and function calls have one parameter less.

First of all, many objects in gvfs don't have that kind of state. For instance, a GFile is stateless (it's equivalent to a path string), so an operation like g_file_copy() can't really modify any object error state. Furthermore, the errors in gvfs are generally serious i/o errors (missing file, etc) that should almost always be displayed to the user. As such, I think the GError model, with nice error messages and enforced error checking, is a better model for gvfs.
I think Carl said it: There are two types of objects: objects you operate _with_ (like GInputStream and GOutputStream) and objects you operate _on_ (like GFile). So stream objects are basically like a cairo_t in that all functions called on them modify the object itself. As such it seems a lot nicer to me if everything that affects them were attached to them - maybe even as properties. I'm thinking about the cancellable, the main context it operates in or the error state it's been put in. I've also looked at the code a bit and it seems that currently a lot of operations that use streams internally (like g_file_get_contents_async) need to do the housekeeping with those objects anyway.
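As an illustration of the cairo-style model applied to a stream, here is a small sketch in plain C. OutStream, StreamStatus and the out_stream_* functions are hypothetical and only mimic the idea behind cairo_status(): once the object is in an error state, further calls do nothing, and the caller checks the status once when it cares.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
  STREAM_STATUS_SUCCESS,
  STREAM_STATUS_WRITE_ERROR
} StreamStatus;

/* Hypothetical output stream with a tiny fixed capacity, so that a write
 * error is easy to provoke. The error state lives in the object. */
typedef struct {
  char         storage[8];
  size_t       used;
  StreamStatus status;
} OutStream;

static void
out_stream_init (OutStream *s)
{
  s->used = 0;
  s->status = STREAM_STATUS_SUCCESS;
}

/* Writes `n` bytes. Takes no error parameter: on failure the stream
 * remembers the error, and every later write becomes a no-op. */
static void
out_stream_write (OutStream *s, const char *data, size_t n)
{
  if (s->status != STREAM_STATUS_SUCCESS)
    return;
  if (n > sizeof s->storage - s->used) {
    s->status = STREAM_STATUS_WRITE_ERROR;
    return;
  }
  memcpy (s->storage + s->used, data, n);
  s->used += n;
}

/* Counterpart of cairo_status(): returns the error state on request. */
static StreamStatus
out_stream_status (const OutStream *s)
{
  return s->status;
}
```

The calling code can then do a whole series of writes and check out_stream_status() a single time at the end, which is the "one parameter less, no check in every function" advantage from point 7.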
> 8) The API seems very forgiving to (IMO) obvious programming errors
> that should g_return_if_fail. g_input_stream_read for example provides
> a proper error message when there's already a pending operation or
> when the size parameter is too large.

Is this such a bad thing? Should we convert these to asserts?
I would advocate this. For one, it's not an error message a user should be presented with ("Hey, someone passed a too large value to read()"). But it's even more interesting when you stick the error into the object as I'm advocating above.

Another thing that came to my mind is whether it would be better to use signals instead of providing callbacks for some operations. It seems somewhat weird for read(), but open and in particular close might be interesting to implement using signals.

Benjamin
Attachment: chat (Description: Binary data)
Attachment: prototype (Description: Binary data)