Re: Some comments about GVFS



On Thu, 2007-05-03 at 11:11 +0200, Benjamin Otte wrote:
> Hi,
> 
> I had an initial look at gvfs in particular the Inputstream and
> Outputstream implementations, and some comments came to my mind, in
> particular about the API. So I thought I'd post them as early as
> possible.
> These comments are about the code in
> http://www.gnome.org/~alexl/git/gvfs.git from today - May 2nd.
> I should also note that these ideas are probably heavily influenced by
> where I'm coming from - namely GStreamer and Swfdec - which involve
> handling low level byte streams possibly caring about low latency. In
> this particular case I had been wondering about making the basic
> input/output objects in Swfdec GInput/OutputStream.
> 
> 1) I don't like GCancellable This is for several reasons:
> 
> 1a) GCancellable is far too visible. It's present in every function
> call, so it must be very important. Maybe exporting
> g_push/pop_current_cancellable () would be a better API because it
> decouples the cancelling from the actual operations?

There are two cases here. For async operations we clearly need to pass
the cancellable object to the operation, because we can't rely on a
thread-local storage setting over the lifetime of the async operations.

Now, for sync calls things are less obvious. At one time I had the
cancellable object as an implicit argument (via tls). However, I ran
into trouble with that and had to be explicit about it. I'm trying to
recall the details of that atm... I think the problem was that if your
implementation of an operation relies on other gio operations those
would always inherit the cancellable of the outer operation. This isn't
always what you want, as there are casese where you need to do a
non-cancellable i/o as part of a larger cancellable operation (an
example is that you don't want to send partial messages and mess up the
protocol state in the daemon backend, instead you want to send a cancel
request).

Maybe there were other reasons too, because it seems like this one would
be fixable if one allows to stack a NULL cancellable over a non-NULL
one.

> 1b) Cancelling operations from another thread doesn't look like good
> design to me. I've learnt (both in theory and in practice) that
> threads are supposed to be independant and not call into each other.

Eh? How else would you cancel a blocking i/o call? From within the
blocked thread? This is just bogus, cancellation is all about
cross-thread operations.

> 2) GVFS seems to avoid the glib main loop for asynchronous operations
> in favour of relying on threads. I'm not sure this provides any
> benefits besides making apps way harder to debug. Is there a reason
> for that?

I wouldn't say it does. It gives you both sync (usefull if you do
threads) and async operations (useful it your typical gui app). Almost
all GUI apps are single threaded and async by desing, and integrating
glib-style async i/o into such apps is generally *much* easier than
manually mucking about with threads.

However, the default async operations are implemented using threads, so
in fact apps are often using threads already when using gvfs. Just in a
way that fits well with a mainloop based gui app. And when doing async
is *much* easier than using sync i/o we can avoid the thread cost by
having a custom async implementation. (For instance the gvfs daemon
protocol is very easy to do async without using threads.)

Of course, threads have their place too. For instance, a complicated set
of file operations like a recursive directory copy is imho best
implemented using sync calls in a single thread. And gvfs is set up to
handle this case well too.

> 3) The whole API seems to be built around the model of synchronous
> operation. (I think so because of this code comment: /* Async ops:
> (optional in derived classes) */) I always thought everyone agrees
> that synchronous operation should be a thing of the past. and only be
> supported as an add-on for lazy coders or really simple apps.
> Almost everyone implements synchronous operations something like this:
>  void foo () { while (errno == EAGAIN) foo_async (); };
> The kernel sure does.

There is a default implementation of async operations using the sync
operations and threads. I don't think that means the model is only sync
i/o. I think it gives equal width to both sync and async models.

I also thing you are naive if you think sync i/o is a thing of the past
and only for "lazy" coders. There are many cases where asynchronous i/o
is way to complicated and doesn't give any advantages over sync i/o.
(Like the above recursive copy operation, i dare you to implement
gnome_vfs_xfer using only asynchronous primitives!). 

In fact, I agree with Linus (the syslet/threadlet thread on lkml is very
interesting wrt to this argument) that there are many operations that
just fit very badly with the asynchronous model, and things like
pathname lookup and complicated filename operations are examples of
that.

> 4) The asynchronous API seems to avoid the POSIX asynchronous model in
> favour of callbacks. Is there a reason why? I'd have guesses an
> asynchronous API looked like GIOChannel with some sugar on top.
> Something like this:
> if (!g_input_stream_read (stream, buf, size))
>   g_input_stream_wakeup_when_available (stream, size, func, data);
> And have that wake up in the glib main context when enough data is available.

There are two basic models when it comes to async i/o. I call them
"push" and "pull". 

The "push" model is when you initiate a transfer, and then you get a new
event each time there is more data availible. The "pull" model is that
you initiate each read and then get an even when data for that read is
availible.

The main difference for the two models are:
* flow control
  The push model doesn't give you much flow control. If you're copying
  from a fast source to a slow destination you will fill up memory with
  the data while waiting for the destination to accept your first block
  of data. You might also not really want to read the whole file (in
  for instance filetype sniffing), and this gets more complicated with
  push.
* pipelining
  The pull model means you have to initiate a pull before actually
  getting any data, and this makes it harder to pipeline reads and
  writes when copying from a source to another. A naive implementation
  will not request new data until the last block is written to the 
  destination.
* buffer handling
  The pull model allows you to specify the target buffer before i/o
  happens, whereas the push model allocates buffers itself. This means
  that if we want the data to be in a specific place the push model
  requires an extra memory copy.

Now, in theory both models are basically equivalent. You can add a flow
control operation to the push model (pause()/unpause()) and you can add
automatic readahead to the pull model to fix up pipelining.

I picked the pull model because i think its more efficient, and matches
what people do with a vfs better. A push model is probably better for a
web-browser i/o engine, but not for file management.

As for callbacks vs nonblocking calls and GSources, I think a callback
model is a more natural and generic way to structure a GUI-style app.
Its how all other uses of the mainloop works (idles, timeouts, gdk
events, etc). It is also a much more general model. Consider for
instance an operation like a file copy. How would you do an async file
copy using nonblocking calls and a GSource? Or an async mount operation?

Furthermore, it matches a pull model much better. Take for instance your
g_input_stream_wakeup_when_available(). How do you implement this for a
local file? Local files don't support select, so it would always be
readable. The only way to handle this truly async is to schedule a read
in a thread and then wait until that finishes. Now we're getting into a
push model automatically. And what if the next app operation was a seek,
not a read?

> 5) You seem to use void * in as the data pointers. All applications I
> know use guchar * (some use gchar *) to handle data. From my stream
> handling experience this is to encourage people to think about what
> they pass to such a function. This seems to encourage calling
> functions like this: write (mywidget, sizeof (MyWidget)) - with is a
> bad idea for multiple reasons, including but not limited to struct
> padding. MS formats seem to have done that a lot.

I'm not partial to any format here. Do people thing that using a char
type rather than a void type is better? 

> 6) Has there been thoughts about using a buffer-like struct? I seem to
> remember having talked about that last Guadec, but don't remember. To
> elaborate: A buffer structure is basically a refcounted memory region.
> All the multimedia libs make use of a structure like it. Some examples
> are at [1] and [2]. It allows some neat features like avoiding memcpys
> to reference the data and subbuffers.

I'm not sure exactly what this would buy us. We can already avoid extra
memcpys. Care to give some more concrete examples of what this could be
used for in gvfs?

> 7) The error handling in gvfs seems to be oriented towards the "old"
> and in glib still common error model of having people supply a GError
> to every function. I much prefer the cairo model, where basically the
> object keeps its own GError and has a function like cairo_status [3]
> that gives you the last error that occured. This has multiple
> advantages such as that I don't have to check for errors in every
> function, I can get the error on request and don't have to keep it
> around manually and function calls have one parameter less.

First of all, many objects in gvfs don't have that kind of state. For
instance, a GFile is stateless (its equivalent to a path string), so an
operation like g_file_copy() can't really modify any object error state.
Furthermore, the errors in gvfs are generally serious i/o errors
(missing file, etc) that should almost always be displayed to the user.
As such, I think the GError model, with nice error messages and enforced
error checking is a better model for gvfs.

> 8) The API seems very forgiving to (IMO) obvious porogramming errors
> that should g_return_if_fail. g_input_stream_read for example provides
> a proper error message when there's already a pending operation or
> when the size parameter is too large.

Is this such a bad thing? Should we convert these to asserts?

> 9) I got the feeling that the API is somewhat designed to make one
> Stream object be usable from multiple threads at once. I can't
> pinpoint that, it's just a feeling. That's not the case, is it?

No. streams are generally non-threadsafe. If you use them across threads
you have to do your own locking. The single exception to this is
cancellation. g_cancellable_cancel() is safe to call even if another
thread is running an operation with this cancellable as argument (in
fact, that is the basic reason for GCancellable to exist).

However, GFile objects are meant to be threadsafe. They are read-only
after creation and the ref/unref calls of gboject are threadsafe, so
they can safely be shared between threads.

> Wow, that got more than I thought it would become. Please don't take
> that as me slamming gvfs, I'm just curious as to why those decisions
> were made.

I hope my replies were satisfying.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Alexander Larsson                                            Red Hat,
Inc 
                   alexl redhat com    alla lysator liu se 
He's an otherworldly flyboy hairdresser who hangs with the wrong crowd. She's 
a scantily clad out-of-work research scientist who inherited a spooky stately 
manor from her late maiden aunt. They fight crime! 




[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]