Re: Some comments about GVFS



I just had a chat with Alex on IRC. (I have attached the log if anyone
wants to read it.) Some of my ideas have changed after this
discussion, so I might contradict my previous mail now. Bear with me.
I've come to understand now that Alex's idea and my idea of what a
stream is and how it should work are very different, because we have
very different use cases. Alex comes from the file manager world. For
him GVFS is primarily a way to interact with "files" all over the
world, rename them, tag them, and access their contents.
For me it's all about getting a stream to/from somewhere and
writing/reading data to/from it.
I'm wondering if there are use cases for an I/O library that are
different from those two use cases.

During the discussion the question of how to write an async version of
g_file_get_contents came up. (That's the paste link in the log.) I
have attached my quick'n'dirty showcase of how I'd imagine that would
work here, too. The git repo from today contains an implementation
from Alex. That one actually works.

So here are some followup comments. I'd like to note that point 7
below is what I care about most, because that's important for a nice
API.

> 1b) Cancelling operations from another thread doesn't look like good
> design to me. I've learnt (both in theory and in practice) that
> threads are supposed to be independent and not call into each other.

Eh? How else would you cancel a blocking i/o call? From within the
blocked thread? This is just bogus, cancellation is all about
cross-thread operations.

I'd argue that if you do a blocking call, you're aware that it's
blocking and don't want to cancel it. Otherwise you'd use async I/O
with a proper cancellation mechanism such as g_main_context_wakeup().
You've told me that its main use case is transactions like calling the
equivalent of gnome_vfs_xfer_async (), which you'd want to implement
in a thread by calling lots of sync operations one after another.
I can see the use case, and even though I still don't like it, I can't
come up with a better model.
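
To make sure we mean the same thing, here's roughly how I picture the
cross-thread case. This is only a sketch: I'm assuming a
GCancellable-style object and the read signature as I understood it,
both of which may differ from what you actually have.

/* Sketch only: cancellable object and read signature are assumptions. */
#include <gio/gio.h>    /* header name assumed */

typedef struct {
  GInputStream *stream;
  GCancellable *cancellable;
} ReadJob;

static gpointer
blocking_read_thread (gpointer data)
{
  ReadJob *job = data;
  char buffer[4096];
  GError *error = NULL;
  gssize n;

  /* Blocks until data arrives, an error happens, or another thread
   * cancels job->cancellable. */
  n = g_input_stream_read (job->stream, buffer, sizeof (buffer),
                           job->cancellable, &error);
  if (n < 0)
    {
      g_printerr ("read failed: %s\n", error->message);
      g_error_free (error);
    }
  return NULL;
}

/* Called from the GUI thread, e.g. from a "Cancel" button handler. */
static void
cancel_pending_read (ReadJob *job)
{
  g_cancellable_cancel (job->cancellable);
}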

> 2) GVFS seems to avoid the glib main loop for asynchronous operations
> in favour of relying on threads. I'm not sure this provides any
> benefits besides making apps way harder to debug. Is there a reason
> for that?

I wouldn't say it does. It gives you both sync operations (useful if
you do threads) and async operations (useful in your typical GUI app).
Almost all GUI apps are single threaded and async by design, and
integrating glib-style async i/o into such apps is generally *much*
easier than manually mucking about with threads.

However, the default async operations are implemented using threads, so
in fact apps are often using threads already when using gvfs. Just in a
way that fits well with a mainloop based gui app. And when doing async
is *much* easier than using sync i/o we can avoid the thread cost by
having a custom async implementation. (For instance the gvfs daemon
protocol is very easy to do async without using threads.)

Yeah, I missed those implementations, as I was only grepping for
GMainContext, which you don't use.
I think it's a good idea to make the main context customizable so gvfs
can be used from other threads.

There is a default implementation of async operations using the sync
operations and threads. I don't think that means the model is only sync
i/o. I think it gives equal weight to both sync and async models.

I'd love it if GVFS would advocate the async model over the sync model
but provide both. So g_input_stream_read would be async and
g_input_stream_read_sync would exist, too.
The reason for this is that I think in most cases you want the async
behaviour, and it helps to tell lazy programmers that this is the
right way to go. It's purely psychological.
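
To illustrate why I think the async variant deserves the short name,
this is roughly what the common mainloop-driven case looks like from a
GUI app, using the current names. The callback/finish style below is
my assumption of how it could work, not necessarily what the prototype
does today.

/* Sketch of a mainloop-driven read loop; callback and finish
 * signatures are assumed for illustration. */
static char buffer[4096];

static void
read_done (GObject *source, GAsyncResult *result, gpointer user_data)
{
  GInputStream *stream = G_INPUT_STREAM (source);
  GError *error = NULL;
  gssize n = g_input_stream_read_finish (stream, result, &error);

  if (n < 0)
    {
      g_printerr ("read failed: %s\n", error->message);
      g_error_free (error);
      return;
    }
  if (n == 0)
    return;   /* end of stream, we're done */

  /* ... process buffer[0..n) here, then queue the next read ... */
  g_input_stream_read_async (stream, buffer, sizeof (buffer),
                             G_PRIORITY_DEFAULT, NULL /* cancellable */,
                             read_done, user_data);
}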

In fact, I agree with Linus (the syslet/threadlet thread on lkml is very
interesting wrt this argument) that there are many operations that
just fit very badly with the asynchronous model, and things like
pathname lookup and complicated filename operations are examples of
that.

The relevant email can be found at http://lkml.org/lkml/2007/2/25/177
- the followup and previous mails might be interesting, too.
I need to digest this before commenting on it, though.

There are two basic models when it comes to async i/o. I call them
"push" and "pull".

The "push" model is when you initiate a transfer, and then you get a new
event each time there is more data availible. The "pull" model is that
you initiate each read and then get an even when data for that read is
availible.

The main differences between the two models are:
* flow control
  The push model doesn't give you much flow control. If you're copying
  from a fast source to a slow destination you will fill up memory with
  the data while waiting for the destination to accept your first block
  of data. You might also not really want to read the whole file (for
  instance when sniffing the file type), and this gets more complicated
  with push.
* pipelining
  The pull model means you have to initiate a pull before actually
  getting any data, and this makes it harder to pipeline reads and
  writes when copying from a source to a destination. A naive
  implementation
  will not request new data until the last block is written to the
  destination.
* buffer handling
  The pull model allows you to specify the target buffer before i/o
  happens, whereas the push model allocates buffers itself. This means
  that if we want the data to be in a specific place the push model
  requires an extra memory copy.

Now, in theory both models are basically equivalent. You can add a flow
control operation to the push model (pause()/unpause()) and you can add
automatic readahead to the pull model to fix up pipelining.
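
To make the pipelining point concrete, a naive pull-model copy loop
looks something like this (sketch only, stream signatures assumed):

/* Naive pull-model copy: each new read is only issued after the
 * previous block has been fully written, so reads and writes never
 * overlap. */
static gboolean
copy_naive (GInputStream *in, GOutputStream *out, GError **error)
{
  char buffer[8192];
  gssize n;

  while ((n = g_input_stream_read (in, buffer, sizeof (buffer),
                                   NULL, error)) > 0)
    {
      gssize written = 0;

      /* The source sits idle while we push this block out. */
      while (written < n)
        {
          gssize w = g_output_stream_write (out, buffer + written,
                                            n - written, NULL, error);
          if (w < 0)
            return FALSE;
          written += w;
        }
    }
  return n == 0;   /* 0 means clean EOF, < 0 means a read error */
}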

I think that solution (flow control for push, readahead for pull) is
fine. However, there is one thing I am missing: a
read_all_available_data_right_now() function call, or at least a
read_x_bytes_if_available() call. This is interesting because in some
cases I want to avoid calling back into the main loop.
It's an issue in Swfdec with gnome-vfs, where I'm supposed to display
what percentage of the file has loaded and that display gets updated
via the main loop. After every read () of DEFAULT_SIZE my display gets
updated, so loading seems slow even though it isn't, just because
every read goes via the main loop.
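
Something like this is what I have in mind - completely hypothetical,
nothing like it exists in the repo:

/* Hypothetical addition: return immediately with whatever data is
 * already buffered instead of waiting for more or bouncing through
 * the main loop again. */
gssize g_input_stream_read_available (GInputStream  *stream,
                                      void          *buffer,
                                      gsize          max_count,
                                      GError       **error);

/* Usage I'd like for Swfdec: drain everything that's ready in one go,
 * then update the progress display once per wakeup instead of once
 * per DEFAULT_SIZE read. (handle_data and update_progress_display are
 * made-up app functions.) */
static void
drain_and_update (GInputStream *stream)
{
  char buffer[65536];
  gssize n;

  while ((n = g_input_stream_read_available (stream, buffer,
                                             sizeof (buffer), NULL)) > 0)
    handle_data (buffer, n);

  update_progress_display ();
}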

> 6) Has there been thoughts about using a buffer-like struct? I seem to
> remember having talked about that last Guadec, but don't remember. To
> elaborate: A buffer structure is basically a refcounted memory region.
> All the multimedia libs make use of a structure like it. Some examples
> are at [1] and [2]. It allows some neat features like avoiding memcpys
> to reference the data and subbuffers.

I'm not sure exactly what this would buy us. We can already avoid extra
memcpys. Care to give some more concrete examples of what this could be
used for in gvfs?

I know David Schleef is much better at explaining the advantages of
refcounted memory regions to others than I am. He was the one that
explained it to me after all. ;)
I'll just cc him and hope he does it here again.
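
For reference, the kind of structure I mean looks roughly like this -
a sketch modeled on the multimedia libs' buffers, with made-up names,
not a concrete proposal for gvfs:

/* Refcounted memory region; names are made up for illustration. */
typedef struct {
  guint8        *data;
  gsize          length;
  gint           ref_count;
  GDestroyNotify free_func;   /* called when the last ref is dropped */
  gpointer       user_data;
} GvfsBuffer;

static GvfsBuffer *
gvfs_buffer_ref (GvfsBuffer *buffer)
{
  g_atomic_int_inc (&buffer->ref_count);
  return buffer;
}

static void
gvfs_buffer_unref (GvfsBuffer *buffer)
{
  if (g_atomic_int_dec_and_test (&buffer->ref_count))
    {
      if (buffer->free_func)
        buffer->free_func (buffer->user_data);
      g_free (buffer);
    }
}

/* A subbuffer would just ref its parent and point into its data, so
 * handing a slice of a network packet to a consumer needs no memcpy. */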

> 7) The error handling in gvfs seems to be oriented towards the "old"
> and in glib still common error model of having people supply a GError
> to every function. I much prefer the cairo model, where basically the
> object keeps its own GError and has a function like cairo_status [3]
> that gives you the last error that occurred. This has multiple
> advantages such as that I don't have to check for errors in every
> function, I can get the error on request and don't have to keep it
> around manually and function calls have one parameter less.

First of all, many objects in gvfs don't have that kind of state. For
instance, a GFile is stateless (it's equivalent to a path string), so an
operation like g_file_copy() can't really modify any object error state.
Furthermore, the errors in gvfs are generally serious i/o errors
(missing file, etc) that should almost always be displayed to the user.
As such, I think the GError model, with nice error messages and enforced
error checking is a better model for gvfs.

I think Carl said it: there are two types of objects: objects you
operate _with_ (like GInputStream and GOutputStream) and objects you
operate _on_ (like GFile). So stream objects are basically like a
cairo_t in that all functions called on them modify the object itself.
As such it seems a lot nicer to me if everything that affects them
were attached to them - maybe even as properties. I'm thinking of the
cancellable, the main context it operates in, or the error state it's
in.
I've also looked at the code a bit and it seems that currently a lot
of operations that use streams internally (like
g_file_get_contents_async) need to do the housekeeping for those
objects anyway.
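
To make it concrete, what I'm imagining for the operate-with objects
is along these lines. This is purely hypothetical cairo-style API;
note that the read calls here also drop the GError and cancellable
arguments, which is part of the point:

/* Hypothetical: after a failure the stream shuts down and further
 * calls become no-ops, like an errored cairo_t, and you check the
 * error once at the end. */
static void
load_header_and_body (GInputStream *stream,
                      char *header, gsize header_size,
                      char *body, gsize body_size)
{
  g_input_stream_read (stream, header, header_size);
  g_input_stream_read (stream, body, body_size);

  if (g_input_stream_get_error (stream))        /* hypothetical */
    g_printerr ("loading failed: %s\n",
                g_input_stream_get_error (stream)->message);
}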

> 8) The API seems very forgiving to (IMO) obvious programming errors
> that should g_return_if_fail. g_input_stream_read for example provides
> a proper error message when there's already a pending operation or
> when the size parameter is too large.

Is this such a bad thing? Should we convert these to asserts?

I would advocate this. For one, it's not an error message a user
should be presented with ("Hey, someone passed a too-large value to
read()"). But it's even more interesting when you stick the error into
the object, as I'm advocating above.
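
In other words, something like this at the top of g_input_stream_read
(sketch; the signature and the pending flag are assumptions about the
current code):

gssize
g_input_stream_read (GInputStream  *stream,
                     void          *buffer,
                     gsize          count,
                     GCancellable  *cancellable,
                     GError       **error)
{
  /* Treat these as programming errors, not runtime errors the caller
   * is expected to handle. */
  g_return_val_if_fail (G_IS_INPUT_STREAM (stream), -1);
  g_return_val_if_fail (buffer != NULL, -1);
  g_return_val_if_fail (count <= G_MAXSSIZE, -1);
  g_return_val_if_fail (!stream->priv->pending, -1);

  /* ... the actual read implementation goes here ... */
  return -1;
}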


Another thing that came to my mind is whether it would be better to
use signals instead of providing callbacks for some operations. It
seems somewhat weird for read(), but open and in particular close
might be interesting to implement using signals.
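
For close, for instance, usage could look like this (hypothetical, the
signal name and signature are made up, just to show what I mean):

/* A "closed" signal on the stream instead of a close callback. */
static void
stream_closed_cb (GInputStream *stream, const GError *error, gpointer data)
{
  if (error)
    g_printerr ("close failed: %s\n", error->message);
}

static void
setup_stream (GInputStream *stream)
{
  g_signal_connect (stream, "closed",
                    G_CALLBACK (stream_closed_cb), NULL);
  /* ... later the async close gets started and the signal fires when
   * it's done ... */
}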

Benjamin

Attachment: chat
Description: Binary data

Attachment: prototype
Description: Binary data


