gvfs status report

From: Alexander Larsson <alexl redhat com>
To: "gtk-devel-list gnome org" <gtk-devel-list gnome org>, "gnome-vfs-list gnome org" <gnome-vfs-list gnome org>
Subject: gvfs status report
Date: Thu, 15 Feb 2007 16:54:01 +0100
The last month or so I've continued my work on a gnome-vfs replacement
called gvfs. It's still nowhere near finished, but it's getting to a
state where it might be interesting for people to look at and give
some feedback. So, here is a writeup on the current design and
codebase.

If you haven't read my previous mail about the shortcomings of
gnome-vfs before, I'd recommend you at least scan it before
continuing:
http://www.mail-archive.com/gnome-vfs-list gnome org/msg00899.html

The code is currently available in a git repository at:
http://www.gnome.org/~alexl/git/gvfs.git

The gvfs code consists of three parts: a library "gio" which has the
API that applications use, a set of daemons handling access to various
file resources (only smb is implemented so far), and a dynamic module
that (optionally) gets loaded into gio and communicates with said daemons.

GIO
===

The gio library is meant to be a part of glib. It's a generic I/O
library similar to e.g. java.io.*. It's a "modern" gobject-based
library using things like inheritance and interfaces. As such it can't
be in the main glib library (since that doesn't link to gobject).

Right now it contains these public classes:

GCancellable
GFile
GInputStream
  GInputStreamSocket
  GFileInputStream
GOutputStream
  GFileOutputStream
  GOutputStreamSocket
GSeekable
GFileInfo
GFileEnumerator
GMountOperation
GVfs

Some short explanations:

GCancellable: all i/o operations optionally take one of these so
that you can cancel the operation from another thread (for sync i/o)
or from the mainloop (for async i/o).
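The cancellation idea can be sketched in plain C as a shared flag that a synchronous operation polls between chunks of work. This is an illustration only: `Cancellable`, `cancellable_cancel` and `copy_in_chunks` are hypothetical names, not the gio API, and a real implementation would also need a way to wake up an operation blocked in a syscall.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for GCancellable: a flag that another thread
 * (or a main loop callback) may set at any time. */
typedef struct {
    atomic_bool cancelled;
} Cancellable;

void cancellable_init(Cancellable *c)
{
    atomic_init(&c->cancelled, false);
}

void cancellable_cancel(Cancellable *c)
{
    atomic_store(&c->cancelled, true);
}

/* A NULL cancellable means the caller did not ask for cancellation. */
bool cancellable_is_cancelled(Cancellable *c)
{
    return c != NULL && atomic_load(&c->cancelled);
}

/* A synchronous operation that works in chunks (chunk must be > 0) and
 * checks for cancellation before each one. Returns the number of bytes
 * copied, or -1 if it was cancelled. */
long copy_in_chunks(char *dst, const char *src, size_t len, size_t chunk,
                    Cancellable *c)
{
    size_t done = 0;
    while (done < len) {
        if (cancellable_is_cancelled(c))
            return -1;
        size_t n = len - done < chunk ? len - done : chunk;
        memcpy(dst + done, src + done, n);
        done += n;
    }
    return (long)done;
}
```

Passing NULL as the cancellable mirrors the "optionally take one of these" behaviour: the operation simply runs to completion.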

GFile: an abstract reference to a file. These are cheap to construct
and handle (i.e. creating one doesn't do any i/o). Consider them an
abstraction of a pathname. In fact, for local files they are just a
filename. Generally we want to avoid handling strings like URIs and
filenames in application code. It only leads to problems with things
like filename escaping etc. The general approach is to construct a
GFile from a filename or uri once, and then use things like
get_parent() and get_child() to navigate the filesystem.
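For local files, the "cheap handle, no i/o" property can be illustrated with a small path-based sketch. `File`, `file_new_for_path`, `file_get_child` and `file_get_parent` are hypothetical names chosen to echo the text; they are not the real GFile functions, and allocation failures are ignored for brevity.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical local-file handle. Like GFile for local files it is just
 * an absolute pathname; creating one never touches the disk. */
typedef struct {
    char *path;
} File;

/* Take ownership of an already-allocated path string. */
static File *file_new_take(char *path)
{
    File *f = malloc(sizeof *f);
    f->path = path;
    return f;
}

static char *dup_n(const char *s, size_t n)
{
    char *p = malloc(n + 1);
    memcpy(p, s, n);
    p[n] = '\0';
    return p;
}

File *file_new_for_path(const char *path)
{
    return file_new_take(dup_n(path, strlen(path)));
}

void file_free(File *f)
{
    free(f->path);
    free(f);
}

/* Child: append "/name", avoiding a double slash after the root. */
File *file_get_child(const File *f, const char *name)
{
    size_t plen = strlen(f->path), nlen = strlen(name);
    size_t need_slash = (plen == 0 || f->path[plen - 1] != '/') ? 1 : 0;
    char *p = malloc(plen + need_slash + nlen + 1);
    memcpy(p, f->path, plen);
    if (need_slash)
        p[plen] = '/';
    memcpy(p + plen + need_slash, name, nlen + 1);   /* copies the NUL */
    return file_new_take(p);
}

/* Parent: strip the last component. The root has no parent (NULL). */
File *file_get_parent(const File *f)
{
    const char *slash = strrchr(f->path, '/');
    if (slash == NULL || strcmp(f->path, "/") == 0)
        return NULL;
    size_t len = slash == f->path ? 1 : (size_t)(slash - f->path);
    return file_new_take(dup_n(f->path, len));
}
```

Application code then never concatenates path strings itself; it only calls get_child() and get_parent() on handles.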

GInputStream, GOutputStream: These are abstract stream base classes
that support both sync and async i/o. For subclasses that only
implement sync i/o there is a thread-based emulation of async
i/o. Compare to java.io.InputStream or System.IO.Stream.
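The abstract-base-class pattern can be approximated in plain C with a vtable: a subclass fills in a virtual `read`, and the base class implements generic helpers on top of it once. All names here (`InputStream`, `input_stream_read_all`, `MemoryInputStream`) are hypothetical, and the thread-based async emulation mentioned above is not shown.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

typedef struct InputStream InputStream;

/* "Class" holding the virtual methods a subclass must provide. */
typedef struct {
    /* Read up to count bytes; return bytes read, 0 at EOF, -1 on error. */
    ssize_t (*read)(InputStream *stream, void *buf, size_t count);
} InputStreamClass;

struct InputStream {
    const InputStreamClass *klass;
};

/* Generic helper implemented once in the base class: keep calling the
 * subclass read() until count bytes arrive or EOF/error. */
ssize_t input_stream_read_all(InputStream *s, void *buf, size_t count)
{
    size_t total = 0;
    while (total < count) {
        ssize_t n = s->klass->read(s, (char *)buf + total, count - total);
        if (n < 0)
            return -1;
        if (n == 0)
            break;
        total += (size_t)n;
    }
    return (ssize_t)total;
}

/* Concrete subclass reading from memory. It returns at most 2 bytes per
 * call to show that short reads are handled by read_all. */
typedef struct {
    InputStream parent;
    const char *data;
    size_t len, pos;
} MemoryInputStream;

static ssize_t memory_read(InputStream *s, void *buf, size_t count)
{
    MemoryInputStream *m = (MemoryInputStream *)s;
    size_t left = m->len - m->pos;
    size_t n = count < left ? count : left;
    if (n > 2)
        n = 2;
    memcpy(buf, m->data + m->pos, n);
    m->pos += n;
    return (ssize_t)n;
}

static const InputStreamClass memory_class = { memory_read };

void memory_input_stream_init(MemoryInputStream *m, const char *data,
                              size_t len)
{
    m->parent.klass = &memory_class;
    m->data = data;
    m->len = len;
    m->pos = 0;
}
```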

GSeekable: Interface that streams of any type can implement if they
support seeking.

GInputStreamSocket, GOutputStreamSocket: stream implementations for
regular sockets. Create them from a file descriptor. These exist
because they're used heavily in the daemon communication code.

GFileInputStream, GFileOutputStream: Abstract subclasses of the regular
streams that add some features that streams from files typically have,
like seek, tell, and fstat. All the operations on GFile that return a
stream return a GFileInputStream or GFileOutputStream.

GFileInfo: An abstraction of struct stat. It has a few "hardcoded"
attributes like type, name, size and display name. Then it has a
generic key-value attribute system that allows any backend to expose
its own (namespaced) attributes. (For instance, the local file backend
exposes xattrs and selinux attributes this way.)
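A rough sketch of "fixed fields plus namespaced key-value attributes" might look like this. The `FileInfo` layout, the function names and the `namespace:attribute` key syntax (e.g. `xattr:user.comment`) are assumptions for illustration, not the actual GFileInfo format; strings are borrowed rather than copied to keep it short.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ATTRS 16

typedef enum { FILE_TYPE_REGULAR, FILE_TYPE_DIRECTORY } FileType;

typedef struct {
    /* "Hardcoded" attributes every backend fills in. */
    FileType type;
    const char *name;
    const char *display_name;
    long long size;
    /* Generic attributes; keys look like "namespace:attribute".
     * Keys and values are borrowed pointers, not copies. */
    const char *keys[MAX_ATTRS];
    const char *values[MAX_ATTRS];
    size_t n_attrs;
} FileInfo;

/* Set or replace an attribute; returns 0 on success, -1 if full. */
int file_info_set_attribute(FileInfo *info, const char *key,
                            const char *value)
{
    for (size_t i = 0; i < info->n_attrs; i++) {
        if (strcmp(info->keys[i], key) == 0) {
            info->values[i] = value;
            return 0;
        }
    }
    if (info->n_attrs == MAX_ATTRS)
        return -1;
    info->keys[info->n_attrs] = key;
    info->values[info->n_attrs] = value;
    info->n_attrs++;
    return 0;
}

/* Returns NULL if the backend did not provide the attribute. */
const char *file_info_get_attribute(const FileInfo *info, const char *key)
{
    for (size_t i = 0; i < info->n_attrs; i++)
        if (strcmp(info->keys[i], key) == 0)
            return info->values[i];
    return NULL;
}

/* Count the attributes in one namespace, e.g. "xattr". */
size_t file_info_count_namespace(const FileInfo *info, const char *ns)
{
    size_t n = 0, len = strlen(ns);
    for (size_t i = 0; i < info->n_attrs; i++)
        if (strncmp(info->keys[i], ns, len) == 0 &&
            info->keys[i][len] == ':')
            n++;
    return n;
}
```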

GFileEnumerator: enumerates files inside a directory
 
GMountOperation: object that handles the types of callbacks needed
during a mount operation. You create one of these and connect to
e.g. the ask_password signal and then pass it to the mount
operation. When the mount operation needs to ask for a password the
signal will be emitted. There will be a standard implementation of
this in Gtk+ that displays authentication dialogs.
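The callback flow can be sketched with a plain function pointer standing in for the ask_password signal. `MountOperation`, `mount_share` and `fixed_password_handler` are hypothetical; the handler just returns a stored password where a Gtk+ implementation would show a dialog.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct MountOperation MountOperation;

/* Handler for the "ask password" callback. Returns false if the user
 * cancelled; otherwise writes a NUL-terminated password. */
typedef bool (*AskPasswordFunc)(MountOperation *op, const char *server,
                                char *password_out, size_t out_size,
                                void *user_data);

/* Hypothetical stand-in for GMountOperation: in gio the handler would be
 * connected to the ask_password signal instead of stored directly. */
struct MountOperation {
    AskPasswordFunc ask_password;
    void *user_data;
};

/* Hypothetical backend mount step. It only asks for a password when the
 * share needs one, and fails if nobody is connected or the user cancels. */
bool mount_share(const char *server, const char *required_password,
                 MountOperation *op)
{
    char password[64];
    if (required_password == NULL)
        return true;                      /* no authentication, no dialog */
    if (op == NULL || op->ask_password == NULL)
        return false;                     /* nobody to ask */
    if (!op->ask_password(op, server, password, sizeof password,
                          op->user_data))
        return false;                     /* user cancelled */
    return strcmp(password, required_password) == 0;
}

/* Application-side handler returning a password kept in user_data. */
bool fixed_password_handler(MountOperation *op, const char *server,
                            char *password_out, size_t out_size,
                            void *user_data)
{
    const char *pw = user_data;
    (void)op;
    (void)server;
    if (pw == NULL || strlen(pw) + 1 > out_size)
        return false;
    strcpy(password_out, pw);
    return true;
}
```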

GVfs: Base class for the virtual filesystem. Does mapping to/from
filenames, paths and parse names. This is what gets switched out by
the daemon module.

Additionally there is an implementation of GFile for normal local
files, including streams and file enumerator.

Most of what is currently in libgio is abstract base classes. But one
need only look at the Java or .NET I/O libraries to see some obvious
additions that could be made. I think it would make sense to at least
add:

* base class for "chained" streams similar to the java FilterInputStream
* buffered input/output streams
* memory-array input/output streams
* charset encoding conversion streams
* some form of binary data stream that makes it easy to read/write
  binary data (int32, char, float, strings, etc) from/to any stream
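As an illustration of the last two items, a binary data reader over a memory buffer could look roughly like this. `DataInputStream` and its functions are hypothetical, as are the choices of big-endian byte order and 32-bit length-prefixed strings.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical data stream over an in-memory buffer. Multi-byte values
 * are read in big-endian (network) order, an assumption of this sketch. */
typedef struct {
    const uint8_t *buf;
    size_t len, pos;
    int error;          /* set once a read runs past the end */
} DataInputStream;

void data_input_stream_init(DataInputStream *s, const uint8_t *buf,
                            size_t len)
{
    s->buf = buf;
    s->len = len;
    s->pos = 0;
    s->error = 0;
}

/* Consume n bytes, or flag an error if not enough are left. */
static const uint8_t *take(DataInputStream *s, size_t n)
{
    if (s->error || s->len - s->pos < n) {
        s->error = 1;
        return NULL;
    }
    const uint8_t *p = s->buf + s->pos;
    s->pos += n;
    return p;
}

int32_t data_input_stream_read_int32(DataInputStream *s)
{
    const uint8_t *p = take(s, 4);
    if (p == NULL)
        return 0;
    uint32_t v = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
                 (uint32_t)p[2] << 8 | (uint32_t)p[3];
    return (int32_t)v;
}

/* A string is a 32-bit length followed by that many bytes. It is copied
 * into out (NUL-terminated) if it fits; returns its length or -1. */
int data_input_stream_read_string(DataInputStream *s, char *out,
                                  size_t out_size)
{
    int32_t n = data_input_stream_read_int32(s);
    if (s->error || n < 0 || (size_t)n + 1 > out_size) {
        s->error = 1;
        return -1;
    }
    const uint8_t *p = take(s, (size_t)n);
    if (p == NULL)
        return -1;
    memcpy(out, p, (size_t)n);
    out[n] = '\0';
    return n;
}
```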

GVFS
====

The virtual filesystem as such runs entirely in session-wide daemons
that apps talk to. This has many advantages: state (like caches,
authentication and charset settings) is shared between apps, we avoid
lots of connections to a server, and each app need not directly
link to all the weird i/o libraries needed. Of course, there is a risk
for bloat, but I think we can keep it pretty small by being careful
what the daemons link to.

There is one main daemon, started on demand by dbus, that handles
mounts and mounting. Each active mount is identified by a set of
key-value pairs. For instance, a samba share might be:

type=smb-share
server=bigserver
share=projectfiles
user=alexl

All mountpoints have the "type" key. Inside a mountpoint files are
addressed using plain (non-escaped) absolute (unix-style) pathnames.

A mount daemon registers one or more key-value mountpoint specs like
this with the main vfs daemon. The daemon then has an API that lets
applications look up a specific mountpoint, or list the currently
mounted ones.
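Registration and lookup can be sketched as an unordered comparison of key-value pairs. `MountSpec`, `MountRegistry` and the functions below are hypothetical, as is storing a dbus name (such as `:1.42`) to identify the owning mount daemon; keys within one spec are assumed unique.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PAIRS 8
#define MAX_MOUNTS 16

/* A mount spec: an unordered set of unique key=value pairs. */
typedef struct {
    const char *keys[MAX_PAIRS];
    const char *values[MAX_PAIRS];
    size_t n;
} MountSpec;

static const char *spec_get(const MountSpec *s, const char *key)
{
    for (size_t i = 0; i < s->n; i++)
        if (strcmp(s->keys[i], key) == 0)
            return s->values[i];
    return NULL;
}

/* Two specs are equal if they have the same pairs, in any order. */
bool mount_spec_equal(const MountSpec *a, const MountSpec *b)
{
    if (a->n != b->n)
        return false;
    for (size_t i = 0; i < a->n; i++) {
        const char *v = spec_get(b, a->keys[i]);
        if (v == NULL || strcmp(v, a->values[i]) != 0)
            return false;
    }
    return true;
}

/* Hypothetical registry in the main daemon: spec -> owning daemon. */
typedef struct {
    MountSpec specs[MAX_MOUNTS];
    const char *owners[MAX_MOUNTS];
    size_t n;
} MountRegistry;

bool registry_register(MountRegistry *r, const MountSpec *spec,
                       const char *owner)
{
    if (spec_get(spec, "type") == NULL || r->n == MAX_MOUNTS)
        return false;               /* every mount needs a "type" key */
    r->specs[r->n] = *spec;
    r->owners[r->n] = owner;
    r->n++;
    return true;
}

/* Look up a specific mount; NULL means "not mounted". */
const char *registry_lookup(const MountRegistry *r, const MountSpec *spec)
{
    for (size_t i = 0; i < r->n; i++)
        if (mount_spec_equal(&r->specs[i], spec))
            return r->owners[i];
    return NULL;
}
```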

There are also daemon calls to start a mount operation. The daemon
handles these by looking for type matches in a directory of .ini style
config files for the installed vfs backends. When it finds a match the
file can specify an executable to spawn, and/or a session dbus name to
talk to.

In general accesses to non-mounted locations will result in an error,
to avoid the problem with sudden authentication dialogs and unexpected
expensive i/o. However, it's possible to tag some types of mount as
automountable. This is useful for things that should really "always"
be mounted, and that will never cause any dialogs. Examples like
network:, computer:, and fonts: come to mind.

Apps talk to the main daemon using the session bus, but most file
operations are done using direct peer-to-peer dbus connections between
the clients and the mount daemons. A client sends a request over the
session bus to set up such a peer-to-peer connection, then sends normal
dbus messages over that connection instead. In most cases this should
be fast enough for normal use. However, dbus isn't a good protocol for
bulk data transfer, so when we're actually reading or writing file
content we pass a unix socket fd over dbus and use a custom binary
protocol over that socket. This lets us do file transfers very
efficiently.
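The standard POSIX mechanism for handing a file descriptor to another process is `SCM_RIGHTS` ancillary data on an `AF_UNIX` socket. The sketch below shows only that mechanism in isolation; how gvfs actually routes the fd alongside its dbus traffic is not shown here, and `send_fd`/`recv_fd` are hypothetical helper names.

```c
#define _DEFAULT_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>     /* read/write/close for callers of these helpers */

/* Send one file descriptor over a connected AF_UNIX socket. One dummy
 * data byte is sent because ancillary data needs a non-empty message on
 * some systems. Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;                 /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    memset(&ctrl, 0, sizeof ctrl);

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive a descriptor sent with send_fd(); returns it, or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;

    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

Once the receiver has the descriptor, file content flows over it directly and never passes through the dbus message encoding.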

In general I think that we will still use URIs to pass file references
between apps when doing things like DnD, cut-and-paste or when saving
filenames in config files. It seems hard to change this at this time,
and it has some advantages in that other applications also understand
such URIs (to some extent; vfs URIs aren't always exactly like web
URIs). However, internally in gio we immediately map the URI to a
mountpoint spec (which might not be mounted yet) and a path, and all
path operations happen in this form. Think of URIs as a
serialization format.

The mapping from uri to mount spec is done by custom backend-specific
code. I arrived at this model after several false starts, and I think
it's pretty nice. It means the backends and the client implementation
get a very clean view of the world, and all the weirdness of strange
URI schemes like smb is handled totally in one place in the client
library code.
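A backend-specific mapper for smb might look roughly like the following. The URI grammar handled here (`smb://[user@]server/share[/path]`), the `SmbLocation` struct and `smb_uri_to_location` are simplifying assumptions: percent-escapes, ports and workgroups are deliberately ignored, which the real mapping code would have to handle.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Result of mapping a URI: a mount spec (as fixed fields here) plus an
 * absolute, unescaped path inside that mount. */
typedef struct {
    char type[16];
    char user[64];      /* empty if the URI has no user */
    char server[64];
    char share[64];
    char path[256];
} SmbLocation;

/* Copy [s, s+n) into dst as a C string; false if it does not fit. */
static bool copy_part(char *dst, size_t size, const char *s, size_t n)
{
    if (n + 1 > size)
        return false;
    memcpy(dst, s, n);
    dst[n] = '\0';
    return true;
}

/* Hypothetical smb mapper for smb://[user@]server/share[/path]. */
bool smb_uri_to_location(const char *uri, SmbLocation *loc)
{
    const char *prefix = "smb://";
    if (strncmp(uri, prefix, strlen(prefix)) != 0)
        return false;
    const char *p = uri + strlen(prefix);

    const char *host_end = strchr(p, '/');
    if (host_end == NULL)
        return false;                 /* no share: not a share mount */
    const char *at = memchr(p, '@', (size_t)(host_end - p));
    loc->user[0] = '\0';
    if (at != NULL) {
        if (!copy_part(loc->user, sizeof loc->user, p, (size_t)(at - p)))
            return false;
        p = at + 1;
    }
    if (!copy_part(loc->server, sizeof loc->server, p,
                   (size_t)(host_end - p)))
        return false;

    const char *share = host_end + 1;
    const char *share_end = strchr(share, '/');
    size_t share_len = share_end ? (size_t)(share_end - share)
                                 : strlen(share);
    if (share_len == 0 ||
        !copy_part(loc->share, sizeof loc->share, share, share_len))
        return false;

    /* Everything after the share is the path inside the mount. */
    const char *path = share_end ? share_end : "/";
    if (!copy_part(loc->path, sizeof loc->path, path, strlen(path)))
        return false;
    strcpy(loc->type, "smb-share");
    return true;
}
```

Because all of this lives in one mapper, code further down only ever sees a spec like `type=smb-share, server=bigserver, share=projectfiles, user=alexl` and a plain path such as `/docs/plan.txt`.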

A large problem with gnome-vfs is that applications not specially
coded to use gnome-vfs will not be able to read non-local files. A
nice solution to this would be to use FUSE to let such apps use normal
syscalls to access the files. In general it's quite tricky to map any
URI to a FUSE file, but with the mountpoint + filename setup gvfs uses
it is very easy to create a FUSE filesystem that lets you access all
the current vfs mounts. 
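To see why mountpoint + path maps so easily onto a single FUSE tree, consider a naming scheme where each mount gets one top-level directory named after its spec. The `/vfs` root, the `type:key=value,...` directory format and `fuse_path_for` are all hypothetical; values containing `/` or `,` would need escaping, which is not handled here.

```c
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *key;
    const char *value;
} SpecPair;

/* Build e.g. "/vfs/smb-share:server=bigserver,share=projectfiles/docs".
 * pairs are assumed to be in a canonical (e.g. sorted) order so that a
 * mount always gets the same directory name. Returns the length written,
 * or -1 if buf is too small. */
int fuse_path_for(char *buf, size_t size, const char *root,
                  const char *type, const SpecPair *pairs, size_t n,
                  const char *path)
{
    int w = snprintf(buf, size, "%s/%s:", root, type);
    if (w < 0 || (size_t)w >= size)
        return -1;
    size_t off = (size_t)w;
    for (size_t i = 0; i < n; i++) {
        w = snprintf(buf + off, size - off, "%s%s=%s",
                     i ? "," : "", pairs[i].key, pairs[i].value);
        if (w < 0 || (size_t)w >= size - off)
            return -1;
        off += (size_t)w;
    }
    /* The in-mount path is absolute; "/" maps to the mount directory. */
    w = snprintf(buf + off, size - off, "%s",
                 strcmp(path, "/") == 0 ? "" : path);
    if (w < 0 || (size_t)w >= size - off)
        return -1;
    return (int)(off + (size_t)w);
}
```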

Backend design
==============

There is some helper code that makes it easy to implement a mount
daemon. The general design is to have a main thread running the glib
mainloop that handles all the incoming dbus (and other)
requests. Incoming requests are turned into GVfsJob subclass objects
which are then scheduled for running. Each job is run in two phases,
first the "try" phase which is called in the main thread. If the job
can be executed without any blocking i/o (i.e. if the results are in a
cache or if we can use real async i/o to implement it) you can return
TRUE and handle the job. Otherwise return FALSE, and the job will be
queued for synchronous execution on a job thread.
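The two-phase try/run scheduling can be sketched single-threaded. `Job`, `Scheduler`, `scheduler_submit` and `scheduler_run_queued` are hypothetical names, not the GVfsJob API, and the queue is drained by an explicit call where the real daemon would hand jobs to job threads.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define QUEUE_MAX 32

typedef struct Job Job;
struct Job {
    bool (*try_func)(Job *job);   /* main thread, must not block */
    void (*run_func)(Job *job);   /* job thread, may block */
    const char *path;
    const char *reply;            /* set by whichever phase handles it */
};

typedef struct {
    Job *queue[QUEUE_MAX];
    size_t head, tail;
} Scheduler;

/* Called from the main loop for each incoming request. Returns false
 * only if the job could not be queued. */
bool scheduler_submit(Scheduler *s, Job *job)
{
    if (job->try_func != NULL && job->try_func(job))
        return true;                        /* answered immediately */
    if (s->tail - s->head == QUEUE_MAX)
        return false;
    s->queue[s->tail++ % QUEUE_MAX] = job;
    return true;
}

/* Stand-in for a job thread: run queued jobs synchronously. */
size_t scheduler_run_queued(Scheduler *s)
{
    size_t n = 0;
    while (s->head != s->tail) {
        Job *job = s->queue[s->head++ % QUEUE_MAX];
        job->run_func(job);
        n++;
    }
    return n;
}

/* Example job: a lookup answered from a one-entry cache if possible. */
static const char *cached_path = "/cached";

bool stat_try(Job *job)
{
    if (strcmp(job->path, cached_path) != 0)
        return false;                       /* needs blocking i/o */
    job->reply = "from cache";
    return true;
}

void stat_run(Job *job)
{
    job->reply = "from backend";            /* blocking call goes here */
}
```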

It's possible to limit the number of job threads used. For instance,
libsmbclient is not threadsafe, so the smb implementation uses only
one job thread, and has some locks protecting the caches in use. Any
operation that can be fulfilled from the cache is replied to
immediately, and all others go to the single thread that calls
libsmbclient (so, no smb locks needed).

All the GVfsJob types are implemented by the helper code, so all you
have to do is subclass the GVfsBackend class and fill out the methods
you want to support.
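"Fill out the methods you want to support" can be pictured as a vtable where unsupported operations are left NULL and the shared helper code reports them as unsupported. `Backend`, `BackendClass` and the dispatch functions are hypothetical, not the GVfsBackend API; the errno-style return codes are also an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

typedef struct Backend Backend;

/* Hypothetical backend "class": a subclass fills in only the operations
 * it supports and leaves the rest NULL. */
typedef struct {
    int (*query_info)(Backend *b, const char *path, long long *size_out);
    int (*delete_file)(Backend *b, const char *path);
} BackendClass;

struct Backend {
    const BackendClass *klass;
};

/* Dispatch done once by the shared helper code: a missing method is
 * reported as "not supported" instead of crashing. */
int backend_query_info(Backend *b, const char *path, long long *size_out)
{
    if (b->klass->query_info == NULL)
        return -ENOTSUP;
    return b->klass->query_info(b, path, size_out);
}

int backend_delete(Backend *b, const char *path)
{
    if (b->klass->delete_file == NULL)
        return -ENOTSUP;
    return b->klass->delete_file(b, path);
}

/* A read-only example backend that knows a single file. */
static int ro_query_info(Backend *b, const char *path, long long *size_out)
{
    (void)b;
    if (strcmp(path, "/README") != 0)
        return -ENOENT;
    *size_out = 1234;
    return 0;
}

static const BackendClass readonly_class = {
    .query_info = ro_query_info,
    .delete_file = NULL,              /* read-only: no delete */
};

Backend readonly_backend = { &readonly_class };
```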

Status
======

Clearly this is far from finished. Many things, even essential things
like file writing, are not implemented yet, and the API in general is
not polished or full-featured. However, I think it's showing most of
the core ideas of the new design, and it is now at a state where
feedback and ideas would be very welcome.

Many things are not handled at all currently, like display names,
higher-level operations (copy file, rename file, etc), and mimetype
handling. These need to be designed and implemented. Other things
exist, but are not really that clean yet. These need polishing,
refactoring and bugfixing. One thing that I'm especially unsatisfied
with is the naming. There are just way too many "vfs", "daemon" and
"dbus" all over the place.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Alexander Larsson                                            Red Hat, Inc 
                   alexl redhat com    alla lysator liu se 
He's a witless amnesiac cat burglar who hides his scarred face behind a mask. 
She's a bloodthirsty gold-digging opera singer trying to make a difference in 
a man's world. They fight crime!
Follow-Ups:
- Re: gvfs status report
  - From: Dimi Paun
- Re: gvfs status report
  - From: Xavier Bestel
- Re: gvfs status report
  - From: Stefan Kost
- Re: gvfs status report
  - From: nf2
- Re: gvfs status report
  - From: Hans Petter Jansson
- Re: gvfs status report
  - From: David Zeuthen