[Rhythmbox-devel] rhythmdb post-mortem and future

From: Colin Walters <walters verbum org>
To: rhythmbox-devel gnome org
Subject: [Rhythmbox-devel] rhythmdb post-mortem and future
Date: Fri, 14 Nov 2003 17:22:51 -0500
So I think at this point we can call RhythmDB a success, but it wasn't
as big a one as I had hoped.  There are still issues with locking,
although fewer than with RBNode.  Also it will need reworking to solve
issues like filesystem coherency.  So, let's investigate.

* Locking
One major problem is doing a rhythmdb_write_lock from the main (GTK+)
thread.  What happens now, when you click on an artist or album (in the
library), the main GTK+ thread grabs the a read lock on the db, prepares
some data structures for the query (specifically the genre/artist/album
bits), and then read unlocks and kicks off a query.  

The query runs in a separate thread.  It first grabs a read lock on the
db itself, and starts pushing data into the query model into an async
queue.  The main thread occasionally polls the query model for updates. 
Finally, when the query is complete, a special entry is put into the
query model update queue, and when the main thread polls, it gets the
completion signal and resets the genre/artist/album stuff back to
normal.  The query thread, meanwhile, is waiting for acknowledgment that
the main thread got the completion, and when it gets it, it read unlocks
the db and returns.

Several problems here.  One, is there's a race condition between the
read unlock in rb_library_source_do_query and when the db is read locked
in query_thread_main.  So a write could sneak in there, which would be
bad because it would probably corrupt the refcounts on the properties.

Secondly, what happens if you try to do something that requires a write
lock on the db from the main thread while a query is in progress?  An
example is creating a new iradio station.  That will result in deadlock,
since the write lock won't succeed until the db is read unlocked by the
query thread, but the query thread won't read unlock until the main
thread acknowledges the query completion (or cancels it).

A smaller problem, but still annoying is that since the query runs in
another thread, at least two context switches are required before you
start to see results.

Really, the whole thing is just too complicated (you may have guessed I
was leading up to this).  Let's take a step back here.
Why run the query in a separate thread?  The only advantage here really
is that you'll get a speedup on SMP systems.  But how much of one?  I
was looking at some profiles, and the time spent in the query thread was
almost invisible.  All the real work is pretty much done in the main
thread anyways, like inserting into data structures and redrawing the
treeview.

The conclusion I've come to though is that if Rhythmbox ever locks up or
crashes because of something like this, we have lost all the advantages
we gained.  Stability is a lot more important than a slight speed
boost.  I want Rhythmbox to be totally rock solid, something you feel
you can always rely on.  Some of the crash reports in Bugzilla of course
are really GStreamer, but we have a lot of our own.

So I want to vastly shrink RhythmDB's use of threads.  I now envison
just 2 threads (apart from GStreamer).  The first is the main GTK+
thread.  The second is a thread that just does filesystem work - it will
read metadata from files, watch directories for changes, recursively
read directories, and the like.  Both threads have GAsyncQueues where
they receive updates from the other thread.

The advantage of this scheme is that we don't need any kind of locking
inside the main GTK+ thread.  Basically the GDK lock and the db lock are
merged.  The main thread knows that the database will never change under
its feet when it holds the GDK lock.  It can also do writes (like create
a new iradio station) without any locking.
From a periodic idle handler, the main thread will poll the fs thread
for changes.  It will then perform any actions necessary in the models.

How will incremental queries work?  I envision this just being an idle
handler (not a thread) that walks the db and pushes entries into a
model.  This makes cancelling a query a whole lot easier than the
current dance we go through - with an idle handler, we could just remove
it from the glib main loop, and we're done.

Making this change shouldn't be incredibly difficult; the hardest part
will be turning the threaded query into an idle handler.


* Filesystem coherency

In summary:
http://bugzilla.gnome.org/show_bug.cgi?id=125177
This is also related to:
http://bugzilla.gnome.org/show_bug.cgi?id=125452

Here are my thoughts on this.  Currently, RhythmDB just stores a set of
songs (set in the mathematical sense, so no ordering at all).  Each song
has a URI associated with it; RhythmDB maintains a bijective mapping
between URIs and songs.

One issue with this is that RhythmDB has no notion of directories.  But
it should, because this allows us to make some important optimizations. 
For example, as I mentioned in the bug, we can just stat() each toplevel
directory on startup, and if those haven't changed, we're done.

More generally though, we want to monitor directories for updates at
runtime as well.  So what we really need is for RhythmDB to maintain a
specialized kind of indexed filesystem, and to keep it in sync with the
real filesystem.  The RhythmDB filesystem will never be written to
directly.  When the user clicks delete on a song, we will simply delete
it from the real filesystem; then, the fs thread will eventually notify
us of this deletion, because it'll be watching the directory.

RhythmDB should primarily store (filesystem, inode) pairs, for both
directories and files.  That's sufficient for unique identification. 
These are the "device" and "inode" members of a GnomeVFSFileInfo.  Note
that storing things this way solves the recursive symlink problem; if
during a recursive directory traversal we hit a (filesystem, inode) pair
that's already in the DB, we just ignore it.

However, this doesn't work for non-local filesystems.  So we'll still
have to store URIs for those.  So I am thinking the structure will look
something like this:

struct RhythmDBFile
{
  gboolean is_local;
  gboolean is_dir;
  dev_t device;               /* Both of these are invalid if */
  GnomeVFSInodeNumber inode;  /* is_local is false */
  time_t mtime;
  char *name;  /* UTF-8 */
  union {
    GHashTable *dirents;  /* used if this is a dir */
    RhythmDBEntry *entry; /* used if this is a song */
  } data;
};
 
Now, RhythmDB stores a set of RhythmDBFiles that are the "roots".  It
also stores a hash table which maps (device, inode) pairs to
RhythmDBFiles.  This is because updates could come in two forms; our fs
monitoring thread will tell us things like (3, 3532523) has been
deleted.  Then we need to look that up in the hash table, map it to a
RhythmDBEntry, and signal the main thread that that entry has been
deleted.  

But we also need to be able to handle plain URIs; as I mentioned for
remote filesystems we don't have inodes.  So the strategy there will be
to try matching the URI against each root in turn.  If there's a match,
then we recursively traverse the directory pointers, matching up URI
components, until we get to a RhythmDBEntry.  This should be reasonably
efficient.

So, thoughts?
This is a digitally signed message part
Follow-Ups:
- Re: [Rhythmbox-devel] rhythmdb post-mortem and future
  - From: Bastien Nocera
- Re: [Rhythmbox-devel] rhythmdb post-mortem and future
  - From: Joshua Haberman
- Re: [Rhythmbox-devel] rhythmdb post-mortem and future
  - From: Benjamin Otte
[Date Prev][Date Next] [Thread Prev][Thread Next] [Thread Index] [Date Index] [Author Index]