Media library (summer of code)
- From: tobutaz <tobutaz gmail com>
- To: gnome-multimedia gnome org
- Cc: amarok-devel lists sourceforge net, banshee-list gnome org, muine-list gnome org, musicpd-dev-team lists sourceforge net, snorp snorp net, rhythmbox-devel gnome org, murrayc gnome org
- Subject: Media library (summer of code)
- Date: Sat, 20 May 2006 21:44:58 +0200
This is a (late) attempt to discuss a library / desktop service I have
proposed for Google's Summer of Code.
I just noticed murrayc had commented on it, which means I may upgrade
the proposal depending on your reactions.
The gist of the proposal (more below) is a media library that is able
to handle file moves, edits (think retaggings), new files and hotplug
events transparently. The metadata sniffing problem is not addressed,
however the application can use the library to store its metadata, or
just an external reference to it. The proposal also includes a tool
for users to set up their desktop-wide music database, and the
adaptation of at least one application to use the service.
I'd like to know if that is something you would want to integrate in
your application; this is particularly aimed at music players and
jukeboxes, but there may be other uses (sync tools come to mind).
Gabriel de Perthuis (include me or gnome-multimedia in replies)
- - 8< - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
The aim of this project is to provide media players with a media
library tracking the contents of part of the filesystem. As a (useful)
side-effect, a generic filesystem watch library will be implemented.
== Context, problem ==
Thanks to GStreamer, lots of media players have appeared, with some
creative interfaces, all of them able to play just about any music
format available on Linux.
Most of them, even the jukeboxes, are unable to track a media library
very well; the ones that make an effort in that direction (rhythmbox,
banshee can even use inotify) do not watch all the files, and do not
support file moves and retagging done outside of them. Others have to
be relaunched or asked to reimport folders to get notified of changes.
== Solution ==
The proposed solution can be split into several parts:
- filesystem notifications to keep an up-to-date view of the filesystem;
- a library exposing a set of directory trees (that may be on
different mounts), as a flat set of files, with attributes including
the path, the mount, and a hash;
- a configuration tool for a desktop-wide music database (include a
few directories, and any removable media HAL reckons is a music
player);
- adapting applications to take advantage of this, a transitional way
being to write a DAAP server with the configured music (this is only
transitional as far as jukeboxes are concerned because it prevents
them from accessing the files for retagging or moving them)
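
To make the second item above more concrete, here is a minimal sketch
of what "a flat set of files" over several directory trees could look
like. Everything in it (the FileEntry record, the find_mount,
content_hash and flat_view helpers, the choice of SHA-1) is an
assumption made for illustration, not part of the proposal, and it
does a one-shot scan rather than reacting to notifications.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    # Hypothetical record: the three attributes named in the proposal.
    path: str    # absolute path of the file
    mount: str   # mount point the file lives on
    digest: str  # hex SHA-1 of the file content (assumed hash choice)


def find_mount(path):
    """Walk up from path until a mount point is reached."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


def content_hash(path, chunk_size=65536):
    """Hash the file in chunks so large files do not fill memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def flat_view(roots):
    """Expose several directory trees as one flat set of entries."""
    entries = set()
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.realpath(os.path.join(dirpath, name))
                if not os.path.isfile(full):
                    continue  # skip sockets, broken symlinks, etc.
                entries.add(
                    FileEntry(full, find_mount(full), content_hash(full))
                )
    return entries
```

The directory structure is deliberately lost: a client only sees
entries, and two identical copies show up as two entries sharing a
digest.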
= Filesystem watching =
It is possible to watch the filesystem recursively using inotify and
the 'lovebridge algorithm'. Since inotify doesn't support recursive
notifications, a more efficient (but less widely available, being an
out-of-tree kernel module) solution is rlocate; other in-kernel
solutions are appearing.
It is important to note that inotify alone isn't enough; inotify
listeners must be set for each 'interesting' directory, which can grow
to a very large list; applications using inotify just to get notified
of new files do not work well (like rhythmbox and banshee), either
because they didn't watch the right directories in the first place, or
because they are unaware of moves, new directories, or file
modifications. There are consistency issues when dealing with
recursive changes such as a directory being moved. This solution is
more high-level, and potentially useful to more applications.
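
One of the consistency issues can be sketched without touching the
kernel. An inotify watch stays attached to its directory when that
directory is renamed, so the path the application remembered for each
watch descriptor goes stale, for the moved directory and for every
watched directory below it. The WatchTable class below is a
hypothetical, pure-bookkeeping illustration of the fix-up; it makes no
inotify calls and hands out its own fake descriptors.

```python
class WatchTable:
    """Maps watch descriptors to directory paths (bookkeeping only)."""

    def __init__(self):
        self._next_wd = 1
        self.paths = {}  # wd -> directory path as last known

    def add(self, path):
        """Record a watched directory and return a fake descriptor."""
        wd = self._next_wd
        self._next_wd += 1
        self.paths[wd] = path
        return wd

    def directory_moved(self, old, new):
        """Rewrite the path of every watch at or below a moved directory."""
        prefix = old.rstrip("/") + "/"
        for wd, path in list(self.paths.items()):
            if path == old:
                self.paths[wd] = new
            elif path.startswith(prefix):
                # Keep the part below the moved directory unchanged.
                self.paths[wd] = new.rstrip("/") + "/" + path[len(prefix):]
```

Matching on the prefix with a trailing slash matters: moving /music/a
must not touch a sibling such as /music/ab.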
= Metadata extraction =
Extracting music attributes is best left to what players already do;
the state of the art being an out-of-process gst-typefind.
Clients need to be able to identify a file uniquely, despite its
moves; to be able to work asynchronously, e.g. to keep up after a crash
or a reboot, we must be able to recalculate the id from existing
files; the simplest idea is to use a checksum and index by content,
but it is also possible to do a checksum on just the music stream
(using e.g. the hachoir), use a MusicBrainz id, or an inode number. It
is reasonable to ask clients to calculate their own notion of ids
themselves, and use it to tell them if the file that just appeared was
newly created, copied or moved.
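
As a rough sketch of how a client-supplied id could settle the
"created, copied or moved" question: look the id up among previously
known paths and check whether any of those still exist. The function
name, the shape of the index (id -> set of paths) and the injectable
exists callback are all assumptions made for this example.

```python
import os


def classify_appearance(index, file_id, exists=os.path.exists):
    """Decide how a file that just appeared relates to known files.

    index maps a client-computed id to the set of paths previously
    recorded for it; exists can be swapped out for testing.
    """
    known_paths = index.get(file_id, set())
    if not known_paths:
        return "created"   # id never seen before
    if any(exists(p) for p in known_paths):
        return "copied"    # an older path still holds the same content
    return "moved"         # every older path is gone
```

For instance, with index = {"id1": {"/old/song.ogg"}}, a file with id
"id1" appearing elsewhere is reported as "moved" once /old/song.ogg no
longer exists, and as "copied" while it still does.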
= Client API =
The view presented to the client is of a flat set of files. File
attributes are their path, their mountpoint, some client-given ids or
cookies (if the client decides to offload its metadata database here).
Notifications are given to subscribed clients using an inotify-like api:
clients receive transactions made of a batch of property changes; when
subscribing, clients may set event masks to filter out properties they
don't want to receive notifications for. A rename is a property change
(the path changes). Additionally, events may be retrieved
asynchronously to catch up when the application has been inactive,
which may prompt a disk scan if the library has been inactive too.
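
The subscription model above could be sketched roughly as follows.
This is an in-process toy, not the proposed API: the Prop flags, the
Change record and the Notifier class with its subscribe, commit and
events_since methods are all invented for illustration, and a real
service would deliver transactions across processes.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Prop(Flag):
    # Hypothetical property bits a client can put in its event mask.
    PATH = auto()
    MOUNT = auto()
    COOKIE = auto()
    DISAPPEARED = auto()


@dataclass
class Change:
    file_id: str
    prop: Prop
    value: object


class Notifier:
    def __init__(self):
        self._subs = []  # list of (mask, callback)
        self.log = []    # every committed transaction, for catch-up

    def subscribe(self, mask, callback):
        self._subs.append((mask, callback))

    def commit(self, changes):
        """Record one transaction and deliver it, filtered per client."""
        changes = list(changes)
        self.log.append(changes)
        seq = len(self.log)
        for mask, callback in self._subs:
            wanted = [c for c in changes if c.prop & mask]
            if wanted:
                callback(seq, wanted)
        return seq

    def events_since(self, seq, mask):
        """Catch up: replay transactions committed after seq."""
        out = []
        for n, tx in enumerate(self.log[seq:], start=seq + 1):
            wanted = [c for c in tx if c.prop & mask]
            if wanted:
                out.append((n, wanted))
        return out
```

A rename is then just a Change carrying Prop.PATH; a client that was
offline remembers the last sequence number it saw and calls
events_since with it on startup.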
Client API subtleties:
There may be several files and paths for a given id; even if client
ids are content hashes, an identical copy may appear at several paths.
A file may appear or disappear. A disappeared file is just a file with
a 'disappeared' attribute set; it will still be kept, along with
external metadata such as its rating and listening history, because it
may have been moved to a portable device.
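
A minimal sketch of these two subtleties, using SQLite as a stand-in
for the lightweight database: several paths may share one client id,
and a disappeared file keeps its row and its cookie. The table layout
and the function names are assumptions made for this example.

```python
import sqlite3


def open_store(db_path=":memory:"):
    """Open (or create) the hypothetical store; one row per path."""
    db = sqlite3.connect(db_path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS files (
               path        TEXT PRIMARY KEY,
               client_id   TEXT NOT NULL,
               cookie      BLOB,
               disappeared INTEGER NOT NULL DEFAULT 0)"""
    )
    return db


def file_appeared(db, path, client_id, cookie=None):
    """Insert a new row, or revive an existing one and keep its cookie."""
    cur = db.execute(
        "UPDATE files SET client_id = ?, disappeared = 0 WHERE path = ?",
        (client_id, path),
    )
    if cur.rowcount == 0:
        db.execute(
            "INSERT INTO files (path, client_id, cookie) VALUES (?, ?, ?)",
            (path, client_id, cookie),
        )


def file_disappeared(db, path):
    """Flag the row instead of deleting it, so metadata survives."""
    db.execute("UPDATE files SET disappeared = 1 WHERE path = ?", (path,))


def paths_for(db, client_id, include_disappeared=False):
    """List every path recorded for one client id."""
    sql = "SELECT path FROM files WHERE client_id = ?"
    if not include_disappeared:
        sql += " AND disappeared = 0"
    return [row[0] for row in db.execute(sql + " ORDER BY path", (client_id,))]
```

Because the cookie column is never cleared on disappearance, a rating
stored there is still available when the file comes back from a
portable device.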
The library has no notion of music, or playback, at all; the external
metadata ('cookies') and ids it receives are opaque data handed over
by clients.
In case of a folder rename, it may be impractical to send path changes
for every element; clients are encouraged not to subscribe to path
changes, except for files whose path is currently displayed, or which
are being played.
== Other consumers ==
Clients shouldn't be limited to music clients.
Filesharing applications also need to watch lots of files, just like
beagle, synchronizer applications (unison, rsync currently can't
detect moves), and version control tools (detecting file moves is
important for transparently versioning a home directory). These
applications could very well become more important than a media
library, but they are long-term for now.
== Deliverables ==
I plan to
- implement the system notification service and the library API. This
will involve inotify (or rlocate or another kernel patch if they find
their way into a major distro), HAL for mounts/umounts, and a
lightweight database for holding ids and external metadata.
- make the API available in C (possibly GObject) for language bindings
- implement configuration for a desktop-wide music library, in GConf
keys and a dialog
- use the above in one of: a Banshee source, a Rhythmbox source,
amarok, the MPD daemon, a DAAP server (Tangerine based?)
- publish the project on a neat wiki, using possibly live.gnome.org or
a trac, and make progress visible to anyone.
== Limitations ==
The use of inotify means it won't be easily ported to Windows or OS X,
although Windows has some notion of recursive notifications. However,
this project could become an inotify abstraction layer.
== References ==
A blogpost about the need for recursive filesystem notifications:
The above links to a kernel patch:
rlocate kernel module (asynchronous, recursive filesystem notifications):
Beagle, a tool for metadata extraction, indexing and querying:
Tangerine, a lightweight, standalone DAAP (iTunes music library) server:
The hachoir, a file splitter/structure analyser/metadata extraction tool:
GStreamer, a media playback framework:
Banshee, a jukebox:
Rhythmbox, a jukebox:
amaroK, a jukebox:
MPD, a music player daemon:
Unison, a safe bidirectional file synchronizer: