[Rhythmbox-devel] Better support for very large music collections

From: Forrest L Norvell <ogd aglaia aoaioxxysz net>
To: rhythmbox-devel gnome org
Subject: [Rhythmbox-devel] Better support for very large music collections
Date: Fri, 28 Feb 2003 15:07:20 -0800

Hello, everyone. I'm a newish user of Rhythmbox, having recently sat
down, audited the available alternatives, and decided that
Rhythmbox has the best chance of doing what I need in the long
term. I am at this point blissfully unencumbered by knowledge of the
source (although I'm more than willing to get my hands dirty to help
out), so some or all of the points I raise in this e-mail may already
be in the queue; I checked Bugzilla and didn't see much along the
lines of what I'm looking for. Because I don't have a really clear
idea of what I need, I thought I'd post a note here and see what
everyone thinks before I start posting RFEs. This is kind of long,
both because I'm naturally verbose and because I've been thinking
about this for a while. Those of you who bear with me will have my
gratitude.

I really like Rhythmbox; it's got a straightforward, uncluttered
interface, it follows Gnome 2 standards well, and it's not trying to
do too many things at once. Its internationalization support is
excellent (I have a large number of Japanese tracks, the data for
which was entered in iTunes and displayed without a hitch; I'm going
to steal the import code for my rewrite of Netjuke).  It has the key
feature of iTunes that I've been wanting in a free media player
(i.e. having a library rather than a motley assortment of
playlists). It doesn't have a zillion random plugins, and it uses
libraries sensibly. Given the current state of other players, I'm
likely to stick with Rhythmbox for the foreseeable future.

Before I get into the nitty gritty of my problems with Rhythmbox and
some possible solutions to them, let me tell you a little about my
intentions and setup:

My overall goal is to reproduce as much of my music collection online
as I can (for a variety of reasons: I need more fuel for my iPod, I'm
planning on purchasing Final Scratch (http://www.finalscratch.com/)
later this year, I'm considering moving to Japan and don't want to
haul my collection with me, and I spend a good chunk of time
evangelizing new music to my friends). This is a substantial project,
as I have somewhere between 2,500 and 5,000 CDs and another 1,500+
pieces of vinyl, mostly techno and drum'n'bass 12s and EPs (I'm not
worrying about tapes right now; it's too painful to contemplate).

I'm running Colin Walters's net-rhythmbox package on a Debian unstable
system. My music library is an NFS-mounted share from my fileserver
(running Debian stable), and consists of a constantly-growing archive,
currently at about 30GiB (~4800 mp3 files). I'm using Grip with lame
--r3mix to encode my files, after trying abcde, jack, and various KDE
apps (if anyone has a better solution for ripping and encoding, I'd
love to hear about it. Just because it's the best thing I've found
doesn't mean that I like it). I would love to be encoding to Vorbis
instead, but neither Final Scratch nor the iPod support that format,
and I have it on reasonably good authority from friends at Apple that
Ogg support is not in the cards for iWhatever anytime soon. Every so
often I go through my library with Easytag to clean up metadata and
try to ensure that all of my files have useful names and readable
ID3vX tags. Finally, the two main clients of the archive are the
aforementioned net-rhythmbox and Netjuke as a web front end (which I'm
in the process of rewriting in Perl; I'm also replacing Easytag with a
set of Perl scripts to synchronize the metadata in the database with
what's in the file tags, which is actually proving to be a pain due to
Perl's poor support for ID3v2.x).

What appeals to me about net-rhythmbox first and foremost is its
simplicity. What bugs me most at this point is its performance. I've
already learned that when building the initial library (as I've had to
do several times; see below), it pays to choose an artist and / or
genre early on, so the main library view doesn't spaz out trying to
keep up with the volume of incoming data. Even so, loading the library
pegs the CPU for a good 30 seconds, and switching from a view narrowed
to an artist to the Library view takes 10-15 seconds. My machine isn't
bleeding edge, but it's an all-SCSI box with a 700MHz Athlon and
256MiB of RAM and not a whole lot else to do.

By contrast, iTunes takes about the same amount of time to start up at
first (on a 500MHz TiBook), but switching between views is about as
instantaneous as anything is in MacOS X, even when the machine is
busy. Obviously iTunes is using some kind of disk-based view of its
library, and I'd love to know what sort of twisted design process led
to it being responsible for maintaining both its own database *and* an
XML file, both containing the same data. But the net result is that it
feels pretty snappy, although it's beginning to creak a little under
the load of my current library, and I understand that it breaks down
completely once it gets to the 32K song limit (which I anticipate I'll
reach before too long).

Here's my first question: how tough would it be to make Rhythmbox pull
its metadata from an RDBMS in some sort of cursor-based (incremental)
fashion? I'm sure it's possible to make it burly enough to run
acceptably with an enormous library in memory, but why bother? This is
the sort of thing DBMSes are designed to handle. Even moving to a
Berkeley DB could improve performance tremendously. Providing that as
an optional back end would be way cool, and I'd be happy to assist in
the development of such a thing.
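
To make the cursor-based idea concrete, here is a rough sketch in
Python using SQLite. Everything in it is an assumption for the sake of
illustration: the `tracks` table, its columns, and the `iter_tracks`
and `demo` helpers are hypothetical and have nothing to do with how
Rhythmbox stores its library today. The point is only that a view can
pull rows in batches instead of holding the whole library in memory.

```python
import sqlite3


def iter_tracks(conn, artist=None, batch_size=500):
    """Yield lists of track rows, at most batch_size rows at a time."""
    sql = "SELECT artist, album, title FROM tracks"
    params = ()
    if artist is not None:
        # Narrowing to one artist is just a WHERE clause for the database.
        sql += " WHERE artist = ?"
        params = (artist,)
    sql += " ORDER BY artist, album, title"
    cur = conn.execute(sql, params)
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        yield rows


def demo():
    """Build a hypothetical in-memory library of 4800 tracks."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tracks (artist TEXT, album TEXT, title TEXT)")
    conn.execute("CREATE INDEX tracks_artist ON tracks (artist)")
    conn.executemany(
        "INSERT INTO tracks VALUES (?, ?, ?)",
        [("Artist %d" % (i % 10), "Album %d" % (i % 50), "Track %d" % i)
         for i in range(4800)],
    )
    return conn
```

A front end would call `iter_tracks(conn, artist="Artist 3")` and
append each batch to the visible list as it arrives, so switching
views never has to wait for the full result set.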

I have my own selfish reasons for wanting metadata in a database: it
seems silly to have that information in more than one database. I'm
pretty anal about my metadata, and all anal metadata maintainers
quickly grow exasperated with the inadequacies of in-file tags. In the
absence of a workable, general cataloging system for music, we're all
going to have to develop our own classification schemes, and this need
is very poorly served by current tools -- in particular, due to the
minimal ability to put information in the files themselves,
synchronizing metadata between tools is a godawful pain in the ass,
requiring a lot of gross XML file munging and many homegrown Perl
scripts. If it were all in the DB, this wouldn't be so much of an
issue. I could write an adaptor layer for whatever permutation of
Netjuke I end up writing for myself. Netjuke has a quirky interface,
but its batch editing mode is very useful.

Also, if that information were manageable via a database, it would
lessen the need to add id3 / ogg tag editing to rhythmbox itself
(yeah, adding direct support for editing is inescapable in the long
run; my own feeling is that it's not worth it unless you can come up
with an interface better than what WinAmp and XMMS have, and at least
on par with iTunes).  Updating a record in a database is cheap; adding
an id3v2 tag is expensive. It would be easy to write scripts that act
as clients to the database view and synchronize metadata in the
files. I know this because I'm doing it, and it's really not that
painful.
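
The synchronization step could look something like the sketch below,
written in Python for illustration. It assumes the database hands back
one dictionary of fields per file and that `read_tags` is some
caller-supplied function that returns the tags currently stored in a
file; both of those, along with the function names, are hypothetical.
It deliberately leaves out the actual tag writing and only works out
which files need the expensive rewrite at all.

```python
def tag_changes(db_record, file_tags):
    """Return only the fields whose database value differs from the file."""
    return {field: value
            for field, value in db_record.items()
            if file_tags.get(field) != value}


def files_needing_rewrite(db_rows, read_tags):
    """Yield (path, changes) for files whose tags are out of date.

    db_rows is an iterable of (path, record) pairs; read_tags is a
    hypothetical callable mapping a path to that file's current tags.
    """
    for path, record in db_rows:
        changes = tag_changes(record, read_tags(path))
        if changes:
            yield path, changes
```

Files whose tags already match the database are skipped entirely, so
a metadata cleanup touches only the files it actually changed.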

My secondary concern is stability, particularly with large libraries.
I'm not so worried about this because I know that rhythmbox is still
pre-beta software. However, on a couple of occasions, I've done
something to break my library file in such a way that I can no longer
start rhythmbox. This isn't a super big deal (besides losing ratings),
as I just re-add the library and wait a couple minutes, but, y'know,
it's still kinda lame. And sometimes, for no reason that I can
determine, the program starts eating CPU like a madman for 10-30
seconds. I used to think this was because it was importing what new
files I'd added to the library, and I think it was at one point, but
the 0.4.5 version of net-rhythmbox doesn't appear to be adding new
files to the library, as far as I can tell.

Has FAM support been added back into rhythmbox? Does FAM play well
with NFS-mounted directories these days? Not having to manually import
each new chunk of stuff would be a definite plus.

One last thing: the single most useful feature in Netjuke is its quick
search box. In general, I think browsing is far superior to searching,
but in large collections, it's a lot faster (at least on the hardware
I have available) to type in a search term than to narrow onto an
artist, and it's also a cheap way to group related artists ("Krom"
gets you Kromozone, DJ Krome & Mr Time, and Krome & Time, all of which
are the same two people -- try doing that in Rhythmbox right now).
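
For what it's worth, the behavior I'm describing is nothing fancier
than a case-insensitive substring match. A minimal sketch in Python
(the `quick_search` name and the in-memory list are assumptions for
illustration; a real implementation would presumably query whatever
store holds the library):

```python
def quick_search(artists, term):
    """Return the artists whose names contain term, ignoring case."""
    needle = term.casefold()
    return sorted(name for name in artists if needle in name.casefold())


artists = ["Kromozone", "DJ Krome & Mr Time", "Krome & Time", "Talk Talk"]
matches = quick_search(artists, "Krom")
# matches == ["DJ Krome & Mr Time", "Krome & Time", "Kromozone"]
```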

Besides that, here are some totally blue-sky features for you to think
about, probably for somewhere far beyond a 1.0 release:

  o Richer metadata would be great:
    + Defining relationships between genres
    + Defining relationships between performing groups (e.g., Mark
      Hollis and Talk Talk share members; it would be neat to be able
      to show that in a browser somehow)
    + Sorting by composer rather than performer.
    + All current media players for all platforms, without exception,
      do a terrible job of handling classical music, which uses a
      completely different set of metadata that's not at all served
      well by current media-file tagging formats. I want to be able to
      view by performance, work, composer, conductor, ensemble, and
      opus; I want pieces to be clumped together so I can see all the
      instances of, say, Shostakovich's 9th Symphony, without having
      everything broken up into movement-sized chunks. Right now
      serious classical music listeners are still too anal to have
      moved to lossy file formats, but as things like high-capacity
      personal media players and the SliMP3 become commonplace,
      there's going to be more of them around, and they're going to be
      cranky about this.

  o Man, it would be totally rad to be able to control cdparanoia /
    libcdaudio and lame / oggenc directly from rhythmbox, along with a
    complete browser for the queue of tracks being ripped / encoded
    (there's a rough interface of what I'm looking for in the largely
    vaporware krabber). This is probably better handled in another
    tool (Grip is a decentish base for development, even though its
    code is kind of a mess), but if you're going to go the iTunes
    all-in-one route, this would be my preferred way to go.

  o It would be totally epic to be able to rip tracks to *both* MP3
    and Vorbis at the same time; disk space is cheap, and if Vorbis
    support gets prevalent enough in the future that I don't have to
    use MP3s anymore, it would be nice to avoid the tedious process of
    ripping everything again.
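
As a rough illustration of the classical-music point above, here is
one way the "clumping" could work, sketched in Python. The `Track`
fields and the `group_by_work` helper are entirely hypothetical and
are not drawn from any existing tag format: tracks are grouped first
by (composer, work) and then by performance, with movements kept in
order inside each performance.

```python
from collections import defaultdict
from typing import NamedTuple


class Track(NamedTuple):
    composer: str
    work: str
    performers: str  # conductor and ensemble, as one display string
    movement: int
    path: str


def group_by_work(tracks):
    """Map (composer, work) -> {performers: [tracks in movement order]}."""
    works = defaultdict(lambda: defaultdict(list))
    for track in tracks:
        works[(track.composer, track.work)][track.performers].append(track)
    for performances in works.values():
        for movements in performances.values():
            movements.sort(key=lambda t: t.movement)
    # Convert to plain dicts so lookups of missing keys raise KeyError.
    return {key: dict(perfs) for key, perfs in works.items()}
```

A browser built on this would show one entry per work, expandable into
its recorded performances, rather than a flat list of movement-sized
files.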

The real power of Rhythmbox lies in its simplicity; I fervently
believe that it's possible to present musical metadata simply, and
there's no player out there that gives me everything I need. Most of
the media players out there are obsessed with skins, visualizers, and
offering five zillion plugins, rather than actually focusing on music
and musical information. I would love to do the work to make Rhythmbox
suit my needs, as long as I can figure out a way to make my vision
match up with the team's. My one concern with the new UI (at least the
mockups I've seen) is that UI cruft seems to be sneaking in (in
particular, the four "play album / track" buttons). Keep it simple and
I'll love you forever.

Thanks for reading this far,
Forrest Norvell

-- 
       . . . the self-reflecting image of a narcotized mind . . .
ozymandias G desiderata     ogd@aoaioxxysz.net     desperate, deathless
(415)558-9064        http://www.aoaioxxysz.com/          ::AOAIOXXYSZ::