Metadata systems (again :)



Okay, I've been thinking about this stuff for most of the day, and the more
I think about it, the nastier it gets :)

I have a few ideas of my own I'd like to toss into the pot for people...

The first is: rather than jumping in full-pelt and designing the metadata
system, what about looking at the interface the system requires?  What
I'm thinking is, assuming we're going to have a 'libgfdb.so' (gfdb =
gnome file database), what about defining the operations that library is
going to have to support - that way, we can get _something_ into
production without getting bogged down in exactly how it all works.
We make the assumption that, at least for gnome programs, we can link
this library in and use things like 'gfdb_add(inode,method,action)',
'gfdb_query(inode,method)', etc.  That way, gnome applications can start
to work with this, and we still leave the door open for pretty much any
implementation we like at a later date (including preloads, if that's
deemed necessary).
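To make the "interface first" idea concrete, here is a minimal sketch of what the calling side could look like, with the storage hidden behind a swappable backend. Only the names gfdb_add and gfdb_query come from the post; GfdbBackend, MemoryBackend and gfdb_set_backend are hypothetical, and reading 'method' as something like "open" and 'action' as the program to run is an assumption, since the post doesn't define either argument.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple


class GfdbBackend(ABC):
    """Hypothetical storage layer behind the gfdb calls; callers never see it."""

    @abstractmethod
    def store(self, inode: int, method: str, action: str) -> None: ...

    @abstractmethod
    def fetch(self, inode: int, method: str) -> Optional[str]: ...


class MemoryBackend(GfdbBackend):
    """Throwaway backend: a dict keyed on (inode, method)."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[int, str], str] = {}

    def store(self, inode: int, method: str, action: str) -> None:
        self._table[(inode, method)] = action

    def fetch(self, inode: int, method: str) -> Optional[str]:
        return self._table.get((inode, method))


# Module-level backend: replacing it changes the storage without touching callers.
_backend: GfdbBackend = MemoryBackend()


def gfdb_set_backend(backend: GfdbBackend) -> None:
    """Hypothetical hook for swapping the databasing system later."""
    global _backend
    _backend = backend


def gfdb_add(inode: int, method: str, action: str) -> None:
    """Record that `method` (e.g. "open") on this file should run `action`."""
    _backend.store(inode, method, action)


def gfdb_query(inode: int, method: str) -> Optional[str]:
    """Return the action recorded for `method` on this file, or None."""
    return _backend.fetch(inode, method)
```

An application would then call something like gfdb_add(os.stat("huge.png").st_ino, "open", "gimp") and later gfdb_query(..., "open"), without caring whether the backend is a dict, a .db file or a preload.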

Now, my suggestion for the underlying databasing:

What about a .db file (or similar) per user, stored in the user's homedir
somewhere, that refers to files by _either_ hash or regexp of filename,
choosable by the user when the association is made?  This way, if you want
to associate a bunch of log files with an application (or even one log
file), you can specify it by filename ('cause logfile _names_ don't generally
change, but their content does), but if you want to specify that 'huge.png'
gets opened with a different app to all other '.png' files, you can specify
it by hash - and it'll hold no matter whether you rename it or move it
or whatever.
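A rough sketch of the per-user database with both kinds of key might look like the following. The class and method names are hypothetical, SHA-1 of the file contents stands in for "hash", the regexp is matched against the bare filename, and saving the table to a file in the homedir is left out. The precedence rule (a hash match beats any filename match, and among filename matches the longest pattern wins) is just one assumed answer to the "which match is more exact" question raised next.

```python
import hashlib
import os
import re
from dataclasses import dataclass, field
from typing import Dict, Optional


def content_hash(path: str) -> str:
    """SHA-1 of the file's contents, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


@dataclass
class AssociationDB:
    by_hash: Dict[str, str] = field(default_factory=dict)  # digest -> app
    by_name: list = field(default_factory=list)  # [(compiled regexp, app)]

    def associate_by_hash(self, path: str, app: str) -> None:
        """Survives renames and moves, but not edits to the content."""
        self.by_hash[content_hash(path)] = app

    def associate_by_name(self, pattern: str, app: str) -> None:
        """Survives content changes (log files), but not renames."""
        self.by_name.append((re.compile(pattern), app))

    def lookup(self, path: str) -> Optional[str]:
        # Assumed policy: a hash match is exact, so it wins outright.
        app = self.by_hash.get(content_hash(path))
        if app is not None:
            return app
        name = os.path.basename(path)
        hits = [(rx, a) for rx, a in self.by_name if rx.fullmatch(name)]
        if not hits:
            return None
        # Heuristic: treat the longest pattern as the most specific.
        return max(hits, key=lambda hit: len(hit[0].pattern))[1]
```

With ".*\.png" mapped to one viewer and 'huge.png' associated by hash with another, renaming 'huge.png' still finds the hash entry, while every other PNG falls through to the filename rule.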

Obviously, there are some issues to be addressed with that - how do you
know which match is more exact, that sort of thing - but they _are_
addressable, I think.  The synchronisation problems are still there
(as they are with anything), but letting the user pick the method gains
us a bit.  And by defining a gfdb library, we can then write some
wrapper programs (gfdbrm, gfdbmv, gfdbcp - you get the idea) that simply
call the appropriate library routines.  Those little programs are then
easy to integrate into scripts, and easy to port from machine to machine
and environment to environment - _and_ they give users the chance to
customise their other desktop environments and programs to use the
same system.  That makes it a lot easier to integrate this system into
anything else you're doing, and thereby helps dodge the desynch problem.
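As an illustration of how thin such a wrapper could be, here is a hypothetical gfdbrm that deletes files and drops their hash-keyed associations in the same step. The JSON layout and the ~/.gfdb.json location are assumptions made only for this sketch; a real wrapper would call into libgfdb instead. One known gap: if two files share identical content, removing one also drops the association the other relies on.

```python
import argparse
import hashlib
import json
import os
import sys

DEFAULT_DB = os.path.expanduser("~/.gfdb.json")  # hypothetical location


def content_hash(path):
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def load_db(db_path):
    if not os.path.exists(db_path):
        return {"by_hash": {}}
    with open(db_path) as f:
        return json.load(f)


def save_db(db, db_path):
    with open(db_path, "w") as f:
        json.dump(db, f, indent=2)


def gfdbrm(argv=None):
    parser = argparse.ArgumentParser(
        prog="gfdbrm", description="remove files along with their associations")
    parser.add_argument("--db", default=DEFAULT_DB)
    parser.add_argument("files", nargs="+")
    args = parser.parse_args(argv)

    db = load_db(args.db)
    status = 0
    for path in args.files:
        try:
            digest = content_hash(path)  # hash *before* the content is gone
            os.remove(path)
        except OSError as err:
            print(f"gfdbrm: {path}: {err.strerror}", file=sys.stderr)
            status = 1
            continue
        db["by_hash"].pop(digest, None)  # keep the database in step with the disk
    save_db(db, args.db)
    return status


if __name__ == "__main__":
    sys.exit(gfdbrm())
```

Because it is an ordinary command, a script can use it anywhere it would have used rm.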

Resynchronisation of the database is a nasty question, unfortunately -
if a file has been deleted and its meta-info is still there, we need
to be able to clean that out at some stage.  The only way I can see
is to run a cron job periodically that scans every drive on the system
for each entry in each user's database - messy, ugly,
bad :(  Alternatively, we provide a really nice GUI for 'association management'.
I think that's almost a requirement of any of these systems: if people
want to tinker with their associations, we should give them a really
neat tool to do it with.  Luckily, if we make the library and a bunch
of wrapper programs, we can even tinker with it from the command line,
which I definitely like :)
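The cron-job cleanup for hash-keyed entries could be sketched roughly like this: walk the given directory trees, hash every file, and keep only the digests that still turn up somewhere. It shows why the job is expensive, since a hash entry can only be declared stale after every file has been read. The function names are hypothetical, and the early exit once all digests are found is an added optimisation, not something the post proposes.

```python
import hashlib
import os


def file_digest(path):
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def prune_stale(by_hash, roots):
    """Return a copy of by_hash without digests no longer found under roots."""
    wanted = set(by_hash)
    seen = set()
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    digest = file_digest(path)
                except OSError:
                    continue  # unreadable, or vanished mid-scan: skip it
                if digest in wanted:
                    seen.add(digest)
            if seen == wanted:
                return dict(by_hash)  # every entry accounted for: stop early
    return {d: app for d, app in by_hash.items() if d in seen}
```

Filename-regexp entries don't need this pass: a pattern that matches nothing any more is harmless, and is better cleaned up by hand in the association-management tool.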

So, does any of the above make sense?  I guess my big suggestion is
that we separate the actual databasing from the interface to it
a little, so that we can define the interface and let gnome programs
start using it while we thrash out the underlying
question of 'which databasing system is superior'...
Hell, then we could even swap and change our databasing system to suit
ourselves, if it's done properly...

KevinL


