Re: Making metadata storage SQL-driven

Christian Neumair wrote:
I really wonder why we rely on files for our metadata. It has issues [1]
where a synchronous fopen and read or write operation would take too
long, requiring us to schedule reads/writes. Without having done any
performance measurements, I think this could significantly speed up our
metadata code for big directories. I think we should do the same as
Beagle and rely on a tiny SQL server, doing all our metadata operations
synchronously. I'm not a big fan of EAs, since there seem to be different
flavors of them and inconsistencies among the various implementations.


You are correct, and that's why I'm building such a system for GConf/a common config system, a general metadata server, schematic storage (like WinFS), indexing and anything else that needs fast structured storage.

My system is called DDS (Data Desktop Server) and is simply a D-Bus wrapper around the embedded MySQL DB. I have already analysed all the other embedded SQL systems, like SQLite, and have found them inadequate (SQLite is not guaranteed thread-safe on Unix, and it also relies on potentially broken file system locking to prevent corruption, which is too flaky, especially on NFS-mounted home directories).
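To make the idea concrete, here is a minimal sketch of the kind of (uri, key) → value metadata store a service like DDS could expose over D-Bus. The class name, schema and method names are hypothetical, and Python's sqlite3 module stands in here purely for illustration — DDS itself wraps the embedded MySQL library, not SQLite:

```python
import sqlite3

class MetadataStore:
    """Illustrative (uri, key) -> value store; not the actual DDS schema."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS metadata ("
            " uri TEXT NOT NULL,"
            " key TEXT NOT NULL,"
            " value TEXT,"
            " PRIMARY KEY (uri, key))"  # composite key -> btree-indexed lookups
        )

    def set(self, uri, key, value):
        # upsert: replace any existing value for this (uri, key) pair
        self.db.execute(
            "INSERT OR REPLACE INTO metadata (uri, key, value) VALUES (?, ?, ?)",
            (uri, key, value))
        self.db.commit()

    def get(self, uri, key):
        row = self.db.execute(
            "SELECT value FROM metadata WHERE uri = ? AND key = ?",
            (uri, key)).fetchone()
        return row[0] if row else None

# the sort of per-directory metadata nautilus currently keeps in dot files
store = MetadataStore()
store.set("file:///home/user/Documents", "icon-position", "120,45")
```

A synchronous get/set like this is the point of the exercise: with indexed lookups the call returns fast enough that callers no longer need to schedule asynchronous reads/writes.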

I chose the embedded MySQL lib because it is a robust, high-performance, thread-safe (fully optimised for threads, so hyperthreading/multiple cores can utilise it too) and lightweight option. Embedded MySQL is a single library, and the DB file can reside in the user's home directory. It is much smaller than the full MySQL daemon and needs only 1MB of RAM and 4MB of disk space.

I've only just started on it recently, but it should be more than adequate for all of Nautilus's metadata needs (including replacing all those dot files in Nautilus). In combination with my own indexer/metadata crawler it should provide a full file metadata solution. As we can also store blobs in a DB, it will also store thumbnails for files as they change (i.e. as notified by inotify/FAM).
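The thumbnail idea above can be sketched as a blob table keyed by URI plus mtime, so that when inotify/FAM reports a change, the stale thumbnail simply stops matching. The table layout and function names are my own invention for illustration, and sqlite3 again stands in for the embedded MySQL DB:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE thumbnails ("
    " uri TEXT PRIMARY KEY,"
    " mtime INTEGER NOT NULL,"
    " image BLOB NOT NULL)")

def store_thumbnail(uri, mtime, png_bytes):
    # replace any previous thumbnail for this uri
    db.execute("INSERT OR REPLACE INTO thumbnails VALUES (?, ?, ?)",
               (uri, mtime, sqlite3.Binary(png_bytes)))
    db.commit()

def get_thumbnail(uri, mtime):
    # only return a thumbnail matching the file's current mtime;
    # a changed file yields None, signalling a re-thumbnail is needed
    row = db.execute(
        "SELECT image FROM thumbnails WHERE uri = ? AND mtime = ?",
        (uri, mtime)).fetchone()
    return bytes(row[0]) if row else None

store_thumbnail("file:///tmp/photo.jpg", 1000, b"\x89PNG-data")
```

Keeping thumbnails in the same single-file DB as the rest of the metadata also means they benefit from the same cache locality as the key/value rows.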

Performance-wise, a btree-indexed database will considerably outperform text files, especially if the database stores all its data in a single file (you then get the benefit of cache locality, which can give you a 90%+ cache hit ratio, and burst reads, which result in minimal disk seeks). Looking up a key in a btree DB is O(log N), whilst for a text file it is O(N). To put plain numbers on it: for a million-record DB, the time taken to get a key at random is 114 operations worst case, whilst for a text file it is 1 million operations worst case (now you all know why the current gconf sucks).
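The asymptotic claim is easy to demonstrate. The sketch below (my own illustration, not DDS code) compares a binary search over sorted keys — the in-memory analogue of a btree index, and an upper bound on it, since a real btree's larger fan-out needs even fewer page reads — against the linear scan a flat text file forces. For N = 1,000,000, log2(N) is roughly 20, versus up to a million comparisons for the scan:

```python
from bisect import bisect_left
from math import log2

N = 1_000_000
keys = list(range(N))  # kept sorted, as a btree keeps its keys

def btree_like_lookup(sorted_keys, target):
    """Binary search: at most about log2(N) comparisons -- O(log N)."""
    i = bisect_left(sorted_keys, target)
    if i < len(sorted_keys) and sorted_keys[i] == target:
        return i
    return None

def flat_file_lookup(unsorted_keys, target):
    """Linear scan, like grepping a text file: up to N comparisons -- O(N)."""
    for i, k in enumerate(unsorted_keys):
        if k == target:
            return i
    return None

comparisons_btree = log2(N)  # roughly 20 for a million records
```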

(btw, Beagle does not use SQLite for indexing - it is only used for recording whether a file has been indexed, since Beagle cannot write EAs on files it lacks write permission for. Beagle uses a proprietary flat-file (i.e. one big fat table) structure for indexing, which causes a lot of bloat and severely limits its ability to store relational data efficiently, such as RDF or contextual data that exhibits multiple many-to-one relationships.)

Mr Jamie McCracken
