Re: Making metadata storage SQL-driven



Christian Neumair wrote:
I really wonder why we rely on files for our metadata. It has issues [1]
where a synchronous fopen and read or write operation would really take
too long, requiring us to schedule reads/writes. Without doing any
performance measures, I think this could significantly speed up our
metadata code for big directories. I think we should do the same as
beagle, and rely on a tiny SQL server, doing all our metadata operations
synchronously. I'm not a big fan of EAs, since they seem to be some
different flavors and inconsistencies among various implementations.

[1] http://makeashorterlink.com/?Y364247BB



You are correct, and that's why I'm building such a system to serve as a GConf/common config system, general metadata server, schematic storage (like WinFS), indexing and anything else that needs fast structured storage.

My system is called DDS (data desktop server) and is simply a D-Bus wrapper around the embedded MySQL DB. I have already analysed the other embedded SQL systems, such as SQLite, and found them inadequate (SQLite is not guaranteed thread-safe on Unix, and it also relies on potentially broken file system locking to prevent corruption, which is too flaky, especially on NFS-mounted home directories).

I chose the embedded MySQL lib because it's a robust, high-performance, thread-safe (fully optimised for threads, so hyperthreading/multiple cores can utilise it too) and lightweight option. The embedded MySQL is a single library, and the DB file can reside in the user's home directory. It is much smaller than the full MySQL daemon and needs only 1MB of RAM and 4MB of disk space.

I've only just started on it, but it should be more than adequate for all of nautilus's metadata needs (including replacing all those dot files in nautilus). In combination with my own indexer/metadata crawler it should provide a full file metadata solution. As we can also store blobs in a DB, it will also store thumbnails for files as they change (i.e. as notified by inotify/FAM).
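To make the idea of per-file metadata plus thumbnail blobs in one database concrete, here is a minimal sketch. It is not DDS code: the table names, column names and helper functions are all hypothetical, and Python's built-in sqlite3 module is used purely as a convenient stand-in for the embedded MySQL library, since it needs no setup.

```python
import sqlite3

# In-memory database as a stand-in; DDS as described would use embedded MySQL.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per (file URI, metadata name) pair.
conn.execute(
    """CREATE TABLE file_metadata (
           uri   TEXT NOT NULL,
           name  TEXT NOT NULL,
           value TEXT,
           PRIMARY KEY (uri, name)
       )"""
)
# Hypothetical schema: one thumbnail blob per file URI.
conn.execute("CREATE TABLE thumbnails (uri TEXT PRIMARY KEY, image BLOB)")


def set_metadata(uri, name, value):
    """Insert or overwrite a single metadata value for a file."""
    conn.execute(
        "INSERT OR REPLACE INTO file_metadata (uri, name, value) VALUES (?, ?, ?)",
        (uri, name, value),
    )


def get_metadata(uri, name):
    """Return the stored value, or None if nothing is recorded."""
    row = conn.execute(
        "SELECT value FROM file_metadata WHERE uri = ? AND name = ?",
        (uri, name),
    ).fetchone()
    return row[0] if row else None


def store_thumbnail(uri, image_bytes):
    """Insert or overwrite the thumbnail blob for a file."""
    conn.execute(
        "INSERT OR REPLACE INTO thumbnails (uri, image) VALUES (?, ?)",
        (uri, image_bytes),
    )


def get_thumbnail(uri):
    """Return the thumbnail bytes, or None if no thumbnail is stored."""
    row = conn.execute(
        "SELECT image FROM thumbnails WHERE uri = ?", (uri,)
    ).fetchone()
    return row[0] if row else None
```

The composite primary key on (uri, name) gives every lookup an index to use, which is the property the file-based approach lacks.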

Performance-wise, a B-tree indexed database will considerably outperform text files, especially if the database stores all data in a single file (you then get the benefit of cache locality, which can give you a 90%+ cache hit rate, and burst reads, which result in minimal disk seeks). Accessing a key in a B-tree DB is O(log N), whilst for a text file it's O(N). To put plain numbers on it: for a million-record DB, getting a key at random takes 114 operations worst case, whilst for a text file it's 1 million operations worst case (now you all know why the current GConf sucks).
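The gap between O(log N) and O(N) can be illustrated with a small sketch that counts comparisons. It uses a plain binary search over sorted keys as a simplified stand-in for a B-tree index, so the function names are hypothetical and the step counts are not the same as real B-tree operation counts (those depend on node size and implementation, and this does not try to reproduce the 114 figure above).

```python
def linear_scan_steps(keys, target):
    """Count comparisons for a front-to-back scan, as with a flat text file."""
    steps = 0
    for key in keys:
        steps += 1
        if key == target:
            break
    return steps


def binary_search_steps(sorted_keys, target):
    """Count probes for a binary search over sorted keys (index stand-in)."""
    lo, hi = 0, len(sorted_keys) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] == target:
            break
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


N = 1_000_000
keys = list(range(N))  # already sorted

# Looking up the last key is the worst case for the linear scan.
print("linear scan:  ", linear_scan_steps(keys, N - 1), "comparisons")
print("binary search:", binary_search_steps(keys, N - 1), "probes")
```

For a million keys the scan needs a million comparisons to reach the last key, while the binary search needs on the order of log2(1,000,000), roughly 20 probes.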


(BTW, beagle does not use SQLite for indexing. It's only used for recording whether a file has been indexed, as beagle cannot write EAs on files that it doesn't have write permission for. Beagle uses a proprietary flat file structure (i.e. one big fat table) for indexing, which causes a lot of bloat and severely limits its ability to efficiently store relational data like RDF or contextual data that exhibits multiple many-to-one relationships.)

--
Mr Jamie McCracken
http://www.advogato.org/person/jamiemcc/


