Re: Making metadata storage SQL-driven
- From: Jamie McCracken <jamiemcc blueyonder co uk>
- To: Christian Neumair <chris gnome-de org>
- Cc: nautilus-list gnome org
- Subject: Re: Making metadata storage SQL-driven
- Date: Thu, 01 Sep 2005 12:10:15 +0100
Christian Neumair wrote:
I really wonder why we rely on files for our metadata. It has issues [1]
where a synchronous fopen and read or write operation would really take
too long, requiring us to schedule reads/writes. Without doing any
performance measures, I think this could significantly speed up our
metadata code for big directories. I think we should do the same as
beagle, and rely on a tiny SQL server, doing all our metadata operations
synchronously. I'm not a big fan of EAs, since there seem to be several
different flavors and inconsistencies among the various implementations.
[1] http://makeashorterlink.com/?Y364247BB
You are correct, and that's why I'm building such a system for a GConf/common
config system, a general metadata server, schematic storage (like WinFS),
indexing and anything else that needs fast structured storage.
My system is called DDS (data desktop server) and is simply a D-Bus
wrapper around the embedded MySQL DB. I have already analysed the
other embedded SQL systems such as SQLite and found them inadequate
(SQLite is not guaranteed threadsafe on Unix and also relies on
potentially broken file system locking to prevent corruption, which is
too flaky, especially on NFS-mounted home directories).
I chose the embedded MySQL library because it's a robust, high-performance,
threadsafe (fully optimised for threads, so hyperthreading/multiple cores
can utilise it too) and lightweight option. The embedded MySQL is a
single library and the DB file can reside in the user's home directory.
It is much smaller than the full MySQL daemon and needs only 1MB of RAM and
4MB of disk space.
I've only just started on it, but it should be more than adequate
for all of nautilus's metadata needs (including replacing all those dot
files in nautilus). In combination with my own indexer/metadata crawler
it should provide a full file metadata solution. As we can also store
blobs in a DB, it will also store thumbnails for files as they change (i.e.
as notified by inotify/fam).
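
To make the idea of keeping per-file metadata and thumbnail blobs in one database concrete, here is a minimal sketch. It is not DDS code: the table names, column names and function names are all hypothetical, and it uses Python's bundled sqlite3 module purely because it ships with the language, whereas the design described above is built on the embedded MySQL library behind a D-Bus interface.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store with hypothetical metadata and thumbnail tables."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_metadata ("
        " uri TEXT NOT NULL, name TEXT NOT NULL, value TEXT,"
        " PRIMARY KEY (uri, name))"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS thumbnails ("
        " uri TEXT PRIMARY KEY, mtime INTEGER NOT NULL, image BLOB NOT NULL)"
    )
    return conn


def set_metadata(conn, uri, name, value):
    """Insert or overwrite one metadata attribute for a file."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO file_metadata (uri, name, value)"
            " VALUES (?, ?, ?)",
            (uri, name, value),
        )


def get_metadata(conn, uri, name):
    """Return the stored attribute value, or None if it is not set."""
    row = conn.execute(
        "SELECT value FROM file_metadata WHERE uri = ? AND name = ?",
        (uri, name),
    ).fetchone()
    return row[0] if row else None


def store_thumbnail(conn, uri, mtime, image_bytes):
    """Store the thumbnail bytes together with the file's modification time."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO thumbnails (uri, mtime, image)"
            " VALUES (?, ?, ?)",
            (uri, mtime, image_bytes),
        )


def get_thumbnail(conn, uri, mtime):
    """Return the cached thumbnail only if it matches the given mtime."""
    row = conn.execute(
        "SELECT image FROM thumbnails WHERE uri = ? AND mtime = ?",
        (uri, mtime),
    ).fetchone()
    return bytes(row[0]) if row else None
```

A file-change notification handler would call store_thumbnail() with the new modification time, so a lookup with a stale mtime simply misses and triggers regeneration.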
Performance-wise, a btree-indexed database will considerably outperform
text files, especially if the database stores all data in a single file
(you then get the benefit of cache locality, which can give you a 90%+
cache hit rate, and burst reads, which result in minimal disk
seeks). To access a key in a btree DB, performance is O(log N), whilst
for a text file it's O(N). To put plain numbers on it, for a million-record
DB the time taken to get a key at random is 114 operations worst
case, whilst for a text file it's 1 million operations worst case (now you
all know why the current gconf sucks).
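
The O(log N) versus O(N) difference can be illustrated with a small sketch. It is only an approximation of the argument: binary search over a sorted in-memory list stands in for a btree index, and a sequential scan stands in for parsing a text file. The function names are hypothetical, and the exact operation count for a real btree depends on its node width and on what is counted as an operation.

```python
def linear_lookup(records, key):
    """Scan (key, value) pairs in order, like reading a flat text file.

    Returns (value or None, number of key comparisons made).
    """
    comparisons = 0
    for k, v in records:
        comparisons += 1
        if k == key:
            return v, comparisons
    return None, comparisons


def indexed_lookup(keys, values, key):
    """Binary search over sorted keys, a stand-in for a btree descent.

    Returns (value or None, number of probes made).
    """
    lo, hi = 0, len(keys)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if keys[mid] < key:
            lo = mid + 1  # key, if present, lies in the upper half
        else:
            hi = mid      # key, if present, lies at mid or below
    if lo < len(keys) and keys[lo] == key:
        return values[lo], probes
    return None, probes


if __name__ == "__main__":
    n = 1_000_000
    keys = list(range(n))
    values = keys  # values are irrelevant for counting steps
    _, scanned = linear_lookup(zip(keys, values), n - 1)
    _, probed = indexed_lookup(keys, values, n - 1)
    print("linear comparisons:", scanned)
    print("indexed probes:", probed)
```

The scan touches every record when the key is last or missing, while the indexed lookup halves the search range on each probe, so its cost grows with the logarithm of the record count.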
(BTW, beagle does not use SQLite for indexing; it's only used for recording
whether a file has been indexed, as beagle cannot write EAs on files
that don't have write permissions. Beagle uses a proprietary flat-file (i.e.
one big fat table) structure for indexing, which causes a lot of bloat
and severely limits its ability to store relational data efficiently,
such as RDF or contextual data which exhibits multiple many-to-one
relationships.)
--
Mr Jamie McCracken
http://www.advogato.org/person/jamiemcc/