Re: question on performance evaluation of beagle's indexing



Hi,

> I was wondering how you currently evaluate beagle's
> index performance, and decide on tradeoffs between
> different indexing options? Do you have some sort of
> in-house case base of test file systems?

Normally beagled (the indexer component of beagle) throttles itself so
that it does not consume CPU continuously. You can disable this
internal scheduling by setting the environment variable
BEAGLE_EXERCISE_THE_DOG. In that case you might also want to pass
"--indexing-delay 0" so that indexing starts right away; there is
usually a 60-second delay before it begins (EXERCISE_THE_DOG might
already set this option automatically, I am not sure).
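
If you want to script such a run, the two settings above could be put
together roughly like this. It is only a sketch: the helper names are
made up for illustration, the value "1" is an assumption (all I said
above is that the variable has to be set), and nothing here actually
launches beagled.

```python
import os


def exercise_env(base_env=None):
    """Return a copy of the environment with beagled's throttling disabled."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: any value works; only the presence of the variable
    # is described in the text above.
    env["BEAGLE_EXERCISE_THE_DOG"] = "1"
    return env


def beagled_argv(indexing_delay=0):
    """Build a beagled command line that starts indexing after the given delay."""
    return ["beagled", "--indexing-delay", str(indexing_delay)]
```

The result can then be handed to subprocess.run(beagled_argv(),
env=exercise_env()) on a machine where beagled is installed.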

Beagle stores its per-file bookkeeping information in the extended
attributes of the files or, if that fails, in an sqlite database.
Folklore has it that using extended attributes is significantly
faster than the sqlite database (disk access vs sqlite access). But
depending on disk I/O speed and other system load, sqlite access could
sometimes be faster IMO. For simplicity we ignore the tradeoff and
always recommend extended attributes over sqlite. You can force
sqlite by setting BEAGLE_DISABLE_XATTR.
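
To see which of the two paths a given test filesystem would end up on,
you can probe whether it accepts user extended attributes at all. The
sketch below is not what beagle does internally; the attribute name
"user.xattr_probe" is hypothetical and the check relies on Python's
Linux-only os.setxattr.

```python
import os
import tempfile


def supports_user_xattr(directory):
    """Return True if a file in `directory` accepts a user.* extended attribute."""
    if not hasattr(os, "setxattr"):
        # os.setxattr only exists on Linux builds of Python.
        return False
    with tempfile.NamedTemporaryFile(dir=directory) as probe:
        try:
            # Hypothetical attribute name, used only for this probe.
            os.setxattr(probe.name, "user.xattr_probe", b"1")
            os.removexattr(probe.name, "user.xattr_probe")
            return True
        except OSError:
            # E.g. the filesystem is mounted without user_xattr support.
            return False
```

If this returns False for the directory being indexed, the run is
presumably exercising the sqlite fallback anyway.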

Since beagled is a long-running desktop process, not everything is
tuned for maximum speed, though; we try to strike a balance between
speed and system resources.

Beagle uses XML messages over a Unix socket for IPC. Setting
MONO_XMLSERIALIZER_THS=0 would give you a faster IPC (you need to have
gmcs installed for this to work).

For effectively read-only filesystems (e.g. a system documentation
directory or a backup directory), instead of using beagled and its
live filesystem backend (called "Files"), you can build a read-only
index using beagle-build-index. The read-only index can be added to
beagled for querying only; changes will not be monitored, so you have
to rerun beagle-build-index to pick up recent changes.

You can control which backends beagled starts. Backends are the
different data sources, e.g. "Files" for the live filesystem, "Opera"
for Opera browsing history, "KMail" for KMail emails etc. You can
either use the "--backend" option to beagled or disable certain
backends permanently using beagle-settings. Disabling unused backends
will improve performance, though not by much. Pass "--backend none" to
not start any backend; e.g. "beagled --add-static-backend
/path/to/static/backend --backend none" will only query the read-only
static index.
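
When comparing several backend combinations it can help to generate
these command lines instead of typing them. A minimal sketch follows;
the function name is invented, and passing one "--backend" per backend
name is an assumption on my part, so check it against beagled's help
output before relying on it.

```python
def beagled_query_argv(static_indexes=(), backends=None):
    """Build a beagled command line from static index paths and backend names.

    backends=None leaves beagled's default backends alone;
    an empty list asks for no backend at all ("--backend none").
    """
    argv = ["beagled"]
    for path in static_indexes:
        argv += ["--add-static-backend", path]
    if backends is not None:
        if not backends:
            argv += ["--backend", "none"]
        else:
            # Assumption: the option may be repeated, once per backend.
            for name in backends:
                argv += ["--backend", name]
    return argv
```

With static_indexes=["/path/to/static/backend"] and backends=[] this
yields the same command as the example above.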

Off the top of my head, I can't remember any other option that
affects indexing performance.

For testing, we have an in-house stress tester and file system
correctness checker (trunk/testsuite/bludgeon/). It creates different
directory structures, fills them with random files, and then indexes
and queries them, looking for inconsistencies.
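
If you want a quick test tree of your own without checking out trunk,
the first half of that idea (random directories filled with random
files) can be approximated as below. This is not bludgeon's code, just
a rough sketch; the word list, file names and size limits are all
arbitrary choices.

```python
import os
import random
import tempfile

WORDS = ["beagle", "index", "query", "desktop", "search", "file"]


def make_random_tree(root, rng, depth=2, max_dirs=3, max_files=4):
    """Fill `root` with random text files and subdirectories; return file paths."""
    created = []
    for i in range(rng.randint(1, max_files)):
        path = os.path.join(root, "file%d.txt" % i)
        with open(path, "w") as handle:
            handle.write(" ".join(rng.choice(WORDS) for _ in range(20)))
        created.append(path)
    if depth > 0:
        for i in range(rng.randint(1, max_dirs)):
            sub = os.path.join(root, "dir%d" % i)
            os.mkdir(sub)
            created += make_random_tree(sub, rng, depth - 1, max_dirs, max_files)
    return created


if __name__ == "__main__":
    target = tempfile.mkdtemp(prefix="beagle-test-")
    files = make_random_tree(target, random.Random(0))
    print("created %d files under %s" % (len(files), target))
```

Seeding the random generator keeps the tree reproducible, so two
indexing runs with different settings see the same input.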

I hope that answered your question at least partly. If there is
anything more you are looking for, please let me know.

- dBera

-- 
-----------------------------------------------------
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user

