Re: [Tracker] Things that currently sux with tracker



Hi Jamie,

On 10/12/06, Jamie McCracken <jamiemcc blueyonder co uk> wrote:
> I've noticed when indexing *large* amounts of data that a lot of disk
> thrashing is taking place which is greatly slowing down performance of
> both tracker and the system in general.

> Also the nice +10 is not throttling enough (I don't have ionice in my
> kernel so I don't know how good a job that does), so I will probably add
> some sleeping intervals to smooth things out and keep cpu usage low
> (with a --turbo command line option to disable this for those that want
> faster indexing).

> The cause of the slow down is heavy fragmentation of the file based hash
> table.
>
> Having indexed 30GB of stuff, the optimization routine shrank the full
> text index from nearly 300MB to 20MB, which means a massive 280MB of
> fragmentation had occurred - this is obscene!

Yep, with the new tracker I reindexed all my files - the index was
about 630MB before optimization, and 31MB (!) after... I observed up
to 80% iowait. It is even worse when indexing more files than that;
my machine stands still every now and then!

> I note other indexers do not update the hash table directly but cache
> the data in memory and then bulk upload it to reduce fragmentation and
> lessen the performance hit. The disadvantage of this is that searches for
> newly indexed content won't appear until the cache is uploaded to the
> hash table. (We could upload every 10-15 mins or something - infrequent
> words should be updated more quickly though.)

Even 5 minutes should be fine I think.
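A minimal sketch of the cache-then-bulk-upload scheme described above (the class, names, and flush interval are illustrative, not Tracker's actual code): postings for newly indexed files accumulate in memory and are merged into the on-disk index in one sorted pass, instead of updating the hash table once per word occurrence.

```python
import time
from collections import defaultdict

# Illustrative flush interval - e.g. the 5 minutes suggested above.
FLUSH_INTERVAL = 5 * 60

class IndexCache:
    """Hypothetical write-back cache for an inverted index."""

    def __init__(self):
        self.postings = defaultdict(set)   # word -> set of file ids
        self.last_flush = time.monotonic()

    def add(self, word, file_id):
        # Cheap in-memory update; nothing touches the disk index yet.
        self.postings[word].add(file_id)

    def should_flush(self):
        return time.monotonic() - self.last_flush >= FLUSH_INTERVAL

    def flush(self, on_disk_index):
        # One sorted bulk pass keeps writes mostly sequential, avoiding
        # the scattered single-word updates that fragment the index.
        for word in sorted(self.postings):
            on_disk_index.setdefault(word, set()).update(self.postings[word])
        self.postings.clear()
        self.last_flush = time.monotonic()
```

The trade-off is exactly the one noted above: a word indexed just after a flush is invisible to searches for up to one full interval.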

> As we are memory conservative, I am planning to do something similar
> but using SQLite (instead of precious memory) to cache new files and
> then bulk upload. We could easily cache the data for many thousands of
> files before uploading them.

> We can actually do better than others here because, firstly, we are not
> using any more RAM and can therefore have much bigger caches, and secondly,
> unlike other indexers which upload all at once (which often causes a cpu
> spike), we can do it incrementally from SQLite.

> And no, SQLite will not fragment, as it's B-tree based and not a hash table
> (B-trees are much faster to update than hashes), and we will use a
> separate db file which can be deleted when finished.
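To illustrate the idea of staging new index entries in a separate, throwaway SQLite database and then uploading them to the main index in small batches (the schema, table name, and batch size here are hypothetical, not Tracker's actual design):

```python
import sqlite3

# A separate, disposable db file in practice; in-memory here for the demo.
staging = sqlite3.connect(":memory:")
staging.execute("CREATE TABLE pending (word TEXT, file_id INTEGER)")

def stage(conn, word, file_id):
    # Cheap B-tree insert into the staging db instead of a direct
    # (and fragmenting) update of the main index.
    conn.execute("INSERT INTO pending (word, file_id) VALUES (?, ?)",
                 (word, file_id))

def upload_batch(conn, main_index, batch_size=1000):
    """Move up to batch_size pending rows into the main index.

    Small incremental batches avoid the cpu spike of a single
    upload-everything pass."""
    rows = conn.execute(
        "SELECT rowid, word, file_id FROM pending ORDER BY word LIMIT ?",
        (batch_size,)).fetchall()
    for rowid, word, file_id in rows:
        main_index.setdefault(word, set()).add(file_id)
        conn.execute("DELETE FROM pending WHERE rowid = ?", (rowid,))
    conn.commit()
    return len(rows)
```

Reading the batch in word order also means the main-index updates arrive sorted, which keeps the bulk merge mostly sequential.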

> Will be experimenting on this tonight. There will be a few race
> conditions to handle with this, but it's nothing too complex.
>
> I am determined to get tracker running as smooth as a baby's bottom!

Yes, please :)



Best regards and good luck, Marcus


