Re: [Tracker] Things that currently sux with tracker
- From: Anders Aagaard <aagaande gmail com>
- To: tracker-list gnome org
- Subject: Re: [Tracker] Things that currently sux with tracker
- Date: Thu, 12 Oct 2006 18:03:16 +0000
Jamie McCracken wrote:
I've noticed when indexing *large* amounts of data that a lot of disk
thrashing is taking place which is greatly slowing down performance of
both tracker and the system in general.
Also, the nice +10 is not throttling enough (I don't have ionice in my kernel so I don't know how good a job that does), so I will probably add some sleeping intervals to smooth things out and keep CPU usage low (with a --turbo command line option to disable this for those who want faster indexing).
The cause of the slowdown is heavy fragmentation of the file-based hash table.
Having indexed 30 GB of stuff, the optimization routine shrank the full-text index from nearly 300 MB to 20 MB, which means a massive 280 MB of fragmentation had occurred - this is obscene!
I note other indexers do not update the hash table directly but cache the data in memory and then bulk upload it to reduce fragmentation and lessen the performance hit. The disadvantage of this is that newly indexed content won't show up in searches until the cache is uploaded to the hash table. (We could upload every 10-15 mins or something - infrequent words should be updated more quickly though.)
Why not hold back updates, but force flush to disk if a search is called?
As we are memory conservative, I am planning to do something similar, but using sqlite (instead of precious memory) to cache new files and then bulk upload. We could easily cache the data for many thousands of files before uploading them.
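A minimal sketch of that sqlite-backed cache idea (the `pending` table, column names, and `upload_fn` are all hypothetical stand-ins, not Tracker's actual schema or API):

```python
import sqlite3

# Assumption: newly indexed word hits go into a throwaway sqlite db
# instead of being written straight into the on-disk hash table.
cache = sqlite3.connect("index-cache.db")
cache.execute(
    "CREATE TABLE IF NOT EXISTS pending (word TEXT, file_id INTEGER, score INTEGER)"
)

def cache_hit(word, file_id, score):
    # Cheap btree insert; no hash-table fragmentation while indexing.
    cache.execute("INSERT INTO pending VALUES (?, ?, ?)", (word, file_id, score))

def flush_to_index(upload_fn, batch=1000):
    """Bulk-upload the cached hits to the real index, then clear the cache."""
    rows = cache.execute(
        "SELECT word, file_id, score FROM pending ORDER BY rowid"
    ).fetchall()
    for i in range(0, len(rows), batch):
        # One large sequential write per batch instead of many
        # scattered hash-table updates.
        upload_fn(rows[i:i + batch])
    cache.execute("DELETE FROM pending")
    cache.commit()
```

Since the db file only ever holds not-yet-uploaded data, it can simply be deleted after a flush (or after a crash) with nothing lost from the main index.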
If I remember correctly, sqlite3 has some built-in cache stuff; you might want to tweak the standard values a bit.
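For reference, sqlite's page cache is tunable per-connection via pragmas; the values below are illustrative, not recommendations:

```python
import sqlite3

db = sqlite3.connect("index-cache.db")

# cache_size is measured in database pages; the default is fairly
# modest, so a bigger cache cuts disk round-trips during heavy inserts.
db.execute("PRAGMA cache_size = 10000")

# Relaxing synchronous writes is arguably acceptable for a throwaway
# cache db that can simply be deleted and rebuilt after a crash.
db.execute("PRAGMA synchronous = OFF")
```

`PRAGMA cache_size` with no argument reads the current value back, which makes it easy to confirm the setting took effect.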
We can actually do better than others here because, firstly, we are not using any more RAM and can therefore have much bigger caches, and secondly, unlike other indexers which upload all at once (which often causes a CPU spike), we can do it incrementally in sqlite.
And no, sqlite will not fragment, as it's btree-based and not a hash table (btrees are much faster to update than hashes), and we will use a separate db file which can be deleted when finished.
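That incremental upload could look something like the sketch below - the chunk size and sleep interval are made up, and `apply_chunk` stands in for the real hash-table write:

```python
import sqlite3
import time

def incremental_upload(cache, apply_chunk, chunk_size=500, pause=0.1):
    """Drain the cache db in small chunks, sleeping between them,
    so the index update never causes one big CPU/IO spike."""
    while True:
        rows = cache.execute(
            "SELECT rowid, word, file_id FROM pending ORDER BY rowid LIMIT ?",
            (chunk_size,),
        ).fetchall()
        if not rows:
            break
        apply_chunk([(word, file_id) for _, word, file_id in rows])
        # Remove only the rows just uploaded, so a crash mid-drain
        # loses nothing from the cache.
        cache.executemany(
            "DELETE FROM pending WHERE rowid = ?", [(r[0],) for r in rows]
        )
        cache.commit()
        time.sleep(pause)  # throttle: spread the work out over time
```

The `pause` knob is where a `--turbo` style option could plug in: set it to zero for fast indexing, or raise it to keep the machine responsive.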
Will be experimenting with this tonight. There will be a few race conditions to handle, but it's nothing too complex.
Looking forward to it :)
I am determined to get tracker running as smooth as a baby's bottom!