Re: [Tracker] Things that currently sux with tracker



Jamie McCracken wrote:
I've noticed when indexing *large* amounts of data that a lot of disk thrashing is taking place which is greatly slowing down performance of both tracker and the system in general.

Also, the nice +10 is not throttling enough (I don't have ionice in my kernel, so I don't know how good a job that does), so I will probably add some sleeping intervals to smooth things out and keep CPU usage low (with a --turbo command line option to disable this for those who want faster indexing).
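The sleep-interval idea could be sketched roughly like this - a minimal illustration, not Tracker's actual code (the function name, batch size, and pause length are all made up):

```python
import time

def index_files(files, index_one, turbo=False, batch_size=10, pause=0.5):
    """Index files in small batches, sleeping between batches to keep
    CPU usage low; a --turbo option would set turbo=True and skip sleeps."""
    indexed = 0
    for i, path in enumerate(files, start=1):
        index_one(path)          # do the actual indexing work
        indexed += 1
        if not turbo and i % batch_size == 0:
            time.sleep(pause)    # yield the CPU between batches
    return indexed
```

The trade-off is exactly as described: the pauses cap average CPU usage at the cost of wall-clock indexing time, which is why a turbo switch to disable them makes sense.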

The cause of the slow down is heavy fragmentation of the file based hash table.

Having indexed 30GB of stuff, the optimization routine shrank the full text index from nearly 300MB to 20MB - which means a massive 280MB of fragmentation had occurred. This is obscene!

I note other indexers do not update the hash table directly but cache the data in memory and then bulk upload it, to reduce fragmentation and lessen the performance hit. The disadvantage of this is that searches for newly indexed content won't return results until the cache is uploaded to the hash table. (We could upload every 10-15 mins or something - infrequent words should be updated more quickly though.)

Why not hold back updates, but force flush to disk if a search is called?
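That force-flush-on-search idea can be sketched as follows - a toy model with hypothetical names, where plain dicts stand in for the in-memory cache and the on-disk hash table:

```python
class Index:
    def __init__(self):
        self.main = {}      # stands in for the on-disk hash table
        self.pending = {}   # cached updates not yet written out

    def add(self, word, file_id):
        # updates go to the cache, not straight to the main index
        self.pending.setdefault(word, set()).add(file_id)

    def flush(self):
        # bulk-merge all pending entries into the main index at once
        for word, ids in self.pending.items():
            self.main.setdefault(word, set()).update(ids)
        self.pending.clear()

    def search(self, word):
        if self.pending:
            self.flush()    # force a flush so new content is searchable
        return sorted(self.main.get(word, set()))
```

This keeps searches correct while still batching writes; the cost is that the first search after a burst of indexing pays for the flush.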


As we are memory conservative, I am planning to do something similar, but using sqlite (instead of precious memory) to cache new files and then bulk upload. We could easily cache the data for many thousands of files before uploading them.

If I remember correctly, sqlite3 has some built-in cache settings; you might want to tweak the standard values a bit.
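For reference, SQLite's page cache is tuned with `PRAGMA cache_size`, where a negative value means KiB rather than pages - shown here via Python's stdlib sqlite3 module just for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The default cache is fairly small; ask for roughly 8 MiB of page cache.
conn.execute("PRAGMA cache_size = -8192")
size = conn.execute("PRAGMA cache_size").fetchone()[0]
```

A bigger page cache lets SQLite batch more of its B-tree updates in memory before touching disk, which fits the bulk-upload plan well.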


We can actually do better than others here: firstly, we are not using any more RAM, so we can have much bigger caches; and secondly, unlike other indexers which upload all at once (which often causes a CPU spike), we can do it incrementally in sqlite.

And no, sqlite will not fragment, as it's btree based and not a hash table (btrees are much faster to update than hashes), and we will use a separate db file which can be deleted when finished.
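An incremental bulk upload out of a separate staging database could look like this sketch (the table and column names are invented for illustration): each batch is committed in its own small transaction to avoid one big CPU spike, and the staging file is deleted once drained, as described above.

```python
import os
import sqlite3

def drain_staging(staging_path, main_conn, batch=100):
    """Move rows from a staging db into the main index in small batches,
    then delete the staging file."""
    stage = sqlite3.connect(staging_path)
    main_conn.execute(
        "CREATE TABLE IF NOT EXISTS words (word TEXT, file_id INTEGER)")
    while True:
        rows = stage.execute(
            "SELECT rowid, word, file_id FROM pending LIMIT ?", (batch,)
        ).fetchall()
        if not rows:
            break
        with main_conn:  # one small transaction per batch
            main_conn.executemany(
                "INSERT INTO words (word, file_id) VALUES (?, ?)",
                [(w, f) for _, w, f in rows])
        with stage:      # remove only the rows just transferred
            stage.executemany("DELETE FROM pending WHERE rowid = ?",
                              [(r,) for r, _, _ in rows])
    stage.close()
    os.remove(staging_path)  # throw away the staging db when finished
```

Spreading the transfer across many small transactions is what smooths out the CPU and I/O load compared with a single all-at-once upload.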

Will be experimenting on this tonight. There will be a few race conditions to handle with this, but it's nothing too complex.

Looking forward to it :)


I am determined to get tracker running as smooth as a baby's bottom!





