Re: [Tracker] Things that currently sux with tracker



Jamie McCracken wrote:
I've noticed when indexing *large* amounts of data that a lot of disk thrashing is taking place which is greatly slowing down performance of both tracker and the system in general.

Also, the nice +10 is not throttling enough (I don't have ionice in my kernel, so I don't know how good a job that does), so I will probably add some sleeping intervals to smooth things out and keep CPU usage low (with a --turbo command line option to disable this for those who want faster indexing).
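The sleep-interval idea could be sketched roughly like this - a minimal illustration, not Tracker's actual code (the function name, batch size, and pause length are all made up):

```python
import time

def index_files(files, index_one, turbo=False, batch_size=10, pause=0.5):
    """Index files in small batches, sleeping between batches to keep
    CPU usage low; a --turbo option would set turbo=True and skip sleeps."""
    indexed = 0
    for i, path in enumerate(files, start=1):
        index_one(path)          # do the actual indexing work
        indexed += 1
        if not turbo and i % batch_size == 0:
            time.sleep(pause)    # yield the CPU between batches
    return indexed
```

The trade-off is exactly as described: the pauses cap average CPU usage at the cost of wall-clock indexing time, which is why a turbo switch to disable them makes sense.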

The cause of the slow down is heavy fragmentation of the file based hash table.

Having indexed 30GB of stuff, the optimization routine shrank the full text index from nearly 300MB to 20MB - which means a massive 280MB of fragmentation had occurred. This is obscene!

I note other indexers do not update the hash table directly but cache the data in memory and then bulk upload it, to reduce fragmentation and lessen the performance hit. The disadvantage of this is that searches for newly indexed content won't return results until the cache is uploaded to the hash table. (We could upload every 10-15 mins or something - infrequent words should be updated more quickly though.)

Why not hold back updates, but force flush to disk if a search is called?
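That force-flush-on-search idea can be sketched as follows - a toy model with hypothetical names, where plain dicts stand in for the in-memory cache and the on-disk hash table:

```python
class Index:
    def __init__(self):
        self.main = {}      # stands in for the on-disk hash table
        self.pending = {}   # cached updates not yet written out

    def add(self, word, file_id):
        # updates go to the cache, not straight to the main index
        self.pending.setdefault(word, set()).add(file_id)

    def flush(self):
        # bulk-merge all pending entries into the main index at once
        for word, ids in self.pending.items():
            self.main.setdefault(word, set()).update(ids)
        self.pending.clear()

    def search(self, word):
        if self.pending:
            self.flush()    # force a flush so new content is searchable
        return sorted(self.main.get(word, set()))
```

This keeps searches correct while still batching writes; the cost is that the first search after a burst of indexing pays for the flush.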


As we are memory conservative, I am planning to do something similar, but using sqlite (instead of precious memory) to cache new files and then bulk upload. We could easily cache the data for many thousands of files before uploading them.

If I remember correctly, sqlite3 has some built-in cache settings; you might want to tweak the standard values a bit.
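For reference, SQLite's page cache is tuned with `PRAGMA cache_size`, where a negative value means KiB rather than pages - shown here via Python's stdlib sqlite3 module just for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The default cache is fairly small; ask for roughly 8 MiB of page cache.
conn.execute("PRAGMA cache_size = -8192")
size = conn.execute("PRAGMA cache_size").fetchone()[0]
```

A bigger page cache lets SQLite batch more of its B-tree updates in memory before touching disk, which fits the bulk-upload plan well.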


We can actually do better than others here: firstly, we are not using any more RAM, so we can have much bigger caches; and secondly, unlike other indexers which upload all at once (which often causes a CPU spike), we can do it incrementally in sqlite.

And no, sqlite will not fragment, as it's btree based and not a hash table (btrees are much faster to update than hashes), and we will use a separate db file which can be deleted when finished.
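An incremental bulk upload out of a separate staging database could look like this sketch (the table and column names are invented for illustration): each batch is committed in its own small transaction to avoid one big CPU spike, and the staging file is deleted once drained, as described above.

```python
import os
import sqlite3

def drain_staging(staging_path, main_conn, batch=100):
    """Move rows from a staging db into the main index in small batches,
    then delete the staging file."""
    stage = sqlite3.connect(staging_path)
    main_conn.execute(
        "CREATE TABLE IF NOT EXISTS words (word TEXT, file_id INTEGER)")
    while True:
        rows = stage.execute(
            "SELECT rowid, word, file_id FROM pending LIMIT ?", (batch,)
        ).fetchall()
        if not rows:
            break
        with main_conn:  # one small transaction per batch
            main_conn.executemany(
                "INSERT INTO words (word, file_id) VALUES (?, ?)",
                [(w, f) for _, w, f in rows])
        with stage:      # remove only the rows just transferred
            stage.executemany("DELETE FROM pending WHERE rowid = ?",
                              [(r,) for r, _, _ in rows])
    stage.close()
    os.remove(staging_path)  # throw away the staging db when finished
```

Spreading the transfer across many small transactions is what smooths out the CPU and I/O load compared with a single all-at-once upload.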

Will be experimenting on this tonight. There will be a few race conditions to handle with this, but it's nothing too complex.

Looking forward to it :)


I am determined to get tracker running as smooth as a baby's bottom!





