Re: [Tracker] Things that currently sux with tracker

Anders Aagaard wrote:

Why not hold back updates, but force a flush to disk if a search is called?

It would be too slow, I guess - with thousands of files cached up, forcing a flush could cause a bus timeout for the search.

We would only hold back updates for new files - existing files would be indexed straight away, as we have a differential indexer: only the changed words are indexed, not the whole content (and you are unlikely to have a large number of files changing at the same time anyway).
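To make that concrete, here is a minimal sketch of what a differential indexer does, assuming per-file word counts kept in GLib hash tables (the names and the update_index callback are illustrative, not Tracker's actual code):

#include <glib.h>

/* Apply only the difference between a changed file's old and new word
 * counts to the index; unchanged words cost nothing. */
static void
index_delta (GHashTable *old_counts, GHashTable *new_counts,
             void (*update_index) (const char *word, int delta))
{
        GHashTableIter iter;
        gpointer key, value;

        /* Words that are new or whose count changed. */
        g_hash_table_iter_init (&iter, new_counts);
        while (g_hash_table_iter_next (&iter, &key, &value)) {
                int new_count = GPOINTER_TO_INT (value);
                int old_count = GPOINTER_TO_INT (g_hash_table_lookup (old_counts, key));

                if (new_count != old_count)
                        update_index ((const char *) key, new_count - old_count);
        }

        /* Words that vanished from the file entirely. */
        g_hash_table_iter_init (&iter, old_counts);
        while (g_hash_table_iter_next (&iter, &key, &value)) {
                if (g_hash_table_lookup (new_counts, key) == NULL)
                        update_index ((const char *) key, -GPOINTER_TO_INT (value));
        }
}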

It is tempting, on the first-run index, to hold back any updating of QDBM until it's finished - this should completely eliminate the performance hits, disk thrashing and fragmentation, but you would get no search results back until the first run completes.

Subsequent indexing of new files can then be spooled into SQLite and flushed periodically into QDBM to minimise the above.
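Something along these lines, assuming a hypothetical "spool" table and QDBM's Depot API (the schema and function names are made up for illustration):

#include <sqlite3.h>
#include <depot.h>      /* QDBM */

/* Drain the SQLite spool into the QDBM inverted index in one pass.
 * DP_DCAT appends to any existing hit list for a word, so each word
 * needs only a single write. */
static void
flush_spool_to_qdbm (sqlite3 *db, DEPOT *index)
{
        sqlite3_stmt *stmt;

        sqlite3_prepare_v2 (db, "SELECT word, hits FROM spool",
                            -1, &stmt, NULL);

        while (sqlite3_step (stmt) == SQLITE_ROW) {
                const char *word = (const char *) sqlite3_column_text (stmt, 0);
                const char *hits = (const char *) sqlite3_column_blob (stmt, 1);
                int         size = sqlite3_column_bytes (stmt, 1);

                dpput (index, word, -1, hits, size, DP_DCAT);
        }

        sqlite3_finalize (stmt);
        sqlite3_exec (db, "DELETE FROM spool", NULL, NULL, NULL);
}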

I don't think it's a must-have for newly indexed content to be searchable right away.


As we are memory-conservative, I am planning to do something similar, but using SQLite (instead of precious memory) to cache new files and then bulk-upload them. We could easily cache the data for many thousands of files before uploading.
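For the caching side, a rough sketch - wrapping the per-file inserts in a single transaction is what makes writing to SQLite effectively free (the spool table is again hypothetical):

/* Queue one file's extracted words in the spool; a single transaction
 * around the batch keeps the inserts near-instant. */
static void
spool_file_terms (sqlite3 *db, const char *uri,
                  const char **words, int n_words)
{
        sqlite3_stmt *stmt;
        int i;

        sqlite3_exec (db, "BEGIN", NULL, NULL, NULL);
        sqlite3_prepare_v2 (db,
                            "INSERT INTO spool (uri, word) VALUES (?, ?)",
                            -1, &stmt, NULL);

        for (i = 0; i < n_words; i++) {
                sqlite3_bind_text (stmt, 1, uri, -1, SQLITE_STATIC);
                sqlite3_bind_text (stmt, 2, words[i], -1, SQLITE_STATIC);
                sqlite3_step (stmt);
                sqlite3_reset (stmt);
        }

        sqlite3_finalize (stmt);
        sqlite3_exec (db, "COMMIT", NULL, NULL, NULL);
}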

If I remember correctly, sqlite3 has some built-in cache stuff; you might wanna tweak the standard values a bit.

Those are mostly used for sorting and the like - we will be relying on the OS dirty write cache instead. In most cases, writing the values to SQLite would be near-instantaneous on any competent platform, and machines with extra RAM will benefit from the contents effectively being in memory via the disk cache anyhow.
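For reference, the SQLite knobs in question are set via PRAGMAs; the values below are purely illustrative, not what Tracker would ship:

/* Rely on the OS write cache rather than SQLite's own syncing. */
sqlite3_exec (db, "PRAGMA synchronous = OFF", NULL, NULL, NULL);
/* Page cache size (in pages) - one of the "standard values" to tweak. */
sqlite3_exec (db, "PRAGMA cache_size = 2000", NULL, NULL, NULL);
/* Keep temporary sort data in RAM. */
sqlite3_exec (db, "PRAGMA temp_store = MEMORY", NULL, NULL, NULL);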

--
Mr Jamie McCracken
http://jamiemcc.livejournal.com/



