Re: New module proposal: tracker



On Fri, 2009-11-06 at 11:55 +0000, Martyn Russell wrote:
> On 06/11/09 10:54, Alexander Larsson wrote:
> > On Fri, 2009-11-06 at 10:15 +0000, Martyn Russell wrote:
> > That's cool. Although I don't think CPU load per se is the main problem.
> > CPU scheduling is pretty easy to control such that a process only runs
> > when nothing else runs. The main problem is i/o costs (increased amount
> > of seeks causing degradation of application i/o) and general VM
> > behaviour (filling buffer caches, bumping out other apps from memory,
> > etc). These things are much much harder to control and measure.
> 
> Yea, I agree. This doesn't happen that often though. We only crawl once 
> on start up (which is quite cheap in my experience) and we are playing 
> with the idea of only doing it initially for first time indexes. About 
> this idea, the question is, how can we guarantee when the computer is 
> being shut down that we do not miss file updates before the next boot? There is 
> also the case where we restart the monitoring daemon and we miss updates.

There have been discussions on the lkml about having persistent
recursive mtimes. That would solve this on filesystems that support it.
Without this, all we can do is crawl on each startup, although if it's
not the first indexing run, such crawling can be done at an even lower
priority (with timeouts now and then, perhaps). It is also less likely
to cause i/o starvation, because we mainly read directory entries, not
file contents. However, it would still read the inodes for all files in
your homedir, which means a bunch of HD seeks and may use a fair amount
of the buffer cache.

The seeking is hard to avoid, but there may be ways to readdir stuff
without having the result persist in the buffer cache, although I'm not
sure how posix_fadvise(POSIX_FADV_DONTNEED) can be applied when reading
a directory...

Also, there are other tricks you can use, like sorting the readdir()
results by inode before stat()ing them to minimize seeking. One could
also check whether struct dirent->d_type is DT_DIR to find directories
without having to stat() every entry while crawling (on the systems
that support this).
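
To make those two tricks concrete, here is a rough sketch in Python
rather than C. It is not tracker's crawler; the crawl() function is a
hypothetical name used only for illustration. It assumes a POSIX
system, where os.scandir() exposes the inode number from readdir() and
answers is_dir() from d_type when the filesystem fills it in (falling
back to a stat call when it does not).

```python
import os


def crawl(root):
    """Yield (path, stat_result) for each non-directory entry under root.

    Hypothetical helper, for illustration only.
    """
    pending = [root]
    while pending:
        directory = pending.pop()
        try:
            with os.scandir(directory) as it:
                entries = list(it)
        except OSError:
            # Unreadable or vanished directory: skip it.
            continue

        # Visit entries in inode order, so the stat calls below touch
        # the inode table roughly sequentially instead of at random.
        entries.sort(key=lambda e: e.inode())

        for entry in entries:
            try:
                # On POSIX this uses d_type when available, so finding
                # subdirectories usually needs no extra stat call.
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                else:
                    yield entry.path, entry.stat(follow_symlinks=False)
            except OSError:
                # Entry disappeared between readdir and stat.
                continue
```

Something like "for path, st in crawl(os.path.expanduser('~'))" would
then give the mtimes to compare against the index. Whether the inode
sort actually reduces seeking depends on the filesystem layout, so this
would need measuring.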


