New directory-scanning behaviour in CVS

From: Daniel Drake <dsd gentoo org>
To: dashboard-hackers gnome org
Subject: New directory-scanning behaviour in CVS
Date: Mon, 25 Jul 2005 20:40:17 +0100

Previously, when beagled started up, it recursively scanned all directories in your home directory (and other roots), created an internal ID for each directory, stored those IDs in a database, created inotify watches on each directory, and so on.

This was very expensive for large home directories - some users reported startup times of 5-10 minutes of continuous disk I/O while this process was happening. Stick a mozilla checkout in your home directory and it is quite noticeable.

Only once this scanning process had completed would beagle start to slowly crawl directories and handle file events.

I have committed a change to this behaviour to CVS. The processes of scanning and crawling have been merged into one - crawling.

Crawling happens one directory at a time, with an appropriate-length pause between each directory. On crawl, the directory is scanned for subdirectories (which are then added to the crawling queue, if necessary), and then the contents of that directory are crawled (files are indexed if they are new or have changed, etc). inotify watches are created as soon as the directory is known.
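As a rough illustration of the loop described above (this is not the actual beagled code), here is a minimal Python sketch. The names `crawl`, `add_watch`, `index_file` and `pause` are hypothetical; `add_watch` stands in for creating an inotify watch and `index_file` for the "index if new or changed" step.

```python
import os
import time
from collections import deque

def crawl(root, add_watch, index_file, pause=0.0):
    """Crawl one directory at a time, starting from root."""
    add_watch(root)                  # watch as soon as the directory is known
    queue = deque([root])
    while queue:
        directory = queue.popleft()
        try:
            with os.scandir(directory) as it:
                entries = list(it)
        except OSError:
            continue                 # directory vanished or is unreadable
        # First scan for subdirectories: watch them and queue them.
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                add_watch(entry.path)
                queue.append(entry.path)
        # Then crawl the files in this directory.
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                index_file(entry.path)
        time.sleep(pause)            # pause between directories
```

Because the queue is first-in, first-out, shallower directories are naturally reached before deeper ones.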

This means that the startup expense is now almost non-existent since scanning is done on-crawl, but it also means that beagle will be blind to events from parts of the filesystem until the crawl has reached them.

To reduce the effects of this blindness, beagled starts watching directories as soon as it sees them (i.e. when they are added to the crawling queue) and will happily respond to events even before they have been crawled. Similarly, beagled will immediately create watches on one level of subdirectories beneath each indexing root at startup.
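To make the startup step concrete, a small Python sketch (again hypothetical, not taken from beagled) that lists what would be watched immediately: each indexing root plus its direct subdirectories, and nothing deeper. The function name `initial_watch_targets` is made up for this example.

```python
import os

def initial_watch_targets(roots):
    """Return each root followed by its immediate subdirectories."""
    targets = []
    for root in roots:
        targets.append(root)
        try:
            with os.scandir(root) as it:
                subdirs = sorted(
                    e.path for e in it if e.is_dir(follow_symlinks=False)
                )
        except OSError:
            continue                 # unreadable root: watch nothing beneath it
        targets.extend(subdirs)
    return targets
```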

The crawling pattern means that in general, less deeply nested directories are watched/crawled first.

Any file activity in queued directories will result in that directory being bumped up the crawling queue.
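One way to picture the bumping is the Python sketch below. It assumes that "bumped up" means moved to the front of the queue, which the description above does not specify; the class `CrawlQueue` and its methods are hypothetical.

```python
from collections import deque

class CrawlQueue:
    """Queue of directories waiting to be crawled."""

    def __init__(self):
        self._pending = deque()

    def add(self, directory):
        # Queue a directory only if it is not already waiting.
        if directory not in self._pending:
            self._pending.append(directory)

    def bump(self, directory):
        """Move a still-queued directory to the front (assumed policy)."""
        try:
            self._pending.remove(directory)
        except ValueError:
            return False             # not queued: nothing to bump
        self._pending.appendleft(directory)
        return True

    def pop(self):
        # Next directory to crawl, or None when the queue is empty.
        return self._pending.popleft() if self._pending else None
```

For example, after adding "/home/u", "/home/u/docs" and "/home/u/src" in that order, a file event in "/home/u/src" would call `bump("/home/u/src")`, and the next `pop()` would return "/home/u/src" instead of "/home/u".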


I think that's everything.

Daniel
