Re: Extended attributes lastCrawlAttr



On Tue, 2004-11-02 at 14:11 +0000, Julian Satchell wrote:
> The correct design, in my opinion, is to hold the data that is currently
> written to the EAs inside Beagle's indices. This not only allows for
> read-only data sources, it also provides some of the infrastructure for
> time-based queries (which files was I working with on this date?).

The timestamp that is stored in the EA is also stored in the indices.
We use EAs for performance reasons: Lucene is quite fast, but index
lookups are vastly more expensive than reading EAs.  Without EAs,
crawling becomes much more CPU- and I/O-intensive.

Remember that this isn't just an issue for the initial crawl.  Every
time beagled starts up, it has to assume that the user's files are in an
unknown state and has to re-crawl.  For the common case, where the index
is up-to-date and files have already been indexed, EAs make crawling
very, very efficient.
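
To make the common case concrete, here is a rough sketch of the kind of
check an EA allows.  It is Python purely for illustration, not Beagle's
actual code, and it is Linux-only.  The attribute name and both function
names are made up for this example.  The idea is just that one cheap
attribute read per file tells the crawler whether the file has changed
since it was last indexed:

```python
import os

# Hypothetical attribute name, chosen for this example only.
CRAWL_ATTR = "user.example.last_crawl"


def mark_crawled(path):
    """Store the file's current mtime (in ns) in an extended attribute."""
    mtime_ns = os.stat(path).st_mtime_ns
    # Setting an EA changes the file's ctime but not its mtime.
    os.setxattr(path, CRAWL_ATTR, str(mtime_ns).encode("ascii"))


def needs_crawl(path):
    """Return True unless the EA says the file was indexed at its current mtime."""
    try:
        stored = int(os.getxattr(path, CRAWL_ATTR).decode("ascii"))
    except (OSError, ValueError):
        # No EA, EAs unsupported here, or an unparsable value: re-index.
        return True
    return stored != os.stat(path).st_mtime_ns
```

On a re-crawl, any file for which needs_crawl() returns False can be
skipped without touching the index at all.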

> Not all file systems support extended attributes. More importantly, it
> means that you cannot index read-only devices, filesystems or
> directories.

We need a fallback for files where EAs can't be set.  Looking up the
timestamps in the index is just too slow for this.  Maybe a little
SQLite database?
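
Something along these lines, perhaps.  This is only a sketch of the idea
in Python.  The class name, table layout and column names are all
assumptions made up for illustration, and nothing like it exists in
Beagle yet.  It keeps one row per path, recording the mtime seen at the
last crawl:

```python
import os
import sqlite3


class CrawlStateStore:
    """Hypothetical fallback store: path -> mtime (ns) at last crawl."""

    def __init__(self, db_path):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS crawl_state ("
            "path TEXT PRIMARY KEY, mtime_ns INTEGER NOT NULL)"
        )

    def mark_crawled(self, path):
        """Record the file's current mtime as its last-crawled state."""
        mtime_ns = os.stat(path).st_mtime_ns
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO crawl_state (path, mtime_ns) "
                "VALUES (?, ?)",
                (os.path.abspath(path), mtime_ns),
            )

    def needs_crawl(self, path):
        """Return True if the file is unknown or changed since last crawl."""
        row = self.conn.execute(
            "SELECT mtime_ns FROM crawl_state WHERE path = ?",
            (os.path.abspath(path),),
        ).fetchone()
        return row is None or row[0] != os.stat(path).st_mtime_ns
```

Since the database lives outside the crawled tree, this would also work
for read-only devices and directories.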

-J




