Re: Adding exclude path when indexing



Hi,

On Wed, 2007-02-07 at 14:37 +0100, Stephan Hegel wrote:
> I'm wondering why it requires recrawling the whole tree. I've thought
> that the indexer information is stored in a sql database and the entries
> to be removed could be easily detected just by querying for the exclude
> path in this database?

Unfortunately it's not that simple.  The index only stores information
about the directory that a file is immediately in, and nothing about
that directory's own parents.  For example, given a file:

        /foo/bar/baz/quux.txt
        
the entry in the index for quux.txt only knows about its parent, baz,
and not even by name.  Every file gets a unique ID so that it can be
referenced even across moves and renames, and the entry points at its
parent through that ID.

The reason is efficiency.  Imagine that /foo/bar has thousands of files
underneath it.  If we stored full path information, renaming /foo/bar
to /foo/barbar would force us to reindex every one of those files.  With
our current system, we have to reindex exactly one entry: /foo/barbar.
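
To make the difference concrete, here is a toy comparison of the two
schemes.  It is only a sketch: the dictionaries, the integer IDs and the
function names are made up for illustration and are not Beagle's actual
schema or code.

```python
# Hypothetical toy indexes, not Beagle's real storage format.

def rename_in_path_index(paths, old, new):
    """Full-path scheme: every entry at or under `old` must be rewritten."""
    touched = 0
    for uid, path in paths.items():
        if path == old or path.startswith(old + "/"):
            paths[uid] = new + path[len(old):]
            touched += 1
    return touched

def rename_in_parent_index(entries, uid, new_name):
    """Parent-ID scheme: only the renamed entry itself changes."""
    parent_id, _old_name = entries[uid]
    entries[uid] = (parent_id, new_name)
    return 1

# uid -> full path, versus uid -> (parent uid, own name)
paths = {1: "/foo", 2: "/foo/bar"}
entries = {1: (None, "foo"), 2: (1, "bar")}
for i in range(3, 1003):  # a thousand files directly under /foo/bar
    paths[i] = f"/foo/bar/file{i}.txt"
    entries[i] = (2, f"file{i}.txt")

touched_paths = rename_in_path_index(paths, "/foo/bar", "/foo/barbar")
touched_parent = rename_in_parent_index(entries, 2, "barbar")
```

In this toy setup the full-path index rewrites the directory plus all
thousand files, while the parent-ID index rewrites a single entry and
the children keep pointing at the same ID.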

This is also why you can't do full path searches with Beagle.
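
As a rough illustration of why a simple query on the exclude path
doesn't work: with only parent IDs stored, the full path of an entry has
to be rebuilt by walking up its chain of parents, and finding everything
under a directory means doing that for every entry.  The layout below
(integer IDs mapping to a parent ID and a name) and the helper names are
assumptions for the sake of the example, not the real index.

```python
# Hypothetical layout: uid -> (parent uid or None, own name).
entries = {
    1: (None, "foo"),
    2: (1, "bar"),
    3: (2, "baz"),
    4: (3, "quux.txt"),
    5: (1, "other.txt"),
}

def full_path(entries, uid):
    """Rebuild a path by following parent IDs up to the root."""
    parts = []
    while uid is not None:
        parent_id, name = entries[uid]
        parts.append(name)
        uid = parent_id
    return "/" + "/".join(reversed(parts))

def ids_under(entries, prefix):
    """Find entries at or below `prefix`; needs a walk for every entry."""
    found = []
    for uid in entries:
        path = full_path(entries, uid)
        if path == prefix or path.startswith(prefix + "/"):
            found.append(uid)
    return found

quux_path = full_path(entries, 4)
excluded = ids_under(entries, "/foo/bar")
```

No stored field contains the string "/foo/bar", so there is nothing to
match the exclude path against directly.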

Joe



