Re: Adding exclude path when indexing



> ignored document is not immediately removed from the index (that would
> require recrawling the whole tree to figure out what should be
> ignored) but silently removed from search results.
I'm wondering why it requires recrawling the whole tree. I thought
that the indexer information was stored in an SQL database, so the
entries to be removed could easily be found just by querying this
database for the exclude path?

You are partially right. It's stored in an SQL database _only_ in the
_unfortunate_ event that the "indexer information" cannot be stored in
the extended attributes of the corresponding files. Using extended
attributes is much faster and easier than using the database.

There is another technical problem. Suppose you added an exclude
pattern for /home. That would require dropping all the contents of
your home directory from the index. Good so far. To maximize
performance under normal file system operation, beagle does not store
the whole path of the files in its lucene index (there might be a wiki
entry explaining how the Files backend works, please check). This
means beagle would first have to figure out which files to remove and
then delete them from the index. That would be extremely expensive.
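
To make the cost concrete, here is a rough sketch in Python (not beagle's actual code or schema). It assumes a hypothetical index in which every entry records only its own name and the id of its parent directory, so the full path has to be rebuilt by walking up the parents. The names `index`, `full_path` and `ids_under` are invented for this illustration.

```python
# Hypothetical index layout: id -> (name, parent id). No full paths are stored.
index = {
    1: ("", None),         # the root directory "/"
    2: ("home", 1),
    3: ("alice", 2),
    4: ("notes.txt", 3),
    5: ("etc", 1),
    6: ("fstab", 5),
}

def full_path(doc_id):
    """Rebuild a path by following parent links up to the root."""
    parts = []
    while doc_id is not None:
        name, parent = index[doc_id]
        parts.append(name)
        doc_id = parent
    return "/".join(reversed(parts)) or "/"

def ids_under(prefix):
    """Find every entry at or below prefix. Note that this has to
    resolve the path of *every* entry in the index."""
    prefix = prefix.rstrip("/")
    found = []
    for doc_id in index:
        path = full_path(doc_id)
        if path == prefix or path.startswith(prefix + "/"):
            found.append(doc_id)
    return sorted(found)

print(ids_under("/home"))
```

With a handful of entries this is trivial, but on a real index the same walk touches every document just to decide which ones fall under the excluded path.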

OTOH, it's much easier to "lie", and lie consistently (the bug is wrt
the consistency part). Much healthier for your computer too. There
have been proposals for a background cleanup to remove the dead, but
it has not been implemented yet.
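
The "lying" approach can be sketched like this in Python (again, only an illustration, not beagle's code). It assumes each hit already comes back with a resolved path and that the exclude list holds plain directory paths; `is_excluded` and `visible_hits` are hypothetical names.

```python
from pathlib import PurePosixPath

def is_excluded(path, excluded_dirs):
    """True if path is one of the excluded directories or lies below one."""
    p = PurePosixPath(path)
    for d in excluded_dirs:
        d = PurePosixPath(d)
        if p == d or d in p.parents:
            return True
    return False

def visible_hits(hits, excluded_dirs):
    """Drop excluded hits from the results; the index itself is untouched."""
    return [h for h in hits if not is_excluded(h, excluded_dirs)]

hits = ["/home/alice/notes.txt", "/etc/fstab", "/homework/a.txt"]
print(visible_hits(hits, ["/home"]))
```

The check only runs on the few hits of a query, not on the whole index, which is why it is so much cheaper. The stale entries stay in the index until some cleanup removes them.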

- dBera

--
-----------------------------------------------------
Debajyoti Bera @ http://dtecht.blogspot.com
beagle / KDE fan
Mandriva / Inspiron-1100 user


