Re: beagle - can it clean up after itself?



Hi,

On Thu, 2007-01-18 at 12:08 -0500, Michael Blaustein wrote:
> 1.  I've noticed that beagle fills my /tmp directory with scores of
> files named according to the pattern "tmpxxxxxxx.tmp".
>      When can these be safely  deleted, and can beagle take care of
> doing that itself?

As Bera mentioned, these are meant to be transient files that Beagle
cleans up when it is finished with them.  They are created when
extracting data from a non-file-backed source -- like emails or archive
files -- and passed from the daemon to the helper process.  The helper
process removes them once the data has been indexed.
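
If some of these files are being left behind, a small script can at
least show which ones look abandoned.  The following is only a sketch:
the function name, the "tmp*.tmp" pattern (taken from the file names in
your mail) and the one-day age cutoff are assumptions, not anything
Beagle provides.  It lists candidates and deletes nothing.

```python
import fnmatch
import os
import time


def find_stale_temp_files(directory, pattern="tmp*.tmp",
                          min_age_seconds=24 * 3600, now=None):
    """Return sorted paths of regular files in `directory` whose name
    matches `pattern` and whose mtime is at least `min_age_seconds` old.

    The pattern and the age cutoff are illustrative assumptions.
    """
    now = time.time() if now is None else now
    stale = []
    for entry in os.scandir(directory):
        # Skip directories, symlinks, sockets and so on.
        if not entry.is_file(follow_symlinks=False):
            continue
        if not fnmatch.fnmatchcase(entry.name, pattern):
            continue
        age = now - entry.stat(follow_symlinks=False).st_mtime
        if age >= min_age_seconds:
            stale.append(entry.path)
    return sorted(stale)


if __name__ == "__main__":
    # Only print the candidates; review them before removing anything.
    for path in find_stale_temp_files("/tmp"):
        print(path)
```

The age check is there so that files the helper is still working on are
not listed.  Other programs may use similar names in /tmp, so the output
is a list of candidates to inspect, not a list of files known to belong
to Beagle.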

> 2. Same question about files in .beagle/TextCache -  can they be
> safely deleted; if so, when; and can beagle do this itself?
>    This is a serious issue for me, since this directory is rapidly
> exhausting my hard drive space.

They can be deleted, but it means that you won't have snippets in most
search results.  There are a lot of issues with the TextCache as it's
currently implemented, starting with the name. ;)  It's not really a
"cache" because it's important data that only expires when an item is
removed from the index.  We store it separately from the index largely
for search speed reasons.

Another problem is that it uses disk blocks very inefficiently.  When we
moved from an uncompressed cache to a compressed one, we only got 40%
compression.  Since it's all just plain text, the gains should have been
much, much greater.  The problem is that a lot of the files stored there
are smaller than a single FS block, and on most filesystems a file
occupies at least one whole block no matter how small it is, so space is
wasted.  Storing it differently would probably be more effective.
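
To make the block-size effect concrete, here is a rough calculation.
It assumes a 4096-byte block and a filesystem that rounds every
non-empty file up to whole blocks (no tail packing or inline data); the
file sizes are made up for illustration and are not measurements from a
real TextCache.

```python
def allocated_bytes(file_size, block_size=4096):
    """Bytes occupied on disk when a file is rounded up to whole blocks.

    Assumes a simple filesystem model with no tail packing.
    """
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


def on_disk_saving(original_sizes, compressed_sizes, block_size=4096):
    """Fraction of allocated disk space saved by compressing each file."""
    before = sum(allocated_bytes(s, block_size) for s in original_sizes)
    after = sum(allocated_bytes(s, block_size) for s in compressed_sizes)
    return 1 - after / before


if __name__ == "__main__":
    # Hypothetical cache entries, each compressed to a third of its size.
    original = [1500, 3000, 6000]
    compressed = [500, 1000, 2000]
    print(sum(allocated_bytes(s) for s in original))    # 16384
    print(sum(allocated_bytes(s) for s in compressed))  # 12288
    print(on_disk_saving(original, compressed))         # 0.25
```

In this example the data itself shrinks by two thirds, but the space
used on disk only drops by a quarter, because the two smallest files
already fit in a single block before compression.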

And lastly, I think that all terms are stored in the text cache,
although we only index the first 10,000.  This is only an issue for very
large documents, obviously.

> 3.  Is there any way for a user to control the amount of memory beagle
> (really beagled-helper) takes up?  Every time I start beagle,
>     by rebooting or by logging on to my account, there is a longish
> time when indexing takes up 50-60 percent or more of memory,
>     and as a result no other process can run reasonably responsively.
> (exercise_the_dog is not set.)

The beagle-helper process monitors its own memory usage and if it
crosses a threshold it will shut itself down and restart at the end of
the current batch of indexing.  If you're seeing extended periods of
time where the memory usage is pretty high (and never going back down),
you're probably hitting a bug in one of the file filters.  Examining the
index helper logs might help identify the problematic file.
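
The restart behaviour described above can be pictured roughly like
this.  It is a simplified sketch, not Beagle's actual code: the function
and parameter names are hypothetical, and the memory reading is passed
in as a callable so the example stays self-contained.

```python
def run_batches(batches, index_item, memory_usage, threshold_bytes):
    """Index batches in order, checking memory only between batches.

    Returns (batches_completed, restart_requested).  A restart is
    requested after the first batch that finishes with memory usage
    above the threshold; the batch in progress is never interrupted.
    """
    completed = 0
    for batch in batches:
        for item in batch:
            index_item(item)
        completed += 1
        # The check happens at the end of the batch, so usage can stay
        # high for as long as a single batch takes to finish.
        if memory_usage() > threshold_bytes:
            return completed, True
    return completed, False


if __name__ == "__main__":
    readings = iter([100, 250, 300])  # made-up memory readings
    done, restart = run_batches(
        batches=[["a", "b"], ["c"], ["d", "e"]],
        index_item=lambda item: None,
        memory_usage=lambda: next(readings),
        threshold_bytes=200,
    )
    print(done, restart)
```

In a model like this, a filter that balloons while processing one file
keeps memory high until its batch ends, which fits the advice to look
in the helper logs for the file being processed at the time.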

Joe



