beagle stuck with big html and pdf files


I am trying to make beagle work (Ubuntu Hoary + backports) and
each time I am about to celebrate, beagle gets into trouble with two

I don't know whether this qualifies as a bug, so I'm asking here first.

This is a small part of beagle --fg --debug output (I can attach more if

DEBUG: Loading Beagle.Util.Conf+IndexingConfig from indexing.xml
DEBUG: Helper Size: VmRSS=71,9 MB, size=6,39, 134,9%
DEBUG: Process too big, shutting down!
DEBUG: Calling BeginShutdown
DEBUG: Beginning shutdown event
DEBUG: CancelIfBlocking Beagle.Daemon.ConnectionHandler
DEBUG: Done with shutdown event
DEBUG: (1) Waiting for 2 workers...
DEBUG: waiting for HandleConnection (161)
DEBUG: waiting for server 'socket-helper'
DEBUG: worker removed: name=server 'socket-helper'
DEBUG: (2) Waiting for 1 worker...
DEBUG: waiting for HandleConnection (161)
DEBUG: Server 'socket-helper' shut down
DEBUG: worker removed: name=HandleConnection (128)
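
As far as I understand, the VmRSS figure in the "Helper Size" line is the resident set size that Linux reports in /proc/<pid>/status. Below is a minimal sketch for reading that value independently of Beagle, assuming a Linux /proc layout. The function names are my own and this is not Beagle's code; I don't know what threshold Beagle compares against.

```python
import re


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    # The kernel writes a line such as "VmRSS:     1234 kB".
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_vmrss_kb(pid="self"):
    """Read VmRSS for a process id (Linux only); "self" means this process."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vmrss_kb(handle.read())
```

Calling read_vmrss_kb with the helper's pid would let me compare the kernel's number against what Beagle logs.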

This is also interesting:
ps aux | grep pdfinfo | wc -l returns 349, and
ps aux | grep pdftotext | wc -l returns 349 as well.
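
One caveat on those numbers: a ps | grep pipeline can match the grep process itself, so the real counts may be one lower. A rough sketch of counting by exact command name instead, assuming a ps that supports -eo comm= (procps does); the function names are mine.

```python
import subprocess
from collections import Counter


def count_commands(comm_lines):
    """Count how often each command name appears in `ps -eo comm=` lines."""
    return Counter(line.strip() for line in comm_lines if line.strip())


def running_counts():
    """Run ps and return a Counter of command names for all processes."""
    # `ps -eo comm=` prints one bare command name per process, no header.
    result = subprocess.run(
        ["ps", "-eo", "comm="], capture_output=True, text=True, check=True
    )
    return count_commands(result.stdout.splitlines())
```

With this, running_counts()["pdfinfo"] and running_counts()["pdftotext"] give the number of live helper processes without grep counting itself.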

My CPU is burning. 

The last time Beagle got stuck on these MySQL manuals I shut it down,
and after the next login it purged my filesystem index.

However, I hadn't noticed that the .noindex option is no longer valid.
I am not sure about the syntax of the new IgnorePatterns, so for now I
index several directories by setting them as roots and setting
IndexHome to "no". I thought that maybe an excessive number of files
had caused this "too big" error. My roots now hold ~8000 files, ~3 GB
in total (documents in different formats and images). Beagle shows 3471
files indexed (IndexingService is 15).

So my question is: should I keep beagle running (is it going to recover
in a few hours?) or should I shut it down? Is it a bug, and should I
file it? How else can I help to make it work?

(I tried compiling from CVS but gave up on the *sharp and gmime
dependencies; too much compiling for my production desktop...)

I don't need to have these files indexed, so I could just move them out
of my way, but this is far from a solution to the problem.

Rafal Prochniak
