Re: [Tracker] Memory usage for tracker-extract and tracker-miner-fs



Hi Joe,

On Sun, Jan 10, 2016 at 6:40 PM, Joe Rhodes <lists joerhodes com> wrote:
> I have just compiled and installed tracker-1.7.1 on a CentOS 7.1 box. I
> just used the default configuration ("./configure" with no additional
> options). I'm indexing around 5 TB of data. I'm noticing that both the
> tracker-extract and tracker-miner-fs processes are using a large amount
> of RAM. The tracker-extract process is currently using 11 GB of RAM (RES,
> not VIRT, as reported by top), while tracker-miner-fs is sitting at 4.5
> GB.
>
> Both processes start out modestly, but continue to grow as they do their
> work. tracker-miner-fs levels off at 4.5 GB once it appears to have
> finished crawling the entire volume (once the CPU usage goes back down to
> near 0). The tracker-extract process also continues to grow as it works.
> Once it is done, it levels off. Last time it stayed at about 9 GB.
>
> If I restart tracker (with 'tracker daemon -t' followed by
> 'tracker daemon -s'), a similar thing happens with tracker-miner-fs: it
> grows back to 4.5 GB as it crawls its way across the entire volume. The
> tracker-extract process, though, uses a very modest amount of RAM, because
> all of the files were just indexed and it doesn't need to do much. I don't
> have that number right now because I'm re-indexing the entire volume, but
> it's well below 100 MB.
>
> Is this expected behaviour? Or is there a memory leak? Or perhaps tracker
> just isn't designed to operate on such a large volume?

This definitely sounds like a memory leak, although it is strange that
it affects both tracker-miner-fs and tracker-extract.

Running Tracker on large directory trees obviously has some costs,
such as:

- Inotify handles may be exhausted; directories we fail to create a
monitor for would only be checked/updated on the next miner startup.
- Longer I/O and CPU usage during startup, because the miner has to
check the mtimes of all directories and files.
- The miner also needs to keep an in-memory representation of the
directory tree for accounting purposes (file monitors, etc.). Regular
files are represented in this model only while they're being
checked/processed, and disappear soon after. This might account for a
memory peak at startup if there are many items left to process,
because Tracker dumps files into processing queues ASAP, but I think
the memory usage should be nowhere near this big.
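To check whether inotify watches could run out on a tree this size, the directory count can be compared with the kernel's per-user watch limit. This is a minimal sketch for Linux, assuming roughly one watch per monitored directory; the script and its root argument are illustrative, not part of Tracker.

```shell
#!/bin/sh
# Sketch: compare the number of directories under an indexed root with the
# per-user inotify watch limit on Linux. The root is taken from the first
# argument (illustrative; defaults to the current directory).
root="${1:-.}"

# Assumption: roughly one inotify watch is needed per monitored directory.
dirs=$(find "$root" -type d 2>/dev/null | wc -l)

# Kernel limit on inotify watches per user, shared by all of that user's processes.
limit=$(cat /proc/sys/fs/inotify/max_user_watches)

echo "directories:      $dirs"
echo "max_user_watches: $limit"

if [ "$dirs" -gt "$limit" ]; then
    echo "warning: more directories than available inotify watches" >&2
fi
```

For example, `sh check-watches.sh /srv/data`. Other processes running as the same user also consume watches, so the real headroom is lower than the printed limit suggests.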

So I don't think anything accounts for such memory usage in
tracker-miner-fs. The only known source of unbounded memory growth is
the number of directories (plus regular files, for the peak at
startup) to be indexed, but you would need millions of those for
tracker-miner-fs to grow to 4.5 GB.
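To put that figure in perspective, here is the back-of-the-envelope division, assuming (only for illustration) that one million directories were the sole consumer and reading 4.5 GB as 4.5 GiB. Each directory node would then have to average roughly 4,800 bytes, which seems a lot for an entry in a tree kept for accounting.

```shell
#!/bin/sh
# Back-of-the-envelope check: average bytes per directory node if 4.5 GiB
# were spent on 1,000,000 directories and nothing else (illustrative numbers).
# Needs 64-bit shell arithmetic, as provided by bash/dash on 64-bit Linux.
total_bytes=$(( 45 * 1024 * 1024 * 1024 / 10 ))   # 4.5 GiB in bytes
dir_count=1000000
per_dir=$(( total_bytes / dir_count ))
echo "bytes per directory: $per_dir"
```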

And tracker-extract holds far less state: it queries the files that
need extraction in small batches, and processes those one by one
before querying the next batch. 9 GB shouts memory leak. We've had
other memory leak situations in tracker-extract, and the culprit is
most often one of the libraries used by our extract modules: if many
files end up triggering a given module (and the leaky code path in
that library), the effect accumulates over time.

The downside is that we Tracker developers most often can't reproduce
the problem unless we have a file that triggers the leak, so that we
can fix it or forward it to the appropriate maintainers. It would be
great if you could provide valgrind logs; just run:

valgrind --leak-check=full --log-file=valgrind-log --num-callers=30 \
    /path/to/built/tracker-extract

Press Ctrl-C once enough time has passed, and send back the
valgrind-log file. The same applies to tracker-miner-fs.


> My tracker meta.db file is about 13 GB right now, though still growing. I
> suspect it's nearly done indexing, though.

This is also suspicious: again, you would need either a hideous number
of files for meta.db to grow that large, or an equally hideous amount
of plain text content being indexed. Out of curiosity, how many
directories/files does that partition contain? Is the content
primarily video, documents, etc.?
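One quick way to gather those numbers is a few find(1) pipelines over the indexed volume. This is a sketch; the root path comes from the first argument (illustrative), and the extension breakdown is only a rough proxy for the kind of content.

```shell
#!/bin/sh
# Sketch: summarize an indexed tree, answering "how many directories/files"
# and giving a rough idea of the content mix by file extension.
root="${1:-.}"

ndirs=$(find "$root" -type d 2>/dev/null | wc -l)
nfiles=$(find "$root" -type f 2>/dev/null | wc -l)
echo "directories: $ndirs"
echo "files:       $nfiles"

# Top 10 extensions, lowercased. Only names containing a dot are considered;
# the first sed expression strips the directory part, the second keeps the
# text after the last dot.
find "$root" -type f -name '*.*' 2>/dev/null \
    | sed 's|.*/||; s|.*\.||' \
    | tr '[:upper:]' '[:lower:]' \
    | sort | uniq -c | sort -rn | head -n 10
```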

Cheers,
  Carlos

