Re: [Tracker] Memory usage for tracker-extract and tracker-miner-fs



Hi Joe,

On Sun, Jan 10, 2016 at 10:05 PM, Joe Rhodes <lists joerhodes com> wrote:
Carlos:

Yes, there are a LOT of files on this volume.  The makeup of the 5 TB of data is PDFs, Photoshop files,
Word docs, InDesign & Illustrator docs.  There are very few large files like MP3s or videos.  If I
disable all the extractors and just build an index based on file names, I get an index of about 3 GB.

Oh wow :). Then your use case is one of those we deemed "extreme".
People usually fill up those huge HDDs with movies and whatnot. Since
the impact on Tracker is per-file, it still does fine there (e.g. we
don't usually need to read the file entirely, there are fewer
opendir()/open() syscalls, etc). But obviously, a few thousand >1GB
files are not the same as a few million >1MB text documents, also in
terms of DB storage, so the meta.db size might be legit after all.


I did notice that I was possibly indexing all of my snapshots of my volumes. I'm using ZFS and they're 
available under "/volume/.zfs".  I've added that folder to my list of excluded directories:

org.freedesktop.Tracker.Miner.Files ignored-directories
  ['.zfs', 'ZZZ Snapshots', 'po', 'CVS', 'core-dumps', 'lost+found']

Tracker ignores hidden directories unless they're specifically marked
as an indexing directory in the config. You can check with "tracker info
<file>" nonetheless: if there is info spew, Tracker indexed the file
somehow.
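
A minimal sketch of such a spot check, for reading by eye rather than
for automated decisions. The function name and the example path are
made up for illustration, and it assumes "tracker" is in PATH and is
run with the same environment the daemon uses:

```shell
# Hypothetical helper: run "tracker info" on the first few regular
# files below a directory and print the output for inspection.
spot_check_indexed() {
    dir=$1
    limit=${2:-5}   # how many files to sample (default 5)
    find "$dir" -type f | head -n "$limit" | while IFS= read -r f; do
        printf '=== %s\n' "$f"
        # </dev/null keeps the command from eating the file list
        tracker info "$f" </dev/null
    done
}

# Example (path is an assumption based on the snapshot location):
# spot_check_indexed /volume/.zfs 5
```

If the sampled snapshot files come back with metadata, they were
indexed before the exclusion took effect.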


I'll see if that makes any difference.  If it was digging into those, that would greatly increase the 
number of files.

I'm not entirely sure how to start tracker with the valgrind command.  Tracker is currently started 
automatically by the Netatalk file server process.  In order to run the tracker processes, I have to 
execute the following:

PREFIX="/main-storage"
export XDG_DATA_HOME="$PREFIX/var/netatalk/"
export XDG_CACHE_HOME="$PREFIX/var/netatalk/"
export DBUS_SESSION_BUS_ADDRESS="unix:path=$PREFIX/var/netatalk/spotlight.ipc"
/usr/local/bin/tracker daemon -t

So after stopping the daemon, I just tried the following:

valgrind --leak-check=full --log-file=valgrind-tracker-extract-log --num-callers=30 \
    /usr/local/libexec/tracker-extract
valgrind --leak-check=full --log-file=valgrind-tracker-miner-fs-log --num-callers=30 \
    /usr/local/libexec/tracker-miner-fs

Hopefully that will get you what you want?

Definitely! Quite some memory was reported as definitely lost, and this
spotted a few embarrassing leaks... one now kindly fixed by Philip in
the msoffice module, others introduced by myself in the 1.7.x
series while porting part of our async machinery to GTask... I went
again through all my related changes and fixed some leaks affecting
both tracker-extract and tracker-miner-fs. These fixes should account
for everything I can see in the valgrind logs you provided.

So, master should work a lot better for you now. I think the leaks
found account for such high memory usage (or the majority of it). If
you can fetch the last 4 commits from git (or try with git itself)
that'd be great. I'll get a 1.7.2 release out of the door this week
nonetheless.

It would definitely be appreciated if you could run valgrind again
with these patches, and send info back if you see further
"definitely lost" memory reported in the summary at the end of the
valgrind logs.
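
For pulling just that figure out of the logs, something like the
following should do. It is a sketch: the function name is made up, and
it relies on the usual Memcheck summary line of the form
"==PID==    definitely lost: N bytes in M blocks".

```shell
# Hypothetical helper: print the "definitely lost" summary line from
# each valgrind log given as an argument.
leak_summary() {
    for log in "$@"; do
        # Match only the summary line; the per-record lines read
        # "... are definitely lost in loss record ..." and do not
        # start with "definitely lost:" after the ==PID== prefix.
        line=$(sed -n 's/^==[0-9]*== *\(definitely lost:.*\)$/\1/p' "$log" | tail -n 1)
        printf '%s: %s\n' "$log" "${line:-no summary line found}"
    done
}

# Example, using the log names from the valgrind commands:
# leak_summary valgrind-tracker-extract-log valgrind-tracker-miner-fs-log
```

A log without a summary line usually means the process was still
running or was killed before valgrind could write its final report.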

Cheers,
  Carlos

