Re: [Tracker] Memory usage for tracker-extract and tracker-miner-fs



Carlos, et al.,

I'm sorry, but I cannot seem to build the master branch right now.  I ran the autogen.sh script and then 
configure dies on me with this:

checking for pkg-config... /usr/bin/pkg-config
checking pkg-config is at least version 0.16... yes
./configure: line 19136: syntax error near unexpected token `0.9.5'
./configure: line 19136: `GOBJECT_INTROSPECTION_CHECK(0.9.5)'


I'm not entirely sure what's going on there.  (Sorry, programming is not my forte.)    I'll have to wait for 
1.7.2 and give that a try.  I can only work on this in the evenings when I'm not at work and the server that's 
housing all of this data is otherwise not terribly busy.
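
In case it's useful to anyone else hitting the same thing: from what I can tell, an unexpanded 
GOBJECT_INTROSPECTION_CHECK macro like that usually means the gobject-introspection m4 file wasn't on 
autoconf's search path when autogen.sh ran, i.e. the development package for gobject-introspection isn't 
installed.  On CentOS 7 the fix is presumably something along these lines (the package name is my guess 
and I haven't verified it on this box):

yum install gobject-introspection-devel   # provides the introspection m4 macros
./autogen.sh                              # regenerate configure so the macro expands
./configure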

Cheers!
-Joe Rhodes




On Jan 11, 2016, at 5:21 AM, Philip Van Hoof <philip codeminded be> wrote:

Hi Carlos,

Looks like my git account on GNOME has been closed, so here is a patch
for one of the issues in that valgrind log.


Kind regards,

Philip

On Sun, 2016-01-10 at 16:05 -0500, Joe Rhodes wrote:
Carlos:

Yes, there are a LOT of files on this volume.  The makeup of the 5 TB of data is PDFs, Photoshop files, 
Word docs, InDesign & Illustrator docs.  There are very few large files like MP3s or videos.  If I 
disable all the extractors and just build an index based on file names, I get an index of about 3 GB.  

I did notice that I was possibly indexing all of my snapshots of my volumes. I'm using ZFS and they're 
available under "/volume/.zfs".  I've added that folder to my list of excluded directories:

org.freedesktop.Tracker.Miner.Files ignored-directories ['.zfs', 'ZZZ Snapshots', 'po', 'CVS', 
'core-dumps', 'lost+found']
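
For anyone wanting to reproduce that setting, the equivalent gsettings call would presumably be along 
these lines (assuming the schema and key shown above):

gsettings set org.freedesktop.Tracker.Miner.Files ignored-directories \
  "['.zfs', 'ZZZ Snapshots', 'po', 'CVS', 'core-dumps', 'lost+found']"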

I'll see if that makes any difference.  If it was digging into those, that would greatly increase the 
number of files.

I'm not entirely sure how to start tracker with the valgrind command.  Tracker is currently started 
automatically by the Netatalk file server process.  In order to run the tracker processes, I have to 
execute the following:

PREFIX="/main-storage"
export XDG_DATA_HOME="$PREFIX/var/netatalk/"
export XDG_CACHE_HOME="$PREFIX/var/netatalk/"
export DBUS_SESSION_BUS_ADDRESS="unix:path=$PREFIX/var/netatalk/spotlight.ipc"
/usr/local/bin/tracker daemon -t

So after stopping the daemon, I just tried the following:

valgrind --leak-check=full --log-file=valgrind-tracker-extract-log --num-callers=30 \
  /usr/local/libexec/tracker-extract
valgrind --leak-check=full --log-file=valgrind-tracker-miner-fs-log --num-callers=30 \
  /usr/local/libexec/tracker-miner-fs

Hopefully that will get you what you want?

I've uploaded the log files to Dropbox.  Hopefully you can easily grab those without having to jump 
through too many hoops. 

https://www.dropbox.com/s/o3w10hnaa6ikvn3/valgrind-tracker-extract-log.gz?dl=0
https://www.dropbox.com/s/5s4vqk0owrf5gjd/valgrind-tracker-miner-fs-log.gz?dl=0

I let them run for a bit.  I could definitely see RAM usage start to climb.  I didn't bother to let it go 
to GBs in size.  I think it was at about 300 MB when I hit Ctrl-C.

Cheers!
-Joe Rhodes


On Jan 10, 2016, at 2:25 PM, Carlos Garnacho <carlosg gnome org> wrote:

Hi Joe,

On Sun, Jan 10, 2016 at 6:40 PM, Joe Rhodes <lists joerhodes com> wrote:
I have just compiled and installed tracker-1.7.1 on a CentOS 7.1 box.  I
just used the default configuration ("./configure" with no additional
options).  I'm indexing around  5 TB of data.  I'm noticing that both the
tracker-extract   and  tracker-miner-fs processes are using a large amount
of RAM.  The tracker-extract process is currently using 11 GB of RAM (RES
not VIRT as reported by top), while the tracker-miner-fs is sitting at 4.5
GB.

Both processes start out modestly, but continue to grow as they do their
work.  The tracker-miner-fs levels off at 4.5 GB once it appears to have
finished crawling the entire volume. (Once the CPU usage goes back down to
near 0.)   The tracker-extract process also continues to grow as it works.
Once it is done, it levels off.  Last time it stayed at about 9 GB.

If I restart tracker (with: 'tracker daemon -t' followed by 'tracker daemon
-s') a similar thing will happen with tracker-miner-fs.  It will grow back
to 4.5 GB as it crawls its way across the entire volume.  The
tracker-extract process though, because all of the files were just indexed
and it doesn't need to do much, uses a very modest amount of RAM. I don't
have that number right now because I'm re-indexing the entire volume, but
it's well below 100 MB.

Is this expected behaviour?  Or is there a memory leak?  Or perhaps tracker
just isn't designed to operate on this large of a volume?

It totally sounds like a memory leak, although it is strange that
it hits both tracker-miner-fs and tracker-extract.

There is obviously an impact to running Tracker on large directory
trees, such as:

- Possibly exhausted inotify handles; the directories we fail to
create a monitor for would just be checked/updated on the next miner
startup (there is a generic sketch for inspecting/raising that limit
right after this list)
- More (longer, rather) IO/CPU usage during startup, because the miner
has to check mtimes for all directories and files
- The miner also needs to keep an in-memory representation of the
directory tree for accounting purposes (file monitors, etc). Regular
files are represented in this model only as long as they're being
checked/processed, and disappear soon after. This might account for a
memory peak at startup, if there are many items left to process, because
Tracker dumps files into processing queues ASAP, but I think the
memory usage should be nowhere near as big.
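
As a side note on the inotify point above: the per-user watch limit is a
plain sysctl, so it can be inspected and raised outside of Tracker entirely.
A generic sketch (the 524288 value is just an example):

cat /proc/sys/fs/inotify/max_user_watches        # current limit
sysctl fs.inotify.max_user_watches=524288        # raise it for the running system (as root)
echo fs.inotify.max_user_watches=524288 > /etc/sysctl.d/90-inotify.conf   # persist across reboots

Directories beyond that limit simply don't get a monitor and are re-checked
on the next miner startup, as described above.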

So I think nothing accounts for such memory usage in tracker-miner-fs;
the only known source of unbounded memory growth is the number of
directories (and regular files, for the peak at startup) to be indexed,
but you would need millions of those to have tracker-miner-fs grow up
to 4.5 GB.

And tracker-extract has a much shorter memory: it just checks the
files that need extraction in small batches, and processes those one
by one before querying the next batch. 9 GB shouts memory leak. We've
had other memory leak situations in tracker-extract, and the culprit
is most often in one of the various libraries we use in our extract
modules; if many files end up triggering that module (and the leaky
code path in the specific library), the effect will accumulate over
time.

The downside of this situation is that most often we Tracker
developers can't reproduce it unless we have a file that triggers the
leak, so we can fix it or channel it to the appropriate maintainers. It
would be great if you could provide valgrind logs; just run:

valgrind --leak-check=full --log-file=valgrind-log --num-callers=30 \
  /path/to/built/tracker-extract

Hit Ctrl-C when enough time has passed, and send back the valgrind-log
file. The same applies to tracker-miner-fs.


My tracker meta.db file is about 13 GB right now, though still growing.  I
suspect it's close to fully indexed.

This is also suspicious; you again need either a hideous number of
files to have meta.db grow that large, or an equally hideous amount of
plain-text content that gets indexed. Out of curiosity, how many
directories/files does that partition contain? Is the content
primarily video/documents/etc.?
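
If you don't have those numbers handy, a rough count from the shell would do;
a generic sketch (substitute the actual mount point of the indexed volume):

find /path/to/volume -type d | wc -l    # number of directories
find /path/to/volume -type f | wc -l    # number of regular files

It will take a while on 5 TB, but it gives an order of magnitude to compare
against the meta.db size.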

Cheers,
Carlos

_______________________________________________
tracker-list mailing list
tracker-list gnome org
https://mail.gnome.org/mailman/listinfo/tracker-list

<0001-Fix-small-memory-leak.patch>


