Hi Martyn

On 19.05.2014 at 14:32, Martyn Russell <martyn lanedo com> wrote:
Something is not right there, what function is trying to allocate that memory and for what variable?

I don't have a debug build so I only have a SBT posted initially:

fe3cfed2 g_logv (fe4669d9, 4, fe46d604, fdb3ec3c, 238e3c) + 1d2
fe3d00c2 g_log (fe4669d9, 4, fe46d604, fe46830f, 238e3c, 1494f725) + 32
fe3ce730 g_realloc (14833008, 238e3c, fdb3ec90, 14213008, fdb3ecec, 14213008) + 80
fe4191ba _g_gnulib_vasnprintf (0, fdb3ed4c, fd0bc35f, fdb3edf8, 14833008, 4) + 66a
fe41a333 _g_gnulib_vasprintf (fdb3edac, fd0bc35f, fdb3edf8, fe481a3c, fe481a3c, 8b15e68) + 33
fe413bd4 g_vasprintf (fdb3edac, fd0bc35f, fdb3edf8, 14213008, 8b24120, ffffffff) + 34
fe3ee0d6 g_string_append_vprintf (8168260, fd0bc35f, fdb3edf8, fd0cc588, 8b15e68) + 46
fe3ee2fb g_string_append_printf (8168260, fd0bc35f, 14213008, 1d, 8b15e68, 0) + 2b
fd0ac385 tracker_sparql_builder_object_string (8b15e68, 14632008, fdb3ee5c, fd0abcee, 8168260, fcad11bd) + b5
fd0ac55c tracker_sparql_builder_object_unvalidated (8b15e68, 14632008, fcad11f0, 86c26c8, 100000, 50) + 10c

Yeah, you sent this before. I meant more which extractor, but I think I have some idea here because of the text extraction in your previous emails :)
:) Didn't I mention that before? It's libextract-text.so of course. This is the full log from another tracker-control -r, forcing reindexing:

<http://pastebin.com/3mD9gEz8>
The log you originally sent:

...
12 May 2014, 14:46:19: Tracker: Read 65535 bytes from file, 16 bytes remaining until configured threshold is reached
12 May 2014, 14:46:19: Tracker: Read 16 bytes from file, 0 bytes remaining until configured threshold is reached
12 May 2014, 14:46:19: GLib: gmem.c:168: failed to allocate 2330172 bytes

I wonder if the text extractor is broken in some way and/or the GIO code is too on Solaris?
I was wondering that too and therefore decided to defer digging into this issue until I could check whether I could reproduce it on Linux too. Unfortunately I don't have the time at the moment to set up an environment on Linux where I can test with a Tracker version compiled from source.

Tracker is linked with glib2 from OpenCSW, which is at the newest version, 2.40.0:

<http://www.opencsw.org/packages/CSWglib2/>

I'm currently responsible for the glib2 package in OpenCSW and packaged 2.40.0 just a week ago.
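Coming back to the failed allocation itself: a quick cross-check ties the GLib message to the backtrace. The sketch below assumes the stack tool prints frame arguments in hex and that the second argument of the g_realloc frame is the requested size (g_realloc takes a pointer and a byte count). The helper name is made up for illustration.

```python
# Size reported by GLib in "failed to allocate 2330172 bytes".
FAILED_BYTES = 2330172

# Second argument of the g_realloc frame in the backtrace, as printed (hex).
REALLOC_ARG_HEX = "238e3c"


def hex_arg_matches(decimal_size: int, hex_arg: str) -> bool:
    """Return True if the hex frame argument equals the decimal size."""
    return int(hex_arg, 16) == decimal_size


if __name__ == "__main__":
    print(hex_arg_matches(FAILED_BYTES, REALLOC_ARG_HEX))
```

If the assumption holds, the request that failed is the g_realloc made while g_string_append_printf was growing a string inside tracker_sparql_builder_object_string, not the file read itself.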
Can you put the file somewhere to test so I can try on my local box here?
Attached.
Attachment:
file1.txt.zip
Description: Zip archive
I'm using a directory with ~485 copies of this file:

$ ls -1 /tank/dbd/dir/ | wc -l
487
$
Reading that many bytes from a 1.2 MB file isn't right either. I wonder if you have some upper limit on memory allocations on the system?
Afair tracker-extract itself sets a memory limit at startup, ~256 MB:

root beast:~# /opt/csw/libexec/tracker-extract -f /tank/dbd/
Locale 'TRACKER_LOCALE_LANGUAGE' was set to 'de_DE.UTF-8'
Locale 'TRACKER_LOCALE_TIME' was set to 'de_DE.UTF-8'
Locale 'TRACKER_LOCALE_COLLATE' was set to 'de_DE.UTF-8'
Locale 'TRACKER_LOCALE_NUMERIC' was set to 'de_DE.UTF-8'
Locale 'TRACKER_LOCALE_MONETARY' was set to 'de_DE.UTF-8'
Setting priority nice level to 19
Loading extractor rules... (/opt/csw/share/tracker/extract-rules)
Loaded rule '10-abw.rule'
Loaded rule '10-dvi.rule'
Loaded rule '10-epub.rule'
Loaded rule '10-gif.rule'
Loaded rule '10-html.rule'
Loaded rule '10-ico.rule'
Loaded rule '10-jpeg.rule'
Loaded rule '10-mp3.rule'
Loaded rule '10-msoffice.rule'
Loaded rule '10-oasis.rule'
Loaded rule '10-pdf.rule'
Loaded rule '10-png.rule'
Loaded rule '10-ps.rule'
Loaded rule '10-tiff.rule'
Loaded rule '11-msoffice-xml.rule'
Loaded rule '90-text-generic.rule'
Loaded rule '93-mplayer-generic.rule'
Extractor rules loaded
Setting memory limitations: total is 18,4 EB, minimum is 256 MB, recommended is ~1 GB
Virtual/Heap set to 268,4 MB (50% of total or MAXLONG)
MIME type guessed as 'inode/directory' (from GIO)
tracker_mimetype_info_get_module: assertion 'info != NULL' failed
Es wurden keine Metadaten gefunden oder keine Entdecker, die mit dieser Datei umgehen können
(in English: no metadata or extractor modules found to handle this file)

Total memory is a little less than 18,4 EB of course, 64 GB afair. :)
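To put the numbers from that startup output side by side, here is a small arithmetic sketch. It assumes the log prints decimal units with a German decimal comma, so that "268,4 MB" is 2^28 bytes (256 MiB) and "18,4 EB" is 2^64 bytes. Both are my reading of the output, not something taken from the Tracker source.

```python
# Assumed byte values behind the figures in the startup log.
LIMIT_BYTES = 2**28           # assumed meaning of "Virtual/Heap set to 268,4 MB"
TOTAL_REPORTED_BYTES = 2**64  # assumed meaning of "total is 18,4 EB"

# Size from "GLib: gmem.c:168: failed to allocate 2330172 bytes".
FAILED_REQUEST_BYTES = 2330172

# Convert to the decimal units the log appears to use.
limit_mb = round(LIMIT_BYTES / 10**6, 1)
total_eb = round(TOTAL_REPORTED_BYTES / 10**18, 1)

# Fraction of the limit that the failed request alone would occupy.
request_share = FAILED_REQUEST_BYTES / LIMIT_BYTES

if __name__ == "__main__":
    print(f"limit: {limit_mb} MB, reported total: {total_eb} EB")
    print(f"failed request is {request_share:.2%} of the limit")
```

Under these assumptions the failed request is under 1% of the limit, so a single allocation of that size would only fail if the process were already close to the cap.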
The other thing is, the default maximum bytes to read from a text file is 1048576. So we shouldn't even be reaching 2330172 bytes.
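The read pattern in the log quoted earlier is consistent with that limit: reads of at most 65535 bytes until 1048576 bytes have been consumed. The following is only a simulation of that bookkeeping, assuming the file is at least as large as the limit. The chunk size is taken from the log lines, the function name is made up, and none of this is Tracker's actual code.

```python
def simulate_reads(limit: int = 1048576, chunk: int = 65535):
    """Return (bytes_read, bytes_remaining) pairs for a capped, chunked read.

    Assumes the file holds at least `limit` bytes, so every read is full
    until the remaining budget is smaller than one chunk.
    """
    remaining = limit
    steps = []
    while remaining > 0:
        n = min(chunk, remaining)
        remaining -= n
        steps.append((n, remaining))
    return steps


if __name__ == "__main__":
    for n, remaining in simulate_reads():
        print(f"Read {n} bytes from file, {remaining} bytes remaining")
```

This gives sixteen reads of 65535 bytes followed by one read of 16 bytes, 1048576 bytes in total per file, which is well below the 2330172 bytes of the failed allocation.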
See below, it's as if it's reading up to the configured maximum of 1 MB from each file into an allocated buffer and fails to free the buffer. I checked that code path in the module and it uses g_content_from_file() (or whatever it was named) and seems to properly free it afterwards.

12 Mai 2014, 14:33:31: Tracker: Extracting metadata for 'file:///tank/dbd/dir11/file185.txt'
12 Mai 2014, 14:33:31: Tracker: MIME type passed to us as 'text/plain'
12 Mai 2014, 14:33:31: Tracker: Using /opt/csw/lib/tracker-1.0/extract-modules/libextract-text.so...
12 Mai 2014, 14:33:31: Tracker: Starting to read 'file:///tank/dbd/dir11/file185.txt' up to 1048576 bytes...
12 Mai 2014, 14:33:31: Tracker: Read 65535 bytes from file, 983041 bytes remaining until configured threshold is reached
12 Mai 2014, 14:33:31: Tracker: Read 65535 bytes from file, 917506 bytes remaining until configured threshold is reached
12 Mai 2014, 14:33:31: Tracker: Read 65535 bytes from file, 851971 bytes remaining until configured threshold is reached
12 Mai 2014, 14:33:31: Tracker: Read 65535 bytes from file, 786436 bytes remaining until configured threshold is reached
12 Mai 2014, 14:33:31: Tracker: Read 65535 bytes from file, 720901 bytes remaining until configured threshold is reached
12 Mai 2014, 14:33:31: Tracker: Read 65535 bytes from file, 655366 bytes remaining until configured threshold is reached
12 Mai 2014, 14:33:31: Tracker: Read 65535 bytes from file, 589831 bytes remaining until configured threshold is reached
12 Mai 2014, 14:33:31: Tracker: Read 65535 bytes from file, 524296 bytes remaining until configured threshold is reached
12 Mai 2014, 14:33:31: Tracker: Read 65535 bytes from file, 458761 bytes remaining until configured threshold is reached
12 Mai 2014, 14:33:31: Tracker: Read 65535 bytes from file, 393226 bytes remaining until configured threshold is reached
12 Mai 2014, 14:33:31: Tracker: Read 65535 bytes from file, 327691 bytes remaining until configured threshold is reached
12 Mai 2014, 14:33:31: Tracker: Read 65535 bytes from file, 262156 bytes remaining until configured threshold is reached
12 Mai 2014, 14:33:31: Tracker: Read 65535 bytes from file, 196621 bytes remaining until configured threshold is reached
12 Mai 2014, 14:33:31: Tracker: Read 65535 bytes from file, 131086 bytes remaining until configured threshold is reached
12 Mai 2014, 14:33:31: Tracker: Read 65535 bytes from file, 65551 bytes remaining until configured threshold is reached
12 Mai 2014, 14:33:31: Tracker: Read 65535 bytes from file, 16 bytes remaining until configured threshold is reached
12 Mai 2014, 14:33:31: Tracker: Read 16 bytes from file, 0 bytes remaining until configured threshold is reached
12 Mai 2014, 14:33:31: Tracker: Done (2 objects added)

Cheerio!
-Ralph