Re: [Tracker] tracker 0.10: could not process <file>, creating minimal info [WAS: some files aren't indexed]



On 13 October 2011 18:05, Mildred Ki'Lya <mildred593+ml tracker gmail com> wrote:
Hi,

I am using tracker on a huge directory, and some of the files aren't indexed (approximately 35%). More precisely, tracker-info returns basic metadata about the file (file name and such) but nothing about the content of the file itself. I have a special exporter plugin that should fill in some information, but that information doesn't show up.

If I use tracker-control -f to tell tracker to index a particular file that was not indexed, the file meta-data appears correctly. But because the number of files not indexed is huge (24500), and new files appear constantly, I can't really index files manually. Note: tracker-miner-fs -e says the file is eligible for mining.

I run tracker-miner-fs -v 3 -n and I get the following error message near the end:


============================================================
Tracker-Message:   No files qualify for updates
Tracker-Message:   No files qualify for updates
Tracker-Message: Could not process 'file:///home/tracker/gnatbugs/F7/F720-018/files/ntfs_mds.adb': GDBus.Error:org.freedesktop.DBus.Error.NoReply: Message did not receive a reply (timeout by message bus)
(tracker-miner-fs:22960): Tracker-DEBUG: Creating minimal info for new item 'file:///home/tracker/gnatbugs/F7/F720-018/files/ntfs_mds.adb' which had error: 'GDBus.Error:org.freedesktop.DBus.Error.NoReply: Message did not receive a reply (timeout by message bus)'
Tracker-Message:   No files qualify for updates
Tracker-Message:   No files qualify for updates
...
Tracker-Message: Flushing SPARQL buffer, reason: Queue handlers WAIT
Tracker-Message: (Sparql buffer) Finished array-update with 3 tasks
...
Tracker-Message: Flushing SPARQL buffer, reason: Queue handlers NONE
Tracker-Message: (Sparql buffer) Finished array-update with 1 tasks
Tracker-INFO: --------------------------------------------------
Tracker-INFO: Total directories : 83225 (113 ignored)
Tracker-INFO: Total files       : 327194 (327045 ignored)
Tracker-INFO: Total monitors    : 83112
Tracker-INFO: Total processed   : 149 (465 notified, 136 with error)
Tracker-INFO: --------------------------------------------------

Tracker-INFO: Idle
Tracker-INFO: Finished mining in seconds:58085.199916, total directories:83225, total files:327194
Tracker-Message: All miners are now finished
Tracker-Message: Shutdown started
Tracker-Message:   Need mtime check file:'/home/tracker/tracker/tracker-home/.cache/tracker/no-need-mtime-check.txt' created
Tracker-Message: Stopping disk space check

OK
============================================================

I got this error while tracker-control showed crawling was still at 1%, after running all night long.

The error message (minimal info for new item) seems pretty straightforward, and I could detect at least 24387 files which had "minimal info for new item" created. There are possibly more.

Do you have an idea how I could get a list of the files that couldn't be harvested and only had minimal info created for them? Currently, I'm only looking at a certain type of file and checking for specific metadata, but I can't do that reliably for every kind of file.
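
As a stopgap, such a list can be recovered from the verbose log itself. The following is a minimal sketch, assuming the output of tracker-miner-fs -v 3 was saved to a file and that the DEBUG lines keep exactly the format shown in the log above. The function name minimal_info_paths and the script and log file names are hypothetical, not part of tracker.

```python
import re
import sys
from urllib.parse import unquote, urlparse

# Pattern taken from the DEBUG line in the log above; assumed to be stable.
MINIMAL_INFO_RE = re.compile(
    r"Creating minimal info for new item '(.+?)' which had error"
)


def minimal_info_paths(lines):
    """Yield each file that only got minimal info, once, in log order.

    file:// URIs are converted to local paths; other URIs are kept as-is.
    """
    seen = set()
    for line in lines:
        match = MINIMAL_INFO_RE.search(line)
        if match is None:
            continue
        uri = match.group(1)
        if uri in seen:
            continue
        seen.add(uri)
        parsed = urlparse(uri)
        yield unquote(parsed.path) if parsed.scheme == "file" else uri


# Only runs when a log file is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for path in minimal_info_paths(log):
            print(path)
```

Saved as, say, minimal_info.py, it would be run as "python3 minimal_info.py miner.log". This only covers files that show up in a saved log, so it does not answer the question of how to query tracker itself for them.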

Additionally, do you have an idea why I'm getting a "timeout by message bus"? I assume this timeout is on a message sent from tracker-miner-fs to tracker-store containing the SPARQL update commands, and that tracker-store was too overloaded to respond in a timely fashion. Is there a way to tell tracker-miner-fs to wait a little until tracker-store is available?

Also, will the files that had minimal metadata created for them be crawled again next time, or will they be considered up to date and not crawled again until they change? How does tracker-miner-fs determine whether a file needs to be updated? I suppose it compares the file's mtime with a timestamp stored somewhere. If it compares against the tracker:added property, then those files will never be re-indexed. But there is also a tracker:modified property that I don't understand.

Note: I already increased the dbus timeout to 15s because I sometimes got a timeout on some big queries I ran against tracker. I'm using the following dbus configuration file:

<!DOCTYPE busconfig PUBLIC "-//freedesktop//DTD D-Bus Bus Configuration 1.0//EN"
 "http://www.freedesktop.org/standards/dbus/1.0/busconfig.dtd">
<busconfig>
  <include ignore_missing="yes">/etc/dbus-1/session.conf</include>
  <limit name="reply_timeout">15000000</limit> <!-- 15s = 15 000 000 µs -->
</busconfig>

Do you have some insights to share with me?

Thanks,

Mildred


