Re: [Tracker] tracker 0.10: could not process <file>, creating minimal info [WAS: some files aren't indexed]



Hi!

On Fri, Oct 14, 2011 at 10:56 AM, Mildred Ki'Lya
<mildred593+ml tracker gmail com> wrote:
On 13 October 2011 18:05, Mildred Ki'Lya <mildred593+ml tracker gmail com>
wrote:

Hi,

I am using tracker on a huge directory, and some of the files aren't
indexed (approximately 35%). More precisely, tracker-info returns basic
metadata about the file (file name and such) but nothing about the content
of the file itself. I have a special exporter plugin that should fill in
some information, and I can't see it.

If I use tracker-control -f to tell tracker to index a particular file
that was not indexed, the file meta-data appears correctly.

 My wild guess: some files are making the extractor crash. We send the
files in batches to the extractor, so if one fails, you get only basic
metadata for the whole batch. Maybe now with tracker-control -f you are
indexing a file that didn't crash (but was in a "crashed" batch).

Tracker-Message: Could not process
'file:///home/tracker/gnatbugs/F7/F720-018/files/ntfs_mds.adb':
GDBus.Error:org.freedesktop.DBus.Error.NoReply: Message did not receive a
reply (timeout by message bus)
(tracker-miner-fs:22960): Tracker-DEBUG: Creating minimal info for new item
'file:///home/tracker/gnatbugs/F7/F720-018/files/ntfs_mds.adb' which had
error: 'GDBus.Error:org.freedesktop.DBus.Error.NoReply: Message did not
receive a reply (timeout by message bus)'

 This usually means that the extractor has crashed.

Do you have an idea how I could get a list of those files which couldn't be
harvested but only had minimal info created for them? Currently, I'm only
looking at a certain type of files and I am looking for specific metadata.
But I can't do that for any kind of file reliably.

Maybe this query can help you:
$ tracker-sparql -q "SELECT ?u WHERE { ?u a nfo:FileDataObject. FILTER
(NOT EXISTS { ?u a nie:InformationElement. })}"

It gives you everything that is a File but doesn't have any
interpretation (the extractor didn't tell us what it is).
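
If the miner-fs debug output is kept in a log file, the affected URIs can
also be pulled out of the "Creating minimal info" messages quoted above.
This is only a minimal sketch in Python: the function name is made up for
illustration, and the message format is assumed to be exactly the one
shown in this thread.

```python
import re

# Pattern for the miner-fs debug message quoted in this thread.
# Assumption: the message text is exactly as shown above; \s+ tolerates
# a line wrap between "new item" and the quoted URI.
MINIMAL_INFO_RE = re.compile(r"Creating minimal info for new item\s+'([^']+)'")


def minimal_info_uris(log_text):
    """Return URIs that only got minimal info, in first-seen order, no duplicates."""
    seen = set()
    uris = []
    for match in MINIMAL_INFO_RE.finditer(log_text):
        uri = match.group(1)
        if uri not in seen:
            seen.add(uri)
            uris.append(uri)
    return uris
```

Unlike the SPARQL query, this only sees files that were logged during the
run that produced the log, so it complements the query rather than
replacing it.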

Additionally, do you have an idea why I'm getting a "timeout by message
bus"? I assume this timeout is on a message sent from tracker-miner-fs to
tracker-store containing the SPARQL update commands, and because
tracker-store was overloaded it couldn't respond in a timely fashion. Is
there a way to tell tracker-miner-fs to wait a little until tracker-store is
available?

 Usually that timeout is between the miner-fs and the extractor. The
extractor crashes and the miner gets a timeout in its call asking for
metadata.

Also, will the files which had minimal metadata created for them be
crawled again next time, or will they be considered up to date and not
crawled again until they change? How does tracker-miner-fs determine if a
file needs to be updated? I suppose it's comparing its mtime with a
timestamp somewhere... If it's comparing with the tracker:added property,
then those files will never be re-indexed. But there is a
tracker:modified property I don't understand.

 Tracker uses the mtime to check if the file is up-to-date. Any change
in the file will modify the mtime and Tracker can detect it.
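
The idea of an mtime check can be illustrated with a few lines of Python.
This is not Tracker's actual code, only a sketch of the principle; the
function name and the stored_mtime argument (the value recorded when the
file was last indexed) are assumptions made for the example.

```python
import os


def needs_reindex(path, stored_mtime):
    """Return True if the file's current mtime differs from the recorded one.

    stored_mtime is a hypothetical value saved at index time, in whole
    seconds since the epoch.
    """
    current_mtime = int(os.stat(path).st_mtime)
    return current_mtime != int(stored_mtime)
```

Under such a scheme, a file whose content has not been touched since it
was last seen keeps the same mtime and is treated as up to date.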

 We should fix the extraction errors. Repeating the extraction can
lead to a loop of trying/crashing/trying...

Do you have some insights to share with me?

 In this case the answer is lost because of the crash. It is not that
it is coming very late, so a longer timeout wouldn't have any effect.

 Hope this helps,

Ivan


