Re: [Tracker] tracker 0.10: could not process <file>, creating minimal info [WAS: some files aren't indexed]





On 14 October 2011 10:55, Ivan Frade <ivan frade gmail com> wrote:
Hi!

On Fri, Oct 14, 2011 at 10:56 AM, Mildred Ki'Lya
<mildred593+ml tracker gmail com> wrote:
> On 13 October 2011 18:05, Mildred Ki'Lya <mildred593+ml tracker gmail com>
> wrote:
>>
>> Hi,
>>
>> I am using tracker on a huge directory, and some of the files aren't
>> indexed (approximately 35%). More precisely, tracker-info returns basic
>> metadata about the file (file name and such) but nothing about the content
>> of the file itself. I have a special exporter plugin that should fill in
>> some information, and I can't see it.

>> If I use tracker-control -f to tell tracker to index a particular file
>> that was not indexed, the file meta-data appears correctly.

 My wild guess: some files are making the extractor crash. We send the
files in batches to the extractor so if one fails, you get only basic
metadata in the whole batch. Maybe now with tracker-control -f you are
indexing a file that didn't crash (but was in a "crashed" batch).
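
To illustrate the batching behaviour described above, here is a minimal sketch. It is not Tracker's actual code: `index_batch`, `find_culprits`, and the `extract` callback are hypothetical names, and a Python exception stands in for an extractor crash.

```python
from typing import Callable, Dict, List


def index_batch(paths: List[str], extract: Callable[[str], dict]) -> Dict[str, dict]:
    """Extract a whole batch; one failure degrades every file in the batch."""
    results: Dict[str, dict] = {}
    try:
        for path in paths:
            results[path] = extract(path)
    except Exception:
        # Stand-in for an extractor crash: the whole batch falls back
        # to minimal info, including files that were fine on their own.
        return {path: {"minimal": True} for path in paths}
    return results


def find_culprits(paths: List[str], extract: Callable[[str], dict]) -> List[str]:
    """Retry each file alone to isolate the ones that make extraction fail."""
    culprits = []
    for path in paths:
        try:
            extract(path)
        except Exception:
            culprits.append(path)
    return culprits
```

Retrying files one at a time is the same idea as forcing a single file with tracker-control -f: a healthy file from a "crashed" batch then gets its full metadata.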

That's quite possible. I'll try running the extractor in the foreground to see what is causing this error.

> Do you have an idea how I could get a list of those files which couldn't be
> harvested but only had minimal info created for them? Currently, I'm only
> looking at a certain type of files and I am looking for specific metadata.
> But I can't do that for any kind of file reliably.

Maybe this query can help you:
$ tracker-sparql -q "SELECT ?u WHERE { ?u a nfo:FileDataObject. FILTER
(NOT EXISTS { ?u a nie:InformationElement. })}"

It gives you everything that is a File but doesn't have any
interpretation (the extractor didn't tell us what it is).
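
For scripting, the same query could be wrapped in a few lines of Python. This is only a sketch: the helper names are hypothetical, and the parser assumes that tracker-sparql prints each result on its own indented line below a header line, which may differ between versions.

```python
import subprocess
from typing import List

# Same query as above: files with no interpretation from the extractor.
QUERY = (
    "SELECT ?u WHERE { ?u a nfo:FileDataObject . "
    "FILTER (NOT EXISTS { ?u a nie:InformationElement . }) }"
)


def parse_results(output: str) -> List[str]:
    """Keep indented, non-empty lines (assumed output layout, see above)."""
    return [
        line.strip()
        for line in output.splitlines()
        if line.startswith(" ") and line.strip()
    ]


def files_without_interpretation() -> List[str]:
    """Run tracker-sparql and return the URIs it reports."""
    completed = subprocess.run(
        ["tracker-sparql", "-q", QUERY],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_results(completed.stdout)
```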

I didn't know this syntax existed, thank you. It will help me.

> Also, will the files which had only minimal metadata created for them be
> crawled again next time, or will they be considered up to date and not
> crawled again until they change? How does tracker-miner-fs determine if a
> file needs to be updated? I suppose it's comparing its mtime with a
> timestamp somewhere... If it's comparing with the tracker:added property,
> then those files will never be re-indexed. But there is a
> tracker:modified property I don't understand.

 Tracker uses the mtime to check if the file is up-to-date. Any change
in the file will modify the mtime and Tracker can detect it.
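
A minimal sketch of such an mtime check, assuming the store keeps the mtime seen at indexing time as whole seconds (`needs_reindex` and `stored_mtime` are hypothetical names, not Tracker's API):

```python
import os


def needs_reindex(path: str, stored_mtime: int) -> bool:
    """True if the file's current mtime differs from the recorded one."""
    current_mtime = int(os.stat(path).st_mtime)
    return current_mtime != stored_mtime
```

Under this assumption, a file that only got minimal info keeps the same mtime, so it would not be picked up again until it changes.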

 We should fix the extraction errors. Repeating the extraction can
lead to a loop of trying/crashing/trying...

Would it be possible to add a tracker:error attribute to the file, so that:
- we can very easily find files that failed
- we could have a tool monitoring those files, so errors aren't silently thrown away

Perhaps the tracker-miner-fs message that says there was an error could be raised to the error level instead; that way it would be displayed in the log files. I don't like having errors silently ignored.

Do you have an idea how I could monitor errors from tracker?

Thank you.

Mildred



