Re: [Tracker] Re-index/re-scan on each restart?



Hi Martyn,

On Wed, 01 Sep 2010 19:58:52 +0100, Martyn Russell <martyn lanedo com> said:

Thanks for a quick and detailed reply!

   >> However, there is one thing which is a bit annoying: Tracker seems to
   >> re-scan/re-index all my files each time i start tracker (e.g., after
   >> reboot or re-login), even if in the previous run it seemed to have
   >> finished the complete scan/index (i.e., tracker-status showed
   >> everything as idle).  As i'm indexing a fair number of files, this
   >> takes several hours (almost days) even with the most aggressive scan
   >> settings.
   >> 
   >> Is this a feature or a bug?
   MR> 
   MR> Carlos recently fixed a bug which sounds similar to what you're
   MR> describing here, see commit:
   MR> 
   MR>   9339e32afca110fa08ac89a7161c080a9c70636e
   MR> 
   MR> This is in master, but not in 0.8. 

I must admit i haven't tried master/0.9 yet, as even with a brand-new
Lucid i couldn't get all the build dependencies pre-built/packaged.  I
guess i should take the plunge and hand-install them ...


   MR> I will cherry-pick this for
   MR> tomorrow's release. The difference in startup time is incredible: for
   MR> my 20k files on this desktop machine it takes ~35s, whereas before it
   MR> was taking minutes IIRC (that time is just to check+add monitors).

Speed-up sounds good :-)


   >> If the former, is this to avoid losing file modifications made
   >> while tracker isn't running? Of course, the best approach for this
   >> would be an indexer which runs independently of the desktop.
   MR> 
   MR> Not sure what you mean by that?

Sorry for being too brief (even though my overall mail seemed to be
rather on the lengthy side :-)

What i meant is the following: given that tracker runs only when the
desktop is up, there are potentially file modifications which have
happened between runs.  So one possible reason for a rescan after a
restart could be to check whether such changes have happened or not.
As said below, though, i would trade some inaccuracy for a (hopefully
greatly) reduced startup cost.  However, from what you write below, i
guess you unavoidably have to go through all files anyway to set up the
inotify monitors, so some form of scan always happens at some cost (but
not necessarily a re-index if a file's modification date hasn't changed
since the last run)?
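To illustrate the distinction i'm drawing (a sketch only, not Tracker's
actual code): the walk over every directory can't be avoided, but whether
a file gets re-indexed could hinge on comparing its current mtime with the
one stored from the previous run.  The function name `crawl` and the plain
dict standing in for the index are made up for the example.

```python
import os
from typing import Dict, List, Tuple


def crawl(root: str, known_mtimes: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Walk every directory under root, but report only files that are new or
    whose mtime differs from the value stored on the previous run, plus files
    that have disappeared. known_mtimes is updated in place."""
    seen = set()
    changed: List[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # This per-directory visit is the part that can't be skipped:
        # a real indexer would also add its inotify watch on dirpath here.
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except FileNotFoundError:
                continue  # removed between listing and stat
            seen.add(path)
            if known_mtimes.get(path) != mtime:
                changed.append(path)  # only these would need re-indexing
                known_mtimes[path] = mtime
    deleted = [p for p in known_mtimes if p not in seen]
    for p in deleted:
        del known_mtimes[p]
    return changed, deleted
```

On a second run over an unchanged tree both lists come back empty, so the
cost is the walk plus one stat() per file rather than a full re-extraction.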


   >> Short of that, it would be great to have an option which
   >> allows turning off that feature (i would definitely trade the rather
   >> rare missed modifications for not having a CPU and IO hog after
   >> each UI login)
   MR> 
   MR> This is possible in 0.9; there are config options:
   MR> 
   MR>   EnableMonitors=false (in 0.8 but will still crawl)
   MR>   CrawlingInterval=0 (in 0.9, set to -1 to disable crawling entirely)

I guess the second option would give me the
monitor-but-don't-(re)crawl behaviour i was looking for?  Yet another
reason to manually upgrade the build dependencies :-)
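In case it helps anyone else trying this, here is a small sketch for
flipping such a key without hand-editing.  It assumes the .cfg files are
GKeyFile-style INI files in $HOME/.config/tracker (where MR says below the
per-program .cfg files live) and deliberately only touches a key that
already exists, since i'm not sure which group each option lives in.
`set_existing_key` is a hypothetical helper, not part of Tracker.

```python
import configparser
from pathlib import Path


def set_existing_key(cfg_path: Path, key: str, value: str) -> bool:
    """Set `key` to `value` in whichever group of the file already contains it.
    Returns False (and writes nothing) if the key is absent, so a misspelled
    key or a version lacking that option is not silently added."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep CamelCase keys such as EnableMonitors
    parser.read(cfg_path)
    for section in parser.sections():
        if parser.has_option(section, key):
            parser.set(section, key, value)
            with open(cfg_path, "w") as fh:
                # GKeyFile style: no spaces around '='
                parser.write(fh, space_around_delimiters=False)
            return True
    return False
```

E.g. `set_existing_key(Path.home() / ".config/tracker/tracker-miner-fs.cfg",
"CrawlingInterval", "-1")`, where the file name tracker-miner-fs.cfg is my
assumption.  Note that configparser drops comments when it writes the file
back.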


   MR> The latter option above allows application-specific indexing only, so
   MR> the crawler doesn't burn any CPU time; however, it isn't the default
   MR> or recommended, since you then rely on applications to keep data up to
   MR> date.
   MR> 
   >> If it's a bug, here are some observations from looking at the
   >> log files in ~/.local/share/tracker:
   >> 
   >> - tracker-store.log is empty
   MR> 
   MR> All logs will be empty if Verbosity is < 1 in their respective .cfg
   MR> files in $HOME/.config/tracker.

All cfgs had Verbosity=0.  So i wasn't concerned about the empty logs; i
mentioned it just as side info that tracker-store didn't report any
errors.


   >> - tracker-miner-fs.log has by far the most messages (several
   >> hundred); half of them are like the one below
   >> 
   >> 01 Sep 2010, 08:28:13: Tracker-Critical **: Could not execute sparql:
   >> Unable to insert multiple values for subject
   >> `urn:uuid:0c147350-e9fe-9b16-ced3-2564b21ef9fa' and single valued
   >> property `dc:rights' (old_value:
   >> 'http://creativecommons.org/licenses/by/2.5/', new value:
   >> 'http://www.apache.org/licenses/LICENSE-2.0')
   MR> 
   MR> Those should be fixed. 

You mean fixed in 0.8.16? (As mentioned, i'm not running 0.9 yet)
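For what it's worth, here is my reading of that message as a toy model
(not Tracker's implementation): the log says dc:rights is a single valued
property, so once a subject has one value, a second, different value is
rejected rather than added.  My guess is that the file in question carries
two license statements.  `TinyStore` and `SingleValuedError` are made-up
names.

```python
from typing import Dict, List, Set, Tuple


class SingleValuedError(Exception):
    """Raised when a second, different value is inserted for a single-valued property."""


class TinyStore:
    """Toy (subject, property) -> values store with a cardinality check."""

    def __init__(self, single_valued: Set[str]) -> None:
        self.single_valued = single_valued
        self.values: Dict[Tuple[str, str], List[str]] = {}

    def insert(self, subject: str, prop: str, value: str) -> None:
        current = self.values.setdefault((subject, prop), [])
        if value in current:
            return  # re-inserting an identical value is harmless in this toy
        if prop in self.single_valued and current:
            raise SingleValuedError(
                f"multiple values for {subject} / {prop}: "
                f"old {current[0]!r}, new {value!r}"
            )
        current.append(value)  # multi-valued properties just accumulate
```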

   MR> Could you turn the verbosity up to 3 and create
   MR> a new bug report with the file that causes this? (if possible)

Ok, i'll change the config and hope my log files don't overwhelm my
disk :-) [as mentioned, i index a lot of files; the index is currently
about 4GB ...]
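To bump all of them at once, something like the following sketch should
do, assuming each .cfg in $HOME/.config/tracker has a plain `Verbosity=N`
line (as mine did with 0).  It edits the text line by line instead of
going through an INI parser, so comments and key order survive; files
without such a line are left alone.  `set_verbosity` is a hypothetical
helper.

```python
import re
from pathlib import Path
from typing import List

# [ \t]* rather than \s* so the match never swallows neighbouring newlines
VERBOSITY_LINE = re.compile(r"^([ \t]*Verbosity[ \t]*=[ \t]*)\d+[ \t]*$", re.MULTILINE)


def set_verbosity(cfg_dir: Path, level: int) -> List[Path]:
    """Rewrite an existing `Verbosity=N` line in every *.cfg file in cfg_dir
    and return the files that were changed."""
    changed = []
    for cfg in sorted(cfg_dir.glob("*.cfg")):
        text = cfg.read_text()
        new_text, count = VERBOSITY_LINE.subn(lambda m: f"{m.group(1)}{level}", text)
        if count:
            cfg.write_text(new_text)
            changed.append(cfg)
    return changed
```

E.g. `set_verbosity(Path.home() / ".config/tracker", 3)`, and the same call
with 0 to go back once the bug report is filed.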


   >> PS: when i installed it, i also ran ``make check'' and, after i
   >> figured out that i had to do a ``cd `/bin/pwd`'' to please some tests,
   >> it all worked fine except for the
   >> ``tracker-password-provider-test'' test, which didn't run as it
   >> expected some pre-configured pwd files which i didn't have (and
   >> couldn't immediately figure out how to create)
   MR> 
   MR> For 0.8? or 0.9? This should be fixed I would say.

This was for 0.8.16.  Haven't run 0.9.* yet for above mentioned
reasons.

-michael-


