Re: [Tracker] Re-index/re-scan on each restart?
- From: Martyn Russell <martyn lanedo com>
- To: Michael Steiner <michisteiner verizon net>
- Cc: tracker-list gnome org
- Subject: Re: [Tracker] Re-index/re-scan on each restart?
- Date: Wed, 01 Sep 2010 19:58:52 +0100
On 01/09/10 17:18, Michael Steiner wrote:
Hi,
Hi,
After switching from an old RHEL to Ubuntu Lucid, i finally dumped
Google Desktop and went for tracker (0.8.16). So far i'm quite happy
with what it does and from the architecture and open apis i'm
confident that it will get even better.
That's great to hear! :)
However, there is one thing which is a bit annoying: Tracker seems to
re-scan/re-index all my files each time i start tracker (e.g., after
reboot or re-login), even if in the previous run it seemed to have
finished the complete scan/index (i.e., tracker-status showed
everything as idle). As i'm indexing a fair bunch of files this takes
several hours (almost days) even with most aggressive scan settings.
Is this a feature or a bug?
Carlos recently fixed a bug which sounds similar to what you're
describing here, see commit:
9339e32afca110fa08ac89a7161c080a9c70636e
This is in master, but not in 0.8. I will cherry-pick this for
tomorrow's release. The difference in start up time is incredible, for
my 20k files on this desktop machine it takes ~35s, before it was taking
minutes IIRC (that time is just to check+add monitors).
If the former, it is probably an attempt to not lose any file
modifications made while tracker is not running? Of course, the best
approach for this would be to have an indexer which runs independently
of the desktop.
Not sure what you mean by that?
Short of that, it would be great to have an option which
allows turning off that feature (i definitely would trade the rather
rare missed modifications against not having a CPU and IO hog after
each UI login)
This is possible in 0.9; there are config options:
EnableMonitors=false (in 0.8 but will still crawl)
CrawlingInterval=0 (in 0.9, set to -1 to disable crawling entirely)
The latter option above allows application-specific indexing only, so the
crawler doesn't burn any CPU time. However, it isn't the default or
recommended, since you then rely on applications to keep data up to date.
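
As a rough sketch of how those two settings could be flipped from a script: the Python below edits key-file style text with the standard configparser module. The group names [Monitors] and [Indexing], the sample file content, and the helper set_options() are assumptions for illustration only; check the actual .cfg files in $HOME/.config/tracker before relying on any of them.

```python
import configparser
import io


def set_options(cfg_text, changes):
    """Return cfg_text with the {group: {key: value}} changes applied."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. EnableMonitors
    parser.read_string(cfg_text)
    for group, options in changes.items():
        if not parser.has_section(group):
            parser.add_section(group)
        for key, value in options.items():
            parser.set(group, key, value)
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Hypothetical file content; the group names are assumptions.
SAMPLE = "[General]\nVerbosity=0\n\n[Monitors]\nEnableMonitors=true\n"

updated = set_options(SAMPLE, {
    "Monitors": {"EnableMonitors": "false"},   # no file monitors
    "Indexing": {"CrawlingInterval": "-1"},    # 0.9: disable crawling
})
print(updated)
```

Note that configparser drops comments when it writes the text back out, so editing the file by hand is the safer route if you want to keep them.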
If it's a bug, following some observations after looking at the
log-files in ~/.local/share/tracker:
- tracker-store.log is empty
All logs will be empty if Verbosity is < 1 in their respective .cfg
files in $HOME/.config/tracker.
- tracker-extract.log contains a warning
01 Sep 2010, 09:53:06: Tracker-Warning **: Could not load module 'libextract-mplayer.so':
/usr/lib/tracker-0.8/extract-modules/libextract-mplayer.so: undefined symbol: tracker_extract_guess_date
which seems due to libextract-mplayer (and libextract-totem) using
a non-existing function ``tracker_extract_guess_date'' (rather than
presumably the existing ``tracker_date_guess'') and is unlikely to
have an impact here.
Checking the source, it seems these extractors are seldom built and
still call that function in master. I will put it on the TODO list
before tomorrow's release. Also after Michael Biebl pointed out there
are a few other miscellaneous issues with 0.8/0.9 in the build system, I
will try to fix those at the same time. I wasn't going to do a 0.8
release tomorrow, but I may well do given these recent findings.
The tracker_date_guess() should be the right one here.
I also see a few warnings along the lines of
01 Sep 2010, 09:58:05: Tracker-Warning **: Couldn't convert 14848 bytes from CP1252 to UTF-8: Invalid
byte sequence in conversion input
but this is probably also not relevant for this problem?
Sounds like the file encoding was incorrectly detected or the file is
not encoded in the correct encoding in the first place. It is entirely
possible that this is the case for MP3 files, for example. You would need
to turn up the verbosity to know more details (like the file involved).
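
To illustrate why a wrongly guessed encoding produces this kind of warning, here is a small Python sketch. Python's own codecs stand in for whatever conversion the extractor performs, and try_decode() and the sample bytes are made up for the example. The point is only that some byte values are unassigned in CP1252, so a strict conversion of data that is not really CP1252 can fail.

```python
def try_decode(data, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first encoding that decodes cleanly."""
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    return None, None


# 0x81 is unassigned in CP1252, so a strict conversion rejects it.
sample = b"caf\xe9 \x81"

try:
    sample.decode("cp1252")
    cp1252_ok = True
except UnicodeDecodeError:
    cp1252_ok = False

text, used = try_decode(sample)
print("cp1252 ok:", cp1252_ok, "| fallback used:", used)
```

A fallback chain like this always "succeeds" once it reaches latin-1, which is why it only hides the problem; finding the offending file via higher verbosity is still the real fix.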
- tracker-miner-fs.log has by far the most messages (several
hundreds), half of them are of the flavor of below
01 Sep 2010, 08:28:13: Tracker-Critical **: Could not execute sparql: Unable to insert multiple values
for subject `urn:uuid:0c147350-e9fe-9b16-ced3-2564b21ef9fa' and single valued property `dc:rights'
(old_value: 'http://creativecommons.org/licenses/by/2.5/', new value:
'http://www.apache.org/licenses/LICENSE-2.0')
Those should be fixed. Could you turn the verbosity up to 3 and create a
new bug report with the file that causes this? (if possible)
(three quarters of them include http://www.apache.org/licenses/LICENSE-2.0, for the rest i didn't spot a
pattern)
Being marked critical, maybe this is causing the re-index?
This is critical because it means the file didn't get indexed and can
mean the ontology and/or the SPARQL is incorrect. It shouldn't cause a
reindex at all, it just means the file is skipped and the code should be
fixed to handle (usually) a corner case.
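
The behaviour described above (second value for a single-valued property is rejected, the file is skipped, nothing is re-indexed) can be sketched with a toy store in Python. This is not Tracker's code; TinyStore, SingleValueError and index_files() are hypothetical names, and the subjects are invented for the example.

```python
class SingleValueError(Exception):
    """Raised when a single-valued property would get a second value."""


class TinyStore:
    def __init__(self, single_valued):
        self.single_valued = set(single_valued)
        self.values = {}  # (subject, property) -> list of values

    def insert(self, subject, prop, value):
        existing = self.values.setdefault((subject, prop), [])
        if value in existing:
            return  # same value again is harmless
        if prop in self.single_valued and existing:
            raise SingleValueError(
                f"multiple values for {subject} {prop}: "
                f"old {existing[0]!r}, new {value!r}")
        existing.append(value)


def index_files(store, files):
    """Insert each file's (property, value) pairs; return skipped subjects."""
    skipped = []
    for subject, pairs in files.items():
        try:
            for prop, value in pairs:
                store.insert(subject, prop, value)
        except SingleValueError:
            skipped.append(subject)  # file is skipped, others carry on
    return skipped


store = TinyStore(single_valued=["dc:rights"])
skipped = index_files(store, {
    "urn:example:a": [
        ("dc:rights", "http://creativecommons.org/licenses/by/2.5/"),
        ("dc:rights", "http://www.apache.org/licenses/LICENSE-2.0"),
    ],
    "urn:example:b": [
        ("dc:rights", "http://www.apache.org/licenses/LICENSE-2.0"),
    ],
})
print(skipped)
```

Only the file carrying two different dc:rights values ends up in the skipped list; the other one is stored normally.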
Any insights are welcome. Thanks!
Thanks for reporting these issues, they're most useful for us to look
into and fix.
-michael-
PS: when i installed it, i also ran ``make check'' and after i
figured out that i had to do a ``cd `/bin/pwd`'' to please some tests
it all worked fine with the exception of the
``tracker-password-provider-test'' test which didn't run as it
expected some pwd files pre-configured which i didn't have (and
couldn't immediately figure out how to create)
For 0.8 or 0.9? This should be fixed, I would say.
--
Regards,
Martyn