Re: [Tracker] more issues with indexer-split
- From: Jamie McCracken <jamie mccrack googlemail com>
- To: Martyn Russell <martyn imendio com>
- Cc: Tracker-List <tracker-list gnome org>
- Subject: Re: [Tracker] more issues with indexer-split
- Date: Wed, 03 Sep 2008 10:28:53 -0400
On Wed, 2008-09-03 at 12:34 +0100, Martyn Russell wrote:
Jamie McCracken wrote:
trunk only checks directories (if a file in a directory is modified then
the directory's mtime is also altered, so there is no need to check every
file), hence startup is much faster.
Note: ONLY the mtime of the parent directory is updated. This is not
recursive. So if you have /foo/bar/baz/sliff.txt, the mtime of baz/ is
updated, but not that of bar/ or foo/.
This means you _HAVE_ to go into every directory to see if it has a
subdirectory with an mtime that has been updated.
that is what trunk does - it only checks directories (and
subdirectories). There's no need to ever check the mtime of a file unless
the parent directory's mtime has changed
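
The directory-only check described above can be sketched roughly as follows. This is a minimal illustration, not Tracker's actual code: `changed_directories` and the `known_mtimes` dict (standing in for whatever the database stores) are hypothetical names. It walks every directory because a change in baz/ does not touch the mtime of bar/ or foo/. Whether an in-place write to an existing file updates the parent directory's mtime on a given file system is the concern raised in this thread, and the sketch does not settle it.

```python
import os


def changed_directories(root, known_mtimes):
    """Return directories under root whose mtime differs from the stored one.

    known_mtimes is a hypothetical dict mapping a directory path to the
    st_mtime_ns value seen on the previous run.
    """
    changed = []
    # Every directory must be visited: directory mtimes do not propagate
    # upwards, so an unchanged parent says nothing about its children.
    for dirpath, _dirnames, _filenames in os.walk(root):
        mtime = os.stat(dirpath).st_mtime_ns
        # A directory that was never seen before also counts as changed.
        if known_mtimes.get(dirpath) != mtime:
            changed.append(dirpath)
    return changed
```

Only the directories returned here would need their files examined; everything else is skipped without a per-file stat or DB lookup.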
We can do this. Can you guarantee that on EVERY file system type the
parent directory mtime is updated when a file changes? I am not 100%
sure this is the case.
on all major platforms yes (*nix and windows)
Hmm. This worries me. How mtime is used across file systems tends to vary
slightly and this might come back to bite us.
It's not been a problem in the past for tracker and certainly won't be for
our target audience
it is for me - it's on the order of 3x slower than trunk at startup
What exactly is 3x slower? The crawling?
I have been thinking about this. The best solution here to me is to send
ALL files/directories to the indexer and let the indexer check the mtime
of a directory before deciding to process the files it holds. This
should dramatically reduce the DB lookups on startup. But if the
slowness is NOT in the indexer, then there is little you can do except
increase the throttle. Have you tested it again recently, since I made
the throttle mandatory whenever it is called (i.e. it is 5 + config value)?
This made a lot of difference for me.
trackerd should just pass directories at startup and let the indexer
work out what to process. D-Bus is not optimised for passing large numbers
of strings. Can the current design easily accommodate this?
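
As a rough sketch of the indexer side of this proposal: the daemon hands over only directory paths, and the indexer lists each directory itself and decides which files need work. The function name `files_to_process` and the `known_file_mtimes` dict (a stand-in for the indexer's database) are assumptions for illustration, not part of Tracker.

```python
import os


def files_to_process(directories, known_file_mtimes):
    """Given directories sent by the daemon, return the files to (re)index.

    known_file_mtimes is a hypothetical dict mapping a file path to the
    st_mtime_ns value recorded when the file was last indexed.
    """
    pending = []
    for directory in directories:
        # The indexer enumerates the directory itself, so only one string
        # per directory has to cross the message bus.
        with os.scandir(directory) as entries:
            for entry in entries:
                if not entry.is_file(follow_symlinks=False):
                    continue
                mtime = entry.stat(follow_symlinks=False).st_mtime_ns
                # New files and files with a different mtime are queued.
                if known_file_mtimes.get(entry.path) != mtime:
                    pending.append(entry.path)
    return sorted(pending)
```

With this split, the number of strings passed between the two processes grows with the number of changed directories rather than the number of files.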
jamie