Re: [Tracker] Getting ready for TRUNK: Config options
- From: Jamie McCracken <jamie mccrack googlemail com>
- To: Martyn Russell <martyn imendio com>
- Cc: Tracker-List <tracker-list gnome org>
- Subject: Re: [Tracker] Getting ready for TRUNK: Config options
- Date: Wed, 02 Jul 2008 18:26:21 -0400
On Wed, 2008-07-02 at 16:41 +0100, Martyn Russell wrote:
Hi all,
So as part of finishing up the indexer-split branch and getting things
into a state where we can begin merging, I have been looking at the
config options we have, checking that we implement them and, if we
don't, finding out whether we need them.
Here are the current config options and some questions/comments about them.
WORKING OPTIONS:
================
- Verbosity
- Initial Sleep
- Enable Indexing
- Min Word Length
- Max Word Length
- Language
- Enable Stemmer
- Max Bucket Count
- Min Bucket Count
- Enable Xesam
NOT WORKING OPTIONS:
====================
- Low Memory Mode
This option currently has no effect in the indexer-split branch.
In TRUNK, it is used to:
1. Set the cache size to 1/2 of what it is normally when loading DBs
2. Set the array in update_word_table() to be 1/2 size.
3. It affects these variables (which are usually 1/2 in low mem mode):
   a) tracker->memory_limit = 16000 * 1024;
   b) tracker->max_process_queue_size = 5000;
   c) tracker->max_extract_queue_size = 5000;
   d) tracker->word_detail_limit = 2000000;
   e) tracker->word_detail_min = 0;
   f) tracker->word_count_limit = 500000;
   g) tracker->word_count_min = 0;
For #1, I think this makes sense to reimplement
agreed
For #2, I think this is pointless if the array grows
I guess
For #3a, The memory limit is used to know when to flush the word cache.
This needs reimplementing in the indexer.
yes
For #3b, The process queue size is used to know how big the files queue
can get before it should be processed in the database. This is done now
by the indexer and I am not sure it is pertinent any longer.
could get away without this
For #3c, This is the same as #3b.
this isn't used
For #3d, This is unused in TRUNK.
For #3e, This is unused in TRUNK
For #3f, This is unused in TRUNK.
For #3g, This is unused in TRUNK
we need to limit the number of hits per word if we are to use
stack-allocated arrays - however, I think this is done elsewhere using a
#define in the code, so those vars are likely no longer needed
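As a rough illustration of the halving described under #3 (and of the flush check from #3a), here is a minimal sketch. It is not code from TRUNK: the IndexerLimits struct and both function names are hypothetical, and it assumes the values listed above are the normal-mode defaults.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the limits discussed above;
 * this is not Tracker's real struct. */
typedef struct {
    size_t memory_limit;            /* bytes of word cache before a flush */
    size_t max_process_queue_size;  /* files queued before processing */
    size_t max_extract_queue_size;
} IndexerLimits;

/* Return the limits for the given mode. Assumes the values quoted in the
 * thread are the normal defaults and that low memory mode halves them. */
IndexerLimits indexer_limits_for_mode(bool low_memory)
{
    IndexerLimits limits = {
        .memory_limit = 16000 * 1024,
        .max_process_queue_size = 5000,
        .max_extract_queue_size = 5000,
    };

    if (low_memory) {
        limits.memory_limit /= 2;
        limits.max_process_queue_size /= 2;
        limits.max_extract_queue_size /= 2;
    }
    return limits;
}

/* True once the word cache has reached the limit and should be flushed. */
bool word_cache_needs_flush(const IndexerLimits *limits, size_t cache_bytes)
{
    return cache_bytes >= limits->memory_limit;
}
```

The indexer would check word_cache_needs_flush() after adding words to its cache; how the cache size is measured is left open here.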
- NFS Locking
Do we need this? What is it for - as far as I can see, it is just some
simple locking mechanism using a file on the disk. What needs this? Can
we remove it?
no - we need to make sure on NFS that only one indexer can be launched
at any one time per user (note: different session bus, so we can't use
dbus locking)
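For reference, a file-based lock of the kind described above can be sketched as follows. This is an illustration only, not Tracker's implementation: the function names and the choice to store the PID are assumptions. Note that open(2) documents O_EXCL on NFS as reliable only with NFSv3 or later on Linux 2.6 or later; on older setups the usual workaround is to create a unique file and link(2) it to the lock name.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to take the lock by creating lock_path exclusively.
 * Returns true only if this process created the file and so owns the lock.
 * (Hypothetical helper; names are for illustration.) */
bool indexer_lock_acquire(const char *lock_path)
{
    int fd = open(lock_path, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0) {
        /* Typically EEXIST: another indexer already holds the lock. */
        return false;
    }

    /* Record the owner's PID so a stale lock can be diagnosed by hand. */
    char buf[32];
    int len = snprintf(buf, sizeof buf, "%ld\n", (long) getpid());
    bool ok = len > 0 && write(fd, buf, (size_t) len) == (ssize_t) len;

    close(fd);
    if (!ok) {
        unlink(lock_path);
        return false;
    }
    return true;
}

/* Release the lock by removing the file. */
bool indexer_lock_release(const char *lock_path)
{
    return unlink(lock_path) == 0;
}
```

Because the lock lives in the user's (shared) home directory rather than on a per-machine session bus, a second indexer started on another host sees the same file. Handling stale locks after a crash is not covered by this sketch.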
- Watch Directory Roots
- Crawl Directory
- No Watch Directory
- No Index File Types
These closely map to the .module files. I would like to rename them to
map exactly so they are obviously an override or addition to the
non-user space config of each module. What are your thoughts here?
that's fine
I would like to rename "WatchDirectoryRoots". Everyone, even GIO, uses
"monitor" instead of "watch", and you can supply a list so it isn't just
one. Also, should we have ANOTHER option like we do in the module files
right now to be able to set "MonitorRecursiveDirectories" and
"MonitorDirectories"? We assume they are always recursive right now.
that's fine so long as we provide an upgrade path for all changed options
I would like to rename "CrawlDirectory". This needs integrating with the
.module files.
I would like to rename "NoWatchDirectory". This is currently working.
- Enable Watching
I would like to rename this to "EnableMonitors"
- Throttle
This needs reimplementing in the indexer. Right now, we don't really
need it - at least my machine copes fine without it, but I think it
might be a good idea to add that back.
yes pls - laptops can get very hot (and with noisy fans too) so some
scaling is needed
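One simple way to get this kind of scaling is to pause after each indexed item for a time proportional to the throttle value. The sketch below only shows the arithmetic; the range (0 to 20) and the step per level are assumptions chosen for illustration, and the function name is hypothetical.

```c
#include <stdint.h>

#define THROTTLE_MAX       20    /* assumed upper bound, for illustration */
#define THROTTLE_STEP_USEC 5000  /* assumed pause added per throttle level */

/* Pause, in microseconds, to insert after each indexed item.
 * 0 (or a negative value) means no throttling; values above the
 * maximum are clamped. */
uint32_t throttle_delay_usec(int throttle)
{
    if (throttle <= 0) {
        return 0;
    }
    if (throttle > THROTTLE_MAX) {
        throttle = THROTTLE_MAX;
    }
    return (uint32_t) throttle * THROTTLE_STEP_USEC;
}
```

The indexer's main loop would then sleep for that long (for example with g_usleep()) between files, trading indexing speed for lower CPU load and heat.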
- Enable File Content Indexing
- Enable Thumbnails
These need implementing. Plus it would be nicer to rename
"EnableThumbnails" to "EnableThumbnailIndexing", which is more
consistent. I am assuming these will both be implemented in the indexer.
yes - the former disables text indexing of files and allows metadata
indexing only
- Fast Merges
Carlos is currently working on a solution which means we won't need this
option or to write to separate files temporarily before writing to the
main index. How do you feel about removing this option?
dunno - ext3 is so shite with fsync
being able to avoid fsyncs would be nice but cannot be done without
hogging disk when doing large writes
- Battery Index
- Battery Index Initial
- Low Disk Space Limit
- Index Mounted Directories
- Index Removable Media
These need some final testing and fixing up.
- Index Email Client
This has been removed since the .module files mean we don't need this now.
- Max Text To Index
This is not used in trunk, can we remove it?
must be used - we should limit text to 1 MB by default, otherwise
gigantic indexes could result from large files
- Max Words To Index
We should probably use this, it isn't used right now.
as above
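To make the text limit concrete, here is a minimal sketch of clamping how much of a file's text is handed to the indexer. The function name and signature are hypothetical; the 1 MB default is the value suggested above. It assumes the text is UTF-8 and steps back so the cut does not land in the middle of a multi-byte character.

```c
#include <stddef.h>

/* 1 MB, the default suggested in the thread. */
#define DEFAULT_MAX_TEXT_TO_INDEX (1024 * 1024)

/* Return how many bytes of text (text_len bytes long) should be indexed
 * when at most max_text bytes are allowed. (Hypothetical helper.) */
size_t text_bytes_to_index(const char *text, size_t text_len, size_t max_text)
{
    if (text_len <= max_text) {
        return text_len;
    }

    size_t end = max_text;

    /* If the first excluded byte is a UTF-8 continuation byte (10xxxxxx),
     * the cut would split a character, so move back to its lead byte. */
    while (end > 0 && ((unsigned char) text[end] & 0xC0) == 0x80) {
        end--;
    }
    return end;
}
```

A word-count limit would work the same way, with the tokenizer stopping once the configured number of words has been emitted.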
- Optimization Sweep Count
This is not used in trunk, can we remove it?
for now yes
- Divisions
This was used in TRUNK to call dpoptimize(). Is this really necessary as
an option? We don't use it in the indexer-split branch yet.
no - stick with defaults
- Bucket Ratio
We need to re-add this to the indexer-split branch. Unless you think it
is unimportant?
stick with defaults
- Padding
This isn't used in TRUNK, can we remove it?
stick with defaults
- Thread Stack Size
This is not used now because we don't create threads.
CONCLUSION:
===========
The idea is to get these options working or removed and once that's done
we can hopefully merge to TRUNK pending a big review from Jamie of course.
One other option we have considered is adding a config version number,
so that if we ever have to upgrade config files we know which migration
path is needed. What are your thoughts on this?
might be needed
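A version number would make the upgrade path for renamed options straightforward. The sketch below is an illustration under stated assumptions: the version numbers are made up, the old key names are guessed from the option names in this thread, and only the two renames proposed above with explicit new names are included.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: configs written before the renames count as version 1. */
#define CONFIG_VERSION_CURRENT 2

typedef struct {
    const char *old_key;
    const char *new_key;
} KeyRename;

/* Renames proposed in this thread; old key spellings are assumed. */
static const KeyRename renames_v1_to_v2[] = {
    { "EnableWatching",   "EnableMonitors" },
    { "EnableThumbnails", "EnableThumbnailIndexing" },
};

/* Map a key read from a config file of version file_version to its
 * current name. Keys that were not renamed are returned unchanged. */
const char *config_key_upgrade(int file_version, const char *key)
{
    if (file_version >= CONFIG_VERSION_CURRENT) {
        return key;
    }

    size_t count = sizeof renames_v1_to_v2 / sizeof renames_v1_to_v2[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(key, renames_v1_to_v2[i].old_key) == 0) {
            return renames_v1_to_v2[i].new_key;
        }
    }
    return key;
}
```

With this in place the config loader could read the version first, translate each key as it is loaded, and write the file back with the current version number.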
jamie