[Tracker] Getting ready for TRUNK: Config options



Hi all,

As part of finishing up the indexer-split branch and getting it into a
state where we can begin merging, I have been going through the config
options we have, checking whether we implement them and, where we
don't, whether we still need them.

Here are the current config options and some questions/comments about them.


WORKING OPTIONS:
================

* Verbosity
* Initial Sleep
* Enable Indexing
* Min Word Length
* Max Word Length
* Language
* Enable Stemmer
* Max Bucket Count
* Min Bucket Count
* Enable Xesam


NOT WORKING OPTIONS:
====================

* Low Memory Mode

This option currently has no effect in the indexer-split branch.
In TRUNK, it is used to:

  1. Set the cache size to 1/2 of its normal value when loading DBs.
  2. Set the array in update_word_table() to 1/2 size.
  3. Adjust these variables (which are usually halved in low memory mode):

     a) tracker->memory_limit = 16000 * 1024;

     b) tracker->max_process_queue_size = 5000;
     c) tracker->max_extract_queue_size = 5000;

     d) tracker->word_detail_limit = 2000000;
     e) tracker->word_detail_min = 0;
     f) tracker->word_count_limit = 500000;
     g) tracker->word_count_min = 0;

For #1, I think this makes sense to reimplement.
For #2, I think this is pointless if the array grows.
For #3a, the memory limit is used to know when to flush the word cache.
This needs reimplementing in the indexer.
For #3b, the process queue size is used to know how big the files queue
can get before it should be processed in the database. This is done now
by the indexer and I am not sure it is pertinent any longer.
For #3c, this is the same as #3b.
For #3d, this is unused in TRUNK.
For #3e, this is unused in TRUNK.
For #3f, this is unused in TRUNK.
For #3g, this is unused in TRUNK.
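
If we do reimplement the limits that are still in use (#3a to #3c), a
minimal sketch could look like the following. Note that TrackerLimits
and both functions are hypothetical names made up for illustration, and
the sketch assumes the values listed above are the normal ones and that
low memory mode halves each of them uniformly, which may not be exactly
what TRUNK does for every variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the limits still in use; not a real Tracker type. */
typedef struct {
    size_t memory_limit;            /* bytes before the word cache is flushed */
    size_t max_process_queue_size;  /* files queued for processing */
    size_t max_extract_queue_size;  /* files queued for extraction */
} TrackerLimits;

/* Values taken from the list above (assumed to be the normal-mode values). */
static TrackerLimits
tracker_limits_default (void)
{
    TrackerLimits limits = { 16000 * 1024, 5000, 5000 };

    return limits;
}

/* Assumption: low memory mode simply halves every limit. */
static void
tracker_limits_apply_low_memory (TrackerLimits *limits)
{
    assert (limits != NULL);

    limits->memory_limit /= 2;
    limits->max_process_queue_size /= 2;
    limits->max_extract_queue_size /= 2;
}
```

Keeping the halving in one place like this would mean the option only
has to be checked once, when the config is loaded.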

* NFS Locking

Do we need this? What is it for? As far as I can see, it is just a
simple locking mechanism using a file on disk. What needs this? Can
we remove it?

* Watch Directory Roots
* Crawl Directory
* No Watch Directory
* No Index File Types

These closely map to the .module files. I would like to rename them to
map exactly so they are obviously an override or addition to the
non-user space config of each module. What are your thoughts here?

I would like to rename "WatchDirectoryRoots". Everyone, even GIO, uses
"monitor" instead of "watch", and you can supply a list, so it isn't just
one. Also, should we have ANOTHER option, like we do in the module files
right now, to be able to set "MonitorRecursiveDirectories" and
"MonitorDirectories"? We assume they are always recursive right now.

I would like to rename "CrawlDirectory". This needs integrating with the
.module files.

I would like to rename "NoWatchDirectory". This is currently working.

* Enable Watching

I would like to rename this to "EnableMonitors".

* Throttle

This needs reimplementing in the indexer. Right now, we don't really
need it - at least my machine copes fine without it, but I think it
might be a good idea to add that back.

* Enable File Content Indexing
* Enable Thumbnails

These need implementing. It would also be nicer, and more consistent, to
rename "EnableThumbnails" to "EnableThumbnailIndexing". I am assuming
these will both be implemented in the indexer.
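
If we go ahead with the renames, old config files will still contain the
old key names. One way to cope with that, sketched here only as an idea,
is a small lookup table that maps an old key to its new name when the
config is read. The table and config_key_current_name() are hypothetical,
and "EnableWatching" is my assumption of how the "Enable Watching" key is
spelled in the file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rename table; only the two renames with a proposed new name. */
typedef struct {
    const char *old_name;
    const char *new_name;
} KeyRename;

static const KeyRename key_renames[] = {
    { "EnableWatching",   "EnableMonitors" },
    { "EnableThumbnails", "EnableThumbnailIndexing" },
};

/* Returns the current name for a key, or the key itself if it was not renamed. */
static const char *
config_key_current_name (const char *key)
{
    size_t i;

    assert (key != NULL);

    for (i = 0; i < sizeof (key_renames) / sizeof (key_renames[0]); i++) {
        if (strcmp (key, key_renames[i].old_name) == 0) {
            return key_renames[i].new_name;
        }
    }

    return key;
}
```

The directory options are left out of the table because we have not
settled on their new names yet.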

* Fast Merges

Carlos is currently working on a solution which means we won't need this
option or to write to separate files temporarily before writing to the
main index. How do you feel about removing this option?

* Battery Index
* Battery Index Initial
* Low Disk Space Limit
* Index Mounted Directories
* Index Removable Media

These need some final testing and fixing up.

* Index Email Client

This has been removed since the .module files mean we don't need this now.

* Max Text To Index

This is not used in TRUNK. Can we remove it?

* Max Words To Index

This isn't used right now, but we should probably use it.

* Optimization Sweep Count

This is not used in TRUNK. Can we remove it?

* Divisions

This was used in TRUNK to call dpoptimize(). Is this really necessary as
an option? We don't use it in the indexer-split branch yet.

* Bucket Ratio

We need to re-add this to the indexer-split branch, unless you think it
is unimportant?

* Padding

This isn't used in TRUNK. Can we remove it?

* Thread Stack Size

This is not used now because we don't create threads.


CONCLUSION:
===========

The idea is to get these options working or removed. Once that's done,
we can hopefully merge to TRUNK, pending a big review from Jamie of course.

One other thing we have considered is adding a config version number,
so that if we ever have to upgrade config files we know which migration
path is needed. What are your thoughts on this?
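
To make the version idea a bit more concrete, here is a rough sketch of
the check we could run when loading the config. Everything in it is
hypothetical: the version value, the enum and config_check_version() are
only for illustration, and it assumes a file with no version key is
treated as version 0.

```c
#include <assert.h>

/* Hypothetical: the config version this build of the daemon writes. */
#define CONFIG_VERSION_CURRENT 2

typedef enum {
    CONFIG_UP_TO_DATE,
    CONFIG_NEEDS_MIGRATION,  /* file is older than this build */
    CONFIG_TOO_NEW           /* file was written by a newer build */
} ConfigVersionStatus;

/* file_version is the number read from the file, or 0 if the key is missing. */
static ConfigVersionStatus
config_check_version (int file_version)
{
    assert (file_version >= 0);

    if (file_version < CONFIG_VERSION_CURRENT) {
        return CONFIG_NEEDS_MIGRATION;
    }

    if (file_version > CONFIG_VERSION_CURRENT) {
        return CONFIG_TOO_NEW;
    }

    return CONFIG_UP_TO_DATE;
}
```

With something like this, each migration step (for example an option
rename) could be tied to the version in which it was introduced.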

-- 
Regards,
Martyn


