Re: [Tracker] Ready for merge



Jamie McCracken wrote:
1.) Index being so big that indexing is slowed right down:

This is why trunk used index merging: it would flush to separate
indexes and then merge them into one when finished.

By experimentation, 16MB is a good size for an index, hence trunk flushed
in 16MB lumps.

Doing lots of small writes to a large index causes real IO problems with
ext3 (ext4 fixes this) - each word being indexed is a separate seek
and write on the index, so you could easily have 100,000+ small updates
being applied. Also, when words have hits added, the hashtable has to
relocate them to free space, which also causes fragmentation - the
qdbm optimize call recovers this by copying the index into a new one
once indexing is finished.

As you can see, index merging eliminates this performance overhead and the
need to call optimize, as the final merged index will have no
fragmentation.
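
A rough sketch of the flush-and-merge idea described above, for
illustration only. This is not the trunk code: plain Python dicts stand
in for the on-disk qdbm indexes, and the class name, method names and the
byte estimate are all made up for the example. Only the 16MB figure comes
from the text above.

```python
from collections import defaultdict

FLUSH_LIMIT = 16 * 1024 * 1024  # 16MB, the flush size quoted above


class SegmentedIndexer:
    """Toy model: buffer hits, flush to separate segments, merge once."""

    def __init__(self, flush_limit=FLUSH_LIMIT):
        self.flush_limit = flush_limit
        self.buffer = defaultdict(list)  # word -> list of document ids
        self.buffer_bytes = 0            # rough size estimate (assumption)
        self.segments = []               # each flushed segment is a dict

    def add_hit(self, word, doc_id):
        self.buffer[word].append(doc_id)
        # Crude estimate: the word's bytes plus a fixed 8 bytes per hit.
        self.buffer_bytes += len(word.encode("utf-8")) + 8
        if self.buffer_bytes >= self.flush_limit:
            self.flush()

    def flush(self):
        # Close the current lump and start a fresh, empty one.
        if self.buffer:
            self.segments.append(dict(self.buffer))
            self.buffer = defaultdict(list)
            self.buffer_bytes = 0

    def merge(self):
        # Flush what is left, then combine all segments so that each
        # word ends up with one contiguous hit list in the final index.
        self.flush()
        merged = {}
        for segment in self.segments:
            for word, hits in segment.items():
                merged.setdefault(word, []).extend(hits)
        self.segments = [merged]
        return merged
```

The point of the sketch is that each word is written once into the final
index, instead of being updated in place many times in one large index.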

I should add: right now you can search while it is indexing. I think you
lose this functionality if you write to temporary files and merge at
the end, unless all files are opened and read from, but that sounds
like quite a bit of work to get working in the indexer-split branch.
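
To make the "all files are opened and read from" option concrete, here is
a minimal sketch, assuming each temporary index can be treated as a
word -> hits mapping. The function name and the dict representation are
hypothetical, not anything in the indexer-split branch.

```python
def search_all(word, segments):
    """Collect the hits for `word` from every index, in segment order."""
    hits = []
    for segment in segments:
        # A word may be missing from some segments; treat that as no hits.
        hits.extend(segment.get(word, ()))
    return hits
```

Every query then costs one lookup per temporary index, which is part of
why this would be extra work compared with searching a single index.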

We have also considered writing to tmp files and merging while we are
indexing (but only merging all EXCEPT the one the indexer is currently
writing to), so the merge is progressive. The problem is, we have no
idea of the time scales for doing this or whether it is worth the effort
to try and implement. Jamie, do you have any thoughts on this?
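
A rough sketch of what I mean by a progressive merge, under the same
assumption that each tmp file behaves like a word -> hits mapping. The
function name and arguments are made up for illustration; nothing like
this exists in the branch yet.

```python
def progressive_merge(segments, active):
    """Merge every segment except `active`, which is still being written.

    Returns a new segment list: the merged index followed by `active`.
    """
    merged = {}
    for segment in segments:
        if segment is active:
            continue  # never touch the segment the indexer is writing to
        for word, hits in segment.items():
            merged.setdefault(word, []).extend(hits)
    return [merged, active]
```

Run periodically, this would keep the number of files to search at two
(the merged index plus the active one), at the cost of repeated merge
passes whose timing is exactly the unknown mentioned above.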

-- 
Regards,
Martyn


