Re: Beagle Scoring System



> I have noticed that mail messages seem to get unusually high scores
> from the indexer. While holmes makes the problem much less of an issue
> (since it separates the conversation results), it still seems like
> something worth fixing. I can't seem to figure out exactly why the
> scoring is so off, but an initial guess would be the ease with which
> we can add hotwords for email (subject lines) as opposed to most other
> backends.

(from http://wiki.apache.org/jakarta-lucene/LuceneFAQ )
Lucene automatically adds a weight inversely related to the length of the
field, i.e. terms in short fields (like sender name, email address, subject)
will get a higher weight (known as 'boost') than terms in the body text. The
same holds for document metadata: it gets more weight than the document
data/text.
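
To make the effect concrete, here is a rough sketch of the length
normalization used by Lucene's classic TF-IDF similarity, which to my
knowledge is 1/sqrt(number of terms in the field). The function name and the
term counts are made up for illustration; this is not Beagle's code.

```python
import math

def length_norm(num_terms):
    """Sketch of classic Lucene length normalization: 1 / sqrt(num_terms).

    Assumption: this mirrors the default TF-IDF similarity; other
    similarities may normalize differently.
    """
    return 1.0 / math.sqrt(num_terms)

# Hypothetical field sizes: a 4-term subject line vs. a 400-term body.
subject_norm = length_norm(4)
body_norm = length_norm(400)

# The same matching term weighs much more in the short field.
print(subject_norm / body_norm)
```

So a hit in a short subject line can easily outweigh the same hit in a long
body, which would fit the high mail scores you are seeing.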

(from my understanding)
Beagle searches several Lucene indexes and merges the results based on their
scores. Somewhere during the process, it recalculates the score based on the
age of the document. However, the absolute values of Lucene scores are not
directly comparable; only the ratios between the scores of one index (and
hence the ranking) are comparable. In that sense, I don't think scores across
multiple indexes should be directly compared. Ranking within a particular
backend is meaningful and, IMO, that is the correct way to do it.
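
A small sketch of the difference, with made-up backend names, documents and
scores. merge_by_rank is just one possible way to combine per-backend
rankings (a simple round-robin); it is not what Beagle does today, and both
function names are hypothetical.

```python
def merge_by_raw_score(results_by_backend):
    """Merge hits from all backends by comparing raw scores directly."""
    merged = [
        (score, backend, doc)
        for backend, hits in results_by_backend.items()
        for doc, score in hits
    ]
    merged.sort(key=lambda item: item[0], reverse=True)
    return [(backend, doc) for _, backend, doc in merged]

def merge_by_rank(results_by_backend):
    """Rank hits inside each backend, then interleave by rank position."""
    ranked = {
        backend: sorted(hits, key=lambda hit: hit[1], reverse=True)
        for backend, hits in results_by_backend.items()
    }
    depth = max((len(hits) for hits in ranked.values()), default=0)
    merged = []
    for position in range(depth):
        # Visit backends in a fixed (alphabetical) order at each position.
        for backend in sorted(ranked):
            hits = ranked[backend]
            if position < len(hits):
                merged.append((backend, hits[position][0]))
    return merged

# Hypothetical scores: the mail index happens to produce larger numbers.
results = {
    "mail": [("msg-1", 4.2), ("msg-2", 3.9)],
    "files": [("notes.txt", 0.8), ("todo.txt", 0.5)],
}

print(merge_by_raw_score(results))  # all mail hits come first
print(merge_by_rank(results))       # best hit of each backend comes first
```

With raw scores, every mail hit outranks every file hit even though the
numbers come from different indexes. The rank-based merge only relies on the
ordering inside each backend.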

- dBera


