Re: Beagle Scoring System
- From: Debajyoti Bera <dbera web gmail com>
- To: dashboard-hackers gnome org
- Subject: Re: Beagle Scoring System
- Date: Sat, 17 Dec 2005 14:02:16 -0500
> I have noticed that mail messages seem to get unusually high scores
> from the indexer. While holmes makes the problem much less of an issue
> (since it separates the conversation results), it still seems like
> something worth fixing. I can't seem to figure out exactly why the
> scoring is so off, but an initial guess would be the ease with which
> we can add hotwords for email (subject lines) as opposed to most other
> backends.
(from http://wiki.apache.org/jakarta-lucene/LuceneFAQ )
Lucene automatically applies a weight that shrinks as the field gets longer,
i.e. terms in short fields (like sender name, email address, subject)
will get a higher weight (known as 'boost') than terms in text. The same holds
for document metadata: it gets more weight than the document data/text.
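
To make the field-length effect above concrete, here is a minimal sketch.
It assumes the classic Lucene similarity, where (as far as I know) the
length factor is 1/sqrt(number of terms in the field). The function name
and the term counts are made up for illustration and are not taken from
Beagle.

```python
import math


def length_norm(num_terms: int) -> float:
    """Length factor as in classic Lucene similarity (assumed): 1/sqrt(n)."""
    if num_terms <= 0:
        return 0.0
    return 1.0 / math.sqrt(num_terms)


# Hypothetical field sizes: a 4-term subject line vs. a 400-term mail body.
subject_norm = length_norm(4)
body_norm = length_norm(400)

# The same term matching in the subject is weighted this many times more
# than when it matches in the body.
ratio = subject_norm / body_norm
```

With these made-up numbers, a hit in the subject counts ten times as much
as the same hit in the body, which would fit the observation that mail
(with its many short fields) scores high.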
(from my understanding)
Beagle searches several Lucene indexes and merges the results based on their
scores. Somewhere during the process, it recalculates the score based on the
age of the document. However, the absolute values of Lucene scores are not
directly comparable; only the ratios between scores from the same index (and
hence the ranking) are. In that sense, I don't think scores across multiple
indexes should be directly compared. Ranking within a particular backend is
meaningful and, IMO, that is the correct way to do it.
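
A small sketch of the point above, with invented scores. This is not
Beagle's actual merging code; the backend names, URIs, scores and both
helper functions are hypothetical. It only shows how merging on raw scores
lets one backend crowd out another, while the ranking inside each backend
stays meaningful.

```python
def rank_within_backend(hits):
    """Order the (uri, score) hits of a single backend, best first."""
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def merge_by_raw_score(results):
    """Naively merge all backends by comparing raw scores directly."""
    all_hits = [
        (backend, uri, score)
        for backend, hits in results.items()
        for uri, score in hits
    ]
    return sorted(all_hits, key=lambda hit: hit[2], reverse=True)


# Invented scores: the mail index happens to produce larger numbers.
results = {
    "mail": [("mail-2", 2.0), ("mail-1", 4.0)],
    "files": [("file-1", 0.8), ("file-2", 0.2)],
}

merged = [uri for _, uri, _ in merge_by_raw_score(results)]
per_backend = {
    backend: [uri for uri, _ in rank_within_backend(hits)]
    for backend, hits in results.items()
}
```

Here `merged` puts every mail hit ahead of every file hit, even though
file-1 is four times as strong as file-2 within its own index, whereas
`per_backend` keeps a separate, sensible ranking for each backend.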
- dBera