Re: [Evolution] Spam Filtering



On Mon, 2006-01-23 at 16:10 -0500, Lee Revell wrote:
> On Mon, 2006-01-23 at 15:02 -0600, Gregg Fowler wrote:
> > I just started using Evolution with Ubuntu last Saturday.  While I am
> > really happy with the program, I don't believe the Spam Filter is
> > working correctly. At first I realized that the SpamAssassin package
> > was required and I installed it. The filter still doesn't seem to catch
> > much. If anyone could be of help, I would certainly appreciate it. I am
> > migrating from Windows XP and am committed to making Evolution work for
> > me. Thus far I really like it. I also have the remote filtering box
> > checked.

> Known bug: currently the spam filtering implementation does not work, as
> spamassassin does not start to work until it has learned 200 non-spam
> messages, and Evo has no way to teach SA what a non-spam message ("ham")
> looks like.

Ooh, wait... This is not correct.

SpamAssassin needs to be trained on at least 200 Spam and 200 Ham
(non-Junk) messages before the *Bayesian* classifier starts working. All
other SA built-in rules and remote tests *do* work out of the box without
any training at all.
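
To see how far along the Bayes training is, something like the
following might help. It is only a rough sketch: it parses the output of
'sa-learn --dump magic' and reads the nspam and nham counters. The
column layout in the comment is an assumption based on typical output,
and the function names are made up for illustration.

```python
import subprocess

MIN_TRAINED = 200  # threshold per class, as discussed in this thread


def parse_dump_magic(text):
    """Extract the nspam and nham counters from 'sa-learn --dump magic' output."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: four numeric columns, then "non-token data: <name>".
        # The third column is taken to hold the counter value.
        if len(fields) >= 7 and fields[-1] in ("nspam", "nham"):
            counts[fields[-1]] = int(fields[2])
    return counts


def bayes_is_active(counts):
    """True once both Spam and Ham counters have reached the threshold."""
    return (counts.get("nspam", 0) >= MIN_TRAINED
            and counts.get("nham", 0) >= MIN_TRAINED)


def read_counts():
    """Run sa-learn and parse its output (requires SpamAssassin installed)."""
    result = subprocess.run(["sa-learn", "--dump", "magic"],
                            capture_output=True, text=True, check=True)
    return parse_dump_magic(result.stdout)
```

Calling bayes_is_active(read_counts()) as the user who runs Evo should
then tell you whether the Bayes rules can fire at all.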


Anyway, you are correct that there is a "known bug". Up to Evo 2.4.x
there is no (good) way to train the SA Bayes filter from within Evo,
except by correcting a mail that SA classified incorrectly.


A hack-ish workaround to train Bayes using Evo is the following: Pick at
least 200 non-Junk messages, and keep track of which ones they are. Now
mark them as Junk using the Evo UI. Then go to your Junk folder, select
the non-Junk mails we just sacrificed, and mark them as non-Junk again.
SA will realize it learned these messages previously, and learn them as
Ham (non-Junk) only, AFAIK. Now we have 200 Hams learned. Collecting at
least 200 Spams for learning shouldn't be hard, I guess. ;-)

Note: This really is a *hack* only, and you should not try this unless
you feel a little bit adventurous. :)


> You can work around it by using sa-learn on the command line.  See the
> spamassassin docs for more info.

Yes. :)  Please note, though, that this approach is only safe if you are
*really* confident that there is *no* Junk in the folders you are
training as Ham. (Removing all Junk first ensures this. :)

For a safe way of training manually using 'sa-learn', I recommend saving
at least 200 Hams and 200 Spams into dedicated Ham and Spam files. The
saved files are in mbox format and can be learned easily using
'sa-learn', as Lee pointed out.
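
The manual training could be scripted roughly like this. Treat it as a
sketch under assumptions: the file names and helper functions are
hypothetical, and it only builds the 'sa-learn --mbox --ham/--spam'
command lines unless you explicitly ask it to run them. It uses Python's
standard mailbox module to check that each file really holds at least
200 messages first.

```python
import mailbox
import subprocess

MIN_MESSAGES = 200  # per class, as discussed in this thread


def count_messages(mbox_path):
    """Return the number of messages in an existing mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return len(box)
    finally:
        box.close()


def sa_learn_command(kind, mbox_path):
    """Build the sa-learn command line for one mbox file ("ham" or "spam")."""
    if kind not in ("ham", "spam"):
        raise ValueError("kind must be 'ham' or 'spam'")
    return ["sa-learn", "--mbox", "--" + kind, mbox_path]


def train(ham_path, spam_path, run=False):
    """Check both files hold enough mail, then build (and optionally run) the commands."""
    commands = []
    for kind, path in (("ham", ham_path), ("spam", spam_path)):
        n = count_messages(path)
        if n < MIN_MESSAGES:
            raise ValueError(f"{path}: only {n} messages, need {MIN_MESSAGES}")
        commands.append(sa_learn_command(kind, path))
    if run:
        # Actually invoking sa-learn requires SpamAssassin to be installed.
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

For example, train("ham.mbox", "spam.mbox", run=True) would learn both
files, assuming those are the names you saved them under.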

The Junk folder is a vFolder only, which effectively means it is a
search over all existing real folders, displaying those mails that are
marked as Junk. The Junk mail still remains in its physical mail folder
(the mbox format file).

...guenther


-- 
char *t="\10pse\0r\0dtu\0  ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



