Re: [Evolution] Tying the ribbons on Evolution and SpamAssassin

I'm about 95% of the way there in getting Evolution and
SpamAssassin to play nicely together. I need a bit of help on the last
5%. And the "playing field" is a set of preexisting filter folders set
up under Inbox, not under VFolders.

I've set up a Can-O-Spam (COS) filter in Evolution, as per the Help
file, moved it to the head of the filter list, and SpamAssassin is now
sending files to that folder. I did add one step; I added "stop
processing" to the COS filter to prevent quarantined emails from ending
up in other filter folders -- including Inbox -- as well when they first
came in. Without the stop-processing command, SA would nab suspects, but
they'd also end up in Inbox. I'm now wondering whether the
stop-processing applies only to the spam filter I set up or the whole
filtering process. Anyway...

The "Stop Processing" action is simply an action like any other. It
means that IF the filter's conditions are met, THEN no later filter in
the list is applied to this mail.

Without this "Stop Processing" action, multiple filters can act on the
very same mail, for example copying it to different folders if different
conditions are met (which would actually duplicate the mail).
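
To make the "Stop Processing" behaviour concrete, here is a toy model in
Python. It is only a sketch of the semantics described above, not
Evolution's actual code: the Filter class, run_filters() and the sample
mail are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Filter:
    """Hypothetical stand-in for one filter rule in the filter list."""
    name: str
    matches: Callable[[Dict[str, object]], bool]
    folder: str
    stop: bool = False  # models the "Stop Processing" action


def run_filters(mail: Dict[str, object], filters: List[Filter]) -> List[str]:
    """Return the folders the mail gets copied to, in filter-list order."""
    folders = []
    for f in filters:
        if f.matches(mail):
            folders.append(f.folder)
            if f.stop:
                # Conditions met and "Stop Processing" set:
                # no later filter sees this mail.
                break
    return folders


mail = {"from": "list@example.org", "flagged_by_sa": True}

with_stop = [
    Filter("COS", lambda m: bool(m["flagged_by_sa"]), "Can-O-Spam", stop=True),
    Filter("Lists", lambda m: "list@" in str(m["from"]), "Lists"),
]
without_stop = [
    Filter("COS", lambda m: bool(m["flagged_by_sa"]), "Can-O-Spam"),
    Filter("Lists", lambda m: "list@" in str(m["from"]), "Lists"),
]

print(run_filters(mail, with_stop))
print(run_filters(mail, without_stop))
```

With the stop flag set, the flagged mail lands only in the spam folder;
without it, the second filter also matches and the mail ends up in both
places.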

Then, I manually picked out the keepers and stored them in a training
folder. I ran sa-learn --ham --mbox /path/to/trainingfolder/mbox from a
command line (perhaps a script might work better) to identify these as
keepers. It told me it learned from X number of emails in the mbox. So
far, I'm happy. Then I ran Evolution's filters on the keepers to
distribute them to the correct filter folders, thinking that once SA
"learns" it will ignore the keepers and allow the filter to place them
in the right location. Instead, the keepers showed up in the COS folder.

This looks like a misunderstanding of how SA and Bayes work.

The mails you learn with 'sa-learn --ham' train your Bayes database.
With the help of this database, SA can decide, based on the words
contained in a mail, whether it looks like SPAM or not.

Bayes does not kick in unless you have trained it with at least 200
SPAMs and 200 HAMs, IIRC.
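
If you want to check how far along your training is, 'sa-learn --dump
magic' prints the learned counts. The sketch below parses that output
and compares it against the 200/200 threshold mentioned above. The
sample text and the column layout (count in the third column) are my
assumption of what the dump looks like, and bayes_counts() and
bayes_active() are made-up helper names, so check against your own
output.

```python
from typing import Dict

# Assumed sample of `sa-learn --dump magic` output (layout may differ).
SAMPLE_DUMP = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0         35          0  non-token data: nspam
0.000          0        412          0  non-token data: nham
0.000          0      58211          0  non-token data: ntokens
"""


def bayes_counts(dump_text: str) -> Dict[str, int]:
    """Extract the nspam / nham counters from dump-style text."""
    counts = {}
    for line in dump_text.splitlines():
        fields = line.split()
        if (
            len(fields) == 7
            and fields[4:6] == ["non-token", "data:"]
            and fields[6] in ("nspam", "nham")
        ):
            # Assumption: the counter value sits in the third column.
            counts[fields[6]] = int(fields[2])
    return counts


def bayes_active(counts: Dict[str, int], min_spam: int = 200, min_ham: int = 200) -> bool:
    """True once both counters reach the minimum quoted in this thread."""
    return counts.get("nspam", 0) >= min_spam and counts.get("nham", 0) >= min_ham


counts = bayes_counts(SAMPLE_DUMP)
print(counts, "Bayes active:", bayes_active(counts))
```

In this sample, plenty of ham has been learned but far too little spam,
so Bayes would still be sitting idle.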

This effectively means that, until then, a second filtering run on the
same messages will probably result in the very same scores being
assigned by SA.

A personal note: Your description above sounds like you get a lot of
mails marked as SPAM which actually are *not*. This is not a good sign
to me. You should *definitely* not get more than 1 False Positive per
100 SPAMs. Personally, I am *way* below that.

If you get too many FPs, you should adjust the SA settings. Have you
changed any defaults? Are those messages somewhat "special"? What tests
do you have enabled in SA?

It may be wise to filter any known-good mail first and never let SA
judge it. (This means adding "Stop Processing" actions to any filters
which come before SA and definitely know the mails are HAM.)

Hope this helps at least a bit...


char *t="\10pse\0r\0dtu\0  ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
