Hey,

Just want to describe an alternative (non-SpamAssassin-based) spam filtering and training setup I've been using successfully with Evolution, in case anyone is interested.

I use bogofilter ( http://bogofilter.sourceforge.net/ ) as my spam identifier, and I've had good luck with it over the past couple of years. Indeed, I tried switching back to SpamAssassin once Evo 1.5 got serious about the built-in hooks to fire up spamd and use spamc to talk to it, but after a while I found I was really unhappy with SpamAssassin's performance [at correctly identifying spam].

More importantly, I do my bogofilter-ing server side, inline in the delivery pipe that my [ISP's] server uses (they run Qmail, and it's easy to hook into the local delivery process). That left me with a problem: I wanted to use Evolution's Junk / Not Junk buttons to train my filter (with the resulting wordlist file sitting on the client), but I wanted that wordlist up on the server for the bogofilter there to work from.

The solution was threefold: override what Evo does to train; rsync the wordlist to the server and do the actual scanning there; but do the actual "check message headers and sort accordingly" step in an Evo client-side rule. In order:

(1) First, glancing at the code in em-junk-filter.c, I was able to figure out what calls Evo makes when one presses the Junk or Not Junk buttons. It composes a command line along the lines of

    sa-learn --spam --norebuild < MESSAGE_DATA

and

    sa-learn --ham --norebuild < MESSAGE_DATA

So what I did was override sa-learn [1]. Since I didn't want to try to replace a system binary (whether or not SpamAssassin was installed) [2], I wrote a tiny wrapper script and stuck it in ~/bin. The wrapper intercepts the call to sa-learn and instead calls bogofilter -s or -n, as appropriate, to learn. I attached my script for anyone interested.
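The attached script itself isn't reproduced in this archive. As a rough idea of what such a wrapper involves, here is a minimal sketch, assuming Evo calls sa-learn exactly as above (--spam or --ham, with the message on standard input) and that bogofilter is on the PATH. The function name sa_learn_flag is made up for illustration; this is not the attached script.

```shell
#!/bin/sh
# Hypothetical sketch of ~/bin/sa-learn (not the attached script): a
# stand-in for SpamAssassin's sa-learn that trains bogofilter instead.
# Assumes Evolution runs it as "sa-learn --spam --norebuild" or
# "sa-learn --ham --norebuild" with the message on standard input.

# Print the bogofilter training flag matching sa-learn's arguments:
# -s (register as spam) for --spam, -n (register as non-spam) for --ham.
# Other arguments, such as --norebuild, are ignored. printf is used
# rather than echo because many shells treat "echo -n" as an option.
sa_learn_flag() {
    for arg in "$@"; do
        case "$arg" in
            --spam) printf '%s\n' "-s"; return 0 ;;
            --ham)  printf '%s\n' "-n"; return 0 ;;
        esac
    done
    return 1
}

# With no arguments there is nothing to translate, so do nothing.
if [ "$#" -gt 0 ]; then
    if ! flag=$(sa_learn_flag "$@"); then
        echo "sa-learn wrapper: expected --spam or --ham" >&2
        exit 1
    fi
    # Hand the message on stdin straight to bogofilter to learn from.
    exec bogofilter "$flag"
fi
```

Saved as ~/bin/sa-learn and made executable (chmod +x ~/bin/sa-learn), this is the file Evolution picks up once ~/bin is at the front of its PATH.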
Of course, to ensure that Evo sees my script instead of /usr/bin/sa-learn, I need to invoke Evolution as

    PATH=~/bin:$PATH /usr/bin/evolution

which isn't that big of a deal [3].

(2) I now have a growing, better-trained ~/.bogofilter/wordlist.db on my client machine. But I want to do the actual scanning server side, because that way the CPU work of spam checking and preliminary sorting is done ahead of time, before I see the messages. So I simply use rsync to push that file to the server. Nothing more complicated than

    rsync --verbose \
          --recursive \
          -e /usr/bin/ssh \
          --partial \
          --progress \
          ~/.bogofilter afcowie@server.mycolo.com:/home/afcowie

On the server, my delivery instruction (a .qmail file) is along the lines of

    | /var/qmail/bin/preline /home/afcowie/bin/bogofilter -H -e -p \
        | /home/afcowie/bin/maildrop

The -e -p options to bogofilter pass messages through regardless of the verdict. I don't want positives bounced right there, tempting as that may be, because I want to be able to train false positives and false negatives on the client in Evo with those terrific zippy Junk / Not Junk buttons!

... and maildrop (think procmail) has a really great little mail sorting language; see http://www.courier-mta.org/maildropex.html . So server side I do preliminary sorting of traffic into folders titled Clients, Boards, and Lists (just so that if I *am* using webmail, I have a chance in hell of seeing messages from my customers; it also helps downstream when composing rules for vFolders in Evo). Note that I *don't* railroad a message marked with X-Spam-Status: Yes off to a ProbableSpam folder or whatever, because if, in Evo, I find a false positive or negative, I want to be able to train it using Evo's wonderful UI.

(3) New messages are fetched by Evolution's IMAP code across four folders.
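Whichever of those folders a message lands in, its verdict travels with it in the header added on the server. As a minimal sketch of the test the client-side filter described next applies (written in shell, and not how Evolution itself implements it), the hypothetical function is_marked_spam checks only the header block of a message on standard input for X-Spam-Status: Yes. It can be handy for spot-checking delivered mail on the server.

```shell
#!/bin/sh
# Hypothetical helper, not part of Evolution or bogofilter: exit 0 if the
# message on standard input has an "X-Spam-Status: Yes" header, which is
# the condition the client-side filter keys off.
is_marked_spam() {
    awk '
        # The header block ends at the first empty line (allowing for a
        # CRLF line ending); stop reading there so the body is ignored.
        $0 == "" || $0 == "\r" { exit }
        # Header field names are case-insensitive, so compare lowercased.
        # Folded (continued) header lines are not handled in this sketch.
        tolower($0) ~ /^x-spam-status:[ \t]*yes/ { found = 1; exit }
        END { exit (found ? 0 : 1) }
    '
}
```

For instance, is_marked_spam < message.eml && echo junk (message.eml being any saved message) reports whether that message carries the header.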
In combination with NotZed's one-liner "apply filters to all IMAP folders" patch [4], I set up an incoming filter that looks for X-Spam-Status: Yes and, if it's present, does "Set Status" to "Junk" (which puts the message in the Junk auto-vfolder) and "Set Status" to "Read" (so that it doesn't clutter my unread counts) [5].

And done! If I get a wrongly classified message, I use the {Junk | Not Junk} buttons. Evo moves the message {to the Junk meta folder | back to the folder it came from and should have been in}, and calls sa-learn (which I've overridden to call bogofilter) to learn from the mistake. And I periodically push bogofilter's wordlist up to the server via rsync. [Note I'm not using autolearn server side, because then there would be a two-way sync problem, and there's no real reason to.]

And it all Just Works (tm)

AfC
Sydney

[1] This is all highly dependent on the exact form of the exec calls in em-junk-filter.c. If those change, this will need to be tweaked.

[2] In fact, it turns out that the training code attempts to activate spamd, and if that fails, bails out without doing any training. That's not very good, because it means that in my case I have to have SpamAssassin installed, just so Evo can start it, just so I can ignore it and do Bayes training. More generally, I'd say that firing up spamd is unnecessary if all the user is doing is training (indeed, if they don't have "filter incoming messages for junk" selected, then that fire-up-spamd step should never need to happen), but the training cycle should still be allowed to occur.

[3] But it sure would be nice if I could just tell Evo what training program to use. I know the devs aren't about to write that UI.

[4] No problems with the filter-on-all-folders thing so far!

[5] I know Jeff is going to be working on the IMAP code again sometime soon.
It seems like under POP the messages get passed to the filters before they show up as unread in a folder; in my IMAP case, I get a blob of unread messages in INBOX, then half a second later they vanish as they get classified as Junk. Not sure if that's fixable.

[6] Hey, so I just attached a little shell script as an example, but it's showing up as MIME type application/x-shellscript. I certainly wouldn't want anyone's client to try to just *run* this script (it's not like it's a photo that needs a viewer); I want to deliver it as text/plain so people can glance at it if they want to. How do I do that? Hm. Anyway, as a workaround to get text/plain, I stripped the #!/bin/sh line.

-- 
Andrew Frederick Cowie

OPERATIONAL DYNAMICS
Operations Consultants and Infrastructure Engineers
http://www.operationaldynamics.com/
Attachment:
sa-learn
Description: Text document