Re: [Evolution] Built-in spam filtering?



On Thu, Jan 02, 2003 at 12:12:00PM -0500, Jim Frost wrote:
Anyway, in case this spurs someone to do some work, I did spend some
time working on an imap server based bayesian system.  The idea was that
with imap the folders are all on the server and I can easily create a
special "spam" folder that users can drag and drop spam into, and use
their personal folders for the not-spam side of things.  My system was
rebuilding the databases every once in a while out of cron but with a
built-in system you could do it as-you-go (which would be cool).

My problem with this method is that you need all of your spam and nonspam
messages to properly rebuild the databases.  You will either use large amounts
of disk space keeping them, or the rebuilt database will not match the old one.

This was drop-dead simple to use from the user's point of view (my goal
was that my wife should be able to use it without my help).  The
downfall was that I haven't had the time to get the delivery stuff
working and integrated into my mail delivery system.

Apple's mail client with Jaguar (OSX 10.2) does something more or less
like this, but instead of a spam folder there's a "this is spam"
button.  And instead of moving probable spam into a special folder it
colorizes them or destroys them (at your option).  In some ways I like
this, but I would kind of like to be able to go in and edit the spam
template messages so I think I'd still rather have a spam folder and
have colorization or prioritization versus a trash folder as an option.

I think the Jaguar email client has a better balance of ease of use and
usefulness.  This is much the same way that bogofilter works.  All incoming
messages are marked as spam or not spam, and I just go through and correct
the marking when it's wrong.  Interface-wise, this is as simple as a "this is
spam" button (and conversely a "you were wrong, this is not spam" button).
What you lose is the ability to edit your templates readily.  What you gain
is not having to keep the messages behind the databases in their original
email form.
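
To make the "correct it when it's wrong" idea concrete, here is a rough
Python sketch of a word-count database that is updated as you go, so the
original messages never need to be kept.  This is not how bogofilter or the
Jaguar client is actually implemented; the class name, method names, and
tokenizer are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical tokenizer: lowercase runs of letters, digits and a few symbols.
TOKEN_RE = re.compile(r"[a-z0-9$'-]+")


def tokenize(text):
    """Return the set of distinct tokens in a message."""
    return set(TOKEN_RE.findall(text.lower()))


class WordCounts:
    """Per-word message counts for spam and nonspam, updated incrementally."""

    def __init__(self):
        self.spam = Counter()   # word -> number of spam messages containing it
        self.ham = Counter()    # word -> number of nonspam messages containing it
        self.n_spam = 0         # spam messages registered so far
        self.n_ham = 0          # nonspam messages registered so far

    def register(self, text, is_spam):
        """Add one message to the chosen side of the database."""
        words = self.spam if is_spam else self.ham
        words.update(tokenize(text))
        if is_spam:
            self.n_spam += 1
        else:
            self.n_ham += 1

    def correct(self, text, is_spam):
        """The "you were wrong" button: move a message to the other side.

        Assumes the message was previously registered on the opposite side.
        """
        tokens = tokenize(text)
        old = self.ham if is_spam else self.spam
        old.subtract(tokens)
        for word in tokens:
            if old[word] <= 0:
                del old[word]   # drop words that no longer occur on that side
        if is_spam:
            self.n_ham -= 1
        else:
            self.n_spam -= 1
        self.register(text, is_spam)
```

Only the counts are stored, so the message itself can be deleted once it has
been registered.  The cost is that a correction has to be made while the
message is still around.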

The whole goal of the Bayesian filter is to learn what you think is spam and
what is not.  I don't find myself reclassifying my email that often.  If I
thought the message was spam yesterday, I'm not likely to think it's not spam
today.  Requiring the user to keep all their spam messages will also require
them to keep their nonspam messages if they want accurate results, since
the filter keeps track not only of bad words but of good words too.
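
As a rough illustration of why both sets of words matter, here is a small
Python sketch of one common way to score a message from spam and nonspam word
counts (a naive Bayes combination in the style popularized by Paul Graham).
It is only a sketch under assumptions: the function names, the 0.01/0.99
clamping, and the 0.5 score for unseen words are choices made for this
example, not what any particular filter does.

```python
import math


def word_spam_prob(word, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate how strongly a single word points toward spam (0..1)."""
    s = spam_counts.get(word, 0) / max(n_spam, 1)   # share of spam containing it
    h = ham_counts.get(word, 0) / max(n_ham, 1)     # share of nonspam containing it
    if s + h == 0:
        return 0.5                                  # never seen: no evidence
    p = s / (s + h)
    return min(max(p, 0.01), 0.99)                  # avoid absolute certainty


def message_spam_prob(words, spam_counts, ham_counts, n_spam, n_ham):
    """Combine per-word estimates into one spam probability for a message."""
    log_spam = 0.0
    log_ham = 0.0
    for word in set(words):
        p = word_spam_prob(word, spam_counts, ham_counts, n_spam, n_ham)
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    diff = log_ham - log_spam
    if diff > 700:          # exp() would overflow; probability is effectively 0
        return 0.0
    # Equivalent to prod(p) / (prod(p) + prod(1 - p)), computed in log space.
    return 1.0 / (1.0 + math.exp(diff))
```

A word that shows up only in nonspam pulls the score down just as hard as a
word that shows up only in spam pushes it up, so without the nonspam counts
every word the filter had ever seen would look spammy.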

The so-called "power" user should have the option to view the databases,
add words, modify weights, and so forth.

I note that I looked into spamassassin, which seems to be the preferred
technique using an external filter, and I really dislike its rule-based
system.  Way too many false positives, and a lot of work to set up and
maintain too.  Spam filtering would be a great integrated feature and
doesn't look like it'd be a lot of work to implement.


spamassassin isn't that bad, though it can't beat a trained Bayesian filter.
The advantage is that it comes fully ready out of the box.  Bayesian filters
will do poorly until they are trained.

The command-line Bayesian filters I saw were bogofilter (what I use) and ifile
(which uses the spam folders and such).  If you are using a POP host, I'd
recommend popfile (I've used it with OE, and recommended it to all my friends
who use Outlook or OE with a POP host).  It has a nice web-based interface
for modifying the word lists and such.



