[Evolution-hackers] Spam filtering thoughts



I think we need to have some built-in anti-spam functionality in Evo
2.0.

A possible list of requirements for this feature:

      * It should be simple, work out of the box and make sense.

      * It should have Bayesian filtering, i.e. the user should be able
        to train the filter to recognize spam.  (A la Apple Mail or
        Mozilla Thunderbird.)

      * It should also have good built-in anti-spam functionality that
        doesn't require training.

      * It should let the user define a filter action for when a message
        is spam.

      * It should support blacklists and whitelists.  (This could be
        optional.)

For the actual spam detection, I think we should just use SpamAssassin.
It works great, is actively maintained, and is very simple to interface
with.  It also does a great job without any training, although it does
support Bayesian filtering.

(If at some point some better solution shows up, it should be fairly
easy to either switch to the new solution, or make the interfaces
generic enough to make the system fully pluggable.  However, I don't
think pluggability is a must; let's focus on the user experience first.)
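
To make the "easy to switch later" point concrete, here is a rough
sketch (in Python, just for illustration; the real thing would live in
Camel).  It assumes spamd is running and the spamc client is on the
PATH.  The names SpamDetector, spamc_detector, keyword_detector and
is_spam are made up for this example.

```python
import subprocess
from typing import Callable

# A detector takes the raw message bytes and returns True for spam.
# (Hypothetical interface, not an existing Evolution or Camel API.)
SpamDetector = Callable[[bytes], bool]


def spamc_detector(raw_message: bytes) -> bool:
    """Ask spamd, through the spamc client, whether the message is spam.

    "spamc -c" only checks the message: it prints "score/threshold" and
    exits with status 1 when the message is spam.
    """
    result = subprocess.run(
        ["spamc", "-c"],
        input=raw_message,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 1


def keyword_detector(raw_message: bytes) -> bool:
    """Toy stand-in backend, only here to show that backends can be swapped."""
    return b"viagra" in raw_message.lower()


def is_spam(raw_message: bytes, detector: SpamDetector = spamc_detector) -> bool:
    """The mailer only ever calls this; the backend behind it can change."""
    return detector(raw_message)
```

As long as the mailer only goes through something like is_spam(),
replacing SpamAssassin later means writing one more detector function.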

From the mail side of things, we could do something like this:

      * We add a way for Camel to mark a message as spam or not.  It
        should probably be a bit in the summary, the same way we
        handle things like colors/labels.

      * We invoke SpamAssassin in spam detection mode every time a new
        message goes through the mailer, and set the corresponding bit
        in Camel according to what it tells us about the message.

      * We give the user a way to say what happens to messages detected
        as spam, i.e. whether they should stay in the folder, be moved
        to a "Junk" folder, or be deleted.  (This should probably be
        separate from the general filter dialog, because this choice
        should be available without the user having to understand
        filters.)

      * We put a button in the mail toolbar to mark a message as spam or
        not spam.  When a message gets marked by the user as spam or not
        spam, Evolution sends it to SpamAssassin to train the filter
        accordingly.

      * As an additional aid to the user, messages detected as spam
        could be displayed with a header saying "this message is spam;
        if it's not, click here to mark it as not spam", like in Apple
        Mail:

                http://www.apple.com/macosx/jaguar/images/mainmailbox.jpg
                
        This button would behave exactly like the "not spam!" toolbar
        button.

      * We add a command to delete all the messages marked as spam in
        the current folder.
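
The steps above could fit together roughly as follows.  This is only a
sketch in Python; the flag values, the MessageInfo record and all the
function names are hypothetical and are not Camel's actual definitions.
The detector and train callbacks stand in for whatever talks to
SpamAssassin.

```python
from dataclasses import dataclass
from enum import Enum, IntFlag
from typing import Callable, Iterable


class MessageFlags(IntFlag):
    # Hypothetical bit values; Camel's real summary flags may differ.
    SEEN = 1 << 0
    DELETED = 1 << 1
    JUNK = 1 << 2


class JunkAction(Enum):
    # The three choices the user would get, outside the filter dialog.
    KEEP = "keep"
    MOVE_TO_JUNK = "move"
    DELETE = "delete"


@dataclass
class MessageInfo:
    uid: str
    folder: str
    flags: MessageFlags = MessageFlags(0)


def filter_new_message(info: MessageInfo, raw: bytes,
                       detector: Callable[[bytes], bool],
                       action: JunkAction) -> None:
    """Run detection on a new message, set the bit, apply the user's choice."""
    if not detector(raw):
        return
    info.flags |= MessageFlags.JUNK
    if action is JunkAction.MOVE_TO_JUNK:
        info.folder = "Junk"
    elif action is JunkAction.DELETE:
        info.flags |= MessageFlags.DELETED


def user_marks_message(info: MessageInfo, raw: bytes, junk: bool,
                       train: Callable[[bytes, bool], None]) -> None:
    """Toolbar button or header link: flip the bit, then train the filter."""
    if junk:
        info.flags |= MessageFlags.JUNK
    else:
        info.flags &= ~MessageFlags.JUNK
    train(raw, junk)


def delete_all_junk(messages: Iterable[MessageInfo], folder: str) -> int:
    """Mark every junk message in the given folder as deleted; return the count."""
    count = 0
    for info in messages:
        if info.folder == folder and MessageFlags.JUNK in info.flags:
            info.flags |= MessageFlags.DELETED
            count += 1
    return count
```

Note that the "not spam" header link and the toolbar button would both
end up in user_marks_message(), so they cannot drift apart.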

Now the possible issues:

      * In the case of IMAP mail, when would the spam value get
        measured?  We have to download the message before we decide
        whether it's spam or not.  I guess if we make Evolution download
        all messages by default in the background for offline use, then
        we can just hook the spam detection to that.  (But if the user
        turns that off, then detection wouldn't happen until she opens
        the message.)

      * We would probably want it to automatically recognize messages
        that went through a spam filter in the server already.  That
        is, Evolution could recognize the X-Spam headers and turn the
        "this is spam" bit on automatically when appropriate.

      * Should the "Junk" folder be implemented as a vfolder, like the
        Trash?  Then messages wouldn't actually move back and forth
        between folders, and if there is a false positive the user
        marking the message as "not spam" would automatically result in
        the message disappearing from the vfolder and appearing in its
        original folder.
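
Two of the points above are easy to sketch (Python again, purely
illustrative).  The first function assumes the server ran SpamAssassin,
which adds "X-Spam-Flag: YES" to messages it considers spam; other
server-side filters may use different headers.  The second treats
"Junk" as a query over the junk bit rather than a real folder.  The
function names and the dict layout are made up for the example.

```python
from email import message_from_bytes
from typing import Dict, List, Tuple


def server_flagged_as_spam(raw_message: bytes) -> bool:
    """True if the message carries an "X-Spam-Flag: YES" header."""
    msg = message_from_bytes(raw_message)
    flag = str(msg.get("X-Spam-Flag", ""))
    return flag.strip().upper() == "YES"


def junk_vfolder(folders: Dict[str, List[dict]]) -> List[Tuple[str, str]]:
    """Return (folder name, uid) for every message whose junk bit is set.

    Nothing is moved: the messages stay in their real folders, so
    clearing the bit makes a message vanish from this view and show up
    in its original folder again.
    """
    return [
        (name, message["uid"])
        for name, messages in folders.items()
        for message in messages
        if message["junk"]
    ]
```

With the vfolder approach, a false positive costs the user one click and
no message ever has to be copied between stores.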

Thoughts?

-- Ettore


