Re: [Evolution-hackers] spam filtering

From: Jeffrey Stedfast <fejj ximian com>
To: Radek Doulík <rodo ximian com>
Cc: Evolution Hackers Mailing List <evolution-hackers ximian com>
Subject: Re: [Evolution-hackers] spam filtering
Date: Wed, 01 Oct 2003 16:17:19 -0400

On Wed, 2003-10-01 at 12:27, Radek Doulík wrote:
> Hi all,
> 
> before I start implementing spam filtering for evolution, I would like
> to discuss my plan. Please read the whole mail and comment. I am
> describing the model from user view and then implementation details
> and some things to think about. I took Ettore's model as a base and
> modified it a little bit - mostly simplified.
> 
> User view
> 
>       * incoming messages are identified by spam filter as spam or
>         nospam (IMAP messages are filtered once completed - fully
>         downloaded). 

this is where things are unclear:

when will these imap messages be fully downloaded? when the user opens
the message? (normally an imap message is not fully downloaded until the
user clicks on it in the message-list)

or will we now just always download entire messages when the user opens
an imap folder? this would suck.

>       * spam messages are moved to Spam folder or deleted 
>       * new [No]Spam button on toolbar and item in menubar
>         Actions/[No]Spam. when message was identified as Spam,
>         button/item says NoSpam to revive the message from Spam folder
>         (spam flag is set to false and incoming message filters are
>         applied). For nospam messages it says spam to mark message as
>         Spam (spam flag is set to true and message is moved to Spam
>         folder).

I presume that you plan to add a CAMEL_MESSAGE_SPAM system flag? (see
camel/camel-folder-summary.h for details)

(note: this is fine by me, just asking for clarification)
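
For illustration, a per-message system flag of that kind is just one bit in the summary's flags word. The sketch below is standalone C; the names and bit values are hypothetical and are not taken from camel/camel-folder-summary.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, chosen for illustration only;
   they are NOT copied from camel-folder-summary.h. */
#define EXAMPLE_MESSAGE_SEEN (1u << 4)
#define EXAMPLE_MESSAGE_SPAM (1u << 7)

/* Return the flags word with the spam bit set or cleared;
   every other bit is left untouched. */
uint32_t example_set_spam(uint32_t flags, bool is_spam)
{
	return is_spam ? (flags | EXAMPLE_MESSAGE_SPAM)
	               : (flags & ~EXAMPLE_MESSAGE_SPAM);
}

/* Test whether the spam bit is set. */
bool example_is_spam(uint32_t flags)
{
	return (flags & EXAMPLE_MESSAGE_SPAM) != 0;
}
```

Toggling the [No]Spam button would then amount to one call to example_set_spam() plus whatever folder move is decided on.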

> 
>       * new page labeled "Spam filtering" in Mail preferences section
>         of Settings dialog 
>                 [checkbox] filter incoming messages - default: enabled
>                 Spam messages are [option menu - moved to Spam
>                 folder/deleted] default: moved to Spam folder

are we only going to have a local Spam folder that is always used? or do
we wish to allow users to pick and choose the location of their spam
folder? if we will allow the user to pick a spam folder, should this
then be a per-account preference?

a local spam folder would be easiest, but I'm not sure if this would be
acceptable to users?

>                  
>                 Spam filter [option menu - spam filters list] default:
>                 1st filter

I'm not sure I understand this... what does "1st filter" mean? what
other options would there be? etc.

>                  
>                 Filter options frame with filter specific options

kinda hazy on this as well... what sort of specific options are we
talking about?

>                  
>         
> 
> Described above is the simplest model I have. I think simplicity is
> good here. It also lowers the risk to our time-based schedule. Additional
> features could be implemented once this model works.
> 
> Additional features
> 
>       * display spam filter score
> 
>       * "Check spam" filter rule
>         some people may not want to filter every incoming message
>         (because it could be too slow) and would rather filter messages
>         only in selected folders. (it's OK to have spam messages in mailing
>         list folders but not in personal mail folder)

this might be cool.

> 
>       * more - add your favorite feature here
> 
> What do you think about this model?
> 
> How do you feel about forcing spam messages to be listed only in Spam
> folder?

this doesn't bother me.

> 
> Implementation
> 
> I believe it's worth making the spam filter(s) pluggable. This has
> several advantages:
>       * it's possible to develop spam filter plugin outside evolution
>         => faster development, lower barrier for external developers 
>       * simple API, no added complexity 
>       * I don't see anything we cannot do with plugins compared to
>         filter implemented inside evolution

I'll agree with this...

> 
> Plugin will be shared library which will be loaded by dlopen/dlsym.
> Evo will get SpamFilterStruct by dlsym, check api_version and then use
> supplied methods.

sounds reasonable. note that NotZed had been working on a plugin api for
the mailer, might want to talk to him. he had ideas on versioning and so
forth which might be nice.
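
A rough sketch of the load-and-check step described above, assuming a POSIX system with dlopen/dlsym. The struct is cut down to the two fields the loader looks at, plain C types stand in for the GLib ones, and all example_* names are made up for illustration. The exported symbol name is passed in by the caller.

```c
#include <dlfcn.h>
#include <stddef.h>

#define EXAMPLE_SPAM_API_VERSION 1

/* Cut-down stand-in for the proposed struct: only the fields
   the loader inspects. Hypothetical, for illustration. */
typedef struct {
	const char *name;
	int         api_version;
} ExampleSpamPluginHeader;

/* A plugin is usable only if it exists and speaks our API version. */
int example_plugin_is_compatible(const ExampleSpamPluginHeader *plugin)
{
	return plugin != NULL && plugin->api_version == EXAMPLE_SPAM_API_VERSION;
}

/* Load a plugin and return its descriptor, or NULL on any failure.
   On success *handle_out receives the dlopen handle so the caller
   can dlclose() it when the plugin is no longer needed. */
const ExampleSpamPluginHeader *
example_load_spam_plugin(const char *path, const char *symbol, void **handle_out)
{
	void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
	if (handle == NULL)
		return NULL;            /* dlerror() describes the reason */

	const ExampleSpamPluginHeader *plugin = dlsym(handle, symbol);
	if (!example_plugin_is_compatible(plugin)) {
		dlclose(handle);        /* missing symbol or wrong api_version */
		return NULL;
	}

	*handle_out = handle;
	return plugin;
}
```

Older C libraries may need -ldl at link time. A richer versioning scheme would replace the simple equality test in example_plugin_is_compatible().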

> 
> typedef struct _SpamFilterPlugin SpamFilterPlugin;
> struct _SpamFilterPlugin
> {
> 	/* spam filter human readable name */
> 	gchar *name;
> 	/* should be set to 1 */
> 	gint   api_version;
> 
> 	/* when called, it should return TRUE if message is identified as spam,
> 	   FALSE otherwise */
> 	gboolean (*check_spam)    (CamelMimeMessage *message);
> 	/* called when user identified a message to be spam */
> 	void     (*report_spam)   (CamelMimeMessage *message);
> 	/* called when user identified a message not to be spam */
> 	void     (*report_nospam) (CamelMimeMessage *message);
> 
> 	/* when called, it should insert its own GUI configuration into the
> 	   supplied container. returns a data pointer which is later passed
> 	   to apply. plugin has to call (*changed_cb) (); whenever configuration
> 	   is changed to notify settings dialog about a change.
> 	   if setup_config_ui is NULL, it means there are no options */
> 	gpointer (*setup_config_ui) (GtkWidget *container, void (*changed_cb) (void));
> 	void     (*apply)           (gpointer data);
> };
> 
> 
> Spam will be identified by check_spam method, spam status changes will
> be reported to filter by report_[no]spam methods. Plugin may or may
> not provide configuration gui for Settings dialog.
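
To make the call flow concrete, here is a toy sketch of a plugin filling in those methods and of the host reacting to the [No]Spam button. It is standalone C: ExampleMessage stands in for CamelMimeMessage, the GTK config hooks are left out, and the keyword check is only a placeholder, not how SpamAssassin or Bogofilter work.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for CamelMimeMessage: just the subject text. */
typedef struct {
	const char *subject;
} ExampleMessage;

/* Same shape as the proposed struct, minus the GTK config hooks. */
typedef struct {
	const char *name;
	int         api_version;
	bool (*check_spam)    (ExampleMessage *message);
	void (*report_spam)   (ExampleMessage *message);
	void (*report_nospam) (ExampleMessage *message);
} ExampleSpamFilterPlugin;

/* Toy "training" state: counts the corrections made by the user. */
int example_reported_spam;
int example_reported_nospam;

static bool toy_check_spam(ExampleMessage *m)
{
	return m->subject != NULL && strstr(m->subject, "FREE MONEY") != NULL;
}
static void toy_report_spam(ExampleMessage *m)   { (void) m; example_reported_spam++; }
static void toy_report_nospam(ExampleMessage *m) { (void) m; example_reported_nospam++; }

ExampleSpamFilterPlugin example_toy_plugin = {
	"Toy keyword filter", 1, toy_check_spam, toy_report_spam, toy_report_nospam
};

/* Host side: the user pressed the [No]Spam button on a message.
   Tell the filter about the correction and return the new spam state. */
bool example_toggle_spam(ExampleSpamFilterPlugin *plugin,
                         ExampleMessage *message, bool currently_spam)
{
	if (currently_spam)
		plugin->report_nospam(message);
	else
		plugin->report_spam(message);
	return !currently_spam;
}
```

The host never needs to know how a given filter learns; it only forwards the user's decision through report_spam or report_nospam.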
> 
> Spam flag will be stored in X-Spam: header. Also for IMAP we may need
> X-Evolution-Spam-Checked header.

for imap this is going to suck pretty hardcore. there's no way to append
headers to an IMAP message, the only option then is to download the
message, add the header(s), append the message back to the IMAP mailbox,
and finally delete the original message (and expunge?).

this is gonna be a killer for performance.

> 
> From the discussion on the mailing list, it looks like everybody is for
> using a vFolder for the Spam folder. I am not sure it's that great.
> Consider this: about 90% of spam messages are identified correctly, so at
> worst only 10% of spam will be moved between folders. I am not sure
> how resource-hungry vfolders are. Also, messages which end up in a vfolder
> stay there until Expunge. So if I am correct, we have to implement
> message removal from vfolders. Mail guys, is that right?

I'm indifferent about vfolders vs physical spam folders. as far as
needing to add functionality to vfolders to get removing to work, you'll
have to wait for Zucchi's response.

> 
> If we put them in vfolder, are they going to be visible in the source
> folder?

yes.

> 
> If spam messages will stay in Spam folder only, we don't need new mail
> message list column with spam flag and also "Delete spam mails" action
> in menu.
> 
> So the spam mails location seems to be crucial here. I like the
> simplicity of spam mails to be only visible in Spam folder. What do
> you think, are there any advantages of having spam messages visible in
> source folders?

the advantage of vfolders in the spam scenario is the same as the
delete/undelete case. if the user decides to un-mark the message as
being spam, it immediately returns to the folder it would normally have
been contained in and even back into its original location in an
unsorted message-list.

if we have a physical spam folder that spam gets moved into when thought
to be spam, when the user decides to un-mark the msg as spam - we need
to do one of the following:

1. move it back to the original folder (thus we'd need to keep track of
which folder it had originated from)
2. just always move it into Inbox?
3. don't move it anywhere, thus forcing the user to move it to where he
wants it?
4. ask the user where to move it?
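
As a sketch of how options 1 and 2 could combine: remember the origin folder when a message is moved to Spam, and fall back to Inbox when it is unknown. The record layout and names below are hypothetical, and the "Inbox" fallback is an assumption rather than a decision.

```c
#include <stddef.h>

/* Hypothetical record kept for each message moved into the Spam folder. */
typedef struct {
	const char *uid;            /* message id within the Spam folder */
	const char *origin_folder;  /* folder it came from; NULL if unknown */
} ExampleSpamRecord;

/* Option 1 with option 2 as the fallback: prefer the remembered
   origin folder, otherwise restore the message to "Inbox". */
const char *example_restore_target(const ExampleSpamRecord *rec)
{
	if (rec != NULL && rec->origin_folder != NULL && rec->origin_folder[0] != '\0')
		return rec->origin_folder;
	return "Inbox";
}
```

Options 3 and 4 need no bookkeeping at all, which is the main argument for them.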

> 
> I plan to write SpamAssassin and Bogofilter plugins (I expect Bogofilter
> may work faster, but I have tried only SpamAssassin so far).

sounds like a start.

Jeff

-- 
Jeffrey Stedfast
Evolution Hacker - Ximian, Inc.
fejj ximian com  - www.ximian.com