Re: [Evolution-hackers] spam filtering



On Wed, 2003-10-01 at 22:17, Jeffrey Stedfast wrote:
> On Wed, 2003-10-01 at 12:27, Radek Doulík wrote:
> > Hi all,
> > 
> > before I start implementing spam filtering for evolution, I would like
> > to discuss my plan. Please read the whole mail and comment. I am
> > describing the model from user view and then implementation details
> > and some things to think about. I took Ettore's model as a base and
> > modified it a little bit - mostly simplified.
> > 
> > User view
> > 
> >       * incoming messages are identified by spam filter as spam or
> >         nospam (IMAP messages are filtered once completed - fully
> >         downloaded). 
> 
> this is where things are unclear:
> 
> when will these imap messages be fully downloaded? when the user opens
> the message? (normally an imap message is not fully downloaded until the
> user clicks on it in the message-list)
> 
> or will we now just always download entire messages when the user opens
> an imap folder? this would suck.

I was relying on the earlier discussion here. IIRC it was suggested that
in offline mode we have whole messages (so we can filter them as they
are downloaded), and when only headers are available, the message is
filtered when it is displayed, so it is identified as spam only at that
point. That seems pretty inconvenient, but it's the price for speed.
Also, IIRC Michael has a patch for switching full download on and off,
so we can use this option. Then the question really is what to make the
default behavior: full download or headers-only. My opinion is to make
full download the default, but I am not 100% sure about this.
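
To make the two cases concrete, here is a rough sketch of the "check
once, and only when the whole message is there" logic. All names in it
(SketchMessageState, sketch_maybe_check, ...) are made up for this mail
and are not Camel API. The idea is that the same function could be
called both when a download finishes and when a message is displayed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-message state, not a Camel type. */
typedef struct {
	bool body_complete;  /* whole message is available locally */
	bool spam_checked;   /* the filter has already seen it */
	bool is_spam;        /* result of the last check */
} SketchMessageState;

typedef bool (*SketchCheckFn) (const SketchMessageState *msg);

/* Placeholder filter for the sketch: calls everything spam. */
static bool
sketch_always_spam (const SketchMessageState *msg)
{
	(void) msg;
	return true;
}

/* Run the filter at most once, and only on a complete message.
   Returns true if the check actually ran. */
static bool
sketch_maybe_check (SketchMessageState *msg, SketchCheckFn check)
{
	assert (msg != NULL && check != NULL);

	if (!msg->body_complete || msg->spam_checked)
		return false;

	msg->is_spam = check (msg);
	msg->spam_checked = true;
	return true;
}
```

With headers-only download the call at download time does nothing, and
the call at display time does the real work.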

> >       * spam messages are moved to Spam folder or deleted 
> >       * new [No]Spam button on toolbar and item in menubar
> >         Actions/[No]Spam. when message was identified as Spam,
> >         button/item says NoSpam to revive the message from Spam folder
> >         (spam flag is set to false and incoming message filters are
> >         applied). For nospam messages it says spam to mark message as
> >         Spam (spam flag is set to true and message is moved to Spam
> >         folder).
> 
> I presume that you plan to add a CAMEL_MESSAGE_SPAM system flag? (see
> camel/camel-folder-summary.h for details)
> 
> (note: this is fine by me, just asking for clarification)

Yes, I meant such a flag. It will be stored in the folder summary file,
which is local, right?
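
Just to show what I have in mind, a minimal sketch of a spam bit kept
next to the other per-message flags. The names and bit values are
invented for this mail (the real flags are in
camel/camel-folder-summary.h and I am not copying them here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical flag bits, for illustration only. */
enum {
	SKETCH_MESSAGE_SEEN = 1 << 0,
	SKETCH_MESSAGE_SPAM = 1 << 1
};

/* Stand-in for the per-message record kept in the folder summary. */
typedef struct {
	uint32_t flags;
} SketchMessageInfo;

/* Set or clear the spam bit without touching the other flags. */
static void
sketch_set_spam (SketchMessageInfo *info, bool is_spam)
{
	assert (info != NULL);

	if (is_spam)
		info->flags |= (uint32_t) SKETCH_MESSAGE_SPAM;
	else
		info->flags &= ~(uint32_t) SKETCH_MESSAGE_SPAM;
}

static bool
sketch_is_spam (const SketchMessageInfo *info)
{
	assert (info != NULL);
	return (info->flags & SKETCH_MESSAGE_SPAM) != 0;
}
```

The [No]Spam button would then just flip this bit.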

> > 
> >       * new page labeled "Spam filtering" in Mail preferences section
> >         of Settings dialog 
> >                 [checkbox] filter incoming messages - default: enabled
> >                 Spam messages are [option menu - moved to Spam
> >                 folder/deleted] default: moved to Spam folder
> 
> are we only going to have a local Spam folder that is always used? or do
> we wish to allow users to pick and choose the location of their spam
> folder? if we will allow the user to pick a spam folder, should this
> then be a per-account preference?
> 
> a local spam folder would be easiest, but I'm not sure if this would be
> acceptable to users?

After reading all the answers I am convinced that a Spam vFolder will be
better than a regular folder, as it avoids moving remote messages to a
[local] Spam folder (wherever that would have been located).

> >                  
> >                 Spam filter [option menu - spam filters list] default:
> >                 1st filter
> 
> I'm not sure I understand this... what does "1st filter" mean? what
> other options would there be? etc.

I just wanted to say that the first available filter plugin will be
selected by default. I guess we can even link some plugin(s) into
evolution, and in that case it should default to that one (or the best
one).
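
Roughly, picking the default could look like the sketch below. The
struct is a trimmed stand-in for the SpamFilterPlugin from my proposal
(plain C types instead of glib/Camel ones so it builds on its own), and
the function names are only placeholders. It takes the first plugin
that has the expected api_version and actually provides check_spam.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-in for CamelMimeMessage in this sketch. */
typedef struct _SketchMessage SketchMessage;

/* Trimmed, hypothetical version of the proposed plugin struct. */
typedef struct {
	const char *name;
	int         api_version;
	int       (*check_spam) (SketchMessage *message);
} SketchSpamPlugin;

#define SKETCH_API_VERSION 1

/* Placeholder filter: never reports spam. */
static int
sketch_never_spam (SketchMessage *message)
{
	(void) message;
	return 0;
}

/* Return the first usable plugin in the list, or NULL if none is. */
static const SketchSpamPlugin *
sketch_default_plugin (const SketchSpamPlugin *const *plugins, size_t n)
{
	size_t i;

	assert (plugins != NULL || n == 0);

	for (i = 0; i < n; i++) {
		const SketchSpamPlugin *p = plugins[i];

		if (p != NULL
		    && p->api_version == SKETCH_API_VERSION
		    && p->check_spam != NULL)
			return p;
	}

	return NULL;
}
```

If nothing usable is found, the preferences page could simply disable
the option menu.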

> >                  
> >                 Filter options frame with filter specific options
> 
> kinda hazy on this as well... what sort of specific options are we
> talking about?

options specific to the selected plugin (e.g. local tests only for
spamassassin, ...)

> 
> >                  
> >         
> > 
> > Described above is the simplest model I have. I think simplicity is
> > good here. It also lowers risks of time based schedule. Additional
> > features could be implemented once this model works.
> > 
> > Additional features
> > 
> >       * display spam filter score
> > 
> >       * "Check spam" filter rule
> >         some people may not want to filter every incoming message
> >         (because it could be too slow) and instead to filter messages
> >         only per folder. (it's OK to have spam messages in mailing
> >         list folders but not in personal mail folder)
> 
> this might be cool.

yes, it will allow people to customize it for their needs, like filter
only personal mail, move all spam to their preferred folder, etc.

We should also have an "Is spam" filter rule which will use the spam
flag.

> 
> > 
> >       * more - add your favorite feature here
> > 
> > What do you think about this model?
> > 
> > How do you feel about forcing spam messages to be listed only in Spam
> > folder?
> 
> this doesn't bother me.
> 
> > 
> > Implementation
> > 
> > I believe it's worth to make spam filter(s) pluggable. There are
> > advantages it has: 
> >       * it's possible to develop spam filter plugin outside evolution
> >         => faster development, lower barrier for external developers 
> >       * simple API, no added complexity 
> >       * I don't see anything we cannot do with plugins compared to
> >         filter implemented inside evolution
> 
> I'll agree with this...
> 
> > 
> > Plugin will be shared library which will be loaded by dlopen/dlsym.
> > Evo will get SpamFilterStruct by dlsym, check api_version and then use
> > supplied methods.
> 
> sounds reasonable. note that NotZed had been working on a plugin api for
> the mailer, might want to talk to him. he had ideas on versioning and so
> forth which might be nice.
> 
> > 
> > typedef struct _SpamFilterPlugin SpamFilterPlugin;
> > struct _SpamFilterPlugin
> > {
> > 	/* spam filter human readable name */
> > 	gchar *name;
> > 	/* should be set to 1 */
> > 	gint   api_version;
> > 
> > 	/* when called, it should return TRUE if message is identified as spam,
> > 	   FALSE otherwise */
> > 	gboolean (*check_spam)    (CamelMimeMessage *message);
> > 	/* called when user identified a message to be spam */
> > 	void     (*report_spam)   (CamelMimeMessage *message);
> > 	/* called when user identified a message not to be spam */
> > 	void     (*report_nospam) (CamelMimeMessage *message);
> > 
> > 	/* when called, it should insert its own GUI configuration into the
> > 	   supplied container. returns data pointer which is later passed to
> > 	   apply. plugin has to call (*changed_cb) (); whenever configuration
> > 	   is changed to notify settings dialog about a change.
> > 	   if setup_config_ui is NULL, it means there are no options */
> > 	gpointer (*setup_config_ui) (GtkWidget *container, void (*changed_cb) ());
> > 	void     (*apply)           (gpointer data);
> > };
> > 
> > 
> > Spam will be identified by check_spam method, spam status changes will
> > be reported to filter by report_[no]spam methods. Plugin may or may
> > not provide configuration gui for Settings dialog.
> > 
> > Spam flag will be stored in X-Spam: header. Also for IMAP we may need
> > X-Evolution-Spam-Checked header.
> 
> for imap this is going to suck pretty hardcore. there's no way to append
> headers to an IMAP message, the only option then is to download the
> message, add the header(s), append the message back to the IMAP mailbox,
> and finally delete the original message (and expunge?).
> 
> this is gonna be a killer for performance.

ah, didn't know that. hmm, so I guess I have to make the vFolder be
defined by the Spam flag then. Do you know if adding the header to
local files has any use? I don't see any right now.

> > 
> > From discussion on the mailing list, it looks like everybody is for
> > using vFolder for Spam folder. I am not sure if it's that great.
> > Consider this: about 90% of spam messages is identified right, so at
> > worst only 10% of spam will be moved between folders. I am not sure
> > how resources hungry vfolders are. Also messages which end in vfolder,
> > stay there until Expunge. So if I am correct we have to implement
> > message removal from vfolder, mail guys is that right?
> 
> I'm indifferent about vfolders vs physical spam folders. as far as
> needing to add functionality to vfolders to get removing to work, you'll
> have to wait for Zucchi's response.

Zucchi says it's pretty fast. Cool.

> > 
> > If we put them in vfolder, are they going to be visible in the source
> > folder?
> 
> yes.

I think it's better to hide them, at least by default. Do you think that
showing them has any advantage/possible use?
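
What I mean is something like the sketch below: the Spam vFolder lists
whatever has the spam bit set, and the source folder's message list
skips those messages when a "hide spam" option is on. Names and the
flag bit are hypothetical, this is not how the vfolder code is
actually written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FLAG_SPAM (1u << 0)  /* hypothetical bit */

/* Stand-in for a summary entry. */
typedef struct {
	const char *uid;
	uint32_t    flags;
} SketchMessageInfo;

/* A message belongs to the Spam vFolder iff its spam bit is set. */
static bool
sketch_in_spam_vfolder (const SketchMessageInfo *info)
{
	assert (info != NULL);
	return (info->flags & SKETCH_FLAG_SPAM) != 0;
}

/* The source folder shows everything unless spam is being hidden. */
static bool
sketch_in_source_list (const SketchMessageInfo *info, bool hide_spam)
{
	assert (info != NULL);
	return !(hide_spam && sketch_in_spam_vfolder (info));
}

/* Count the messages the source folder's list would display. */
static size_t
sketch_count_visible (const SketchMessageInfo *msgs, size_t n, bool hide_spam)
{
	size_t i, visible = 0;

	assert (msgs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (sketch_in_source_list (&msgs[i], hide_spam))
			visible++;
	}

	return visible;
}
```

Clearing the bit makes the message show up in its source folder again
without moving anything.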

> > 
> > If spam messages will stay in Spam folder only, we don't need new mail
> > message list column with spam flag and also "Delete spam mails" action
> > in menu.
> > 
> > So the spam mails location seems to be crucial here. I like the
> > simplicity of spam mails to be only visible in Spam folder. What do
> > you think, are there any advantages of having spam messages visible in
> > source folders?
> 
> the advantage of vfolders in the spam scenario is the same as the
> delete/undelete case. if the user decides to un-mark the message as
> being spam, it immediately returns to the folder it would normally have
> been contained in and even back into its original location in an
> unsorted message-list.
> 
> if we have a physical spam folder that spam gets moved into when thought
> to be spam, when the user decides to un-mark the msg as spam - we need
> to do one of the following:
> 
> 1. move it back to the original folder (thus we'd need to keep track of
> which folder it had originated from)
> 2. just always move it into Inbox?
> 3. don't move it anywhere, thus forcing the user to move it to where he
> wants it?
> 4. ask the user where to move it?

I was thinking about applying filters to it, but now I am for the
vFolder scenario.

Cheers
Radek

> 
> > 
> > I plan to write Spamassassin and Bogofilter plugins (I expect it may
> > work faster, but I tried only spamassassin so far).
> 
> sounds like a start.
> 
> Jeff




