Re: [Evolution-hackers] spam filtering

From: Radek Doulík <rodo ximian com>
To: Not Zed <notzed ximian com>
Cc: Evolution Hackers Mailing List <evolution-hackers ximian com>
Subject: Re: [Evolution-hackers] spam filtering
Date: Thu, 02 Oct 2003 15:42:49 +0200

On Thu, 2003-10-02 at 04:52, Not Zed wrote:
> On Thu, 2003-10-02 at 01:57, Radek Doulík wrote: 
> > Hi all,
> > 
> > before I start implementing spam filtering for evolution, I would
> > like to discuss my plan. Please read the whole mail and comment. I
> > am describing the model from user view and then implementation
> > details and some things to think about. I took Ettore's model as a
> > base and modified it a little bit - mostly simplified.
> > 
> > User view
> > 
> >       * incoming messages are identified by spam filter as spam or
> >         nospam (IMAP messages are filtered once completed - fully
> >         downloaded).  
> 
> As jeff said ... i think .. perhaps this will have to be done always,
> and since you may as well do filtering at the same time, it might best
> be done as some sort of implicit filter action (or something run on
> every message before its filtered).

I am not quite sure I understand you here. What do you think about a
"full download" / "headers only + spam filter when displaying" option,
defaulting to full download?

>   Of course, filters only apply to 'inbox', do we need this to apply
> to other folders too??  (if people are doing server-side filtering,
> presumably they are doing server-side spam filtering too?)

What do you mean by server-side [spam] filtering? That people run their
own spam filter on the server? (We could set the spam flag from the
X-Spam flag then.)
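
For the server-side case, a rough sketch of what I have in mind: look at
the header value the server's filter left behind and turn it into our
spam flag. This assumes SpamAssassin-style values (X-Spam-Flag: YES, or
an X-Spam-Status value starting with "Yes" or "No"); other server
filters may use a different format, and header_value_says_spam is just a
name I made up for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if a spam header value starts with the
   word "yes" (case-insensitive, leading whitespace skipped), 0 otherwise.
   Assumes SpamAssassin-style values such as "YES" or "Yes, score=7.5". */
int
header_value_says_spam (const char *value)
{
	static const char yes[] = "yes";
	size_t i;

	if (value == NULL)
		return 0;

	while (*value != '\0' && isspace ((unsigned char) *value))
		value++;

	/* Stops at the first mismatch, so it never reads past the end. */
	for (i = 0; yes[i] != '\0'; i++) {
		if (tolower ((unsigned char) value[i]) != yes[i])
			return 0;
	}

	/* Reject longer words such as "yesterday". */
	return !isalnum ((unsigned char) value[i]);
}
```

Anything that does not clearly say "yes" is treated as not spam, so a
missing or odd header never hides a message.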

> 
> >       * spam messages are moved to Spam folder or deleted  
> 
> It would be nice to at least have an option that this is controlled by
> the user.  i.e. by a filter and vFolder/search rule to match the spam
> bit.

Agreed.

You may disable filtering of incoming messages and use the "Check spam"
filter rule. In my reply to Jeff's mail I talk about an "Is spam" rule,
which is the filter part here. And I will add a vFolder/search rule to
my task list.

> 
> >       * new [No]Spam button on toolbar and item in menubar
> >         Actions/[No]Spam. when message was identified as Spam,
> >         button/item says NoSpam to revive the message from Spam
> >         folder (spam flag is set to false and incoming message
> >         filters are applied). For nospam messages it says spam to
> >         mark message as Spam (spam flag is set to true and message
> >         is moved to Spam folder). 
> 
> Well i think the toolbar is already too cluttered, but sure, this is
> no biggy.  Also plugging into a popup menu should probably be
> considered, and is very easy to do.

Oh, and I considered adding it to the popup menu in my reply to Mark and
Anna. Interesting how we all have different views on that. Let's see
what Anna thinks.

> >       * new page labeled "Spam filtering" in Mail preferences
> >         section of Settings dialog 
> >                 [checkbox] filter incoming messages - default:
> >                 enabled Spam messages are [option menu - moved to
> >                 Spam folder/deleted] default: moved to Spam folder
> >                 Spam filter [option menu - spam filters list]
> >                 default: 1st filter Filter options frame with filter
> >                 specific options 
> >         
> 
> Should this be per-account?  Or if it isn't, there may need to be
> something per-account about it - see below.

I wanted to keep it simple. I thought it would be possible to do it
per-account using the Filters/Check spam rule. But now I see there are
two sets of filter rules, incoming and outgoing. Doesn't it make sense
to have filter rules per account in general?

> > Described above is the simplest model I have. I think simplicity is
> > good here. It also lowers risks of time based schedule. Additional
> > features could be implemented once this model works.
> > 
> > Additional features
> > 
> >       * display spam filter score 
> > 
> >       * "Check spam" filter rule 
> >         some people may not want to filter every incoming message
> >         (because it could be too slow) and instead to filter
> >         messages only per folder. (it's OK to have spam messages in
> >         mailing list folders but not in personal mail folder) 
> > 
> >       * more - add your favorite feature here 
> > 
> > What do you think about this model?
> > 
> > How do you feel about forcing spam messages to be listed only in
> > Spam folder?
> 
> No real preference.  It could be done in many ways, e.g. just a
> vFolder with 'message is spam', and also have an implicit 'hide spam
> messages' (like the 'hide deleted messages') on all other folders.
> 
> And again, should/could this be per-account?

I am for a vFolder, with spam messages hidden in the other folders. I
don't see how it can be useful to have it per account.

> > Implementation
> > 
> > I believe it's worth to make spam filter(s) pluggable. There are
> > advantages it has: 
> >       * it's possible to develop spam filter plugin outside
> >         evolution => faster development, lower barrier for external
> >         developers 
> >       * simple API, no added complexity 
> >       * I don't see anything we cannot do with plugins compared to
> >         filter implemented inside evolution 
> 
> Yes yes, but you know I agree here :)  I had some ideas for a more
> generalised plugin system, but its going to be a pretty thin layer
> anyway.  So as a first cut we could start with something simple, then
> just retrofit it later.  I'd like to do it based on gobject/classes,
> because that adds a little more flexibility and some type-ness, and
> just use g_module for the low-level dl* stuff.

I think I will use g_module but keep it simple and not use gobject - so
people who want to write a plugin don't have to learn the GLib type
system.
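
To show how little a plugin author would need to know, here is a rough
sketch of the check the loader could run on the descriptor it gets back
from the module, before calling any method. Plain C types stand in for
gboolean/gchar/CamelMimeMessage so that it compiles on its own, the GUI
hooks are left out, and spam_filter_plugin_usable, null_plugin and
SPAM_FILTER_API_VERSION are names I made up for illustration.

```c
#include <stddef.h>

#define SPAM_FILTER_API_VERSION 1

/* Opaque stand-in for CamelMimeMessage. */
typedef struct _Message Message;

/* Reduced copy of the proposed descriptor, without the GUI hooks. */
typedef struct _SpamFilterPlugin SpamFilterPlugin;
struct _SpamFilterPlugin
{
	const char *name;
	int         api_version;

	int  (*check_spam)    (Message *message);
	void (*report_spam)   (Message *message);
	void (*report_nospam) (Message *message);
};

/* Hypothetical loader-side check: returns 1 if the descriptor can be
   used, 0 if it should be rejected. */
int
spam_filter_plugin_usable (const SpamFilterPlugin *plugin)
{
	if (plugin == NULL || plugin->name == NULL)
		return 0;
	if (plugin->api_version != SPAM_FILTER_API_VERSION)
		return 0;
	/* check_spam is the one method we cannot work without; treating
	   the report_* hooks as optional is an assumption of this sketch. */
	return plugin->check_spam != NULL;
}

/* Smallest possible plugin: never flags anything as spam. */
static int
null_check_spam (Message *message)
{
	(void) message;
	return 0;
}

const SpamFilterPlugin null_plugin = {
	"Null filter", SPAM_FILTER_API_VERSION, null_check_spam, NULL, NULL
};
```

A real plugin would export one such descriptor from its shared library,
and the loader would run the check right after looking the symbol up.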

> > Plugin will be shared library which will be loaded by dlopen/dlsym.
> > Evo will get SpamFilterStruct by dlsym, check api_version and then
> > use supplied methods.
> > 
> > typedef struct _SpamFilterPlugin SpamFilterPlugin;
> > struct _SpamFilterPlugin
> > {
> > 	/* spam filter human readable name */
> > 	gchar *name;
> > 	/* should be set to 1 */
> > 	gint   api_version;
> > 
> > 	/* when called, it should return TRUE if message is identified as spam,
> > 	   FALSE otherwise */
> > 	gboolean (*check_spam)    (CamelMimeMessage *message);
> > 	/* called when user identified a message to be spam */
> > 	void     (*report_spam)   (CamelMimeMessage *message);
> > 	/* called when user identified a message not to be spam */
> > 	void     (*report_nospam) (CamelMimeMessage *message);
> > 
> > 	/* when called, it should insert its own GUI configuration into the
> > 	   supplied container. returns a data pointer which is later passed
> > 	   to apply. the plugin has to call (*changed_cb) (); whenever the
> > 	   configuration is changed, to notify the settings dialog.
> > 	   if setup_config_ui is NULL, it means there are no options */
> > 	gpointer (*setup_config_ui) (GtkWidget *container, void (*changed_cb) (void));
> > 	void     (*apply)           (gpointer data);
> > };
> >
> 
> I'd probably suggest the setup_config_ui was a get_widget factory.  If
> the plugin was an instantiated object you could also do signals rather
> than callbacks, and store any context data on it directly.  Would the
> config ever be anything other than per-filter data?  i.e. does it need
> its own context data, or just use the plugin context?  But either way
> is fine, its a simple api.
> 
> > Spam will be identified by check_spam method, spam status changes
> > will be reported to filter by report_[no]spam methods. Plugin may or
> > may not provide configuration gui for Settings dialog.
> > 
> > Spam flag will be stored in X-Spam: header. Also for IMAP we may
> > need X-Evolution-Spam-Checked header.
> 
> As jeff said, we can't add any headers to IMAP.  Even in the worst
> case of filtering IMAP body content, if the message is staying on the
> same server we just do a server-move, and avoid having to re-upload
> the message.  And trying to add the header and put it back in the same
> folder is not really going to work well at all.

Okie, I wasn't aware of that, thanks for the info.

> > From discussion on the mailing list, it looks like everybody is for
> > using vFolder for Spam folder. I am not sure if it's that great.
> > Consider this: about 90% of spam messages is identified right, so at
> > worst only 10% of spam will be moved between folders. I am not sure
> > how resources hungry vfolders are. Also messages which end in
> > vfolder, stay there until Expunge. So if I am correct we have to
> > implement message removal from vfolder, mail guys is that right?
> 
> What's wrong with select all/delete/expunge ?

As I understand it, that will also delete messages marked for deletion
in the source folders of that vFolder, right? So the user may lose mail
in Trash which wasn't expunged yet. Or did I misunderstand it?

I am not sure we are talking about the same thing. I need this for the
case where a message in the Spam vFolder is not spam and the user
presses the NoSpam button: I need to unset the spam flag on that message
and make it disappear from the Spam vFolder. This is the point I care
about - normally messages stay in a vFolder until expunge (IIRC). For
the reasons I described above I don't want to expunge.
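
In code terms, what I am after is roughly this. It assumes the Spam
vFolder is nothing more than a match on a spam bit in the message flags,
so that clearing the bit is enough to drop the message from it, with no
expunge involved. That is the behaviour I would like, not necessarily
what vFolders do today. The flag values and function names are made up
for illustration, they are not Camel's.

```c
/* Hypothetical flag bits, not Camel's real values. */
enum {
	MSG_FLAG_DELETED = 1 << 0,
	MSG_FLAG_SPAM    = 1 << 1
};

/* Membership test for a Spam vFolder defined as "message is spam". */
int
spam_vfolder_matches (unsigned int flags)
{
	return (flags & MSG_FLAG_SPAM) != 0;
}

/* NoSpam button: clear only the spam bit and leave everything else
   (including a pending delete) untouched. */
unsigned int
mark_not_spam (unsigned int flags)
{
	return flags & ~(unsigned int) MSG_FLAG_SPAM;
}
```

Since only the spam bit is touched, mail that is merely marked for
deletion in a source folder is not affected by pressing NoSpam.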

>   Also, If we knew which folder it was we could just have an
> auto-expunge or 'clear spam' menu item too.  This will work for a
> vFolder just like any other folder, and you'll have to do the same
> things with any other folder you'd have to with a vFolder.

If we allow listing of spam messages in the source folder then we may
want this. But if not, we don't need anything special; the user will be
able to delete messages as usual.

> vFoldering on header flags is pretty fast fwiw.

That's great.

> BTW i'm not advocating either or, i'm just suggesting that either a
> vFolder or a physical folder is a practical solution.
> 
> > If we put them in vfolder, are they going to be visible in the
> > source folder?
> 
> Yes, but we could also implicitly hide them, as we do deleted
> messages.

Agreed. I prefer to hide them.

> > If spam messages will stay in Spam folder only, we don't need new
> > mail message list column with spam flag and also "Delete spam mails"
> > action in menu.
> 
> Well you still might want a global action menu to empty the spam
> folder, rather than having to go to it, delete everything, and
> expunge.

Hmm, maybe. But definitely not on a per-folder basis, right?

> > So the spam mails location seems to be crucial here. I like the
> > simplicity of spam mails to be only visible in Spam folder. What do
> > you think, are there any advantages of having spam messages visible
> > in source folders?
> 
> I actually think the mail location is kind of irrelevant :)  Because
> ... we have the infrastructure to support either mode, fairly simply. 
> i.e. vFolders if people want that, or a separate junk folder if they
> want that.

I guess I meant visibility rather than location here.

> > I plan to write Spamassassin and Bogofilter plugins (I expect it may
> > work faster, but I tried only spamassassin so far).
> 
> I think we also need one other very simple plugin, a server-side one. 
> i.e. it assumes the server has already added the X-Spam-Status flag,
> and just translates that to the internal camel SPAM flag.
> 
> 
> 
> Ok, now the 'see below' bit, although i've kind of forgotten half of
> what i was going to say.
> 
> One reason we might want per-account spam settings is to configure how
> the account detects spam.  e.g. you might have an isp that
> automagically runs a spamchecker on your pop account, and so the
> account already has X-Spam-Status set.  The same goes for imap, but
> that account might use a different format of the spam checker.  etc
> etc.  I think this could be distilled down to a pretty simple setting
> for people to use (i.e. a checkbox 'check spam' and a dropdown 'how'
> ).  Otherwise we're going to have to force the lowest common solution,
> i.e. download every message and check it and set a local bit to say if
> its spam or not.
> 
> You'd then still have the main spam setting page where you setup how
> spam is dealt with for the 'user friendly default case'.  And for
> 'expert users', they'd disable the spam settings there, and just use
> filters or vFolders directly on their spam-checked account/flags. 
> Having a filter 'spam' option also maps to sieve filtering where the
> server supports the spam-test extension (if we ever get sieve done).

Yeah, I think we agree here. Default global simple solution + filter
rules (can these be per folder?).
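
Just to check that I read the per-account part right, here is how I
picture the two settings (the 'check spam' checkbox and the 'how'
dropdown) and what they imply for downloading. All names are made up for
illustration; the only point is that a full download is forced when a
local content filter has to see the message, and not when we can trust a
header the server already set.

```c
/* Hypothetical values for the per-account "how" dropdown. */
typedef enum {
	SPAM_HOW_SERVER_HEADER, /* trust a header set by the server */
	SPAM_HOW_LOCAL_FILTER   /* run a local filter plugin on the message */
} SpamHow;

/* Hypothetical per-account settings. */
typedef struct {
	int     check_spam; /* the "check spam" checkbox */
	SpamHow how;        /* the "how" dropdown */
} AccountSpamSettings;

/* Returns 1 if the whole message must be downloaded before it can be
   classified, 0 if headers are enough or checking is off. */
int
account_needs_full_download (const AccountSpamSettings *settings)
{
	if (!settings->check_spam)
		return 0;
	return settings->how == SPAM_HOW_LOCAL_FILTER;
}
```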

Huh, a pretty long mail to reply to, I am finally through. Thanks for
your comments, Michael.

I guess I will update my plan now and resend it so we don't forget
anything.

Cheers
Radek