Re: [Evolution-hackers] spam filtering



On Thu, 2003-10-02 at 01:57, Radek Doulík wrote:
Hi all,

Before I start implementing spam filtering for Evolution, I would like to discuss my plan. Please read the whole mail and comment. I describe the model from the user's point of view, then the implementation details and some things to think about. I took Ettore's model as a base and modified it a little bit, mostly by simplifying it.

User view


As Jeff said (I think), perhaps this will always have to be done, and since you may as well do filtering at the same time, it might best be done as some sort of implicit filter action (or something run on every message before it's filtered).  Of course, filters only apply to 'inbox'; do we need this to apply to other folders too?  (If people are doing server-side filtering, presumably they are doing server-side spam filtering too?)


It would be nice to at least have an option for this to be controlled by the user, i.e. by a filter and a vFolder/search rule that matches the spam bit.


Well i think the toolbar is already too cluttered, but sure, this is no biggy.  Also plugging into a popup menu should probably be considered, and is very easy to do.


Should this be per-account?  Or if it isn't, there may need to be something per-account about it - see below.

Described above is the simplest model I have. I think simplicity is good here. It also lowers the risk to a time-based schedule. Additional features could be implemented once this model works.

Additional features

  • display spam filter score

  • "Check spam" filter rule
Some people may not want to filter every incoming message (because it could be too slow) and would rather filter messages only in particular folders. (It's OK to have spam messages in mailing list folders but not in a personal mail folder.)

  • more - add your favorite feature here

What do you think about this model?

How do you feel about forcing spam messages to be listed only in Spam folder?

No real preference.  It could be done in many ways, e.g. just a vFolder with 'message is spam', and also have an implicit 'hide spam messages' (like the 'hide deleted messages') on all other folders.

And again, should/could this be per-account?

Implementation

I believe it's worth making the spam filter(s) pluggable. This has several advantages:
  • it's possible to develop spam filter plugin outside evolution => faster development, lower barrier for external developers
  • simple API, no added complexity
  • I don't see anything we cannot do with plugins compared to filter implemented inside evolution

Yes yes, but you know I agree here :)  I had some ideas for a more generalised plugin system, but it's going to be a pretty thin layer anyway.  So as a first cut we could start with something simple, then just retrofit it later.  I'd like to do it based on gobject/classes, because that adds a little more flexibility and some type-ness, and just use g_module for the low-level dl* stuff.

The plugin will be a shared library loaded with dlopen/dlsym. Evo will get the plugin struct (SpamFilterPlugin, below) via dlsym, check api_version, and then use the supplied methods.

typedef struct _SpamFilterPlugin SpamFilterPlugin;
struct _SpamFilterPlugin
{
	/* spam filter human readable name */
	gchar *name;
	/* should be set to 1 */
	gint   api_version;

	/* when called, it should return TRUE if message is identified as spam,
	   FALSE otherwise */
	gboolean (*check_spam)    (CamelMimeMessage *message);
	/* called when user identified a message to be spam */
	void     (*report_spam)   (CamelMimeMessage *message);
	/* called when user identified a message not to be spam */
	void     (*report_nospam) (CamelMimeMessage *message);

	/* when called, it should insert its own configuration GUI into the
	   supplied container. returns a data pointer which is later passed
	   to apply. the plugin has to call (*changed_cb) () whenever the
	   configuration changes, to notify the settings dialog.
	   if setup_config_ui is NULL, there are no options */
	gpointer (*setup_config_ui) (GtkWidget *container, void (*changed_cb) (void));
	void     (*apply)           (gpointer data);
};


I'd probably suggest making setup_config_ui a get_widget factory.  If the plugin were an instantiated object you could also use signals rather than callbacks, and store any context data on it directly.  Would the config ever be anything other than per-filter data?  i.e. does it need its own context data, or can it just use the plugin context?  But either way is fine, it's a simple API.

Spam will be identified by the check_spam method; changes of spam status will be reported to the filter by the report_[no]spam methods. A plugin may or may not provide a configuration GUI for the Settings dialog.
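
To make the call flow concrete, here is a rough sketch of what the host side might do once it has the struct in hand. It is only an illustration under some assumptions: Message is a stand-in for CamelMimeMessage so the code compiles on its own, plugin_is_usable() and classify() are hypothetical helper names, the toy plugin is invented, and the configuration hooks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SPAM_FILTER_API_VERSION 1

/* Stand-in for CamelMimeMessage (assumption, for a self-contained sketch). */
typedef struct { const char *subject; } Message;

typedef struct {
    const char *name;
    int api_version;
    bool (*check_spam)(Message *message);
    void (*report_spam)(Message *message);
    void (*report_nospam)(Message *message);
} SpamFilterPlugin;

/* Hypothetical host-side check, run once after the struct is looked up. */
static bool plugin_is_usable(const SpamFilterPlugin *p)
{
    return p != NULL
        && p->api_version == SPAM_FILTER_API_VERSION
        && p->check_spam != NULL;
}

/* Hypothetical host-side wrapper: an unusable plugin means "not spam". */
static bool classify(const SpamFilterPlugin *p, Message *m)
{
    return plugin_is_usable(p) && p->check_spam(m);
}

/* Toy plugin: flags anything with "viagra" in the subject. */
static bool toy_check(Message *m)
{
    assert(m != NULL); /* the host never passes a NULL message */
    return m->subject != NULL && strstr(m->subject, "viagra") != NULL;
}

static const SpamFilterPlugin toy_plugin = {
    "toy", SPAM_FILTER_API_VERSION, toy_check, NULL, NULL
};
```

Treating a plugin with the wrong api_version as "never spam" is just one possible policy; refusing to load it at all would work as well.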

The spam flag will be stored in an X-Spam: header. For IMAP we may also need an X-Evolution-Spam-Checked header.

As jeff said, we can't add any headers to IMAP.  Even in the worst case of filtering IMAP body content, if the message is staying on the same server we just do a server-move, and avoid having to re-upload the message.  And trying to add the header and put it back in the same folder is not really going to work well at all.

From the discussion on the mailing list, it looks like everybody is for using a vFolder for the Spam folder. I am not sure if it's that great. Consider this: about 90% of spam messages are identified correctly, so at worst only 10% of spam will be moved between folders. I am not sure how resource-hungry vfolders are. Also, messages which end up in a vfolder stay there until Expunge. So if I am correct we have to implement message removal from a vfolder; mail guys, is that right?

What's wrong with select all/delete/expunge?  Also, if we knew which folder it was we could just have an auto-expunge or 'clear spam' menu item too.  This will work for a vFolder just like any other folder, and you'll have to do the same things with any other folder that you'd have to do with a vFolder.

vFoldering on header flags is pretty fast fwiw.

BTW i'm not advocating either or, i'm just suggesting that either a vFolder or a physical folder is a practical solution.

If we put them in vfolder, are they going to be visible in the source folder?

Yes, but we could also implicitly hide them, as we do deleted messages.
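
As a rough sketch of what "implicitly hide them" could look like: the folder view skips any message whose flags intersect a hide mask, the same way deleted messages are skipped. The flag names and visible_messages() here are made up for illustration and are not the real Camel names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; not the real Camel flag names. */
enum {
    MSG_FLAG_DELETED = 1 << 0,
    MSG_FLAG_SPAM    = 1 << 1
};

/* Writes the indices of visible messages into out (which must have room
   for n entries) and returns how many were written. */
static size_t visible_messages(const unsigned *flags, size_t n,
                               unsigned hide_mask, size_t *out)
{
    size_t i, count = 0;

    assert(flags != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        if ((flags[i] & hide_mask) == 0)
            out[count++] = i;
    }
    return count;
}
```

A normal folder would pass MSG_FLAG_DELETED | MSG_FLAG_SPAM as the mask, while the Spam vFolder would do the opposite test and select on the spam bit.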

If spam messages stay in the Spam folder only, we need neither a new message-list column with a spam flag nor a "Delete spam mails" action in the menu.

Well you still might want a global action menu to empty the spam folder, rather than having to go to it, delete everything, and expunge.

So the location of spam mails seems to be crucial here. I like the simplicity of spam mails being visible only in the Spam folder. What do you think, are there any advantages to having spam messages visible in their source folders?

I actually think the mail location is kind of irrelevant :)  Because ... we have the infrastructure to support either mode, fairly simply.  i.e. vFolders if people want that, or a separate junk folder if they want that.

I plan to write SpamAssassin and Bogofilter plugins (I expect Bogofilter may work faster, but I have only tried SpamAssassin so far).

I think we also need one other very simple plugin, a server-side one.  i.e. it assumes the server has already added the X-Spam-Status flag, and just translates that to the internal camel SPAM flag.
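
Something like the following is all the server-side plugin would really need. It is a sketch that assumes the SpamAssassin-style convention where the X-Spam-Status value begins with "Yes" or "No"; other checkers may format it differently. MSG_FLAG_SPAM and both function names are hypothetical, not real Camel identifiers.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit; not the real Camel flag name. */
#define MSG_FLAG_SPAM (1u << 1)

/* Returns true when an X-Spam-Status value starts with "Yes"
   (case-insensitive), e.g. "Yes, score=12.3 required=5.0".
   A missing header (NULL) is treated as not spam. */
static bool spam_status_says_spam(const char *value)
{
    static const char yes[] = "yes";
    size_t i;

    if (value == NULL)
        return false;
    while (isspace((unsigned char)*value))
        value++;
    for (i = 0; yes[i] != '\0'; i++) {
        if (tolower((unsigned char)value[i]) != yes[i])
            return false;
    }
    /* Reject longer words such as "Yesterday". */
    return !isalnum((unsigned char)value[i]);
}

/* Sets the spam bit when the server has already marked the message. */
static void apply_server_verdict(const char *header_value, unsigned *flags)
{
    assert(flags != NULL);
    if (spam_status_says_spam(header_value))
        *flags |= MSG_FLAG_SPAM;
}
```

No message body is needed for this, only the header value, which is what makes it cheap compared with running a local checker.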



Ok, now the 'see below' bit, although i've kind of forgotten half of what i was going to say.

One reason we might want per-account spam settings is to configure how the account detects spam.  e.g. you might have an ISP that automagically runs a spam checker on your POP account, and so the account already has X-Spam-Status set.  The same goes for IMAP, but that account might use a different format for the spam checker's output.  etc etc.  I think this could be distilled down to a pretty simple setting for people to use (i.e. a checkbox 'check spam' and a dropdown 'how').  Otherwise we're going to have to force the lowest common solution, i.e. download every message, check it, and set a local bit to say if it's spam or not.
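
The per-account setting could be as small as this. It is a sketch only: the type, enum and function names are all invented here, and the two 'how' choices are just the two cases mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical values for the 'how' dropdown. */
typedef enum {
    SPAM_HOW_LOCAL_PLUGIN,   /* download the message and run a local checker */
    SPAM_HOW_SERVER_HEADER   /* trust a header the server already added */
} SpamHow;

/* Hypothetical per-account settings record. */
typedef struct {
    bool    check_spam;  /* the 'check spam' checkbox */
    SpamHow how;         /* the 'how' dropdown */
} AccountSpamSettings;

/* True when the full message must be fetched and run through a local
   filter, i.e. the expensive lowest-common-denominator path. */
static bool needs_local_check(const AccountSpamSettings *s)
{
    assert(s != NULL);
    return s->check_spam && s->how == SPAM_HOW_LOCAL_PLUGIN;
}
```

An account with the checkbox off, or with a server-header method, never pays for the download-and-check step.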

You'd then still have the main spam settings page where you set up how spam is dealt with for the 'user friendly default case'.  And 'expert users' would disable the spam settings there, and just use filters or vFolders directly on their spam-checked account/flags.  Having a filter 'spam' option also maps to Sieve filtering where the server supports the spamtest extension (if we ever get Sieve done).

Michael


