Re: [Evolution] Spam Blocking :: Bayesian Filtering

From: Chris Ness <nesscg mcmaster ca>
To: Tony Earnshaw <tonni billy demon nl>
Cc: Evolution Email List <evolution ximian com>
Subject: Re: [Evolution] Spam Blocking :: Bayesian Filtering
Date: 16 Aug 2002 16:59:42 -0400

On Fri, 2002-08-16 at 16:07, Tony Earnshaw wrote:

> On Fri, 2002-08-16 at 21:04, Christopher Ness wrote:
>
> > In order for this filtering to occur, it cannot happen on the server (I
> > suppose it could, but it would be very taxing on resources), since each
> > person needs to taylor (pun intended) the probabilities of tokens
> > (words) to the type of spam and valid email they get, in both the
> > message body and the header.


> Hmmm ... let's just say that I don't agree with this. Nor do 1,001
> (10,001, 100,001?) Unix sysadmins who use SpamAssassin.


I've seen SpamAssassin in action. This email account uses it. It does
an OK job, but I get a lot of false positives (mostly in my dumbass
friends' HTML emails, and yes, I do call them dumbasses to their faces for
sending HTML email). I know I can experiment with the "sensitivity" of
the score, but I don't want to have to do that and increase my spam
intake.

SpamAssassin recognizes *properties* of spam. But as those properties
change over time, so must the program's tests.

With the Bayesian filtering method, this happens automatically, based
on the *content* (meaning the most common words within the message) of
the header and the body. I feel this is a far superior method of filtering,
as no one is required to write new tests for an ingenious spammer who
cooks up a new method. Over time, that spam too will be filtered.
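For anyone who hasn't seen the technique, here is a minimal Python sketch of how per-token probabilities might be combined into a single score for a message. It assumes a naive-Bayes combination rule, a deliberately crude tokenizer, a default of 0.4 for unseen tokens and a cap of 15 "interesting" tokens; all of those choices, and the names `tokenize`, `spam_probability` and `token_probs`, are illustrative assumptions, not anything Evolution actually has.

```python
import math
import re

def tokenize(text):
    """Split header and body text into a set of lowercase word tokens (deliberately crude)."""
    return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

def spam_probability(text, token_probs, unknown=0.4, max_tokens=15):
    """Combine per-token spam probabilities into a single score for one message.

    token_probs is a hypothetical per-user table mapping token -> P(spam | token).
    Tokens never seen before get the assumed default `unknown`.
    """
    # Clamp so that no single token can force the score to exactly 0 or 1.
    probs = [min(0.99, max(0.01, token_probs.get(t, unknown)))
             for t in sorted(tokenize(text))]
    # Keep only the most "interesting" tokens: those farthest from a neutral 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    probs = probs[:max_tokens]
    if not probs:
        return 0.5  # nothing to go on
    # Naive-Bayes combination P = prod(p) / (prod(p) + prod(1 - p)), done in log space.
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

# Hypothetical per-user probabilities, for illustration only.
example_probs = {"viagra": 0.99, "mortgage": 0.95, "evolution": 0.01, "patch": 0.05}
print(spam_probability("Cheap mortgage and viagra!", example_probs))  # close to 1
print(spam_probability("Re: evolution patch", example_probs))         # close to 0
```

Because the table is learned from each person's own mail, two users can score the same message differently, which is the whole point of doing it per user.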

Yes, you will still get some spam even with this method, but a "delete as
spam" keystroke would tell Evo to update its DB and recalculate the
probabilities to use on the next sync.

In fact, you need a good supply of spam, and of non-spam too, to create an
initial DB of probabilities. There is a start-up time, but it is well
worth it overall.
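A rough Python sketch of that training side: seed a per-user database from an initial corpus of spam and good mail, then learn incrementally when the user hits "delete as spam". The class `TokenDB`, its methods, the relative-frequency estimate of P(spam | token), the 0.4 default and the 0.01/0.99 clamps are all hypothetical choices made for illustration; real filters use a variety of estimates and smoothing rules.

```python
import re
from collections import Counter

def tokenize(text):
    """Crude tokenizer: a set of lowercase word tokens per message."""
    return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

class TokenDB:
    """Hypothetical per-user token database (not an Evolution API).

    Counts, for each token, how many spam and how many good messages contained it.
    """

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        """Add one message to the database, e.g. while building the initial corpus."""
        tokens = tokenize(text)
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def delete_as_spam(self, text):
        """What a 'delete as spam' keystroke might do: learn from a spam that got through."""
        self.train(text, is_spam=True)

    def token_prob(self, token, unknown=0.4):
        """Estimate P(spam | token) from relative frequencies (one simple choice among many)."""
        s, h = self.spam_counts[token], self.ham_counts[token]
        if s + h == 0:
            return unknown  # never seen: assumed default
        spam_freq = s / max(self.n_spam, 1)
        ham_freq = h / max(self.n_ham, 1)
        p = spam_freq / (spam_freq + ham_freq)
        return min(0.99, max(0.01, p))  # clamp away from certainty

# Start-up: seed the database with both spam and good mail.
db = TokenDB()
db.train("Cheap mortgage rates now", is_spam=True)
db.train("Evolution patch for the mailer", is_spam=False)
# Later, a spam slips through and the user deletes it as spam.
db.delete_as_spam("Cheap pills now")
print(db.token_prob("cheap"))  # 0.99
print(db.token_prob("patch"))  # 0.01
```

Counting each token at most once per message keeps a single repetitive spam from dominating the table, and dividing by the number of spam and good messages keeps the estimate sane when the two corpora differ in size.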

I believe it is a step toward solving the problem, rather than
throwing another leaky sandbag on the dike.

Can others see how valuable a leading-edge feature like this could be in
Evo?


> What on earth is the point of duplicating, quadrupling, 10-folding,
> 100-folding, 1000-folding a service that can take place at a single
> point? Even if the spam attacks are so horrendous that one has to have
> a dedicated SMTP router to cope with them? Which may or may not be likely.


Why not uninstall SpamAssassin?

Then it *is* only at one point: the end user. This frees clock
ticks on the servers. Of course, we could never do this unless Bayesian
filtering became commonplace in email clients, as we would leave
clients without Bayesian filtering 'high and dry.'

The best part is that you now have *your own* filter: you do not rely on
the sysadmin to keep SA up to date, and you trap spam no matter your
location. I'm a little naive, but I would guess that spam in Germany is
a little different from spam in Canada, simply because of culture
and language.

Cheers,
Chris


> How about going back to the time when there were no virus scanners
> (G*d save us, Microsoft clients) on the incoming servers?
>
> Sorry, but the idea does not appeal to me and I would not use it.

Follow-Ups:
- Re: [Evolution] Spam Blocking :: Bayesian Filtering
  - From: Kenneth Porter

References:
- [Evolution] Spam Blocking :: Bayesian Filtering
  - From: Christopher Ness
- Re: [Evolution] Spam Blocking :: Bayesian Filtering
  - From: Tony Earnshaw
