Re: [Evolution-hackers] SA rules adjustment



On Thu, 2004-01-29 at 18:29 +0100, guenther wrote:
> On Thu, 2004-01-29 at 17:53, Radek Doulík wrote:
> > On Wed, 2004-01-28 at 22:24 +0100, guenther wrote:
> > > > > 2) Excluding the Habeas headers from Bayes would be good too. Otherwise,
> > > > > getting more SPAM as HAM with these faked headers will poison the Bayes
> > > > > database and HAM will get bad Bayes scores.
> > > > 
> > > > I am not sure about this. It may cause trouble in case the bayes db is
> > > > already poisoned. Otherwise it should work OK.
> > > 
> > > I'm not sure I understand you.
> > > 
> > > Excluding these headers means their existence and values will be
> > > a sign of neither SPAM nor HAM. Which these headers definitely are
> > > ATM, being abused by SPAMmers.
> > 
> > yeah, so it's a pretty good sign of spam right now and the bayes
> > filter will profit from it.
> 
> I wouldn't say this is a sign of spam. It should be considered neutral
> in the worst case.

Forgot to say: that applies in case you are getting spam with the
Habeas haiku.

> <sarcasm> If you really think this is a sign of spam, then why adjust
> the score to 0 instead of a positive value? </sarcasm>

I think you are mixing two things here: the Habeas score and Bayes
filtering. The Habeas score is a hardcoded score, while the Bayes db is
variable, as it learns from the input. So adjusting the score to a
positive value is a bad thing, as it doesn't reflect the user's mail.

> The hackers and users on the SA mailing list agreed this must be set to
> a value <=0. Probably most of them adjusted it to 0; some use negative
> values like -1. It was strongly advised to *not* use a positive value,
> as this must not be a sign of spam.

Yes, I am not saying 0 is a bad value for the Habeas score. I am against
ignoring the Habeas headers in Bayes.

> > when it eventually becomes HAM again, the filter will
> > learn that from the user (or from us when we turn the HABEAS score on
> > again) and this header will be neutral in the bayes db.
> 
> No. It will be punished with a bad Bayes score until the user has
> received at least as much HAM with the Habeas headers as SPAM.

That is fortunately not true. Only the most extreme token probabilities
are used when computing the combined probability in the Bayes filter,
and the frequency of ham tokens is even doubled. This means that once
you start reporting such mail as ham, the token will quickly become
neutral.
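
To illustrate the point, here is a minimal sketch. It assumes a filter
in the style of Paul Graham's "A Plan for Spam" (ham counts doubled,
only the most extreme tokens combined), which may differ from what
SpamAssassin actually computes. The function names, the neutral value
0.5 for unseen tokens, and the example counts are made up for
illustration.

```python
import math


def token_spam_probability(spam_count, ham_count, n_spam, n_ham):
    """Per-token spam probability; ham occurrences are counted twice."""
    good = 2 * ham_count  # the doubling of ham frequency
    bad = spam_count
    if good + bad == 0:
        return 0.5  # assumption: unseen tokens are treated as neutral
    spam_freq = min(1.0, bad / n_spam)
    ham_freq = min(1.0, good / n_ham)
    p = spam_freq / (spam_freq + ham_freq)
    # Clamp so that a single token can never be absolutely certain.
    return max(0.01, min(0.99, p))


def combined_probability(probs, n_extreme=15):
    """Combine only the n_extreme probabilities farthest from 0.5."""
    extreme = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)
    extreme = extreme[:n_extreme]
    prod_spam = math.prod(extreme)
    prod_ham = math.prod(1.0 - p for p in extreme)
    return prod_spam / (prod_spam + prod_ham)


# Hypothetical Habeas token: seen in 10 spams, then reported as ham.
# Corpus sizes of 100 spam and 100 ham messages are assumed.
for ham_reports in (0, 2, 5):
    p = token_spam_probability(10, ham_reports, n_spam=100, n_ham=100)
    print(ham_reports, round(p, 3))
```

With equal corpus sizes, the token in this sketch is back at 0.5 after
half as many ham reports as spam sightings, and a token at 0.5 is the
last one to be picked among the extreme ones.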

Radek




