
Re: [Asrg] spamstones



Dave Crocker said:

> hmmm. it occurs to me that a technical research group on spam might want
> to consider agreeing on a standard methodology for deriving false
> negatives and false positives. this would allow everyone to compare
> mechanisms in an equivalent way.
> 
> given the nature of spam, and the nature of most technologies for
> detecting it, the determination of FNs and FPs is not automatically
> obvious.  that makes it a fertile opportunity for standardization.

May I suggest using Ion Androutsopoulos' metrics?  He wrote some of the
seminal papers on the effectiveness of naive Bayesian classification as a
spam filter, and proposed a very helpful metric -- TCR (total cost ratio)
-- which we in SpamAssassin have used for a couple of years as a result.
It's a nice way to get an idea of effectiveness distilled into one number.

TCR takes into account the idea that both FPs and FNs "waste the user's
time".  In summary, FPs are much more inconvenient for the user, so they
should be penalised much more heavily.

Too many FPs, and you've created so much work for the user that they would
be better off without your spam filter (a TCR below 1.0) -- for example,
the MAPS RBL blacklists are reportedly this bad, according to effectiveness
rates posted by an analyst company 1.5 years ago. ;)
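
For concreteness, here is a minimal Python sketch of TCR as I understand it
from those papers: the number of spam messages divided by the weighted
error count, with each FP costing fp_weight times as much as an FN.  The
function name and the default weight of 9 are illustrative assumptions, not
the values SpamAssassin uses.

```python
def total_cost_ratio(n_spam, false_positives, false_negatives, fp_weight=9.0):
    """Return TCR = n_spam / (fp_weight * FP + FN).

    n_spam          -- number of spam messages in the test corpus
    false_positives -- ham messages wrongly flagged as spam
    false_negatives -- spam messages wrongly passed as ham
    fp_weight       -- assumed cost of one FP relative to one FN
    """
    weighted_errors = fp_weight * false_positives + false_negatives
    if weighted_errors == 0:
        # A filter with no errors at all has an unbounded ratio.
        return float("inf")
    return n_spam / weighted_errors


# No filter at all: every spam gets through, no ham is lost, so TCR is 1.0.
baseline = total_cost_ratio(n_spam=1000, false_positives=0, false_negatives=1000)

# A filter with many FPs can score below 1.0, i.e. worse than no filter.
noisy = total_cost_ratio(n_spam=1000, false_positives=200, false_negatives=100)
```

With no filter, every spam is an FN and there are no FPs, so the ratio is
exactly 1.0; that is why anything below 1.0 means the user would be better
off without the filter.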

He also uses recall and precision metrics, which are the traditional
2-number metrics used in classification research, as far as I can see.
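
As a quick sketch of those two numbers, treating "spam" as the positive
class (the function names here are just for illustration):

```python
def spam_recall(true_positives, false_negatives):
    """Fraction of all spam that the filter caught."""
    total_spam = true_positives + false_negatives
    return true_positives / total_spam if total_spam else 0.0


def spam_precision(true_positives, false_positives):
    """Fraction of messages flagged as spam that really were spam."""
    total_flagged = true_positives + false_positives
    return true_positives / total_flagged if total_flagged else 0.0


# Made-up counts: 950 spam caught, 50 spam missed, 5 ham wrongly flagged.
recall = spam_recall(true_positives=950, false_negatives=50)
precision = spam_precision(true_positives=950, false_positives=5)
```

Recall drops as FNs rise and precision drops as FPs rise, so the pair
carries the same information as the FN and FP rates, just normalised
differently.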

A CiteSeer, or even Google, search for his name will turn up these papers
quickly.

PS: one issue we ran into, BTW, is that the current crop of Bayesian
filters do not make binary classifications; instead they typically classify
mail into the set { ham, unsure, spam } -- a three-way classification.
Dealing with "unsures" using the traditional metrics is hard...
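
To illustrate why, here is one possible way (a sketch only, not what any
particular filter does) to fold three-way results back into FP/FN counts:
pick a policy for "unsure" and report the unsure rate alongside.  The
function name and the unsure_as parameter are hypothetical.

```python
from collections import Counter


def collapse_unsure(pairs, unsure_as="ham"):
    """Fold (truth, verdict) pairs into binary error counts.

    truth is 'ham' or 'spam'; verdict is 'ham', 'unsure' or 'spam'.
    unsure_as says which binary verdict an 'unsure' is counted as
    (an assumed policy; 'ham' models unsure mail landing in the inbox).
    """
    counts = Counter(pairs)
    fp = counts[("ham", "spam")]
    fn = counts[("spam", "ham")]
    if unsure_as == "spam":
        fp += counts[("ham", "unsure")]   # unsure ham counted as flagged
    else:
        fn += counts[("spam", "unsure")]  # unsure spam counted as missed
    unsure = counts[("ham", "unsure")] + counts[("spam", "unsure")]
    total = sum(counts.values())
    return {
        "fp": fp,
        "fn": fn,
        "unsure_rate": unsure / total if total else 0.0,
    }


# Made-up results for 100 ham and 100 spam messages.
results = (
    [("ham", "ham")] * 90 + [("ham", "unsure")] * 8 + [("ham", "spam")] * 2
    + [("spam", "spam")] * 80 + [("spam", "unsure")] * 15 + [("spam", "ham")] * 5
)
as_ham = collapse_unsure(results, unsure_as="ham")
as_spam = collapse_unsure(results, unsure_as="spam")
```

The same run gives quite different FP and FN counts depending on the
policy chosen, which is the comparability problem in a nutshell.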

--j.
_______________________________________________
Asrg mailing list
Asrg@ietf.org
https://www1.ietf.org/mailman/listinfo/asrg