On 2003-09-11 14:13:50 +0100, Andrew Akehurst wrote:
> Can I suggest a subtly different approach? Rather than trying to
> characterise spam, why not try and characterise your legitimate
> messages and see if incoming messages match that statistical
> profile?
>
> My reasoning is based on the fact that the profile of spam
> undergoes sudden shifts as spammers switch to using new tactics
> each time their old ones become less effective. Whereas, in my
> case anyway, the profile of the legitimate mail I receive is
> much more stable.
>
> Bayesian classification systems have to undergo training in order
> to learn what spam and "ham" look like. But because "spam" keeps
> changing, so re-training is needed over time. As time passes, the
> class of spam will grow and become less clearly-defined because
> the range of tactics used by spammers seems to increase. As the
> definition of "spam" becomes fuzzier, does the accuracy of
> filtering decrease?

Does this make a difference? Bayesian-like filters just sort mails into
two buckets, and you train them by telling them which messages belong
in bucket A and which belong in bucket B. The filter doesn't care which
bucket is ham and which is spam, at least in theory. (In practical
implementations there is a bias that makes false negatives more likely
than false positives, but I don't think this fundamentally changes the
symmetry of the algorithm.)

hp

-- 
   _  | Peter J. Holzer    | Humor without emoticons is dry humor.
|_|_) | Sysadmin WSR       |
| |   | hjp@hjp.at         |     -- Toni Grass in aip
__/   | http://www.hjp.at/ |
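To illustrate the "two buckets" argument, here is a minimal sketch of a two-class multinomial naive Bayes filter in Python. It is not the algorithm of any particular spam filter: the class name TwoBucketFilter and the margin parameter are hypothetical, and modelling the false-negative bias as a simple threshold on the log-odds is an assumption made only for this example. Training two filters on the same mail with the labels swapped yields log-odds of the same magnitude and opposite sign, so the learning itself never needs to know which bucket is spam.

```python
import math
from collections import Counter


class TwoBucketFilter:
    """Hypothetical two-class naive Bayes filter over buckets "A" and "B".

    Nothing in training or scoring knows which bucket holds spam.
    """

    def __init__(self):
        self.word_counts = {"A": Counter(), "B": Counter()}
        self.doc_counts = {"A": 0, "B": 0}

    def train(self, text, bucket):
        # Training is identical for both buckets: just count the words.
        self.word_counts[bucket].update(text.lower().split())
        self.doc_counts[bucket] += 1

    def log_odds(self, text):
        """Return log P(A|text) - log P(B|text), using Laplace smoothing."""
        vocab = set(self.word_counts["A"]) | set(self.word_counts["B"])
        total_docs = self.doc_counts["A"] + self.doc_counts["B"]
        score = {}
        for bucket in ("A", "B"):
            counts = self.word_counts[bucket]
            n_words = sum(counts.values())
            s = math.log((self.doc_counts[bucket] + 1) / (total_docs + 2))
            for word in text.lower().split():
                s += math.log((counts[word] + 1) / (n_words + len(vocab)))
            score[bucket] = s
        return score["A"] - score["B"]

    def classify(self, text, margin=0.0):
        # With margin == 0 the decision rule is symmetric. A positive margin
        # (assumed here as one way to add a bias) demands extra evidence
        # before choosing "A"; if A is spam, this trades false positives
        # for false negatives.
        return "A" if self.log_odds(text) > margin else "B"


spam = ["buy cheap pills now", "cheap watches buy now"]
ham = ["meeting notes for monday", "patch for the mail server"]

f1 = TwoBucketFilter()  # bucket A = spam, bucket B = ham
f2 = TwoBucketFilter()  # same mail, labels swapped
for t in spam:
    f1.train(t, "A")
    f2.train(t, "B")
for t in ham:
    f1.train(t, "B")
    f2.train(t, "A")

msg = "cheap pills for monday"
print(f1.log_odds(msg), f2.log_odds(msg))  # same magnitude, opposite sign
```

With the default margin, f1 puts this borderline message into the spam bucket; raising the margin to 1.0 moves it to ham, while a message like "buy cheap pills" still scores above that threshold. The bias lives entirely in the decision step, which is consistent with the point that it does not change the symmetry of the learning itself.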