
Re: [Asrg] Re: 2a. Analysis - Spam filled with words



On 2003-09-11 14:13:50 +0100, Andrew Akehurst wrote:
> Can I suggest a subtly different approach? Rather than trying to
> characterise spam, why not try and characterise your legitimate 
> messages and see if incoming messages match that statistical
> profile?
> 
> My reasoning is based on the fact that the profile of spam 
> undergoes sudden shifts as spammers switch to using new tactics 
> each time their old ones become less effective. Whereas, in my
> case anyway, the profile of the legitimate mail I receive is 
> much more stable.
> 
> Bayesian classification systems have to undergo training in order
> to learn what spam and "ham" look like. But because "spam" keeps
> changing, re-training is needed over time. As time passes, the
> class of spam will grow and become less clearly-defined because
> the range of tactics used by spammers seems to increase. As the
> definition of "spam" becomes fuzzier, does the accuracy of
> filtering decrease?

Does this make a difference? Bayesian-like filters just sort mail into
two buckets, and you train them by telling them which messages belong
in bucket A and which belong in bucket B. In theory, the filter doesn't
care which bucket is ham and which is spam. Practical implementations
are biased to make false negatives more likely than false positives,
but I don't think this fundamentally changes the symmetry of the
algorithm.
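
To illustrate the symmetry, here is a minimal sketch of a two-bucket
naive Bayes scorer. It is not taken from any particular filter: the
names train, log_odds and classify, the add-one smoothing, and the
sample messages are all assumptions made up for this example.

```python
import math
from collections import Counter

def train(bucket_a, bucket_b):
    """Count, per bucket, how many messages contain each word.
    Both buckets are handled by exactly the same code path."""
    counts = {"A": Counter(), "B": Counter()}
    totals = {"A": len(bucket_a), "B": len(bucket_b)}
    for label, messages in (("A", bucket_a), ("B", bucket_b)):
        for msg in messages:
            counts[label].update(set(msg.lower().split()))
    return counts, totals

def log_odds(model, message):
    """Log odds that the message belongs in A rather than B.
    Positive favours A, negative favours B."""
    counts, totals = model
    score = math.log(totals["A"] / totals["B"])
    for word in set(message.lower().split()):
        # Add-one smoothing (an assumption of this sketch).
        p_a = (counts["A"][word] + 1) / (totals["A"] + 2)
        p_b = (counts["B"][word] + 1) / (totals["B"] + 2)
        score += math.log(p_a / p_b)
    return score

def classify(model, message, threshold=0.0):
    """The practical bias lives only in the threshold: moving it
    away from zero shifts the cut-off, not the scoring."""
    return "A" if log_odds(model, message) > threshold else "B"

# Hypothetical training data, invented for illustration.
ham = ["meeting agenda for monday", "lunch on monday"]
spam = ["cheap pills online", "cheap loans online now"]

model = train(ham, spam)      # A = ham,  B = spam
swapped = train(spam, ham)    # A = spam, B = ham

msg = "cheap pills now"
print(log_odds(model, msg), log_odds(swapped, msg))
```

Swapping the two training sets only flips the sign of the score, so
under these assumptions "characterise ham" and "characterise spam"
are the same computation. Only a non-zero threshold treats the
buckets differently.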

	hp


-- 
   _  | Peter J. Holzer    | Humor without emoticons is dry humor.
|_|_) | Sysadmin WSR       | 
| |   | hjp@hjp.at         |	-- Toni Grass in aip
__/   | http://www.hjp.at/ |
