Re: [Asrg] While I was on vacation, I came up with a proof that it is not possible to build a 100% effective anti spam filter based on content
On Sunday, June 13, 2004, 1:44:11 PM, Jeff wrote:
JS> People,
JS> I have just returned from a vacation to England with my
JS> father, and he came up with an idea that is a proof that it is not
JS> possible to build a 100% effective anti spam filter based on the
JS> contents of a message.
<snip/>
JS> The Bad Guy can defeat the S function by creating a
JS> message modifier function M. The message modifier function
JS> modifies a message m by, for example, adding more words or
JS> sentences or changing the spelling of some words. The Bad Guy can
JS> then create a loop of the form:
JS> while(S(m)) {
JS> m = M(m)
JS> }
JS> Is this loop guaranteed to terminate? That would depend
JS> on the details of S and M. For example, if S measures the ratio of
All of this is true of any list mechanism. White/Gray/Black list
mechanisms are simply another form of content analysis with a more
restricted scope. (Consider that transaction data, the SMTP envelope,
and the content of the message must all be considered components of m
in any practical example.)
Further, all forms of lists with specific entries (rather than
heuristics) are very easy for the "Bad Guy" to defeat because any
change, no matter how small, forces a mismatch. Whitelists have an
additional weakness: an attacker can often guess their entries.
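
To make the brittleness of exact-entry lists concrete, here is a
minimal Python sketch of the quoted loop run against a list that
stores exact fingerprints. Everything in it is a hypothetical
stand-in: the sample message, the blacklist contents, and the
particular S and M are assumptions chosen for illustration, not part
of any real filter.

```python
import hashlib


def fingerprint(message: str) -> str:
    """Exact-content fingerprint, standing in for a specific list entry."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


# Hypothetical blacklist holding one known-bad message (assumed data).
BLACKLIST = {fingerprint("Buy cheap pills now")}


def S(message: str) -> bool:
    """Return True if the message matches a blacklist entry exactly."""
    return fingerprint(message) in BLACKLIST


def M(message: str) -> str:
    """Smallest possible modification: append one character."""
    return message + "."


def evade(message: str, max_rounds: int = 10):
    """The quoted loop, with a round limit so it always terminates."""
    rounds = 0
    while S(message) and rounds < max_rounds:
        message = M(message)
        rounds += 1
    return message, rounds


evaded, rounds = evade("Buy cheap pills now")
print(rounds, S(evaded))  # prints: 1 False
```

One trivial change is enough here because the list entry only matches
a single exact value of m.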
Heuristics can be broadly shared and agreed upon and therefore have a
lower cost (we've proven this in our Message Sniffer system). Bayesian
classification systems can be extremely effective when used
appropriately. Both heuristic and statistical classification systems
have an advantage over lists where m changes or is unknown.
Ultimately, the solution is a broad combination of available
mechanisms, put together in a way that reduces costs and errors and
forces the cost of your equation up for the Bad Guy. This includes
white/gray/black lists.
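
As a rough illustration of combining mechanisms, the Python sketch
below sums weighted scores from a list check and a heuristic check.
The addresses, phrases, weights, and threshold are all assumptions
made up for the example, and this is not how Message Sniffer works. A
statistical classifier could be added as one more entry in CHECKS.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]            # assumed shape: {"sender": ..., "body": ...}
Check = Callable[[Message], float]  # negative = looks good, positive = looks bad

# Hypothetical list entries and heuristic phrases (illustrative only).
WHITELIST = {"friend@example.org"}
BLACKLIST = {"spammer@example.net"}
SUSPECT_PHRASES = ("act now", "free money")


def list_check(msg: Message) -> float:
    """White/black list lookup on the sender address."""
    sender = msg["sender"]
    if sender in WHITELIST:
        return -1.0
    if sender in BLACKLIST:
        return 1.0
    return 0.0  # unknown sender: the list has no opinion


def heuristic_check(msg: Message) -> float:
    """Content heuristic: 0.5 per suspect phrase found, capped at 1.0."""
    body = msg["body"].lower()
    hits = sum(phrase in body for phrase in SUSPECT_PHRASES)
    return min(1.0, 0.5 * hits)


# Each mechanism is paired with an assumed weight.
CHECKS: List[Tuple[Check, float]] = [(list_check, 2.0), (heuristic_check, 1.5)]


def spam_score(msg: Message) -> float:
    """Weighted sum over every mechanism."""
    return sum(weight * check(msg) for check, weight in CHECKS)


def is_spam(msg: Message, threshold: float = 1.0) -> bool:
    return spam_score(msg) >= threshold


unknown = {"sender": "new@example.com", "body": "Act now for FREE MONEY"}
trusted = {"sender": "friend@example.org", "body": "Act now for FREE MONEY"}
print(is_spam(unknown), is_spam(trusted))  # prints: True False
```

The unknown sender is caught by the heuristic even though no list
entry exists for it, while the whitelist entry keeps the same content
from a trusted sender below the threshold.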
No system is without error, and the characteristics of m are both
constantly changing and largely unknown in practice - therefore, while
lists are a strong component of the solution, they are brittle and
unworkably costly when implemented on their own.
Best,
_M
Pete McNeil (Madscientist)
President, MicroNeil Research Corporation
Chief SortMonster, www.sortmonster.com