...On Sep 20 2004, Markus Stumpf wrote:
The Spammers' Compendium http://www.jgc.org/tsc/
has a list of tricks spammers use to beat Bayesian filters.
Clearly, this is entirely untypical of ordinary language. Like the nonsense words, this sticks out (e.g. what percentage of legitimate messages do *you* have that don't contain the word "the"?).
In fact, many sequences will recur if a spammer sends several messages of this type. Even without splitting on punctuation, various parts of the "typefaces" recur, such as '888', which is used in 'n', 'o' and '!'. So the filter will automatically learn that messages with a high frequency of '888' tend to be junk.
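As a rough illustration of that point (a sketch only, not the tokenizer of any particular filter): splitting the "now!" banner from the quoted example on whitespace alone already yields a few heavily repeated tokens. The rows are copied from the banner with their spacing collapsed; the name token_counts is made up for this sketch.

```python
from collections import Counter

# Rows of the "now!" banner from the quoted example, spacing collapsed.
BANNER_ROWS = [
    ".o.",
    "888",
    "ooo. .oo. .ooooo. oooo oooo ooo 888",
    "`888P\"Y88b d88' `88b `88. `88. .8' Y8P",
    "888 888 888 888 `88..]88..8' `8'",
    "888 888 888 888 `888'`888' .o.",
    "o888o o888o `Y8bod8P' `8' `8' Y8P",
]


def token_counts(rows):
    """Count whitespace-separated tokens across all rows (no punctuation splitting)."""
    counts = Counter()
    for row in rows:
        counts.update(row.split())
    return counts


counts = token_counts(BANNER_ROWS)
# Show the most frequent tokens; '888' dominates this banner.
print(counts.most_common(3))
```

Any second message drawn in the same "typeface" would share most of these tokens, whatever word it spells.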
What is missing is something I have seen lately: the use of e.g.
|_) |_)|_|\/ /
 _    ___
| |  / (_)___ _____ __________ _
| | / / / __ `/ __ `/ ___/ __ `/
| |/ / / /_/ / /_/ / /  / /_/ /
|___/_/\__,_/\__, /_/   \__,_/
            /____/

                                       .o.
                                       888
ooo. .oo.    .ooooo.  oooo oooo    ooo 888
`888P"Y88b  d88' `88b  `88. `88.  .8'  Y8P
 888   888  888   888   `88..]88..8'   `8'
 888   888  888   888    `888'`888'    .o.
o888o o888o `Y8bod8P'     `8'  `8'     Y8P
to beat a Bayesian filter.
...
A statistical filter will recognize all these things automatically.
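To make "automatically" a little more concrete, here is a minimal sketch of how a per-token spam estimate could be learned from examples. It assumes equal priors and add-one smoothing, and it is not the scoring rule of any specific filter. The two tiny corpora are made up for illustration, and the names train and spam_probability are hypothetical.

```python
from collections import Counter


def train(messages):
    """Count, for each token, how many messages contain it at least once."""
    doc_freq = Counter()
    for text in messages:
        doc_freq.update(set(text.split()))
    return doc_freq


def spam_probability(token, spam_df, ham_df, n_spam, n_ham):
    """Add-one smoothed estimate of P(spam | token), assuming equal priors."""
    p_tok_spam = (spam_df[token] + 1) / (n_spam + 2)
    p_tok_ham = (ham_df[token] + 1) / (n_ham + 2)
    return p_tok_spam / (p_tok_spam + p_tok_ham)


# Toy, made-up corpora (illustration only, not real mail).
spam = ["888 888 `8' Y8P", "o888o 888 .o.", "888 `88. d88'"]
ham = ["the meeting is at noon", "please review the patch", "thanks for the report"]

spam_df, ham_df = train(spam), train(ham)
p_888 = spam_probability("888", spam_df, ham_df, len(spam), len(ham))
p_the = spam_probability("the", spam_df, ham_df, len(spam), len(ham))
print(round(p_888, 2), round(p_the, 2))
```

Nobody has to tell the filter that '888' is suspicious or that "the" is typical of ordinary mail; both estimates fall out of the token counts.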
--
---------------------------------------------------------------
Jose Marcio MARTINS DA CRUZ    Tel. :(33) 01.40.51.93.41
Ecole des Mines de Paris       http://j-chkmail.ensmp.fr
60, bd Saint Michel            http://www.ensmp.fr/~martins
75272 - PARIS CEDEX 06         mailto:Jose-Marcio.Martins at ensmp.fr