
Re: [Asrg] 2a. Analysis - Spam filled with words



On Tue, 09 Sep 2003 00:08:12 -0400, 
Yakov Shafranovich <research@solidmatrix.com>:

>I started getting weird spam samples in the last few 
>days. The spam message consists of words, one after 
>the other, with an image in the middle. Looks like 
>another attempt to defeat the filters, here is a sample:

As Jose pointed out in his reply, these random invisible words do 
serve to "add bulk"--although any random text (even nonsense words) 
would serve that same purpose.

I have a pretty strong hunch about what these messages are trying to 
do.  Specifically, I think they're a clever attempt (by someone who 
doesn't really understand statistical language processing) to sneak 
past Bayesian classifiers.  And they succeed, the first time or two; 
but by the third time, the Bayesian classifier has identified at least 
two "tell-tale giveaways" that make these messages very easy to 
"spot" for any statistically-based technology (including mine).
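
To make the mechanism concrete, here is a minimal Python sketch of a
toy per-token Bayesian scorer. Everything in it is an assumption made
for illustration: the class name ToyBayesFilter, the smoothing rule,
the training messages, and especially the two MARKERS tokens. The
post above does not say what the actual giveaways are, so "<img>" and
"color=white" are invented stand-ins for whatever stays constant
across these messages.

```python
import math
from collections import Counter


class ToyBayesFilter:
    """Toy per-token Bayesian spam scorer (illustrative only)."""

    def __init__(self):
        self.spam_counts = Counter()  # number of spam messages containing a token
        self.ham_counts = Counter()   # number of ham messages containing a token
        self.n_spam = 0
        self.n_ham = 0

    def train(self, tokens, is_spam):
        # Count each token once per message.
        if is_spam:
            self.spam_counts.update(set(tokens))
            self.n_spam += 1
        else:
            self.ham_counts.update(set(tokens))
            self.n_ham += 1

    def token_prob(self, token):
        # Smoothed estimate of P(spam | token); unseen tokens are neutral.
        s, h = self.spam_counts[token], self.ham_counts[token]
        n = s + h
        if n == 0:
            return 0.5
        spam_frac = s / self.n_spam if self.n_spam else 0.0
        ham_frac = h / self.n_ham if self.n_ham else 0.0
        raw = spam_frac / (spam_frac + ham_frac)
        # Pull rarely seen tokens toward 0.5 (assumed prior strength of 1).
        return (0.5 + n * raw) / (1 + n)

    def score(self, tokens):
        # Naive-Bayes combination of per-token estimates in log-odds space.
        log_odds = sum(
            math.log(p / (1 - p)) for p in map(self.token_prob, set(tokens))
        )
        return 1 / (1 + math.exp(-log_odds))


# Hypothetical constant features of the padded spam (NOT from the post).
MARKERS = ["<img>", "color=white"]


def run_demo():
    f = ToyBayesFilter()
    for ham in (
        ["meeting", "report", "tomorrow", "lunch"],
        ["report", "budget", "meeting", "draft"],
        ["lunch", "budget", "agenda", "invoice"],
    ):
        f.train(ham, is_spam=False)
    for spam in (
        ["cheap", "pills", "buy", "now"],
        ["buy", "cheap", "offer", "now"],
        ["offer", "pills", "click", "now"],
    ):
        f.train(spam, is_spam=True)

    # Each padded spam uses different filler words but the same markers.
    padded = [
        ["meeting", "tomorrow", "garden", "river"] + MARKERS,
        ["report", "draft", "window", "planet"] + MARKERS,
        ["lunch", "agenda", "castle", "silver"] + MARKERS,
    ]
    scores = []
    for msg in padded:
        scores.append(f.score(msg))  # score before the filter has seen it
        f.train(msg, is_spam=True)   # then the user reports it as spam
    return scores


if __name__ == "__main__":
    for i, s in enumerate(run_demo(), 1):
        print(f"padded spam #{i}: score {s:.2f}")
```

With this made-up data the three scores come out to roughly 0.06,
0.38, and 0.63: the filler words drag the first two messages under a
0.5 cutoff, but the filler changes every time while the markers do
not, so the markers accumulate spam evidence and the third message is
caught. The exact crossover point depends entirely on the toy numbers
chosen here.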

On an unrelated note: I've agreed to try to help coordinate the area 2 
analysis work for an indeterminately short time.  One of the things I 
would really like to do is to run a quick "pilot" study (and I pretty 
much don't care about what).  This study should be small, tightly 
focused, and (ideally) something that could be accomplished in, say, 
6 weeks or so.  The primary goal of this pilot would be to help folks 
working in area 2 to discover what (if any) unique mechanical 
requirements there may be to conducting an anti-spam research 
project.  (Think of it as a "shakedown run.")

Ideas for a possible focus for this pilot study are actively 
solicited.  My preference would be for folks to email me off-list 
with ideas, brainstorms, etc.  I will summarize, and then post the 
summary to the list.

- Terry



_______________________________________________
Asrg mailing list
Asrg@ietf.org
https://www1.ietf.org/mailman/listinfo/asrg