Re: [Asrg] Re: "worm spam" and SPF
On Dec 09 2004, gep2 at terabites.com wrote:
>
> I'm not IGNORING anything. But that doesn't mean that you necessarily have to
> believe everything that E-mail tells you, either. :-)
>
Do you fully parse the MIME structure to extract HTML or not? Do you
decode + scan each attachment, identify its type, further decode its
contents according to identified type, and extract things
such as embedded HTML? Unless you respond yes to these questions, you
are ignoring the actual structure, and instead making assumptions
about certain string patterns. It doesn't matter if you use SNOBOL or
regular expressions, you make decisions without the full context,
which is risky in terms of content alterations.
I accept your point that you are willing to pay that price, but since
there are effective competing solutions available without this flaw (I
consider alteration a flaw), users already have alternative
options. May the best offering win, as they say.
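A minimal sketch of the difference between matching string patterns
and parsing the actual structure, using Python's standard email
package. The names TAG_RE, naive_has_html, html_parts and build_sample
are made up for illustration, and example.invalid is a placeholder
host. The sample message carries its HTML part base64-encoded (which
MIMEText does by default for a utf-8 charset), so no tag-shaped string
appears in the raw text.

```python
import re
from email import message_from_string, policy
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Naive approach: look for anything tag-shaped in the raw message text.
TAG_RE = re.compile(r"<\s*[a-zA-Z][^>]*>")


def naive_has_html(raw):
    """Return True if the raw, undecoded message contains a tag-like string."""
    return bool(TAG_RE.search(raw))


def html_parts(raw):
    """Walk the MIME tree and return the decoded body of each text/html part."""
    msg = message_from_string(raw, policy=policy.default)
    found = []
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # get_content() undoes the transfer encoding and the charset.
            found.append(part.get_content())
    return found


def build_sample():
    """Build a two-part message whose HTML alternative is base64-encoded."""
    outer = MIMEMultipart("alternative")
    outer["Subject"] = "hello"
    outer.attach(MIMEText("plain version", "plain", "us-ascii"))
    outer.attach(
        MIMEText("<a href='http://example.invalid/'>click</a>", "html", "utf-8")
    )
    return outer.as_string()


if __name__ == "__main__":
    raw = build_sample()
    print("regex on raw text sees HTML:", naive_has_html(raw))
    print("HTML parts found by MIME walk:", html_parts(raw))
```

This covers only one level of decoding. Attachments that themselves
contain encoded documents would need the same treatment recursively.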
> [snip steganography]
> You're NEVER going to be able to prevent the transfer of
> all kinds of information. No point in even trying.
>
That's what spam filtering is. A competition between those who want to
pass along information and those who want to censor it. Spammers have
shown that they are willing to think laterally to get their message
across, and don't shy away from building complex edge cases (as well
as simple ones if that works). Dismissing such things now will only
mean more work for you later, imho.
>
> > Whatever regular expression for an HTML tag you come up with, it can
> > easily be made unrecognizable.
>
> Sure, but it can also in the process be made unrecognizable to MUAs, too.
>
As a filter, you take on the responsibility to censor for an unknown
class of MUAs with varying capabilities, unless you're building a
plugin for a single specific MUA. The latter problem is much easier,
but much more narrowly applicable, and certainly doesn't deserve to be
considered a general or universal solution to the spam problem.
> 1) the USE of such types of stuff is prima facie evidence of an E-mail having
> something to hide;
If your (or any) filter becomes widely used, email will be crafted to fool
it if it has weaknesses. Being fooled literally means that the filter won't recognize
such evidence, even though it might be obvious to other filters. So your
point 1) only applies as long as your filter is unpopular enough to not impact
spam distribution.
>
> 2) such tricks are of little value if they confuse or break MUAs too;
>
Different MUAs break in different ways. There is no set of tricks which are
obviously going to break all MUAs and therefore can be dismissed a priori.
> 3) translating pointy brackets to curly brackets (or square brackets or
> something else) will also effectively "neuter" such HTML, such that MUAs won't
> try to process it;
>
> 4) it's relatively easy to (again, by default) simply say "NO HTML, period"
> and divert offending mail.
By claiming to modify content, you're only making your system less
attractive to a section of the user population: those who want HTML;
those who don't want HTML but feel it is crucial to accept it in the
event it is sent by customers; and those who expect messages to be
untampered with for technical or ethical reasons. That's a lot of
people who have an interest in preventing your system from getting
critical mass.
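To make the alteration concern concrete, here is a small sketch of the
bracket translation described in point 3). The function name neuter is
hypothetical and this is not the quoted author's SNOBOL code, only an
illustration of the idea: a blind character translation disables tags,
but it also rewrites text that was never HTML.

```python
# Map pointy brackets to curly brackets, as in point 3) above.
NEUTER_TABLE = str.maketrans({"<": "{", ">": "}"})


def neuter(text):
    """Replace every '<' and '>' so an MUA will not treat the text as HTML."""
    return text.translate(NEUTER_TABLE)


if __name__ == "__main__":
    print(neuter("<b>hi</b>"))            # the tag is disabled
    print(neuter("if (a < b && c > d)"))  # but ordinary text is altered too
```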
> > But say you keep up to date with tricks designed to make a complex
> > payload look innocuous to simple minded filters, then you are on the
> > losing side of such an arms race, because a spammer need only change
> > their email, while you need to patch your software with new regular
> > expressions and redeploy it to all the customers every time.
>
> Well, "patch" isn't really necessary. :-) It's rather easy to add
> new stuff to SNOBOL/SPITBOL programs, including at run time.
Can you do it in an evening and have all your deployed systems fixed
up world wide by the next morning? The spammer needs an evening to
figure out a loophole and send millions of mails exploiting it the
next day.
Some commercial systems already have elaborate world wide networks
designed to propagate new email signatures in a matter of
seconds. That's the technology you're competing against today.
> Fair enough, although it's pretty extraneous to discuss them
> publicly at this time. As I've said, the current implementation is
> "experimental" and like all such software, a work in
> progress.... which I modify and improve from time to time as that
> seems necessary.
Unless you have an argument that breaks the spam arms race, the
distinction between "experimental" and "release" is blurry, I think.
You can spend months or years polishing a user interface only to
have your solution be outdated.
> > ...It only makes discussion imprecise and harder to see any flaws.
>
> The important thing is NOT whether there are "flaws" at the lowest
> level (and undoubtedly there are, since all nontrivial programs
> contain bugs or at least opportunities for improvement). At this
> point we ought to be talking concepts and approaches, rather than
> getting bogged down in pointless minutiae and detail which in any
> case is going to be implementation-dependent.
I'm not convinced that your high level description will handle the
rough seas of widespread deployment, but instead will get bogged down
in endless silly details. Just my opinion ;-)
> OK, but at least it's not going to be something that they just
> click on (again, by denying HTML, and hopefully by implementing
> suitably dire-sounding warnings when they try to follow any other
> links to external executables, whether EXE files or SCR files or DLL
> files or ActiveX or whatever).
Denying HTML is useless if the MUA generates it on the fly whenever it
sees something that looks like a URL. People are inundated with
warning dialogs which they just click mindlessly. As is, I don't see
that your filter can change or address these issues (why should it?
those are some of your stated high level means of protection).
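As an illustration of an MUA generating HTML on the fly, here is a
rough sketch of the kind of "linkify" step many mail readers apply to
plain text before display. The regular expression and the function
name linkify are assumptions for the example, not the behaviour of any
particular MUA, and example.invalid is a placeholder host.

```python
import html
import re

# Deliberately simple: anything starting with http:// or https://
# up to the next whitespace, angle bracket or double quote.
URL_RE = re.compile(r"https?://[^\s<>\"]+")


def linkify(plain_text):
    """Escape plain text for display, then wrap URL-like strings in anchors."""
    escaped = html.escape(plain_text, quote=False)
    return URL_RE.sub(
        lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), escaped
    )


if __name__ == "__main__":
    # The input contains no HTML at all, yet the user sees a clickable link.
    print(linkify("see http://example.invalid/offer now"))
```

A filter that strips or neuters HTML never sees the anchor, because it
only exists after the MUA has processed the message.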
> Perhaps so, and what may end up happening is that content filters
> will be simplified and re-engineered to make them faster and more
> tailored to use within a framework such as I propose. (Although
> some of those "hard cases" might still get through, from
> "somewhat-trusted" senders). Current content filters usually
> presume that they are getting E-mail "raw" and therefore have to
> handle cases that might be filtered out already by the time mail
> would get to them through my filter.
Either that, or users will decide that your system is hampering their
content filtering and remove it. Nobody knows...
>
> > A few points about "Bayesian" systems:
>
> > To my knowledge, no successful attack has been performed on such
> > systems yet.
>
> Depends on what you call an "attack", but certainly an awful lot of
> spam contains bogus (random or unrelated) stuff that's designed to
> confuse or evade such types of filters.
Most of the random stuff is designed to evade a different kind of filter,
namely those filters which keep a database of email hashes. It sticks
out to statistical filters because it contains novel tokens.
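A toy sketch of why random filler defeats a database of email hashes.
It assumes the simplest possible scheme, an exact SHA-256 digest of
the body; deployed systems may use fuzzier signatures, so this shows
only the basic mechanism. The names digest and add_random_filler are
invented for the example.

```python
import hashlib
import random
import string


def digest(body):
    """Exact-match signature of a message body (a simplifying assumption)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def add_random_filler(body, rng, n_words=5):
    """Append a line of random lowercase 'words' to the body."""
    words = [
        "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
        for _ in range(n_words)
    ]
    return body + "\n" + " ".join(words)


if __name__ == "__main__":
    known_spam = "Buy cheap pills at http://example.invalid/"
    blocklist = {digest(known_spam)}

    variant = add_random_filler(known_spam, random.Random(0))
    print("original listed:", digest(known_spam) in blocklist)
    print("variant listed: ", digest(variant) in blocklist)
```

Every copy sent with different filler has a different digest, while
the filler words themselves are tokens the recipient has never seen in
legitimate mail.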
I call an attack a repeatable procedure which can consistently bypass
the spam verdict with a sufficiently high success rate to be valuable.
By that definition, countering an attack requires a modification of
the filtering algorithm.
> > There is a lot of garbage in mail to try to pass through the
> > statistical filtering, but just like you look for nonsense tokens as
> > an indicator of spam on a case by case basis, such nonsense tokens
> > if present easily tip the balance toward spam in a statistical
> > filter, automatically.
>
> Perhaps, and we agree that a good program can detect certain types
> of such stuff, but at SOME point the spam E-mail in question is
> going to look EXACTLY like a regular E-mail that you want to get,
> except for the spam content (which might be JUST a URL or a phone
> number or who knows what?)
Not quite. The spam content is what looks statistically different
(unless the recipient is in the habit of discussing this kind of spam
content regularly), while the statistical commonalities between spam
and nonspam are automatically discounted if they appear in both types
of mail. That and the fact that there is no need to do complex
text parsing is why it's difficult to evade. Any string added to a spam
message either
1) looks like some nonspam string, in which case it counts for
approximately nothing, or
2) doesn't look like some nonspam string, in which case it tips the
balance towards spam.
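The two cases above can be sketched with a toy token scorer. This is
not the code of any particular filter: it assumes a naive Bayes style
log-odds per whitespace-separated token with add-one smoothing, and
the class name TokenScorer and the tiny SPAM and HAM corpora are made
up for the example.

```python
import math
from collections import Counter

# Tiny hand-made training corpora (illustrative only).
SPAM = ["cheap pills at the store", "see cheap pills at noon"]
HAM = ["meeting tomorrow at noon", "see you at the meeting today"]


def count_tokens(messages):
    """Count whitespace-separated, lowercased tokens over all messages."""
    counts = Counter()
    for text in messages:
        counts.update(text.lower().split())
    return counts


class TokenScorer:
    def __init__(self, spam_messages, ham_messages):
        self.spam = count_tokens(spam_messages)
        self.ham = count_tokens(ham_messages)
        self.spam_total = sum(self.spam.values())
        self.ham_total = sum(self.ham.values())
        self.vocab = len(set(self.spam) | set(self.ham))

    def log_odds(self, token):
        """log P(token | spam) / P(token | nonspam), with add-one smoothing."""
        p_spam = (self.spam[token] + 1) / (self.spam_total + self.vocab)
        p_ham = (self.ham[token] + 1) / (self.ham_total + self.vocab)
        return math.log(p_spam / p_ham)

    def score(self, text):
        """Sum of per-token log-odds; positive leans spam, negative nonspam."""
        return sum(self.log_odds(t) for t in text.lower().split())


if __name__ == "__main__":
    scorer = TokenScorer(SPAM, HAM)
    print("'at'    (equally common in both):", scorer.log_odds("at"))
    print("'pills' (seen only in spam):    ", scorer.log_odds("pills"))
    print("spam alone:       ", scorer.score("cheap pills"))
    print("spam plus padding:", scorer.score("cheap pills at at at"))
```

In this toy setup, padding a spam message with a word that is equally
common in both corpora leaves the score unchanged, while each word
seen mostly in spam pushes it further toward spam.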
> Fine. In any case, it is POSSIBLE to create spam E-mail that looks
> just like legitimate E-mail, at least within statistical
> uncertainties. There are limits to what can be achieved going down
> that path (but that doesn't necessarily mean that it's not worth
> going there, if there's useful progress there nevertheless).
There are always limits, e.g. a single statistical system shared by
many people is vulnerable to contradictory inputs. As far as
advantages are concerned, the statistical systems have the simplest
user interface yet devised, as the user only needs to label the
message without analysis, and the filter updates automatically
without programmer intervention.
--
Laird Breyer.
_______________________________________________
Asrg mailing list
Asrg at ietf.org
https://www1.ietf.org/mailman/listinfo/asrg