Network Working Group                                     D. Crocker
Internet Draft                                           Brandenburg
draft-crocker-spam-techconsider-02.txt               Vernon Schryver
Expires: <12-03>                                   Rhyolite Software
                                                         John Levine
                                                Taughannock Networks
                                                        29 June 2003

          Technical Considerations for Spam Control Mechanisms

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Copyright (C) The Internet Society (2003). All Rights Reserved.

SUMMARY

Internet mail has operated as an open and unfettered channel between originator and recipient. This openness invites abuses, collectively called spam, such as burdening recipients with unwanted commercial email. Spam has become an extremely serious problem, is getting much worse and is proving difficult (or impossible) to eliminate. The most practical goal is to bring spam under control; it will require an on-going, adaptive effort, with stochastic rather than complete results.

This note discusses available points of control in the Internet mail architecture, considerations in using any of those points, and opportunities for creating Internet standards to aid in spam control efforts. It offers guidance about likely trade-offs, benefits and limitations.

CONTENTS

   1. SPAM AND CONSENT
   2. ARCHITECTURAL REFERENCE
      2.1. EMAIL CONTROL POINTS
      2.2. TERMINOLOGY
   3. APPROACHES TO CONTROLLING SPAM
      3.1. ADMINISTRATIVE AND LEGAL MECHANISMS
      3.2. INFRASTRUCTURE AND OPERATIONS
      3.3. FILTERING
      3.4. NEGOTIATION
   4. EVALUATING TECHNICAL APPROACHES
      4.1. ADOPTION
      4.2. BURDEN
      4.3. SCALING
      4.4. ROBUSTNESS
      4.5. SCENARIOS
   5. SECURITY CONSIDERATIONS
      5.1. PRIVACY CONSIDERATIONS
   6. APPENDIX
      6.1. SPAM CONTROL PROPOSAL EVALUATION CHECKLIST
      6.2. ACKNOWLEDGEMENTS
      6.3. AUTHORS' ADDRESSES
      6.4. FULL COPYRIGHT STATEMENT

1. SPAM AND CONSENT

Internet mail has operated as an open and unfettered channel between originator and recipient. It has always suffered from some degree of abuse, in which originators impose on recipients inappropriately. In recent years, one form of this abuse has grown substantially. Called spam, it is defined variously, from "unsolicited commercial email" to "any email the recipient does not want".

Often there are no technical differences between spam and "acceptable" email. Their format, content and even aggregate traffic patterns may be identical. Hence spam is a problem for fundamentally non-technical reasons, yet the Internet technical community must pursue technical responses to it. The lack of strong community consensus on a single, precise definition makes this particularly challenging.

For most working discussions, the term "Unsolicited Bulk Email" is sufficient. Its salient point, that spam is a mass mailing, ensures that discussion covers the broadest concern of the user and provider communities. Mail that is not in some real sense "bulk" cannot flood networks or mailboxes. Essentially all mail that people object to as "spam" is bulk. For example, practically all objectionable advertising mail is also bulk, although modern techniques for targeted advertising can permit extensive tailoring of content or addresses. "Bulk" is usually very difficult for an individual recipient to prove, but almost always easy to recognize in practice.
More detailed discussion must, of course, define "unsolicited" precisely and usually must distinguish between different types of mail, such as commercial, religious, political or personal.

The simplistic -- but entirely adequate -- summary of the role of spam in Internet mail is that it is an extremely serious problem, it is getting much worse, and it is proving difficult or impossible to eliminate. Spam is generated by a very wide range of clever sources, and it always will be. Instead of thinking of spam as a disease that might be eliminated, it is more useful to think of it like crime, war and cockroaches. It is not realistic to expect to eliminate any of these, no matter how much anyone might wish otherwise.

Therefore the best we can hope to accomplish is to bring spam under reasonable control, and that control will require an on-going, adaptive effort, with stochastic rather than complete results. That is, we need multiple, adaptive techniques. As spam changes, so must our mechanisms. Different mechanisms will be appropriate for different circumstances. In other words, spam has become a permanent part of the Internet mail experience, and efforts to control it may only reduce it to a tolerable level, rather than eliminate it.

It is somewhat comforting to remember that an individual spam is not damaging. Rather, the quantity of spam is what poses a threat. Hence there is flexibility in permitting spam control mechanisms to be imperfect.

This note discusses available points of control in the Internet mail architecture, considerations in using any of those points, and opportunities for creating Internet standards to aid in spam control efforts. It offers guidance about likely trade-offs, benefits and limitations. The note does not offer an analysis of the types of spam or the types of attacks used in sending spam, nor is it intended to specify solutions.
Similarly, the note does not discuss fine-grained details, such as the arguments about single opt-in versus double opt-in mechanisms. These points are important to the engineering of particular solutions, but only as refinements after the larger architectural and system control choices are made.

Note: This document is intended to evolve, based on comments from the Anti-Spam Research Group (ASRG) mailing list. It is certain that the current draft is incomplete and entirely possible that it is inaccurate. Hence, comments are eagerly sought, preferably in the form of suggested text changes, and preferably on the ASRG mailing list.

STD OPP
   [0]  Throughout this document, opportunities for technical standards are cited. These represent an attempt to provide a complete list of such possibilities, rather than to offer recommendations. These will be in entries of this form, with the label "STD OPP".

2. ARCHITECTURAL REFERENCE

2.1. Email Control Points

Email transmission sequences can touch many systems between the originator and the recipient. However, for most discussions about control, only five major components are important:

   +---------------+              +---------------+
   | UA.o -> MTA.o | -> MTA.i ->  | MTA.r -> UA.r |
   +---------------+              +---------------+

   UA.o:   The originator's user agent, operated by the user and under their direct control.

   MTA.o:  The mail transfer agent service associated with the originator's environment, possibly operated by the sender and possibly operated under separate control, such as by their employer.

   MTA.i:  The mail transfer agent service operated by an independent third party, such as an Internet Service Provider (ISP).

   MTA.r:  The mail transfer agent service associated with the recipient's environment.

   UA.r:   The recipient's user agent.

In many organizations, the MTA service is multi-stage, such as including a department MTA and an Internet "firewall" MTA.
This distinction is of fundamental importance for making software and operations decisions, but it does not have a significant impact on a discussion about points of control. Points of control are primarily affected by crossing administrative boundaries. Therefore the distinction between the originator's environment, the recipient's environment and any independent third parties is essential to this larger examination. These are separate, independent administrative environments and are subject to different policies. In particular, note that a discussion about using control points hinges on the scope of the control to be exercised.

Besides constituting a major burden to recipients, the volume of spam traffic has become a serious problem for transit services. Hence a precept in controlling spam is to seek control as close to the source as possible. The fewer downstream resources consumed by spam, the better.

Of course, the ideal would be a mechanism in UA.o that would prevent spam from being sent in the first place. Indeed, legal remedies seek to affect a sender's motivations, so that they will not send the spam at all. Unfortunately, there is no opportunity for software control of spam in UA.o, because the software is under the control of the originator. If they wish to bypass any control mechanisms in UA.o, they will find a way. Of course, some services keep UA.o under administrative control separate from the software's user. This affords a software choice, placing controls in that module, but does not permit the more general architectural specification of controls there, because such separate administrative control cannot be relied on in general.

The next opportunity is MTA.o. Often this service is operated by a group independent of the originator. Wherever the detection mechanism is placed, the critical challenge is to identify spam in real time, if its relaying and delivery are to be stopped. The other avenue is post-hoc removal of the right to make further use of the MTA service.
This can be quite effective against spammers who attempt to operate within acceptable social bounds. It will have no effect upon spammers who avoid accountability.

2.2. Terminology

When determining whether a message qualifies as spam, different types of email attributes can be considered, and different types of analyses can be performed on them. Equally, the results of the analyses can be used in different ways, for preventing, detecting or following up occurrences of spam.

2.2.1. Evaluation Focus

When discussing both the attributes of spam and the mechanisms for controlling it, the major distinctions for evaluation are among:

   Originator:   Evaluate the trustworthiness of the creator of the content. Will the originator create spam?

   Content:      Evaluate the message content itself. Does the content contain spam?

   Destination:  Evaluate whether special destinations, such as honeypots, were specified.

   Traffic:      Evaluate the aggregate posting behavior, to determine whether multiple, related postings qualify as "bulk".

Validating the originator can often be done with excellent reliability. However, current common practices for author authentication have resisted wide-scale adoption, and this approach only protects against spam indirectly: the creator might choose to violate the criteria used to assess them. When validation of the originator is based on the contents, this certifies authorship, but does not certify any other characteristic of the content.

By contrast, evaluating content is direct -- either it is spam or it is not -- but it is impossible to do the evaluation perfectly. For example, legitimate subscription-based bulk mail is technically identical to spam in every regard, except that it is solicited or desired by its recipients. Simplistic content evaluation criteria have a high rate of false positives and are easily bypassed by spammers, leading to a high rate of false negatives. Complex criteria are difficult to create and maintain.
They, too, are likely to have a high rate of false assessments eventually, unless maintenance of the analysis rules is diligent.

2.2.2. Originator Focus

Evaluation of the originator sub-divides between:

   Author:  Evaluate whether the person creating the content is likely to create spam.

   System:  Evaluate whether the system that is sending email on a person's behalf is likely to permit spam to be sent.

Evaluating the person (or organization) creating the message is direct, albeit still carrying the caveats noted above. Evaluating the system is indirect, and presumes that the system enforces quality assurance policies on the email sent from it.

A larger problem with evaluating the originator of mail is that Internet mail necessarily and desirably involves receiving mail from strangers. Mailboxes that are closed to mail from strangers do not have a spam problem. On the other hand, it is impossible to know whether copies of a message from a stranger are also being sent to 30,000,000 of your closest friends.

Contrary to often-expressed hopes, a third party that is also a stranger cannot attest to the virtue of a mail sender. A letter of introduction from a stranger does not make the bearer other than a stranger. If the history of spam is any guide, organizations such as Internet service providers and public key infrastructure (PKI) providers cannot be expected to ensure that their customers do not send spam. Even with the best of intentions, they will always be willing to open new accounts to strangers. The most that can be expected is that they will punish their spamming customers, such as by imposing substantial fees or filing lawsuits. It should be noted that the "punishment" of terminating a spammer's account often is meaningless, because many spammers create one-time accounts.

2.2.3. Detection

Qualification performs tests against one or more criteria. Test results are:

   Positive:  Message matches the test criteria.

   Negative:  Message fails to match the test criteria.
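As an illustration of these terms, the following Python sketch treats a test as a set of criteria and reports Positive when any criterion matches. The message fields, the "any criterion matches" combining rule, and the honeypot and sender values are hypothetical, chosen only to show a Destination-focused and an Originator-focused criterion side by side; none of them is specified by this note.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Message:
    """Hypothetical minimal view of a message, just enough for these tests."""
    mail_from: str
    recipients: list[str]
    subject: str
    body: str


# A criterion is any predicate over a message.
Criterion = Callable[[Message], bool]

# Hypothetical example data; a real deployment would load these from configuration.
HONEYPOT_ADDRESSES = {"trap@example.net"}
KNOWN_BAD_SENDERS = {"bulk@spammer.example"}


def sent_to_honeypot(msg: Message) -> bool:
    """Destination focus: was the message addressed to a honeypot?"""
    return any(r.lower() in HONEYPOT_ADDRESSES for r in msg.recipients)


def from_known_bad_sender(msg: Message) -> bool:
    """Originator focus: is the (unverified) envelope sender on a local list?"""
    return msg.mail_from.lower() in KNOWN_BAD_SENDERS


def detect(msg: Message, criteria: Iterable[Criterion]) -> str:
    """Return "positive" if the message matches any criterion, else "negative"."""
    return "positive" if any(criterion(msg) for criterion in criteria) else "negative"
```

Note that `from_known_bad_sender` trusts the envelope sender exactly as given; Section 3.3.4 raises this very concern about identifying attributes that may be set falsely.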
When the tests are heuristic or statistical, some portion of the results will be incorrect. Incorrect results are classed as:

   False Positive (FP):  The filter classifies a non-spam message as spam. That is, the message matches the test criteria, but the criteria are too aggressive.

   False Negative (FN):  The filter classifies a spam message as non-spam. That is, the message fails to match the test criteria, but the criteria are not sufficiently strong.

2.2.4. Disposition

Filters are used for two basic and complementary purposes:

   Acceptance:  Approves mail for delivery.

   Rejection:   Withholds or refuses permission for delivery.

Implementations of filter mechanisms may provide for a range of choices, rather than simple acceptance or rejection.

Note that rules for acceptance are equally subject to error. However, acceptance rules are usually simple and explicit rather than heuristic, so FP and FN results are not usually a concern for them. Hence discussion of FP and FN usually concerns rejection rules.

2.2.5. Simple Filtering

The combined range of capabilities for detection and disposition of email can produce complex, heuristic behaviors. For better efficiency and predictability, such mechanisms usually permit specification of explicit lists of criteria and values that, when present in the message, prompt direct disposition.

The simplest method of testing is to have explicit lists of simple identifier criteria, such as the From address or standard text in the Subject header. Pre-assessed values are entered into:

   Whitelist:  Automatic Acceptance

   Blacklist:  Automatic Rejection

One approach to maintaining whitelists and blacklists is to make explicit entries into them, manually. This is often what a spam control service will offer to its subscribers. Most such services are for blacklisting known sources of spam. A difficulty with these listing services is the set of criteria used for adding and removing senders or sites.
These policies usually need to be explicit, objective, documented and consistently applied. Even then, blacklist operators are attractive targets for threats of lawsuits claiming inappropriate listing, interference in business or trade, etc.

3. APPROACHES TO CONTROLLING SPAM

3.1. Administrative and Legal Mechanisms

Both government law and service provider contracts can be used for defining unacceptable behavior, requiring preventive measures, and providing for remedies when there are violations.

There are two major problems with this administrative approach to the control of spam. One is that the sender often cannot be readily identified by the recipient of spam. There are many opportunities for practically anonymous posting of email, including Internet cafes, transient access services and free email services.

The second problem is that the sender of spam may not be in the jurisdiction seeking to exercise control, or in a jurisdiction responsive to the recipient's jurisdiction. The Internet is global. Unlike postal bulk mail, the cost of sending spam over the Internet does not change as the mail crosses jurisdictional boundaries.

Hence it seems likely that administrative procedures can be effective for controlling "responsible" spam -- that is, spam sent by organizations operating as accountable social participants. Perhaps they indulge in overly aggressive policies, but they still desire to be socially tolerable. The many "rogue" spammers are not similarly constrained.

However, most "rogue" spammers are trying to sell a product or service. There have been notable successes against spammers by the U.S. federal government "following the money." However, the government staff for these activities note their lack of resources and the extensive effort required to achieve these results.

3.2. Infrastructure and Operations

Enhancement of underlying Internet services might reduce the effectiveness of some spam transmission mechanisms.
For example, many spammers prefer to send to Domain Name System (DNS) MX secondaries, because secondaries are often not as well filtered as MX primaries. Because MX secondaries lack a coordination protocol, the best advice for all but the largest sites is to stop using MX secondaries. This advice sounds radical, but MX secondaries are no longer needed to compensate for intermittently connected or sending MTAs. Today MX secondaries are generally needed only for "load balancing", when there is more incoming mail than can be handled by a single SMTP server.

STD OPP
   [1]  An MX secondary coordination protocol could coordinate standardized filtering rules, white- and blacklist entries and other spam control data among MX servers.

   [2]  Best Current Practices (BCP) documentation of preferred MTA operation for spam control, beyond that documented in RFC 2505. For example, it is better to reject spam by rejecting the SMTP transaction with a 5yz status code than to accept the transaction and later send a delivery failure notification.

   [3]  BCPs for operational conventions relevant to other spam control services, such as DNS blacklists.

Postal mail imposes a fee on the sender for each message that is sent. Such a fee makes the cost of sending significant, and proportional to the amount sent. In contrast, current Internet mail is very nearly free to the sender. Hence there is interest in exploring "sender pays" email. One form of sender-pays is identical to postal stamping. Another imposes post-hoc action on the sender, collecting the fee for a posting only if the recipient indicates they were unhappy to receive it.

For both models, it is not clear that it is possible to retrofit the necessary mechanisms to Internet mail. The complete absence of such mechanisms from the current service, and the existence of anonymous and free email services, may create too much operational inertia. It is also not clear who should receive the fees or how they should be disbursed.

3.3. Filtering

The technical mechanism for real-time detection and handling of spam is a filter, placed at MTA.o, MTA.i, MTA.r and/or UA.r. A filter has two functions: detection and action. Action is usually either adding a special label to the message or disposing of it.

3.3.1. Traffic Analysis

Spam is often referred to as "unsolicited bulk mail" to highlight that senders typically post very large amounts quickly. Opt-in (subscription) email also demonstrates this traffic pattern. Still, there is benefit in measuring aggregate email behavior.

STD OPP
   [4]  Traffic reporting protocol, to permit collaboration among independent administrations.

3.3.2. Content Analysis

Filters look for message attributes, such as strings of text in the headers or content of the message being inspected. Other attributes include the address or domain name of the originating system, or the occurrence of the same message content in multiple messages at the same time.

Simple filters look for specific strings. A more powerful approach looks for multiple sets of strings, assigning a positive or negative score to each occurrence; it then labels spam according to its total score.

Rule creation is done manually, by a service, or by analysis of a collection of messages. For example, one type of service observes email traffic at many Internet locations and receives reports as recipients see new types of spam. The service then propagates new rules to its subscribers. One example of an analytic approach performs empirical rule creation, using statistical techniques such as Bayesian analysis to distinguish string occurrences in known spam from those in mail that is known not to be spam.

As rules become common, spammers adapt their messages to bypass filters, so that existing rules quickly become less effective. Hence a long-term filter must use rules that are continually modified. Empirical rule generation must be repeated, or must operate continuously, analyzing all incoming mail.
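The scoring approach described above, in which each occurrence of a string contributes a positive or negative score and the total decides the label, can be sketched in Python as follows. The patterns, weights and threshold are invented for illustration and are not drawn from any real rule set or product.

```python
import re

# Hypothetical rule set: each pattern carries a weight that is added once per
# occurrence. Positive weights suggest spam; negative weights suggest legitimate mail.
RULES = [
    (re.compile(r"free money", re.IGNORECASE), 3.0),
    (re.compile(r"click here", re.IGNORECASE), 1.5),
    (re.compile(r"limited time offer", re.IGNORECASE), 2.0),
    (re.compile(r"meeting agenda", re.IGNORECASE), -2.0),
]

SPAM_THRESHOLD = 3.0  # hypothetical cut-off; real filters tune this empirically


def score(text: str, rules=RULES) -> float:
    """Sum the weight of every occurrence of every rule pattern in the text."""
    total = 0.0
    for pattern, weight in rules:
        total += weight * len(pattern.findall(text))
    return total


def label(text: str, rules=RULES, threshold: float = SPAM_THRESHOLD) -> str:
    """Label the text by comparing its total score against the threshold."""
    return "spam" if score(text, rules) >= threshold else "not-spam"
```

For example, "FREE MONEY! Click here" scores 4.5 and is labeled spam, while "Meeting agenda attached; click here for the room map" scores -0.5. Hard-coding the rules, as this sketch does, is exactly what the adaptive regeneration described above is meant to avoid.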
Manual rule maintenance is simply not practical for typical users; the effort is far too great, and rule notations such as "regular expressions" are too arcane. A concern about services is that they are inherently post-hoc. They are always updating the rule-set after an "attack" commences, so that some spam is certain to reach some recipients; however, the view that a small amount of spam is not dangerous mitigates this concern.

Lastly, methods using automated analysis rely on heuristics, or guesses. They are certain to have some percentage of "false negatives" (FN) that permit real spam to reach the recipients, and some percentage of "false positives" (FP) that incorrectly label legitimate mail as spam.

Any effective, long-term filtering mechanism must have automatic or semi-automatic rule creation and must upgrade the set of rules continuously or periodically.

STD OPP
   [5]  Format and exchange mechanisms, to permit sharing rules, rule templates, and white/black list entries.

   [6]  Sample message labeling and exchange, to permit submission of candidate content to a remote service.

   [7]  Hash-based identifier of content.

3.3.3. Tagging

Message originators and transit handlers can facilitate filtering efforts by adding standardized information, or tags.

The most serious difficulty with any scheme that relies on tagging is its relationship to the larger body of email that is untagged. What does it mean when the tag is not present? Is presence of the tag a certain indicator of the intended information? Is there benefit in falsely labeling the content? Does the scheme contain a means of preventing this spoofing?

If tagging uses a simple string label, such as "ADV" to indicate that the contents contain advertising, how is this useful when most email is not labeled or is labeled incorrectly? This is like postal-based mass marketing that has an envelope marked "personal and confidential" but is neither.

Non-forgeable tagging uses cryptographic techniques.
If the tagging identifies the sender, then the recipient must have access to the cryptographic identifier. If the tag is independent of the content -- that is, it identifies and authenticates the sender, but uses a scheme that does not integrate the specific content of the contained message -- then what is to prevent re-using the identification inappropriately?

STD OPP
   [8]  Standardized tags, according to different criteria.

3.3.4. Filter Rules

The simplest model for a filtering test is to have entries containing a single, simple attribute, such as sender email address or source system IP address or domain name.

For assessments based on the identity of the sender, rather than the content of the message, another concern is validation of the key attribute used for identification. What if the value for that attribute is set falsely? For example, what if email was not sent by the address listed in the From field?

STD OPP
   [9]  Common metrics about message sender behaviors, to allow calculation of their "reputation".

   [10] Format and access to filter logs, such as among MX secondaries. Spammers sometimes spread their mail among the MX secondaries for a domain. Correlating typical SMTP log files merely by time and date is onerous.

   [11] Control protocol between recipient and filtering service server, to permit specifying policies and specific rules.

   [12] Modify SMTP delivery status notifications to avoid flooding innocent mailboxes because of forged senders. [Needs clarification. /ed]

   [13] Codify best current practices of filters to minimize sending DSNs. Delivery status notifications announcing the rejection of spam often go to innocent third parties when the sending address of the spam has been forged. Rejecting the message during the SMTP transaction often, but not always, prevents this "collateral damage." [This may duplicate a previous opportunity. /ed]

   [14] Codify DSN and SMTP status message wording, such as saying that rejections resulting from filtering should include a URL for an extended explanation. [Needs clarification. /ed]

   [15] Replace SMTP.

The idea of replacing SMTP is appealing because it permits thinking in terms of creating an infrastructure that has accountability and restrictions built in. Unfortunately, an installed base the size of the Internet is not likely to make such a change anytime soon. It seems far more likely that successful spam control mechanisms will be introduced as increments to the existing Internet mail service.

Moreover, the feature of SMTP that is most responsible for spam is the ability to receive mail from strangers. Without this feature, there would be no flood of spam, but many of the most valuable Internet commercial and individual activities would also be impossible. Replacing SMTP with a protocol that still allows strangers to send each other mail would not stop spam, any more than SMTP-AUTH stopped spam, despite insistent claims to the contrary made before SMTP-AUTH became widely available and used.

3.4. Negotiation

In addition to real-time analysis, a recipient may engage in an explicit negotiation with the sender, to validate them. When this is performed at the time of message receipt, it is called a "Challenge-Response" (CR) mechanism. This mechanism might use regular email exchange, or other media supporting interaction.

An example of a mechanism could have the recipient MTA contact the putative sender's host, as addressed by the DNS MX record associated with the Mail-From domain name. It could send that domain a hash of the received message and ask, "Did you really send this?" The effect is essentially the same as a cryptographic message authentication, but implemented through a callback mechanism, rather than being carried with the message content.
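The callback described above can be sketched as follows. The text does not specify a hash function, a canonical form or a wire protocol, so SHA-256, the body-only canonicalization and the in-process `SendingDomain` object (a hypothetical stand-in for the host found through the Mail-From domain's MX record) are all assumptions of this sketch, not part of any defined protocol.

```python
from __future__ import annotations

import hashlib


def canonical_digest(body: bytes) -> str:
    """Hash the message body after normalizing line endings and trailing whitespace.

    Hashing only a canonicalized body is an assumption of this sketch: headers such
    as Received: are added in transit, so a hash of the raw message would not match.
    """
    lines = body.replace(b"\r\n", b"\n").split(b"\n")
    canonical = b"\n".join(line.rstrip() for line in lines).rstrip(b"\n")
    return hashlib.sha256(canonical).hexdigest()


class SendingDomain:
    """Hypothetical stand-in for the sender's host that answers callback queries."""

    def __init__(self) -> None:
        self._sent: set[str] = set()

    def record_outgoing(self, body: bytes) -> None:
        """Sender side: remember a digest of every message originated here."""
        self._sent.add(canonical_digest(body))

    def did_you_send(self, digest: str) -> bool:
        """Answer the recipient's question, "Did you really send this?"."""
        return digest in self._sent


def verify_by_callback(body: bytes, putative_sender: SendingDomain) -> bool:
    """Recipient side: send only the digest and ask whether the sender originated it."""
    return putative_sender.did_you_send(canonical_digest(body))
```

Even this toy shows the costs involved: the sending domain must keep state for every message it sends, and every check by a recipient adds a query to the putative sender.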
CR introduces delay in message receipt and creates at least one additional email round-trip exchange for every new sender/recipient pair. This is a substantial burden both on participants and on the transit service. Senders often refuse to respond to the challenge, so that the mechanism dissuades senders from all but the most urgent communications. In addition, the delay imposed by CR can render time-sensitive messages useless.

STD OPP
   [16] Validation protocol (such as "challenge/response") between the recipient's and the sender's systems.

4. EVALUATING TECHNICAL APPROACHES

The complexity of Internet mail service and the nature of spam make it difficult to evaluate proposals for control mechanisms. In this section, the key technical factors affecting viability are examined.

4.1. Adoption

A critical barrier to the success of a new mechanism is the effort it takes to begin using it. It is essential to look carefully at the adoption process.

   1) Adoption Effort:  What is the effort for a new participant to start using the proposed mechanism? This includes installation, learning to use it and performing initial operations. This is also called the "barrier to entry".

   2) Threshold Benefit:  What is required before users get some benefit from the mechanism? Primarily, this looks for the number of users who must adopt the mechanism before the adopters gain utility from it.

A key construct for examining adoption and benefit is "core-vs-edge". Generally, adoption at the edge of a system is easier and quicker than adoption in the core. If a mechanism affects the core (infrastructure), then it usually must be adopted by most or all of the infrastructure before it provides meaningful utility. In a system the scale of the Internet, it can take decades to reach that level of adoption, if it ever does. Remember that the Internet comprises a massive number of independent administrations, each with their own politics and funding.
What is important and feasible to one might be neither to another. If the latter administration is in the handling path for a message, then it will not have implemented the necessary control mechanism. Worse, it might well not be possible to change this. For example, a proposal that requires a brand new mail service is not likely to gain much traction.

By contrast, some "edge" mechanisms provide utility to the first one, two or three adopters who interact with each other. No one else is needed for the adopters to gain some benefit. Each additional adopter makes the total system incrementally more useful. For example, a filter can be useful to the first recipient to adopt it. A consent mechanism can be useful to the first two or three adopters, depending upon the design of the mechanism.

   3) Impact on Participants:  What is the impact on the senders and receivers who adopt the proposal? Senders and receivers currently have certain styles of operation. How are those styles changed?

   4) Impact on Others:  What is the impact on the senders and receivers who do *not* adopt the proposal? What effect does it have on legitimate users of email? What effect does it have on spammers? Is the nature of Internet mail changed for everyone, including non-adopters?

For example, a challenge-response system is irritating for the person being challenged, and it imposes extra delay on the desired communication. If the originator and the recipient both access the Internet only occasionally (such as through dial-up when mobile), a challenge-response model can impose days of delay. For some communications, this can be disastrous.

4.2. Burden

The purpose of spam control is to cause some email to fail to reach its intended destination. This is, of course, directly at odds with the constructive goal of email. Hence spam control alters the basic model of email service. Effective mechanisms must place some kind of burden on senders and receivers.
Hence a challenge for spam control mechanisms is to require enough of a burden to be effective, but not so much that it makes email unacceptably painful to use.

   5) Ongoing Usage Effort:  Once a user has chosen to make the change to adopt a mechanism, how much effort does it take to use it regularly? After the effort to adopt the mechanism, how does it affect regular email use on an ongoing basis?

   6) Balance of Burdens:  What is the nature and distribution of the burdens placed on senders and receivers who are affected by the proposed mechanism? Who must work harder to use the proposed mechanism?

4.3. Scaling

"Adoption" is the process of placing a new mechanism into an operational environment. Scaling looks at the effect of having very large numbers of participants use that mechanism.

   7) Use by Full Internet:  What happens if everyone on the Internet adopts the proposed mechanism? How is the fabric of Internet mail affected when there is very large-scale use?

   8) Growth of Internet:  What if the Internet grows by a factor of 1000? How is the fabric affected when there is much larger-scale use? Remember that "everyone" is approximately 100 million users at the time of this writing. It will grow to 10 billion, if we expect the Internet to be useful for some decades. And it is likely there will be more email users/accounts than there are people on the planet, given that individuals and organizations occupy multiple roles. So, what will it be like for 100 million or 10 billion users to employ the proposed mechanism? Are there technology or operations "choke points" in the proposed mechanism?

   9) Efficiency:  Will the proposed mechanism be sufficiently efficient? Is Internet mail delivered in a timely fashion? Is the burden on processing and storage acceptable?

   10) Cost:  Will the proposed mechanism be sufficiently inexpensive?

   11) Reliability:  Will the proposed mechanism be sufficiently reliable? Is non-spam email more likely to be delivered correctly? Less likely?
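Items 7 and 8 ask whether a mechanism has "choke points". A back-of-the-envelope calculation makes the question concrete for any design that consults a single central service once for every message. The rate of 20 messages per user per day and the `peak_to_average` parameter are assumed, illustrative planning figures, not data from this note; only the 100 million and 10 billion user counts come from the text.

```python
SECONDS_PER_DAY = 86_400


def central_query_rate(users: int, messages_per_user_per_day: float,
                       peak_to_average: float = 1.0) -> float:
    """Queries per second at a single service consulted once per message.

    All inputs are hypothetical planning figures, not measurements.
    """
    return users * messages_per_user_per_day * peak_to_average / SECONDS_PER_DAY


# The two population figures from item 8, with an assumed 20 messages per user per day.
today = central_query_rate(100_000_000, 20)
future = central_query_rate(10_000_000_000, 20)
print(f"{today:,.0f} queries/s today, {future:,.0f} queries/s at 10 billion users")
# prints: 23,148 queries/s today, 2,314,815 queries/s at 10 billion users
```

The point is not the particular numbers but that a hundredfold growth in users means a hundredfold growth in load at any component that every message must touch.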
There is another side to the scaling question:

12) Internet Impact: How much of the Internet will be affected by a proposal, if the proposal is adopted?

13) Spam Impact: How much spam will be controlled by the proposed mechanism? If a proposal requires substantial effort to adopt and use, but will affect only a small percentage of spam, the efficacy of that proposed mechanism is very much in question. One example of this concern might be legal scope, since spam is global and there is no global law enforcement.

4.4. Robustness

After a technique is adopted, spammers will adjust their methods in an attempt to work around it. For example, when people started using header filters, spammers started using bland, deceptive subject lines. As a result, when such spam gets past the filters, people are more likely to open the messages and see pornographic images. If whitelists become common, it is easy to envision spammers forging From addresses that are likely to be on the recipient's whitelist.

14) Circumvention: How difficult will it be for spammers to change their mail to bypass the proposed scheme? How are circumvention efforts likely to affect non-spam mail?

4.5. Scenarios

Almost any proposal will make sense for a particular scenario that is sufficiently constrained. The real test is how the proposal works for other, likely scenarios, and a proposal should consider these cases carefully. Here are some typical scenarios that often discriminate among proposals for changes to email; there are many others:

15) Personal Post/Reply: For two individuals wishing to exchange periodic email, how does the proposed mechanism work for initial contact? How does it work for ongoing contact?

16) Mailing List: Mailing lists are particularly interesting because special software performs a multicast redistribution of a message. Still, the From field of the message names the originator, rather than the mailing list.
How does the mechanism perform in this sort of mediated distribution? Does a recipient "reply" still work properly?

17) Inter-Enterprise: Two or more organizations often form special, cross-group teams to collaborate on projects. What is required to configure the proposed mechanism to support such teams? What is required to maintain the mechanism as membership in the team changes? How are intra-team communications affected?

5. SECURITY CONSIDERATIONS

This note discusses types of mechanisms for evaluating and filtering email. As such, it covers topics with extremely sensitive security concerns. However, it does not propose any standards and therefore does not have any direct security effects.

5.1. Privacy Considerations

Many spam control techniques affect the privacy of mail senders, receivers, or both. Bulk counting techniques can disclose the contents of mail in systems that exchange message bodies, and can permit traffic analysis in systems that exchange message hashes or digests instead. Content filters can reveal message contents if filtered messages are examined by network personnel to check for false positives or negatives. Aggressive filtering can cause bounces and double bounces that send messages into postmaster mailboxes, disclosing content. If senders or recipients must appeal to have filtering criteria changed to avoid false positives, informal traffic analysis is possible based on the filtering terms in question.

Sender tagging and other techniques intended to deter address forgery make it more difficult to send anonymous or pseudonymous mail. E-postage schemes can identify senders unless the scheme allows users to buy and redeem stamps anonymously. Several popular spam control systems route incoming mail through the mail systems of third parties that are responsible for filtering it, which exposes message contents to those parties.
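The bulk-counting privacy trade-off can be illustrated with a minimal Python sketch. It assumes a hypothetical scheme in which receivers report a SHA-256 digest of a lightly normalized message body rather than the body itself; the normalization rule, the threshold of 3, and the names body_digest and BulkCounter are illustrative inventions, and deployed bulk-counting systems generally use more tolerant checksums than an exact hash. Exchanging only digests keeps bodies out of the shared database, yet anyone who can guess a candidate message can compute its digest and test whether it was seen, which is the traffic-analysis exposure described above.

```python
"""Minimal sketch of digest-based bulk counting (hypothetical design).

Receivers report a digest of each message body instead of the body itself.
The normalization rule and bulk threshold are illustrative assumptions,
not part of any standard or deployed system.
"""

import hashlib
import re
from collections import Counter


def body_digest(body: str) -> str:
    """SHA-256 of the body after collapsing whitespace and lowercasing.

    Normalizing first makes trivially varied copies of the same bulk
    message produce the same digest.
    """
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class BulkCounter:
    """Tracks how many times each body digest has been reported."""

    def __init__(self, bulk_threshold: int = 3) -> None:
        self.bulk_threshold = bulk_threshold
        self.counts = Counter()  # digest -> number of reported copies

    def report(self, body: str) -> int:
        """Record one received copy; return the running count for its digest."""
        digest = body_digest(body)
        self.counts[digest] += 1
        return self.counts[digest]

    def is_bulk(self, body: str) -> bool:
        """True once the digest has been reported bulk_threshold times or more."""
        return self.counts[body_digest(body)] >= self.bulk_threshold

    def has_seen(self, candidate_body: str) -> bool:
        """Privacy caveat: whoever can query the counts can test a guessed message."""
        return self.counts[body_digest(candidate_body)] > 0
```

For example, after three reports of "Buy   NOW!" in any mix of case and spacing, is_bulk("buy now!") returns True, while has_seen() on a message that was never reported returns False. The same has_seen() query that makes the counter useful is what lets a party holding a guessed message confirm that it was sent.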
These privacy risks can in principle be known to mail receivers, although operators of mail systems often fail to inform users of the anti-spam tools and third-party services through which their mail passes. Mail senders often cannot know about these risks to their privacy, even in principle.

6. APPENDIX

6.1. Spam Control Proposal Evaluation Checklist

1) Adoption Effort
2) Threshold to Benefit
3) Impact on Participants
4) Impact on Others
5) Ongoing Usage Effort
6) Balance of Burdens
7) Use by Full Internet
8) Growth of Internet
9) Efficiency
10) Cost
11) Reliability
12) Internet Impact
13) Spam Impact
14) Circumvention
15) Personal Post/Reply
16) Mailing List
17) Inter-Enterprise

6.2. Acknowledgements

This note is motivated by discussions on the Anti-Spam Research Group (ASRG) mailing list and draws a number of points from discussion there. The sub-section "Burden" was taken from a posting by Dave Hendricks.

6.3. Authors' Addresses

Dave Crocker
Brandenburg InternetWorking
675 Spruce Drive
Sunnyvale, CA 94086 USA
Tel: +1.408.246.8253
dcrocker@brandenburg.com

Vernon Schryver
Rhyolite Software
2482 Lee Hill Drive
Boulder, Colorado 80302
vjs@rhyolite.com

John R. Levine
Taughannock Networks
PO Box 727
Trumansburg NY 14886
Tel: +1.607.330.5711
johnl@iecctaugh.com

6.4. Full Copyright Statement

Copyright (C) The Internet Society (2003). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.