REPUTE                                                      M. Kucherawy
Internet-Draft                                               May 5, 2013
Intended status: Informational
Expires: November 6, 2013

        Operational Considerations Regarding Reputation Services
                  draft-ietf-repute-considerations-01

Abstract

   Reputation systems have become a common tool in many applications
   that seek to apply collected intelligence about traffic sources.
   Often this is done because it is common or even expected operator
   practice. It is therefore important to be aware of a number of
   considerations for both operators and consumers of the data.
   This document includes a collection of the best advice available
   regarding providers and consumers of reputation data, based on
   experience to date. Much of this is based on experience with email
   reputation systems, but the concepts are generally applicable.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 6, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Background
   3.  Evolution
   4.  Reputation Clients
   5.  Reputation Service Providers
   6.  Security Considerations
   7.  IANA Considerations
   8.  Informative References
   Appendix A.  Acknowledgements

1.  Introduction

   Reputation services involve collecting feedback from the community
   about sources of Internet traffic and aggregating that feedback into
   a rating of some kind. Common examples include feedback about
   traffic associated with specific email addresses, URIs or parts of
   URIs, IP addresses, etc. The specific collection, analysis, and
   rating methods vary from one service to the next and one problem
   domain to the next, but several operational concepts appear to be
   common to all of these.

   The promise of the protection that reputation services offer can be
   enticing, and many users and operators alike typically engage those
   services merely because it is expected of them. A critical notion,
   however, is that doing so explicitly involves a third party in the
   flow of data those parties receive. This is often taken for granted,
   with potentially disastrous results.

   This document highlights this and other considerations in providing
   and consuming reputation data services.

2.  Background

   The community has historically focused on identifying sources that
   misbehave, i.e., that earn negative reputations. The purpose here is
   to identify and filter traffic from bad actors. This grew out of
   operational need. As the Internet grew, so did the occurrence of
   problematic traffic, especially in email.
   The pragmatics of email (i.e., the fact that the total IP address
   space is more constrained than the total email address space) drove
   the use of IP addresses as the basis of reputation, in addition to
   the fact that IP addresses have a degree of validation (via the
   TCP/IP infrastructure) where email addresses have had none.

   A specific example of a reputation service in common use in the email
   space is the DNS blacklist [DNSBL]. This is a method of querying a
   database as to whether a source of incoming [SMTP] email traffic
   should be allowed to relay email, based on previous observations and
   feedback. The method uses the IP address of the source as the basis
   for a query to the database using the Domain Name System [DNS] as the
   interface. [DNSBL] includes several points in its Security
   Considerations section that are repeated and further developed here.

   However, regardless of the identifier used as the basis for a
   reputation, bad actors can evade detection or the effects of their
   observed behavior by changing identifiers (e.g., move to a new IP
   address, register a new domain name, use a sub-domain). This makes
   the problem space effectively boundless, especially as IPv6 rolls
   out.

3.  Evolution

   More modern thinking is evolving toward the identification of good
   actors rather than bad actors, and giving them preferential
   treatment. This drastically reduces the problem space: There are
   vastly more IP addresses and email addresses used by bad actors to
   generate problematic traffic than are used by good actors to generate
   desirable traffic.

   Moreover, good actors tend to be represented by stable names and
   addresses, allowing users to rely on these to identify and give
   preferential treatment to their traffic. Good actors have no need to
   hop around to different addresses, and already work to keep their
   traffic clean.
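
   As an aside on the DNS-based query method described in Section 2,
   the following is a minimal sketch of how a client might form and
   interpret an IPv4 query. The name construction (reversed octets
   prepended to the list's zone) and the use of an answer in
   127.0.0.0/8 to indicate a listing follow [DNSBL]. The function
   names, the zone "bad.example.com", and the injectable "resolve"
   parameter are assumptions made for illustration only.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Return the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return reversed_octets + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the lookup yields an address in 127.0.0.0/8.

    `resolve` is a hypothetical hook so the lookup can be replaced in tests.
    A failed lookup is treated as "not listed" here; a production client
    would likely want to distinguish "no such name" from a resolver error.
    """
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return ipaddress.IPv4Address(answer) in ipaddress.ip_network("127.0.0.0/8")
```

   For example, dnsbl_query_name("192.0.2.99", "bad.example.com")
   yields "99.2.0.192.bad.example.com".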

   To date, the approach of identifying good actors has been tried only
   with manually edited whitelists, but it has shown promising results
   at that scale.

4.  Reputation Clients

   Operators that choose to make use of reputation services to influence
   content allowed to pass into or through their infrastructures need to
   understand that they are granting a third party (the reputation
   service provider, or RSP) the ability to affect incoming traffic, for
   better or worse. Of course, this is the whole point of engaging an
   RSP when everything is working properly, but a number of issues are
   worthy of consideration before establishing such a relationship.

   Some cases have occurred where an RSP made the unilateral decision to
   terminate its service. To encourage its clients to stop issuing
   queries, it began reporting a maximally negative reputation about all
   subjects, causing rejection of all incoming traffic during the
   incident period. Although one would hope such incidents to be rare,
   automated means to detect such unfortunate returns (malicious or
   otherwise) and take remedial action should be considered.

   RSPs will be the subject of attacks once it is understood that a
   successful attack will allow malicious content to evade detection and
   filtering. Users of RSPs need to be aware of possible interruptions
   in service availability or quality.

   Similarly, some actors will try to "game" the service, which is to
   say that such actors will attempt to determine patterns of behavior
   that result in the reporting of favorable reputations, and in doing
   so, acquire artificially inflated reputations. One could reasonably
   assume that a reputation service is inherently fragile. For
   operational clients, this should prompt balanced and comparative,
   rather than unilateral, use of the service.
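
   One possible form of the automated detection mentioned above is a
   sliding-window check on the share of negative replies. The sketch
   below is illustrative only: the class name, window size, and
   threshold are assumptions, and an operator would need to choose
   values that fit its own traffic.

```python
from collections import deque


class ReplyMonitor:
    """Track recent RSP replies and flag an implausible run of negatives."""

    def __init__(self, window=1000, min_samples=100, max_negative_fraction=0.95):
        self.replies = deque(maxlen=window)  # True means a negative reply
        self.min_samples = min_samples
        self.max_negative_fraction = max_negative_fraction

    def record(self, is_negative):
        """Remember one reply; the oldest reply falls out of a full window."""
        self.replies.append(bool(is_negative))

    def looks_anomalous(self):
        """Return True when nearly every recent reply was negative."""
        if len(self.replies) < self.min_samples:
            return False  # too little data to judge
        negative_fraction = sum(self.replies) / len(self.replies)
        return negative_fraction >= self.max_negative_fraction
```

   When looks_anomalous() returns True, a client might stop applying
   the RSP's results and alert an operator, rather than continue to
   reject traffic.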

   When engaging an RSP, an operator should try to learn the following
   about the RSP in order to understand the potential exposure:

   o  the RSP's basis for listing or not listing particular subjects;

   o  if an RSP is paid by its listees, the rate and criteria for
      rejection from being listed;

   o  how the RSP collects data about subjects;

   o  how many data points are input to the reported reputation;

   o  whether reputation is based on a reliable identifier;

   o  how the RSP establishes reliability and authenticity of those
      data;

   o  how data validity is maintained (e.g., on-going monitoring of the
      reported data and sources);

   o  how actively data validity is tracked (e.g., how changes are
      detected);

   o  how disputed reputations are handled;

   o  how often input data expire;

   o  whether older information is more or less influential than newer;

   o  whether the reported reputation is a scalar, a Boolean value, a
      collection of values, or something else;

   o  when transitioning among RSPs, the differences between them on
      the above points; that is, whether a particular score from one
      means the same thing from another.

   An operator using an RSP would be wise to ensure it has the
   capability to effect local overrides for cases where the client
   expects to disagree with the reported reputation.

   An operator should be able to limit the impact of a negative
   reputation on content acceptance. For example, rather than rejecting
   content outright when a negative reputation is returned, simply
   subject it to additional (i.e., more thorough) local analysis before
   permitting the traffic to pass.

   A sensible default should apply when the RSP is not available. This
   may also be a query to a different RSP known to be less robust than
   the primary one.
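
   The three safeguards just described (local overrides, a limited
   response to negative reputations, and a default when the RSP is
   unavailable) can be combined in a single decision step. The sketch
   below is one possible arrangement, not a prescribed design. The
   names "Action" and "decide", the "query_rsp" callable, and the choice
   of OSError to signal an unavailable RSP are all assumptions.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"
    EXTRA_ANALYSIS = "extra_analysis"  # more thorough local analysis


def decide(subject, query_rsp, overrides, default=Action.EXTRA_ANALYSIS):
    """Choose how to treat traffic from `subject`.

    query_rsp: hypothetical callable returning True for a negative
               reputation and raising OSError when the RSP is unreachable.
    overrides: local mapping of subject to Action that takes precedence.
    default:   Action applied when the RSP cannot be reached.
    """
    if subject in overrides:
        return overrides[subject]  # local policy wins over the RSP
    try:
        is_negative = query_rsp(subject)
    except OSError:
        return default  # RSP unavailable
    # A negative reputation triggers closer inspection, not outright rejection.
    return Action.EXTRA_ANALYSIS if is_negative else Action.ACCEPT
```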

   Recent proposals have focused on tailoring operation to prefer or
   emphasize content whose sources have positive reputations. As stated
   above, negative reputations are easy to shed, and the universe of
   things that will earn and maintain positive reputations is relatively
   small. Designing a filtering system that observes these notions is
   expected to be more lightweight to operate and harder to game.

   One choice is to query and cross-reference multiple RSPs. This can
   help to detect which ones under comparison are reliable, and offsets
   the effect of anomalous replies.

5.  Reputation Service Providers

   Operators intending to provide a reputation service need to consider
   that there are many flavors of clients. There will be clients that
   are prepared to make use of a reputation service blindly, while
   others will be interested in understanding more fully the nature of
   the service being provided. An operator of an RSP should be prepared
   to answer as many of the questions identified in Section 4 as
   possible, not only because wise clients will ask, but also because
   they reflect issues that have arisen over the years, and exploration
   of the points they raise will result in a more robust reputation
   service.

   Obviously, in computing reputations via traffic analysis, some
   private algorithms may come into play. For some RSPs, such "secret
   sauce" comprises their competitive advantage over others in the same
   space. This document is not suggesting that all private algorithms
   need to be exposed for a reputation service to be acceptable.
   Instead, it is anticipated that enough of the above details will
   need to be available to assure consumers (and in some cases, industry
   or the general public) that the RSP can be trusted to influence key
   local policy decisions.

   Reputations should be based on accurate identifiers, i.e., some
   property of the content under analysis that is difficult to falsify.
   For example, in the realm of email, the address found in the From:
   field of a message is typically not verifiable, while the domain name
   found in a validated domain-level signature is. In this case,
   constructing a reputation system based on the domain name is more
   useful than one based on the From: field.

   The biggest frustration with most RSPs to date has been the absence
   of a visible, accessible, and transparent process for remediating the
   errant addition of an identifier to a negative reputation list. An
   RSP in widespread use is perceived to have enormous power when its
   results are used to reject traffic outright; when a "bad" entry is
   added referencing a good actor, it can have destructive effects, so
   an effective mechanism to fix such problems needs to exist.

   To accommodate clients with varying sensitivities, it is advisable
   for the query mechanism used to access the RSP to provide the ability
   to request details in the returned result about how the result was
   reached, allowing the client to decide if the result should be
   applied. For example, it should be possible for the reply to
   contain:

   o  the result itself;

   o  the number of data points used to compute the result;

   o  the age range of the data;

   o  source diversity of the input data;

   o  currency of the result (i.e., when it was computed);

   o  basis of the result (i.e., which identifier was used).

   The systems and algorithms used by the RSP to compute the reported
   reputation will need to be hardened as much as practicable against
   gaming or other forms of data poisoning. Larger source diversities
   are harder to overcome with poisoned input, but are expensive to
   build in terms of both infrastructure and time.
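
   To make the reply details listed above concrete, the following
   sketch models such a reply and shows how a client with a particular
   sensitivity might decide whether to apply it. This document does not
   define a reply format; the field names, types, and thresholds here
   are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ReputationReply:
    result: float          # the result itself
    data_points: int       # number of data points used to compute it
    oldest_data: datetime  # age range of the data: oldest input
    newest_data: datetime  # age range of the data: newest input
    source_count: int      # source diversity of the input data
    computed_at: datetime  # currency of the result
    identifier: str        # basis of the result (which identifier was used)


def should_apply(reply, now, min_points=50, min_sources=5,
                 max_age=timedelta(days=1)):
    """Return True if the reply meets this client's (assumed) thresholds."""
    return (reply.data_points >= min_points
            and reply.source_count >= min_sources
            and now - reply.computed_at <= max_age)
```

   A client that is more tolerant of thin data could pass lower
   thresholds, while a cautious client could also require a minimum
   span between oldest_data and newest_data.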

   Systems focused on assigning positive reputations rather than
   negative ones are promising since positive reputations, if made
   difficult to earn, put a large cost on bad actors, which may be
   enough to dissuade them entirely.

6.  Security Considerations

   Several points are raised above that can be described as threats to
   the delivery of valid user data. This document highlights and
   discusses those matters, but introduces no new security issues.

7.  IANA Considerations

   This memo contains no actions for IANA.

   [RFC Editor: Please remove this section prior to publication.]

8.  Informative References

   [DNS]    Mockapetris, P., "Domain Names -- Concepts and Facilities",
            RFC 1034, November 1987.

   [DNSBL]  Levine, J., "DNS Blacklists and Whitelists", RFC 5782,
            February 2010.

   [SMTP]   Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
            October 2008.

Appendix A.  Acknowledgements

   The author wishes to acknowledge the following for their review and
   constructive criticism of this proposal: Chris Barton, Vincent
   Schonau.

Author's Address

   Murray S. Kucherawy

   EMail: superuser@gmail.com