REPUTE                                                      M. Kucherawy
Internet-Draft                                              May 20, 2013
Intended status: Informational
Expires: November 21, 2013


        Operational Considerations Regarding Reputation Services
                  draft-ietf-repute-considerations-02

Abstract

   Reputation systems have become a common tool in many applications
   that seek to apply collected intelligence about traffic sources.
   Often this is done because it is common or even expected operator
   practice.  It is therefore important to be aware of a number of
   considerations for both operators and consumers of the data.
   This document includes a collection of the best advice available
   regarding providers and consumers of reputation data, based on
   experience to date.  Much of this is based on experience with email
   reputation systems, but the concepts are generally applicable.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 21, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Background
   3.  Evolution
   4.  Reputation Clients
   5.  Reputation Service Providers
   6.  Security Considerations
   7.  IANA Considerations
   8.  Informative References
   Appendix A.  Acknowledgements

1.  Introduction

   Reputation services involve collecting feedback from the community
   about sources of Internet traffic and aggregating that feedback into
   a rating of some kind.  Common examples include feedback about
   traffic associated with specific email addresses, URIs or parts of
   URIs, IP addresses, etc.  The specific collection, analysis, and
   rating methods vary from one service to the next and one problem
   domain to the next, but several operational concepts appear to be
   common to all of these.

   The promise of the protection that reputation services offer can be
   enticing, and many users and operators alike typically engage those
   services merely because it is expected of them.  A critical notion,
   however, is that doing so explicitly involves a third party in the
   flow of data those parties receive.  This is often taken for granted,
   with potentially disastrous results.

   This document highlights this and other considerations in providing
   and consuming reputation data services.

2.  Background

   The community has historically focused on identifying sources that
   misbehave, i.e., that earn negative reputations.  The purpose here is
   to identify and filter traffic from bad actors.  This grew out of
   operational need.  As the Internet grew, so did the occurrence of
   problematic traffic, especially in email.
   The pragmatics of email (i.e., the fact that the total IP address
   space is more constrained than the total email address space) drove
   the use of IP addresses as the focus of reputation, in addition to
   the fact that IP addresses have a degree of validation (via the
   TCP/IP infrastructure) where email addresses have had none.

   A specific example of a reputation service in common use in the email
   space is the DNS blacklist [DNSBL].  This is a method of querying a
   database as to whether a source of incoming [SMTP] email traffic
   should be allowed to relay email, based on previous observations and
   feedback.  The method uses the IP address of the source as the basis
   for a query to the database using the Domain Name System [DNS] as the
   interface.  [DNSBL] includes several points in its Security
   Considerations section that are repeated and further developed here.

   However, regardless of the identifier used as the basis for a
   reputation, bad actors can evade detection or its consequences by
   changing identifiers (e.g., moving to a new IP address, registering a
   new domain name, or using a sub-domain).  This makes the problem
   space effectively boundless, especially as IPv6 rolls out, with its
   vastly larger address space.

3.  Evolution

   More modern thinking is evolving toward the identification of good
   actors rather than bad actors, and giving them preferential
   treatment.  This drastically reduces the problem space: There are
   vastly more IP addresses and email addresses used by bad actors to
   generate problematic traffic than are used by good actors to generate
   desirable traffic.

   Moreover, good actors tend to be represented by stable names and
   addresses, allowing users to rely on these to identify and give
   preferential treatment to their traffic.  Good actors have no need to
   hop around to different addresses, and already work to keep their
   traffic clean.
   In addition, good actors are willing and able to collaborate in the
   assessment process, such as by supplying validated identifiers that
   are associated with their traffic.

   This new approach of focusing on identification of good actors has
   only been tried to date using manually edited whitelists, but has
   shown promising results on that scale.

4.  Reputation Clients

   Operators that choose to make use of reputation services to influence
   content allowed to pass into or through their infrastructures need to
   understand that they are granting a third party (the reputation
   service provider, or RSP) the ability to affect incoming traffic, for
   better or worse.  Of course, this is the whole point of engaging an
   RSP when everything is working properly, but a number of issues are
   worthy of consideration before establishing such a relationship.

   Some cases have occurred where an RSP made the unilateral decision to
   terminate its service.  To encourage its clients to stop issuing
   queries, it began reporting a maximally negative reputation about all
   subjects, causing rejection of all incoming traffic during the
   incident period.  Although one would hope such incidents to be rare,
   automated means to detect such unfortunate returns (malicious or
   otherwise) and take remedial action should be considered.

   RSPs will be the subject of attacks once it is understood that a
   successful attack will allow malicious content to evade detection and
   filtering.  Users of RSPs need to be aware of possible interruptions
   in service availability or quality.

   Similarly, some actors will try to "game" the service, which is to
   say that such actors will attempt to determine patterns of behavior
   that result in the reporting of favorable reputations, and in doing
   so, acquire artificially inflated reputations.  One could reasonably
   assume that a reputation service is inherently fragile.
   For operational clients, this should prompt balanced and comparative,
   rather than unilateral, use of the service.

   It is suggested that, when engaging an RSP, an operator should try to
   learn the following things about the RSP in order to understand the
   exposure potential:

   o  the RSP's basis for listing or not listing particular subjects;

   o  if an RSP is paid by its listees, the rate and criteria for
      rejection from being listed;

   o  how the RSP collects data about subjects;

   o  how many data points are input to the reported reputation;

   o  whether reputation is based on a reliable identifier;

   o  how the RSP establishes reliability and authenticity of those
      data;

   o  how continuing data validity is maintained (e.g., on-going
      monitoring of the reported data and sources);

   o  how actively data validity is tracked (e.g., how changes are
      detected);

   o  how disputed reputations are handled;

   o  how often input data expire;

   o  whether older information is more or less influential than newer;

   o  whether the reported reputation is a scalar, a Boolean value, a
      collection of values, or something else;

   o  when transitioning among RSPs, the differences between them on the
      above points; that is, whether a particular score from one means
      the same thing from another.

   An operator using an RSP would be wise to ensure it has the
   capability to effect local overrides for cases where the client
   expects to disagree with the reported reputation.

   An operator should be able to limit the impact of a negative
   reputation on content acceptance.  For example, rather than rejecting
   content outright when a negative reputation is returned, an operator
   can subject it to additional (i.e., more thorough) local analysis
   before permitting the traffic to pass.
   In other words, the reputation may simply allow certain layers of a
   multi-layered filtering system to be bypassed when that reputation is
   favorable.

   A sensible default should apply when the RSP is not available.  This
   can also be a query to a different RSP known to be less robust than
   the primary one.

   Recent proposals have focused on tailoring operation to prefer or
   emphasize content whose sources have positive reputations.  As stated
   above, negative reputations are easy to shed, while the universe of
   things that will earn and maintain positive reputations is relatively
   small.  Designing a filtering system that observes these notions is
   expected to be more lightweight to operate and harder to game.

   One choice is to query and cross-reference multiple RSPs.  This can
   help to detect which ones under comparison are reliable, and offsets
   the effect of anomalous replies.

5.  Reputation Service Providers

   Operators intending to provide a reputation service need to consider
   that there are many flavors of clients.  There will be clients that
   are prepared to make use of a reputation service blindly, while
   others will be interested in understanding more fully the nature of
   the service being provided.  These can be likened to a consumer
   credit check that only seeks a yes-or-no reply versus wanting to
   review a detailed credit report.  An operator of an RSP should be
   prepared to answer as many of the questions identified in Section 4
   as possible, not only because wise clients will ask, but also because
   they reflect issues that have arisen over the years, and exploration
   of the points they raise will result in a more robust reputation
   service.

   Obviously, in computing reputations via traffic analysis, some
   private algorithms may come into play.  For some RSPs, such "secret
   sauce" comprises their competitive advantage over others in the same
   space.
   This document is not suggesting that all private algorithms need to
   be exposed for a reputation service to be acceptable.  Instead, it is
   anticipated that enough of the above details need to be available to
   assure consumers (and in some cases, industry or the general public)
   that the RSP can be trusted to influence key local policy decisions.

   Reputations should be based on accurate identifiers, i.e., some
   property of the content under analysis that is difficult to falsify.
   For example, in the realm of email, the address found in the From:
   header field of a message is typically not verifiable, while the
   domain name found in a validated domain-level signature is.  In this
   case, constructing a reputation system based on the domain name is
   more useful than one based on the From: field.

   The biggest frustration with most RSPs to date has been the absence
   of a visible, accessible, and transparent process for remediating the
   errant addition of an identifier to a negative reputation list.  An
   RSP in widespread use is perceived to have enormous power when its
   results are used to reject traffic outright; when a "bad" entry is
   added referencing a good actor, it can have destructive effects, so
   an effective mechanism to fix such problems needs to exist.

   To accommodate clients with varying sensitivities, it is advisable
   for the query mechanism used to access the RSP to provide the ability
   to request details in the returned result about how the result was
   reached, allowing the client to decide if the result should be
   applied.  For example, it should be possible for the reply to
   contain:

   o  the result itself;

   o  the number of data points used to compute the result;

   o  the age range of the data;

   o  source diversity of the input data;

   o  currency of the result (i.e., when it was computed);

   o  basis of the result (i.e., which identifier was used).
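
   As an illustration only, the reply contents listed above could be
   modeled on the client side as in the following Python sketch.  The
   record layout, the field names, the function should_apply(), and the
   default threshold values are all assumptions made for this example;
   this document does not define a reply format or a query protocol.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ReputationReply:
    """Hypothetical reply record; every field name is illustrative."""
    result: float            # the result itself (the scale is assumed)
    data_points: int         # number of data points used
    oldest_input: datetime   # age range of the data: oldest input
    newest_input: datetime   # age range of the data: newest input
    source_diversity: int    # count of distinct reporting sources
    computed_at: datetime    # currency of the result
    basis: str               # which identifier the result refers to


def should_apply(reply, now, min_data_points=100, min_sources=5,
                 max_result_age=timedelta(hours=24)):
    """Return True if the reply meets this client's local thresholds.

    The default thresholds are arbitrary placeholders, not
    recommendations.
    """
    if reply.data_points < min_data_points:
        return False   # too little evidence behind the result
    if reply.source_diversity < min_sources:
        return False   # too few independent sources
    if now - reply.computed_at > max_result_age:
        return False   # result is too old to rely on
    return True
```

   A client with greater sensitivity could raise the thresholds, and a
   reply that fails the check could be set aside in favor of the
   client's local default rather than being treated as a negative
   result.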

   The systems and algorithms used by the RSP to compute the reported
   reputation will need to be hardened as much as practicable against
   gaming or other forms of data poisoning.  Larger source diversities
   are harder to overcome with poisoned input, but are expensive to
   build in terms of both infrastructure and time.

   Systems focused on assigning positive reputations rather than
   negative ones are promising since positive reputations, if made
   difficult to earn, put a large cost on bad actors, which may be
   enough to dissuade them entirely.

6.  Security Considerations

   Several points are raised above that can be described as threats to
   the delivery of valid user data.  This document highlights and
   discusses those matters, but introduces no new security issues.

7.  IANA Considerations

   This memo contains no actions for IANA.

   [RFC Editor: Please remove this section prior to publication.]

8.  Informative References

   [DNS]    Mockapetris, P., "Domain Names -- Concepts and Facilities",
            RFC 1034, November 1987.

   [DNSBL]  Levine, J., "DNS Blacklists and Whitelists", RFC 5782,
            February 2010.

   [SMTP]   Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
            October 2008.

Appendix A.  Acknowledgements

   The author wishes to acknowledge the following for their review and
   constructive criticism of this proposal: Chris Barton and Vincent
   Schonau.

Author's Address

   Murray S. Kucherawy

   EMail: superuser@gmail.com