idnits 2.17.1

draft-ietf-repute-considerations-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (November 20, 2013) is 3808 days in the past. Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

     No issues found here.

     Summary: 0 errors (**), 0 flaws (~~), 1 warning (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

REPUTE                                                      M. Kucherawy
Internet-Draft                                         November 20, 2013
Intended status: Informational
Expires: May 24, 2014

        Considerations Regarding Third-Party Reputation Services
                  draft-ietf-repute-considerations-03

Abstract

   Reputation services offer quality assessments about likely future
   behavior, based on past behaviors. The use of these services has
   become a common tool in many applications that seek to apply
   collected intelligence about traffic sources. Often this is done
   because it is common or even expected operator practice. It is
   therefore important to be aware of a number of considerations for
   both operators and consumers of the data.
   This document includes a collection of the best advice available
   regarding providers and consumers of reputation data, based on
   experience to date. Much of this is based on experience with email
   reputation systems, but the concepts are generally applicable.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 24, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Background
   3. Using Reputation Services
   4. Providing Reputation Services
   5. Evolution
   6. Security Considerations
   7. IANA Considerations
   8. Informative References
   Appendix A. Acknowledgments

1. Introduction

   Reputation services involve collecting feedback from the community
   about sources of Internet traffic and aggregating that feedback into
   a rating of some kind. Common examples include feedback about
   traffic associated with specific email addresses, URIs or parts of
   URIs, IP addresses, etc. The specific collection, analysis, and
   rating methods vary from one service to the next and one problem
   domain to the next, but several operational concepts appear to be
   common to all of these.

   The promise of the protection that relying on reputation services
   offers can be enticing, and many users and operators alike typically
   engage those services merely because it is expected of them. A
   critical notion, however, is that use of such a service explicitly
   involves a third party in the flow of data being received. This is
   often taken for granted, with potentially disastrous results.

   This document highlights this and other considerations in providing
   and consuming reputation data services.

2. Background

   The anti-abuse community has historically focused on identifying
   sources that misbehave, i.e., that earn negative reputations. For
   email, this means identifying sources of spam; for security, it means
   identifying sources of penetration attacks. The purpose here is to
   identify and filter traffic from bad actors. This grew out of
   operational need. As the Internet grew, so did the occurrence of
   problematic traffic, especially in email.
   The pragmatics of email (i.e., the fact that the total IP address
   space is more constrained than the total email address space) drove
   the use of IP addresses as the basis of reputation, in addition to
   the fact that IP addresses have a degree of validation (via the
   TCP/IP infrastructure) where email addresses have had none.

   The major considerations around a third-party reputation service are:

   Raw data: The method of obtaining the information that will be
      analyzed;

   Rating method: The techniques used on the collected data to compute
      a rating or other expression of expected behavior;

   Publication: How consumers obtain the computed ratings.

   A specific example of a publication method in common use in the email
   space is the DNS blacklist [DNSBL]. In particular, the operator of a
   reputation service computes reputations of IP addresses and stores
   them in a database. Via a DNSBL query, a consumer can query the
   database as to whether mail should be accepted from a particular
   source of incoming [SMTP], based on previous observations and
   feedback. The service uses the IP address of the source as the basis
   for a query to the database, accessed through the Domain Name System
   [DNS]. [DNSBL] includes several points in its Security
   Considerations section that are repeated and further developed here.

   However, regardless of the identifier used for a reputation, bad
   actors can evade detection or its consequences by changing
   identifiers (e.g., move to a new IP address, register a new domain
   name, use a sub-domain). This makes the problem space effectively
   boundless, especially as IPv6 rolls out, with its vastly larger
   address space.

   A framework for reputation services is introduced in [REPUTE] and the
   documents it references.

3. Using Reputation Services

   Operators that choose to make use of reputation services to
   influence content allowed to pass into or through their
   infrastructures need to understand that they are granting a third
   party (the reputation service provider, or RSP) the ability to affect
   the handling of incoming traffic, for better or worse. Of course,
   this is the whole point of engaging an RSP when everything is working
   properly, but a number of issues are worthy of consideration before
   establishing such a relationship.

   Some cases have occurred where an RSP made the unilateral decision to
   terminate its service. To encourage its clients to stop issuing
   queries, it began reporting a maximally negative reputation about all
   subjects, causing rejection of all incoming traffic during the
   incident period. Although one would hope such incidents to be rare,
   automated means to detect such unfortunate returns (malicious or
   otherwise) and take remedial action should be considered.

   RSPs will become targets of attack once it is understood that a
   successful attack allows malicious content to evade detection and
   filtering. Users of RSPs need to plan for possible interruptions
   in service availability or quality.

   Similarly, some actors will try to "game" the service, which is to
   say that such actors will attempt to determine patterns of behavior
   that result in the reporting of favorable reputations, and in doing
   so, acquire artificially inflated reputations. One could reasonably
   assume that a reputation service is inherently fragile. For
   operational clients, this should prompt balanced and comparative,
   rather than unilateral, use of the service.
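
   As a minimal sketch of the automated detection suggested earlier in
   this section, the following Python fragment tracks recent replies
   from an RSP and stops acting on them when nearly all are negative.
   The class name, the function name, the window size, and the
   thresholds are illustrative assumptions, not part of any RSP's
   interface; a real deployment would need to tune them to its own
   traffic.

```python
from collections import deque


class RspHealthMonitor:
    """Tracks recent RSP verdicts and flags a suspicious run of negatives.

    All names and default values here are hypothetical.
    """

    def __init__(self, window=1000, min_samples=100, max_negative_ratio=0.95):
        # Each entry is True when the RSP reported a negative reputation.
        self.verdicts = deque(maxlen=window)
        self.min_samples = min_samples
        self.max_negative_ratio = max_negative_ratio

    def record(self, is_negative):
        self.verdicts.append(bool(is_negative))

    def negative_ratio(self):
        if not self.verdicts:
            return 0.0
        return sum(self.verdicts) / len(self.verdicts)

    def looks_anomalous(self):
        # Too few samples say nothing about the RSP's overall behavior.
        if len(self.verdicts) < self.min_samples:
            return False
        return self.negative_ratio() >= self.max_negative_ratio


def effective_verdict(monitor, rsp_says_negative):
    """Return the verdict to act on, or a neutral one if the RSP is distrusted."""
    monitor.record(rsp_says_negative)
    if monitor.looks_anomalous():
        # Remedial action: ignore the RSP and rely on local filtering.
        return "defer-to-local-analysis"
    return "negative" if rsp_says_negative else "acceptable"
```

   A caller would pass each RSP reply through effective_verdict() and
   reject outright only on "negative". For DNS-based lists, [DNSBL]
   defines test entries (127.0.0.2 is always listed and 127.0.0.1 never
   is), so periodically probing those two addresses is a complementary
   check that does not depend on live traffic.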

   It is suggested that, when engaging an RSP, an operator should try to
   learn the following things about the RSP in order to understand the
   exposure potential:

   o  the RSP's basis for listing or not listing particular subjects;

   o  if an RSP is paid by its listees, the rate and criteria for
      rejection from being listed;

   o  how the RSP collects data about subjects;

   o  how many data points are input to the reported reputation;

   o  whether reputation is based on a reliable identifier;

   o  how the RSP establishes reliability and authenticity of those
      data;

   o  how continuing data validity is maintained (e.g., on-going
      monitoring of the reported data and sources);

   o  how actively data validity is tracked (e.g., how changes are
      detected);

   o  how disputed reputations are handled;

   o  how often input data expire;

   o  whether older information is more or less influential than newer;

   o  whether the reported reputation is a scalar, a Boolean value, a
      collection of values, or something else;

   o  when transitioning among RSPs, the differences between them among
      these above points; that is, whether a particular score from one
      means the same thing from another.

   An operator using an RSP would be wise to ensure it has the
   capability to give preference to local policies, for cases where the
   client expects to disagree with the reported reputation.

   An operator should be able to limit the impact of a negative
   reputation on content acceptance. For example, rather than rejecting
   content outright when a negative reputation is returned, the operator
   can subject it to additional (i.e., more thorough) local analysis
   before permitting the traffic to pass. In other words, the
   reputation may simply allow certain layers of a multi-layered
   filtering system to be bypassed when that reputation is favorable.

   A sensible default should apply when the RSP is not available.
   This can also be a query to a different RSP known to be less robust
   than the primary one.

   Recent proposals such as the experimental system implemented in
   [OPENDKIM] have focused on tailoring operation to prefer or emphasize
   content whose sources have positive reputations. See Section 5 for
   discussion of this notion. As discussed in Section 2, negative
   reputations are easy to shed, while the universe of things that will
   earn and maintain positive reputations is relatively small. A
   filtering system designed around these notions is expected to be
   more lightweight to operate and harder to game.

   One choice is to query and cross-reference multiple RSPs. This can
   help to detect which ones under comparison are reliable, and offsets
   the effect of anomalous replies. More generally, a robust mechanism
   that is using a third-party service needs to contain an array of
   mechanisms, to limit its dependence on any one mechanism, and to
   protect against misbehavior by an individual mechanism.

4. Providing Reputation Services

   Operators intending to provide a reputation service need to consider
   that there are many flavors of clients. There will be clients that
   are prepared to make use of a reputation service blindly, while
   others will be interested in understanding more fully the nature of
   the service being provided. These can be likened to a consumer
   credit check that only seeks a yes-or-no reply versus wanting to
   review a detailed credit report. An operator of an RSP should be
   prepared to answer as many of the questions identified in Section 3
   as possible, not only because wise clients will ask, but also because
   they reflect issues that have arisen over the years, and diligent
   exploration of the points they raise will result in a better
   reputation service.

   Obviously, in computing reputations via traffic analysis, some
   private algorithms may come into play. For some RSPs, such "secret
   sauce" comprises their competitive advantage over others in the same
   space. This document is not suggesting that all private algorithms
   need to be exposed for a reputation service to be acceptable.
   Instead, it is anticipated that enough of the above details need to
   be available to assure consumers (and in some cases, industry or the
   general public) that the RSP can be trusted to influence key local
   policy decisions.

   Reputations should be based on accurate identifiers, i.e., some
   property of the content under analysis that is difficult to falsify.
   For example, in the realm of email, the address found in the From:
   header field of a message is typically not verifiable, while the
   domain name found in a validated domain-level signature is. In this
   case, constructing a reputation system based on the domain name is
   more useful than one based on the From: field.

   The biggest frustration with most RSPs to date has been the challenge
   of dealing with errors: there often is no visible, accessible, and
   transparent process for remediating the errant addition of an
   identifier to a negative reputation list. An RSP in widespread use
   is perceived to have enormous power when its results are used to
   reject traffic outright; when a "bad" entry is added referencing a
   good actor, it can have destructive effects, so an effective
   mechanism to fix such problems needs to exist.

   Clients with varying sensitivities need to be accommodated.
   The mechanism that is used to access the RSP should provide an
   ability to request that query results include details about the basis
   for producing those results. This will help the user to decide how
   to apply those results.
   For example, it should be possible for the reply to contain:

   o  the result itself;

   o  the number of data points used to compute the result;

   o  the age range of the data;

   o  source diversity of the input data;

   o  currency of the result (i.e., when it was computed);

   o  basis of the result (i.e., which identifier was used).

   The systems and algorithms used by the RSP to compute the reported
   reputation will need to be hardened as much as practicable against
   gaming or other forms of data poisoning. Larger source diversities
   are harder to overcome with poisoned input, but are expensive to
   build in terms of both infrastructure and time.

   Systems focused on assigning positive reputations rather than
   negative ones are promising since positive reputations, if made
   difficult to earn, put a large cost on bad actors, which may be
   enough to dissuade them entirely.

5. Evolution

   Recent consideration of reputation efforts is evolving toward the
   identification of good actors rather than bad actors, and giving them
   preferential treatment. This drastically reduces the problem space:
   There are vastly more IP addresses and email addresses used by bad
   actors to generate problematic traffic than are used by good actors
   to generate desirable traffic.

   Moreover, good actors tend to be represented by stable names and
   addresses, allowing users to rely on these to identify and give
   preferential treatment to their traffic. Good actors have no need to
   hop around to different addresses, and already work to keep their
   traffic clean. In addition, good actors are willing and able to
   collaborate in the assessment process, such as by supplying validated
   identifiers that are associated with their traffic.

   This new approach of focusing on identification of good actors has
   only been tried to date using manually edited whitelists, but has
   shown promising results on that scale.

6. Security Considerations

   Several points are raised above that can be described as threats to
   the delivery of valid user data. This document highlights and
   discusses those matters, but introduces no new security issues.

7. IANA Considerations

   This memo contains no actions for IANA.

   [RFC Editor: Please remove this section prior to publication.]

8. Informative References

   [DNS]       Mockapetris, P., "Domain Names -- Concepts and
               Facilities", RFC 1034, November 1987.

   [DNSBL]     Levine, J., "DNS Blacklists and Whitelists", RFC 5782,
               February 2010.

   [OPENDKIM]  "OpenDKIM (Open Source DKIM)", July 2013.

   [REPUTE]    Borenstein, N. and M. Kucherawy, "An Architecture for
               Reputation Reporting", RFC 7070, November 2013.

   [SMTP]      Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
               October 2008.

Appendix A. Acknowledgments

   The author wishes to acknowledge the following for their review and
   constructive criticism of this proposal: Chris Barton, Dave Crocker,
   Vincent Schonau

Author's Address

   Murray S. Kucherawy

   EMail: superuser@gmail.com