idnits 2.17.1 

draft-ietf-repute-considerations-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (November 20, 2013) is 3808 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

     No issues found here.

     Summary: 0 errors (**), 0 flaws (~~), 1 warning (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	REPUTE                                                      M. Kucherawy
3	Internet-Draft                                         November 20, 2013
4	Intended status: Informational
5	Expires: May 24, 2014

7	        Considerations Regarding Third-Party Reputation Services
8	                  draft-ietf-repute-considerations-03

10	Abstract

12	   Reputation services offer quality assessments about likely future
13	   behavior, based on past behaviors.  The use of these services has
14	   become a common tool in many applications that seek to apply
15	   collected intelligence about traffic sources.  Often this is done
16	   because it is common or even expected operator practice.  It is
17	   therefore important to be aware of a number of considerations for
18	   both operators and consumers of the data.  This document includes a
19	   collection of the best advice available regarding providers and
20	   consumers of reputation data, based on experience to date.  Much of
21	   this is based on experience with email reputation systems, but the
22	   concepts are generally applicable.

24	Status of This Memo

26	   This Internet-Draft is submitted in full conformance with the
27	   provisions of BCP 78 and BCP 79.

29	   Internet-Drafts are working documents of the Internet Engineering
30	   Task Force (IETF).  Note that other groups may also distribute
31	   working documents as Internet-Drafts.  The list of current Internet-
32	   Drafts is at http://datatracker.ietf.org/drafts/current/.

34	   Internet-Drafts are draft documents valid for a maximum of six months
35	   and may be updated, replaced, or obsoleted by other documents at any
36	   time.  It is inappropriate to use Internet-Drafts as reference
37	   material or to cite them other than as "work in progress."

39	   This Internet-Draft will expire on May 24, 2014.

41	Copyright Notice

43	   Copyright (c) 2013 IETF Trust and the persons identified as the
44	   document authors.  All rights reserved.

46	   This document is subject to BCP 78 and the IETF Trust's Legal
47	   Provisions Relating to IETF Documents
48	   (http://trustee.ietf.org/license-info) in effect on the date of
49	   publication of this document.  Please review these documents
50	   carefully, as they describe your rights and restrictions with respect
51	   to this document.  Code Components extracted from this document must
52	   include Simplified BSD License text as described in Section 4.e of
53	   the Trust Legal Provisions and are provided without warranty as
54	   described in the Simplified BSD License.

56	Table of Contents

58	   1.  Introduction
59	   2.  Background
60	   3.  Using Reputation Services
61	   4.  Providing Reputation Services
62	   5.  Evolution
63	   6.  Security Considerations
64	   7.  IANA Considerations
65	   8.  Informative References
66	   Appendix A.  Acknowledgments

68	1.  Introduction

70	   Reputation services involve collecting feedback from the community
71	   about sources of Internet traffic and aggregating that feedback into
72	   a rating of some kind.  Common examples include feedback about
73	   traffic associated with specific email addresses, URIs or parts of
74	   URIs, IP addresses, etc.  The specific collection, analysis, and
75	   rating methods vary from one service to the next and one problem
76	   domain to the next, but several operational concepts appear to be
77	   common to all of these.

79	   The promise of the protection that relying on reputation services
80	   offers can be enticing, and many users and operators alike typically
81	   engage those services merely because it is expected of them.  A
82	   critical notion, however, is that use of such a service explicitly
83	   involves a third party in the flow of data being received.  This is
84	   often taken for granted, with potentially disastrous results.

86	   This document highlights this and other considerations in providing
87	   and consuming reputation data services.

89	2.  Background

91	   The anti-abuse community has historically focused on identifying
92	   sources that misbehave, i.e., that earn negative reputations.  For
93	   email, this means identifying sources of spam; for security, it means
94	   identifying sources of penetration attacks.  The purpose here is to
95	   identify and filter traffic from bad actors.  This grew out of
96	   operational need.  As the Internet grew, so did the occurrence of
97	   problematic traffic, especially in email.  The pragmatics of email
98	   (i.e., the fact that the total IP address space is more constrained
99	   than the total email address space) led operators to use IP
100	   addresses as the basis of reputation, as did the fact that IP
101	   addresses have a degree of validation (via the TCP/IP infrastructure)
102	   where email addresses have had none.

104	   The major considerations around a third-party reputation service are:

106	   Raw data:  The method of obtaining the information that will be
107	      analyzed;

109	   Rating method:  The techniques used on the collected data to compute
110	      a rating or other expression of expected behavior;

112	   Publication:  How consumers obtain the computed ratings.

114	   A specific example of a publication method in common use in the email
115	   space is the DNS blacklist [DNSBL].  In particular, the operator of a
116	   reputation service computes reputations of IP addresses and stores
117	   them in a database.  Via a DNSBL query, a consumer can ask the
118	   database whether mail should be accepted from a particular source
119	   of incoming [SMTP] traffic, based on previous observations and
120	   feedback.  The service uses the IP address of the source as the basis
121	   for a query to the database, accessed through the Domain Name System
122	   [DNS].  [DNSBL] includes several points in its Security
123	   Considerations section that are repeated and further developed here.
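
   As an illustration of the query described above, the following
   Python sketch builds the DNS name a consumer would look up, using
   the convention described in [DNSBL]: the octets of an IPv4 address
   (or the hexadecimal nibbles of an IPv6 address) are reversed and
   prepended to the list's zone name.  The function name and the zone
   names are hypothetical, and the DNS lookup itself is deliberately
   left out.

```python
import ipaddress


def dnsbl_query_name(address: str, zone: str) -> str:
    """Build the DNS name to look up for `address` in the DNSBL `zone`."""
    ip = ipaddress.ip_address(address)
    if ip.version == 4:
        # IPv4: reverse the four octets.
        labels = str(ip).split(".")[::-1]
    else:
        # IPv6: expand to 32 hex nibbles and reverse them, one per label.
        labels = list(ip.exploded.replace(":", ""))[::-1]
    # The reversed labels are prepended to the list's zone name.
    return ".".join(labels + [zone])


if __name__ == "__main__":
    print(dnsbl_query_name("192.0.2.99", "bad.example.net"))
```

   A consumer would then issue an ordinary address query for the
   resulting name; whether an answer means "listed" and what any
   returned value signifies depend on the particular list.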

125	   However, regardless of the identifier used for a reputation, bad
126	   actors can evade detection or its consequences by changing
127	   identifiers (e.g., move to a new IP address, register a new domain
128	   name, use a sub-domain).  This makes the problem space effectively
129	   boundless, especially as IPv6 rolls out, with its vastly larger
130	   address space.

132	   A framework for reputation services is introduced in [REPUTE] and the
133	   documents it references.

135	3.  Using Reputation Services

137	   Operators that choose to make use of reputation services to
138	   influence content allowed to pass into or through their
139	   infrastructures need to understand that they are granting a third
140	   party (the reputation service provider, or RSP) the ability to affect
141	   the handling of incoming traffic, for better or worse.  Of course,
142	   this is the whole point of engaging an RSP when everything is working
143	   properly, but a number of issues are worthy of consideration before
144	   establishing such a relationship.

146	   Some cases have occurred where an RSP made the unilateral decision to
147	   terminate its service.  To encourage its clients to stop issuing
148	   queries, it began reporting a maximally negative reputation about all
149	   subjects, causing rejection of all incoming traffic during the
150	   incident period.  Although one would hope such incidents to be rare,
151	   automated means to detect such unfortunate returns (malicious or
152	   otherwise) and take remedial action should be considered.
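
   One possible automated check, sketched below in Python, is to
   query the RSP periodically about a small set of subjects that the
   operator independently knows to be good, and to treat the RSP as
   unhealthy when too many of them come back listed.  The names
   "is_listed" and "canaries" and the threshold are assumptions made
   for this example; "is_listed" stands in for whatever query
   mechanism is actually in use.  For an IPv4 DNSBL, 127.0.0.1 is one
   natural canary, since [DNSBL] says such lists must not contain an
   entry for it.

```python
from typing import Callable, Iterable


def rsp_looks_unhealthy(
    is_listed: Callable[[str], bool],
    canaries: Iterable[str],
    max_listed_fraction: float = 0.5,
) -> bool:
    """Return True if too many known-good canaries are reported as listed."""
    subjects = list(canaries)
    if not subjects:
        return False  # nothing to judge by
    listed = sum(1 for subject in subjects if is_listed(subject))
    return listed / len(subjects) > max_listed_fraction


if __name__ == "__main__":
    # Simulate an RSP that has started listing everything.
    print(rsp_looks_unhealthy(lambda subject: True, ["127.0.0.1"]))
```

   When the check fails, a reasonable remedial action is to stop
   acting on that RSP's replies until an operator has investigated.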

154	   RSPs will become targets of attack once it is understood that a
155	   successful attack allows malicious content to evade detection
156	   and filtering.  Users of RSPs need to plan for possible interruptions
157	   in service availability or quality.

159	   Similarly, some actors will try to "game" the service, which is to
160	   say that such actors will attempt to determine patterns of behavior
161	   that result in the reporting of favorable reputations, and in doing
162	   so, acquire artificially inflated reputations.  One could reasonably
163	   assume that a reputation service is inherently fragile.  For
164	   operational clients, this should prompt balanced and comparative,
165	   rather than unilateral, use of the service.

167	   It is suggested that, when engaging an RSP, an operator should try to
168	   learn the following things about the RSP in order to understand the
169	   exposure potential:

171	   o  the RSP's basis for listing or not listing particular subjects;

173	   o  if an RSP is paid by its listees, the rate and criteria for
174	      rejection from being listed;

176	   o  how the RSP collects data about subjects;

178	   o  how many data points are input to the reported reputation;

180	   o  whether reputation is based on a reliable identifier;

182	   o  how the RSP establishes reliability and authenticity of those
183	      data;

185	   o  how continuing data validity is maintained (e.g., on-going
186	      monitoring of the reported data and sources);

188	   o  how actively data validity is tracked (e.g., how changes are
189	      detected);

191	   o  how disputed reputations are handled;

193	   o  how often input data expire;

195	   o  whether older information is more or less influential than newer;

197	   o  whether the reported reputation is a scalar, a Boolean value, a
198	      collection of values, or something else;

200	   o  when transitioning among RSPs, how they differ on the points
201	      above; that is, whether a particular score from one means the
202	      same thing from another.

204	   An operator using an RSP would be wise to ensure it has the
205	   capability to give preference to local policies, for cases where the
206	   client expects to disagree with the reported reputation.

208	   An operator should be able to limit a negative reputation's impact
209	   on content acceptance.  For example, rather than rejecting content
210	   outright when a negative reputation is returned, simply subject it to
211	   additional (i.e., more thorough) local analysis before permitting the
212	   traffic to pass.  In other words, the reputation may simply allow
213	   certain layers of a multi-layered filtering system to be bypassed
214	   when that reputation is favorable.
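
   The two preceding paragraphs can be sketched together as a small
   decision function.  This is only an illustration: the 0-to-1
   rating scale, the thresholds, and all names are assumptions, not
   part of any defined interface.  Local policy is consulted first,
   and an unfavorable rating leads to deeper analysis rather than to
   rejection.

```python
from enum import Enum
from typing import Mapping, Optional


class Handling(Enum):
    FAST_PATH = "fast-path"    # bypass the more expensive filtering layers
    STANDARD = "standard"      # apply the normal filtering layers
    DEEP_ANALYSIS = "deep"     # apply additional, more thorough analysis


def choose_handling(
    subject: str,
    reputation: Optional[float],
    local_policy: Mapping[str, Handling],
    favorable: float = 0.8,
    unfavorable: float = 0.2,
) -> Handling:
    """Pick a handling level; the RSP's rating never rejects on its own."""
    # Local policy takes precedence over whatever the RSP reports.
    if subject in local_policy:
        return local_policy[subject]
    # No rating available: use the normal layers.
    if reputation is None:
        return Handling.STANDARD
    if reputation >= favorable:
        return Handling.FAST_PATH
    if reputation <= unfavorable:
        return Handling.DEEP_ANALYSIS
    return Handling.STANDARD


if __name__ == "__main__":
    print(choose_handling("example.com", 0.05, {}))
```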

216	   A sensible default should apply when the RSP is not available.  This
217	   can also be a query to a different RSP known to be less robust than
218	   the primary one.
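
   A minimal Python sketch of such a fallback chain follows.  Each
   provider is modeled as a hypothetical callable that returns a
   rating or raises an error when the RSP cannot be reached; the
   names and the choice of default are assumptions made for this
   example.

```python
from typing import Callable, Optional, Sequence


def query_with_fallback(
    subject: str,
    providers: Sequence[Callable[[str], float]],
    default: Optional[float] = None,
) -> Optional[float]:
    """Try each provider in order; return `default` if none can answer."""
    for query in providers:
        try:
            return query(subject)
        except OSError:
            # Covers timeouts and connection failures (TimeoutError and
            # ConnectionError are both subclasses of OSError).
            continue
    return default


if __name__ == "__main__":
    def primary(subject: str) -> float:
        raise TimeoutError("primary RSP did not answer")

    def secondary(subject: str) -> float:
        return 0.7

    print(query_with_fallback("192.0.2.1", [primary, secondary]))
```

   Returning a neutral default, rather than a maximally negative or
   positive one, keeps an outage at the RSP from turning into
   wholesale rejection or wholesale acceptance.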

220	   Recent proposals such as the experimental system implemented in
221	   [OPENDKIM] have focused on tailoring operation to prefer or emphasize
222	   content whose sources have positive reputations.  See Section 5 for
223	   discussion of this notion.  As noted in Section 2, negative
224	   reputations are easy to shed, while the universe of things that will
225	   earn and maintain positive reputations is relatively small.
226	   Designing a filtering system that observes these notions is expected
227	   to be more lightweight to operate and harder to game.

229	   One choice is to query and cross-reference multiple RSPs.  This can
230	   help to detect which ones under comparison are reliable, and offsets
231	   the effect of anomalous replies.  More generally, a robust mechanism
232	   that is using a third-party service needs to contain an array of
233	   mechanisms, and to limit its dependence on any one mechanism, as well
234	   as protect against misbehavior by an individual mechanism.
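
   Cross-referencing can be sketched as follows, assuming the ratings
   from the different RSPs have already been normalized to a common
   scale (which, as noted in the list above, cannot be taken for
   granted).  The median limits the effect of a single anomalous
   reply, and RSPs that disagree strongly with it are flagged for
   review.  The names and the gap threshold are illustrative only.

```python
from statistics import median
from typing import Dict, List, Tuple


def combine_ratings(
    ratings: Dict[str, float],
    outlier_gap: float = 0.4,
) -> Tuple[float, List[str]]:
    """Return (consensus rating, names of RSPs far from the consensus)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    consensus = median(ratings.values())
    outliers = sorted(
        name for name, rating in ratings.items()
        if abs(rating - consensus) > outlier_gap
    )
    return consensus, outliers


if __name__ == "__main__":
    print(combine_ratings({"rsp-a": 0.9, "rsp-b": 0.85, "rsp-c": 0.0}))
```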

236	4.  Providing Reputation Services

238	   Operators intending to provide a reputation service need to consider
239	   that there are many flavors of clients.  There will be clients that
240	   are prepared to make use of a reputation service blindly, while
241	   others will be interested in understanding more fully the nature of
242	   the service being provided.  These can be likened to a consumer
243	   credit check that only seeks a yes-or-no reply versus wanting to
244	   review a detailed credit report.  An operator of an RSP should be
245	   prepared to answer as many of the questions identified in Section 3
246	   as possible, not only because wise clients will ask, but also because
247	   they reflect issues that have arisen over the years, and diligent
248	   exploration of the points they raise will result in a better
249	   reputation service.

251	   Obviously, in computing reputations via traffic analysis, some
252	   private algorithms may come into play.  For some RSPs, such "secret
253	   sauce" comprises their competitive advantage over others in the same
254	   space.  This document is not suggesting that all private algorithms
255	   need to be exposed for a reputation service to be acceptable.
256	   Instead, it is anticipated that enough of the above details need to
257	   be available to assure consumers (and in some cases, industry or the
258	   general public) that the RSP can be trusted to influence key local
259	   policy decisions.

261	   Reputations should be based on accurate identifiers, i.e., some
262	   property of the content under analysis that is difficult to falsify.
263	   For example, in the realm of email, the address found in the From:
264	   header field of a message is typically not verifiable, while the
265	   domain name found in a validated domain-level signature is.  In this
266	   case, constructing a reputation system based on the domain name is
267	   more useful than one based on the From: field.

269	   The biggest frustration with most RSPs to date has been the challenge
270	   of dealing with errors: there often is no visible, accessible, and
271	   transparent process for remediating the errant addition of an
272	   identifier to a negative reputation list.  An RSP in widespread use
273	   is perceived to have enormous power when its results are used to
274	   reject traffic outright; when a "bad" entry is added referencing a
275	   good actor, it can have destructive effects, so an effective
276	   mechanism to fix such problems needs to exist.

278	   Clients with varying sensitivities need to be accommodated.
279	   The mechanism that is used to access the RSP should provide an
280	   ability to request that query results include details about the basis
281	   for producing those results.  This will help the user to decide how
282	   to apply those results.  For example, it should be possible for the
283	   reply to contain:

285	   o  the result itself;

287	   o  the number of data points used to compute the result;

289	   o  the age range of the data;

291	   o  source diversity of the input data;

293	   o  currency of the result (i.e., when it was computed);

295	   o  basis of the result (i.e., which identifier was used).
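
   The reply contents listed above could be modeled as a simple
   record, as in the Python sketch below.  The field names, the
   identifier labels, and the thresholds in "is_usable" are
   hypothetical; they only show how a client might use the supporting
   details to decide how much weight to give a result.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ReputationReply:
    rating: float             # the result itself
    sample_count: int         # number of data points used
    oldest_sample: datetime   # age range of the data (start)
    newest_sample: datetime   # age range of the data (end)
    source_count: int         # source diversity of the input data
    computed_at: datetime     # currency of the result
    identifier_type: str      # basis of the result, e.g. "dkim-domain"


def is_usable(
    reply: ReputationReply,
    now: datetime,
    min_samples: int = 100,
    min_sources: int = 5,
    max_age: timedelta = timedelta(days=7),
) -> bool:
    """Decide whether a reply has enough support to influence filtering."""
    return (
        reply.sample_count >= min_samples
        and reply.source_count >= min_sources
        and now - reply.computed_at <= max_age
    )


if __name__ == "__main__":
    now = datetime(2013, 11, 20, tzinfo=timezone.utc)
    reply = ReputationReply(0.9, 500, now - timedelta(days=30),
                            now - timedelta(hours=1), 12,
                            now - timedelta(hours=1), "dkim-domain")
    print(is_usable(reply, now))
```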

297	   The systems and algorithms used by the RSP to compute the reported
298	   reputation will need to be hardened as much as practicable against
299	   gaming or other forms of data poisoning.  Larger source diversities
300	   are harder to overcome with poisoned input, but are expensive to
301	   build in terms of both infrastructure and time.
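
   One simple hardening technique, shown here only as a sketch and
   not as a description of what any particular RSP does, is to give
   each reporting source a single vote regardless of how many reports
   it submits, and then to take the median across sources.  A single
   source flooding the system then moves the result far less than it
   would under a plain average.  All names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Iterable, Tuple


def robust_rating(reports: Iterable[Tuple[str, float]]) -> float:
    """Median of per-source mean scores; each source counts once."""
    by_source = defaultdict(list)
    for source, score in reports:
        by_source[source].append(score)
    if not by_source:
        raise ValueError("at least one report is required")
    return median(mean(scores) for scores in by_source.values())


if __name__ == "__main__":
    honest = [("a", 0.9), ("b", 0.8), ("c", 0.85)]
    flood = [("attacker", 0.0)] * 1000
    print(robust_rating(honest + flood))
```

   This does not help against an attacker who controls many distinct
   sources, which is why source diversity and the reliability of
   source identifiers both matter.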

303	   Systems focused on assigning positive reputations rather than
304	   negative ones are promising since positive reputations, if made
305	   difficult to earn, put a large cost on bad actors, which may be
306	   enough to dissuade them entirely.

308	5.  Evolution

310	   Recent consideration of reputation efforts is evolving toward the
311	   identification of good actors rather than bad actors, and giving them
312	   preferential treatment.  This drastically reduces the problem space:
313	   There are vastly more IP addresses and email addresses used by bad
314	   actors to generate problematic traffic than are used by good actors
315	   to generate desirable traffic.

317	   Moreover, good actors tend to be represented by stable names and
318	   addresses, allowing users to rely on these to identify and give
319	   preferential treatment to their traffic.  Good actors have no need to
320	   hop around to different addresses, and already work to keep their
321	   traffic clean.  In addition, good actors are willing and able to
322	   collaborate in the assessment process, such as by supplying validated
323	   identifiers that are associated with their traffic.

325	   This new approach of focusing on identification of good actors has
326	   only been tried to date using manually edited whitelists, but has
327	   shown promising results on that scale.

329	6.  Security Considerations

331	   Several points are raised above that can be described as threats to
332	   the delivery of valid user data.  This document highlights and
333	   discusses those matters, but introduces no new security issues.

335	7.  IANA Considerations

337	   This memo contains no actions for IANA.

339	   [RFC Editor: Please remove this section prior to publication.]

341	8.  Informative References

343	   [DNS]       Mockapetris, P., "Domain Names -- Concepts and
344	               Facilities", RFC 1034, November 1987.

346	   [DNSBL]     Levine, J., "DNS Blacklists and Whitelists", RFC 5782,
347	               February 2010.

349	   [OPENDKIM]  "OpenDKIM (Open Source DKIM)", July 2013,
350	               <http://www.opendkim.org>.

352	   [REPUTE]    Borenstein, N. and M. Kucherawy, "An Architecture for
353	               Reputation Reporting", RFC 7070, November 2013.

355	   [SMTP]      Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
356	               October 2008.

358	Appendix A.  Acknowledgments

360	   The author wishes to acknowledge the following for their review and
361	   constructive criticism of this proposal: Chris Barton, Dave Crocker,
362	   Vincent Schonau.

364	Author's Address

366	   Murray S. Kucherawy

368	   EMail: superuser@gmail.com