2	REPUTE                                                      M. Kucherawy
3	Internet-Draft                                              May 20, 2013
4	Intended status: Informational
5	Expires: November 21, 2013

7	        Operational Considerations Regarding Reputation Services
8	                  draft-ietf-repute-considerations-02

10	Abstract

12	   Reputation systems have become a common tool in many
13	   applications that seek to apply collected intelligence about traffic
14	   sources.  Often this is done because it is common or even expected
15	   operator practice.  It is therefore important to be aware of a number
16	   of considerations for both operators and consumers of the data.  This
17	   document includes a collection of the best advice available regarding
18	   providers and consumers of reputation data, based on experience to
19	   date.  Much of this is based on experience with email reputation
20	   systems, but the concepts are generally applicable.

22	Status of This Memo

24	   This Internet-Draft is submitted in full conformance with the
25	   provisions of BCP 78 and BCP 79.

27	   Internet-Drafts are working documents of the Internet Engineering
28	   Task Force (IETF).  Note that other groups may also distribute
29	   working documents as Internet-Drafts.  The list of current Internet-
30	   Drafts is at http://datatracker.ietf.org/drafts/current/.

32	   Internet-Drafts are draft documents valid for a maximum of six months
33	   and may be updated, replaced, or obsoleted by other documents at any
34	   time.  It is inappropriate to use Internet-Drafts as reference
35	   material or to cite them other than as "work in progress."

37	   This Internet-Draft will expire on November 21, 2013.

39	Copyright Notice

41	   Copyright (c) 2013 IETF Trust and the persons identified as the
42	   document authors.  All rights reserved.

44	   This document is subject to BCP 78 and the IETF Trust's Legal
45	   Provisions Relating to IETF Documents
46	   (http://trustee.ietf.org/license-info) in effect on the date of
47	   publication of this document.  Please review these documents
48	   carefully, as they describe your rights and restrictions with respect
49	   to this document.  Code Components extracted from this document must
50	   include Simplified BSD License text as described in Section 4.e of
51	   the Trust Legal Provisions and are provided without warranty as
52	   described in the Simplified BSD License.

54	Table of Contents

56	   1.  Introduction
57	   2.  Background
58	   3.  Evolution
59	   4.  Reputation Clients
60	   5.  Reputation Service Providers
61	   6.  Security Considerations
62	   7.  IANA Considerations
63	   8.  Informative References
64	   Appendix A.  Acknowledgements

66	1.  Introduction

68	   Reputation services involve collecting feedback from the community
69	   about sources of Internet traffic and aggregating that feedback into
70	   a rating of some kind.  Common examples include feedback about
71	   traffic associated with specific email addresses, URIs or parts of
72	   URIs, IP addresses, etc.  The specific collection, analysis, and
73	   rating methods vary from one service to the next and one problem
74	   domain to the next, but several operational concepts appear to be
75	   common to all of these.

77	   The promise of the protection that reputation services offer can be
78	   enticing, and many users and operators alike typically engage those
79	   services merely because it is expected of them.  A critical notion,
80	   however, is that doing so explicitly involves a third party in the
81	   flow of data those parties receive.  This is often taken for granted,
82	   with potentially disastrous results.

84	   This document highlights this and other considerations in providing
85	   and consuming reputation data services.

87	2.  Background

89	   The community has historically focused on identifying sources that
90	   misbehave, i.e., that earn negative reputations.  The purpose here is
91	   to identify and filter traffic from bad actors.  This grew out of
92	   operational need.  As the Internet grew, so did the occurrence of
93	   problematic traffic, especially in email.  The pragmatics of email
94	   (i.e., the fact that the total IP address space is more constrained
95	   than the total email address space) led to the use of IP addresses
96	   as the basis of reputation, as did the fact that IP
97	   addresses have a degree of validation (via the TCP/IP infrastructure)
98	   where email addresses have had none.

100	   A specific example of a reputation service in common use in the email
101	   space is the DNS blacklist [DNSBL].  This is a method of querying a
102	   database as to whether a source of incoming [SMTP] email traffic
103	   should be allowed to relay email, based on previous observations and
104	   feedback.  The method uses the IP address of the source as the basis
105	   for a query to the database using the Domain Name System [DNS] as the
106	   interface.  [DNSBL] includes several points in its Security
107	   Considerations section that are repeated and further developed here.
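
   As an illustration of the query method just described, the following
   is a minimal sketch of how a client might turn a source IP address
   into the DNS name it looks up, following the reversed-address
   convention described in [DNSBL].  The zone name "dnsbl.example.net"
   and the function name are hypothetical, and the DNS lookup itself is
   left out.

```python
import ipaddress


def dnsbl_query_name(address: str, zone: str) -> str:
    """Build the DNS name to look up for `address` in the list `zone`."""
    ip = ipaddress.ip_address(address)
    if ip.version == 4:
        # IPv4: reverse the four octets, e.g. 192.0.2.99 -> 99.2.0.192
        labels = str(ip).split(".")[::-1]
    else:
        # IPv6: reverse the 32 hexadecimal nibbles of the full address
        labels = list(ip.exploded.replace(":", ""))[::-1]
    return ".".join(labels + [zone.strip(".")])


# "dnsbl.example.net" is a placeholder zone, not a real service.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.net"))
# prints 99.2.0.192.dnsbl.example.net
```

   The client would then resolve that name; whether and how a listing is
   reported depends on the particular service.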

109	   However, regardless of the identifier used as the basis for a
110	   reputation, bad actors can evade detection or its consequences by
111	   changing identifiers (e.g., moving to a new IP address, registering
112	   a new domain name, using a sub-domain).  This makes the problem space
113	   effectively boundless, especially as IPv6 rolls out, with its vastly
114	   larger address space.

116	3.  Evolution

118	   More modern thinking is evolving toward the identification of good
119	   actors rather than bad actors, and giving them preferential
120	   treatment.  This drastically reduces the problem space: There are
121	   vastly more IP addresses and email addresses used by bad actors to
122	   generate problematic traffic than are used by good actors to generate
123	   desirable traffic.

125	   Moreover, good actors tend to be represented by stable names and
126	   addresses, allowing users to rely on these to identify and give
127	   preferential treatment to their traffic.  Good actors have no need to
128	   hop around to different addresses, and already work to keep their
129	   traffic clean.  In addition, good actors are willing and able to
130	   collaborate in the assessment process, such as by supplying validated
131	   identifiers that are associated with their traffic.

133	   This new approach of focusing on identification of good actors has
134	   only been tried to date using manually edited whitelists, but has
135	   shown promising results on that scale.

137	4.  Reputation Clients

139	   Operators that choose to make use of reputation services to influence
140	   content allowed to pass into or through their infrastructures need to
141	   understand that they are granting a third party (the reputation
142	   service provider, or RSP) the ability to affect incoming traffic, for
143	   better or worse.  Of course, this is the whole point of engaging an
144	   RSP when everything is working properly, but a number of issues are
145	   worthy of consideration before establishing such a relationship.

147	   Some cases have occurred where an RSP made the unilateral decision to
148	   terminate its service.  To encourage its clients to stop issuing
149	   queries, it began reporting a maximally negative reputation about all
150	   subjects, causing rejection of all incoming traffic during the
151	   incident period.  Although one would hope such incidents to be rare,
152	   automated means to detect such unfortunate returns (malicious or
153	   otherwise) and take remedial action should be considered.
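
   One possible automated check, sketched here under stated assumptions,
   is to probe the service periodically with "canary" subjects whose
   expected answers are known in advance.  The lookup signature is
   hypothetical.  The default canaries follow the test-entry convention
   that [DNSBL] describes for IPv4 lists (127.0.0.2 listed, 127.0.0.1
   not listed); other kinds of service would need their own canaries.

```python
from typing import Callable, Optional

# Hypothetical lookup signature: returns the listing value (for example
# "127.0.0.2") when the subject is listed, or None when it is not.
Lookup = Callable[[str], Optional[str]]


def rsp_looks_sane(lookup: Lookup,
                   must_not_be_listed: str = "127.0.0.1",
                   must_be_listed: str = "127.0.0.2") -> bool:
    """Return False if the service appears to list everything or nothing."""
    if lookup(must_not_be_listed) is not None:
        return False  # a "never listed" canary is listed: likely wildcarding
    if lookup(must_be_listed) is None:
        return False  # the test entry is missing: likely emptied or broken
    return True


# Stand-ins for a healthy service and one that now lists every subject.
def healthy(subject: str) -> Optional[str]:
    return "127.0.0.2" if subject == "127.0.0.2" else None


def lists_everything(subject: str) -> Optional[str]:
    return "127.0.0.2"


print(rsp_looks_sane(healthy))           # prints True
print(rsp_looks_sane(lists_everything))  # prints False
```

   A client that sees the check fail could stop consulting the service
   and fall back to its local default until an operator has
   investigated.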

155	   RSPs will be the subject of attacks once it is understood that success
156	   in such attacks will allow malicious content to evade detection and
157	   filtering.  Users of RSPs need to be aware of possible interruptions
158	   in service availability or quality.

160	   Similarly, some actors will try to "game" the service, which is to
161	   say that such actors will attempt to determine patterns of behavior
162	   that result in the reporting of favorable reputations, and in doing
163	   so, acquire artificially inflated reputations.  One could reasonably
164	   assume that a reputation service is inherently fragile.  For
165	   operational clients, this should prompt balanced and comparative,
166	   rather than unilateral, use of the service.

168	   It is suggested that, when engaging an RSP, an operator should try to
169	   learn the following things about the RSP in order to understand the
170	   exposure potential:

172	   o  the RSP's basis for listing or not listing particular subjects;

174	   o  if an RSP is paid by its listees, the rate and criteria for
175	      rejection from being listed;

177	   o  how the RSP collects data about subjects;

179	   o  how many data points are input to the reported reputation;

181	   o  whether reputation is based on a reliable identifier;

183	   o  how the RSP establishes reliability and authenticity of those
184	      data;

186	   o  how continuing data validity is maintained (e.g., on-going
187	      monitoring of the reported data and sources);

189	   o  how actively data validity is tracked (e.g., how changes are
190	      detected);

192	   o  how disputed reputations are handled;

194	   o  how often input data expire;

196	   o  whether older information is more or less influential than newer;

198	   o  whether the reported reputation is a scalar, a Boolean value, a
199	      collection of values, or something else;

201	   o  when transitioning among RSPs, the differences between them on
202	      the above points; that is, whether a particular score from one
203	      means the same thing from another.
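
   To see why the questions about data expiry and the influence of older
   information matter, consider the following sketch of one possible
   aging scheme.  It is purely illustrative and does not describe any
   particular RSP: the half-life, the expiry limit, and the score range
   are all assumptions.

```python
def decayed_score(observations, half_life_days=30.0, max_age_days=180.0):
    """Combine (age_days, value) observations into one score.

    Each value is assumed to lie in [-1.0, 1.0].  An observation loses
    half of its weight every `half_life_days` and is ignored entirely
    once it is older than `max_age_days`.  Returns None if no
    unexpired observations remain.
    """
    total = 0.0
    weight_sum = 0.0
    for age_days, value in observations:
        if age_days > max_age_days:
            continue  # expired input no longer contributes
        weight = 0.5 ** (age_days / half_life_days)
        total += weight * value
        weight_sum += weight
    return total / weight_sum if weight_sum else None


# A fresh positive report outweighs a month-old negative one.
print(decayed_score([(0, 1.0), (30, -1.0)]))
```

   Two services fed identical observations but using different
   half-lives or expiry limits would report different reputations for
   the same subject, which is why these parameters are worth asking
   about.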

205	   An operator using an RSP would be wise to ensure it has the
206	   capability to effect local overrides for cases where the client
207	   expects to disagree with the reported reputation.

209	   An operator should be able to limit the impact of a negative
210	   reputation on content acceptance.  For example, rather than rejecting
211	   content outright when a negative reputation is returned, an operator
212	   can subject it to additional (i.e., more thorough) local analysis
213	   before permitting the traffic to pass.  In other words, the
214	   reputation may simply allow certain layers of a multi-layered
215	   filtering system to be bypassed when that reputation is favorable.

217	   A sensible default should apply when the RSP is not available.  This
218	   can also be a query to a different RSP known to be less robust than
219	   the primary one.
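
   The three preceding recommendations (local overrides, limited impact
   of negative reputations, and a default when the RSP is unavailable)
   can be combined in a single client-side decision function.  The
   following is a minimal sketch only; the score range, the thresholds,
   and all names are assumptions made for the example.

```python
from enum import Enum
from typing import Mapping, Optional


class Action(Enum):
    ACCEPT = "accept"        # bypass the more expensive filtering layers
    STANDARD = "standard"    # apply the normal filtering layers
    DEEP_SCAN = "deep-scan"  # apply additional, more thorough analysis


def choose_action(subject: str,
                  reported: Optional[float],
                  overrides: Mapping[str, Action],
                  default: Action = Action.STANDARD) -> Action:
    """Pick a filtering action for `subject`.

    `reported` is the RSP's score, assumed to lie in [-1.0, 1.0], or
    None when the RSP could not be reached.
    """
    if subject in overrides:
        return overrides[subject]  # local policy wins over the RSP
    if reported is None:
        return default             # RSP unavailable: sensible default
    if reported >= 0.5:
        return Action.ACCEPT
    if reported < 0.0:
        return Action.DEEP_SCAN    # negative reputation: scan, not reject
    return Action.STANDARD


local = {"partner.example": Action.ACCEPT}
print(choose_action("partner.example", -1.0, local))  # prints Action.ACCEPT
print(choose_action("unknown.example", None, local))  # prints Action.STANDARD
```

   Note that no branch rejects content outright on the strength of the
   reported reputation alone; the worst outcome is more local analysis.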

221	   Recent proposals have focused on tailoring operation to prefer or
222	   emphasize content whose sources have positive reputations.  As stated
223	   above, negative reputations are easy to shed, while the universe of
224	   things that will earn and maintain positive reputations is relatively
225	   small.  Designing a filtering system that observes these notions is
226	   expected to be more lightweight to operate and harder to game.

228	   One choice is to query and cross-reference multiple RSPs.  This can
229	   help to detect which ones under comparison are reliable, and can
230	   offset the effect of anomalous replies.
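
   As a sketch of how such cross-referencing might work, the following
   takes the median of the scores returned by several RSPs and flags any
   RSP whose answer is far from that consensus.  It assumes the scores
   have already been normalized to a common range of [-1.0, 1.0], which
   in practice is itself a nontrivial step; the names and the deviation
   threshold are hypothetical.

```python
from statistics import median


def combine(scores, max_deviation=0.5):
    """Cross-reference scores from several RSPs.

    `scores` maps an RSP name to its score in [-1.0, 1.0], or to None
    when that RSP gave no answer.  Returns (consensus, outliers), where
    `outliers` names the RSPs whose answer differs from the consensus
    by more than `max_deviation`.
    """
    answered = {name: s for name, s in scores.items() if s is not None}
    if not answered:
        return None, []
    consensus = median(answered.values())
    outliers = sorted(name for name, s in answered.items()
                      if abs(s - consensus) > max_deviation)
    return consensus, outliers


print(combine({"rsp-a": 0.8, "rsp-b": 0.7, "rsp-c": -1.0, "rsp-d": None}))
# prints (0.7, ['rsp-c'])
```

   The median is used here because a single anomalous reply cannot move
   it far; an RSP that is repeatedly flagged as an outlier is a
   candidate for closer review.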

232	5.  Reputation Service Providers

234	   Operators intending to provide a reputation service need to consider
235	   that there are many flavors of clients.  There will be clients that
236	   are prepared to make use of a reputation service blindly, while
237	   others will be interested in understanding more fully the nature of
238	   the service being provided.  These can be likened to a consumer
239	   credit check that only seeks a yes-or-no reply versus wanting to
240	   review a detailed credit report.  An operator of an RSP should be
241	   prepared to answer as many of the questions identified in Section 4
242	   as possible, not only because wise clients will ask, but also because
243	   they reflect issues that have arisen over the years, and exploration
244	   of the points they raise will result in a more robust reputation
245	   service.

247	   Obviously, in computing reputations via traffic analysis, some
248	   private algorithms may come into play.  For some RSPs, such "secret
249	   sauce" comprises their competitive advantage over others in the same
250	   space.  This document is not suggesting that all private algorithms
251	   need to be exposed for a reputation service to be acceptable.
252	   Instead, it is anticipated that enough of the above details need to
253	   be available to assure consumers (and in some cases, industry or the
254	   general public) that the RSP can be trusted to influence key local
255	   policy decisions.

257	   Reputations should be based on accurate identifiers, i.e., some
258	   property of the content under analysis that is difficult to falsify.
259	   For example, in the realm of email, the address found in the From:
260	   header field of a message is typically not verifiable, while the
261	   domain name found in a validated domain-level signature is.  In this
262	   case, constructing a reputation system based on the domain name is
263	   more useful than one based on the From: field.

265	   The biggest frustration with most RSPs to date has been the absence
266	   of a visible, accessible, and transparent process for remediating the
267	   errant addition of an identifier to a negative reputation list.  An
268	   RSP in widespread use is perceived to have enormous power when its
269	   results are used to reject traffic outright; when a "bad" entry is
270	   added referencing a good actor, it can have destructive effects, so
271	   an effective mechanism to fix such problems needs to exist.

273	   To accommodate clients with varying sensitivities, it is advisable
274	   for the query mechanism used to access the RSP to provide the ability
275	   to request details in the returned result about how the result was
276	   reached, allowing the client to decide if the result should be
277	   applied.  For example, it should be possible for the reply to
278	   contain:

280	   o  the result itself;

282	   o  the number of data points used to compute the result;

284	   o  the age range of the data;

286	   o  source diversity of the input data;

288	   o  currency of the result (i.e., when it was computed);

290	   o  basis of the result (i.e., which identifier was used).
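
   The reply contents listed above could be modeled on the client side
   roughly as follows.  This is a sketch under stated assumptions, not a
   wire format: the field names, the types, and the acceptance
   thresholds are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ReputationReply:
    score: float            # the result itself
    sample_count: int       # number of data points used to compute it
    oldest_input: datetime  # age range of the data: oldest input ...
    newest_input: datetime  # ... and newest input
    source_count: int       # source diversity of the input data
    computed_at: datetime   # currency of the result
    identifier_type: str    # basis of the result, e.g. "dkim-domain"


def worth_applying(reply: ReputationReply, now: datetime,
                   min_samples: int = 100, min_sources: int = 5,
                   max_staleness: timedelta = timedelta(days=1)) -> bool:
    """Decide locally whether the reply is solid enough to act on."""
    return (reply.sample_count >= min_samples
            and reply.source_count >= min_sources
            and now - reply.computed_at <= max_staleness)


now = datetime(2013, 5, 20, tzinfo=timezone.utc)
reply = ReputationReply(score=0.9, sample_count=250,
                        oldest_input=now - timedelta(days=90),
                        newest_input=now - timedelta(hours=2),
                        source_count=12,
                        computed_at=now - timedelta(hours=1),
                        identifier_type="dkim-domain")
print(worth_applying(reply, now))  # prints True
```

   A more cautious client could raise the thresholds, and a less
   sensitive one could ignore everything except the score.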

292	   The systems and algorithms used by the RSP to compute the reported
293	   reputation will need to be hardened as much as practicable against
294	   gaming or other forms of data poisoning.  Larger source diversities
295	   are harder to overcome with poisoned input, but are expensive to
296	   build in terms of both infrastructure and time.

298	   Systems focused on assigning positive reputations rather than negative
299	   ones are promising since positive reputations, if made difficult to
300	   earn, put a large cost on bad actors, which may be enough to dissuade
301	   them entirely.

303	6.  Security Considerations

305	   Several points are raised above that can be described as threats to
306	   the delivery of valid user data.  This document highlights and
307	   discusses those matters, but introduces no new security issues.

309	7.  IANA Considerations

311	   This memo contains no actions for IANA.

313	   [RFC Editor: Please remove this section prior to publication.]

315	8.  Informative References

317	   [DNS]    Mockapetris, P., "Domain Names -- Concepts and Facilities",
318	            RFC 1034, November 1987.

320	   [DNSBL]  Levine, J., "DNS Blacklists and Whitelists", RFC 5782,
321	            February 2010.

323	   [SMTP]   Klensin, J., "Simple Mail Transfer Protocol", RFC 5321,
324	            October 2008.

326	Appendix A.  Acknowledgements

328	   The author wishes to acknowledge the following for their review and
329	   constructive criticism of this proposal: Chris Barton, Vincent
330	   Schonau.

332	Author's Address

334	   Murray S. Kucherawy

336	   EMail: superuser@gmail.com