idnits 2.17.1 draft-iab-privacy-considerations-06.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 1339 has weird spacing: '... states on th...' == Line 1340 has weird spacing: '...cessing of...' -- The document date (January 12, 2013) is 4122 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- -- Obsolete informational reference (is this intentional?): RFC 2616 (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235) -- Obsolete informational reference (is this intentional?): RFC 4282 (Obsoleted by RFC 7542) -- Obsolete informational reference (is this intentional?): RFC 5077 (Obsoleted by RFC 8446) -- Obsolete informational reference (is this intentional?): RFC 5246 (Obsoleted by RFC 8446) Summary: 0 errors (**), 0 flaws (~~), 3 warnings (==), 5 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group A. Cooper 3 Internet-Draft CDT 4 Intended status: Informational H. Tschofenig 5 Expires: July 16, 2013 Nokia Siemens Networks 6 B. Aboba 7 Microsoft Corporation 8 J. Peterson 9 NeuStar, Inc. 10 J. Morris 12 M. Hansen 13 ULD Kiel 14 R. Smith 15 JANET(UK) 16 January 12, 2013 18 Privacy Considerations for Internet Protocols 19 draft-iab-privacy-considerations-06.txt 21 Abstract 23 This document offers guidance for developing privacy considerations 24 for inclusion in protocol specifications. It aims to make protocol 25 designers aware of privacy-related design choices. It suggests that 26 whether any individual RFC warrants a specific privacy considerations 27 section will depend on the document's content. 29 Discussion of this document is taking place on the IETF Privacy 30 Discussion mailing list (see 31 https://www.ietf.org/mailman/listinfo/ietf-privacy). 33 Status of this Memo 35 This Internet-Draft is submitted in full conformance with the 36 provisions of BCP 78 and BCP 79. 38 Internet-Drafts are working documents of the Internet Engineering 39 Task Force (IETF). Note that other groups may also distribute 40 working documents as Internet-Drafts. The list of current Internet- 41 Drafts is at http://datatracker.ietf.org/drafts/current/. 43 Internet-Drafts are draft documents valid for a maximum of six months 44 and may be updated, replaced, or obsoleted by other documents at any 45 time. It is inappropriate to use Internet-Drafts as reference 46 material or to cite them other than as "work in progress." 48 This Internet-Draft will expire on July 16, 2013. 50 Copyright Notice 52 Copyright (c) 2013 IETF Trust and the persons identified as the 53 document authors. All rights reserved. 
55 This document is subject to BCP 78 and the IETF Trust's Legal 56 Provisions Relating to IETF Documents 57 (http://trustee.ietf.org/license-info) in effect on the date of 58 publication of this document. Please review these documents 59 carefully, as they describe your rights and restrictions with respect 60 to this document. Code Components extracted from this document must 61 include Simplified BSD License text as described in Section 4.e of 62 the Trust Legal Provisions and are provided without warranty as 63 described in the Simplified BSD License. 65 Table of Contents 67 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 68 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6 69 2.1. Entities . . . . . . . . . . . . . . . . . . . . . . . . . 6 70 2.2. Data and Analysis . . . . . . . . . . . . . . . . . . . . 7 71 2.3. Identifiability . . . . . . . . . . . . . . . . . . . . . 7 72 3. Communications Model . . . . . . . . . . . . . . . . . . . . . 10 73 4. Privacy Threats . . . . . . . . . . . . . . . . . . . . . . . 12 74 4.1. Combined Security-Privacy Threats . . . . . . . . . . . . 12 75 4.1.1. Surveillance . . . . . . . . . . . . . . . . . . . . . 12 76 4.1.2. Stored Data Compromise . . . . . . . . . . . . . . . . 13 77 4.1.3. Intrusion . . . . . . . . . . . . . . . . . . . . . . 13 78 4.1.4. Misattribution . . . . . . . . . . . . . . . . . . . . 13 79 4.2. Privacy-Specific Threats . . . . . . . . . . . . . . . . . 14 80 4.2.1. Correlation . . . . . . . . . . . . . . . . . . . . . 14 81 4.2.2. Identification . . . . . . . . . . . . . . . . . . . . 15 82 4.2.3. Secondary Use . . . . . . . . . . . . . . . . . . . . 15 83 4.2.4. Disclosure . . . . . . . . . . . . . . . . . . . . . . 16 84 4.2.5. Exclusion . . . . . . . . . . . . . . . . . . . . . . 16 85 5. Threat Mitigations . . . . . . . . . . . . . . . . . . . . . . 18 86 5.1. Data Minimization . . . . . . . . . . . . . . . . . . . . 18 87 5.1.1. Anonymity . . . . . . . . . . . . . . . . . . . . . . 19 88 5.1.2. Pseudonymity . . . . . . . . . . . . . . . . . . . . . 19 89 5.1.3. Identity Confidentiality . . . . . . . . . . . . . . . 20 90 5.1.4. Data Minimization within Identity Management . . . . . 20 91 5.2. User Participation . . . . . . . . . . . . . . . . . . . . 21 92 5.3. Security . . . . . . . . . . . . . . . . . . . . . . . . . 22 93 6. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 94 7. Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . 25 95 7.1. Data Minimization . . . . . . . . . . . . . . . . . . . . 25 96 7.2. User Participation . . . . . . . . . . . . . . . . . . . . 26 97 7.3. Security . . . . . . . . . . . . . . . . . . . . . . . . . 27 98 7.4. General . . . . . . . . . . . . . . . . . . . . . . . . . 27 99 8. Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 100 9. Security Considerations . . . . . . . . . . . . . . . . . . . 33 101 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 34 102 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 35 103 12. IAB Members at the Time of Approval . . . . . . . . . . . . . 36 104 13. Informative References . . . . . . . . . . . . . . . . . . . . 37 105 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 40 107 1. Introduction 109 [RFC3552] provides detailed guidance to protocol designers about both 110 how to consider security as part of protocol design and how to inform 111 readers of protocol specifications about security issues. 
This 112 document intends to provide a similar set of guidance for considering 113 privacy in protocol design. 115 Privacy is a complicated concept with a rich history that spans many 116 disciplines. With regard to data, often it is a concept applied to 117 "personal data," information relating to an identified or 118 identifiable individual. Many sets of privacy principles and privacy 119 design frameworks have been developed in different forums over the 120 years. These include the Fair Information Practices [FIPs], a 121 baseline set of privacy protections pertaining to the collection and 122 use of personal data (often based on the principles established in 123 [OECD], for example), and the Privacy by Design concept, which 124 provides high-level privacy guidance for systems design (see [PbD] 125 for one example). The guidance provided in this document is inspired 126 by this prior work, but it aims to be more concrete, pointing 127 protocol designers to specific engineering choices that can impact 128 the privacy of the individuals that make use of Internet protocols. 130 Different people have radically different conceptions of what privacy 131 means, both in general, and as it relates to them personally 132 [Westin]. Furthermore, privacy as a legal concept is understood 133 differently in different jurisdictions. The guidance provided in 134 this document is generic and can be used to inform the design of any 135 protocol to be used anywhere in the world, without reference to 136 specific legal frameworks. 138 Whether any individual document warrants a specific privacy 139 considerations section will depend on the document's content. 140 Documents whose entire focus is privacy may not merit a separate 141 section (for example, "Private Extensions to the Session Initiation 142 Protocol (SIP) for Asserted Identity within Trusted Networks" 143 [RFC3325]). For certain specifications, privacy considerations are a 144 subset of security considerations and can be discussed explicitly in 145 the security considerations section. Some documents will not require 146 discussion of privacy considerations (for example, "Definition of the 147 Opus Audio Codec" [RFC6716]). The guidance provided here can and 148 should be used to assess the privacy considerations of protocol, 149 architectural, and operational specifications and to decide whether 150 those considerations are to be documented in a stand-alone section, 151 within the security considerations section, or throughout the 152 document. 154 This document is organized as follows. Section 2 explains the 155 terminology used in this document. Section 3 reviews typical 156 communications architectures to understand at which points there may 157 be privacy threats. Section 4 discusses threats to privacy as they 158 apply to Internet protocols. Section 5 outlines mitigations of those 159 threats. Section 6 describes the extent to which the guidance 160 offered is applicable within the IETF. Section 7 provides the 161 guidelines for analyzing and documenting privacy considerations 162 within IETF specifications. Section 8 examines the privacy 163 characteristics of an IETF protocol to demonstrate the use of the 164 guidance framework. 166 2. Terminology 168 This section defines basic terms used in this document, with 169 references to pre-existing definitions as appropriate. As in 170 [RFC4949], each entry is preceded by a dollar sign ($) and a space 171 for automated searching. 
Note that this document does not 172 attempt to define the term 'privacy' itself. Instead, privacy is the 173 sum of what is contained in this document. We therefore follow the 174 approach taken by [RFC3552]. 176 2.1. Entities 178 Several of these terms are further elaborated in Section 3. 180 $ Attacker: An entity that intentionally works against some 181 protection goal. 183 $ Eavesdropper: A type of attacker that passively observes an 184 initiator's communications without the initiator's knowledge or 185 authorization. See [RFC4949]. 187 $ Enabler: A protocol entity that facilitates communication between 188 an initiator and a recipient without being directly in the 189 communications path. 191 $ Individual: A human being. 193 $ Initiator: A protocol entity that initiates communications with a 194 recipient. 196 $ Intermediary: A protocol entity that sits between the initiator 197 and the recipient and is necessary for the initiator and recipient 198 to communicate. Unlike an eavesdropper, an intermediary is an 199 entity that is part of the communication architecture. For 200 example, a SIP proxy is an intermediary in the SIP architecture. 202 $ Observer: An entity that is able to observe and collect 203 information from communications, potentially posing privacy 204 threats depending on the context. As defined in this document, 205 initiators, recipients, intermediaries, and enablers can all be 206 observers. Observers are distinguished from eavesdroppers by 207 being at least tacitly authorized. 209 $ Recipient: A protocol entity that receives communications from an 210 initiator. 212 2.2. Data and Analysis 214 $ Correlation: The combination of various pieces of information 215 relating to an individual. 217 $ Fingerprint: A set of information elements that identifies a 218 device or application instance. 220 $ Fingerprinting: The process of an observer or attacker uniquely 221 identifying (with a sufficiently high probability) a device or 222 application instance based on multiple information elements 223 communicated to the observer or attacker. See [EFF]. 225 $ Item of Interest (IOI): Any data item that an observer or 226 attacker might be interested in. This includes attributes, 227 identifiers, identities, communications content, and the fact that 228 a communication interaction has taken place. 230 $ Personal Data: Any information relating to an individual who can 231 be identified, directly or indirectly. 233 $ (Protocol) Interaction: A unit of communication within a 234 particular protocol. A single interaction may consist of a 235 single message between an initiator and recipient or multiple 236 messages, depending on the protocol. 238 $ Traffic Analysis: The inference of information from observation 239 of traffic flows (presence, absence, amount, direction, and 240 frequency). See [RFC4949]. 242 $ Undetectability: The inability of an observer or attacker to 243 sufficiently distinguish whether an item of interest exists or 244 not. 246 $ Unlinkability: Within a particular set of information, the 247 inability of an observer or attacker to distinguish whether two 248 items of interest are related or not (with a high enough degree of 249 probability to be useful to the observer or attacker). 251 2.3. Identifiability 253 $ Anonymity: The state of being anonymous. 255 $ Anonymity Set: A set of individuals that have the same 256 attributes, making them indistinguishable from each other from the 257 perspective of a particular attacker or observer.
259 $ Anonymous: A state of an individual in which an observer or 260 attacker cannot identify the individual within a set of other 261 individuals (the anonymity set). 263 $ Attribute: A property of an individual. 265 $ Identifiable: A property in which an individual's identity is 266 capable of being known to an observer or attacker. 268 $ Identifiability: The extent to which an individual is 269 identifiable. 271 $ Identified: A state in which an individual's identity is known. 273 $ Identifier: A data object uniquely referring to a specific 274 identity of a protocol entity or individual in some context. See 275 [RFC4949]. Identifiers can be based upon natural names -- 276 official names, personal names, and/or nicknames -- or can be 277 artificial (for example, x9z32vb). However, identifiers are by 278 definition unique within their context of use, while natural names 279 are often not unique. 281 $ Identification: The linking of information to a particular 282 individual to infer the individual's identity or to allow the 283 inference of the individual's identity in some context. 285 $ Identity: Any subset of an individual's attributes, including 286 names, that identifies the individual within a given context. 287 Individuals usually have multiple identities for use in different 288 contexts. 290 $ Identity Confidentiality: A property of an individual wherein any 291 party other than the recipient cannot sufficiently identify the 292 individual within a set of other individuals (the anonymity set). 293 This is a desirable property of authentication protocols. 295 $ Identity Provider: An entity (usually an organization) that is 296 responsible for establishing, maintaining, securing, and vouching 297 for the identity associated with individuals. 299 $ Official Name: A personal name for an individual which is 300 registered in some official context. For example, the name on an 301 individual's birth certificate. 303 $ Personal Name: A natural name for an individual. Personal names 304 are often not unique, and often comprise given names in 305 combination with a family name. An individual may have multiple 306 personal names at any time and over a lifetime, including official 307 names. From a technological perspective, it cannot always be 308 determined whether a given reference to an individual is, or is 309 based upon, the individual's personal name(s) (see Pseudonym). 311 $ Pseudonym: A name assumed by an individual in some context, 312 unrelated to the individual's personal names known by others in 313 that context, with an intent of not revealing the individual's 314 identities associated with her other names. 316 $ Pseudonymity: The state of being pseudonymous. 318 $ Pseudonymous: A property of an individual in which the individual 319 is identified by a pseudonym. 321 $ Real name: See personal name and official name. 323 $ Relying party: An entity that relies on assertions of 324 individuals' identities from identity providers in order to 325 provide services to individuals. In effect, the relying party 326 delegates aspects of identity management to the identity 327 provider(s). Such delegation requires protocol exchanges, trust, 328 and a common understanding of semantics of information exchanged 329 between the relying party and the identity provider. 331 3. Communications Model 333 To understand attacks in the privacy-harm sense, it is helpful to 334 consider the overall communication architecture and different actors' 335 roles within it. 
Consider a protocol entity, the "initiator", that 336 initiates communication with some recipient. Privacy analysis is 337 most relevant for protocols with use cases in which the initiator 338 acts on behalf of an individual (or different individuals at 339 different times). It is this individual whose privacy is potentially 340 threatened. 342 Communications may be direct between the initiator and the recipient, 343 or they may involve an application-layer intermediary (such as a 344 proxy or cache) that is necessary for the two parties to communicate. 345 In some cases this intermediary stays in the communication path for 346 the entire duration of the communication and sometimes it is only 347 used for communication establishment, for either inbound or outbound 348 communication. In rare cases there may be a series of intermediaries 349 that are traversed. At lower layers, additional entities are 350 involved in packet forwarding that may interfere with privacy 351 protection goals as well. 353 Some communications tasks require multiple protocol interactions with 354 different entities. For example, a request to an HTTP server may be 355 preceded by an interaction between the initiator and an 356 Authentication, Authorization, and Accounting (AAA) server for 357 network access and to a DNS server for name resolution. In this 358 case, the HTTP server is the recipient and the other entities are 359 enablers of the initiator-to-recipient communication. Similarly, a 360 single communication with the recipient might generate further 361 protocol interactions between either the initiator or the recipient 362 and other entities, and the roles of the entities might change with 363 each interaction. For example, an HTTP request might trigger 364 interactions with an authentication server or with other resource 365 servers wherein the recipient becomes an initiator in those later 366 interactions. 368 Thus, when conducting privacy analysis of an architecture that 369 involves multiple communications phases, the entities involved may 370 take on different -- or opposing -- roles from a privacy 371 considerations perspective in each phase. Understanding the privacy 372 implications of the architecture as a whole may require a separate 373 analysis of each phase. 375 Protocol design is often predicated on the notion that recipients, 376 intermediaries, and enablers are assumed to be authorized to receive 377 and handle data from initiators. As [RFC3552] explains, "we assume 378 that the end-systems engaging in a protocol exchange have not 379 themselves been compromised." However, by its nature privacy 380 analysis requires questioning this assumption since systems are often 381 compromised for the purpose of obtaining personal data. 383 Although recipients, intermediaries, and enablers may not generally 384 be considered as attackers, they may all pose privacy threats 385 (depending on the context) because they are able to observe, collect, 386 process, and transfer privacy-relevant data. These entities are 387 collectively described below as "observers" to distinguish them from 388 traditional attackers. From a privacy perspective, one important 389 type of attacker is an eavesdropper: an entity that passively 390 observes the initiator's communications without the initiator's 391 knowledge or authorization. 393 The threat descriptions in the next section explain how observers and 394 attackers might act to harm individuals' privacy. 
Different kinds of 395 attacks may be feasible at different points in the communications 396 path. For example, an observer could mount surveillance or 397 identification attacks between the initiator and intermediary, or 398 instead could surveil an enabler (e.g., by observing DNS queries from 399 the initiator). 401 4. Privacy Threats 403 Privacy harms come in a number of forms, including harms to financial 404 standing, reputation, solitude, autonomy, and safety. A victim of 405 identity theft or blackmail, for example, may suffer a financial loss 406 as a result. Reputational harm can occur when disclosure of 407 information about an individual, whether true or false, subjects that 408 individual to stigma, embarrassment, or loss of personal dignity. 409 Intrusion or interruption of an individual's life or activities can 410 harm the individual's ability to be left alone. When individuals or 411 their activities are monitored, exposed, or at risk of exposure, 412 those individuals may be stifled from expressing themselves, 413 associating with others, and generally conducting their lives freely. 414 They may also feel a general sense of unease, in that it is "creepy" 415 to be monitored or to have data collected about them. In cases where 416 such monitoring is for the purpose of stalking or violence (for 417 example, monitoring communications to or from a domestic abuse 418 shelter), it can put individuals in physical danger. 420 This section lists common privacy threats (drawing liberally from 421 [Solove], as well as [CoE]), showing how each of them may cause 422 individuals to incur privacy harms and providing examples of how 423 these threats can exist on the Internet. 425 Some privacy threats are already considered in IETF protocols as a 426 matter of routine security analysis. Others are more pure privacy 427 threats that existing security considerations do not usually address. 428 The threats described here are divided into those that may also be 429 considered security threats and those that are primarily privacy 430 threats. 432 Note that an individual's awareness of and consent to the practices 433 described below can greatly affect the extent to which they threaten 434 privacy. If an individual authorizes surveillance of his own 435 activities, for example, the harms associated with it may be 436 mitigated, or the individual may accept the risk of harm. 438 4.1. Combined Security-Privacy Threats 440 4.1.1. Surveillance 442 Surveillance is the observation or monitoring of an individual's 443 communications or activities. The effects of surveillance on the 444 individual can range from anxiety and discomfort to behavioral 445 changes such as inhibition and self-censorship to the perpetration of 446 violence against the individual. The individual need not be aware of 447 the surveillance for it to impact privacy -- the possibility of 448 surveillance may be enough to harm individual autonomy. 450 Surveillance can be conducted by observers or eavesdroppers at any 451 point along the communications path. Confidentiality protections (as 452 discussed in [RFC3552] Section 3) are necessary to prevent 453 surveillance of the content of communications. To prevent traffic 454 analysis or other surveillance of communications patterns, other 455 measures may be necessary, such as [Tor]. 457 4.1.2. 
Stored Data Compromise 459 End systems that do not take adequate measures to secure stored data 460 from unauthorized or inappropriate access expose individuals to 461 potential financial, reputational, or physical harm. 463 Protecting against stored data compromise is typically outside the 464 scope of IETF protocols. However, a number of common protocol 465 functions -- key management, access control, or operational logging, 466 for example -- require the storage of data about initiators of 467 communications. When requiring or recommending that information 468 about initiators or their communications be stored or logged by end 469 systems (see, e.g., RFC 6302 [RFC6302]), it is important to recognize 470 the potential for that information to be compromised and for that 471 potential to be weighed against the benefits of data storage. Any 472 recipient, intermediary, or enabler that stores data may be 473 vulnerable to compromise. 475 4.1.3. Intrusion 477 Intrusion consists of invasive acts that disturb or interrupt one's 478 life or activities. Intrusion can thwart individuals' desires to be 479 left alone, sap their time or attention, or interrupt their 480 activities. This threat is focused on intrusion into one's life 481 rather than direct intrusion into one's communications. The latter 482 is captured in Section 4.1.1. 484 Unsolicited messages and denial-of-service attacks are the most 485 common types of intrusion on the Internet. Intrusion can be 486 perpetrated by any attacker that is capable of sending unwanted 487 traffic to the initiator. 489 4.1.4. Misattribution 491 Misattribution occurs when data or communications related to one 492 individual are attributed to another. Misattribution can result in 493 adverse reputational, financial, or other consequences for 494 individuals that are misidentified. 496 Misattribution in the protocol context comes as a result of using 497 inadequate or insecure forms of identity or authentication. For 498 example, as [RFC6269] notes, abuse mitigation is often conducted on 499 the basis of source IP address, such that connections from individual 500 IP addresses may be prevented or temporarily blacklisted if abusive 501 activity is determined to be sourced from those addresses. However, 502 in the case where a single IP address is shared by multiple 503 individuals, those penalties may be suffered by all individuals 504 sharing the address, even if they were not involved in the abuse. 505 This threat can be mitigated by using identity management mechanisms 506 with proper forms of authentication (ideally with cryptographic 507 properties) so that actions can be attributed uniquely to an 508 individual to provide the basis for accountability without generating 509 false-positives. 511 4.2. Privacy-Specific Threats 513 4.2.1. Correlation 515 Correlation is the combination of various pieces of information 516 related to an individual. Correlation can defy people's expectations 517 of the limits of what others know about them. It can increase the 518 power that those doing the correlating have over individuals as well 519 as correlators' ability to pass judgment, threatening individual 520 autonomy and reputation. 522 Correlation is closely related to identification. Internet protocols 523 can facilitate correlation by allowing individuals' activities to be 524 tracked and combined over time. The use of persistent or 525 infrequently replaced identifiers at any layer of the stack can 526 facilitate correlation. 
For example, an initiator's persistent use 527 of the same device ID, certificate, or email address across multiple 528 interactions could allow recipients (and observers) to correlate all 529 of the initiator's communications over time. 531 As an example, consider Transport Layer Security (TLS) session 532 resumption [RFC5246] or TLS session resumption without server side 533 state [RFC5077]. In RFC 5246 [RFC5246] a server provides the client 534 with a session_id in the ServerHello message and caches the 535 master_secret for later exchanges. When the client initiates a new 536 connection with the server it re-uses the previously obtained 537 session_id in its ClientHello message. The server agrees to resume 538 the session by using the same session_id and the previously stored 539 master_secret for the generation of the TLS Record Layer security 540 association. RFC 5077 [RFC5077] borrows from the session resumption 541 design idea but the server encapsulates all state information into a 542 ticket instead of caching it. An attacker who is able to observe the 543 protocol exchanges between the TLS client and the TLS server is able 544 to link the initial exchange to subsequently resumed TLS sessions 545 when the session_id and the ticket are exchanged in the clear (which 546 is the case with data exchanged in the initial handshake messages). 548 In theory any observer or attacker that receives an initiator's 549 communications can engage in correlation. The extent of the 550 potential for correlation will depend on what data the entity 551 receives from the initiator and has access to otherwise. Often, 552 intermediaries only require a small amount of information for message 553 routing and/or security. In theory, protocol mechanisms could ensure 554 that end-to-end information is not made accessible to these entities, 555 but in practice the difficulty of deploying end-to-end security 556 procedures, additional messaging or computational overhead, and other 557 business or legal requirements often slow or prevent the deployment 558 of end-to-end security mechanisms, giving intermediaries greater 559 exposure to initiators' data than is strictly necessary from a 560 technical point of view. 562 4.2.2. Identification 564 Identification is the linking of information to a particular 565 individual. In some contexts it is perfectly legitimate to identify 566 individuals, whereas in others identification may potentially stifle 567 individuals' activities or expression by inhibiting their ability to 568 be anonymous or pseudonymous. Identification also makes it easier 569 for individuals to be explicitly controlled by others (e.g., 570 governments) and to be treated differentially compared to other 571 individuals. 573 Many protocols provide functionality to convey the idea that some 574 means has been provided to guarantee that entities are who they claim 575 to be. Often, this is accomplished with cryptographic 576 authentication. Furthermore, many protocol identifiers, such as 577 those used in SIP or XMPP, may allow for the direct identification of 578 individuals. Protocol identifiers may also contribute indirectly to 579 identification via correlation. For example, a web site that does 580 not directly authenticate users may be able to match its HTTP header 581 logs with logs from another site that does authenticate users, 582 rendering users on the first site identifiable. 
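To make the log-matching scenario above concrete, the following Python sketch joins the header logs of a site that does not authenticate users with the logs of a site that does, using a simple (IP address, User-Agent) key. The log fields, values, and join key are illustrative assumptions, not a description of any real server's logging.

   # Hypothetical log records; field names and the (ip, user_agent) join
   # key are assumptions for illustration only.
   authenticated_log = [   # site A: users log in, so requests carry a username
       {"ip": "192.0.2.7",    "user_agent": "ExampleBrowser/42.0", "user": "jdoe"},
       {"ip": "198.51.100.3", "user_agent": "OtherAgent/1.1",      "user": "asmith"},
   ]
   anonymous_log = [       # site B: no authentication, but the same headers are logged
       {"ip": "192.0.2.7",   "user_agent": "ExampleBrowser/42.0", "path": "/support/condition-x"},
       {"ip": "203.0.113.9", "user_agent": "ExampleBrowser/42.0", "path": "/"},
   ]

   def fingerprint(entry):
       # A crude pseudo-identifier built only from fields both sites can see.
       return (entry["ip"], entry["user_agent"])

   known = {fingerprint(e): e["user"] for e in authenticated_log}

   for entry in anonymous_log:
       user = known.get(fingerprint(entry), "<unidentified>")
       print(entry["path"], "visited by", user)
   # The visit to /support/condition-x on the "anonymous" site is now linked
   # to the identified user jdoe; the second visit remains unidentified.

The same joining technique applies to any persistent identifier exposed by a protocol, which is why the questions about identifier lifetime and reuse in Section 7 matter.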
584 As with correlation, any observer or attacker may be able to engage 585 in identification depending on the information about the initiator 586 that is available via the protocol mechanism or other channels. 588 4.2.3. Secondary Use 590 Secondary use is the use of collected information without the 591 individual's consent for a purpose different from that for which the 592 information was collected. Secondary use may violate people's 593 expectations or desires. The potential for secondary use can 594 generate uncertainty over how one's information will be used in the 595 future, potentially discouraging information exchange in the first 596 place. 598 One example of secondary use would be an authentication server that 599 uses a network access server's Access-Requests to track an 600 initiator's location. Any observer or attacker could potentially 601 make unwanted secondary uses of initiators' data. Protecting against 602 secondary use is typically outside the scope of IETF protocols. 604 4.2.4. Disclosure 606 Disclosure is the revelation of information about an individual that 607 affects the way others judge the individual. Disclosure can violate 608 individuals' expectations of the confidentiality of the data they 609 share. The threat of disclosure may deter people from engaging in 610 certain activities for fear of reputational harm, or simply because 611 they do not wish to be observed. 613 Any observer or attacker that receives data about an initiator may 614 engage in disclosure. Sometimes disclosure is unintentional because 615 system designers do not realize that information being exchanged 616 relates to individuals. The most common way for protocols to limit 617 disclosure is by providing access control mechanisms (discussed in 618 Section 4.2.5). A further example is provided by the IETF 619 geolocation privacy architecture [RFC6280], which supports a way for 620 users to express a preference that their location information not be 621 disclosed beyond the intended recipient. 623 4.2.5. Exclusion 625 Exclusion is the failure to allow individuals to know about the data 626 that others have about them and to participate in its handling and 627 use. Exclusion reduces accountability on the part of entities that 628 maintain information about people and creates a sense of 629 vulnerability about individuals' ability to control how information 630 about them is collected and used. 632 The most common way for Internet protocols to be involved in 633 enforcing exclusion is through access control mechanisms. The 634 presence architecture developed in the IETF is a good example where 635 individuals are included in the control of information about them. 636 Using a rules expression language (e.g., Presence Authorization Rules 637 [RFC5025]), presence clients can authorize the specific conditions 638 under which their presence information may be shared. 640 Exclusion is primarily considered problematic when the recipient 641 fails to involve the initiator in decisions about data collection, 642 handling, and use. Eavesdroppers engage in exclusion by their very 643 nature since their data collection and handling practices are covert. 645 5. Threat Mitigations 647 Privacy is notoriously difficult to measure and quantify. The extent 648 to which a particular protocol, system, or architecture "protects" or 649 "enhances" privacy is dependent on a large number of factors relating 650 to its design, use, and potential misuse. 
However, there are certain 651 widely recognized classes of mitigations against the threats 652 discussed in Section 4. This section describes three categories of 653 relevant mitigations: (1) data minimization, (2) user participation, 654 and (3) security. The privacy mitigations described in this chapter 655 can loosely be mapped to existing privacy principles, such as the 656 Fair Information Practices, but they have been adapted to fit the 657 target audience of this document. 659 5.1. Data Minimization 661 Data minimization refers to collecting, using, disclosing, and 662 storing the minimal data necessary to perform a task. The less data 663 about individuals that gets exchanged in the first place, the lower 664 the chances of that data being misused or leaked. 666 Data minimization can be effectuated in a number of different ways, 667 including by limiting collection, use, disclosure, retention, 668 identifiability, sensitivity, and access to personal data. Limiting 669 the data collected by protocol elements only to what is necessary 670 (collection limitation) is the most straightforward way to help 671 reduce privacy risks associated with the use of the protocol. In 672 some cases, protocol designers may also be able to recommend limits 673 to the use or retention of data, although protocols themselves are 674 not often capable of controlling these properties. 676 However, the most direct application of data minimization to protocol 677 design is limiting identifiability. Reducing the identifiability of 678 data by using pseudonyms or no identifiers at all helps to weaken the 679 link between an individual and his or her communications. Allowing 680 for the periodic creation of new identifiers reduces the possibility 681 that multiple protocol interactions or communications can be 682 correlated back to the same individual. The following sections 683 explore a number of different properties related to identifiability 684 that protocol designers may seek to achieve. 686 Data minimization mitigates the following threats: surveillance, 687 stored data compromise, correlation, identification, secondary use, 688 disclosure. 690 5.1.1. Anonymity 692 To enable anonymity of an individual, there must exist a set of 693 individuals with potentially the same attributes. To the attacker or 694 the observer these individuals must appear indistinguishable from 695 each other. The set of all such individuals is known as the 696 anonymity set and membership of this set may vary over time. 698 The composition of the anonymity set depends on the knowledge of the 699 observer or attacker. Thus anonymity is relative with respect to the 700 observer or attacker. An initiator may be anonymous only within a 701 set of potential initiators -- its initiator anonymity set -- which 702 itself may be a subset of all individuals that may initiate 703 communications. Conversely, a recipient may be anonymous only within 704 a set of potential recipients -- its recipient anonymity set. Both 705 anonymity sets may be disjoint, may overlap, or may be the same. 707 As an example, consider RFC 3325 (P-Asserted-Identity, PAI) 708 [RFC3325], an extension for the Session Initiation Protocol (SIP), 709 that allows an individual, such as a VoIP caller, to instruct an 710 intermediary that he or she trusts not to populate the SIP From 711 header field with the individual's authenticated and verified 712 identity. 
The recipient of the call, as well as any other entity 713 outside of the individual's trust domain, would therefore only learn 714 that the SIP message (typically a SIP INVITE) was sent with a header 715 field 'From: "Anonymous" <sip:anonymous@anonymous.invalid>' rather 716 than the individual's address-of-record, which is typically thought 717 of as the "public address" of the user. When PAI is used, the 718 individual becomes anonymous within the initiator anonymity set that 719 is populated by every individual making use of that specific 720 intermediary. 722 Note that this example ignores the fact that the recipient may infer 723 or obtain personal data from the other SIP protocol payloads (e.g., 724 SIP Via and Contact headers, SDP). The implication is that PAI only 725 attempts to address a particular threat, namely the disclosure of 726 identity (in the From header) with respect to the recipient. This 727 caveat makes the analysis of the specific protocol extension easier 728 but cannot be assumed when conducting analysis of an entire 729 architecture. 731 5.1.2. Pseudonymity 733 In the context of Internet protocols, almost all identifiers can be 734 nicknames or pseudonyms since there is typically no requirement to 735 use personal names in protocols. However, in certain scenarios it is 736 reasonable to assume that personal names will be used (with vCard 737 [RFC6350], for example). 739 Pseudonymity is strengthened when less personal data can be linked to 740 the pseudonym; when the same pseudonym is used less often and across 741 fewer contexts; and when independently chosen pseudonyms are more 742 frequently used for new actions (making them, from an observer's or 743 attacker's perspective, unlinkable). 745 For Internet protocols, important considerations include whether 746 protocols allow pseudonyms to be changed without human interaction, 747 the default length of pseudonym lifetimes, to whom pseudonyms are 748 exposed, how individuals are able to control disclosure, how often 749 pseudonyms can be changed, and the consequences of changing them. 751 5.1.3. Identity Confidentiality 753 An initiator has identity confidentiality when any party other than 754 the recipient cannot sufficiently identify the initiator within the 755 anonymity set. The size of the anonymity set has a direct impact on 756 identity confidentiality since the smaller the set is, the easier it 757 is to identify the initiator. Identity confidentiality aims to 758 provide a protection against eavesdroppers and intermediaries rather 759 than the intended communication end points. 761 As an example, consider the network access authentication procedures 762 utilizing the Extensible Authentication Protocol (EAP) [RFC3748]. 763 EAP includes an identity exchange where the Identity Response is 764 primarily used for routing purposes and selecting which EAP method to 765 use. Since EAP Identity Requests and Responses are sent in 766 cleartext, eavesdroppers and intermediaries along the communication 767 path between the EAP peer and the EAP server can snoop on the 768 identity, which is encoded in the form of the Network Access 769 Identifier (NAI) defined in RFC 4282 [RFC4282]. To address this 770 threat, as discussed in RFC 4282 [RFC4282], the username part of the 771 NAI (but not the realm-part) can be hidden from these eavesdroppers 772 and intermediaries with the cryptographic support offered by EAP 773 methods. Identity confidentiality has become a recommended design 774 criterion for EAP (see [RFC4017]). EAP-AKA [RFC4187], for example, 775 protects the EAP peer's identity against passive adversaries by 776 utilizing temporary identities. EAP-IKEv2 [RFC5106] is an example of 777 an EAP method that offers protection against active attackers with 778 regard to the individual's identity.
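As a small illustration of this identity confidentiality technique, the following Python sketch separates the realm of a Network Access Identifier, which is needed for routing, from the username, which can be withheld from eavesdroppers until a protected exchange is established. The function names and the placeholder username are illustrative assumptions, not normative behavior taken from RFC 4282 or any EAP method.

   def split_nai(nai):
       # Split "username@realm" (simplified; see RFC 4282 for the full syntax).
       username, _, realm = nai.partition("@")
       return username, realm

   def cleartext_identity(nai):
       # What an eavesdropper on the unprotected identity exchange would see:
       # the realm is preserved so the request can still be routed to the
       # individual's home server, but the username is replaced.
       _, realm = split_nai(nai)
       return "anonymous@" + realm if realm else "anonymous"

   real_nai = "jdoe@example.net"
   print(cleartext_identity(real_nai))   # -> anonymous@example.net
   # The real username "jdoe" would be revealed only inside the
   # confidentiality-protected tunnel established by the EAP method.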
780 5.1.4. Data Minimization within Identity Management 782 Modern systems are increasingly relying on multi-party transactions 783 to authenticate individuals. Many of these systems make use of an 784 identity provider that is responsible for providing authentication, 785 authorization, and accounting functionality to relying parties that 786 offer some protected resources. To facilitate these functions, an 787 identity provider will usually go through a process of verifying the 788 individual's identity and issuing credentials to the individual. 789 When an individual seeks to make use of a service provided by the 790 relying party, the relying party relies on the authentication 791 assertions provided by its identity provider. Note that in more 792 sophisticated scenarios the authentication assertions are traits that 793 demonstrate the individual's capabilities and roles. The 794 authorization responsibility may also be shared between the identity 795 provider and the relying party and does not necessarily only need to 796 reside with the identity provider. 798 Such systems have the ability to support a number of properties that 799 minimize data collection in different ways: 801 In certain use cases relying parties do not need to know the real 802 name of an individual (for example, when the individual's age is 803 the only attribute that needs to be authenticated). 805 Relying parties that collude can be prevented from using an 806 individual's credentials to track the individual. That is, two 807 different relying parties can be prevented from determining that 808 the same individual has authenticated to both of them (one common 809 approach is sketched below). This typically requires identity 810 management protocol support as well as support by both the relying 811 party and the identity provider. 813 The identity provider can be prevented from knowing which relying 814 parties an individual interacted with. This requires avoiding 815 direct communication between the identity provider and the relying 816 party at the time when access to a resource by the initiator is 817 made.
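One well-known way to achieve the second property above (keeping colluding relying parties from linking the same individual's accounts) is for the identity provider to derive a different, stable pseudonym for each relying party. The following Python sketch shows the idea with an HMAC-based derivation; the secret handling and naming are illustrative assumptions rather than a feature of any particular identity management protocol.

   import hashlib
   import hmac

   # Long-term secret known only to the identity provider (placeholder value).
   IDP_SECRET = b"identity-provider-pairwise-secret"

   def pairwise_pseudonym(internal_user_id, relying_party):
       # Stable for a given (individual, relying party) pair, but unrelated
       # across relying parties, so two colluding parties cannot link
       # accounts simply by comparing the identifiers they were given.
       msg = (internal_user_id + "|" + relying_party).encode()
       return hmac.new(IDP_SECRET, msg, hashlib.sha256).hexdigest()[:16]

   print(pairwise_pseudonym("user-1234", "shop.example"))    # one opaque value
   print(pairwise_pseudonym("user-1234", "forum.example"))   # a different, unrelated value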
819 5.2. User Participation 821 As explained in Section 4.2.5, data collection and use that happens 822 "in secret," without the individual's knowledge, is apt to violate 823 the individual's expectation of privacy and may create incentives for 824 misuse of data. As a result, privacy regimes tend to include 825 provisions to require informing individuals about data collection and 826 use and involving them in decisions about the treatment of their 827 data. In an engineering context, supporting the goal of user 828 participation usually means providing ways for users to control the 829 data that is shared about them. It may also mean providing ways for 830 users to signal how they expect their data to be used and shared. 832 User participation mitigates the following threats: surveillance, 833 secondary use, disclosure, and exclusion. 835 5.3. Security 837 Keeping data secure at rest and in transit is another important 838 component of privacy protection. As they are described in [RFC3552] 839 Section 2, a number of security goals also serve to enhance privacy: 841 o Confidentiality: Keeping data secret from unintended listeners. 843 o Peer entity authentication: Ensuring that the endpoint of a 844 communication is the one that is intended (in support of 845 maintaining confidentiality). 847 o Unauthorized usage: Limiting data access to only those users who 848 are authorized. (Note that this goal also falls within data 849 minimization.) 851 o Inappropriate usage: Limiting how authorized users can use data. 852 (Note that this goal also falls within data minimization.) 854 Note that even when these goals are achieved, the existence of items 855 of interest -- attributes, identifiers, identities, communications, 856 actions (such as the sending or receiving of a communication), or 857 anything else an attacker or observer might be interested in -- may 858 still be detectable, even if they are not readable. Thus 859 undetectability, in which an observer or attacker cannot sufficiently 860 distinguish whether an item of interest exists or not, may be 861 considered a further security goal (albeit one that can be 862 extremely difficult to accomplish). 864 By providing proper security protection, the following threats can be 865 mitigated: surveillance, stored data compromise, misattribution, 866 secondary use, disclosure, and intrusion. 868 6. Scope 870 Internet protocols are often built flexibly, making them useful in a 871 variety of architectures, contexts, and deployment scenarios without 872 requiring significant interdependency between disparately designed 873 components. Although protocol designers often have a particular 874 target architecture or set of architectures in mind at design time, 875 it is not uncommon for architectural frameworks to develop later, 876 after implementations exist and have been deployed in combination 877 with other protocols or components to form complete systems. 879 As a consequence, the extent to which protocol designers can foresee 880 all of the privacy implications of a particular protocol at design 881 time is limited. An individual protocol may be relatively benign on 882 its own, and it may make use of privacy and security features at 883 lower layers of the protocol stack (Internet Protocol Security, 884 Transport Layer Security, and so forth) to mitigate the risk of 885 attack. But when deployed within a larger system or used in a way 886 not envisioned at design time, its use may create new privacy risks. 887 Protocols are often implemented and deployed long after design time 888 by different people than those who did the protocol design. The 889 guidelines in Section 7 ask protocol designers to consider how their 890 protocols are expected to interact with systems and information that 891 exist outside the protocol bounds, but not to imagine every possible 892 deployment scenario. 894 Furthermore, in many cases the privacy properties of a system are 895 dependent upon the complete system design where various protocols are 896 combined together to form a product solution; the implementation, 897 which includes the user interface design; and operational deployment 898 practices, including default privacy settings and security processes 899 within the company doing the deployment. These details are specific 900 to particular instantiations and generally outside the scope of the 901 work conducted in the IETF. The guidance provided here may be useful 902 in making choices about these details, but its primary aim is to 903 assist with the design, implementation, and operation of protocols.
905 Transparency of data collection and use -- often effectuated through 906 user interface design -- is normally a key factor in determining the 907 privacy impact of a system. Although most IETF activities do not 908 involve standardizing user interfaces or user-facing communications, 909 in some cases understanding expected user interactions can be 910 important for protocol design. Unexpected user behavior may have an 911 adverse impact on security and/or privacy. 913 In sum, privacy issues, even those related to protocol development, 914 go beyond the technical guidance discussed herein. As an example, 915 consider HTTP [RFC2616], which was designed to allow the exchange of 916 arbitrary data. A complete analysis of the privacy considerations 917 for uses of HTTP might include what type of data is exchanged, how 918 this data is stored, and how it is processed. Hence the analysis for 919 an individual's static personal web page would be different than the 920 use of HTTP for exchanging health records. A protocol designer 921 working on HTTP extensions (such as WebDAV [RFC4918]) is not expected 922 to describe the privacy risks derived from all possible usage 923 scenarios, but rather the privacy properties specific to the 924 extensions and any particular uses of the extensions that are 925 expected and foreseen at design time. 927 7. Guidelines 929 This section provides guidance for document authors in the form of a 930 questionnaire about a protocol being designed. The questionnaire may 931 be useful at any point in the design process, particularly after 932 document authors have developed a high-level protocol model as 933 described in [RFC4101]. 935 Note that the guidance does not recommend specific practices. The 936 range of protocols developed in the IETF is too broad to make 937 recommendations about particular uses of data or how privacy might be 938 balanced against other design goals. However, by carefully 939 considering the answers to each question, document authors should be 940 able to produce a comprehensive analysis that can serve as the basis 941 for discussion of whether the protocol adequately protects against 942 privacy threats. 944 The framework is divided into four sections that address each of the 945 mitigation classes from Section 5, plus a general section. Security 946 is not fully elaborated since substantial guidance already exists in 947 [RFC3552]. 949 7.1. Data Minimization 951 a. Identifiers. What identifiers does the protocol use for 952 distinguishing initiators of communications? Does the protocol 953 use identifiers that allow different protocol interactions to be 954 correlated? What identifiers could be omitted or be made less 955 identifying while still fulfilling the protocol's goals? 957 b. Data. What information does the protocol expose about 958 individuals, their devices, and/or their device usage (other than 959 the identifiers discussed in (a))? To what extent is this 960 information linked to the identities of the individuals? How does 961 the protocol combine personal data with the identifiers discussed 962 in (a)? 964 c. Observers. Which information discussed in (a) and (b) is 965 exposed to each other protocol entity (i.e., recipients, 966 intermediaries, and enablers)? Are there ways for protocol 967 implementers to choose to limit the information shared with each 968 entity? Are there operational controls available to limit the 969 information shared with each entity? 971 d. Fingerprinting. 
In many cases the specific ordering and/or 972 occurrences of information elements in a protocol allow users, 973 devices, or software using the protocol to be fingerprinted. Is 974 this protocol vulnerable to fingerprinting? If so, how? Can it 975 be designed to reduce or eliminate the vulnerability? If not, why 976 not? 978 e. Persistence of identifiers. What assumptions are made in the 979 protocol design about the lifetime of the identifiers discussed in 980 (a)? Does the protocol allow implementers or users to delete or 981 replace identifiers? How often does the specification recommend 982 to delete or replace identifiers by default? Can the identifiers, 983 along with other state information, be set to automatically 984 expire? 986 f. Correlation. Does the protocol allow for correlation of 987 identifiers? Are there expected ways that information exposed by 988 the protocol will be combined or correlated with information 989 obtained outside the protocol? How will such combination or 990 correlation facilitate fingerprinting of a user, device, or 991 application? Are there expected combinations or correlations with 992 outside data that will make users of the protocol more 993 identifiable? 995 g. Retention. Does the protocol or its anticipated uses require 996 that the information discussed in (a) or (b) be retained by 997 recipients, intermediaries, or enablers? If so, why? Is the 998 retention expected to be persistent or temporary? 1000 7.2. User Participation 1002 a. User control. What controls or consent mechanisms does the 1003 protocol define or require before personal data or identifiers are 1004 shared or exposed via the protocol? If no such mechanisms or 1005 controls are specified, is it expected that control and consent 1006 will be handled outside of the protocol? 1008 b. Control over sharing with individual recipients. Does the 1009 protocol provide ways for initiators to share different 1010 information with different recipients? If not, are there 1011 mechanisms that exist outside of the protocol to provide 1012 initiators with such control? 1014 c. Control over sharing with intermediaries. Does the protocol 1015 provide ways for initiators to limit which information is shared 1016 with intermediaries? If not, are there mechanisms that exist 1017 outside of the protocol to provide users with such control? Is it 1018 expected that users will have relationships that govern the use of 1019 the information (contractual or otherwise) with those who operate 1020 these intermediaries? 1021 d. Preference expression. Does the protocol provide ways for 1022 initiators to express individuals' preferences to recipients or 1023 intermediaries with regard to the collection, use, or disclosure 1024 of their personal data? 1026 7.3. Security 1028 a. Surveillance. How do the protocol's security considerations 1029 prevent surveillance, including eavesdropping and traffic 1030 analysis? 1032 b. Stored data compromise. How do the protocol's security 1033 considerations prevent or mitigate stored data compromise? 1035 c. Intrusion. How do the protocol's security considerations 1036 prevent or mitigate intrusion, including denial-of-service attacks 1037 and unsolicited communications more generally? 1039 d. Misattribution. How do the protocol's mechanisms for 1040 identifying and/or authenticating individuals prevent 1041 misattribution? 1043 7.4. General 1045 a. Trade-offs. 
Does the protocol make trade-offs between privacy 1046 and usability, privacy and efficiency, privacy and 1047 implementability, or privacy and other design goals? Describe the 1048 trade-offs and the rationale for the design chosen. 1050 b. Defaults. If the protocol can be operated in multiple modes 1051 or with multiple configurable options, does the default mode or 1052 option minimize the amount, identifiability, and persistence of 1053 the data and identifiers exposed by the protocol? Does the 1054 default mode or option maximize the opportunity for user 1055 participation? Does it provide the strictest security features of 1056 all the modes/options? If any of these answers are no, explain 1057 why less protective defaults were chosen. 1059 8. Example 1061 The following section gives an example of the threat analysis and 1062 threat mitigation recommended by this document. It covers a 1063 particularly difficult application protocol, presence, to try to 1064 demonstrate these principles on an architecture that is vulnerable to 1065 many of the threats described above. This text is not intended as an 1066 example of a Privacy Considerations section that might appear in an 1067 IETF specification, but rather as an example of the thinking that 1068 should go into the design of a protocol when considering privacy as a 1069 first principle. 1071 A presence service, as defined in the abstract in [RFC2778], allows 1072 users of a communications service to monitor one another's 1073 availability and disposition in order to make decisions about 1074 communicating. Presence information is highly dynamic, and generally 1075 characterizes whether a user is online or offline, busy or idle, away 1076 from communications devices or nearby, and the like. Necessarily, 1077 this information has certain privacy implications, and from the start 1078 the IETF approached this work with the aim to provide users with the 1079 controls to determine how their presence information would be shared. 1080 The Common Profile for Presence (CPP) [RFC3859] defines a set of 1081 logical operations for delivery of presence information. This 1082 abstract model is applicable to multiple presence systems. The SIP- 1083 based SIMPLE presence system [RFC3261] uses CPP as its baseline 1084 architecture, and the presence operations in the Extensible Messaging 1085 and Presence Protocol (XMPP) have also been mapped to CPP [RFC3922]. 1087 The fundamental architecture defined in RFC 2778 and RFC 3859 is a 1088 mediated one. Clients (presentities in RFC 2778 terms) publish their 1089 presence information to presence servers, which in turn distribute 1090 information to authorized watchers. Presence servers thus retain 1091 presence information for an interval of time, until it either changes 1092 or expires, so that it can be revealed to authorized watchers upon 1093 request. This architecture mirrors existing pre-standard deployment 1094 models. The integration of an explicit authorization mechanism into 1095 the presence architecture has been widely successful in involving the 1096 end users in the decision making process before sharing information. 1097 Nearly all presence systems deployed today provide such a mechanism, 1098 typically through a reciprocal authorization system by which a pair 1099 of users, when they agree to be "buddies," consent to divulge their 1100 presence information to one another. Buddylists are managed by 1101 servers but controlled by end users. 
1105 From the perspective of privacy design, however, the classical presence 1106 architecture represents nearly a worst-case scenario. In terms of 1107 data minimization, presentities share their sensitive information 1108 with presence services, and while services only share this presence 1109 information with watchers authorized by the user, no technical 1110 mechanism prevents those watchers from relaying presence to further 1111 third parties. Any of these entities could conceivably log or retain 1112 presence information indefinitely. The sensitivity cannot be 1113 mitigated by rendering the user anonymous, as it is indeed the 1114 purpose of the system to facilitate communications between users who 1115 know one another. The identifiers employed by users are long-lived 1116 and often contain personal information, including personal names and 1117 the domains of service providers. While users do participate in the 1118 construction of buddy lists and blacklists, they do so with little 1119 prospect for accountability: the user effectively throws their 1120 presence information over the wall to a presence server that in turn 1121 distributes the information to watchers. Users typically have no way 1122 to verify that presence is being distributed only to authorized 1123 watchers, especially as it is the server that authenticates watchers, 1124 not the end user. Connections between the server and all publishers 1125 and consumers of presence data are moreover an attractive target for 1126 eavesdroppers, and require strong confidentiality mechanisms, though 1127 again the end user has no way to verify what mechanisms are in place 1128 between the presence server and a watcher. 1130 Moreover, the sensitivity of presence information is not limited to 1131 the disposition and capability to communicate. Capabilities can 1132 reveal the type of device that a user employs, for example, and since 1133 multiple devices can publish the same user's presence, there are 1134 significant risks of allowing attackers to correlate user devices. 1135 An important extension to presence was developed to enable 1136 support for location sharing. The effort to standardize protocols 1137 for systems sharing geolocation was started in the GEOPRIV working 1138 group. During the initial requirements and privacy threat analysis 1139 in the process of chartering the working group, it became clear that 1140 the system would require an underlying communication mechanism 1141 supporting user consent to share location information. The 1142 resemblance of these requirements to the presence framework was 1143 quickly recognized, and this design decision was documented in 1144 [RFC4079]. Location information thus mingles with other presence 1145 information available through the system to intermediaries and to 1146 authorized watchers. 1148 Privacy concerns about presence information largely arise due to the 1149 built-in mediation of the presence architecture. The need for a 1150 presence server is motivated by two primary design requirements of 1151 presence: in the first place, the server can respond with an 1152 "offline" indication when the user is not online; in the second 1153 place, the server can compose presence information published by 1154 different devices under the user's control. Additionally, to 1155 preserve the use of URIs as identifiers for entities, some service 1156 must operate a host with the domain name appearing in a presence URI, 1157 and in practical terms no commercial presence architecture would 1158 force end users to own and operate their own domain names. Many end 1159 users of applications like presence are behind NATs or firewalls, and 1160 effectively cannot receive direct connections from the Internet - the 1161 persistent bidirectional channel these clients open and maintain with 1162 a presence server is essential to the operation of the protocol.
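The composition requirement mentioned above can be made concrete with a small sketch. The following Python fragment is purely illustrative: the status values, their ranking, and the function name are assumptions of this example rather than anything defined by the presence specifications.

   # Illustrative composition of presence published by several devices
   # belonging to one user; the ranking of states is assumed here.

   AVAILABILITY_RANK = {"open": 2, "idle": 1, "closed": 0}

   def compose(device_reports):
       """Return the most available status among a user's devices.

       device_reports maps a device identifier to its last published
       status, e.g. {"phone": "closed", "laptop": "open"}.
       """
       if not device_reports:
           return "closed"   # nothing published: report the user as offline
       return max(device_reports.values(),
                  key=lambda status: AVAILABILITY_RANK.get(status, 0))

   # The laptop is online even though the phone is not, so the composed
   # presence revealed to authorized watchers is "open".
   assert compose({"phone": "closed", "laptop": "open"}) == "open"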
1164 One must first ask whether the trade-off of mediation for presence is 1165 worth it. Does a server need to be in the middle of all publications 1166 of presence information? It might seem that end-to-end encryption of 1167 the presence information could solve many of these problems. A 1168 presentity could encrypt the presence information with the public key 1169 of a watcher, and only then send the presence information through the 1170 server. The IETF defined an object format for presence information 1171 called the Presence Information Data Format (PIDF), which for the 1172 purposes of conveying location information was extended to the PIDF 1173 Location Object (PIDF-LO) - these XML objects were designed to 1174 accommodate an encrypted wrapper. Encrypting this data would have 1175 the added benefit of preventing stored cleartext presence information 1176 from being seized by an attacker who manages to compromise a presence 1177 server. This proposal, however, quickly runs into usability 1178 problems. Discovering the public keys of watchers is the first 1179 difficulty, one that few Internet protocols have addressed 1180 successfully. This solution would then require the presentity to 1181 publish one encrypted copy of its presence information per authorized 1182 watcher to the presence service, regardless of whether or not a 1183 watcher is actively seeking presence information - for a presentity 1184 with many watchers, this may place an unacceptable burden on the 1185 presence server, especially given the dynamism of presence 1186 information. Finally, it prevents the server from composing presence 1187 information reported by multiple devices under the same user's 1188 control. On the whole, these difficulties render object encryption 1189 of presence information a doubtful prospect. 1191 Some protocols that provide presence information, such as SIP, can 1192 operate intermediaries in a redirecting mode, rather than a 1193 publishing or proxying mode. Instead of sending presence information 1194 through the server, in other words, these protocols can merely 1195 redirect watchers to the presentity, and then presence information 1196 could pass directly and securely from the presentity to the watcher. 1197 It is worth noting that this would disclose the IP address of the 1198 presentity to the watcher, which carries its own set of risks. With such a 1199 direct connection, however, the presentity can decide exactly what information it would 1200 like to share with the watcher in question, can authenticate the 1201 watcher itself with whatever strength of credential it chooses, and, 1202 with end-to-end encryption, can reduce the likelihood of any 1203 eavesdropping. In a redirection architecture, a presence server 1204 could still provide the necessary "offline" indication, without 1205 having to observe and forward all presence information 1206 itself. This redirection mechanism is more promising than encryption, but it also 1207 suffers from significant difficulties. It too does not provide for 1208 composition of presence information from multiple devices - it in 1209 fact forces the watcher to perform this composition itself. The 1210 largest single impediment to this approach is, however, the difficulty 1211 of creating end-to-end connections between the presentity's device(s) 1212 and a watcher, as some or all of these endpoints may be behind NATs 1213 or firewalls that prevent peer-to-peer connections. While there are 1214 potential solutions for this problem, like STUN and TURN, they add 1215 complexity to the overall system.
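To make the contrast between the publishing and redirecting modes concrete, the following Python sketch shows what a redirect-mode server handles: it never sees the presence document, only a direct contact address, yet it can still answer for offline users. This is a toy model, not SIP; the class and method names are invented for illustration.

   # Toy model of a redirect-mode presence server.  It never stores or
   # forwards presence documents; it only knows a direct contact for
   # each registered presentity.  Not SIP syntax; names are invented.

   class RedirectingServer:
       def __init__(self):
           self.contacts = {}   # presentity URI -> direct contact address

       def register(self, presentity, contact):
           self.contacts[presentity] = contact

       def subscribe(self, watcher, presentity):
           contact = self.contacts.get(presentity)
           if contact is None:
               # The server can still answer on behalf of offline users.
               return {"status": "offline"}
           # The watcher is pointed at the presentity, which can then
           # authenticate the watcher itself and decide exactly what to
           # share, possibly over an end-to-end encrypted connection.
           return {"status": "redirect", "contact": contact}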
1217 Consequently, mediation is a difficult feature of the presence 1218 architecture to remove, and due especially to the requirement for 1219 composition it is hard to minimize the data shared with 1220 intermediaries. Control over sharing with intermediaries must 1221 therefore come from some other explicit component of the 1222 architecture. As such, the presence work in the IETF focused on 1223 improving user participation in the activities of the presence 1224 server. This work began in the GEOPRIV working group, with controls 1225 on location privacy, as the location of users is perceived as 1226 especially sensitive. With the aim of meeting the privacy 1227 requirements defined in [RFC2779], a set of usage indications, such as 1228 whether retransmission is allowed or when the retention period 1229 expires, has been added to PIDF-LO; these indications always travel with the location 1230 information itself. These privacy preferences apply not only to the 1231 intermediaries that store and forward presence information, but also 1232 to the watchers who consume it. 1234 This approach very much follows the spirit of Creative Commons [CC], 1235 namely the use of a limited number of conditions (such as 'Share 1236 Alike' [CC-SA]). Unlike Creative Commons, the GEOPRIV working group 1237 did not, however, initiate work to produce legal language or to 1238 design graphical icons, since this would fall outside the scope of the 1239 IETF. In particular, the GEOPRIV rules state a preference on the 1240 retention and retransmission of location information; while GEOPRIV 1241 cannot force any entity receiving a PIDF-LO object to abide by those 1242 preferences, if users lack the ability to express them at all, we can 1243 guarantee their preferences will not be honored. 1245 The retention and retransmission elements were envisioned as the most 1246 essential examples of preference expression in sharing presence. The 1247 PIDF object was designed for extensibility, and the rulesets created 1248 for PIDF-LO can also be extended to provide new expressions of user 1249 preference. Not all user preference information should be bound into 1250 a particular PIDF object, however - many forms of access control 1251 policy assumed by the presence architecture need to be provisioned in 1252 the presence server by some interface with the user. This 1253 requirement eventually triggered the standardization of a general 1254 access control policy language called the Common Policy framework (defined in 1255 [RFC4745]). This language allows one to express ways to 1256 control the distribution of information as simple rules, consisting of conditions, 1257 actions, and transformations, expressed in an XML format. 1258 Common Policy itself is an abstract format that needs to be 1259 instantiated: two examples can be found in the Presence 1260 Authorization Rules [RFC5025] and the Geolocation Policy 1261 [I-D.ietf-geopriv-policy]. The former provides additional 1262 expressiveness for presence-based systems, while the latter defines 1263 syntax and semantics for location-based conditions and 1264 transformations.
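A rough sense of the rule model can be conveyed in a few lines of Python. The sketch below mimics only the conditions/actions/transformations structure of Common Policy; it is not the XML schema of [RFC4745], and the particular fields, such as the retransmission and retention indications, are simplified assumptions of this example.

   # Simplified, non-normative sketch of a Common Policy style rule.
   # Conditions decide whether the rule applies to a request;
   # transformations describe what may be revealed and under which
   # usage indications.  This is not the RFC 4745 XML schema.

   from datetime import datetime, timezone

   rule = {
       "conditions": {
           "identity": {"sip:bob@example.net"},   # authorized watcher(s)
           "valid_until": datetime(2013, 2, 1, tzinfo=timezone.utc),
       },
       "actions": {},
       "transformations": {
           "retransmission_allowed": False,
           "retention_expires": datetime(2013, 1, 19, tzinfo=timezone.utc),
       },
   }

   def evaluate(rule, watcher, now):
       cond = rule["conditions"]
       if watcher not in cond["identity"] or now > cond["valid_until"]:
           return None                      # rule does not apply; reveal nothing
       return rule["transformations"]       # usage indications travel with the data

   grants = evaluate(rule, "sip:bob@example.net",
                     datetime(2013, 1, 12, tzinfo=timezone.utc))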
1266 Ultimately, the privacy work on presence represents a compromise 1267 between privacy principles and the needs of the architecture and 1268 marketplace. While it was not feasible to remove intermediaries from 1269 the architecture entirely, nor to prevent their access to presence 1270 information, the IETF did provide a way for users to express their 1271 preferences and provision their controls at the presence service. We 1272 have not had great success in the implementation space with privacy 1273 mechanisms thus far, but by documenting and acknowledging the 1274 limitations of these mechanisms, the designers were able to provide 1275 implementers and end users with an informed perspective on the 1276 privacy properties of the IETF's presence protocols. 1278 9. Security Considerations 1280 This document describes privacy aspects that protocol designers 1281 should consider in addition to regular security analysis. 1283 10. IANA Considerations 1285 This document does not require actions by IANA. 1287 11. Acknowledgements 1289 We would like to thank Christine Runnegar for her extensive helpful 1290 review comments. 1292 We would like to thank Scott Brim, Kasey Chappelle, Marc Linsner, 1293 Bryan McLaughlin, Nick Mathewson, Eric Rescorla, Scott Bradner, Nat 1294 Sakimura, Bjoern Hoehrmann, David Singer, Dean Willis, Christine 1295 Runnegar, Lucy Lynch, Trent Adams, Mark Lizar, Martin Thomson, Josh 1296 Howlett, Mischa Tuffield, S. Moonesamy, Zhou Sujing, Claudia Diaz, 1297 Leif Johansson, Jeff Hodges, Stephen Farrell, Steven Johnston, Cullen 1298 Jennings, Ted Hardie, Dave Thaler, and Klaas Wierenga. 1300 Finally, we would like to thank the participants for the feedback 1301 they provided during the December 2010 Internet Privacy workshop co- 1302 organized by MIT, ISOC, W3C, and the IAB. 1304 12. IAB Members at the Time of Approval 1306 Bernard Aboba 1308 Jari Arkko 1310 Marc Blanchet 1312 Ross Callon 1314 Alissa Cooper 1316 Spencer Dawkins 1318 Joel Halpern 1320 Russ Housley 1322 David Kessens 1324 Danny McPherson 1326 Jon Peterson 1328 Dave Thaler 1330 Hannes Tschofenig 1332 13. Informative References 1334 [CC] Creative Commons, "Creative Commons", 2012. 1336 [CC-SA] Creative Commons, "Share Alike", 2012. 1338 [CoE] Council of Europe, "Recommendation CM/Rec(2010)13 of the 1339 Committee of Ministers to member states on the protection 1340 of individuals with regard to automatic processing of 1341 personal data in the context of profiling", available at 1342 https://wcd.coe.int/ViewDoc.jsp?Ref=CM/Rec%282010%2913 1343 (November 2010), 1344 2010. 1346 [EFF] Electronic Frontier Foundation, "Panopticlick", 2011. 1348 [FIPs] Gellman, B., "Fair Information Practices: A Basic 1349 History", 2012. 1351 [I-D.ietf-geopriv-policy] 1352 Schulzrinne, H., Tschofenig, H., Cuellar, J., Polk, J., 1353 Morris, J., and M. Thomson, "Geolocation Policy: A 1354 Document Format for Expressing Privacy Preferences for 1355 Location Information", draft-ietf-geopriv-policy-27 (work 1356 in progress), August 2012.
1358 [OECD] Organization for Economic Co-operation and Development, 1359 "OECD Guidelines on the Protection of Privacy and 1360 Transborder Flows of Personal Data", available at 1361 http://www.oecd.org/EN/document/ 1362 0,,EN-document-0-nodirectorate-no-24-10255-0,00.html 1363 (September 2010), 1980. 1365 [PbD] Office of the Information and Privacy Commissioner, 1366 Ontario, Canada, "Privacy by Design", 2011. 1368 [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., 1369 Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext 1370 Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. 1372 [RFC2778] Day, M., Rosenberg, J., and H. Sugano, "A Model for 1373 Presence and Instant Messaging", RFC 2778, February 2000. 1375 [RFC2779] Day, M., Aggarwal, S., Mohr, G., and J. Vincent, "Instant 1376 Messaging / Presence Protocol Requirements", RFC 2779, 1377 February 2000. 1379 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 1380 A., Peterson, J., Sparks, R., Handley, M., and E. 1381 Schooler, "SIP: Session Initiation Protocol", RFC 3261, 1382 June 2002. 1384 [RFC3325] Jennings, C., Peterson, J., and M. Watson, "Private 1385 Extensions to the Session Initiation Protocol (SIP) for 1386 Asserted Identity within Trusted Networks", RFC 3325, 1387 November 2002. 1389 [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC 1390 Text on Security Considerations", BCP 72, RFC 3552, 1391 July 2003. 1393 [RFC3748] Aboba, B., Blunk, L., Vollbrecht, J., Carlson, J., and H. 1394 Levkowetz, "Extensible Authentication Protocol (EAP)", 1395 RFC 3748, June 2004. 1397 [RFC3859] Peterson, J., "Common Profile for Presence (CPP)", 1398 RFC 3859, August 2004. 1400 [RFC3922] Saint-Andre, P., "Mapping the Extensible Messaging and 1401 Presence Protocol (XMPP) to Common Presence and Instant 1402 Messaging (CPIM)", RFC 3922, October 2004. 1404 [RFC4017] Stanley, D., Walker, J., and B. Aboba, "Extensible 1405 Authentication Protocol (EAP) Method Requirements for 1406 Wireless LANs", RFC 4017, March 2005. 1408 [RFC4079] Peterson, J., "A Presence Architecture for the 1409 Distribution of GEOPRIV Location Objects", RFC 4079, 1410 July 2005. 1412 [RFC4101] Rescorla, E. and IAB, "Writing Protocol Models", RFC 4101, 1413 June 2005. 1415 [RFC4187] Arkko, J. and H. Haverinen, "Extensible Authentication 1416 Protocol Method for 3rd Generation Authentication and Key 1417 Agreement (EAP-AKA)", RFC 4187, January 2006. 1419 [RFC4282] Aboba, B., Beadles, M., Arkko, J., and P. Eronen, "The 1420 Network Access Identifier", RFC 4282, December 2005. 1422 [RFC4745] Schulzrinne, H., Tschofenig, H., Morris, J., Cuellar, J., 1423 Polk, J., and J. Rosenberg, "Common Policy: A Document 1424 Format for Expressing Privacy Preferences", RFC 4745, 1425 February 2007. 1427 [RFC4918] Dusseault, L., "HTTP Extensions for Web Distributed 1428 Authoring and Versioning (WebDAV)", RFC 4918, June 2007. 1430 [RFC4949] Shirey, R., "Internet Security Glossary, Version 2", 1431 RFC 4949, August 2007. 1433 [RFC5025] Rosenberg, J., "Presence Authorization Rules", RFC 5025, 1434 December 2007. 1436 [RFC5077] Salowey, J., Zhou, H., Eronen, P., and H. Tschofenig, 1437 "Transport Layer Security (TLS) Session Resumption without 1438 Server-Side State", RFC 5077, January 2008. 1440 [RFC5106] Tschofenig, H., Kroeselberg, D., Pashalidis, A., Ohba, Y., 1441 and F. Bersani, "The Extensible Authentication Protocol- 1442 Internet Key Exchange Protocol version 2 (EAP-IKEv2) 1443 Method", RFC 5106, February 2008. 1445 [RFC5246] Dierks, T. and E.
Rescorla, "The Transport Layer Security 1446 (TLS) Protocol Version 1.2", RFC 5246, August 2008. 1448 [RFC6269] Ford, M., Boucadair, M., Durand, A., Levis, P., and P. 1449 Roberts, "Issues with IP Address Sharing", RFC 6269, 1450 June 2011. 1452 [RFC6280] Barnes, R., Lepinski, M., Cooper, A., Morris, J., 1453 Tschofenig, H., and H. Schulzrinne, "An Architecture for 1454 Location and Location Privacy in Internet Applications", 1455 BCP 160, RFC 6280, July 2011. 1457 [RFC6302] Durand, A., Gashinsky, I., Lee, D., and S. Sheppard, 1458 "Logging Recommendations for Internet-Facing Servers", 1459 BCP 162, RFC 6302, June 2011. 1461 [RFC6350] Perreault, S., "vCard Format Specification", RFC 6350, 1462 August 2011. 1464 [RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the 1465 Opus Audio Codec", RFC 6716, September 2012. 1467 [Solove] Solove, D., "Understanding Privacy", 2010. 1469 [Tor] The Tor Project, Inc., "Tor", 2011. 1471 [Westin] Kumaraguru, P. and L. Cranor, "Privacy Indexes: A Survey 1472 of Westin's Studies", 2005. 1474 Authors' Addresses 1476 Alissa Cooper 1477 CDT 1478 1634 Eye St. NW, Suite 1100 1479 Washington, DC 20006 1480 US 1482 Phone: +1-202-637-9800 1483 Email: acooper@cdt.org 1484 URI: http://www.cdt.org/ 1486 Hannes Tschofenig 1487 Nokia Siemens Networks 1488 Linnoitustie 6 1489 Espoo 02600 1490 Finland 1492 Phone: +358 (50) 4871445 1493 Email: Hannes.Tschofenig@gmx.net 1494 URI: http://www.tschofenig.priv.at 1496 Bernard Aboba 1497 Microsoft Corporation 1498 One Microsoft Way 1499 Redmond, WA 98052 1500 US 1502 Email: bernarda@microsoft.com 1504 Jon Peterson 1505 NeuStar, Inc. 1506 1800 Sutter St Suite 570 1507 Concord, CA 94520 1508 US 1510 Email: jon.peterson@neustar.biz 1512 John B. Morris, Jr. 1514 Email: ietf@jmorris.org 1515 Marit Hansen 1516 ULD Kiel 1518 Email: marit.hansen@datenschutzzentrum.de 1520 Rhys Smith 1521 JANET(UK) 1523 Email: rhys.smith@ja.net