2 Network Working Group A. Cooper 3 Internet-Draft CDT 4 Intended status: Informational H. Tschofenig 5 Expires: November 23, 2013 Nokia Siemens Networks 6 B. Aboba 7 Microsoft Corporation 8 J. Peterson 9 NeuStar, Inc. 10 J. Morris 11 M. Hansen 12 ULD Kiel 13 R. Smith 14 Janet 15 May 22, 2013 17 Privacy Considerations for Internet Protocols 18 draft-iab-privacy-considerations-09.txt 20 Abstract 22 This document offers guidance for developing privacy considerations 23 for inclusion in protocol specifications. It aims to make designers, 24 implementers, and users of Internet protocols aware of privacy- 25 related design choices. It suggests that whether any individual RFC 26 warrants a specific privacy considerations section will depend on the 27 document's content. 29 Status of This Memo 31 This Internet-Draft is submitted in full conformance with the 32 provisions of BCP 78 and BCP 79. 34 Internet-Drafts are working documents of the Internet Engineering 35 Task Force (IETF). Note that other groups may also distribute 36 working documents as Internet-Drafts. The list of current Internet- 37 Drafts is at http://datatracker.ietf.org/drafts/current/. 39 Internet-Drafts are draft documents valid for a maximum of six months 40 and may be updated, replaced, or obsoleted by other documents at any 41 time. It is inappropriate to use Internet-Drafts as reference 42 material or to cite them other than as "work in progress." 44 This Internet-Draft will expire on November 23, 2013. 46 Copyright Notice 47 Copyright (c) 2013 IETF Trust and the persons identified as the 48 document authors. All rights reserved. 50 This document is subject to BCP 78 and the IETF Trust's Legal 51 Provisions Relating to IETF Documents 52 (http://trustee.ietf.org/license-info) in effect on the date of 53 publication of this document.
Please review these documents 54 carefully, as they describe your rights and restrictions with respect 55 to this document. Code Components extracted from this document must 56 include Simplified BSD License text as described in Section 4.e of 57 the Trust Legal Provisions and are provided without warranty as 58 described in the Simplified BSD License. 60 Table of Contents 62 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 63 2. Scope of Privacy Implications of Internet Protocols . . . . . 4 64 3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 65 3.1. Entities . . . . . . . . . . . . . . . . . . . . . . . . 5 66 3.2. Data and Analysis . . . . . . . . . . . . . . . . . . . . 6 67 3.3. Identifiability . . . . . . . . . . . . . . . . . . . . . 7 68 4. Communications Model . . . . . . . . . . . . . . . . . . . . 9 69 5. Privacy Threats . . . . . . . . . . . . . . . . . . . . . . . 10 70 5.1. Combined Security-Privacy Threats . . . . . . . . . . . . 11 71 5.1.1. Surveillance . . . . . . . . . . . . . . . . . . . . 11 72 5.1.2. Stored Data Compromise . . . . . . . . . . . . . . . 12 73 5.1.3. Intrusion . . . . . . . . . . . . . . . . . . . . . . 13 74 5.1.4. Misattribution . . . . . . . . . . . . . . . . . . . 13 75 5.2. Privacy-Specific Threats . . . . . . . . . . . . . . . . 13 76 5.2.1. Correlation . . . . . . . . . . . . . . . . . . . . . 13 77 5.2.2. Identification . . . . . . . . . . . . . . . . . . . 14 78 5.2.3. Secondary Use . . . . . . . . . . . . . . . . . . . . 15 79 5.2.4. Disclosure . . . . . . . . . . . . . . . . . . . . . 15 80 5.2.5. Exclusion . . . . . . . . . . . . . . . . . . . . . . 16 81 6. Threat Mitigations . . . . . . . . . . . . . . . . . . . . . 16 82 6.1. Data Minimization . . . . . . . . . . . . . . . . . . . . 17 83 6.1.1. Anonymity . . . . . . . . . . . . . . . . . . . . . . 17 84 6.1.2. Pseudonymity . . . . . . . . . . . . . . . . . . . . 18 85 6.1.3. Identity Confidentiality . . . . . . . . . . . . . . 18 86 6.1.4. Data Minimization within Identity Management . . . . 19 87 6.2. User Participation . . . . . . . . . . . . . . . . . . . 20 88 6.3. Security . . . . . . . . . . . . . . . . . . . . . . . . 20 89 7. Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . 22 90 7.1. Data Minimization . . . . . . . . . . . . . . . . . . . . 22 91 7.2. User Participation . . . . . . . . . . . . . . . . . . . 23 92 7.3. Security . . . . . . . . . . . . . . . . . . . . . . . . 24 93 7.4. General . . . . . . . . . . . . . . . . . . . . . . . . . 24 94 8. Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 95 9. Security Considerations . . . . . . . . . . . . . . . . . . . 29 96 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 29 97 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 29 98 12. IAB Members at the Time of Approval . . . . . . . . . . . . . 30 99 13. Informative References . . . . . . . . . . . . . . . . . . . 30 100 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 33 102 1. Introduction 104 [RFC3552] provides detailed guidance to protocol designers about both 105 how to consider security as part of protocol design and how to inform 106 readers of protocol specifications about security issues. This 107 document intends to provide a similar set of guidance for considering 108 privacy in protocol design. 110 Privacy is a complicated concept with a rich history that spans many 111 disciplines. 
With regard to data, often it is a concept applied to 112 "personal data," commonly defined as information relating to an 113 identified or identifiable individual. Many sets of privacy 114 principles and privacy design frameworks have been developed in 115 different forums over the years. These include the Fair Information 116 Practices [FIPs], a baseline set of privacy protections pertaining to 117 the collection and use of personal data (often based on the 118 principles established in [OECD], for example), and the Privacy by 119 Design concept, which provides high-level privacy guidance for 120 systems design (see [PbD] for one example). The guidance provided in 121 this document is inspired by this prior work, but it aims to be more 122 concrete, pointing protocol designers to specific engineering choices 123 that can impact the privacy of the individuals who make use of 124 Internet protocols. 126 Different people have radically different conceptions of what privacy 127 means, both in general, and as it relates to them personally 128 [Westin]. Furthermore, privacy as a legal concept is understood 129 differently in different jurisdictions. The guidance provided in 130 this document is generic and can be used to inform the design of any 131 protocol to be used anywhere in the world, without reference to 132 specific legal frameworks. 134 Whether any individual document warrants a specific privacy 135 considerations section will depend on the document's content. 136 Documents whose entire focus is privacy may not merit a separate 137 section (for example, "Private Extensions to the Session Initiation 138 Protocol (SIP) for Asserted Identity within Trusted Networks" 139 [RFC3325]). For certain specifications, privacy considerations are a 140 subset of security considerations and can be discussed explicitly in 141 the security considerations section. Some documents will not require 142 discussion of privacy considerations (for example, "Definition of the 143 Opus Audio Codec" [RFC6716]). The guidance provided here can and 144 should be used to assess the privacy considerations of protocol, 145 architectural, and operational specifications and to decide whether 146 those considerations are to be documented in a stand-alone section, 147 within the security considerations section, or throughout the 148 document. The guidance is meant to help the thought process of 149 privacy analysis; it does not provide specific directions for how 150 to write a privacy considerations section. 152 This document is organized as follows. Section 2 describes the 153 extent to which the guidance offered is applicable within the IETF 154 and within the larger Internet community. Section 3 explains the 155 terminology used in this document. Section 4 reviews typical 156 communications architectures to understand at which points there may 157 be privacy threats. Section 5 discusses threats to privacy as they 158 apply to Internet protocols. Section 6 outlines mitigations of those 159 threats. Section 7 provides the guidelines for analyzing and 160 documenting privacy considerations within IETF specifications. 161 Section 8 examines the privacy characteristics of an IETF protocol to 162 demonstrate the use of the guidance framework. 164 2.
Scope of Privacy Implications of Internet Protocols 166 Internet protocols are often built flexibly, making them useful in a 167 variety of architectures, contexts, and deployment scenarios without 168 requiring significant interdependency between disparately designed 169 components. Although protocol designers often have a particular 170 target architecture or set of architectures in mind at design time, 171 it is not uncommon for architectural frameworks to develop later, 172 after implementations exist and have been deployed in combination 173 with other protocols or components to form complete systems. 175 As a consequence, the extent to which protocol designers can foresee 176 all of the privacy implications of a particular protocol at design 177 time is limited. An individual protocol may be relatively benign on 178 its own, and it may make use of privacy and security features at 179 lower layers of the protocol stack (Internet Protocol Security, 180 Transport Layer Security, and so forth) to mitigate the risk of 181 attack. But when deployed within a larger system or used in a way 182 not envisioned at design time, its use may create new privacy risks. 183 Protocols are often implemented and deployed long after design time 184 by different people than those who did the protocol design. The 185 guidelines in Section 7 ask protocol designers to consider how their 186 protocols are expected to interact with systems and information that 187 exist outside the protocol bounds, but not to imagine every possible 188 deployment scenario. 190 Furthermore, in many cases the privacy properties of a system are 191 dependent upon the complete system design where various protocols are 192 combined together to form a product solution; the implementation, 193 which includes the user interface design; and operational deployment 194 practices, including default privacy settings and security processes 195 of the company doing the deployment. These details are specific to 196 particular instantiations and generally outside the scope of the work 197 conducted in the IETF. The guidance provided here may be useful in 198 making choices about these details, but its primary aim is to assist 199 with the design, implementation, and operation of protocols. 201 Transparency of data collection and use -- often effectuated through 202 user interface design -- is normally relied on (whether rightly or 203 wrongly) as a key factor in determining the privacy impact of a 204 system. Although most IETF activities do not involve standardizing 205 user interfaces or user-facing communications, in some cases 206 understanding expected user interactions can be important for 207 protocol design. Unexpected user behavior may have an adverse impact 208 on security and/or privacy. 210 In sum, privacy issues, even those related to protocol development, 211 go beyond the technical guidance discussed herein. As an example, 212 consider HTTP [RFC2616], which was designed to allow the exchange of 213 arbitrary data. A complete analysis of the privacy considerations 214 for uses of HTTP might include what type of data is exchanged, how 215 this data is stored, and how it is processed. Hence the analysis for 216 an individual's static personal web page would be different than the 217 use of HTTP for exchanging health records. 
A protocol designer 218 working on HTTP extensions (such as WebDAV [RFC4918]) is not expected 219 to describe the privacy risks derived from all possible usage 220 scenarios, but rather the privacy properties specific to the 221 extensions and any particular uses of the extensions that are 222 expected and foreseen at design time. 224 3. Terminology 226 This section defines basic terms used in this document, with 227 references to pre-existing definitions as appropriate. As in 228 [RFC4949], each entry is preceded by a dollar sign ($) and a space 229 for automated searching. Note that this document does not try to 230 attempt to define the term 'privacy' with a brief definition. 231 Instead, privacy is the sum of what is contained in this document. 232 We therefore follow the approach taken by [RFC3552]. Examples of 233 several different brief definitions are provided in [RFC4949]. 235 3.1. Entities 237 Several of these terms are further elaborated in Section 4. 239 $ Attacker: An entity that works against one or more privacy 240 protection goals. Unlike observers, attackers' behavior is 241 unauthorized. 243 $ Eavesdropper: A type of attacker that passively observes an 244 initiator's communications without the initiator's knowledge or 245 authorization. See [RFC4949]. 247 $ Enabler: A protocol entity that facilitates communication between 248 an initiator and a recipient without being directly in the 249 communications path. 251 $ Individual: A human being. 253 $ Initiator: A protocol entity that initiates communications with a 254 recipient. 256 $ Intermediary: A protocol entity that sits between the initiator 257 and the recipient and is necessary for the initiator and recipient 258 to communicate. Unlike an eavesdropper, an intermediary is an 259 entity that is part of the communication architecture, and 260 therefore at least tacitly authorized. For example, a SIP proxy 261 is an intermediary in the SIP architecture. 263 $ Observer: An entity that is able to observe and collect 264 information from communications, potentially posing privacy 265 threats depending on the context. As defined in this document, 266 initiators, recipients, intermediaries, and enablers can all be 267 observers. Observers are distinguished from eavesdroppers by 268 being at least tacitly authorized. 270 $ Recipient: A protocol entity that receives communications from an 271 initiator. 273 3.2. Data and Analysis 275 $ Attack: An intentional act by which an entity attempts to violate 276 an individual's privacy. See [RFC4949]. 278 $ Correlation: The combination of various pieces of information 279 that relate to an individual or that obtain that characteristic 280 when combined. 282 $ Fingerprint: A set of information elements that identifies a 283 device or application instance. 285 $ Fingerprinting: The process of an observer or attacker uniquely 286 identifying (with a sufficiently high probability) a device or 287 application instance based on multiple information elements 288 communicated to the observer or attacker. See [EFF]. 290 $ Item of Interest (IOI): Any data item that an observer or 291 attacker might be interested in. This includes attributes, 292 identifiers, identities, communications content, and the fact that 293 a communication interaction has taken place. 295 $ Personal Data: Any information relating to an individual who can 296 be identified, directly or indirectly. 298 $ (Protocol) Interaction: A unit of communication within a 299 particular protocol. 
A single interaction may be comprised of a 300 single message between an initiator and recipient or multiple 301 messages, depending on the protocol. 303 $ Traffic Analysis: The inference of information from observation 304 of traffic flows (presence, absence, amount, direction, timing, 305 packet size, packet composition, and/or frequency), even if flows 306 are encrypted. See [RFC4949]. 308 $ Undetectability: The inability of an observer or attacker to 309 sufficiently distinguish whether an item of interest exists or 310 not. 312 $ Unlinkability: Within a particular set of information, the 313 inability of an observer or attacker to distinguish whether two 314 items of interest are related or not (with a high enough degree of 315 probability to be useful to the observer or attacker). 317 3.3. Identifiability 319 $ Anonymity: The state of being anonymous. 321 $ Anonymity Set: A set of individuals that have the same 322 attributes, making them indistinguishable from each other from the 323 perspective of a particular attacker or observer. 325 $ Anonymous: A state of an individual in which an observer or 326 attacker cannot identify the individual within a set of other 327 individuals (the anonymity set). 329 $ Attribute: A property of an individual. 331 $ Identifiable: A property in which an individual's identity is 332 capable of being known to an observer or attacker. 334 $ Identifiability: The extent to which an individual is 335 identifiable. 337 $ Identified: A state in which an individual's identity is known. 339 $ Identifier: A data object uniquely referring to a specific 340 identity of a protocol entity or individual in some context. See 341 [RFC4949]. Identifiers can be based upon natural names -- 342 official names, personal names, and/or nicknames -- or can be 343 artificial (for example, x9z32vb). However, identifiers are by 344 definition unique within their context of use, while natural names 345 are often not unique. 347 $ Identification: The linking of information to a particular 348 individual to infer an individual's identity or to allow the 349 inference of an individual's identity in some context. 351 $ Identity: Any subset of an individual's attributes, including 352 names, that identifies the individual within a given context. 353 Individuals usually have multiple identities for use in different 354 contexts. 356 $ Identity Confidentiality: A property of an individual where only 357 the recipient can sufficiently identify the individual within a 358 set of other individuals. This can be a desirable property of 359 authentication protocols. 361 $ Identity Provider: An entity (usually an organization) that is 362 responsible for establishing, maintaining, securing, and vouching 363 for the identities associated with individuals. 365 $ Official Name: A personal name for an individual which is 366 registered in some official context. For example, the name on an 367 individual's birth certificate. Official names are often not 368 unique. 370 $ Personal Name: A natural name for an individual. Personal names 371 are often not unique, and often comprise given names in 372 combination with a family name. An individual may have multiple 373 personal names at any time and over a lifetime, including official 374 names. From a technological perspective, it cannot always be 375 determined whether a given reference to an individual is, or is 376 based upon, the individual's personal name(s) (see Pseudonym). 
378 $ Pseudonym: A name assumed by an individual in some context, 379 unrelated to the individual's personal names known by others in 380 that context, with an intent of not revealing the individual's 381 identities associated with his or her other names. Pseudonyms are 382 often not unique. 384 $ Pseudonymity: The state of being pseudonymous. 386 $ Pseudonymous: A property of an individual in which the individual 387 is identified by a pseudonym. 389 $ Real name: See personal name and official name. 391 $ Relying party: An entity that relies on assertions of 392 individuals' identities from identity providers in order to 393 provide services to individuals. In effect, the relying party 394 delegates aspects of identity management to the identity 395 provider(s). Such delegation requires protocol exchanges, trust, 396 and a common understanding of semantics of information exchanged 397 between the relying party and the identity provider. 399 4. Communications Model 401 To understand attacks in the privacy-harm sense, it is helpful to 402 consider the overall communication architecture and different actors' 403 roles within it. Consider a protocol entity, the "initiator," that 404 initiates communication with some recipient. Privacy analysis is 405 most relevant for protocols with use cases in which the initiator 406 acts on behalf of an individual (or different individuals at 407 different times). It is this individual whose privacy is potentially 408 threatened. (Although in some instances an initiator communicates 409 information about another individual, in which case both of their 410 privacy interests will be implicated.) 412 Communications may be direct between the initiator and the recipient, 413 or they may involve an application-layer intermediary (such as a 414 proxy, cache, or relay) that is necessary for the two parties to 415 communicate. In some cases this intermediary stays in the 416 communication path for the entire duration of the communication and 417 sometimes it is only used for communication establishment, for either 418 inbound or outbound communication. In some cases there may be a 419 series of intermediaries that are traversed. At lower layers, 420 additional entities are involved in packet forwarding that may 421 interfere with privacy protection goals as well. 423 Some communications tasks require multiple protocol interactions with 424 different entities. For example, a request to an HTTP server may be 425 preceded by an interaction between the initiator and an 426 Authentication, Authorization, and Accounting (AAA) server for 427 network access and to a Domain Name System (DNS) server for name 428 resolution. In this case, the HTTP server is the recipient and the 429 other entities are enablers of the initiator-to-recipient 430 communication. Similarly, a single communication with the recipient 431 might generate further protocol interactions between either the 432 initiator or the recipient and other entities, and the roles of the 433 entities might change with each interaction. For example, an HTTP 434 request might trigger interactions with an authentication server or 435 with other resource servers wherein the recipient becomes an 436 initiator in those later interactions. 438 Thus, when conducting privacy analysis of an architecture that 439 involves multiple communications phases, the entities involved may 440 take on different -- or opposing -- roles from a privacy 441 considerations perspective in each phase. 
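As a purely illustrative aid (not part of any protocol, and using invented entity and phase names), the multi-phase example above can be tabulated in a few lines of Python so that entities appearing in more than one role are easy to spot:

   # Sketch: per-phase role assignments for a multi-phase exchange such as
   # the AAA / DNS / HTTP example in this section.  Names are hypothetical.
   from collections import defaultdict

   phases = [
       ("network access",  {"user-host": "initiator", "aaa-server": "enabler"}),
       ("name resolution", {"user-host": "initiator", "dns-server": "enabler"}),
       ("page fetch",      {"user-host": "initiator", "http-server": "recipient"}),
       ("back-end fetch",  {"http-server": "initiator", "resource-server": "recipient"}),
   ]

   roles = defaultdict(set)
   for _, assignment in phases:
       for entity, role in assignment.items():
           roles[entity].add(role)

   for entity, seen in sorted(roles.items()):
       # An entity appearing in more than one role (here, the HTTP server
       # acts as recipient and later as initiator) warrants a separate
       # analysis for each phase.
       print(entity, sorted(seen))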
Understanding the privacy 442 implications of the architecture as a whole may require a separate 443 analysis of each phase. 445 Protocol design is often predicated on the notion that recipients, 446 intermediaries, and enablers are assumed to be authorized to receive 447 and handle data from initiators. As [RFC3552] explains, "we assume 448 that the end-systems engaging in a protocol exchange have not 449 themselves been compromised." However, privacy analysis requires 450 questioning this assumption since systems are often compromised for 451 the purpose of obtaining personal data. 453 Although recipients, intermediaries, and enablers may not generally 454 be considered as attackers, they may all pose privacy threats 455 (depending on the context) because they are able to observe, collect, 456 process, and transfer privacy-relevant data. These entities are 457 collectively described below as "observers" to distinguish them from 458 traditional attackers. From a privacy perspective, one important 459 type of attacker is an eavesdropper: an entity that passively 460 observes the initiator's communications without the initiator's 461 knowledge or authorization. 463 The threat descriptions in the next section explain how observers and 464 attackers might act to harm individuals' privacy. Different kinds of 465 attacks may be feasible at different points in the communications 466 path. For example, an observer could mount surveillance or 467 identification attacks between the initiator and intermediary, or 468 instead could surveil an enabler (e.g., by observing DNS queries from 469 the initiator). 471 5. Privacy Threats 472 Privacy harms come in a number of forms, including harms to financial 473 standing, reputation, solitude, autonomy, and safety. A victim of 474 identity theft or blackmail, for example, may suffer a financial loss 475 as a result. Reputational harm can occur when disclosure of 476 information about an individual, whether true or false, subjects that 477 individual to stigma, embarrassment, or loss of personal dignity. 478 Intrusion or interruption of an individual's life or activities can 479 harm the individual's ability to be left alone. When individuals or 480 their activities are monitored, exposed, or at risk of exposure, 481 those individuals may be stifled from expressing themselves, 482 associating with others, and generally conducting their lives freely. 483 They may also feel a general sense of unease, in that it is "creepy" 484 to be monitored or to have data collected about them. In cases where 485 such monitoring is for the purpose of stalking or violence (for 486 example, monitoring communications to or from a domestic abuse 487 shelter), it can put individuals in physical danger. 489 This section lists common privacy threats (drawing liberally from 490 [Solove], as well as [CoE]), showing how each of them may cause 491 individuals to incur privacy harms and providing examples of how 492 these threats can exist on the Internet. This threat modeling is 493 inspired by security threat analysis. Although it is not a perfect 494 fit for assessing privacy risks in Internet protocols and systems, no 495 better methodology has been developed to date. 497 Some privacy threats are already considered in Internet protocols as 498 a matter of routine security analysis. Others are more pure privacy 499 threats that existing security considerations do not usually address. 
500 The threats described here are divided into those that may also be 501 considered security threats and those that are primarily privacy 502 threats. 504 Note that an individual's awareness of and consent to the practices 505 described below may change an individual's perception of and concern 506 for the extent to which they threaten privacy. If an individual 507 authorizes surveillance of his own activities, for example, the 508 individual may be able to take actions to mitigate the harms 509 associated with it, or may consider the risk of harm to be tolerable. 511 5.1. Combined Security-Privacy Threats 513 5.1.1. Surveillance 515 Surveillance is the observation or monitoring of an individual's 516 communications or activities. The effects of surveillance on the 517 individual can range from anxiety and discomfort to behavioral 518 changes such as inhibition and self-censorship to the perpetration of 519 violence against the individual. The individual need not be aware of 520 the surveillance for it to impact his or her privacy -- the 521 possibility of surveillance may be enough to harm individual 522 autonomy. 524 Surveillance can impact privacy even if the individuals being 525 surveilled are not identifiable or if their communications are 526 encrypted. For example, an observer or eavesdropper that conducts 527 traffic analysis may be able to determine what type of traffic is 528 present (real-time communications or bulk file transfers, for 529 example) or which protocols are in use even if the observed 530 communications are encrypted or the communicants are unidentifiable. 531 This kind of surveillance can adversely impact the individuals 532 involved by causing them to become targets for further investigation 533 or enforcement activities. It may also enable attacks that are 534 specific to the protocol, such as redirection to a specialized 535 interception point or protocol-specific denials of service. 536 Protocols that use predictable packet sizes or timing or include 537 fixed tokens at predictable offsets within a packet can facilitate 538 this kind of surveillance. 540 Surveillance can be conducted by observers or eavesdroppers at any 541 point along the communications path. Confidentiality protections (as 542 discussed in [RFC3552] Section 3) are necessary to prevent 543 surveillance of the content of communications. To prevent traffic 544 analysis or other surveillance of communications patterns, other 545 measures may be necessary, such as [Tor]. 547 5.1.2. Stored Data Compromise 549 End systems that do not take adequate measures to secure stored data 550 from unauthorized or inappropriate access expose individuals to 551 potential financial, reputational, or physical harm. 553 Protecting against stored data compromise is typically outside the 554 scope of IETF protocols. However, a number of common protocol 555 functions -- key management, access control, or operational logging, 556 for example -- require the storage of data about initiators of 557 communications. When requiring or recommending that information 558 about initiators or their communications be stored or logged by end 559 systems (see, e.g., RFC 6302 [RFC6302]), it is important to recognize 560 the potential for that information to be compromised and for that 561 potential to be weighed against the benefits of data storage. Any 562 recipient, intermediary, or enabler that stores data may be 563 vulnerable to compromise. 
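When such logging is unavoidable, one way to limit the impact of a compromise is to reduce the identifiability and lifetime of what is stored. The following Python sketch is illustrative only; the keyed hash and the seven-day retention period are assumptions made for the example rather than recommendations drawn from any IETF specification:

   # Sketch: store a keyed hash of the client address instead of the raw
   # address, with an explicit expiry, so a compromised log reveals less
   # about initiators.  Key handling and retention period are illustrative.
   import hashlib, hmac, time

   LOG_KEY = b"rotate-this-key-periodically"

   def log_entry(client_ip: str, request_line: str) -> dict:
       digest = hmac.new(LOG_KEY, client_ip.encode(), hashlib.sha256).hexdigest()[:16]
       now = int(time.time())
       return {
           "ts": now,
           "client": digest,                 # linkable for abuse handling, not an IP
           "request": request_line,
           "expires": now + 7 * 24 * 3600,   # temporary rather than persistent retention
       }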
(Note that stored data compromise is 564 distinct from purposeful disclosure, which is discussed in 565 Section 5.2.4.) 567 5.1.3. Intrusion 569 Intrusion consists of invasive acts that disturb or interrupt one's 570 life or activities. Intrusion can thwart individuals' desires to be 571 left alone, sap their time or attention, or interrupt their 572 activities. This threat is focused on intrusion into one's life 573 rather than direct intrusion into one's communications. The latter 574 is captured in Section 5.1.1. 576 Unsolicited messages and denial-of-service attacks are the most 577 common types of intrusion on the Internet. Intrusion can be 578 perpetrated by any attacker that is capable of sending unwanted 579 traffic to the initiator. 581 5.1.4. Misattribution 583 Misattribution occurs when data or communications related to one 584 individual are attributed to another. Misattribution can result in 585 adverse reputational, financial, or other consequences for 586 individuals that are misidentified. 588 Misattribution in the protocol context comes as a result of using 589 inadequate or insecure forms of identity or authentication, and is 590 sometimes related to spoofing. For example, as [RFC6269] notes, 591 abuse mitigation is often conducted on the basis of source IP 592 address, such that connections from individual IP addresses may be 593 prevented or temporarily blacklisted if abusive activity is 594 determined to be sourced from those addresses. However, in the case 595 where a single IP address is shared by multiple individuals, those 596 penalties may be suffered by all individuals sharing the address, 597 even if they were not involved in the abuse. This threat can be 598 mitigated by using identity management mechanisms with proper forms 599 of authentication (ideally with cryptographic properties) so that 600 actions can be attributed uniquely to an individual to provide the 601 basis for accountability without generating false-positives. 603 5.2. Privacy-Specific Threats 605 5.2.1. Correlation 607 Correlation is the combination of various pieces of information 608 related to an individual or that obtain that characteristic when 609 combined. Correlation can defy people's expectations of the limits 610 of what others know about them. It can increase the power that those 611 doing the correlating have over individuals as well as correlators' 612 ability to pass judgment, threatening individual autonomy and 613 reputation. 615 Correlation is closely related to identification. Internet protocols 616 can facilitate correlation by allowing individuals' activities to be 617 tracked and combined over time. The use of persistent or 618 infrequently replaced identifiers at any layer of the stack can 619 facilitate correlation. For example, an initiator's persistent use 620 of the same device ID, certificate, or email address across multiple 621 interactions could allow recipients (and observers) to correlate all 622 of the initiator's communications over time. 624 As an example, consider Transport Layer Security (TLS) session 625 resumption [RFC5246] or TLS session resumption without server side 626 state [RFC5077]. In RFC 5246 [RFC5246] a server provides the client 627 with a session_id in the ServerHello message and caches the 628 master_secret for later exchanges. When the client initiates a new 629 connection with the server it re-uses the previously obtained 630 session_id in its ClientHello message. 
The server agrees to resume 631 the session by using the same session_id and the previously stored 632 master_secret for the generation of the TLS Record Layer security 633 association. RFC 5077 [RFC5077] borrows from the session resumption 634 design idea but the server encapsulates all state information into a 635 ticket instead of caching it. An attacker who is able to observe the 636 protocol exchanges between the TLS client and the TLS server is able 637 to link the initial exchange to subsequently resumed TLS sessions 638 when the session_id and the ticket are exchanged in the clear (which 639 is the case with data exchanged in the initial handshake messages). 641 In theory any observer or attacker that receives an initiator's 642 communications can engage in correlation. The extent of the 643 potential for correlation will depend on what data the entity 644 receives from the initiator and has access to otherwise. Often, 645 intermediaries only require a small amount of information for message 646 routing and/or security. In theory, protocol mechanisms could ensure 647 that end-to-end information is not made accessible to these entities, 648 but in practice the difficulty of deploying end-to-end security 649 procedures, additional messaging or computational overhead, and other 650 business or legal requirements often slow or prevent the deployment 651 of end-to-end security mechanisms, giving intermediaries greater 652 exposure to initiators' data than is strictly necessary from a 653 technical point of view. 655 5.2.2. Identification 657 Identification is the linking of information to a particular 658 individual to infer an individual's identity or to allow the 659 inference of an individual's identity. In some contexts it is 660 perfectly legitimate to identify individuals, whereas in others 661 identification may potentially stifle individuals' activities or 662 expression by inhibiting their ability to be anonymous or 663 pseudonymous. Identification also makes it easier for individuals to 664 be explicitly controlled by others (e.g., governments) and to be 665 treated differentially compared to other individuals. 667 Many protocols provide functionality to convey the idea that some 668 means has been provided to validate that entities are who they claim 669 to be. Often, this is accomplished with cryptographic 670 authentication. Furthermore, many protocol identifiers, such as 671 those used in SIP or XMPP, may allow for the direct identification of 672 individuals. Protocol identifiers may also contribute indirectly to 673 identification via correlation. For example, a web site that does 674 not directly authenticate users may be able to match its HTTP header 675 logs with logs from another site that does authenticate users, 676 rendering users on the first site identifiable. 678 As with correlation, any observer or attacker may be able to engage 679 in identification depending on the information about the initiator 680 that is available via the protocol mechanism or other channels. 682 5.2.3. Secondary Use 684 Secondary use is the use of collected information about an individual 685 without the individual's consent for a purpose different from that 686 for which the information was collected. Secondary use may violate 687 people's expectations or desires. The potential for secondary use 688 can generate uncertainty as to how one's information will be used in 689 the future, potentially discouraging information exchange in the 690 first place. 
Secondary use encompasses any use of data, including 691 disclosure. 693 One example of secondary use would be an authentication server that 694 uses a network access server's Access-Requests to track an 695 initiator's location. Any observer or attacker could potentially 696 make unwanted secondary uses of initiators' data. Protecting against 697 secondary use is typically outside the scope of IETF protocols. 699 5.2.4. Disclosure 701 Disclosure is the revelation of information about an individual that 702 affects the way others judge the individual. Disclosure can violate 703 individuals' expectations of the confidentiality of the data they 704 share. The threat of disclosure may deter people from engaging in 705 certain activities for fear of reputational harm, or simply because 706 they do not wish to be observed. 708 Any observer or attacker that receives data about an initiator may 709 engage in disclosure. Sometimes disclosure is unintentional because 710 system designers do not realize that information being exchanged 711 relates to individuals. The most common way for protocols to limit 712 disclosure is by providing access control mechanisms (discussed in 713 Section 5.2.5). A further example is provided by the IETF 714 geolocation privacy architecture [RFC6280], which supports a way for 715 users to express a preference that their location information not be 716 disclosed beyond the intended recipient. 718 5.2.5. Exclusion 720 Exclusion is the failure to allow individuals to know about the data 721 that others have about them and to participate in its handling and 722 use. Exclusion reduces accountability on the part of entities that 723 maintain information about people and creates a sense of 724 vulnerability about individuals' ability to control how information 725 about them is collected and used. 727 The most common way for Internet protocols to be involved in 728 enforcing exclusion is through access control mechanisms. The 729 presence architecture developed in the IETF is a good example where 730 individuals are included in the control of information about them. 731 Using a rules expression language (e.g., Presence Authorization Rules 732 [RFC5025]), presence clients can authorize the specific conditions 733 under which their presence information may be shared. 735 Exclusion is primarily considered problematic when the recipient 736 fails to involve the initiator in decisions about data collection, 737 handling, and use. Eavesdroppers engage in exclusion by their very 738 nature since their data collection and handling practices are covert. 740 6. Threat Mitigations 742 Privacy is notoriously difficult to measure and quantify. The extent 743 to which a particular protocol, system, or architecture "protects" or 744 "enhances" privacy is dependent on a large number of factors relating 745 to its design, use, and potential misuse. However, there are certain 746 widely recognized classes of mitigations against the threats 747 discussed in Section 5. This section describes three categories of 748 relevant mitigations: (1) data minimization, (2) user participation, 749 and (3) security. The privacy mitigations described in this chapter 750 can loosely be mapped to existing privacy principles, such as the 751 Fair Information Practices, but they have been adapted to fit the 752 target audience of this document. 754 6.1. Data Minimization 756 Data minimization refers to collecting, using, disclosing, and 757 storing the minimal data necessary to perform a task. 
Reducing the 758 amount of data exchanged reduces the amount of data that can be 759 misused or leaked. 761 Data minimization can be effectuated in a number of different ways, 762 including by limiting collection, use, disclosure, retention, 763 identifiability, sensitivity, and access to personal data. Limiting 764 the data collected by protocol elements to only what is necessary 765 (collection limitation) is the most straightforward way to help 766 reduce privacy risks associated with the use of the protocol. In 767 some cases, protocol designers may also be able to recommend limits 768 to the use or retention of data, although protocols themselves are 769 not often capable of controlling these properties. 771 However, the most direct application of data minimization to protocol 772 design is limiting identifiability. Reducing the identifiability of 773 data by using pseudonyms or no identifiers at all helps to weaken the 774 link between an individual and his or her communications. Allowing 775 for the periodic creation of new or randomized identifiers reduces 776 the possibility that multiple protocol interactions or communications 777 can be correlated back to the same individual. The following 778 sections explore a number of different properties related to 779 identifiability that protocol designers may seek to achieve. 781 Data minimization mitigates the following threats: surveillance, 782 stored data compromise, correlation, identification, secondary use, 783 disclosure. 785 6.1.1. Anonymity 787 To enable anonymity of an individual, there must exist a set of 788 individuals that appear to have the same attribute(s) as the 789 individual. To the attacker or the observer these individuals must 790 appear indistinguishable from each other. The set of all such 791 individuals is known as the anonymity set and membership of this set 792 may vary over time. 794 The composition of the anonymity set depends on the knowledge of the 795 observer or attacker. Thus anonymity is relative with respect to the 796 observer or attacker. An initiator may be anonymous only within a 797 set of potential initiators -- its initiator anonymity set -- which 798 itself may be a subset of all individuals that may initiate 799 communications. Conversely, a recipient may be anonymous only within 800 a set of potential recipients -- its recipient anonymity set. Both 801 anonymity sets may be disjoint, may overlap, or may be the same. 803 As an example, consider RFC 3325 (P-Asserted-Identity, PAI) 804 [RFC3325], an extension for the Session Initiation Protocol (SIP) 805 that allows an individual, such as a VoIP caller, to instruct an 806 intermediary that he or she trusts not to populate the SIP From 807 header field with the individual's authenticated and verified 808 identity. The recipient of the call, as well as any other entity 809 outside of the individual's trust domain, would therefore only learn 810 that the SIP message (typically a SIP INVITE) was sent with a header 811 field 'From: "Anonymous" <sip:anonymous@anonymous.invalid>' rather 812 than the individual's address-of-record, which is typically thought 813 of as the "public address" of the user. When PAI is used, the 814 individual becomes anonymous within the initiator anonymity set that 815 is populated by every individual making use of that specific 816 intermediary. 818 Note that this example ignores the fact that the recipient may infer 819 or obtain personal data from the other SIP protocol payloads (e.g., 820 SIP Via and Contact headers, SDP).
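For illustration only, an excerpt of such an anonymized INVITE might look as follows (host names, the IP address, and the tag and branch values are invented); the From header reveals nothing about the caller, while the Via and Contact headers still expose information about the sending host:

   From: "Anonymous" <sip:anonymous@anonymous.invalid>;tag=1928301774
   Via: SIP/2.0/UDP host42.example.net;branch=z9hG4bK776asdhds
   Contact: <sip:caller7@192.0.2.4>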
The implication is that PAI only 821 attempts to address a particular threat, namely the disclosure of 822 identity in the From header, with respect to the recipient. This 823 caveat makes the analysis of the specific protocol extension easier 824 but cannot be assumed when conducting analysis of an entire 825 architecture. 827 6.1.2. Pseudonymity 829 In the context of Internet protocols, almost all identifiers can be 830 nicknames or pseudonyms since there is typically no requirement to 831 use personal names in protocols. However, in certain scenarios it is 832 reasonable to assume that personal names will be used (with vCard 833 [RFC6350], for example). 835 Pseudonymity is strengthened when less personal data can be linked to 836 the pseudonym; when the same pseudonym is used less often and across 837 fewer contexts; and when independently chosen pseudonyms are more 838 frequently used for new actions (making them, from an observer's or 839 attacker's perspective, unlinkable). 841 For Internet protocols, important considerations include whether protocols allow 842 pseudonyms to be changed without human interaction, the default 843 length of pseudonym lifetimes, to whom pseudonyms are exposed, how 844 individuals are able to control disclosure, how often pseudonyms can 845 be changed, and the consequences of changing them. 847 6.1.3. Identity Confidentiality 849 An initiator has identity confidentiality when any party other than 850 the recipient cannot sufficiently identify the initiator within the 851 anonymity set. The size of the anonymity set has a direct impact on 852 identity confidentiality since the smaller the set is, the easier it 853 is to identify the initiator. Identity confidentiality aims to 854 provide protection against eavesdroppers and intermediaries rather 855 than against the intended communication end points. 857 As an example, consider the network access authentication procedures 858 utilizing the Extensible Authentication Protocol (EAP) [RFC3748]. 859 EAP includes an identity exchange where the Identity Response is 860 primarily used for routing purposes and selecting which EAP method to 861 use. Since EAP Identity Requests and Responses are sent in 862 cleartext, eavesdroppers and intermediaries along the communication 863 path between the EAP peer and the EAP server can snoop on the 864 identity, which is encoded in the form of the Network Access 865 Identifier (NAI) defined in RFC 4282 [RFC4282]. To address this 866 threat, as discussed in RFC 4282 [RFC4282], the username part of the 867 NAI (but not the realm part) can be hidden from these eavesdroppers 868 and intermediaries with the cryptographic support offered by EAP 869 methods. Identity confidentiality has become a recommended design 870 criterion for EAP (see [RFC4017]). EAP-AKA [RFC4187], for example, 871 protects the EAP peer's identity against passive adversaries by 872 utilizing temporal identities. EAP-IKEv2 [RFC5106] is an example of 873 an EAP method that offers protection against active attackers with 874 regard to the individual's identity. 876 6.1.4. Data Minimization within Identity Management 878 Modern systems are increasingly relying on multi-party transactions 879 to authenticate individuals. Many of these systems make use of an 880 identity provider that is responsible for providing authentication, 881 authorization, and accounting functionality to relying parties that 882 offer some protected resources.
To facilitate these functions, an 883 identity provider will usually go through a process of verifying the 884 individual's identity and issuing credentials to the individual. 885 When an individual seeks to make use of a service provided by the 886 relying party, the relying party relies on the authentication 887 assertions provided by its identity provider. Note that in more 888 sophisticated scenarios the authentication assertions are traits that 889 demonstrate the individual's capabilities and roles. The 890 authorization responsibility may also be shared between the identity 891 provider and the relying party and does not necessarily need to 892 reside only with the identity provider. 894 Such systems have the ability to support a number of properties that 895 minimize data collection in different ways: 897 In certain use cases relying parties do not need to know the real 898 name or date of birth of an individual (for example, when the 899 individual's age is the only attribute that needs to be 900 authenticated). 902 Relying parties that collude can be prevented from using an 903 individual's credentials to track the individual. That is, two 904 different relying parties can be prevented from determining that 905 the same individual has authenticated to both of them. This 906 typically requires identity management protocol support as 907 well as support by both the relying party and the identity 908 provider. 910 The identity provider can be prevented from knowing which relying 911 parties an individual interacted with. This requires, at a 912 minimum, avoiding direct communication between the identity 913 provider and the relying party at the time when the initiator 914 accesses a resource. 916 6.2. User Participation 918 As explained in Section 5.2.5, data collection and use that happens 919 "in secret," without the individual's knowledge, is apt to violate 920 the individual's expectation of privacy and may create incentives for 921 misuse of data. As a result, privacy regimes tend to include 922 provisions to require informing individuals about data collection and 923 use and involving them in decisions about the treatment of their 924 data. In an engineering context, supporting the goal of user 925 participation usually means providing ways for users to control the 926 data that is shared about them. It may also mean providing ways for 927 users to signal how they expect their data to be used and shared. 928 Different protocol and architectural designs can make supporting user 929 participation (for example, the ability to support a dialog box for 930 user interaction) easier or harder; for example, OAuth-based services 931 may have more natural hooks for user input than Authentication, 932 Authorization, and Accounting (AAA) services. 934 User participation mitigates the following threats: surveillance, 935 secondary use, disclosure, exclusion. 937 6.3. Security 939 Keeping data secure at rest and in transit is another important 940 component of privacy protection. As they are described in [RFC3552] 941 Section 2, a number of security goals also serve to enhance privacy: 943 o Confidentiality: Keeping data secret from unintended listeners. 945 o Peer entity authentication: Ensuring that the endpoint of a 946 communication is the one that is intended (in support of 947 maintaining confidentiality). 949 o Unauthorized usage: Limiting data access to only those users who 950 are authorized. (Note that this goal also falls within data 951 minimization.)
953 o Inappropriate usage: Limiting how authorized users can use data. 954 (Note that this goal also falls within data minimization.) 956 Note that even when these goals are achieved, the existence of items 957 of interest -- attributes, identifiers, identities, communications, 958 actions (such as the sending or receiving of a communication), or 959 anything else an attacker or observer might be interested in -- may 960 still be detectable, even if they are not readable. Thus 961 undetectability, in which an observer or attacker cannot sufficiently 962 distinguish whether an item of interest exists or not, may be 963 considered as a further security goal (albeit one that can be 964 extremely difficult to accomplish). 966 Detection of the protocols or applications in use via traffic 967 analysis may be particularly difficult to defend against. As with 968 the anonymity of individuals, achieving "protocol anonymity" requires 969 that multiple protocols or applications exist that appear to have the 970 same attributes -- packet sizes, content, token locations, or inter- 971 packet timing, for example. An attacker or observer will not be able 972 to use traffic analysis to identify which protocol or application is 973 in use if multiple protocols or applications are indistinguishable. 975 Defending against the threat of traffic analysis will be possible to 976 different extents for different protocols, may depend on 977 implementation- or use-specific details, and may depend on which 978 other protocols already exist and whether they share similar traffic 979 characteristics. The defenses will also vary depending on what the 980 protocol is designed to do; for example, in some situations 981 randomizing packet sizes, timing, or token locations will reduce the 982 threat of traffic analysis, whereas in other situations (real-time 983 communications, for example) holding some or all of those factors 984 constant is a more appropriate defense. See "Guidelines for the Use 985 of Variable Bit Rate Audio with Secure RTP" [RFC6562] for an example 986 of how these kinds of tradeoffs should be evaluated. 988 By providing proper security protection the following threats can be 989 mitigated: surveillance, stored data compromise, misattribution, 990 secondary use, disclosure, intrusion 992 7. Guidelines 994 This section provides guidance for document authors in the form of a 995 questionnaire about a protocol being designed. The questionnaire may 996 be useful at any point in the design process, particularly after 997 document authors have developed a high-level protocol model as 998 described in [RFC4101]. 1000 Note that the guidance does not recommend specific practices. The 1001 range of protocols developed in the IETF is too broad to make 1002 recommendations about particular uses of data or how privacy might be 1003 balanced against other design goals. However, by carefully 1004 considering the answers to each question, document authors should be 1005 able to produce a comprehensive analysis that can serve as the basis 1006 for discussion of whether the protocol adequately protects against 1007 privacy threats. The guidance is meant to help the thought process 1008 of privacy analysis; it does not provide specific directions for how 1009 to write a privacy considerations section. 1011 The framework is divided into four sections that address each of the 1012 mitigation classes from Section 6, plus a general section. 
Security 1013 is not fully elaborated since substantial guidance already exists in 1014 [RFC3552]. 1016 7.1. Data Minimization 1018 a. Identifiers. What identifiers does the protocol use for 1019 distinguishing initiators of communications? Does the protocol 1020 use identifiers that allow different protocol interactions to be 1021 correlated? What identifiers could be omitted or be made less 1022 identifying while still fulfilling the protocol's goals? 1024 b. Data. What information does the protocol expose about 1025 individuals, their devices, and/or their device usage (other than 1026 the identifiers discussed in (a))? To what extent is this 1027 information linked to the identities of the individuals? How does 1028 the protocol combine personal data with the identifiers discussed 1029 in (a)? 1031 c. Observers. Which information discussed in (a) and (b) is 1032 exposed to each other protocol entity (i.e., recipients, 1033 intermediaries, and enablers)? Are there ways for protocol 1034 implementers to choose to limit the information shared with each 1035 entity? Are there operational controls available to limit the 1036 information shared with each entity? 1038 d. Fingerprinting. In many cases the specific ordering and/or 1039 occurrences of information elements in a protocol allow users, 1040 devices, or software using the protocol to be fingerprinted. Is 1041 this protocol vulnerable to fingerprinting? If so, how? Can it 1042 be designed to reduce or eliminate the vulnerability? If not, why 1043 not? 1045 e. Persistence of identifiers. What assumptions are made in the 1046 protocol design about the lifetime of the identifiers discussed in 1047 (a)? Does the protocol allow implementers or users to delete or 1048 replace identifiers? How often does the specification recommend 1049 to delete or replace identifiers by default? Can the identifiers, 1050 along with other state information, be set to automatically 1051 expire? 1053 f. Correlation. Does the protocol allow for correlation of 1054 identifiers? Are there expected ways that information exposed by 1055 the protocol will be combined or correlated with information 1056 obtained outside the protocol? How will such combination or 1057 correlation facilitate fingerprinting of a user, device, or 1058 application? Are there expected combinations or correlations with 1059 outside data that will make users of the protocol more 1060 identifiable? 1062 g. Retention. Does the protocol or its anticipated uses require 1063 that the information discussed in (a) or (b) be retained by 1064 recipients, intermediaries, or enablers? If so, why? Is the 1065 retention expected to be persistent or temporary? 1067 7.2. User Participation 1069 a. User control. What controls or consent mechanisms does the 1070 protocol define or require before personal data or identifiers are 1071 shared or exposed via the protocol? If no such mechanisms or 1072 controls are specified, is it expected that control and consent 1073 will be handled outside of the protocol? 1075 b. Control over sharing with individual recipients. Does the 1076 protocol provide ways for initiators to share different 1077 information with different recipients? If not, are there 1078 mechanisms that exist outside of the protocol to provide 1079 initiators with such control? 1081 c. Control over sharing with intermediaries. Does the protocol 1082 provide ways for initiators to limit which information is shared 1083 with intermediaries? 
If not, are there mechanisms that exist 1084 outside of the protocol to provide users with such control? Is it 1085 expected that users will have relationships that govern the use of 1086 the information (contractual or otherwise) with those who operate 1087 these intermediaries? 1088 d. Preference expression. Does the protocol provide ways for 1089 initiators to express individuals' preferences to recipients or 1090 intermediaries with regard to the collection, use, or disclosure 1091 of their personal data? 1093 7.3. Security 1095 a. Surveillance. How do the protocol's security considerations 1096 prevent surveillance, including eavesdropping and traffic 1097 analysis? Does the protocol leak information that can be observed 1098 through traffic analysis, such as by using a fixed token at fixed 1099 offsets, or packet sizes or timing that allow observers to 1100 determine characteristics of the traffic (e.g., which protocol is 1101 in use or whether the traffic is part of a real-time flow)? 1103 b. Stored data compromise. How do the protocol's security 1104 considerations prevent or mitigate stored data compromise? 1106 c. Intrusion. How do the protocol's security considerations 1107 prevent or mitigate intrusion, including denial-of-service attacks 1108 and unsolicited communications more generally? 1110 d. Misattribution. How do the protocol's mechanisms for 1111 identifying and/or authenticating individuals prevent 1112 misattribution? 1114 7.4. General 1116 a. Trade-offs. Does the protocol make trade-offs between privacy 1117 and usability, privacy and efficiency, privacy and 1118 implementability, or privacy and other design goals? Describe the 1119 trade-offs and the rationale for the design chosen. 1121 b. Defaults. If the protocol can be operated in multiple modes 1122 or with multiple configurable options, does the default mode or 1123 option minimize the amount, identifiability, and persistence of 1124 the data and identifiers exposed by the protocol? Does the 1125 default mode or option maximize the opportunity for user 1126 participation? Does it provide the strictest security features of 1127 all the modes/options? If any of these answers are no, explain 1128 why less protective defaults were chosen. 1130 8. Example 1132 The following section gives an example of the threat analysis and 1133 threat mitigation recommended by this document. It covers a 1134 particularly difficult application protocol, presence, to try to 1135 demonstrate these principles on an architecture that is vulnerable to 1136 many of the threats described above. This text is not intended as an 1137 example of a Privacy Considerations section that might appear in an 1138 IETF specification, but rather as an example of the thinking that 1139 should go into the design of a protocol when considering privacy as a 1140 first principle. 1142 A presence service, as defined in the abstract in [RFC2778], allows 1143 users of a communications service to monitor one another's 1144 availability and disposition in order to make decisions about 1145 communicating. Presence information is highly dynamic, and generally 1146 characterizes whether a user is online or offline, busy or idle, away 1147 from communications devices or nearby, and the like. Necessarily, 1148 this information has certain privacy implications, and from the start 1149 the IETF approached this work with the aim of providing users with 1150 the controls to determine how their presence information would be 1151 shared. 
The Common Profile for Presence (CPP) [RFC3859] defines a 1152 set of logical operations for delivery of presence information. This 1153 abstract model is applicable to multiple presence systems. The SIP- 1154 based SIMPLE presence system [RFC3261] uses CPP as its baseline 1155 architecture, and the presence operations in the Extensible Messaging 1156 and Presence Protocol (XMPP) have also been mapped to CPP [RFC3922]. 1158 The fundamental architecture defined in RFC 2778 and RFC 3859 is a 1159 mediated one. Clients (presentities in RFC 2778 terms) publish their 1160 presence information to presence servers, which in turn distribute 1161 information to authorized watchers. Presence servers thus retain 1162 presence information for an interval of time, until it either changes 1163 or expires, so that it can be revealed to authorized watchers upon 1164 request. This architecture mirrors existing pre-standard deployment 1165 models. The integration of an explicit authorization mechanism into 1166 the presence architecture has been widely successful in involving the 1167 end users in the decision making process before sharing information. 1168 Nearly all presence systems deployed today provide such a mechanism, 1169 typically through a reciprocal authorization system by which a pair 1170 of users, when they agree to be "buddies," consent to divulge their 1171 presence information to one another. Buddylists are managed by 1172 servers but controlled by end users. Users can also explicitly block 1173 one another through a similar interface, and in some deployments it 1174 is desirable to provide "polite blocking" of various kinds. 1176 From a perspective of privacy design, however, the classical presence 1177 architecture represents nearly a worst-case scenario. In terms of 1178 data minimization, presentities share their sensitive information 1179 with presence services, and while services only share this presence 1180 information with watchers authorized by the user, no technical 1181 mechanism constrains those watchers from relaying presence to further 1182 third parties. Any of these entities could conceivably log or retain 1183 presence information indefinitely. The sensitivity cannot be 1184 mitigated by rendering the user anonymous, as it is indeed the 1185 purpose of the system to facilitate communications between users who 1186 know one another. The identifiers employed by users are long-lived 1187 and often contain personal information, including personal names and 1188 the domains of service providers. While users do participate in the 1189 construction of buddylists and blacklists, they do so with little 1190 prospect for accountability: the user effectively throws their 1191 presence information over the wall to a presence server that in turn 1192 distributes the information to watchers. Users typically have no way 1193 to verify that presence is being distributed only to authorized 1194 watchers, especially as it is the server that authenticates watchers, 1195 not the end user. Connections between the server and all publishers 1196 and consumers of presence data are moreover an attractive target for 1197 eavesdroppers, and require strong confidentiality mechanisms, though 1198 again the end user has no way to verify what mechanisms are in place 1199 between the presence server and a watcher. 1201 Moreover, the sensitivity of presence information is not limited to 1202 the disposition and capability to communicate. 
Capabilities can 1203 reveal the type of device that a user employs, for example, and since 1204 multiple devices can publish the same user's presence, there are 1205 significant risks of allowing attackers to correlate user devices. 1206 An important extension to presence was developed to enable the 1207 support for location sharing. The effort to standardize protocols 1208 for systems sharing geolocation was started in the GEOPRIV working 1209 group. During the initial requirements and privacy threat analysis 1210 in the process of chartering the working group, it became clear that 1211 the system would require an underlying communication mechanism 1212 supporting user consent to share location information. The 1213 resemblance of these requirements to the presence framework was 1214 quickly recognized, and this design decision was documented in 1215 [RFC4079]. Location information thus mingles with other presence 1216 information available through the system to intermediaries and to 1217 authorized watchers. 1219 Privacy concerns about presence information largely arise due to the 1220 built-in mediation of the presence architecture. The need for a 1221 presence server is motivated by two primary design requirements of 1222 presence: in the first place, the server can respond with an 1223 "offline" indication when the user is not online; in the second 1224 place, the server can compose presence information published by 1225 different devices under the user's control. Additionally, to 1226 facilitate the use of URIs as identifiers for entities, some service 1227 must operate a host with the domain name appearing in a presence URI, 1228 and in practical terms no commercial presence architecture would 1229 force end users to own and operate their own domain names. Many end 1230 users of applications like presence are behind NATs or firewalls, and 1231 effectively cannot receive direct connections from the Internet - the 1232 persistent bidirectional channel these clients open and maintain with 1233 a presence server is essential to the operation of the protocol. 1235 One must first ask if the trade-off of mediation for presence is 1236 worthwhile. Does a server need to be in the middle of all 1237 publications of presence information? It might seem that end-to-end 1238 encryption of the presence information could solve many of these 1239 problems. A presentity could encrypt the presence information with 1240 the public key of a watcher, and only then send the presence 1241 information through the server. The IETF defined an object format 1242 for presence information called the Presence Information Data Format 1243 (PIDF), which for the purposes of conveying location information was 1244 extended to the PIDF Location Object (PIDF-LO) - these XML objects 1245 were designed to accommodate an encrypted wrapper. Encrypting this 1246 data would have the added benefit of preventing stored cleartext 1247 presence information from being seized by an attacker who manages to 1248 compromise a presence server. This proposal, however, quickly runs 1249 into usability problems. Discovering the public keys of watchers is 1250 the first difficulty, one that few Internet protocols have addressed 1251 successfully. 
This solution would then require the presentity to 1252 publish one encrypted copy of its presence information per authorized 1253 watcher to the presence service, regardless of whether or not a 1254 watcher is actively seeking presence information - for a presentity 1255 with many watchers, this may place an unacceptable burden on the 1256 presence server, especially given the dynamism of presence 1257 information. Finally, it prevents the server from composing presence 1258 information reported by multiple devices under the same user's 1259 control. On the whole, these difficulties render object encryption 1260 of presence information a doubtful prospect. 1262 Some protocols that support presence information, such as SIP, can 1263 operate intermediaries in a redirecting mode, rather than a 1264 publishing or proxying mode. Instead of sending presence information 1265 through the server, in other words, these protocols can merely 1266 redirect watchers to the presentity, and then presence information 1267 could pass directly and securely from the presentity to the watcher. 1268 It is worth noting that this would disclose the IP address of the 1269 presentity to the watcher, which has its own set of risks. With such 1270 a direct connection, the presentity can decide exactly what information it would 1271 like to share with the watcher in question; it can authenticate the 1272 watcher itself with whatever strength of credential it chooses; and 1273 with end-to-end encryption it can reduce the likelihood of any 1274 eavesdropping. In a redirection architecture, a presence server 1275 could still provide the necessary "offline" indication 1276 without observing and forwarding all information 1277 itself. This mechanism is more promising than encryption, but also 1278 suffers from significant difficulties. It too does not provide for 1279 composition of presence information from multiple devices - it in 1280 fact forces the watcher to perform this composition itself. The 1281 largest single impediment to this approach, however, is the difficulty 1282 of creating end-to-end connections between the presentity's device(s) 1283 and a watcher, as some or all of these endpoints may be behind NATs 1284 or firewalls that prevent peer-to-peer connections. While there are 1285 potential solutions for this problem, like STUN and TURN, they add 1286 complexity to the overall system. 1288 Consequently, mediation is a difficult feature of the presence 1289 architecture to remove. Especially due to the requirement for 1290 composition, it is hard to minimize the data shared with 1291 intermediaries. Control over sharing with intermediaries must 1292 therefore come from some other explicit component of the 1293 architecture. As such, the presence work in the IETF focused on 1294 improving user participation in the activities of the presence 1295 server. This work began in the GEOPRIV working group, with controls 1296 on location privacy, as the location of users is perceived as having 1297 especially sensitive properties. With the aim of meeting the privacy 1298 requirements defined in [RFC2779], a set of usage indications, such 1299 as whether retransmission is allowed or when the retention period 1300 expires, has been added to the PIDF-LO such that they always travel 1301 with the location information itself. These privacy preferences apply 1302 not only to the intermediaries that store and forward presence 1303 information, but also to the watchers who consume it.
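As a purely illustrative sketch, not drawn from any IETF specification, the following Python fragment shows one way such usage indications could be embedded in a PIDF-LO-style XML document so that the retransmission and retention preferences travel together with the location payload. The namespace URIs and element names echo the GEOPRIV usage-rules elements, but the structure is simplified and the identifiers shown here are assumptions for illustration only.

   # Illustrative sketch: embed GEOPRIV-style usage indications in a
   # PIDF-LO-like document so that the preferences accompany the
   # location data.  Namespaces and element names are simplified
   # assumptions, not normative definitions.
   import xml.etree.ElementTree as ET

   PIDF = "urn:ietf:params:xml:ns:pidf"
   GP = "urn:ietf:params:xml:ns:pidf:geopriv10"

   presence = ET.Element("{%s}presence" % PIDF,
                         {"entity": "pres:alice@example.com"})
   tup = ET.SubElement(presence, "{%s}tuple" % PIDF, {"id": "sg89ae"})
   status = ET.SubElement(tup, "{%s}status" % PIDF)
   geopriv = ET.SubElement(status, "{%s}geopriv" % GP)

   # Location payload elided; only its container is shown here.
   ET.SubElement(geopriv, "{%s}location-info" % GP)

   # The usage rules ride along with the location information itself.
   rules = ET.SubElement(geopriv, "{%s}usage-rules" % GP)
   ET.SubElement(rules, "{%s}retransmission-allowed" % GP).text = "no"
   ET.SubElement(rules, "{%s}retention-expiry" % GP).text = \
       "2013-06-22T04:57:29Z"

   print(ET.tostring(presence, encoding="unicode"))

Because the preferences are carried inside the location object rather than alongside it, any entity that stores or forwards the object is expected to carry them along unchanged, which is what allows them to reach both intermediaries and watchers.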
1305 This approach very much follows the spirit of Creative Commons [CC], 1306 namely the use of a limited number of conditions (such as 'Share 1307 Alike' [CC-SA]). Unlike Creative Commons, the GEOPRIV working group 1308 did not, however, initiate work to produce legal language or to 1309 design graphical icons, since this would fall outside the scope of the 1310 IETF. In particular, the GEOPRIV rules state a preference on the 1311 retention and retransmission of location information; while GEOPRIV 1312 cannot force any entity receiving a PIDF-LO object to abide by those 1313 preferences, if users lack the ability to express them at all, we can 1314 guarantee their preferences will not be honored. The GEOPRIV rules 1315 can provide a means to establish accountability. 1317 The retention and retransmission elements were envisioned as the most 1318 essential examples of preference expression in sharing presence. The 1319 PIDF object was designed for extensibility, and the rulesets created 1320 for PIDF-LO can also be extended to provide new expressions of user 1321 preference. Not all user preference information should be bound into 1322 a particular PIDF object, however; many forms of access control 1323 policy assumed by the presence architecture need to be provisioned in 1324 the presence server by some interface with the user. This 1325 requirement eventually triggered the standardization of a general 1326 access control policy language called the Common Policy framework (defined in 1327 [RFC4745]). This language allows one to express ways to 1328 control the distribution of information as simple conditions, 1329 actions, and transformations, expressed in an XML format. 1330 Common Policy itself is an abstract format that needs to be 1331 instantiated: two examples can be found in the Presence 1332 Authorization Rules [RFC5025] and the Geolocation Policy [RFC6772]. 1333 The former provides additional expressiveness for presence-based 1334 systems, while the latter defines syntax and semantics for location- 1335 based conditions and transformations. 1337 Ultimately, the privacy work on presence represents a compromise 1338 between privacy principles and the needs of the architecture and 1339 marketplace. While it was not feasible to remove intermediaries from 1340 the architecture entirely, nor to prevent their access to presence 1341 information, the IETF did provide a way for users to express their 1342 preferences and provision their controls at the presence service. We 1343 have not had great success in the implementation space with privacy 1344 mechanisms thus far, but by documenting and acknowledging the 1345 limitations of these mechanisms, the designers were able to provide 1346 implementers, and end users, with an informed perspective on the 1347 privacy properties of the IETF's presence protocols. 1349 9. Security Considerations 1351 This document describes privacy aspects that protocol designers 1352 should consider in addition to regular security analysis. 1354 10. IANA Considerations 1356 This document does not require actions by IANA. 1358 11. Acknowledgements 1360 We would like to thank Christine Runnegar for her extensive helpful 1361 review comments. 1363 We would like to thank Scott Brim, Kasey Chappelle, Marc Linsner, 1364 Bryan McLaughlin, Nick Mathewson, Eric Rescorla, Scott Bradner, Nat 1365 Sakimura, Bjoern Hoehrmann, David Singer, Dean Willis, Lucy Lynch, 1366 Trent Adams, Mark Lizar, Martin Thomson, Josh Howlett, Mischa 1367 Tuffield, S.
Moonesamy, Zhou Sujing, Claudia Diaz, Leif Johansson, 1368 Jeff Hodges, Stephen Farrell, Steven Johnston, Cullen Jennings, Ted 1369 Hardie, Dave Thaler, Klaas Wierenga, Adrian Farrel, Stephane 1370 Bortzmeyer, Dave Crocker, and Hector Santos for their useful feedback 1371 on this document. 1373 Finally, we would like to thank the participants for the feedback 1374 they provided during the December 2010 Internet Privacy workshop co- 1375 organized by MIT, ISOC, W3C, and the IAB. 1377 12. IAB Members at the Time of Approval 1379 Bernard Aboba 1381 Jari Arkko 1383 Marc Blanchet 1385 Ross Callon 1387 Alissa Cooper 1389 Spencer Dawkins 1391 Joel Halpern 1393 Russ Housley 1395 Eliot Lear 1397 Xing Li 1399 Andrew Sullivan 1401 Dave Thaler 1403 Hannes Tschofenig 1405 13. Informative References 1407 [CC-SA] Creative Commons, "Share Alike", 2012. 1409 [CC] Creative Commons, "Creative Commons", 2012. 1411 [CoE] Council of Europe, "Recommendation CM/Rec(2010)13 of the 1412 Committee of Ministers to member states on the protection 1413 of individuals with regard to automatic processing of 1414 personal data in the context of profiling", available at 1415 https://wcd.coe.int/ViewDoc.jsp?Ref=CM/Rec%282010%2913 1416 (November 2010), 2010. 1418 [EFF] Electronic Frontier Foundation, "Panopticlick", 2011. 1420 [FIPs] Gellman, B., "Fair Information Practices: A Basic 1421 History", 2012. 1423 [OECD] Organisation for Economic Co-operation and Development, 1424 "OECD Guidelines on the Protection of Privacy and 1425 Transborder Flows of Personal Data", available at 1426 http://www.oecd.org/EN/document/0,,EN-document-0-nodirectorate-no-24-10255-0,00.html 1427 (September 2010), 1980. 1429 [PbD] Office of the Information and Privacy Commissioner, 1430 Ontario, Canada, "Privacy by Design", 2011. 1432 [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., 1433 Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext 1434 Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. 1436 [RFC2778] Day, M., Rosenberg, J., and H. Sugano, "A Model for 1437 Presence and Instant Messaging", RFC 2778, February 2000. 1439 [RFC2779] Day, M., Aggarwal, S., Mohr, G., and J. Vincent, "Instant 1440 Messaging / Presence Protocol Requirements", RFC 2779, 1441 February 2000. 1443 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 1444 A., Peterson, J., Sparks, R., Handley, M., and E. 1445 Schooler, "SIP: Session Initiation Protocol", RFC 3261, 1446 June 2002. 1448 [RFC3325] Jennings, C., Peterson, J., and M. Watson, "Private 1449 Extensions to the Session Initiation Protocol (SIP) for 1450 Asserted Identity within Trusted Networks", RFC 3325, 1451 November 2002. 1453 [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC 1454 Text on Security Considerations", BCP 72, RFC 3552, July 1455 2003. 1457 [RFC3748] Aboba, B., Blunk, L., Vollbrecht, J., Carlson, J., and H. 1458 Levkowetz, "Extensible Authentication Protocol (EAP)", RFC 1459 3748, June 2004. 1461 [RFC3859] Peterson, J., "Common Profile for Presence (CPP)", RFC 1462 3859, August 2004. 1464 [RFC3922] Saint-Andre, P., "Mapping the Extensible Messaging and 1465 Presence Protocol (XMPP) to Common Presence and Instant 1466 Messaging (CPIM)", RFC 3922, October 2004. 1468 [RFC4017] Stanley, D., Walker, J., and B. Aboba, "Extensible 1469 Authentication Protocol (EAP) Method Requirements for 1470 Wireless LANs", RFC 4017, March 2005.
1472 [RFC4079] Peterson, J., "A Presence Architecture for the 1473 Distribution of GEOPRIV Location Objects", RFC 4079, July 1474 2005. 1476 [RFC4101] Rescorla, E. and IAB, "Writing Protocol Models", RFC 4101, 1477 June 2005. 1479 [RFC4187] Arkko, J. and H. Haverinen, "Extensible Authentication 1480 Protocol Method for 3rd Generation Authentication and Key 1481 Agreement (EAP-AKA)", RFC 4187, January 2006. 1483 [RFC4282] Aboba, B., Beadles, M., Arkko, J., and P. Eronen, "The 1484 Network Access Identifier", RFC 4282, December 2005. 1486 [RFC4745] Schulzrinne, H., Tschofenig, H., Morris, J., Cuellar, J., 1487 Polk, J., and J. Rosenberg, "Common Policy: A Document 1488 Format for Expressing Privacy Preferences", RFC 4745, 1489 February 2007. 1491 [RFC4918] Dusseault, L., "HTTP Extensions for Web Distributed 1492 Authoring and Versioning (WebDAV)", RFC 4918, June 2007. 1494 [RFC4949] Shirey, R., "Internet Security Glossary, Version 2", RFC 1495 4949, August 2007. 1497 [RFC5025] Rosenberg, J., "Presence Authorization Rules", RFC 5025, 1498 December 2007. 1500 [RFC5077] Salowey, J., Zhou, H., Eronen, P., and H. Tschofenig, 1501 "Transport Layer Security (TLS) Session Resumption without 1502 Server-Side State", RFC 5077, January 2008. 1504 [RFC5106] Tschofenig, H., Kroeselberg, D., Pashalidis, A., Ohba, Y., 1505 and F. Bersani, "The Extensible Authentication Protocol- 1506 Internet Key Exchange Protocol version 2 (EAP-IKEv2) 1507 Method", RFC 5106, February 2008. 1509 [RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security 1510 (TLS) Protocol Version 1.2", RFC 5246, August 2008. 1512 [RFC6269] Ford, M., Boucadair, M., Durand, A., Levis, P., and P. 1513 Roberts, "Issues with IP Address Sharing", RFC 6269, June 1514 2011. 1516 [RFC6280] Barnes, R., Lepinski, M., Cooper, A., Morris, J., 1517 Tschofenig, H., and H. Schulzrinne, "An Architecture for 1518 Location and Location Privacy in Internet Applications", 1519 BCP 160, RFC 6280, July 2011. 1521 [RFC6302] Durand, A., Gashinsky, I., Lee, D., and S. Sheppard, 1522 "Logging Recommendations for Internet-Facing Servers", BCP 1523 162, RFC 6302, June 2011. 1525 [RFC6350] Perreault, S., "vCard Format Specification", RFC 6350, 1526 August 2011. 1528 [RFC6562] Perkins, C. and JM. Valin, "Guidelines for the Use of 1529 Variable Bit Rate Audio with Secure RTP", RFC 6562, March 1530 2012. 1532 [RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the 1533 Opus Audio Codec", RFC 6716, September 2012. 1535 [RFC6772] Schulzrinne, H., Tschofenig, H., Cuellar, J., Polk, J., 1536 Morris, J., and M. Thomson, "Geolocation Policy: A 1537 Document Format for Expressing Privacy Preferences for 1538 Location Information", RFC 6772, January 2013. 1540 [Solove] Solove, D.J., "Understanding Privacy", 2010. 1542 [Tor] The Tor Project, Inc., "Tor", 2011. 1544 [Westin] Kumaraguru, P. and L. Cranor, "Privacy Indexes: A Survey 1545 of Westin's Studies", 2005. 1547 Authors' Addresses 1549 Alissa Cooper 1550 CDT 1551 1634 Eye St. NW, Suite 1100 1552 Washington, DC 20006 1553 US 1555 Phone: +1-202-637-9800 1556 Email: acooper@cdt.org 1557 URI: http://www.cdt.org/ 1559 Hannes Tschofenig 1560 Nokia Siemens Networks 1561 Linnoitustie 6 1562 Espoo 02600 1563 Finland 1565 Phone: +358 (50) 4871445 1566 Email: Hannes.Tschofenig@gmx.net 1567 URI: http://www.tschofenig.priv.at 1568 Bernard Aboba 1569 Microsoft Corporation 1570 One Microsoft Way 1571 Redmond, WA 98052 1572 US 1574 Email: bernarda@microsoft.com 1576 Jon Peterson 1577 NeuStar, Inc.
1578 1800 Sutter St Suite 570 1579 Concord, CA 94520 1580 US 1582 Email: jon.peterson@neustar.biz 1584 John B. Morris, Jr. 1586 Email: ietf@jmorris.org 1588 Marit Hansen 1589 ULD Kiel 1591 Email: marit.hansen@datenschutzzentrum.de 1593 Rhys Smith 1594 Janet 1596 Email: rhys.smith@ja.net