idnits 2.17.1 draft-iab-privacy-considerations-06.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 1339 has weird spacing: '... states on th...' == Line 1340 has weird spacing: '...cessing of...' -- The document date (January 12, 2013) is 4122 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- -- Obsolete informational reference (is this intentional?): RFC 2616 (Obsoleted by RFC 7230, RFC 7231, RFC 7232, RFC 7233, RFC 7234, RFC 7235) -- Obsolete informational reference (is this intentional?): RFC 4282 (Obsoleted by RFC 7542) -- Obsolete informational reference (is this intentional?): RFC 5077 (Obsoleted by RFC 8446) -- Obsolete informational reference (is this intentional?): RFC 5246 (Obsoleted by RFC 8446) Summary: 0 errors (**), 0 flaws (~~), 3 warnings (==), 5 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Network Working Group A. Cooper 3 Internet-Draft CDT 4 Intended status: Informational H. Tschofenig 5 Expires: July 16, 2013 Nokia Siemens Networks 6 B. Aboba 7 Microsoft Corporation 8 J. Peterson 9 NeuStar, Inc. 10 J. Morris 12 M. Hansen 13 ULD Kiel 14 R. Smith 15 JANET(UK) 16 January 12, 2013 18 Privacy Considerations for Internet Protocols 19 draft-iab-privacy-considerations-06.txt 21 Abstract 23 This document offers guidance for developing privacy considerations 24 for inclusion in protocol specifications. It aims to make protocol 25 designers aware of privacy-related design choices. It suggests that 26 whether any individual RFC warrants a specific privacy considerations 27 section will depend on the document's content. 29 Discussion of this document is taking place on the IETF Privacy 30 Discussion mailing list (see 31 https://www.ietf.org/mailman/listinfo/ietf-privacy). 33 Status of this Memo 35 This Internet-Draft is submitted in full conformance with the 36 provisions of BCP 78 and BCP 79. 38 Internet-Drafts are working documents of the Internet Engineering 39 Task Force (IETF). Note that other groups may also distribute 40 working documents as Internet-Drafts. The list of current Internet- 41 Drafts is at http://datatracker.ietf.org/drafts/current/. 43 Internet-Drafts are draft documents valid for a maximum of six months 44 and may be updated, replaced, or obsoleted by other documents at any 45 time. It is inappropriate to use Internet-Drafts as reference 46 material or to cite them other than as "work in progress." 48 This Internet-Draft will expire on July 16, 2013. 50 Copyright Notice 52 Copyright (c) 2013 IETF Trust and the persons identified as the 53 document authors. All rights reserved. 
55 This document is subject to BCP 78 and the IETF Trust's Legal 56 Provisions Relating to IETF Documents 57 (http://trustee.ietf.org/license-info) in effect on the date of 58 publication of this document. Please review these documents 59 carefully, as they describe your rights and restrictions with respect 60 to this document. Code Components extracted from this document must 61 include Simplified BSD License text as described in Section 4.e of 62 the Trust Legal Provisions and are provided without warranty as 63 described in the Simplified BSD License. 65 Table of Contents 67 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 68 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6 69 2.1. Entities . . . . . . . . . . . . . . . . . . . . . . . . . 6 70 2.2. Data and Analysis . . . . . . . . . . . . . . . . . . . . 7 71 2.3. Identifiability . . . . . . . . . . . . . . . . . . . . . 7 72 3. Communications Model . . . . . . . . . . . . . . . . . . . . . 10 73 4. Privacy Threats . . . . . . . . . . . . . . . . . . . . . . . 12 74 4.1. Combined Security-Privacy Threats . . . . . . . . . . . . 12 75 4.1.1. Surveillance . . . . . . . . . . . . . . . . . . . . . 12 76 4.1.2. Stored Data Compromise . . . . . . . . . . . . . . . . 13 77 4.1.3. Intrusion . . . . . . . . . . . . . . . . . . . . . . 13 78 4.1.4. Misattribution . . . . . . . . . . . . . . . . . . . . 13 79 4.2. Privacy-Specific Threats . . . . . . . . . . . . . . . . . 14 80 4.2.1. Correlation . . . . . . . . . . . . . . . . . . . . . 14 81 4.2.2. Identification . . . . . . . . . . . . . . . . . . . . 15 82 4.2.3. Secondary Use . . . . . . . . . . . . . . . . . . . . 15 83 4.2.4. Disclosure . . . . . . . . . . . . . . . . . . . . . . 16 84 4.2.5. Exclusion . . . . . . . . . . . . . . . . . . . . . . 16 85 5. Threat Mitigations . . . . . . . . . . . . . . . . . . . . . . 18 86 5.1. Data Minimization . . . . . . . . . . . . . . . . . . . . 18 87 5.1.1. Anonymity . . . . . . . . . . . . . . . . . . . . . . 19 88 5.1.2. Pseudonymity . . . . . . . . . . . . . . . . . . . . . 19 89 5.1.3. Identity Confidentiality . . . . . . . . . . . . . . . 20 90 5.1.4. Data Minimization within Identity Management . . . . . 20 91 5.2. User Participation . . . . . . . . . . . . . . . . . . . . 21 92 5.3. Security . . . . . . . . . . . . . . . . . . . . . . . . . 22 93 6. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 94 7. Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . 25 95 7.1. Data Minimization . . . . . . . . . . . . . . . . . . . . 25 96 7.2. User Participation . . . . . . . . . . . . . . . . . . . . 26 97 7.3. Security . . . . . . . . . . . . . . . . . . . . . . . . . 27 98 7.4. General . . . . . . . . . . . . . . . . . . . . . . . . . 27 99 8. Example . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 100 9. Security Considerations . . . . . . . . . . . . . . . . . . . 33 101 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 34 102 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 35 103 12. IAB Members at the Time of Approval . . . . . . . . . . . . . 36 104 13. Informative References . . . . . . . . . . . . . . . . . . . . 37 105 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 40 107 1. Introduction 109 [RFC3552] provides detailed guidance to protocol designers about both 110 how to consider security as part of protocol design and how to inform 111 readers of protocol specifications about security issues. 
This 112 document intends to provide a similar set of guidance for considering 113 privacy in protocol design. 115 Privacy is a complicated concept with a rich history that spans many 116 disciplines. With regard to data, often it is a concept applied to 117 "personal data," information relating to an identified or 118 identifiable individual. Many sets of privacy principles and privacy 119 design frameworks have been developed in different forums over the 120 years. These include the Fair Information Practices [FIPs], a 121 baseline set of privacy protections pertaining to the collection and 122 use of personal data (often based on the principles established in 123 [OECD], for example), and the Privacy by Design concept, which 124 provides high-level privacy guidance for systems design (see [PbD] 125 for one example). The guidance provided in this document is inspired 126 by this prior work, but it aims to be more concrete, pointing 127 protocol designers to specific engineering choices that can impact 128 the privacy of the individuals that make use of Internet protocols. 130 Different people have radically different conceptions of what privacy 131 means, both in general, and as it relates to them personally 132 [Westin]. Furthermore, privacy as a legal concept is understood 133 differently in different jurisdictions. The guidance provided in 134 this document is generic and can be used to inform the design of any 135 protocol to be used anywhere in the world, without reference to 136 specific legal frameworks. 138 Whether any individual document warrants a specific privacy 139 considerations section will depend on the document's content. 140 Documents whose entire focus is privacy may not merit a separate 141 section (for example, "Private Extensions to the Session Initiation 142 Protocol (SIP) for Asserted Identity within Trusted Networks" 143 [RFC3325]). For certain specifications, privacy considerations are a 144 subset of security considerations and can be discussed explicitly in 145 the security considerations section. Some documents will not require 146 discussion of privacy considerations (for example, "Definition of the 147 Opus Audio Codec" [RFC6716]). The guidance provided here can and 148 should be used to assess the privacy considerations of protocol, 149 architectural, and operational specifications and to decide whether 150 those considerations are to be documented in a stand-alone section, 151 within the security considerations section, or throughout the 152 document. 154 This document is organized as follows. Section 2 explains the 155 terminology used in this document. Section 3 reviews typical 156 communications architectures to understand at which points there may 157 be privacy threats. Section 4 discusses threats to privacy as they 158 apply to Internet protocols. Section 5 outlines mitigations of those 159 threats. Section 6 describes the extent to which the guidance 160 offered is applicable within the IETF. Section 7 provides the 161 guidelines for analyzing and documenting privacy considerations 162 within IETF specifications. Section 8 examines the privacy 163 characteristics of an IETF protocol to demonstrate the use of the 164 guidance framework. 166 2. Terminology 168 This section defines basic terms used in this document, with 169 references to pre-existing definitions as appropriate. As in 170 [RFC4949], each entry is preceded by a dollar sign ($) and a space 171 for automated searching. 
Note that this document does not 172 attempt to define the term 'privacy' itself. Instead, privacy is the 173 sum of what is contained in this document. We therefore follow the 174 approach taken by [RFC3552]. 176 2.1. Entities 178 Several of these terms are further elaborated in Section 3. 180 $ Attacker: An entity that intentionally works against some 181 protection goal. 183 $ Eavesdropper: A type of attacker that passively observes an 184 initiator's communications without the initiator's knowledge or 185 authorization. See [RFC4949]. 187 $ Enabler: A protocol entity that facilitates communication between 188 an initiator and a recipient without being directly in the 189 communications path. 191 $ Individual: A human being. 193 $ Initiator: A protocol entity that initiates communications with a 194 recipient. 196 $ Intermediary: A protocol entity that sits between the initiator 197 and the recipient and is necessary for the initiator and recipient 198 to communicate. Unlike an eavesdropper, an intermediary is an 199 entity that is part of the communication architecture. For 200 example, a SIP proxy is an intermediary in the SIP architecture. 202 $ Observer: An entity that is able to observe and collect 203 information from communications, potentially posing privacy 204 threats depending on the context. As defined in this document, 205 initiators, recipients, intermediaries, and enablers can all be 206 observers. Observers are distinguished from eavesdroppers by 207 being at least tacitly authorized. 209 $ Recipient: A protocol entity that receives communications from an 210 initiator. 212 2.2. Data and Analysis 214 $ Correlation: The combination of various pieces of information 215 relating to an individual. 217 $ Fingerprint: A set of information elements that identifies a 218 device or application instance. 220 $ Fingerprinting: The process of an observer or attacker uniquely 221 identifying (with a sufficiently high probability) a device or 222 application instance based on multiple information elements 223 communicated to the observer or attacker. See [EFF]. 225 $ Item of Interest (IOI): Any data item that an observer or 226 attacker might be interested in. This includes attributes, 227 identifiers, identities, communications content, and the fact that 228 a communication interaction has taken place. 230 $ Personal Data: Any information relating to an individual who can 231 be identified, directly or indirectly. 233 $ (Protocol) Interaction: A unit of communication within a 234 particular protocol. A single interaction may consist of a 235 single message between an initiator and recipient or multiple 236 messages, depending on the protocol. 238 $ Traffic Analysis: The inference of information from observation 239 of traffic flows (presence, absence, amount, direction, and 240 frequency). See [RFC4949]. 242 $ Undetectability: The inability of an observer or attacker to 243 sufficiently distinguish whether an item of interest exists or 244 not. 246 $ Unlinkability: Within a particular set of information, the 247 inability of an observer or attacker to distinguish whether two 248 items of interest are related or not (with a high enough degree of 249 probability to be useful to the observer or attacker). 251 2.3. Identifiability 253 $ Anonymity: The state of being anonymous. 255 $ Anonymity Set: A set of individuals that have the same 256 attributes, making them indistinguishable from each other from the 257 perspective of a particular attacker or observer.
259 $ Anonymous: A state of an individual in which an observer or 260 attacker cannot identify the individual within a set of other 261 individuals (the anonymity set). 263 $ Attribute: A property of an individual. 265 $ Identifiable: A property in which an individual's identity is 266 capable of being known to an observer or attacker. 268 $ Identifiability: The extent to which an individual is 269 identifiable. 271 $ Identified: A state in which an individual's identity is known. 273 $ Identifier: A data object uniquely referring to a specific 274 identity of a protocol entity or individual in some context. See 275 [RFC4949]. Identifiers can be based upon natural names -- 276 official names, personal names, and/or nicknames -- or can be 277 artificial (for example, x9z32vb). However, identifiers are by 278 definition unique within their context of use, while natural names 279 are often not unique. 281 $ Identification: The linking of information to a particular 282 individual to infer the individual's identity or to allow the 283 inference of the individual's identity in some context. 285 $ Identity: Any subset of an individual's attributes, including 286 names, that identifies the individual within a given context. 287 Individuals usually have multiple identities for use in different 288 contexts. 290 $ Identity Confidentiality: A property of an individual wherein any 291 party other than the recipient cannot sufficiently identify the 292 individual within a set of other individuals (the anonymity set). 293 This is a desirable property of authentication protocols. 295 $ Identity Provider: An entity (usually an organization) that is 296 responsible for establishing, maintaining, securing, and vouching 297 for the identity associated with individuals. 299 $ Official Name: A personal name for an individual which is 300 registered in some official context. For example, the name on an 301 individual's birth certificate. 303 $ Personal Name: A natural name for an individual. Personal names 304 are often not unique, and often comprise given names in 305 combination with a family name. An individual may have multiple 306 personal names at any time and over a lifetime, including official 307 names. From a technological perspective, it cannot always be 308 determined whether a given reference to an individual is, or is 309 based upon, the individual's personal name(s) (see Pseudonym). 311 $ Pseudonym: A name assumed by an individual in some context, 312 unrelated to the individual's personal names known by others in 313 that context, with an intent of not revealing the individual's 314 identities associated with her other names. 316 $ Pseudonymity: The state of being pseudonymous. 318 $ Pseudonymous: A property of an individual in which the individual 319 is identified by a pseudonym. 321 $ Real name: See personal name and official name. 323 $ Relying party: An entity that relies on assertions of 324 individuals' identities from identity providers in order to 325 provide services to individuals. In effect, the relying party 326 delegates aspects of identity management to the identity 327 provider(s). Such delegation requires protocol exchanges, trust, 328 and a common understanding of semantics of information exchanged 329 between the relying party and the identity provider. 331 3. Communications Model 333 To understand attacks in the privacy-harm sense, it is helpful to 334 consider the overall communication architecture and different actors' 335 roles within it. 
Consider a protocol entity, the "initiator", that 336 initiates communication with some recipient. Privacy analysis is 337 most relevant for protocols with use cases in which the initiator 338 acts on behalf of an individual (or different individuals at 339 different times). It is this individual whose privacy is potentially 340 threatened. 342 Communications may be direct between the initiator and the recipient, 343 or they may involve an application-layer intermediary (such as a 344 proxy or cache) that is necessary for the two parties to communicate. 345 In some cases this intermediary stays in the communication path for 346 the entire duration of the communication and sometimes it is only 347 used for communication establishment, for either inbound or outbound 348 communication. In rare cases there may be a series of intermediaries 349 that are traversed. At lower layers, additional entities are 350 involved in packet forwarding that may interfere with privacy 351 protection goals as well. 353 Some communications tasks require multiple protocol interactions with 354 different entities. For example, a request to an HTTP server may be 355 preceded by an interaction between the initiator and an 356 Authentication, Authorization, and Accounting (AAA) server for 357 network access and to a DNS server for name resolution. In this 358 case, the HTTP server is the recipient and the other entities are 359 enablers of the initiator-to-recipient communication. Similarly, a 360 single communication with the recipient might generate further 361 protocol interactions between either the initiator or the recipient 362 and other entities, and the roles of the entities might change with 363 each interaction. For example, an HTTP request might trigger 364 interactions with an authentication server or with other resource 365 servers wherein the recipient becomes an initiator in those later 366 interactions. 368 Thus, when conducting privacy analysis of an architecture that 369 involves multiple communications phases, the entities involved may 370 take on different -- or opposing -- roles from a privacy 371 considerations perspective in each phase. Understanding the privacy 372 implications of the architecture as a whole may require a separate 373 analysis of each phase. 375 Protocol design is often predicated on the notion that recipients, 376 intermediaries, and enablers are assumed to be authorized to receive 377 and handle data from initiators. As [RFC3552] explains, "we assume 378 that the end-systems engaging in a protocol exchange have not 379 themselves been compromised." However, by its nature privacy 380 analysis requires questioning this assumption since systems are often 381 compromised for the purpose of obtaining personal data. 383 Although recipients, intermediaries, and enablers may not generally 384 be considered as attackers, they may all pose privacy threats 385 (depending on the context) because they are able to observe, collect, 386 process, and transfer privacy-relevant data. These entities are 387 collectively described below as "observers" to distinguish them from 388 traditional attackers. From a privacy perspective, one important 389 type of attacker is an eavesdropper: an entity that passively 390 observes the initiator's communications without the initiator's 391 knowledge or authorization. 393 The threat descriptions in the next section explain how observers and 394 attackers might act to harm individuals' privacy. 
Different kinds of 395 attacks may be feasible at different points in the communications 396 path. For example, an observer could mount surveillance or 397 identification attacks between the initiator and intermediary, or 398 instead could surveil an enabler (e.g., by observing DNS queries from 399 the initiator). 401 4. Privacy Threats 403 Privacy harms come in a number of forms, including harms to financial 404 standing, reputation, solitude, autonomy, and safety. A victim of 405 identity theft or blackmail, for example, may suffer a financial loss 406 as a result. Reputational harm can occur when disclosure of 407 information about an individual, whether true or false, subjects that 408 individual to stigma, embarrassment, or loss of personal dignity. 409 Intrusion or interruption of an individual's life or activities can 410 harm the individual's ability to be left alone. When individuals or 411 their activities are monitored, exposed, or at risk of exposure, 412 those individuals may be stifled from expressing themselves, 413 associating with others, and generally conducting their lives freely. 414 They may also feel a general sense of unease, in that it is "creepy" 415 to be monitored or to have data collected about them. In cases where 416 such monitoring is for the purpose of stalking or violence (for 417 example, monitoring communications to or from a domestic abuse 418 shelter), it can put individuals in physical danger. 420 This section lists common privacy threats (drawing liberally from 421 [Solove], as well as [CoE]), showing how each of them may cause 422 individuals to incur privacy harms and providing examples of how 423 these threats can exist on the Internet. 425 Some privacy threats are already considered in IETF protocols as a 426 matter of routine security analysis. Others are more pure privacy 427 threats that existing security considerations do not usually address. 428 The threats described here are divided into those that may also be 429 considered security threats and those that are primarily privacy 430 threats. 432 Note that an individual's awareness of and consent to the practices 433 described below can greatly affect the extent to which they threaten 434 privacy. If an individual authorizes surveillance of his own 435 activities, for example, the harms associated with it may be 436 mitigated, or the individual may accept the risk of harm. 438 4.1. Combined Security-Privacy Threats 440 4.1.1. Surveillance 442 Surveillance is the observation or monitoring of an individual's 443 communications or activities. The effects of surveillance on the 444 individual can range from anxiety and discomfort to behavioral 445 changes such as inhibition and self-censorship to the perpetration of 446 violence against the individual. The individual need not be aware of 447 the surveillance for it to impact privacy -- the possibility of 448 surveillance may be enough to harm individual autonomy. 450 Surveillance can be conducted by observers or eavesdroppers at any 451 point along the communications path. Confidentiality protections (as 452 discussed in [RFC3552] Section 3) are necessary to prevent 453 surveillance of the content of communications. To prevent traffic 454 analysis or other surveillance of communications patterns, other 455 measures may be necessary, such as [Tor]. 457 4.1.2. 
Stored Data Compromise 459 End systems that do not take adequate measures to secure stored data 460 from unauthorized or inappropriate access expose individuals to 461 potential financial, reputational, or physical harm. 463 Protecting against stored data compromise is typically outside the 464 scope of IETF protocols. However, a number of common protocol 465 functions -- key management, access control, or operational logging, 466 for example -- require the storage of data about initiators of 467 communications. When requiring or recommending that information 468 about initiators or their communications be stored or logged by end 469 systems (see, e.g., RFC 6302 [RFC6302]), it is important to recognize 470 the potential for that information to be compromised and for that 471 potential to be weighed against the benefits of data storage. Any 472 recipient, intermediary, or enabler that stores data may be 473 vulnerable to compromise. 475 4.1.3. Intrusion 477 Intrusion consists of invasive acts that disturb or interrupt one's 478 life or activities. Intrusion can thwart individuals' desires to be 479 left alone, sap their time or attention, or interrupt their 480 activities. This threat is focused on intrusion into one's life 481 rather than direct intrusion into one's communications. The latter 482 is captured in Section 4.1.1. 484 Unsolicited messages and denial-of-service attacks are the most 485 common types of intrusion on the Internet. Intrusion can be 486 perpetrated by any attacker that is capable of sending unwanted 487 traffic to the initiator. 489 4.1.4. Misattribution 491 Misattribution occurs when data or communications related to one 492 individual are attributed to another. Misattribution can result in 493 adverse reputational, financial, or other consequences for 494 individuals that are misidentified. 496 Misattribution in the protocol context comes as a result of using 497 inadequate or insecure forms of identity or authentication. For 498 example, as [RFC6269] notes, abuse mitigation is often conducted on 499 the basis of source IP address, such that connections from individual 500 IP addresses may be prevented or temporarily blacklisted if abusive 501 activity is determined to be sourced from those addresses. However, 502 in the case where a single IP address is shared by multiple 503 individuals, those penalties may be suffered by all individuals 504 sharing the address, even if they were not involved in the abuse. 505 This threat can be mitigated by using identity management mechanisms 506 with proper forms of authentication (ideally with cryptographic 507 properties) so that actions can be attributed uniquely to an 508 individual to provide the basis for accountability without generating 509 false-positives. 511 4.2. Privacy-Specific Threats 513 4.2.1. Correlation 515 Correlation is the combination of various pieces of information 516 related to an individual. Correlation can defy people's expectations 517 of the limits of what others know about them. It can increase the 518 power that those doing the correlating have over individuals as well 519 as correlators' ability to pass judgment, threatening individual 520 autonomy and reputation. 522 Correlation is closely related to identification. Internet protocols 523 can facilitate correlation by allowing individuals' activities to be 524 tracked and combined over time. The use of persistent or 525 infrequently replaced identifiers at any layer of the stack can 526 facilitate correlation. 
For example, an initiator's persistent use 527 of the same device ID, certificate, or email address across multiple 528 interactions could allow recipients (and observers) to correlate all 529 of the initiator's communications over time. 531 As an example, consider Transport Layer Security (TLS) session 532 resumption [RFC5246] or TLS session resumption without server side 533 state [RFC5077]. In RFC 5246 [RFC5246] a server provides the client 534 with a session_id in the ServerHello message and caches the 535 master_secret for later exchanges. When the client initiates a new 536 connection with the server it re-uses the previously obtained 537 session_id in its ClientHello message. The server agrees to resume 538 the session by using the same session_id and the previously stored 539 master_secret for the generation of the TLS Record Layer security 540 association. RFC 5077 [RFC5077] borrows from the session resumption 541 design idea but the server encapsulates all state information into a 542 ticket instead of caching it. An attacker who is able to observe the 543 protocol exchanges between the TLS client and the TLS server is able 544 to link the initial exchange to subsequently resumed TLS sessions 545 when the session_id and the ticket are exchanged in the clear (which 546 is the case with data exchanged in the initial handshake messages). 548 In theory any observer or attacker that receives an initiator's 549 communications can engage in correlation. The extent of the 550 potential for correlation will depend on what data the entity 551 receives from the initiator and has access to otherwise. Often, 552 intermediaries only require a small amount of information for message 553 routing and/or security. In theory, protocol mechanisms could ensure 554 that end-to-end information is not made accessible to these entities, 555 but in practice the difficulty of deploying end-to-end security 556 procedures, additional messaging or computational overhead, and other 557 business or legal requirements often slow or prevent the deployment 558 of end-to-end security mechanisms, giving intermediaries greater 559 exposure to initiators' data than is strictly necessary from a 560 technical point of view. 562 4.2.2. Identification 564 Identification is the linking of information to a particular 565 individual. In some contexts it is perfectly legitimate to identify 566 individuals, whereas in others identification may potentially stifle 567 individuals' activities or expression by inhibiting their ability to 568 be anonymous or pseudonymous. Identification also makes it easier 569 for individuals to be explicitly controlled by others (e.g., 570 governments) and to be treated differentially compared to other 571 individuals. 573 Many protocols provide functionality to convey the idea that some 574 means has been provided to guarantee that entities are who they claim 575 to be. Often, this is accomplished with cryptographic 576 authentication. Furthermore, many protocol identifiers, such as 577 those used in SIP or XMPP, may allow for the direct identification of 578 individuals. Protocol identifiers may also contribute indirectly to 579 identification via correlation. For example, a web site that does 580 not directly authenticate users may be able to match its HTTP header 581 logs with logs from another site that does authenticate users, 582 rendering users on the first site identifiable. 
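To make the log-matching scenario above concrete, the following Python sketch joins the header logs of a site that does not authenticate users with the logs of a site that does, using a simple (IP address, User-Agent) key. The log fields, values, and join key are illustrative assumptions, not a description of any real server's logging.

   # Hypothetical log records; field names and the (ip, user_agent) join
   # key are assumptions for illustration only.
   authenticated_log = [   # site A: users log in, so requests carry a username
       {"ip": "192.0.2.7",    "user_agent": "ExampleBrowser/42.0", "user": "jdoe"},
       {"ip": "198.51.100.3", "user_agent": "OtherAgent/1.1",      "user": "asmith"},
   ]
   anonymous_log = [       # site B: no authentication, but the same headers are logged
       {"ip": "192.0.2.7",   "user_agent": "ExampleBrowser/42.0", "path": "/support/condition-x"},
       {"ip": "203.0.113.9", "user_agent": "ExampleBrowser/42.0", "path": "/"},
   ]

   def fingerprint(entry):
       # A crude pseudo-identifier built only from fields both sites can see.
       return (entry["ip"], entry["user_agent"])

   known = {fingerprint(e): e["user"] for e in authenticated_log}

   for entry in anonymous_log:
       user = known.get(fingerprint(entry), "<unidentified>")
       print(entry["path"], "visited by", user)
   # The visit to /support/condition-x on the "anonymous" site is now linked
   # to the identified user jdoe; the second visit remains unidentified.

The same joining technique applies to any persistent identifier exposed by a protocol, which is why the questions about identifier lifetime and reuse in Section 7 matter.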
584 As with correlation, any observer or attacker may be able to engage 585 in identification depending on the information about the initiator 586 that is available via the protocol mechanism or other channels. 588 4.2.3. Secondary Use 590 Secondary use is the use of collected information without the 591 individual's consent for a purpose different from that for which the 592 information was collected. Secondary use may violate people's 593 expectations or desires. The potential for secondary use can 594 generate uncertainty over how one's information will be used in the 595 future, potentially discouraging information exchange in the first 596 place. 598 One example of secondary use would be an authentication server that 599 uses a network access server's Access-Requests to track an 600 initiator's location. Any observer or attacker could potentially 601 make unwanted secondary uses of initiators' data. Protecting against 602 secondary use is typically outside the scope of IETF protocols. 604 4.2.4. Disclosure 606 Disclosure is the revelation of information about an individual that 607 affects the way others judge the individual. Disclosure can violate 608 individuals' expectations of the confidentiality of the data they 609 share. The threat of disclosure may deter people from engaging in 610 certain activities for fear of reputational harm, or simply because 611 they do not wish to be observed. 613 Any observer or attacker that receives data about an initiator may 614 engage in disclosure. Sometimes disclosure is unintentional because 615 system designers do not realize that information being exchanged 616 relates to individuals. The most common way for protocols to limit 617 disclosure is by providing access control mechanisms (discussed in 618 Section 4.2.5). A further example is provided by the IETF 619 geolocation privacy architecture [RFC6280], which supports a way for 620 users to express a preference that their location information not be 621 disclosed beyond the intended recipient. 623 4.2.5. Exclusion 625 Exclusion is the failure to allow individuals to know about the data 626 that others have about them and to participate in its handling and 627 use. Exclusion reduces accountability on the part of entities that 628 maintain information about people and creates a sense of 629 vulnerability about individuals' ability to control how information 630 about them is collected and used. 632 The most common way for Internet protocols to be involved in 633 enforcing exclusion is through access control mechanisms. The 634 presence architecture developed in the IETF is a good example where 635 individuals are included in the control of information about them. 636 Using a rules expression language (e.g., Presence Authorization Rules 637 [RFC5025]), presence clients can authorize the specific conditions 638 under which their presence information may be shared. 640 Exclusion is primarily considered problematic when the recipient 641 fails to involve the initiator in decisions about data collection, 642 handling, and use. Eavesdroppers engage in exclusion by their very 643 nature since their data collection and handling practices are covert. 645 5. Threat Mitigations 647 Privacy is notoriously difficult to measure and quantify. The extent 648 to which a particular protocol, system, or architecture "protects" or 649 "enhances" privacy is dependent on a large number of factors relating 650 to its design, use, and potential misuse. 
However, there are certain 651 widely recognized classes of mitigations against the threats 652 discussed in Section 4. This section describes three categories of 653 relevant mitigations: (1) data minimization, (2) user participation, 654 and (3) security. The privacy mitigations described in this chapter 655 can loosely be mapped to existing privacy principles, such as the 656 Fair Information Practices, but they have been adapted to fit the 657 target audience of this document. 659 5.1. Data Minimization 661 Data minimization refers to collecting, using, disclosing, and 662 storing the minimal data necessary to perform a task. The less data 663 about individuals that gets exchanged in the first place, the lower 664 the chances of that data being misused or leaked. 666 Data minimization can be effectuated in a number of different ways, 667 including by limiting collection, use, disclosure, retention, 668 identifiability, sensitivity, and access to personal data. Limiting 669 the data collected by protocol elements only to what is necessary 670 (collection limitation) is the most straightforward way to help 671 reduce privacy risks associated with the use of the protocol. In 672 some cases, protocol designers may also be able to recommend limits 673 to the use or retention of data, although protocols themselves are 674 not often capable of controlling these properties. 676 However, the most direct application of data minimization to protocol 677 design is limiting identifiability. Reducing the identifiability of 678 data by using pseudonyms or no identifiers at all helps to weaken the 679 link between an individual and his or her communications. Allowing 680 for the periodic creation of new identifiers reduces the possibility 681 that multiple protocol interactions or communications can be 682 correlated back to the same individual. The following sections 683 explore a number of different properties related to identifiability 684 that protocol designers may seek to achieve. 686 Data minimization mitigates the following threats: surveillance, 687 stored data compromise, correlation, identification, secondary use, 688 disclosure. 690 5.1.1. Anonymity 692 To enable anonymity of an individual, there must exist a set of 693 individuals with potentially the same attributes. To the attacker or 694 the observer these individuals must appear indistinguishable from 695 each other. The set of all such individuals is known as the 696 anonymity set and membership of this set may vary over time. 698 The composition of the anonymity set depends on the knowledge of the 699 observer or attacker. Thus anonymity is relative with respect to the 700 observer or attacker. An initiator may be anonymous only within a 701 set of potential initiators -- its initiator anonymity set -- which 702 itself may be a subset of all individuals that may initiate 703 communications. Conversely, a recipient may be anonymous only within 704 a set of potential recipients -- its recipient anonymity set. Both 705 anonymity sets may be disjoint, may overlap, or may be the same. 707 As an example, consider RFC 3325 (P-Asserted-Identity, PAI) 708 [RFC3325], an extension for the Session Initiation Protocol (SIP), 709 that allows an individual, such as a VoIP caller, to instruct an 710 intermediary that he or she trusts not to populate the SIP From 711 header field with the individual's authenticated and verified 712 identity. 
The recipient of the call, as well as any other entity 713 outside of the individual's trust domain, would therefore only learn 714 that the SIP message (typically a SIP INVITE) was sent with a header 715 field 'From: "Anonymous" <sip:anonymous@anonymous.invalid>' rather 716 than the individual's address-of-record, which is typically thought 717 of as the "public address" of the user. When PAI is used, the 718 individual becomes anonymous within the initiator anonymity set that 719 is populated by every individual making use of that specific 720 intermediary. 722 Note that this example ignores the fact that the recipient may infer 723 or obtain personal data from the other SIP protocol payloads (e.g., 724 SIP Via and Contact headers, SDP). The implication is that PAI only 725 attempts to address a particular threat, namely the disclosure of 726 identity (in the From header) with respect to the recipient. This 727 caveat makes the analysis of the specific protocol extension easier 728 but cannot be assumed when conducting analysis of an entire 729 architecture. 731 5.1.2. Pseudonymity 733 In the context of Internet protocols, almost all identifiers can be 734 nicknames or pseudonyms since there is typically no requirement to 735 use personal names in protocols. However, in certain scenarios it is 736 reasonable to assume that personal names will be used (with vCard 737 [RFC6350], for example). 739 Pseudonymity is strengthened when less personal data can be linked to 740 the pseudonym; when the same pseudonym is used less often and across 741 fewer contexts; and when independently chosen pseudonyms are more 742 frequently used for new actions (making them, from an observer's or 743 attacker's perspective, unlinkable). 745 For Internet protocols, important considerations include whether 746 protocols allow pseudonyms to be changed without human interaction, 747 the default length of pseudonym lifetimes, to whom pseudonyms are 748 exposed, how individuals are able to control disclosure, how often 749 pseudonyms can be changed, and the consequences of changing them. 751 5.1.3. Identity Confidentiality 753 An initiator has identity confidentiality when any party other than 754 the recipient cannot sufficiently identify the initiator within the 755 anonymity set. The size of the anonymity set has a direct impact on 756 identity confidentiality since the smaller the set is, the easier it 757 is to identify the initiator. Identity confidentiality aims to 758 provide a protection against eavesdroppers and intermediaries rather 759 than the intended communication end points. 761 As an example, consider the network access authentication procedures 762 utilizing the Extensible Authentication Protocol (EAP) [RFC3748]. 763 EAP includes an identity exchange where the Identity Response is 764 primarily used for routing purposes and selecting which EAP method to 765 use. Since EAP Identity Requests and Responses are sent in 766 cleartext, eavesdroppers and intermediaries along the communication 767 path between the EAP peer and the EAP server can snoop on the 768 identity, which is encoded in the form of the Network Access 769 Identifier (NAI) defined in RFC 4282 [RFC4282]. To address this 770 threat, as discussed in RFC 4282 [RFC4282], the username part of the 771 NAI (but not the realm-part) can be hidden from these eavesdroppers 772 and intermediaries with the cryptographic support offered by EAP 773 methods. Identity confidentiality has become a recommended design 774 criterion for EAP (see [RFC4017]). EAP-AKA [RFC4187], for example, 775 protects the EAP peer's identity against passive adversaries by 776 utilizing temporary identities. EAP-IKEv2 [RFC5106] is an example of 777 an EAP method that offers protection against active attackers with 778 regard to the individual's identity.
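As a small illustration of this identity confidentiality technique, the following Python sketch separates the realm of a Network Access Identifier, which is needed for routing, from the username, which can be withheld from eavesdroppers until a protected exchange is established. The function names and the placeholder username are illustrative assumptions, not normative behavior taken from RFC 4282 or any EAP method.

   def split_nai(nai):
       # Split "username@realm" (simplified; see RFC 4282 for the full syntax).
       username, _, realm = nai.partition("@")
       return username, realm

   def cleartext_identity(nai):
       # What an eavesdropper on the unprotected identity exchange would see:
       # the realm is preserved so the request can still be routed to the
       # individual's home server, but the username is replaced.
       _, realm = split_nai(nai)
       return "anonymous@" + realm if realm else "anonymous"

   real_nai = "jdoe@example.net"
   print(cleartext_identity(real_nai))   # -> anonymous@example.net
   # The real username "jdoe" would be revealed only inside the
   # confidentiality-protected tunnel established by the EAP method.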
780 5.1.4. Data Minimization within Identity Management 782 Modern systems are increasingly relying on multi-party transactions 783 to authenticate individuals. Many of these systems make use of an 784 identity provider that is responsible for providing authentication, 785 authorization, and accounting functionality to relying parties that 786 offer some protected resources. To facilitate these functions, an 787 identity provider will usually go through a process of verifying the 788 individual's identity and issuing credentials to the individual. 789 When an individual seeks to make use of a service provided by the 790 relying party, the relying party relies on the authentication 791 assertions provided by its identity provider. Note that in more 792 sophisticated scenarios the authentication assertions are traits that 793 demonstrate the individual's capabilities and roles. The 794 authorization responsibility may also be shared between the identity 795 provider and the relying party and does not necessarily only need to 796 reside with the identity provider. 798 Such systems have the ability to support a number of properties that 799 minimize data collection in different ways: 801 In certain use cases relying parties do not need to know the real 802 name of an individual (for example, when the individual's age is 803 the only attribute that needs to be authenticated). 805 Relying parties that collude can be prevented from using an 806 individual's credentials to track the individual. That is, two 807 different relying parties can be prevented from determining that 808 the same individual has authenticated to both of them (one common 809 approach is sketched below). This typically requires identity 810 management protocol support as well as support by both the relying 811 party and the identity provider. 813 The identity provider can be prevented from knowing which relying 814 parties an individual interacted with. This requires avoiding 815 direct communication between the identity provider and the relying 816 party at the time when access to a resource by the initiator is 817 made.
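One well-known way to achieve the second property above (keeping colluding relying parties from linking the same individual's accounts) is for the identity provider to derive a different, stable pseudonym for each relying party. The following Python sketch shows the idea with an HMAC-based derivation; the secret handling and naming are illustrative assumptions rather than a feature of any particular identity management protocol.

   import hashlib
   import hmac

   # Long-term secret known only to the identity provider (placeholder value).
   IDP_SECRET = b"identity-provider-pairwise-secret"

   def pairwise_pseudonym(internal_user_id, relying_party):
       # Stable for a given (individual, relying party) pair, but unrelated
       # across relying parties, so two colluding parties cannot link
       # accounts simply by comparing the identifiers they were given.
       msg = (internal_user_id + "|" + relying_party).encode()
       return hmac.new(IDP_SECRET, msg, hashlib.sha256).hexdigest()[:16]

   print(pairwise_pseudonym("user-1234", "shop.example"))    # one opaque value
   print(pairwise_pseudonym("user-1234", "forum.example"))   # a different, unrelated value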
819 5.2. User Participation 821 As explained in Section 4.2.5, data collection and use that happens 822 "in secret," without the individual's knowledge, is apt to violate 823 the individual's expectation of privacy and may create incentives for 824 misuse of data. As a result, privacy regimes tend to include 825 provisions to require informing individuals about data collection and 826 use and involving them in decisions about the treatment of their 827 data. In an engineering context, supporting the goal of user 828 participation usually means providing ways for users to control the 829 data that is shared about them. It may also mean providing ways for 830 users to signal how they expect their data to be used and shared. 832 User participation mitigates the following threats: surveillance, 833 secondary use, disclosure, and exclusion. 835 5.3. Security 837 Keeping data secure at rest and in transit is another important 838 component of privacy protection. As they are described in [RFC3552] 839 Section 2, a number of security goals also serve to enhance privacy: 841 o Confidentiality: Keeping data secret from unintended listeners. 843 o Peer entity authentication: Ensuring that the endpoint of a 844 communication is the one that is intended (in support of 845 maintaining confidentiality). 847 o Unauthorized usage: Limiting data access to only those users who 848 are authorized. (Note that this goal also falls within data 849 minimization.) 851 o Inappropriate usage: Limiting how authorized users can use data. 852 (Note that this goal also falls within data minimization.) 854 Note that even when these goals are achieved, the existence of items 855 of interest -- attributes, identifiers, identities, communications, 856 actions (such as the sending or receiving of a communication), or 857 anything else an attacker or observer might be interested in -- may 858 still be detectable, even if they are not readable. Thus 859 undetectability, in which an observer or attacker cannot sufficiently 860 distinguish whether an item of interest exists or not, may be 861 considered a further security goal (albeit one that can be 862 extremely difficult to accomplish). 864 By providing proper security protection, the following threats can be 865 mitigated: surveillance, stored data compromise, misattribution, 866 secondary use, disclosure, and intrusion. 868 6. Scope 870 Internet protocols are often built flexibly, making them useful in a 871 variety of architectures, contexts, and deployment scenarios without 872 requiring significant interdependency between disparately designed 873 components. Although protocol designers often have a particular 874 target architecture or set of architectures in mind at design time, 875 it is not uncommon for architectural frameworks to develop later, 876 after implementations exist and have been deployed in combination 877 with other protocols or components to form complete systems. 879 As a consequence, the extent to which protocol designers can foresee 880 all of the privacy implications of a particular protocol at design 881 time is limited. An individual protocol may be relatively benign on 882 its own, and it may make use of privacy and security features at 883 lower layers of the protocol stack (Internet Protocol Security, 884 Transport Layer Security, and so forth) to mitigate the risk of 885 attack. But when deployed within a larger system or used in a way 886 not envisioned at design time, its use may create new privacy risks. 887 Protocols are often implemented and deployed long after design time 888 by different people than those who did the protocol design. The 889 guidelines in Section 7 ask protocol designers to consider how their 890 protocols are expected to interact with systems and information that 891 exist outside the protocol bounds, but not to imagine every possible 892 deployment scenario. 894 Furthermore, in many cases the privacy properties of a system are 895 dependent upon the complete system design where various protocols are 896 combined together to form a product solution; the implementation, 897 which includes the user interface design; and operational deployment 898 practices, including default privacy settings and security processes 899 within the company doing the deployment. These details are specific 900 to particular instantiations and generally outside the scope of the 901 work conducted in the IETF. The guidance provided here may be useful 902 in making choices about these details, but its primary aim is to 903 assist with the design, implementation, and operation of protocols.
905 Transparency of data collection and use -- often effectuated through 906 user interface design -- is normally a key factor in determining the 907 privacy impact of a system. Although most IETF activities do not 908 involve standardizing user interfaces or user-facing communications, 909 in some cases understanding expected user interactions can be 910 important for protocol design. Unexpected user behavior may have an 911 adverse impact on security and/or privacy. 913 In sum, privacy issues, even those related to protocol development, 914 go beyond the technical guidance discussed herein. As an example, 915 consider HTTP [RFC2616], which was designed to allow the exchange of 916 arbitrary data. A complete analysis of the privacy considerations 917 for uses of HTTP might include what type of data is exchanged, how 918 this data is stored, and how it is processed. Hence the analysis for 919 an individual's static personal web page would be different than the 920 use of HTTP for exchanging health records. A protocol designer 921 working on HTTP extensions (such as WebDAV [RFC4918]) is not expected 922 to describe the privacy risks derived from all possible usage 923 scenarios, but rather the privacy properties specific to the 924 extensions and any particular uses of the extensions that are 925 expected and foreseen at design time. 927 7. Guidelines 929 This section provides guidance for document authors in the form of a 930 questionnaire about a protocol being designed. The questionnaire may 931 be useful at any point in the design process, particularly after 932 document authors have developed a high-level protocol model as 933 described in [RFC4101]. 935 Note that the guidance does not recommend specific practices. The 936 range of protocols developed in the IETF is too broad to make 937 recommendations about particular uses of data or how privacy might be 938 balanced against other design goals. However, by carefully 939 considering the answers to each question, document authors should be 940 able to produce a comprehensive analysis that can serve as the basis 941 for discussion of whether the protocol adequately protects against 942 privacy threats. 944 The framework is divided into four sections that address each of the 945 mitigation classes from Section 5, plus a general section. Security 946 is not fully elaborated since substantial guidance already exists in 947 [RFC3552]. 949 7.1. Data Minimization 951 a. Identifiers. What identifiers does the protocol use for 952 distinguishing initiators of communications? Does the protocol 953 use identifiers that allow different protocol interactions to be 954 correlated? What identifiers could be omitted or be made less 955 identifying while still fulfilling the protocol's goals? 957 b. Data. What information does the protocol expose about 958 individuals, their devices, and/or their device usage (other than 959 the identifiers discussed in (a))? To what extent is this 960 information linked to the identities of the individuals? How does 961 the protocol combine personal data with the identifiers discussed 962 in (a)? 964 c. Observers. Which information discussed in (a) and (b) is 965 exposed to each other protocol entity (i.e., recipients, 966 intermediaries, and enablers)? Are there ways for protocol 967 implementers to choose to limit the information shared with each 968 entity? Are there operational controls available to limit the 969 information shared with each entity? 971 d. Fingerprinting. 
In many cases the specific ordering and/or 972 occurrences of information elements in a protocol allow users, 973 devices, or software using the protocol to be fingerprinted. Is 974 this protocol vulnerable to fingerprinting? If so, how? Can it 975 be designed to reduce or eliminate the vulnerability? If not, why 976 not? 978 e. Persistence of identifiers. What assumptions are made in the 979 protocol design about the lifetime of the identifiers discussed in 980 (a)? Does the protocol allow implementers or users to delete or 981 replace identifiers? How often does the specification recommend 982 to delete or replace identifiers by default? Can the identifiers, 983 along with other state information, be set to automatically 984 expire? 986 f. Correlation. Does the protocol allow for correlation of 987 identifiers? Are there expected ways that information exposed by 988 the protocol will be combined or correlated with information 989 obtained outside the protocol? How will such combination or 990 correlation facilitate fingerprinting of a user, device, or 991 application? Are there expected combinations or correlations with 992 outside data that will make users of the protocol more 993 identifiable? 995 g. Retention. Does the protocol or its anticipated uses require 996 that the information discussed in (a) or (b) be retained by 997 recipients, intermediaries, or enablers? If so, why? Is the 998 retention expected to be persistent or temporary? 1000 7.2. User Participation 1002 a. User control. What controls or consent mechanisms does the 1003 protocol define or require before personal data or identifiers are 1004 shared or exposed via the protocol? If no such mechanisms or 1005 controls are specified, is it expected that control and consent 1006 will be handled outside of the protocol? 1008 b. Control over sharing with individual recipients. Does the 1009 protocol provide ways for initiators to share different 1010 information with different recipients? If not, are there 1011 mechanisms that exist outside of the protocol to provide 1012 initiators with such control? 1014 c. Control over sharing with intermediaries. Does the protocol 1015 provide ways for initiators to limit which information is shared 1016 with intermediaries? If not, are there mechanisms that exist 1017 outside of the protocol to provide users with such control? Is it 1018 expected that users will have relationships that govern the use of 1019 the information (contractual or otherwise) with those who operate 1020 these intermediaries? 1021 d. Preference expression. Does the protocol provide ways for 1022 initiators to express individuals' preferences to recipients or 1023 intermediaries with regard to the collection, use, or disclosure 1024 of their personal data? 1026 7.3. Security 1028 a. Surveillance. How do the protocol's security considerations 1029 prevent surveillance, including eavesdropping and traffic 1030 analysis? 1032 b. Stored data compromise. How do the protocol's security 1033 considerations prevent or mitigate stored data compromise? 1035 c. Intrusion. How do the protocol's security considerations 1036 prevent or mitigate intrusion, including denial-of-service attacks 1037 and unsolicited communications more generally? 1039 d. Misattribution. How do the protocol's mechanisms for 1040 identifying and/or authenticating individuals prevent 1041 misattribution? 1043 7.4. General 1045 a. Trade-offs. 
Does the protocol make trade-offs between privacy 1046 and usability, privacy and efficiency, privacy and 1047 implementability, or privacy and other design goals? Describe the 1048 trade-offs and the rationale for the design chosen. 1050 b. Defaults. If the protocol can be operated in multiple modes 1051 or with multiple configurable options, does the default mode or 1052 option minimize the amount, identifiability, and persistence of 1053 the data and identifiers exposed by the protocol? Does the 1054 default mode or option maximize the opportunity for user 1055 participation? Does it provide the strictest security features of 1056 all the modes/options? If any of these answers are no, explain 1057 why less protective defaults were chosen. 1059 8. Example 1061 The following section gives an example of the threat analysis and 1062 threat mitigation recommended by this document. It covers a 1063 particularly difficult application protocol, presence, to try to 1064 demonstrate these principles on an architecture that is vulnerable to 1065 many of the threats described above. This text is not intended as an 1066 example of a Privacy Considerations section that might appear in an 1067 IETF specification, but rather as an example of the thinking that 1068 should go into the design of a protocol when considering privacy as a 1069 first principle. 1071 A presence service, as defined in the abstract in [RFC2778], allows 1072 users of a communications service to monitor one another's 1073 availability and disposition in order to make decisions about 1074 communicating. Presence information is highly dynamic, and generally 1075 characterizes whether a user is online or offline, busy or idle, away 1076 from communications devices or nearby, and the like. Necessarily, 1077 this information has certain privacy implications, and from the start 1078 the IETF approached this work with the aim to provide users with the 1079 controls to determine how their presence information would be shared. 1080 The Common Profile for Presence (CPP) [RFC3859] defines a set of 1081 logical operations for delivery of presence information. This 1082 abstract model is applicable to multiple presence systems. The SIP- 1083 based SIMPLE presence system [RFC3261] uses CPP as its baseline 1084 architecture, and the presence operations in the Extensible Messaging 1085 and Presence Protocol (XMPP) have also been mapped to CPP [RFC3922]. 1087 The fundamental architecture defined in RFC 2778 and RFC 3859 is a 1088 mediated one. Clients (presentities in RFC 2778 terms) publish their 1089 presence information to presence servers, which in turn distribute 1090 information to authorized watchers. Presence servers thus retain 1091 presence information for an interval of time, until it either changes 1092 or expires, so that it can be revealed to authorized watchers upon 1093 request. This architecture mirrors existing pre-standard deployment 1094 models. The integration of an explicit authorization mechanism into 1095 the presence architecture has been widely successful in involving the 1096 end users in the decision making process before sharing information. 1097 Nearly all presence systems deployed today provide such a mechanism, 1098 typically through a reciprocal authorization system by which a pair 1099 of users, when they agree to be "buddies," consent to divulge their 1100 presence information to one another. Buddylists are managed by 1101 servers but controlled by end users. 
1105 From the perspective of privacy design, however, the classical presence 1106 architecture represents nearly a worst-case scenario. In terms of 1107 data minimization, presentities share their sensitive information 1108 with presence services, and while services only share this presence 1109 information with watchers authorized by the user, no technical 1110 mechanism prevents those watchers from relaying presence to further 1111 third parties. Any of these entities could conceivably log or retain 1112 presence information indefinitely. The sensitivity cannot be 1113 mitigated by rendering the user anonymous, as it is indeed the 1114 purpose of the system to facilitate communications between users who 1115 know one another. The identifiers employed by users are long-lived 1116 and often contain personal information, including personal names and 1117 the domains of service providers. While users do participate in the 1118 construction of buddy lists and blacklists, they do so with little 1119 prospect for accountability: the user effectively throws their 1120 presence information over the wall to a presence server that in turn 1121 distributes the information to watchers. Users typically have no way 1122 to verify that presence is being distributed only to authorized 1123 watchers, especially as it is the server that authenticates watchers, 1124 not the end user. Connections between the server and all publishers 1125 and consumers of presence data are moreover an attractive target for 1126 eavesdroppers, and require strong confidentiality mechanisms, though 1127 again the end user has no way to verify what mechanisms are in place 1128 between the presence server and a watcher. 1130 Moreover, the sensitivity of presence information is not limited to 1131 the disposition and capability to communicate. Capabilities can 1132 reveal the type of device that a user employs, for example, and since 1133 multiple devices can publish the same user's presence, there are 1134 significant risks of allowing attackers to correlate user devices. 1135 An important extension to presence was developed to enable 1136 support for location sharing. The effort to standardize protocols 1137 for systems sharing geolocation was started in the GEOPRIV working 1138 group. During the initial requirements and privacy threat analysis 1139 in the process of chartering the working group, it became clear that 1140 the system would require an underlying communication mechanism 1141 supporting user consent to share location information. The 1142 resemblance of these requirements to the presence framework was 1143 quickly recognized, and this design decision was documented in 1144 [RFC4079]. Location information thus mingles with other presence 1145 information available through the system to intermediaries and to 1146 authorized watchers. 1148 Privacy concerns about presence information largely arise due to the 1149 built-in mediation of the presence architecture. The need for a 1150 presence server is motivated by two primary design requirements of 1151 presence: in the first place, the server can respond with an 1152 "offline" indication when the user is not online; in the second 1153 place, the server can compose presence information published by 1154 different devices under the user's control. Additionally, to 1155 preserve the use of URIs as identifiers for entities, some service 1156 must operate a host with the domain name appearing in a presence URI, 1157 and in practical terms no commercial presence architecture would 1158 force end users to own and operate their own domain names. Many end 1159 users of applications like presence are behind NATs or firewalls, and 1160 effectively cannot receive direct connections from the Internet - the 1161 persistent bidirectional channel these clients open and maintain with 1162 a presence server is essential to the operation of the protocol.
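The composition requirement mentioned above can be made concrete with a small sketch. The following Python fragment is purely illustrative: the status values, their ranking, and the function name are assumptions of this example rather than anything defined by the presence specifications.

   # Illustrative composition of presence published by several devices
   # belonging to one user; the ranking of states is assumed here.

   AVAILABILITY_RANK = {"open": 2, "idle": 1, "closed": 0}

   def compose(device_reports):
       """Return the most available status among a user's devices.

       device_reports maps a device identifier to its last published
       status, e.g. {"phone": "closed", "laptop": "open"}.
       """
       if not device_reports:
           return "closed"   # nothing published: report the user as offline
       return max(device_reports.values(),
                  key=lambda status: AVAILABILITY_RANK.get(status, 0))

   # The laptop is online even though the phone is not, so the composed
   # presence revealed to authorized watchers is "open".
   assert compose({"phone": "closed", "laptop": "open"}) == "open"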
1164 One must first ask whether the trade-off of mediation for presence is 1165 worth it. Does a server need to be in the middle of all publications 1166 of presence information? It might seem that end-to-end encryption of 1167 the presence information could solve many of these problems. A 1168 presentity could encrypt the presence information with the public key 1169 of a watcher, and only then send the presence information through the 1170 server. The IETF defined an object format for presence information 1171 called the Presence Information Data Format (PIDF), which for the 1172 purposes of conveying location information was extended to the PIDF 1173 Location Object (PIDF-LO) - these XML objects were designed to 1174 accommodate an encrypted wrapper. Encrypting this data would have 1175 the added benefit of preventing stored cleartext presence information 1176 from being seized by an attacker who manages to compromise a presence 1177 server. This proposal, however, quickly runs into usability 1178 problems. Discovering the public keys of watchers is the first 1179 difficulty, one that few Internet protocols have addressed 1180 successfully. This solution would then require the presentity to 1181 publish one encrypted copy of its presence information per authorized 1182 watcher to the presence service, regardless of whether or not a 1183 watcher is actively seeking presence information - for a presentity 1184 with many watchers, this may place an unacceptable burden on the 1185 presence server, especially given the dynamism of presence 1186 information. Finally, it prevents the server from composing presence 1187 information reported by multiple devices under the same user's 1188 control. On the whole, these difficulties render object encryption 1189 of presence information a doubtful prospect. 1191 Some protocols that provide presence information, such as SIP, can 1192 operate intermediaries in a redirecting mode, rather than a 1193 publishing or proxying mode. Instead of sending presence information 1194 through the server, in other words, these protocols can merely 1195 redirect watchers to the presentity, and then presence information 1196 could pass directly and securely from the presentity to the watcher. 1197 It is worth noting that this would disclose the IP address of the 1198 presentity to the watcher, which carries its own set of risks. With such a 1199 direct connection, however, the presentity can decide exactly what information it would 1200 like to share with the watcher in question, can authenticate the 1201 watcher itself with whatever strength of credential it chooses, and, 1202 with end-to-end encryption, can reduce the likelihood of any 1203 eavesdropping. In a redirection architecture, a presence server 1204 could still provide the necessary "offline" indication, without 1205 having to observe and forward all presence information 1206 itself. This redirection mechanism is more promising than encryption, but it also 1207 suffers from significant difficulties. It too does not provide for 1208 composition of presence information from multiple devices - it in 1209 fact forces the watcher to perform this composition itself. The 1210 largest single impediment to this approach is, however, the difficulty 1211 of creating end-to-end connections between the presentity's device(s) 1212 and a watcher, as some or all of these endpoints may be behind NATs 1213 or firewalls that prevent peer-to-peer connections. While there are 1214 potential solutions for this problem, like STUN and TURN, they add 1215 complexity to the overall system.
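To make the contrast between the publishing and redirecting modes concrete, the following Python sketch shows what a redirect-mode server handles: it never sees the presence document, only a direct contact address, yet it can still answer for offline users. This is a toy model, not SIP; the class and method names are invented for illustration.

   # Toy model of a redirect-mode presence server.  It never stores or
   # forwards presence documents; it only knows a direct contact for
   # each registered presentity.  Not SIP syntax; names are invented.

   class RedirectingServer:
       def __init__(self):
           self.contacts = {}   # presentity URI -> direct contact address

       def register(self, presentity, contact):
           self.contacts[presentity] = contact

       def subscribe(self, watcher, presentity):
           contact = self.contacts.get(presentity)
           if contact is None:
               # The server can still answer on behalf of offline users.
               return {"status": "offline"}
           # The watcher is pointed at the presentity, which can then
           # authenticate the watcher itself and decide exactly what to
           # share, possibly over an end-to-end encrypted connection.
           return {"status": "redirect", "contact": contact}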
1217 Consequently, mediation is a difficult feature of the presence 1218 architecture to remove, and due especially to the requirement for 1219 composition it is hard to minimize the data shared with 1220 intermediaries. Control over sharing with intermediaries must 1221 therefore come from some other explicit component of the 1222 architecture. As such, the presence work in the IETF focused on 1223 improving user participation in the activities of the presence 1224 server. This work began in the GEOPRIV working group, with controls 1225 on location privacy, as the location of users is perceived as 1226 especially sensitive. With the aim of meeting the privacy 1227 requirements defined in [RFC2779], a set of usage indications, such as 1228 whether retransmission is allowed or when the retention period 1229 expires, has been added to PIDF-LO; these indications always travel with the location 1230 information itself. These privacy preferences apply not only to the 1231 intermediaries that store and forward presence information, but also 1232 to the watchers who consume it. 1234 This approach very much follows the spirit of Creative Commons [CC], 1235 namely the use of a limited number of conditions (such as 'Share 1236 Alike' [CC-SA]). Unlike Creative Commons, the GEOPRIV working group 1237 did not, however, initiate work to produce legal language or to 1238 design graphical icons, since this would fall outside the scope of the 1239 IETF. In particular, the GEOPRIV rules state a preference on the 1240 retention and retransmission of location information; while GEOPRIV 1241 cannot force any entity receiving a PIDF-LO object to abide by those 1242 preferences, if users lack the ability to express them at all, we can 1243 guarantee their preferences will not be honored. 1245 The retention and retransmission elements were envisioned as the most 1246 essential examples of preference expression in sharing presence. The 1247 PIDF object was designed for extensibility, and the rulesets created 1248 for PIDF-LO can also be extended to provide new expressions of user 1249 preference. Not all user preference information should be bound into 1250 a particular PIDF object, however - many forms of access control 1251 policy assumed by the presence architecture need to be provisioned in 1252 the presence server by some interface with the user. This 1253 requirement eventually triggered the standardization of a general 1254 access control policy language called the Common Policy framework (defined in 1255 [RFC4745]). This language allows one to express ways to 1256 control the distribution of information as simple rules, consisting of conditions, 1257 actions, and transformations, expressed in an XML format. 1258 Common Policy itself is an abstract format that needs to be 1259 instantiated: two examples can be found in the Presence 1260 Authorization Rules [RFC5025] and the Geolocation Policy 1261 [I-D.ietf-geopriv-policy]. The former provides additional 1262 expressiveness for presence-based systems, while the latter defines 1263 syntax and semantics for location-based conditions and 1264 transformations.
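A rough sense of the rule model can be conveyed in a few lines of Python. The sketch below mimics only the conditions/actions/transformations structure of Common Policy; it is not the XML schema of [RFC4745], and the particular fields, such as the retransmission and retention indications, are simplified assumptions of this example.

   # Simplified, non-normative sketch of a Common Policy style rule.
   # Conditions decide whether the rule applies to a request;
   # transformations describe what may be revealed and under which
   # usage indications.  This is not the RFC 4745 XML schema.

   from datetime import datetime, timezone

   rule = {
       "conditions": {
           "identity": {"sip:bob@example.net"},   # authorized watcher(s)
           "valid_until": datetime(2013, 2, 1, tzinfo=timezone.utc),
       },
       "actions": {},
       "transformations": {
           "retransmission_allowed": False,
           "retention_expires": datetime(2013, 1, 19, tzinfo=timezone.utc),
       },
   }

   def evaluate(rule, watcher, now):
       cond = rule["conditions"]
       if watcher not in cond["identity"] or now > cond["valid_until"]:
           return None                      # rule does not apply; reveal nothing
       return rule["transformations"]       # usage indications travel with the data

   grants = evaluate(rule, "sip:bob@example.net",
                     datetime(2013, 1, 12, tzinfo=timezone.utc))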
1266 Ultimately, the privacy work on presence represents a compromise 1267 between privacy principles and the needs of the architecture and 1268 marketplace. While it was not feasible to remove intermediaries from 1269 the architecture entirely, nor to prevent their access to presence 1270 information, the IETF did provide a way for users to express their 1271 preferences and provision their controls at the presence service. We 1272 have not had great success in the implementation space with privacy 1273 mechanisms thus far, but by documenting and acknowledging the 1274 limitations of these mechanisms, the designers were able to provide 1275 implementers and end users with an informed perspective on the 1276 privacy properties of the IETF's presence protocols. 1278 9. Security Considerations 1280 This document describes privacy aspects that protocol designers 1281 should consider in addition to regular security analysis. 1283 10. IANA Considerations 1285 This document does not require actions by IANA. 1287 11. Acknowledgements 1289 We would like to thank Christine Runnegar for her extensive helpful 1290 review comments. 1292 We would like to thank Scott Brim, Kasey Chappelle, Marc Linsner, 1293 Bryan McLaughlin, Nick Mathewson, Eric Rescorla, Scott Bradner, Nat 1294 Sakimura, Bjoern Hoehrmann, David Singer, Dean Willis, Christine 1295 Runnegar, Lucy Lynch, Trent Adams, Mark Lizar, Martin Thomson, Josh 1296 Howlett, Mischa Tuffield, S. Moonesamy, Zhou Sujing, Claudia Diaz, 1297 Leif Johansson, Jeff Hodges, Stephen Farrell, Steven Johnston, Cullen 1298 Jennings, Ted Hardie, Dave Thaler, and Klaas Wierenga. 1300 Finally, we would like to thank the participants for the feedback 1301 they provided during the December 2010 Internet Privacy workshop co- 1302 organized by MIT, ISOC, W3C, and the IAB. 1304 12. IAB Members at the Time of Approval 1306 Bernard Aboba 1308 Jari Arkko 1310 Marc Blanchet 1312 Ross Callon 1314 Alissa Cooper 1316 Spencer Dawkins 1318 Joel Halpern 1320 Russ Housley 1322 David Kessens 1324 Danny McPherson 1326 Jon Peterson 1328 Dave Thaler 1330 Hannes Tschofenig 1332 13. Informative References 1334 [CC] Creative Commons, "Creative Commons", 2012. 1336 [CC-SA] Creative Commons, "Share Alike", 2012. 1338 [CoE] Council of Europe, "Recommendation CM/Rec(2010)13 of the 1339 Committee of Ministers to member states on the protection 1340 of individuals with regard to automatic processing of 1341 personal data in the context of profiling", available at 1342 https://wcd.coe.int/ViewDoc.jsp?Ref=CM/Rec%282010%2913 1343 (November 2010), 1344 2010. 1346 [EFF] Electronic Frontier Foundation, "Panopticlick", 2011. 1348 [FIPs] Gellman, B., "Fair Information Practices: A Basic 1349 History", 2012. 1351 [I-D.ietf-geopriv-policy] 1352 Schulzrinne, H., Tschofenig, H., Cuellar, J., Polk, J., 1353 Morris, J., and M. Thomson, "Geolocation Policy: A 1354 Document Format for Expressing Privacy Preferences for 1355 Location Information", draft-ietf-geopriv-policy-27 (work 1356 in progress), August 2012.
1358 [OECD] Organization for Economic Co-operation and Development, 1359 "OECD Guidelines on the Protection of Privacy and 1360 Transborder Flows of Personal Data", available at 1361 http://www.oecd.org/EN/document/ 1362 0,,EN-document-0-nodirectorate-no-24-10255-0,00.html 1363 (September 2010), 1980. 1365 [PbD] Office of the Information and Privacy Commissioner, 1366 Ontario, Canada, "Privacy by Design", 2011. 1368 [RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., 1369 Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext 1370 Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999. 1372 [RFC2778] Day, M., Rosenberg, J., and H. Sugano, "A Model for 1373 Presence and Instant Messaging", RFC 2778, February 2000. 1375 [RFC2779] Day, M., Aggarwal, S., Mohr, G., and J. Vincent, "Instant 1376 Messaging / Presence Protocol Requirements", RFC 2779, 1377 February 2000. 1379 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 1380 A., Peterson, J., Sparks, R., Handley, M., and E. 1381 Schooler, "SIP: Session Initiation Protocol", RFC 3261, 1382 June 2002. 1384 [RFC3325] Jennings, C., Peterson, J., and M. Watson, "Private 1385 Extensions to the Session Initiation Protocol (SIP) for 1386 Asserted Identity within Trusted Networks", RFC 3325, 1387 November 2002. 1389 [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC 1390 Text on Security Considerations", BCP 72, RFC 3552, 1391 July 2003. 1393 [RFC3748] Aboba, B., Blunk, L., Vollbrecht, J., Carlson, J., and H. 1394 Levkowetz, "Extensible Authentication Protocol (EAP)", 1395 RFC 3748, June 2004. 1397 [RFC3859] Peterson, J., "Common Profile for Presence (CPP)", 1398 RFC 3859, August 2004. 1400 [RFC3922] Saint-Andre, P., "Mapping the Extensible Messaging and 1401 Presence Protocol (XMPP) to Common Presence and Instant 1402 Messaging (CPIM)", RFC 3922, October 2004. 1404 [RFC4017] Stanley, D., Walker, J., and B. Aboba, "Extensible 1405 Authentication Protocol (EAP) Method Requirements for 1406 Wireless LANs", RFC 4017, March 2005. 1408 [RFC4079] Peterson, J., "A Presence Architecture for the 1409 Distribution of GEOPRIV Location Objects", RFC 4079, 1410 July 2005. 1412 [RFC4101] Rescorla, E. and IAB, "Writing Protocol Models", RFC 4101, 1413 June 2005. 1415 [RFC4187] Arkko, J. and H. Haverinen, "Extensible Authentication 1416 Protocol Method for 3rd Generation Authentication and Key 1417 Agreement (EAP-AKA)", RFC 4187, January 2006. 1419 [RFC4282] Aboba, B., Beadles, M., Arkko, J., and P. Eronen, "The 1420 Network Access Identifier", RFC 4282, December 2005. 1422 [RFC4745] Schulzrinne, H., Tschofenig, H., Morris, J., Cuellar, J., 1423 Polk, J., and J. Rosenberg, "Common Policy: A Document 1424 Format for Expressing Privacy Preferences", RFC 4745, 1425 February 2007. 1427 [RFC4918] Dusseault, L., "HTTP Extensions for Web Distributed 1428 Authoring and Versioning (WebDAV)", RFC 4918, June 2007. 1430 [RFC4949] Shirey, R., "Internet Security Glossary, Version 2", 1431 RFC 4949, August 2007. 1433 [RFC5025] Rosenberg, J., "Presence Authorization Rules", RFC 5025, 1434 December 2007. 1436 [RFC5077] Salowey, J., Zhou, H., Eronen, P., and H. Tschofenig, 1437 "Transport Layer Security (TLS) Session Resumption without 1438 Server-Side State", RFC 5077, January 2008. 1440 [RFC5106] Tschofenig, H., Kroeselberg, D., Pashalidis, A., Ohba, Y., 1441 and F. Bersani, "The Extensible Authentication Protocol- 1442 Internet Key Exchange Protocol version 2 (EAP-IKEv2) 1443 Method", RFC 5106, February 2008. 1445 [RFC5246] Dierks, T. and E.
Rescorla, "The Transport Layer Security 1446 (TLS) Protocol Version 1.2", RFC 5246, August 2008. 1448 [RFC6269] Ford, M., Boucadair, M., Durand, A., Levis, P., and P. 1449 Roberts, "Issues with IP Address Sharing", RFC 6269, 1450 June 2011. 1452 [RFC6280] Barnes, R., Lepinski, M., Cooper, A., Morris, J., 1453 Tschofenig, H., and H. Schulzrinne, "An Architecture for 1454 Location and Location Privacy in Internet Applications", 1455 BCP 160, RFC 6280, July 2011. 1457 [RFC6302] Durand, A., Gashinsky, I., Lee, D., and S. Sheppard, 1458 "Logging Recommendations for Internet-Facing Servers", 1459 BCP 162, RFC 6302, June 2011. 1461 [RFC6350] Perreault, S., "vCard Format Specification", RFC 6350, 1462 August 2011. 1464 [RFC6716] Valin, JM., Vos, K., and T. Terriberry, "Definition of the 1465 Opus Audio Codec", RFC 6716, September 2012. 1467 [Solove] Solove, D., "Understanding Privacy", 2010. 1469 [Tor] The Tor Project, Inc., "Tor", 2011. 1471 [Westin] Kumaraguru, P. and L. Cranor, "Privacy Indexes: A Survey 1472 of Westin's Studies", 2005. 1474 Authors' Addresses 1476 Alissa Cooper 1477 CDT 1478 1634 Eye St. NW, Suite 1100 1479 Washington, DC 20006 1480 US 1482 Phone: +1-202-637-9800 1483 Email: acooper@cdt.org 1484 URI: http://www.cdt.org/ 1486 Hannes Tschofenig 1487 Nokia Siemens Networks 1488 Linnoitustie 6 1489 Espoo 02600 1490 Finland 1492 Phone: +358 (50) 4871445 1493 Email: Hannes.Tschofenig@gmx.net 1494 URI: http://www.tschofenig.priv.at 1496 Bernard Aboba 1497 Microsoft Corporation 1498 One Microsoft Way 1499 Redmond, WA 98052 1500 US 1502 Email: bernarda@microsoft.com 1504 Jon Peterson 1505 NeuStar, Inc. 1506 1800 Sutter St Suite 570 1507 Concord, CA 94520 1508 US 1510 Email: jon.peterson@neustar.biz 1512 John B. Morris, Jr. 1514 Email: ietf@jmorris.org 1515 Marit Hansen 1516 ULD Kiel 1518 Email: marit.hansen@datenschutzzentrum.de 1520 Rhys Smith 1521 JANET(UK) 1523 Email: rhys.smith@ja.net