Network Working Group                                          A. Cooper
Internet-Draft                                                       CDT
Intended status: Informational                             H. Tschofenig
Expires: April 25, 2013                           Nokia Siemens Networks
                                                                B. Aboba
                                                   Microsoft Corporation
                                                             J. Peterson
                                                           NeuStar, Inc.
                                                               J. Morris

                                                               M. Hansen
                                                                ULD Kiel
                                                                R. Smith
                                                               JANET(UK)
                                                        October 22, 2012

             Privacy Considerations for Internet Protocols
                 draft-iab-privacy-considerations-04.txt

Abstract

   This document offers guidance for developing privacy considerations
   for inclusion in IETF documents.  It aims to make protocol designers
   aware of privacy-related design choices.  It suggests that whether
   any individual RFC requires a specific privacy considerations
   section will depend on the document's content.

   Discussion of this document is taking place on the IETF Privacy
   Discussion mailing list (see
   https://www.ietf.org/mailman/listinfo/ietf-privacy).

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.
   It is inappropriate to use Internet-Drafts as reference material or
   to cite them other than as "work in progress."

   This Internet-Draft will expire on April 25, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
       2.1.  Entities
       2.2.  Data and Analysis
       2.3.  Identifiability
   3.  Communications Model
   4.  Privacy Threats
       4.1.  Combined Security-Privacy Threats
             4.1.1.  Surveillance
             4.1.2.  Stored Data Compromise
             4.1.3.  Intrusion
             4.1.4.  Misattribution
       4.2.  Privacy-Specific Threats
             4.2.1.  Correlation
             4.2.2.  Identification
             4.2.3.  Secondary Use
             4.2.4.  Disclosure
             4.2.5.  Exclusion
   5.  Threat Mitigations
       5.1.  Data Minimization
             5.1.1.  Anonymity
             5.1.2.  Pseudonymity
             5.1.3.  Identity Confidentiality
             5.1.4.  Data Minimization within Identity Management
       5.2.  User Participation
       5.3.  Security
   6.  Scope
   7.  Guidelines
       7.1.  Data Minimization
       7.2.  User Participation
       7.3.  Security
       7.4.  General
   8.  Example
   9.  Security Considerations
   10. IANA Considerations
   11. Acknowledgements
   12. Informative References
   Authors' Addresses
1.  Introduction

   [RFC3552] provides detailed guidance to protocol designers about
   both how to consider security as part of protocol design and how to
   inform readers of IETF documents about security issues.  This
   document intends to provide similar guidance for considering privacy
   in protocol design.

   Privacy is a complicated concept with a rich history that spans many
   disciplines.  With regard to data, it is often applied to "personal
   data": information relating to an identified or identifiable
   individual.  Many sets of privacy principles and privacy design
   frameworks have been developed in different forums over the years.
   These include the Fair Information Practices [FIPs], a baseline set
   of privacy protections pertaining to the collection and use of
   personal data (often based on the principles established in [OECD],
   for example), and the Privacy by Design concept, which provides
   high-level privacy guidance for systems design (see [PbD] for one
   example).  The guidance provided in this document is inspired by
   this prior work, but it aims to be more concrete, pointing protocol
   designers to specific engineering choices that can impact the
   privacy of the individuals who make use of Internet protocols.

   Different people have radically different conceptions of what
   privacy means, both in general and as it relates to them personally
   [Westin].  Furthermore, privacy as a legal concept is understood
   differently in different jurisdictions.  The guidance provided in
   this document is generic and can be used to inform the design of any
   protocol to be used anywhere in the world, without reference to
   specific legal frameworks.

   Whether any individual document will require a specific privacy
   considerations section will depend on the document's content.
   Documents whose entire focus is privacy may not merit a separate
   section (for example, [RFC3325]).  For certain specifications,
   privacy considerations are a subset of security considerations and
   can be discussed explicitly in the security considerations section.
   Some documents will not require discussion of privacy considerations
   (for example, [RFC6716]).  The guidance provided here can and should
   be used to assess the privacy considerations of protocol,
   architectural, and operational specifications and to decide whether
   those considerations are to be documented in a stand-alone section,
   within the security considerations section, or throughout the
   document.

   This document is organized as follows.  Section 2 explains the
   terminology used in this document.  Section 3 reviews typical
   communications architectures to understand at which points there may
   be privacy threats.  Section 4 discusses threats to privacy as they
   apply to Internet protocols.  Section 5 outlines mitigations of
   those threats.  Section 6 describes the extent to which the guidance
   offered is applicable within the IETF.  Section 7 provides the
   guidelines for analyzing and documenting privacy considerations
   within IETF specifications.  Section 8 examines the privacy
   characteristics of an IETF protocol to demonstrate the use of the
   guidance framework.

2.  Terminology

   This section defines basic terms used in this document, with
   references to pre-existing definitions as appropriate.  As in
   [RFC4949], each entry is preceded by a dollar sign ($) and a space
   for automated searching.
   Note that this document does not attempt to define the term
   "privacy" itself.  Instead, privacy is the sum of what is contained
   in this document.  We therefore follow the approach taken by
   [RFC3552].

2.1.  Entities

   Several of these terms are further elaborated in Section 3.

   $ Attacker:  An entity that intentionally works against some
     protection goal.

   $ Eavesdropper:  A type of attacker that passively observes an
     initiator's communications without the initiator's knowledge or
     authorization.  See [RFC4949].

   $ Enabler:  A protocol entity that facilitates communication between
     an initiator and a recipient without being directly in the
     communications path.

   $ Individual:  A human being.

   $ Initiator:  A protocol entity that initiates communications with a
     recipient.

   $ Intermediary:  A protocol entity that sits between the initiator
     and the recipient and is necessary for the initiator and recipient
     to communicate.  Unlike an eavesdropper, an intermediary is an
     entity that is part of the communication architecture.  For
     example, a SIP proxy is an intermediary in the SIP architecture.

   $ Observer:  An entity that is able to observe and collect
     information from communications, potentially posing privacy
     threats depending on the context.  As defined in this document,
     initiators, recipients, intermediaries, and enablers can all be
     observers.  Observers are distinguished from eavesdroppers by
     being at least tacitly authorized.

   $ Recipient:  A protocol entity that receives communications from an
     initiator.

2.2.  Data and Analysis

   $ Correlation:  The combination of various pieces of information
     relating to an individual.

   $ Fingerprint:  A set of information elements that identifies a
     device or application instance.

   $ Fingerprinting:  The process of an observer or attacker uniquely
     identifying (with a sufficiently high probability) a device or
     application instance based on multiple information elements
     communicated to the observer or attacker.  See [EFF].

   $ Item of Interest (IOI):  Any data item that an observer or
     attacker might be interested in.  This includes attributes,
     identifiers, identities, communications content, and the fact that
     a communication interaction has taken place.

   $ Personal Data:  Any information relating to an individual who can
     be identified, directly or indirectly.

   $ (Protocol) Interaction:  A unit of communication within a
     particular protocol.  A single interaction may consist of a single
     message between an initiator and a recipient or of multiple
     messages, depending on the protocol.

   $ Traffic Analysis:  The inference of information from observation
     of traffic flows (presence, absence, amount, direction, and
     frequency).  See [RFC4949].

   $ Undetectability:  The inability of an observer or attacker to
     sufficiently distinguish whether an item of interest exists or
     not.

   $ Unlinkability:  Within a particular set of information, the
     inability of an observer or attacker to distinguish whether two
     items of interest are related or not (with a high enough degree of
     probability to be useful to the observer or attacker).

2.3.  Identifiability

   $ Anonymity:  The state of being anonymous.

   $ Anonymity Set:  A set of individuals that have the same
     attributes, making them indistinguishable from each other from the
     perspective of a particular attacker or observer.
   $ Anonymous:  A state of an individual in which an observer or
     attacker cannot identify the individual within a set of other
     individuals (the anonymity set).

   $ Attribute:  A property of an individual.

   $ Identifiable:  A state in which an individual's identity is
     capable of being known to an observer or attacker.

   $ Identifiability:  The extent to which an individual is
     identifiable.

   $ Identified:  A state in which an individual's identity is known.

   $ Identifier:  A data object uniquely referring to a specific
     identity of a protocol entity or individual in some context.  See
     [RFC4949].  Identifiers can be based upon natural names --
     official names, personal names, and/or nicknames -- or can be
     artificial (for example, x9z32vb).  However, identifiers are by
     definition unique within their context of use, while natural names
     are often not unique.

   $ Identification:  The linking of information to a particular
     individual to infer the individual's identity or to allow the
     inference of the individual's identity in some context.

   $ Identity:  Any subset of an individual's attributes, including
     names, that identifies the individual within a given context.
     Individuals usually have multiple identities for use in different
     contexts.

   $ Identity Confidentiality:  A property of an individual wherein any
     party other than the recipient cannot sufficiently identify the
     individual within a set of other individuals (the anonymity set).
     This is a desirable property of authentication protocols.

   $ Identity Provider:  An entity (usually an organization) that is
     responsible for establishing, maintaining, securing, and vouching
     for the identity associated with individuals.

   $ Official Name:  A personal name for an individual that is
     registered in some official context (for example, the name on an
     individual's birth certificate).

   $ Personal Name:  A natural name for an individual.  Personal names
     are often not unique, and often comprise given names in
     combination with a family name.  An individual may have multiple
     personal names at any time and over a lifetime, including official
     names.  From a technological perspective, it cannot always be
     determined whether a given reference to an individual is, or is
     based upon, the individual's personal name(s) (see Pseudonym).

   $ Pseudonym:  A name assumed by an individual in some context,
     unrelated to the individual's personal names known by others in
     that context, with the intent of not revealing the individual's
     identities associated with his or her other names.

   $ Pseudonymity:  The state of being pseudonymous.

   $ Pseudonymous:  A property of an individual in which the individual
     is identified by a pseudonym.

   $ Real Name:  See Personal Name and Official Name.

   $ Relying Party:  An entity that relies on assertions of
     individuals' identities from identity providers in order to
     provide services to individuals.  In effect, the relying party
     delegates aspects of identity management to the identity
     provider(s).  Such delegation requires protocol exchanges, trust,
     and a common understanding of semantics of information exchanged
     between the relying party and the identity provider.

3.  Communications Model

   To understand attacks in the privacy-harm sense, it is helpful to
   consider the overall communication architecture and different
   actors' roles within it.
   Consider a protocol entity, the "initiator", that initiates
   communication with some recipient.  Privacy analysis is most
   relevant for protocols with use cases in which the initiator acts on
   behalf of an individual (or different individuals at different
   times).  It is this individual whose privacy is potentially
   threatened.

   Communications may be direct between the initiator and the
   recipient, or they may involve an application-layer intermediary
   (such as a proxy or cache) that is necessary for the two parties to
   communicate.  In some cases this intermediary stays in the
   communication path for the entire duration of the communication, and
   in other cases it is only used for communication establishment, for
   either inbound or outbound communication.  In rare cases there may
   be a series of intermediaries that are traversed.  At lower layers,
   additional entities involved in packet forwarding may interfere with
   privacy protection goals as well.

   Some communications tasks require multiple protocol interactions
   with different entities.  For example, a request to an HTTP server
   may be preceded by an interaction between the initiator and an
   Authentication, Authorization, and Accounting (AAA) server for
   network access and by an interaction with a DNS server for name
   resolution.  In this case, the HTTP server is the recipient and the
   other entities are enablers of the initiator-to-recipient
   communication.  Similarly, a single communication with the recipient
   might generate further protocol interactions between either the
   initiator or the recipient and other entities, and the roles of the
   entities might change with each interaction.  For example, an HTTP
   request might trigger interactions with an authentication server or
   with other resource servers wherein the recipient becomes an
   initiator in those later interactions.

   Thus, when conducting privacy analysis of an architecture that
   involves multiple communications phases, the entities involved may
   take on different -- or opposing -- roles from a privacy
   considerations perspective in each phase.  Understanding the privacy
   implications of the architecture as a whole may require a separate
   analysis of each phase.

   Protocol design is often predicated on the assumption that
   recipients, intermediaries, and enablers are authorized to receive
   and handle data from initiators.  As [RFC3552] explains, "we assume
   that the end-systems engaging in a protocol exchange have not
   themselves been compromised."  However, privacy analysis by its
   nature requires questioning this assumption, since systems are often
   compromised for the purpose of obtaining personal data.

   Although recipients, intermediaries, and enablers may not generally
   be considered attackers, they may all pose privacy threats
   (depending on the context) because they are able to observe,
   collect, process, and transfer privacy-relevant data.  These
   entities are collectively described below as "observers" to
   distinguish them from traditional attackers.  From a privacy
   perspective, one important type of attacker is an eavesdropper: an
   entity that passively observes the initiator's communications
   without the initiator's knowledge or authorization.

   The threat descriptions in the next section explain how observers
   and attackers might act to harm individuals' privacy.
   Different kinds of attacks may be feasible at different points in
   the communications path.  For example, an observer could mount
   surveillance or identification attacks between the initiator and
   intermediary, or instead could surveil an enabler (e.g., by
   observing DNS queries from the initiator).

4.  Privacy Threats

   Privacy harms come in a number of forms, including harms to
   financial standing, reputation, solitude, autonomy, and safety.  A
   victim of identity theft or blackmail, for example, may suffer a
   financial loss as a result.  Reputational harm can occur when
   disclosure of information about an individual, whether true or
   false, subjects that individual to stigma, embarrassment, or loss of
   personal dignity.  Intrusion or interruption of an individual's life
   or activities can harm the individual's ability to be left alone.
   When individuals or their activities are monitored, exposed, or at
   risk of exposure, those individuals may be stifled from expressing
   themselves, associating with others, and generally conducting their
   lives freely.  They may also feel a general sense of unease, in that
   it is "creepy" to be monitored or to have data collected about them.
   In cases where such monitoring is for the purpose of stalking or
   violence (for example, monitoring communications to or from a
   domestic abuse shelter), it can put individuals in physical danger.

   This section lists common privacy threats (drawing liberally from
   [Solove], as well as [CoE]), showing how each of them may cause
   individuals to incur privacy harms and providing examples of how
   these threats can exist on the Internet.

   Some privacy threats are already considered in IETF protocols as a
   matter of routine security analysis.  Others are privacy-specific
   threats that existing security considerations do not usually
   address.  The threats described here are divided into those that may
   also be considered security threats and those that are primarily
   privacy threats.

   Note that an individual's awareness of and consent to the practices
   described below can greatly affect the extent to which they threaten
   privacy.  If an individual authorizes surveillance of his or her own
   activities, for example, the harms associated with it may be
   mitigated, or the individual may accept the risk of harm.

4.1.  Combined Security-Privacy Threats

4.1.1.  Surveillance

   Surveillance is the observation or monitoring of an individual's
   communications or activities.  The effects of surveillance on the
   individual can range from anxiety and discomfort to behavioral
   changes such as inhibition and self-censorship to the perpetration
   of violence against the individual.  The individual need not be
   aware of the surveillance for it to impact privacy -- the
   possibility of surveillance may be enough to harm individual
   autonomy.

   Surveillance can be conducted by observers or eavesdroppers at any
   point along the communications path.  Confidentiality protections
   (as discussed in [RFC3552], Section 3) are necessary to prevent
   surveillance of the content of communications.  To prevent traffic
   analysis or other surveillance of communications patterns, other
   measures may be necessary, such as [Tor].
4.1.2.  Stored Data Compromise

   End systems that do not take adequate measures to secure stored data
   from unauthorized or inappropriate access expose individuals to
   potential financial, reputational, or physical harm.

   Protecting against stored data compromise is typically outside the
   scope of IETF protocols.  However, a number of common protocol
   functions -- key management, access control, or operational logging,
   for example -- require the storage of data about initiators of
   communications.  When requiring or recommending that information
   about initiators or their communications be stored or logged by end
   systems (see, e.g., [RFC6302]), it is important to recognize the
   potential for that information to be compromised and to weigh that
   potential against the benefits of data storage.  Any recipient,
   intermediary, or enabler that stores data may be vulnerable to
   compromise.

4.1.3.  Intrusion

   Intrusion consists of invasive acts that disturb or interrupt one's
   life or activities.  Intrusion can thwart individuals' desires to be
   let alone, sap their time or attention, or interrupt their
   activities.  This threat is focused on intrusion into one's life
   rather than direct intrusion into one's communications.  The latter
   is captured in Section 4.1.1.

   Unsolicited messages and denial-of-service attacks are the most
   common types of intrusion on the Internet.  Intrusion can be
   perpetrated by any attacker that is capable of sending unwanted
   traffic to the initiator.

4.1.4.  Misattribution

   Misattribution occurs when data or communications related to one
   individual are attributed to another.  Misattribution can result in
   adverse reputational, financial, or other consequences for
   individuals that are misidentified.

   Misattribution in the protocol context comes as a result of using
   inadequate or insecure forms of identity or authentication.  For
   example, as [RFC6269] notes, abuse mitigation is often conducted on
   the basis of source IP address, such that connections from
   individual IP addresses may be prevented or temporarily blacklisted
   if abusive activity is determined to be sourced from those
   addresses.  However, in the case where a single IP address is shared
   by multiple individuals, those penalties may be suffered by all
   individuals sharing the address, even if they were not involved in
   the abuse.  This threat can be mitigated by using identity
   management mechanisms with proper forms of authentication (ideally
   with cryptographic properties) so that actions can be attributed
   uniquely to an individual to provide the basis for accountability
   without generating false positives.
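   The following sketch illustrates the shared-address problem
   described above.  It is a toy model, not drawn from any deployed
   abuse-mitigation system; the addresses and user names are invented
   for illustration.

      # Illustrative sketch: abuse mitigation keyed on source IP
      # address.  When several individuals share one address (e.g.,
      # behind a NAT), a block aimed at one abuser penalizes them all.

      blocked_ips = set()

      def report_abuse(source_ip):
          # The service only sees the shared address, not the
          # individual responsible for the abuse.
          blocked_ips.add(source_ip)

      def accept_connection(source_ip):
          return source_ip not in blocked_ips

      # Three individuals share 192.0.2.1 behind the same NAT.
      report_abuse("192.0.2.1")  # triggered by one abusive user
      for user in ("alice", "bob", "carol"):
          # Every user behind the address is now refused service.
          print(user, accept_connection("192.0.2.1"))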
4.2.  Privacy-Specific Threats

4.2.1.  Correlation

   Correlation is the combination of various pieces of information
   related to an individual.  Correlation can defy people's
   expectations of the limits of what others know about them.  It can
   increase the power that those doing the correlating have over
   individuals as well as correlators' ability to pass judgment,
   threatening individual autonomy and reputation.

   Correlation is closely related to identification.  Internet
   protocols can facilitate correlation by allowing individuals'
   activities to be tracked and combined over time.  The use of
   persistent or infrequently replaced identifiers at any layer of the
   stack can facilitate correlation.  For example, an initiator's
   persistent use of the same device ID, certificate, or email address
   across multiple interactions could allow recipients to correlate all
   of the initiator's communications over time.

   As an example, consider Transport Layer Security (TLS) session
   resumption [RFC5246] or TLS session resumption without server-side
   state [RFC5077].  In RFC 5246 [RFC5246], a server provides the
   client with a session_id in the ServerHello message and caches the
   master_secret for later exchanges.  When the client initiates a new
   connection with the server, it re-uses the previously obtained
   session_id in its ClientHello message.  The server agrees to resume
   the session by using the same session_id and the previously stored
   master_secret for the generation of the TLS Record Layer security
   association.  RFC 5077 [RFC5077] borrows from this session
   resumption design, but the server encapsulates all state information
   in a ticket instead of caching it.  An attacker who is able to
   observe the protocol exchanges between the TLS client and the TLS
   server is able to link the initial exchange to subsequently resumed
   TLS sessions, since the session_id and the ticket are exchanged in
   the clear (as is all data in the initial handshake messages).
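   The sketch below makes this linkability concrete.  It is
   illustrative only: handshake messages are reduced to toy
   dictionaries, whereas a real eavesdropper would parse TLS records
   captured on the wire.

      # Illustrative sketch: a passive eavesdropper linking a resumed
      # TLS session to the original one via the cleartext session_id.

      linked_flows = {}  # session_id -> flows observed using it

      def observe_handshake(flow, client_hello, server_hello):
          # The session_id appears in the clear in both directions.
          sid = server_hello.get("session_id") or \
                client_hello.get("session_id")
          if sid:
              linked_flows.setdefault(sid, []).append(flow)

      # Initial full handshake: the server assigns session_id "ab12".
      observe_handshake(("198.51.100.7", 55123),
                        {"session_id": None}, {"session_id": "ab12"})

      # Later resumption from a different source port reuses "ab12".
      observe_handshake(("198.51.100.7", 55900),
                        {"session_id": "ab12"}, {"session_id": "ab12"})

      # The eavesdropper now knows both flows belong to the same
      # client, even without reading any application data.
      print(linked_flows["ab12"])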
   In theory any observer or attacker that receives an initiator's
   communications can engage in correlation.  The extent of the
   potential for correlation will depend on what data the entity
   receives from the initiator and has access to otherwise.  Often,
   intermediaries only require a small amount of information for
   message routing and/or security.  In theory, protocol mechanisms
   could ensure that end-to-end information is not made accessible to
   these entities, but in practice the difficulty of deploying end-to-
   end security procedures, additional messaging or computational
   overhead, and other business or legal requirements often slow or
   prevent the deployment of end-to-end security mechanisms, giving
   intermediaries greater exposure to initiators' data than is strictly
   necessary from a technical point of view.

4.2.2.  Identification

   Identification is the linking of information to a particular
   individual.  In some contexts it is perfectly legitimate to identify
   individuals, whereas in others identification may potentially stifle
   individuals' activities or expression by inhibiting their ability to
   be anonymous or pseudonymous.  Identification also makes it easier
   for individuals to be explicitly controlled by others (e.g.,
   governments) and to be treated differentially compared to other
   individuals.

   Many protocols provide functionality intended to guarantee that
   entities are who they claim to be; often this is accomplished with
   cryptographic authentication.  Furthermore, many protocol
   identifiers, such as those used in SIP or XMPP, may allow for the
   direct identification of individuals.  Protocol identifiers may also
   contribute indirectly to identification via correlation.  For
   example, a web site that does not directly authenticate users may be
   able to match its HTTP header logs with logs from another site that
   does authenticate users, rendering users on the first site
   identifiable.

   As with correlation, any observer or attacker may be able to engage
   in identification depending on the information about the initiator
   that is available via the protocol mechanism or other channels.

4.2.3.  Secondary Use

   Secondary use is the use of collected information without the
   individual's consent for a purpose different from that for which the
   information was collected.  Secondary use may violate people's
   expectations or desires.  The potential for secondary use can
   generate uncertainty over how one's information will be used in the
   future, potentially discouraging information exchange in the first
   place.

   One example of secondary use would be a network access server that
   uses an initiator's access requests to track the initiator's
   location.  Any observer or attacker could potentially make unwanted
   secondary uses of initiators' data.  Protecting against secondary
   use is typically outside the scope of IETF protocols.

4.2.4.  Disclosure

   Disclosure is the revelation of information about an individual that
   affects the way others judge the individual.  Disclosure can violate
   individuals' expectations of the confidentiality of the data they
   share.  The threat of disclosure may deter people from engaging in
   certain activities for fear of reputational harm, or simply because
   they do not wish to be observed.

   Any observer or attacker that receives data about an initiator may
   engage in disclosure.  Sometimes disclosure is unintentional because
   system designers do not realize that information being exchanged
   relates to individuals.  The most common way for protocols to limit
   disclosure is by providing access control mechanisms (discussed in
   the next section).  A further example is provided by the IETF
   geolocation privacy architecture [RFC6280], which supports a way for
   users to express a preference that their location information not be
   disclosed beyond the intended recipient.

4.2.5.  Exclusion

   Exclusion is the failure to allow individuals to know about the data
   that others have about them and to participate in its handling and
   use.  Exclusion reduces accountability on the part of entities that
   maintain information about people and creates a sense of
   vulnerability about individuals' ability to control how information
   about them is collected and used.

   The most common way for Internet protocols to be involved in
   enforcing exclusion is through access control mechanisms.  The
   presence architecture developed in the IETF is a good example of
   individuals being included in the control of information about them.
   Using a rules expression language (e.g., Presence Authorization
   Rules [RFC5025]), presence clients can authorize the specific
   conditions under which their presence information may be shared.

   Exclusion is primarily considered problematic when the recipient
   fails to involve the initiator in decisions about data collection,
   handling, and use.  Eavesdroppers engage in exclusion by their very
   nature since their data collection and handling practices are
   covert.

5.  Threat Mitigations

   Privacy is notoriously difficult to measure and quantify.  The
   extent to which a particular protocol, system, or architecture
   "protects" or "enhances" privacy is dependent on a large number of
   factors relating to its design, use, and potential misuse.
   However, there are certain widely recognized classes of mitigations
   against the threats discussed in Section 4.  This section describes
   three categories of relevant mitigations: (1) data minimization, (2)
   user participation, and (3) security.  The privacy mitigations
   described in this section can loosely be mapped to existing privacy
   principles, such as the Fair Information Practices, but they have
   been adapted to fit the target audience of this document.

5.1.  Data Minimization

   Data minimization refers to collecting, using, disclosing, and
   storing the minimal data necessary to perform a task.  The less data
   about individuals that gets exchanged in the first place, the lower
   the chances of that data being misused or leaked.

   Data minimization can be effectuated in a number of different ways,
   including by limiting collection, use, disclosure, retention,
   identifiability, sensitivity, and access to personal data.  Limiting
   the data collected by protocol elements only to what is necessary
   (collection limitation) is the most straightforward way to help
   reduce privacy risks associated with the use of the protocol.  In
   some cases, protocol designers may also be able to recommend limits
   to the use or retention of data, although protocols themselves are
   not often capable of controlling these properties.

   However, the most direct application of data minimization to
   protocol design is limiting identifiability.  Reducing the
   identifiability of data by using pseudonyms or no identifiers at all
   helps to weaken the link between an individual and his or her
   communications.  Allowing for the periodic creation of new
   identifiers reduces the possibility that multiple protocol
   interactions or communications can be correlated back to the same
   individual.  The following sections explore a number of different
   properties related to identifiability that protocol designers may
   seek to achieve.

   (Threats mitigated: surveillance, stored data compromise,
   correlation, identification, secondary use, disclosure)

5.1.1.  Anonymity

   To enable anonymity of an individual, there must exist a set of
   individuals with potentially the same attributes.  To the attacker
   or the observer these individuals must appear indistinguishable from
   each other.  The set of all such individuals is known as the
   anonymity set, and membership of this set may vary over time.

   The composition of the anonymity set depends on the knowledge of the
   observer or attacker.  Thus, anonymity is relative to the observer
   or attacker.  An initiator may be anonymous only within a set of
   potential initiators -- its initiator anonymity set -- which itself
   may be a subset of all individuals that may initiate communications.
   Conversely, a recipient may be anonymous only within a set of
   potential recipients -- its recipient anonymity set.  The two
   anonymity sets may be disjoint, may overlap, or may be identical.

   As an example, consider RFC 3325 (P-Asserted-Identity, PAI)
   [RFC3325], an extension to the Session Initiation Protocol (SIP)
   that allows an individual, such as a VoIP caller, to instruct an
   intermediary that he or she trusts not to populate the SIP From
   header field with the individual's authenticated and verified
   identity.  The recipient of the call, as well as any other entity
   outside of the individual's trust domain, would therefore only learn
   that the SIP message (typically a SIP INVITE) was sent with a header
   field 'From: "Anonymous" <sip:anonymous@anonymous.invalid>' rather
   than the individual's address-of-record, which is typically thought
   of as the "public address" of the user.  When PAI is used, the
   individual becomes anonymous within the initiator anonymity set that
   is populated by every individual making use of that specific
   intermediary.

   Note that this example ignores the fact that other personal data may
   be inferred from the other SIP protocol payloads.  This
   simplification makes the analysis of the specific protocol extension
   easier but cannot be relied upon when conducting analysis of an
   entire architecture.

5.1.2.  Pseudonymity

   In the context of IETF protocols, almost all identifiers can be
   nicknames or pseudonyms since there is typically no requirement to
   use personal names in protocols.  However, in certain scenarios it
   is reasonable to assume that personal names will be used (with vCard
   [RFC6350], for example).

   Pseudonymity is strengthened when less personal data can be linked
   to the pseudonym; when the same pseudonym is used less often and
   across fewer contexts; and when independently chosen pseudonyms are
   more frequently used for new actions (making them, from an
   observer's or attacker's perspective, unlinkable).

   For Internet protocols, important considerations include whether the
   protocol allows pseudonyms to be changed without human interaction,
   the default length of pseudonym lifetimes, to whom pseudonyms are
   exposed, how individuals are able to control disclosure, how often
   pseudonyms can be changed, and the consequences of changing them.
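   As a minimal sketch of the kind of pseudonym rotation contemplated
   above, a client might derive short-lived pseudonyms from a local
   secret.  The derivation and the one-hour lifetime below are invented
   for illustration and are not taken from any IETF protocol.

      # Illustrative sketch: periodically replacing a pseudonymous
      # identifier so that interactions in different periods are
      # harder to link.

      import hashlib
      import os
      import time

      LOCAL_SECRET = os.urandom(32)  # never leaves the client
      LIFETIME = 3600                # rotate the pseudonym hourly

      def current_pseudonym(now=None):
          epoch = int((now or time.time()) // LIFETIME)
          digest = hashlib.sha256(LOCAL_SECRET +
                                  epoch.to_bytes(8, "big"))
          # Pseudonyms from different epochs look unrelated to an
          # observer who does not know LOCAL_SECRET.
          return digest.hexdigest()[:16]

      print(current_pseudonym())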
5.1.3.  Identity Confidentiality

   An initiator has identity confidentiality when any party other than
   the recipient cannot sufficiently identify the initiator within the
   anonymity set.  Identity confidentiality aims to provide protection
   against eavesdroppers and intermediaries rather than against the
   intended communication endpoints.

   As an example, consider the network access authentication procedures
   utilizing the Extensible Authentication Protocol (EAP) [RFC3748].
   EAP includes an identity exchange where the Identity Response is
   primarily used for routing purposes and selecting which EAP method
   to use.  Since EAP Identity Requests and Responses are sent in
   cleartext, eavesdroppers and intermediaries along the communication
   path between the EAP peer and the EAP server can snoop on the
   identity.  To address this threat, as discussed in RFC 4282
   [RFC4282], the user's identity can be hidden from these
   eavesdroppers and intermediaries with the cryptographic support
   offered by EAP methods.  Identity confidentiality has become a
   recommended design criterion for EAP (see [RFC4017]).  EAP-AKA
   [RFC4187], for example, protects the EAP peer's identity against
   passive adversaries by utilizing temporary identities.  EAP-IKEv2
   [RFC5106] is an example of an EAP method that offers protection
   against active attackers with regard to the individual's identity.
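   The sketch below shows the packet that is at issue: an EAP
   Response/Identity (RFC 3748, Section 5.1) carrying a Network Access
   Identifier.  The pseudonymous username is invented for illustration;
   real methods such as EAP-AKA define their own pseudonym formats.

      # Illustrative sketch: building an EAP Response/Identity packet.
      # Because this packet is sent in cleartext, eavesdroppers see
      # whatever identity string it carries.

      import struct

      EAP_RESPONSE = 2
      EAP_TYPE_IDENTITY = 1

      def eap_identity_response(identifier, identity):
          data = identity.encode("ascii")
          # Code (1) | Identifier (1) | Length (2) | Type (1) | Data
          header = struct.pack("!BBHB", EAP_RESPONSE, identifier,
                               5 + len(data), EAP_TYPE_IDENTITY)
          return header + data

      # Revealing: the permanent identity is visible on the wire.
      revealing = eap_identity_response(1, "alice@example.net")

      # Less revealing: a pseudonymous username hides the user while
      # keeping the realm that AAA infrastructure needs for routing.
      pseudonymous = eap_identity_response(1, "3gx72qw9@example.net")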
5.1.4.  Data Minimization within Identity Management

   Modern systems are increasingly relying on multi-party transactions
   to authenticate individuals.  Many of these systems make use of an
   identity provider that is responsible for providing authentication,
   authorization, and accounting functionality to relying parties that
   offer some protected resources.  To facilitate these functions an
   identity provider will usually go through a process of verifying the
   individual's identity and issuing credentials to the individual.
   When an individual seeks to make use of a service provided by the
   relying party, the relying party relies on the authentication
   assertions provided by its identity provider.  Note that in more
   sophisticated scenarios the authentication assertions convey traits
   that demonstrate the individual's capabilities and roles.  The
   authorization responsibility may also be shared between the identity
   provider and the relying party and need not reside solely with the
   identity provider.

   Such systems have the ability to support a number of properties that
   minimize data collection in different ways:

      In certain use cases relying parties do not need to know the real
      name of an individual (for example, when the individual's age is
      the only attribute that needs to be authenticated).

      Relying parties that collude can be prevented from using an
      individual's credentials to track the individual.  That is, two
      different relying parties can be prevented from determining that
      the same individual has authenticated to both of them.  This
      typically requires support in the identity management protocol as
      well as support by both the relying party and the identity
      provider (see the sketch below).

      The identity provider can be prevented from knowing which relying
      parties an individual interacted with.  This requires avoiding
      direct communication between the identity provider and the
      relying party at the time when the initiator's access to a
      resource takes place.
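   One common way to achieve the second property is for the identity
   provider to hand each relying party a different, unlinkable
   identifier for the same individual.  The HMAC-based derivation below
   is a minimal sketch of such "pairwise" identifiers; the key and
   names are invented for illustration.

      # Illustrative sketch: pairwise pseudonymous identifiers derived
      # by an identity provider, one per (user, relying party) pair.

      import hashlib
      import hmac

      IDP_SECRET = b"idp-private-key-material"  # provider-only secret

      def pairwise_identifier(user_id, relying_party):
          mac = hmac.new(IDP_SECRET, digestmod=hashlib.sha256)
          mac.update(user_id.encode() + b"|" + relying_party.encode())
          return mac.hexdigest()[:24]

      # Two colluding relying parties cannot match their records: the
      # same user appears under unrelated identifiers at each of them.
      print(pairwise_identifier("alice", "shop.example"))
      print(pairwise_identifier("alice", "news.example"))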
5.2.  User Participation

   As explained in Section 4.2.5, data collection and use that happens
   "in secret," without the individual's knowledge, is apt to violate
   the individual's expectation of privacy and may create incentives
   for misuse of data.  As a result, privacy regimes tend to include
   provisions requiring that individuals be informed about data
   collection and use and that they be involved in decisions about the
   treatment of their data.  In an engineering context, supporting the
   goal of user participation usually means providing ways for users to
   control the data that is shared about them.  It may also mean
   providing ways for users to signal how they expect their data to be
   used and shared.

   (Threats mitigated: surveillance, secondary use, disclosure,
   exclusion)

5.3.  Security

   Keeping data secure at rest and in transit is another important
   component of privacy protection.  As described in [RFC3552],
   Section 2, a number of security goals also serve to enhance privacy:

   o  Confidentiality: Keeping data secret from unintended listeners.

   o  Peer entity authentication: Ensuring that the endpoint of a
      communication is the one that is intended (in support of
      maintaining confidentiality).

   o  Unauthorized usage: Limiting data access to only those users who
      are authorized.  (Note that this goal also falls within data
      minimization.)

   o  Inappropriate usage: Limiting how authorized users can use data.
      (Note that this goal also falls within data minimization.)

   Note that even when these goals are achieved, the existence of items
   of interest -- attributes, identifiers, identities, communications,
   actions (such as the sending or receiving of a communication), or
   anything else an attacker or observer might be interested in -- may
   still be detectable, even if they are not readable.  Thus
   undetectability, in which an observer or attacker cannot
   sufficiently distinguish whether an item of interest exists or not,
   may be considered as a further security goal (albeit one that can be
   extremely difficult to accomplish).

   (Threats mitigated: surveillance, stored data compromise,
   misattribution, secondary use, disclosure, intrusion)

6.  Scope

   The core function of IETF activity is standardizing protocols.
   Internet protocols are often built flexibly, making them useful in a
   variety of architectures, contexts, and deployment scenarios without
   requiring significant interdependency between disparately designed
   components.  Although protocol designers often have a particular
   target architecture or set of architectures in mind at design time,
   it is not uncommon for architectural frameworks to develop later,
   after implementations exist and have been deployed in combination
   with other protocols or components to form complete systems.

   As a consequence, the extent to which protocol designers can foresee
   all of the privacy implications of a particular protocol at design
   time is limited.  An individual protocol may be relatively benign on
   its own, and it may make use of privacy and security features at
   lower layers of the protocol stack (Internet Protocol Security,
   Transport Layer Security, and so forth) to mitigate the risk of
   attack.  But when deployed within a larger system or used in a way
   not envisioned at design time, its use may create new privacy risks.
   Protocols are often implemented and deployed long after design time
   by people other than those who designed the protocol.  The
   guidelines in Section 7 ask protocol designers to consider how their
   protocols are expected to interact with systems and information that
   exist outside the protocol bounds, but not to imagine every possible
   deployment scenario.

   Furthermore, in many cases the privacy properties of a system depend
   upon the complete system design, in which various protocols are
   combined to form a product solution; upon the implementation, which
   includes the user interface design; and upon operational deployment
   practices, including default privacy settings and security processes
   followed by the company doing the deployment.  These details are
   specific to particular instantiations and generally outside the
   scope of the work conducted in the IETF.  The guidance provided here
   may be useful in making choices about these details, but its primary
   aim is to assist with the design, implementation, and operation of
   protocols.

   Transparency of data collection and use -- often effectuated through
   user interface design -- is normally a key factor in determining the
   privacy impact of a system.  Although most IETF activities do not
   involve standardizing user interfaces or user-facing communications,
   in some cases understanding expected user interactions can be
   important for protocol design.  Unexpected user behavior may have an
   adverse impact on security and/or privacy.
   In sum, privacy issues, even those related to protocol development,
   go beyond the technical guidance discussed herein.  As an example,
   consider HTTP [RFC2616], which was designed to allow the exchange of
   arbitrary data.  A complete analysis of the privacy considerations
   for uses of HTTP might include what type of data is exchanged, how
   this data is stored, and how it is processed.  Hence, the analysis
   for an individual's static personal web page would differ from that
   for the use of HTTP to exchange health records.  A protocol designer
   working on HTTP extensions (such as WebDAV [RFC4918]) is not
   expected to describe the privacy risks derived from all possible
   usage scenarios, but rather the privacy properties specific to the
   extensions and any particular uses of the extensions that are
   expected and foreseen at design time.

7.  Guidelines

   This section provides guidance for document authors in the form of a
   questionnaire about a protocol being designed.  The questionnaire
   may be useful at any point in the design process, particularly after
   document authors have developed a high-level protocol model as
   described in [RFC4101].

   Note that the guidance does not recommend specific practices.  The
   range of protocols developed in the IETF is too broad to make
   recommendations about particular uses of data or how privacy might
   be balanced against other design goals.  However, by carefully
   considering the answers to each question, document authors should be
   able to produce a comprehensive analysis that can serve as the basis
   for discussion of whether the protocol adequately protects against
   privacy threats.

   The framework is divided into four sections that address each of the
   mitigation classes from Section 5, plus a general section.  Security
   is not fully elaborated since substantial guidance already exists in
   [RFC3552].

7.1.  Data Minimization

   a.  Identifiers.  What identifiers does the protocol use for
       distinguishing initiators of communications?  Does the protocol
       use identifiers that allow different protocol interactions to be
       correlated?  What identifiers could be omitted or be made less
       identifying while still fulfilling the protocol's goals?

   b.  Data.  What information does the protocol expose about
       individuals, their devices, and/or their device usage (other
       than the identifiers discussed in (a))?  To what extent is this
       information linked to the identities of the individuals?  How
       does the protocol combine personal data with the identifiers
       discussed in (a)?

   c.  Observers.  Which information discussed in (a) and (b) is
       exposed to each other protocol entity (i.e., recipients,
       intermediaries, and enablers)?  Are there ways for protocol
       implementers to choose to limit the information shared with each
       entity?  Are there operational controls available to limit the
       information shared with each entity?

   d.  Fingerprinting.  In many cases the specific ordering and/or
       occurrences of information elements in a protocol allow users,
       devices, or software using the protocol to be fingerprinted.  Is
       this protocol vulnerable to fingerprinting?  If so, how?  Can it
       be designed to reduce or eliminate the vulnerability?  If not,
       why not?

   e.  Persistence of identifiers.  What assumptions are made in the
       protocol design about the lifetime of the identifiers discussed
       in (a)?
       Does the protocol allow implementers or users to delete or
       replace identifiers?  How often does the specification recommend
       that identifiers be deleted or replaced by default?  Can the
       identifiers, along with other state information, be set to
       automatically expire?

   f.  Correlation.  Does the protocol allow for correlation of
       identifiers?  Are there expected ways that information exposed
       by the protocol will be combined or correlated with information
       obtained outside the protocol?  How will such combination or
       correlation facilitate fingerprinting of a user, device, or
       application?  Are there expected combinations or correlations
       with outside data that will make users of the protocol more
       identifiable?

   g.  Retention.  Does the protocol or its anticipated uses require
       that the information discussed in (a) or (b) be retained by
       recipients, intermediaries, or enablers?  If so, why?  Is the
       retention expected to be persistent or temporary?

7.2.  User Participation

   a.  User control.  What controls or consent mechanisms does the
       protocol define or require before personal data or identifiers
       are shared or exposed via the protocol?  If no such mechanisms
       or controls are specified, is it expected that control and
       consent will be handled outside of the protocol?

   b.  Control over sharing with individual recipients.  Does the
       protocol provide ways for initiators to share different
       information with different recipients?  If not, are there
       mechanisms that exist outside of the protocol to provide
       initiators with such control?

   c.  Control over sharing with intermediaries.  Does the protocol
       provide ways for initiators to limit which information is shared
       with intermediaries?  If not, are there mechanisms that exist
       outside of the protocol to provide users with such control?  Is
       it expected that users will have relationships that govern the
       use of the information (contractual or otherwise) with those who
       operate these intermediaries?

   d.  Preference expression.  Does the protocol provide ways for
       initiators to express individuals' preferences to recipients or
       intermediaries with regard to the collection, use, or disclosure
       of their personal data?

7.3.  Security

   a.  Surveillance.  How do the protocol's security considerations
       prevent surveillance, including eavesdropping and traffic
       analysis?

   b.  Stored data compromise.  How do the protocol's security
       considerations prevent or mitigate stored data compromise?

   c.  Intrusion.  How do the protocol's security considerations
       prevent or mitigate intrusion, including denial-of-service
       attacks and unsolicited communications more generally?

   d.  Misattribution.  How do the protocol's mechanisms for
       identifying and/or authenticating individuals prevent
       misattribution?

7.4.  General

   a.  Trade-offs.  Does the protocol make trade-offs between privacy
       and usability, privacy and efficiency, privacy and
       implementability, or privacy and other design goals?  Describe
       the trade-offs and the rationale for the design chosen.

   b.  Defaults.  If the protocol can be operated in multiple modes or
       with multiple configurable options, does the default mode or
       option minimize the amount, identifiability, and persistence of
       the data and identifiers exposed by the protocol?  Does the
       default mode or option maximize the opportunity for user
       participation?
       Does it provide the strictest security features of all the
       modes/options?  If the answer to any of these questions is no,
       explain why less protective defaults were chosen.

8.  Example

   The following section gives an example of the threat analysis and
   threat mitigation recommended by this document.  It covers a
   particularly difficult application protocol, presence, to
   demonstrate these principles on an architecture that is vulnerable
   to many of the threats described above.  This text is not intended
   as an example of a Privacy Considerations section that might appear
   in an IETF specification, but rather as an example of the thinking
   that should go into the design of a protocol when considering
   privacy as a first principle.

   A presence service, as defined in the abstract in [RFC2778], allows
   users of a communications service to monitor one another's
   availability and disposition in order to make decisions about
   communicating.  Presence information is highly dynamic, and
   generally characterizes whether a user is online or offline, busy or
   idle, away from communications devices or nearby, and the like.
   Necessarily, this information has certain privacy implications, and
   from the start the IETF approached this work with the aim of
   providing users with the controls to determine how their presence
   information would be shared.  The Common Profile for Presence (CPP)
   [RFC3859] defines a set of logical operations for delivery of
   presence information.  This abstract model is applicable to multiple
   presence systems.  The SIP-based SIMPLE presence system [RFC3261]
   uses CPP as its baseline architecture, and the presence operations
   in the Extensible Messaging and Presence Protocol (XMPP) have also
   been mapped to CPP [RFC3922].

   The fundamental architecture defined in RFC 2778 and RFC 3859 is a
   mediated one.  Clients (presentities in RFC 2778 terms) publish
   their presence information to presence servers, which in turn
   distribute information to authorized watchers.  Presence servers
   thus retain presence information for an interval of time, until it
   either changes or expires, so that it can be revealed to authorized
   watchers upon request.  This architecture mirrors existing pre-
   standard deployment models.  The integration of an explicit
   authorization mechanism into the presence architecture has been
   widely successful in involving the end users in the decision-making
   process before sharing information.  Nearly all presence systems
   deployed today provide such a mechanism, typically through a
   reciprocal authorization system by which a pair of users, when they
   agree to be "buddies," consent to divulge their presence information
   to one another.  Buddylists are managed by servers but controlled by
   end users.  Users can also explicitly block one another through a
   similar interface, and in some deployments it is desirable to
   provide "polite blocking" of various kinds.

   From the perspective of privacy design, however, the classical
   presence architecture represents nearly a worst-case scenario.  In
   terms of data minimization, presentities share their sensitive
   information with presence services, and while services only share
   this presence information with watchers authorized by the user, no
   technical mechanism constrains those watchers from relaying presence
   to further third parties.
   From a perspective of privacy design, however, the classical presence architecture represents nearly a worst-case scenario.  In terms of data minimization, presentities share their sensitive information with presence services, and while services only share this presence information with watchers authorized by the user, no technical mechanism constrains those watchers from relaying presence to further third parties.  Any of these entities could conceivably log or retain presence information indefinitely.  The sensitivity cannot be mitigated by rendering the user anonymous, as it is indeed the purpose of the system to facilitate communications between users who know one another.  The identifiers employed by users are long-lived and often contain personal information, including personal names and the domains of service providers.  While users do participate in the construction of buddylists and blacklists, they do so with little prospect for accountability: the user effectively throws their presence information over the wall to a presence server that in turn distributes the information to watchers.  Users typically have no way to verify that presence is being distributed only to authorized watchers, especially as it is the server that authenticates watchers, not the end user.  Connections between the server and all publishers and consumers of presence data are moreover an attractive target for eavesdroppers, and require strong confidentiality mechanisms, though again the end user has no way to verify what mechanisms are in place between the presence server and a watcher.

   Moreover, the sensitivity of presence information is not limited to the disposition and capability to communicate.  Capability can reveal the type of device that a user employs, for example, and since multiple devices can publish the same user's presence, there are significant risks of allowing attackers to correlate user devices.  An important extension to presence was developed to enable support for location sharing.  The effort to standardize protocols for systems sharing geolocation was started in the GEOPRIV working group.  During the initial requirements and privacy threat analysis in the process of chartering the working group, it became clear that the system would require an underlying communication mechanism supporting user consent to share location information.  The resemblance of these requirements to the presence framework was quickly recognized, and this design decision was documented in [RFC4079].  Location information thus mingles with other presence information available through the system to intermediaries and to authorized watchers.

   Privacy concerns about presence information largely arise due to the built-in mediation of the presence architecture.  The need for a presence server is motivated by two primary design requirements of presence: in the first place, the server can respond with an "offline" indication when the user is not online; in the second place, the server can compose presence information published by different devices under the user's control.  Additionally, to preserve the use of URIs as identifiers for entities, some service must operate a host with the domain name appearing in a presence URI, and in practical terms no commercial presence architecture would force end users to own and operate their own domain names.  Many end users of applications like presence are behind NATs or firewalls, and effectively cannot receive direct connections from the Internet - the persistent bidirectional channel these clients open and maintain with a presence server is essential to the operation of the protocol.  The composition requirement in particular is illustrated in the sketch below.
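   A simplified illustration of composition follows, assuming invented status values and a trivial precedence rule; real composition policies are considerably richer, but the point stands that some party must see all per-device reports to produce a single answer.

      # Why the server is hard to remove: it composes the presence
      # published by each of a user's devices and can answer
      # "offline" when none has published.  Status values invented.
      def compose(device_reports):
          """Combine per-device presence into one user presence."""
          if not device_reports:
              return "offline"     # server answers for absent users
          statuses = set(device_reports.values())
          if "available" in statuses:
              return "available"   # any active device wins
          if "busy" in statuses:
              return "busy"
          return "away"

      # Example: phone idle, desktop active -> user shown available.
      print(compose({"phone": "away", "desktop": "available"}))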
   One must first ask if the trade-off of mediation for presence is worth it.  Does a server need to be in the middle of all publications of presence information?  It might seem that end-to-end encryption of the presence information could solve many of these problems.  A presentity could encrypt the presence information with the public key of a watcher, and only then send the presence information through the server.  The IETF defined an object format for presence information called the Presence Information Data Format (PIDF), which for the purposes of conveying location information was extended to the PIDF Location Object (PIDF-LO) - these XML objects were designed to accommodate an encrypted wrapper.  Encrypting this data would have the added benefit of preventing stored cleartext presence information from being seized by an attacker who manages to compromise a presence server.  This proposal, however, quickly runs into usability problems.  Discovering the public keys of watchers is the first difficulty, one that few Internet protocols have addressed successfully.  This solution would then require the presentity to publish one encrypted copy of its presence information per authorized watcher to the presence service, regardless of whether or not a watcher is actively seeking presence information - for a presentity with many watchers, this may place an unacceptable burden on the presence server, especially given the dynamism of presence information.  Finally, it prevents the server from composing presence information reported by multiple devices under the same user's control.  On the whole, these difficulties render object encryption of presence information a doubtful prospect; the sketch below illustrates the publication burden.
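   The following sketch shows the cost structure of the per-watcher encryption approach.  The encrypt() function is a placeholder standing in for a real public-key mechanism, and watcher key discovery is assumed to have happened out of band - exactly the assumptions the text above identifies as unsolved in practice.

      # Sketch of per-watcher object encryption and its cost; the
      # encrypt() primitive and key material are placeholders.
      def encrypt(pubkey, presence_doc):
          # Stand-in for a real encrypted PIDF wrapper.
          return "ciphertext(%s, %s)" % (pubkey, presence_doc)

      def publish_encrypted(presence_doc, watcher_keys, server_store):
          # One ciphertext per authorized watcher on every presence
          # change, whether or not that watcher is currently looking;
          # with N watchers and dynamic presence, publication cost
          # grows as N times the rate of change, and the server,
          # seeing only ciphertext, can no longer compose
          # multi-device presence.
          for watcher, pubkey in watcher_keys.items():
              server_store[watcher] = encrypt(pubkey, presence_doc)

      store = {}
      publish_encrypted("busy",
                        {"bob": "bob-key", "carol": "carol-key"},
                        store)
      print(len(store))  # 2 publications for one status change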
   Some protocols that provide presence information, such as SIP, can operate intermediaries in a redirecting mode, rather than a publishing or proxying mode.  Instead of sending presence information through the server, in other words, these protocols can merely redirect watchers to the presentity, and then presence information could pass directly and securely from the presentity to the watcher.  In that case, the presentity can decide exactly what information it would like to share with the watcher in question, it can authenticate the watcher itself with whatever strength of credential it chooses, and with end-to-end encryption it can reduce the likelihood of any eavesdropping.  In a redirection architecture, a presence server could still provide the necessary "offline" indication, without requiring the presence server to observe and forward all information itself.  This mechanism is more promising than encryption, but also suffers from significant difficulties.  It too does not provide for composition of presence information from multiple devices - it in fact forces the watcher to perform this composition itself, which may lead to unexpected results.  The largest single impediment to this approach is, however, the difficulty of creating end-to-end connections between the presentity's device(s) and a watcher, as some or all of these endpoints may be behind NATs or firewalls that prevent peer-to-peer connections.  While there are potential solutions for this problem, like STUN and TURN, they add complexity to the overall system.

   Consequently, mediation is a difficult feature of the presence architecture to remove, and due especially to the requirement for composition it is hard to minimize the data shared with intermediaries.  Control over sharing with intermediaries must therefore come from some other explicit component of the architecture.  As such, the presence work in the IETF focused on improving user participation in the activities of the presence server.  This work began in the GEOPRIV working group, with controls on location privacy, as the location of users is perceived as especially sensitive.  With the aim of meeting the privacy requirements defined in [RFC2779], a set of usage indications, such as whether retransmission is allowed or when the retention period expires, was added to PIDF-LO; these indications always travel with the location information itself.  These privacy preferences apply not only to the intermediaries that store and forward presence information, but also to the watchers who consume it.

   This approach very much follows the spirit of Creative Commons [CC], namely the use of a limited number of conditions (such as 'Share Alike' [CC-SA]).  Unlike Creative Commons, the GEOPRIV working group did not, however, initiate work to produce legal language or to design graphical icons, since this would fall outside the scope of the IETF.  In particular, the GEOPRIV rules state a preference on the retention and retransmission of location information; while GEOPRIV cannot force any entity receiving a PIDF-LO object to abide by those preferences, if users lack the ability to express them at all, it is guaranteed that their preferences will not be honored.

   The retention and retransmission elements were envisioned as only the first and most essential examples of preference expression in sharing presence.  The PIDF object was designed for extensibility, and the rulesets created for PIDF-LO can also be extended to provide new expressions of user preference.  Not all user preference information should be bound into a particular PIDF object, however - many forms of access control policy assumed by the presence architecture need to be provisioned in the presence server by some interface with the user.  This requirement eventually triggered the standardization of a general access control policy language called the Common Policy framework (defined in [RFC4745]).  This language allows one to express ways to control the distribution of information as rules composed of simple conditions, actions, and transformations, expressed in an XML format.  Common Policy itself is an abstract format which needs to be instantiated: two examples are the Presence Authorization Rules [RFC5025] and the Geolocation Policy [I-D.ietf-geopriv-policy].  The former provides additional expressiveness for presence-based systems, while the latter defines syntax and semantics for location-based conditions and transformations.  The shape of such rules is sketched below.
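   The following loose illustration renders the Common Policy pattern in Python rather than in the XML format that [RFC4745] actually defines; the field names and values are invented.  It shows the two ideas discussed above: usage indications that travel with the location object itself, and rules built from conditions, actions, and transformations.

      # Illustrative only; RFC 4745 defines an XML format, not this.
      import time

      # A location object with GEOPRIV-style usage indications:
      location_object = {
          "location": "geo:51.5,-0.1",
          "retransmission-allowed": False,           # never relay
          "retention-expires": time.time() + 86400,  # keep one day
      }

      # A Common Policy-style rule: condition, action, transformation.
      rule = {
          "condition": lambda watcher: watcher == "alice@example.com",
          "action": "permit",
          # Transformation: coarsen location before disclosure.
          "transform": lambda obj: dict(obj,
                                        location="geo:51.5,-0.1;u=10000"),
      }

      def evaluate(watcher, obj):
          # Disclose a transformed object only if a rule permits it;
          # with no matching rule, nothing is disclosed at all.
          if rule["condition"](watcher) and rule["action"] == "permit":
              return rule["transform"](obj)
          return None

      print(evaluate("alice@example.com", location_object))  # coarse
      print(evaluate("eve@example.com", location_object))    # None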
   Ultimately, the privacy work on presence represents a compromise between privacy principles and the needs of the architecture and marketplace.  While it was not feasible to remove intermediaries from the architecture entirely, nor to prevent their access to presence information, the IETF did provide a way for users to express their preferences and provision their controls at the presence service.  Privacy mechanisms have not seen great success in the implementation space thus far, but by documenting and acknowledging the limitations of these mechanisms, the designers were able to provide implementers, and end users, with an informed perspective on the privacy properties of the IETF's presence protocols.

9.  Security Considerations

   This document describes privacy aspects that protocol designers should consider in addition to regular security analysis.

10.  IANA Considerations

   This document does not require actions by IANA.

11.  Acknowledgements

   We would like to thank Christine Runnegar for her extensive helpful review comments.

   We would also like to thank Scott Brim, Kasey Chappelle, Marc Linsner, Bryan McLaughlin, Nick Mathewson, Eric Rescorla, Scott Bradner, Nat Sakimura, Bjoern Hoehrmann, David Singer, Dean Willis, Lucy Lynch, Trent Adams, Mark Lizar, Martin Thomson, Josh Howlett, Mischa Tuffield, S. Moonesamy, Zhou Sujing, Claudia Diaz, Leif Johansson, Jeff Hodges, Stephen Farrell, Steven Johnston, Cullen Jennings, Ted Hardie, and Klaas Wierenga.

   Finally, we would like to thank the participants of the December 2010 Internet Privacy workshop, co-organized by MIT, ISOC, W3C, and the IAB, for the feedback they provided.

12.  Informative References

   [CC]       Creative Commons, "Creative Commons", 2012.

   [CC-SA]    Creative Commons, "Share Alike", 2012.

   [Chaum]    Chaum, D., "Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms", Communications of the ACM, 24/2, 84-88, 1981.

   [CoE]      Council of Europe, "Recommendation CM/Rec(2010)13 of the Committee of Ministers to member states on the protection of individuals with regard to automatic processing of personal data in the context of profiling", https://wcd.coe.int/ViewDoc.jsp?Ref=CM/Rec%282010%2913, November 2010.

   [EFF]      Electronic Frontier Foundation, "Panopticlick", 2011.

   [FIPs]     Gellman, B., "Fair Information Practices: A Basic History", 2012.

   [I-D.iab-identifier-comparison]
              Thaler, D., "Issues in Identifier Comparison for Security Purposes", draft-iab-identifier-comparison-03 (work in progress), July 2012.

   [I-D.ietf-geopriv-policy]
              Schulzrinne, H., Tschofenig, H., Cuellar, J., Polk, J., Morris, J., and M. Thomson, "Geolocation Policy: A Document Format for Expressing Privacy Preferences for Location Information", draft-ietf-geopriv-policy-27 (work in progress), August 2012.

   [OECD]     Organization for Economic Co-operation and Development, "OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data", http://www.oecd.org/EN/document/0,,EN-document-0-nodirectorate-no-24-10255-0,00.html, 1980.

   [PbD]      Office of the Information and Privacy Commissioner, Ontario, Canada, "Privacy by Design", 2011.

   [RFC2616]  Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
   [RFC2778]  Day, M., Rosenberg, J., and H. Sugano, "A Model for Presence and Instant Messaging", RFC 2778, February 2000.

   [RFC2779]  Day, M., Aggarwal, S., Mohr, G., and J. Vincent, "Instant Messaging / Presence Protocol Requirements", RFC 2779, February 2000.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

   [RFC3325]  Jennings, C., Peterson, J., and M. Watson, "Private Extensions to the Session Initiation Protocol (SIP) for Asserted Identity within Trusted Networks", RFC 3325, November 2002.

   [RFC3552]  Rescorla, E. and B. Korver, "Guidelines for Writing RFC Text on Security Considerations", BCP 72, RFC 3552, July 2003.

   [RFC3748]  Aboba, B., Blunk, L., Vollbrecht, J., Carlson, J., and H. Levkowetz, "Extensible Authentication Protocol (EAP)", RFC 3748, June 2004.

   [RFC3859]  Peterson, J., "Common Profile for Presence (CPP)", RFC 3859, August 2004.

   [RFC3922]  Saint-Andre, P., "Mapping the Extensible Messaging and Presence Protocol (XMPP) to Common Presence and Instant Messaging (CPIM)", RFC 3922, October 2004.

   [RFC4017]  Stanley, D., Walker, J., and B. Aboba, "Extensible Authentication Protocol (EAP) Method Requirements for Wireless LANs", RFC 4017, March 2005.

   [RFC4079]  Peterson, J., "A Presence Architecture for the Distribution of GEOPRIV Location Objects", RFC 4079, July 2005.

   [RFC4101]  Rescorla, E. and IAB, "Writing Protocol Models", RFC 4101, June 2005.

   [RFC4187]  Arkko, J. and H. Haverinen, "Extensible Authentication Protocol Method for 3rd Generation Authentication and Key Agreement (EAP-AKA)", RFC 4187, January 2006.

   [RFC4282]  Aboba, B., Beadles, M., Arkko, J., and P. Eronen, "The Network Access Identifier", RFC 4282, December 2005.

   [RFC4745]  Schulzrinne, H., Tschofenig, H., Morris, J., Cuellar, J., Polk, J., and J. Rosenberg, "Common Policy: A Document Format for Expressing Privacy Preferences", RFC 4745, February 2007.

   [RFC4918]  Dusseault, L., "HTTP Extensions for Web Distributed Authoring and Versioning (WebDAV)", RFC 4918, June 2007.

   [RFC4949]  Shirey, R., "Internet Security Glossary, Version 2", RFC 4949, August 2007.

   [RFC5025]  Rosenberg, J., "Presence Authorization Rules", RFC 5025, December 2007.

   [RFC5077]  Salowey, J., Zhou, H., Eronen, P., and H. Tschofenig, "Transport Layer Security (TLS) Session Resumption without Server-Side State", RFC 5077, January 2008.

   [RFC5106]  Tschofenig, H., Kroeselberg, D., Pashalidis, A., Ohba, Y., and F. Bersani, "The Extensible Authentication Protocol-Internet Key Exchange Protocol version 2 (EAP-IKEv2) Method", RFC 5106, February 2008.

   [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security (TLS) Protocol Version 1.2", RFC 5246, August 2008.

   [RFC6269]  Ford, M., Boucadair, M., Durand, A., Levis, P., and P. Roberts, "Issues with IP Address Sharing", RFC 6269, June 2011.

   [RFC6280]  Barnes, R., Lepinski, M., Cooper, A., Morris, J., Tschofenig, H., and H. Schulzrinne, "An Architecture for Location and Location Privacy in Internet Applications", BCP 160, RFC 6280, July 2011.

   [RFC6350]  Perreault, S., "vCard Format Specification", RFC 6350, August 2011.

   [RFC6716]  Valin, JM., Vos, K., and T. Terriberry, "Definition of the Opus Audio Codec", RFC 6716, September 2012.
   [Solove]   Solove, D., "Understanding Privacy", 2010.

   [Tor]      The Tor Project, Inc., "Tor", 2011.

   [Westin]   Kumaraguru, P. and L. Cranor, "Privacy Indexes: A Survey of Westin's Studies", 2005.

Authors' Addresses

   Alissa Cooper
   CDT
   1634 Eye St. NW, Suite 1100
   Washington, DC 20006
   US

   Phone: +1-202-637-9800
   Email: acooper@cdt.org
   URI:   http://www.cdt.org/

   Hannes Tschofenig
   Nokia Siemens Networks
   Linnoitustie 6
   Espoo 02600
   Finland

   Phone: +358 (50) 4871445
   Email: Hannes.Tschofenig@gmx.net
   URI:   http://www.tschofenig.priv.at

   Bernard Aboba
   Microsoft Corporation
   One Microsoft Way
   Redmond, WA 98052
   US

   Email: bernarda@microsoft.com

   Jon Peterson
   NeuStar, Inc.
   1800 Sutter St Suite 570
   Concord, CA 94520
   US

   Email: jon.peterson@neustar.biz

   John B. Morris, Jr.

   Email: ietf@jmorris.org

   Marit Hansen
   ULD Kiel

   Email: marit.hansen@datenschutzzentrum.de

   Rhys Smith
   JANET(UK)

   Email: rhys.smith@ja.net