Network Working Group                                        J. Peterson
Internet-Draft                                                   NeuStar
Expires: April 18, 2005                                 October 18, 2004


  Security Considerations for Impersonation and Identity in Messaging
                                Systems
                   draft-peterson-message-identity-00

Status of this Memo

   By submitting this Internet-Draft, I certify that any applicable
   patent or other IPR claims of which I am aware have been disclosed,
   and any of which I become aware will be disclosed, in accordance with
   RFC 3668.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on April 18, 2005.

Copyright Notice

   Copyright (C) The Internet Society (2004). All Rights Reserved.

Abstract

   This document provides an overview of the concept of identity in
   Internet messaging systems as a means of preventing impersonation.
   It describes the architectural roles necessary to provide identity,
   and details some approaches to the generation of identity assertions
   and the transmission of such assertions within messages. The
   trade-offs of various design decisions are explained.

Table of Contents

   1.  Introduction
       1.1  Terminology
   2.  What is Identity?
   3.  Roles in an Identity System
       3.1  Identity provider
       3.2  Verifier
   4.  Threat Model of Impersonation in Messaging Systems
   5.  Identity Assertions
   6.  Keying for Assertions
       6.1  Asymmetric Keys
            6.1.1  Certificates
            6.1.2  Uncertified Public Keys
       6.2  Symmetric Keys
   7.  User-based and Domain-based Assertions
       7.1  Name Subordination
   8.  Reference Indicators and Replay Protection
       8.1  Canonicalization versus Replication
       8.2  Assertion Constraints and Scope
   9.  Placement of Assertions and Keys in Messages
       9.1  Assertions in the Envelope
       9.2  Assertions in the Content
       9.3  Distributing Keys by-Reference or by-Value
       9.4  Distributing Assertions by-Reference
   10. Privacy and Anonymity
   11. Conclusion: Consensus Points and Questions
   12. Security Considerations
   13. IANA Considerations
   14. Informative References
       Author's Address
   A.  Acknowledgments
   B.  Verification Assertions
   C.  Messaging: Real-Time versus Store-and-Forward
   D.  Third-Party Assertions
   E.  Alternatives to Identity Assertions
       E.1  Trusted Intermediary Networks
       E.2  Dial-back Identity
       Intellectual Property and Copyright Statements

1.  Introduction

   Widespread forgery of the From header field of email [5] messages is
   the most immediate motivation for work on message identity systems.
   However, there are numerous other messaging systems used on the
   Internet that currently confront similar problems, or are likely to
   confront these problems in the future, notably instant messaging
   systems and other real-time communications systems that leverage a
   messaging architecture as a rendezvous protocol for session
   establishment. All of these systems suffer from a similar threat of
   impersonation (as described in Section 4). Messaging identity
   mechanisms, as defined in this document, address specifically the
   threat of impersonation in messaging systems.

   It is unlikely that the diverse identity requirements of these
   various messaging systems will admit of any single solution that
   could be deployed for all such protocols.
   However, there is much to be gained by considering the broad body of
   work on the messaging identity problem that has already been done
   across this wide selection of protocols. The core commonalities of
   these systems permit a high-level analysis of the message identity
   problem that could assist all messaging protocols in selecting an
   appropriate way of incorporating identity.

   This document aspires to apply to messaging systems with the
   following architectural qualities:
   o  The messaging system has two kinds of agents: originators and
      recipients. Both originators and recipients interact with the
      system through endpoints. Messages sent from endpoints may pass
      through multiple intermediaries before arriving at the recipient.
      For the purposes of this document, reflectors and similar services
      are lumped in with intermediaries, even if from a protocol
      perspective they act more like endpoints.
   o  The messaging system employs names that are constituted of a
      'host' portion, which is a DNS [6] name (allocated through the
      delegative administration of the DNS), and a 'user' portion, which
      is administered by the domain indicated in the 'host' portion.
   o  The messaging system carries messages that are divided into two
      major components: envelope and contents. The distinction between
      the two is inexact, but primarily the content is intended to be
      rendered by the recipient's application, whereas much of the
      envelope contains addressing and routing data that is used by
      intermediaries. [In deference to the email community, 'envelope'
      here should be understood to encompass both the envelope and
      header portions of a message.]
   o  The messaging system is used in an interdomain context. Different
      administrative domains may deploy messaging intermediaries and
      issue names to valid local users.
      Administrative domains need to be capable of exchanging messages
      with one another even if they have no previous association.
   o  The messaging system is capable of 'retargeting' a message in
      transit, and delivering it to a recipient whose name in the system
      is not identical to that of the intended recipient specified by
      the originator. Primarily, this arises when an intermediary
      forwards a message to multiple recipients (in which case the
      resource designated as the intended recipient is some sort of
      reflector).

   This document was written based on the author's experience in
   developing identity solutions for the Session Initiation Protocol
   (SIP, [11]) and on consideration of several proposals circulating to
   provide similar features in email.

   The scope of this document is limited to the generation, carriage,
   and consumption of identity assertions. It does not consider any
   authorization decisions that might be made, on the basis of the
   identity of the originator, by the consumer of identity assertions.

   This document is organized as follows: Section 2 attempts to define
   identity, and to demonstrate broadly the current manner in which
   identity is communicated in messaging systems. Section 3 describes
   the abstract roles that must be instantiated in a system in order to
   incorporate identity assertions into a messaging architecture: the
   identity provider and the verifier. The threat model for
   impersonation in messaging systems is considered in Section 4.
   Section 5 defines an identity assertion, and explains the manner in
   which cryptography can be leveraged to generate assertions. Section
   6 provides an overview of keying and key distribution architectures
   that provide a foundation for sharing cryptographic assertions.
   Section 7 compares the traditional concept of user-based assertions
   with the newer, and perhaps more promising, idea of domain-based
   assertions.
   Section 8 considers the internal composition of an identity
   assertion, and the elements in a message which the assertion
   must guarantee in order to be correlated with a message. Section 9
   considers various ways that an assertion might be added to a message.
   Section 10 considers the privacy and anonymity implications of adding
   identity assertions to messages. Section 11 attempts to pose the key
   questions that should determine how a messaging protocol approaches
   the incorporation of an identity mechanism, and to note when this
   high-level analysis has revealed any general principles that point
   one way or another on these questions. Various appendices discuss
   related material that is not directly in the scope of the primary
   analysis.

1.1  Terminology

   This document intentionally uses core terminology that is neutral to
   existing messaging protocols. Terminology specific to email is taken
   from [21].

2.  What is Identity?

   Every communications system has a namespace. For example, the
   telephone network uses telephone numbers as a namespace, the postal
   system uses postal addresses as a namespace, and the Internet
   Protocol uses Internet Protocol addresses as a namespace. In order
   for a name to be usable, it must meet the syntactical constraints of
   the namespace, and it must be unique within the namespace.
   Accordingly, namespaces generally require significant centralization
   of administration, though in many cases, delegation can distribute
   this work across multiple distinct authorities. In the context of a
   particular communications system, the semantics of these names
   enable the system to route communications to the appropriate
   resources.

   In the most common messaging system on the Internet today, email, the
   namespace is founded on the Internet Domain Name System (DNS [6]).
   Names (in RFC2822 [5] terms, the 'addr-spec') are constituted of a
   'host' portion, which is a DNS name (allocated through the delegative
   administration of the DNS) and a 'user' or 'local-part' portion which
   is administered by the domain indicated in the 'host' portion, and
   which designates a particular resource or user in the domain. As the
   message transfer service delivers the message, the host portion of
   the destination email address is resolved in the DNS (though
   practically, a message may pass through many intermediary
   administrative domains before reaching its destination). Aside from
   email, many other Internet messaging systems have constructed
   namespaces with the same components: a domain name host portion and a
   domain-specific user portion.

   When a message is delivered to its recipient, the recipient has a
   strong interest in knowing who the message is from. While the
   contents of a message may be sufficient to identify the originator to
   the recipient, it may also happen that:
   o  the contents of the message do not identify the originator
   o  the contents of the message fabricate the identity of the
      originator
   o  the recipient does not wish to read the contents of the message
      without first identifying the originator

   Most protocols therefore provide a field which designates the
   originator of a communication. Generally, the originator is
   identified by their name in the communication system. For example,
   in the postal network, the originator is identified by their return
   address; by convention, the return address of the originator appears
   on the outside of an envelope. In caller identification systems used
   in the telephone network, the telephone number of the caller is
   displayed to the callee. In email systems, user agents render the
   contents of the RFC2822.From header field of an email message as the
   originator.
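
   As an illustration of the name structure described in this section,
   the following Python sketch extracts the addr-spec from a From header
   field value and splits it into its 'user' and 'host' portions. The
   function name split_originator is hypothetical and not part of any
   specification; the parsing relies on the standard library's
   email.utils.parseaddr and ignores many RFC 2822 corner cases. The
   sketch only parses the name that a message claims; it gives no
   assurance that the claim is genuine.

```python
from email.utils import parseaddr


def split_originator(from_header: str) -> tuple[str, str]:
    """Return (user, host) for the addr-spec in a From header field value.

    Hypothetical helper for illustration only. The host portion is a DNS
    name, so it is lowercased; the user portion is left untouched because
    its interpretation belongs to the domain that administers it.
    """
    _display_name, addr_spec = parseaddr(from_header)
    user, sep, host = addr_spec.rpartition("@")
    if not sep or not user or not host:
        raise ValueError(f"no user@host name found in {from_header!r}")
    return user, host.lower()


# Example: the display name is discarded, the addr-spec is split.
user, host = split_originator("Joe <joe@Example.COM>")
print(user, host)
```

   Splitting on the last '@' keeps the sketch short; a production parser
   would need to follow the full addr-spec grammar.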

   Nothing forces the originator of a postal message to supply a genuine
   return address on an envelope; originators are incented to provide a
   genuine return address only if they want the envelope to be returned
   to them if it cannot be delivered. Similarly, the RFC2822.From
   header field of an email message can be populated arbitrarily by the
   originator (though it is not necessarily the address to which bounces
   are sent). Malicious originators may want to provide a misleading or
   false return address for their messages, or to withhold a return
   address altogether, in order to escape reports of abuse or to mislead
   the recipient about the origins of the message. While there are
   valid cases where anonymous communication is necessary, impersonation
   can be very problematic.

   For the purposes of this document, 'identity' refers to mechanisms
   that provide an assurance of the originator of a message. An
   identity assurance is provided by a party in the messaging
   architecture that can prove its authority over a segment of the
   namespace. For the identity systems considered in this document,
   that may entail proof of authority over DNS names, or it may also be
   authority specific to a particular user within a domain. This
   assurance is communicated along with the message, and can be verified
   by recipients of the message.

3.  Roles in an Identity System

   This document postulates two fundamental roles in a messaging
   identity architecture: an identity provider and a verifier. These
   roles might usefully be instantiated by any elements in a messaging
   architecture. Most commonly, an originator or a proxy for the
   originator of a message will act as an identity provider, and the
   recipient or a proxy for the recipient will act as a verifier.
   However, this is far from the only valid assignment of these roles.
   There are even useful architectures where it is meaningful for the
   originator to act both as the identity provider and the verifier
   (where token-based assertions are used to authenticate network
   reports of undeliverable messages).

3.1  Identity provider

   The role of the identity provider in an identity architecture is to
   generate an identity assertion. An identity assertion is a chunk of
   information added to a message which can later be verified to assure
   the identity of the originator.

   An identity provider must be capable of authenticating the originator
   of the message. The messaging architecture of the system in
   question, and the entity that plays the role of the identity
   provider, will largely determine how this authentication takes place.
   If the identity provider is instantiated by the endpoint of the
   originator, for example, this authentication might be tacitly
   assumed, or occur in some application-specific way. If the identity
   provider is built into an intermediary, some network authentication
   mechanism must be used by the identity provider to ascertain the
   identity of the originator.

   An identity provider must have some verifiable authority over a
   segment of the namespace of this messaging system; that is, it must
   be capable of proving to verifiers that it is the appropriate entity
   to identify the originator of a particular message. This proof of
   authority can come in many forms, depending on the type of assertion
   that the identity provider generates.

   Once the originator has been authenticated, the identity provider
   must furthermore determine whether or not the originator is
   authorized to send the message in question; this practice is most
   relevant to cases in which the identity provider role is instantiated
   by an intermediary, since in those cases where the originator's
   endpoint instantiates the identity provider, the originator itself
   has authority over the relevant segment of the namespace. When it is
   necessary, this authorization decision may be based on a number of
   factors; for our purposes, the most important is the identity claimed
   by the originator of the message. An originator may be authorized to
   claim one identity, or any of a number of identities, in accordance
   with the policy of the controller of the namespace containing the
   identity. Identity providers only provide identity assertions for
   messages in which the originator claims an authorized identity.

   Ideally, an identity provider will be the last entity in the
   architecture that will modify the message in transit. An assertion
   will create a signature over certain elements of the message, and if
   the message is subsequently modified, the modification may invalidate
   this signature. The severity of this condition is entirely dependent
   on the nature of the assertion, and on the elements of the message
   which are guaranteed by the assertion. In practice, most messaging
   systems modify messages in some fashion throughout their transit of
   the network, and subsequent modification after the generation of an
   identity assertion is most likely unavoidable in any practical
   deployment.

   An identity provider must be capable of modifying a message, or
   forcing another entity in the architecture to modify the message in a
   particular way, in order to incorporate the identity assertion.
   Commonly, creating an identity assertion involves the use of
   cryptography, and accordingly, generating identity assertions may
   slow message creation or processing in the identity provider.

3.2  Verifier

   A verifier consumes an identity assertion in order to verify the
   identity of the originator of a message. After inspecting an
   identity assertion, a verifier may make an authorization decision to
   act on the message in any of a number of ways. Authorization
   decisions made by verifiers are outside the scope of this document.

   In order to perform its function, a verifier must be capable of
   reading the identity assertion in a message. Depending on the
   placement of the assertion in the message, and the underlying
   architecture of the messaging system, this may limit the entities
   that can instantiate the verifier role.

   It is possible that more than one verifier will inspect the same
   assertion in a message. In some architectures, it may make sense for
   one or more intermediaries to act as verifiers before a message
   reaches its recipient, which may also act as a verifier.
   Alternatively, an intermediary could reflect a message to a
   potentially large list of recipients, in which case each recipient
   (and/or intermediaries acting on their behalf) might act as a
   verifier. In other architectures, an intermediary acting as a
   verifier might strip the identity assertion before forwarding the
   message; in such cases, the intermediary might replace the identity
   assertion with a verification assertion (see Appendix B).
   Verification assertions can also be added without stripping identity
   assertions.

   Commonly, the verification of an identity assertion involves the use
   of cryptography, and accordingly, verifying identity assertions may
   slow message processing in the verifier.

4.  Threat Model of Impersonation in Messaging Systems

   Impersonation is the practice of falsifying the elements of a message
   that indicate its originator. This is generally done in order to
   mislead a recipient about the origins of the message.

   The most common adversary in impersonation threats is a passive
   attacker. A passive attacker can capture email messages in some way:
   they may see messages in transit, they may see archives of messages
   on the web, or they might even be a recipient of a message. By
   capturing messages, the impersonator learns how a genuine originator
   structures their messages, including the manner in which elements of
   the message that indicate the originator are populated. The
   impersonator then sends messages that mimic the structures used by
   the originator they intend to impersonate, altering the destinations,
   contents, and other meaningful headers as needed. In the case of
   fictional originators, impersonators merely create plausible-looking
   messages based on their experience with typical originators. In many
   current messaging systems, there is no need to do anything other than
   adopt the name of the desired originator and inject the message into
   the messaging system.

   The manner in which an impersonator injects messages into the
   messaging system admits of varying degrees of sophistication. A
   passive attacker may, for example, only be capable of injecting
   messages as an originator, or they may control or be capable of
   imitating intermediaries in the system. This can have a large impact
   on the way that other elements in the messaging system perceive their
   forgeries.

   Another type of impersonator is an active attacker. An active
   attacker can intercept messages in transport, modify them
   arbitrarily, and then return them to the message transit system.
   This is a harder sort of attack to mount, and a much harder attack to
   defeat; consequently it may not be in the scope of identity assertion
   systems to prevent this sort of attack. Since many intermediaries
   that are not actually attackers exhibit essentially indistinguishable
   behavior, designers of identity systems are further disincented from
   meeting this threat.

   The uses of impersonation are legion. An impersonator may want to
   avoid reports of abuse, or accountability for the contents of
   messages. Or, an impersonator may want to make a message appear to
   come from a particular originator to whom they believe a recipient
   will be sympathetic (which may lead the recipient to read a message
   and inspect content of the impersonator's choosing).

   Primarily, the purpose of an identity assertion is to prevent
   impersonation. This means that it must provide the following
   qualities:
   o  In order for an assertion to be valuable, it must provide a
      stronger assurance than the return address conventionally attached
      to a message. For example, an email identity system would be
      totally uninteresting if it allowed any originator to arbitrarily
      populate their identity, because this would constitute no
      improvement over the existing RFC2822.From header field.
      Typically, the strength of the assertion depends on some form of
      cryptography, and provable authority over the namespace of the
      originator. In some constrained environments, assertions instead
      derive their authority from some form of transitive trust (see
      Appendix E.1); such assertions are outside the scope of this
      document.
   o  The assertion must have a precise scope and constraints (see
      Section 8.2), whether these are explicit in the message or static
      and understood implicitly in the messaging protocol.
      It is assumed that the means by which a passive attacker collects
      messages will also allow them to collect identity assertions, and
      impersonators may accordingly attempt to replay them. Constraints
      are intended to combat replay attacks.
   o  The assertion must denote the identity provider in some secure
      fashion, and provide any information necessary for the verifier to
      validate cryptographic properties of the assertion. Assertions
      must provide verifiers with a means of determining whether or not
      the identity provider is authoritative for the namespace of the
      originator of a message.

5.  Identity Assertions

   An identity assertion is a piece of information (perhaps a header, a
   parameter, or an attached document) added to a message by an identity
   provider in order to provide verifiers with identity information
   about the originator of the message.

   Most existing and proposed identity mechanisms for Internet messaging
   systems leverage some form of cryptography. Public key (or
   'asymmetric') cryptography is an especially attractive tool in this
   context, because it allows a verifier to validate an assertion even
   if it has never before been contacted by that originator. Symmetric
   key cryptography, by way of contrast, requires that the identity
   provider and verifier share some pre-arranged secret.

   Cryptographic signatures generated by an asymmetric keying mechanism
   provide authentication of the signer and integrity over the signed
   information. There are a number of ways that a signature can provide
   identity information, depending on the type of key used to generate
   the signature, and the identity of the signer.

   Providing a signature over an identity string like 'joe@example.com'
   alone, however, does not provide a strong assertion of the identity
   of the originator of the message.
   The assertion must contain enough supplemental information that it is
   clear that it refers to this particular message, not just any message
   in which an attacker might try to replay the assertion. The
   constraints and scope of assertions are discussed further in
   Section 8.

   Assertions may also be encrypted. In some cases, it may be desirable
   to hide the identity of the originator of a message from
   intermediaries, but to reveal this information only to a particular
   recipient, or vice versa. Potentially, this could provide certain
   privacy properties to an identity assertion mechanism (see Section
   10).

   The use of cryptography requires some mechanism for key distribution
   and may require a public key infrastructure with widely-distributed
   root certificates. Encrypting identity assertions requires more
   complex keying systems. The use of certificates, uncertified
   asymmetric keys, and symmetric keys is discussed in Section 6.

6.  Keying for Assertions

   Cryptographic identity assertions require the use of keys. In order
   for a cryptographic signature over an assertion created by an
   identity provider to be validated by a verifier, both parties must
   possess corresponding keying material. Since Internet messaging
   systems assume that messages can be sent to arbitrary recipients that
   have no previous association with the originator, key distribution is
   the primary problem confronting the use of cryptographic identity
   assertions.

   Note that regardless of the keying mechanism used, an identity
   provider may have multiple keys that it employs for various reasons.
   Provided that there is a way to link an assertion to a particular key
   used by the identity provider, this requires no special support from
   the identity mechanism.

6.1  Asymmetric Keys

   Asymmetric keys are credentials that have been split into two
   components, a public and a private key.
The holder of the credentials keeps the private key secret, and widely distributes the public key. If a document is signed with the private key, the signature over the document can be validated with the public key. This signature provides integrity over the document, and authenticates the signer.

An identity assertion is a type of document that can be signed with a private key by an identity provider. In order to validate the signature, the verifier must hold the corresponding public key, and must have some reason to think that this public key is associated with the identity provider. In order for that signature to provide any guarantee of the identity of the originator, the verifier must also have some assurance that the identity provider is authoritative for the namespace of the originator of a message.

Asymmetric keys may be generated by an identity provider, or acquired by the identity provider from a third party such as a certificate authority. Thus, there are two significant varieties of public keys - uncertified public keys, and public keys within certificates. The certification status of a public key has a tremendous impact on how it can be distributed and the manner in which it assures authority over a namespace.

6.1.1 Certificates

A certificate [12] is a document that binds public keying material to a particular name, the 'subject' of the certificate. The certificate is signed by a certificate authority, and accordingly, parties that validate certificates must possess the public keys of certificate authorities (and unfortunately, the chain of certification between a particular certificate and the root certificate authority can include multiple middleman certificates). For the purposes of this document, self-signed certificates are simply considered uncertified public keys.

Certificates support a wide variety of subject formats.
Two are significant to the scope of this document. First, a certificate's subject can be a valid name in an Internet messaging system, such as an email address. Second, the certificate's subject can be a domain name. Depending on the nature of the subject, the certificate can sign user-based or domain-based assertions; this is discussed further in Section 7.

Whether user-based or domain-based certificates are used, certificates have a common set of advantages and drawbacks. The primary advantage of certificates is that they provide a strong link between a public key and a subject. Accordingly, by looking at the subject of a certificate, it is relatively easy to decide whether or not the certificate holder is authoritative for the namespace of a particular originator of a message (bearing in mind the caveats in Section 7.1). Because a certificate is a signed document, certificates can also be distributed over the network without requiring integrity over the transport; e.g., a certificate store for an identity provider could use an insecure transport like vanilla HTTP to distribute certificates.

The downside is that certificates do not represent a permanent binding. Certificates have an expiration date, and consequently certificates must be periodically renewed, which is an operational hassle for identity providers. Moreover, parties that rely on certificates cannot assume that a certificate is still valid simply because it has not expired. Certificates can also be revoked, usually as a consequence of the compromise of their corresponding private key. Relying parties are therefore required to monitor certificate revocation lists (CRLs) issued by certificate authorities. Because this entails cumbersome operational procedures, relying parties rarely adhere to this in practice.
With all that in mind, it must be remembered that uncertified public keys do not represent a permanent binding either, and that there are no comparable intrinsic mechanisms for determining the expiry or compromise of an uncertified public key, even if a relying party were sufficiently troubled by these concerns to employ them.

6.1.2 Uncertified Public Keys

Public key cryptography can also be used for identity assertions without certificates; for example, an identity provider may generate a public/private key pair itself. This requires a mechanism for distributing public keys in which the identity of the private key holder is implicitly or explicitly disclosed to potential verifiers, and verifiers understand unambiguously the namespace for which the identity provider is responsible.

One way to associate an uncertified public key with a message originator is to transmit the public key in an initial unsigned message. The recipient, upon receipt of the public key, could store it in a local, application-specific keychain, indexed by the originator's return address (for user-based assertions) or the originating domain (for domain-based assertions) - the message would need to make clear precisely who the identity provider is. Future signed messages received from that originator (or domain) could be validated with the public key. This mechanism of key distribution will be referred to in this document as the "leap-of-faith" mechanism. It merits this particular name because the originator and recipient must have faith that no man-in-the-middle interfered with the initial message containing the public key. If an active attacker were present in the key exchange, they could inject their own public key and impersonate the originator to that recipient.
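
The keychain behaviour described above can be illustrated with a minimal Python sketch. It is not drawn from any specification: the class name LeapOfFaithKeychain, the method observe, and the choice of a SHA-256 fingerprint are all assumptions made for illustration, and signature validation itself is left out. The sketch only shows the pinning step, in which the first key seen for an index (a return address or a domain) is trusted and later keys are compared against it.

```python
import hashlib


class LeapOfFaithKeychain:
    """Hypothetical local keychain that pins the first key seen per index."""

    def __init__(self):
        # index (return address or originating domain) -> key fingerprint
        self._pinned = {}

    @staticmethod
    def fingerprint(public_key: bytes) -> str:
        # Assumption: a SHA-256 digest of the raw key bytes identifies the key.
        return hashlib.sha256(public_key).hexdigest()

    def observe(self, index: str, public_key: bytes) -> str:
        """Return 'pinned', 'match', or 'mismatch' for a presented key."""
        seen = self.fingerprint(public_key)
        known = self._pinned.get(index)
        if known is None:
            # The leap of faith: nothing vouches for this first key.
            self._pinned[index] = seen
            return "pinned"
        return "match" if known == seen else "mismatch"
```

A 'mismatch' result cannot by itself distinguish a legitimate key change from an impersonation attempt, and a key injected by an active attacker during the first exchange would be pinned like any other.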

The leap-of-faith follows the example of SSH, which is widely regarded as a vast improvement over insecure telnet-style applications, and no doubt the leap-of-faith method of distributing public keys for identity providers would be an improvement over a lack of identity assertions altogether. Unfortunately, messaging architectures almost inevitably involve application-layer intermediaries that could inspect or modify leap-of-faith keys, and in this respect messaging is significantly distinct from the traditional client-server architecture of SSH.

The other challenges facing this approach rest largely in associating the key with a legitimate identity provider, and determining the namespace for which that identity provider is authoritative. Practically, there isn't really a way to do so; when a message arrives with an uncertified public key in it, that key is ultimately serviceable only as a validation of that particular (anonymous) identity provider. When future messages are received, the verifier can prove that these assertions were created by that same identity provider, but that verification offers no proof of the namespace for which that identity provider is authoritative. This problem is severe enough that leap-of-faith key distribution is probably only meaningful for anonymous user-based assertions. But again, even anonymous user-based assertions are better than nothing.

The DNS might also be leveraged to bind a public key to an identifying domain. DNSSEC [23], for example, provides public keys in a DNS resource record. Those keys are known to be associated with a particular domain (thanks to the delegative structure of DNSSEC). Those keys, or some other keying material in the DNS which is signed via DNSSEC, could be used to provide a domain-based signature in the request for an identity assertion (see Section 7).
Even a simple hash of the public key used by the identity provider, placed in the DNS, would enable the transmission of domain-based public keys in messages without any need for a leap-of-faith.

Note that strictly speaking, the keying material (or a hash of it) does not need to appear in the DNS in order for the DNS to be leveraged to bind a public key to an identifying domain. If the identity provider were to run a key store service (like an HTTP server) that made its key available, then the identity provider could include a URI reference to that store with its assertion. Since the DNS would be used to dereference that URI, the security of that store is predicated on the security of the DNS. However, the operation of the store exposes the identity provider to further security risks (see Section 9.3), and since the DNS needs to be invoked in order to find the store, using keys or hashes in the DNS is ultimately more efficient from a messaging perspective.

In the absence of operational DNSSEC, however, using the DNS to find uncertified keys is insecure. While the technical specifications of DNSSEC are largely complete, it will likely be some time before DNSSEC is fully operationalized. There are high-level changes that would need to sweep through the DNS in order to operationalize DNSSEC, whereas today individuals within the messaging community can opt to employ certificates, or not, on an incremental basis. That much said, it can be argued that the difficulty of subverting the DNS is sufficiently high that this practice would deter a large number of potential impersonators; verifiers can make their own policy decisions about the strength of the assertion based on whether or not the zone containing the keying material uses DNSSEC.
Note, however, that this approach has no obvious way to support user-based assertions short of placing many (for large domains, perhaps tens or hundreds of thousands) records in the DNS corresponding to the keys of particular individuals; since the identity provider's assurance of the namespace derives from the DNS zone in which these key records are stored, the security of providing domain-based assertions is materially the same.

Given either approach, it is desirable for validators to be capable of caching uncertified public keys. For DNS-based schemes, the cache duration could presumably be dictated by the time-to-live of the DNS resource record containing the key or hash of the key. For the leap-of-faith approach, additional metadata associated with the public key would presumably dictate the length of time for which it is safe to cache the key. Some further considerations related to caching are discussed in Section 9.3.

One example of the leap-of-faith system in an Internet messaging protocol is given in RFC3261 [11] Section 23.2 (for the case where unsigned certificates are used).

6.2 Symmetric Keys

The use of symmetric keys for an identity assertion is severely limited because it requires that the identity provider and verifier pre-arrange a shared secret, which, for the typical assignment of these roles, runs contrary to the requirement that the domains of the originator and recipient of a message require no previous association. However, depending on the intended applicability of the assertion, this may not be an unreasonable constraint.

For a case like determining that a bounce resulted from a message that an originator actually sent, the identity provider and verifier of a message are the same endpoint (the originator).
Since an endpoint can reasonably be expected to share a secret with itself, the use of symmetric keys is attractive for this use case.

The interdomain use of symmetric keys is further limited by the difficulty of key distribution. Asymmetric public keys can be distributed without fear that any passive attacker will be capable of leveraging the keys to impersonate the principal. If a symmetric key used for identity assertions is captured by an attacker, however, the attacker can impersonate the principal for the lifetime of the key. Symmetric keys essentially need to be negotiated, in interdomain cases, through some out-of-band mechanism.

7. User-based and Domain-based Assertions

To understand the distinction between user-based and domain-based assertions, it is simplest to assume that they are generated by certificates. Consequently, the discussion in the next few paragraphs describes only the use of certificates to provide these assertions; alternatives to certificates are described at the end of this section.

In the simplest assertion, the identity provider is directly authoritative for the name of the originator only. For example, the identity provider holds a certificate with a subject of 'joe@example.com', and provides an identity assertion with the private key corresponding to that certificate only for messages sent by 'joe@example.com'. We will refer to this sort of identity assertion as a user-based assertion. Usually, the identity provider is in this instance the endpoint of the originator, though of course it would also be possible (though probably not very scalable) for an intermediary to manage a keyring of such certificates for every user in their domain.

While this case is straightforward, there is no widely-supported public key infrastructure that issues user-based certificates to date.
The only successful PKI on the Internet today provides domain-based certificates, primarily for securing web transactions. These certificates have a hostname subject of the form 'example.com' (or, more commonly, 'www.example.com'). While there are many reasons why domain-based certificates are more successful than user-based certificates, for our purposes the most important is enrollment: it is very easy for a certificate authority to determine who controls 'www.example.com' (since this is a matter of public record), but very difficult for a certificate authority to determine to whom 'example.com' has allocated the username 'joe'. The only deployable means of doing so today (email pings) are essentially leap-of-faith mechanisms.

Because domain-based certificates are widely available, and the root certificates of the major certificate authorities that issue these certificates are installed on almost all Internet-enabled platforms, the prospect of leveraging domain-based certificates for identity in messaging systems is very attractive. Compared to user-based certificates, domain-based certificates are also attractive because there need to be fewer of them in the overall messaging system, since there are generally many users to a given domain. This is advantageous both for identity providers, especially from a cost perspective, and for verifiers, who will need to persist many certificates from remote domains.

When domain-based assertions are employed, the certificate itself does not provide the identity of the originator, but it does prove that the identity provider is authoritative over a particular segment of the namespace. Accordingly, the identity provider's signature must cover some field of the request that contains the identity that the signer is asserting.
In order for the assertion to have any strength, that identity must be within the segment of the namespace for which the signer is authoritative; i.e., if the certificate of the signer proves authority over 'example.com', then the signature would be valid if the identity of the originator were 'joe@example.com', but not if the identity were 'alice@example.org'. This gives rise to quite a few subtleties which are discussed in Section 7.1.

In order to acquire a domain-based identity assertion for a request, originators would typically need to forward their message to an intermediary that instantiates the identity provider role (unless the originator holds a certificate authoritative for its own domain). This in and of itself can be viewed as a drawback, since in many messaging architectures originators are not required to send messages through any specific local intermediary. Moreover, messaging protocols are used in some environments that constrain the first-hop local intermediary to which an originator sends a request (e.g., blocking outbound SMTP with an enterprise firewall). In those environments, originators would be unable to acquire an identity assertion from an intermediary that was unsanctioned by the operator of the environment.

Note that the considerations applying to domain-based certificates also apply to most DNS-based mechanisms for public key distribution - the identity assertions generated by keys distributed on a per-domain basis through the DNS are domain-based assertions. The distinction lies in the strength of the assurance - uncertified public keys distributed through the DNS without DNSSEC are inherently less secure than certificates, and thus can be said to provide a weaker domain-based assurance.
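
The namespace check that a verifier applies to a domain-based assertion, as in the 'example.com' example earlier in this section, might look like the following Python sketch. The function name is_authoritative is hypothetical, and the exact-match rule is only one possible policy, chosen here because it is the simplest to state; it deliberately rejects subdomains such as 'mail.example.com'.

```python
def is_authoritative(signer_domain: str, originator: str) -> bool:
    """True only if the originator's host part equals the signer's domain.

    Assumption: an exact, case-insensitive match on the host portion is the
    policy in force; subdomains and parent domains are not accepted.
    """
    local, sep, host = originator.rpartition("@")
    if not sep or not local or not host:
        return False  # not a name of the form user@host
    # Ignore case and a trailing root dot when comparing domain names.
    return host.lower().rstrip(".") == signer_domain.lower().rstrip(".")
```

Under this policy an assertion signed for 'example.com' covers 'joe@example.com' but neither 'alice@example.org' nor 'joe@mail.example.com'.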

7.1 Name Subordination

Identity assertions become harder to verify when the subject of the signer's certificate does not correspond exactly with the originator's name. There needs to be a deterministic way of deciding if an identity provider is authoritative over the namespace containing an originator's name.

For example, how should a verifier treat an identity assertion generated by an identity provider with a certificate for 'joe@example.com' when the originator of the associated message is given as 'joe@mail.example.com'? The problem is more pronounced with domain-based assertions. How should a verifier treat an identity assertion generated by 'alice.example.com' for a message whose originator is 'joe@example.com'? What if the domain were 'joe.example.com', or 'mail.example.com', or 'sip.example.com'?

We are forced to pose this authorization question because the verifier has no way to know how the identifying domain 'example.com' has allocated its namespace - which is why these problems are problems of name subordination. While authorization policy is outside the scope of this document, there are potentially ways to design a messaging identity system such that these concerns never arise. The most obvious way is to be very strict about generating assertions - to mandate, for example, that identity providers cannot provide domain-based assertions for messages unless their domain (the subject of their certificate, or the zone containing their key in the DNS) corresponds exactly to the host portion of the originator's return address. But this may be too rigid to support some use cases.

Another possible solution is to leverage the DNS in some new way to designate the identity provider for a domain.
Just as one resource record type designates the mail exchanger to which mail should be sent, some other DNS resource record might designate the identity provider for email messages in a domain (e.g., for 'example.com', the identity provider for mail messages resides at 'mail-ident.example.com'). Predictably, this solution is limited by the lack of an operational DNSSEC infrastructure in the DNS. Without DNSSEC, it is possible that an attacker could spoof DNS responses to suggest that an inappropriate host is the signer for the domain; essentially, this grants the attacker the ability to impersonate any user in the domain.

8. Reference Indicators and Replay Protection

If any attacker can cut an identity assertion from a legitimate message, paste it into an arbitrary message of their own, and thereby fool a verifier into believing that the hacked message came from the originator of the legitimate message, then the value of the identity assertion is essentially nil, given that it exists primarily to prevent impersonation. If an identity assertion provided only a signature over the name of the originator, assertions would be trivially exploitable in precisely this fashion.

Accordingly, an assertion must cover more than just the originator's name. It must cover enough additional information that the assertion cannot be replayed in a substantially different message.

Ideally, the identity assertion would provide a signature over the entire message, envelope and contents alike. If this were the case, then an attacker could only replay the identity assertion in an identical message - which would be a duplication rather than an impersonation. But practically, this wouldn't work for any existing messaging system. In fact, in most messaging systems, intermediaries need to modify the envelope in order to perform their duties.
While strictly speaking, intermediary modification of the content is not a baseline requirement for a messaging system, some intermediaries do so in order to enforce any of a number of domain-specific policies. Consequently, were a signature over an entire message included in identity assertions, such signatures would be likely to fail to validate at verifiers.

Thus, only some subset of the message must be signed. The selection of the exact subset is a very difficult problem. For the purposes of this document, the elements of a message that need to be signed in order to bind an identity assertion to a particular message will be termed the 'reference indicators'. The manner in which a subset is identified or carried in the message also admits of more than one plausible design choice.

8.1 Canonicalization versus Replication

There are two basic approaches to generating a signature over a subset of a message - canonicalization and replication.

Canonicalization entails the generation of a canonical string from the reference indicators. The signature is generated over that string (or a hash of that string), even though the string as such does not appear in the message. The canonicalization system must specify the reference indicators that are going to be signed. The reference indicators can be specified statically, as a component of the specification of the mechanism, or dynamically, on a per-message basis. In the former case, every identity assertion for every message in the system will generate a canonical string containing exactly the same reference indicators. In the latter case, each assertion will denote in some manner which reference indicators have been incorporated within the canonical string.
When a verifier receives the message, it extracts those same reference indicators from the message, generates the same canonical string, hashes it where applicable, and then determines whether or not the signature in the identity assertion is valid for that canonical string.

The most practicable canonicalization procedures incorporate only the most specific reference indicators from a message. For example, inclusion of the entire RFC2822.From header field value (including the header field name, the colon, whitespace, etc) is much more problematic than the inclusion of only the addr-spec component of the From header field value. The less specific the reference indicators are, the harder it is for them to be canonicalized, and the more likely it is that intermediaries (through munging white-space, changing line-wrap, and so on) may inadvertently change the canonical string that will be generated by the verifier, or that the verifier will miss a blot of whitespace, and so on.

Both dynamic and static reference indicators for canonicalization have their drawbacks. Static reference indicators can be too limiting; it is difficult to anticipate the reference integrity needs of every imaginable message. Dynamic reference indicators, however, are extremely complicated. The syntactical system required to describe reference indicators is potentially an exercise in arbitrary string manipulation, especially when attempting to denote reference indicators with a high degree of specificity. Dynamic reference indicators also leave much more room for error in the generation of the canonical string, and accordingly, more room for discrepancy in the manner that the verifier generates the canonical string. It is difficult to strike a balance, and once you allow any reference indicators to be decided on a per-message basis, the slope becomes very slippery.
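
As an illustration of canonicalization with static reference indicators, the following Python sketch builds a canonical string from a fixed set of header fields and hashes it. The set of indicators, the '|' separator, and the function names canonical_string and canonical_digest are assumptions made for this example, not part of any specified mechanism. The signing step is omitted; a real identity provider would sign the digest with its private key.

```python
import hashlib
from email.utils import parseaddr

# Assumed static set of reference indicators, always used in this order.
STATIC_INDICATORS = ("From", "To", "Date", "Message-ID")


def canonical_string(headers: dict) -> str:
    """Build the canonical string from the static reference indicators."""
    parts = []
    for name in STATIC_INDICATORS:
        value = headers[name]
        if name in ("From", "To"):
            # Keep only the addr-spec; the display name is not signed.
            value = parseaddr(value.strip())[1]
        else:
            # Collapse runs of whitespace so re-wrapping does not matter.
            value = " ".join(value.split())
        parts.append(value)
    # Simplification: '|' is assumed not to occur inside the values.
    return "|".join(parts)


def canonical_digest(headers: dict) -> str:
    """Hash of the canonical string; this is what would be signed."""
    data = canonical_string(headers).encode("utf-8")
    return hashlib.sha256(data).hexdigest()
```

Because only the addr-spec and whitespace-normalized values enter the string, a From field of 'Joe <joe@example.com>' and one of 'joe@example.com' yield the same digest, while any change to the address itself yields a different one.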

Replication attempts to avoid the difficulties of canonicalization by providing a copy of the reference indicators that is carried within the message itself. The simplest form of replication is the reproduction of the entire message, which is then tunneled within the message itself. The identity assertion is a signature over the replication. Of course, if the entire message is carried within itself, this doubles the size of the message (not even counting the signature), and so presenting a subset of the message is again desirable. However, unlike canonicalization, replication does not require any pre-agreement or denotation of the reference indicators. The reference indicators that appear in the replicated message are visible to the verifier, and the verifier validates the signature over the replication, not over elements in the original message which need to be assembled into a canonical string. If the signature over the replication is valid, the verifier then compares the values of the reference indicators in the replication to the corresponding elements of the message. If these two correspond, then the identity assertion is valid for this message.

Another significant distinction between canonicalization and replication is that a verifier inspecting a replication-based assertion can determine which reference indicators do not correspond to the received message; a verifier validating a canonicalization-based assertion can only tell whether or not the reference indicators as a whole exactly match the current message. As a consequence, a verifier of a replication-based assertion can be lenient towards minor discrepancies between the message signed by the identity provider and the received message.
If the verifier were implemented in an endpoint, the endpoint might even render an account of the discrepancies to a user, who might be able to make an informed decision about the severity of the differences. Another alternative is that verifiers might 'clobber' the contents of the outer envelope with the replicated envelope, treating only the replicated headers as authoritative and ignoring any discrepancies. Clobbering, however, only works when the reference indicators are carefully chosen; otherwise, it may disguise the actions of an impersonator who has cut-and-pasted the replicated assertion into a message of their choosing.

Replication furthermore introduces the interesting possibility that envelope elements intended for end-to-end consumption (that do not need to be inspected by intermediaries, like the Subject header of email) might be included in the replicated body, but not in the headers. The originator might intentionally provide only the minimum amount of information necessary in the envelope of the message, but arrange for the identity provider to place detailed end-to-end information in the assertion. Were the assertion then to be encrypted, the identity provider could securely tunnel end-to-end elements to the recipient. This is most meaningful when the originator acts as the identity provider. Note, however, the limitations of encryption in Section 10.

A very basic replication scheme deals poorly with the content of the message. Many messaging protocols allow large content (multi-megabyte email attachments leap to mind). Replicating this content is extremely undesirable. In contrast, canonicalization-based assertions do not become larger as the signed content grows. Since canonicalization can use a hash of the canonical string, even if the canonical string is built from the message body, the size of the hash is unchanged.
Traditional approaches to message content security (including PGP [18] and S/MIME [17]) sign a hash over the message content. Accordingly, some hybrid approaches replicate reference indicators in the envelope but canonicalize content, which can yield the best of both worlds.

One example of a canonicalization-based identity assertion for SIP is given in sip-identity [25]. The core SIP standard, RFC3261 [11] Section 23.4.2, provides a replication model that entails tunneling the entire message; RFC3893 [14] provides a replication model for SIP which is restricted to reference indicators alone.

8.2 Assertion Constraints and Scope

The choice of reference indicators dictates the constraints and scope of the assertion. For example, if the reference indicators include something like the email Date header field, then it is possible, after verification, to apply authorization policies related to the time the Date header was created.

Whether canonicalization or replication is used, the selection of a set of reference indicators must be informed by the nature of the messaging protocols. Which elements of the envelope and content are necessarily immutable from the identity provider to the verifier (however those roles are assigned)? Which are always mutable? Which elements are conceivably reference indicators?

The following may not be a necessary or sufficient list for any given messaging protocol, but it does exemplify the sort of analysis that needs to be performed in determining whether or not an element should be used as a reference indicator.

Beginning with the envelope, at a minimum, the name of the originator of the message has to be preserved (in email, the addr-spec component of the RFC2822.From header field).

It is also highly desirable to include some indicator that denotes the intended recipient(s) of the message.
Without such an indicator, a message containing a valid identity assertion could be replayed to a different recipient by a passive attacker who captured the message, and the verifier would be unable to determine that the originator did not intend to send this message to the designated recipient. Barring the presence of other reference indicators, even the intended recipient of the request could act as such a passive attacker.

Constraining the assertion with some sort of unique identifier for the message is also very desirable. Most messaging protocols provide a unique message identifier in order to enable the recipient to detect duplicates, or to enable correspondents to refer to previous messages unambiguously. While the presence of a unique identifier in the constraints does not prevent passive attackers from replaying assertions to new verifiers, it does change the situation when impersonators attempt to replay assertions to the same verifier (which complements selecting the intended destination as a reference indicator). It enables verifiers to remember unique identifiers for some period of time. By doing so, verifiers can discover that they previously verified a message with this unique identifier. This does, however, have some important limitations. The first is scalability. Intermediaries that act as verifiers can potentially process staggering numbers of messages, and recording every passing unique identifier in such intermediaries is probably infeasible. However, this does not mean that the presence of a unique identifier would not be useful for recipients that act as verifiers (who might persist messages, including the unique identifier, for various reasons anyway). The second limitation is a race condition.
If the
attacker's message is delivered to the verifier before the legitimate
message, a verifier might mistakenly believe that the attacker's
message is the valid one; while active attackers are the most likely
to successfully mount this sort of attack, in store-and-forward
architectures it is possible that a passive attacker might do so
(though not in a deterministic fashion, probably). A third
limitation is that in some architectures, a particularly intrusive
intermediary might alter the unique identifier of a message in the
process of forwarding the message; in these environments, the unique
identifier has little value.

Providing a time-based constraint can complement the use of unique
identifiers and other local policies at the verifier. Virtually all
messaging protocols provide an indicator in the envelope that states
when the message was created (such as the Date header in email).
This can aid verifier policies that help to manage replay protection.
For example, verifiers could be configured with an interval of time
derived from some assessment of how long a message can plausibly be
supposed to have remained in transit in the message system. If they
receive a message, and the time that has elapsed since the creation
of the message exceeds that interval, they could consider the
assertion invalid, or at least suspect. Obviously, this interval
would be very different depending on whether the messaging
architecture is based on a store-and-forward methodology or a
real-time delivery methodology. In concert with unique identifiers,
this interval of time could be used to determine how long a verifier
needs to remember unique identifiers recorded from valid past
messages.
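The interplay between the transit interval and the remembered unique
identifiers can be illustrated with a minimal sketch. It is not drawn
from any specification; the class name ReplayGuard, its method names,
and the string results are hypothetical, and clock skew between the
identity provider and the verifier is ignored for brevity.

```python
import time


class ReplayGuard:
    """Hypothetical verifier-side policy: reject assertions that are too
    old, and remember recent message identifiers to spot duplicates."""

    def __init__(self, max_transit_seconds):
        # Assumed local policy: longest plausible time in transit.
        self.max_transit = max_transit_seconds
        self.seen = {}  # unique identifier -> creation time of the message

    def check(self, message_id, created_at, now=None):
        now = time.time() if now is None else now
        # Identifiers older than the interval can be forgotten: a replay
        # carrying that creation time would be rejected as stale anyway.
        self.seen = {
            mid: t for mid, t in self.seen.items()
            if now - t <= self.max_transit
        }
        if now - created_at > self.max_transit:
            return "stale"
        if message_id in self.seen:
            return "duplicate"
        self.seen[message_id] = created_at
        return "ok"
```

A verifier would call check() only after the assertion's signature has
been validated, since both the identifier and the creation time are
meaningful only when they are covered by the assertion.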
Some ways in which a passive attacker might collect
assertions for replay (from web pages of email archives, for example)
could involve the retrieval of very dated assertions that would be
flagged by this sort of policy.

Some elements of a message are half-way between envelope and content,
such as the typical Subject header field of email and SIP. Since it
is common practice for endpoints to render this element to the user,
and the element can significantly change how recipients understand
the message, it should serve as a reference indicator.

While any single one of the envelope-based reference indicators
described above would be insufficient to provide a strong assurance
of identity, in concert, they can meet the majority of the plausible
threats, and require such a high degree of sophistication from the
attacker that most impersonation would be eliminated. However, it
may not be possible for a verifier instantiated by an intermediary to
make full use of all of these indicators (the message's unique
identifier, for example). Moreover, it may not be possible for an
originator to act as an identity provider for all of these reference
indicators if certain elements (like the message's unique identifier)
are generated by an intermediary.

The content of the message is also apparently a critical reference
indicator. Without a signature over the content, a passive attacker
who captures the message could preserve the envelope of a message but
send completely different content, which allows the attacker to
impersonate the asserted originator and provide the content of their
choosing.
Since in many messaging architectures, intermediaries can
legitimately alter the contents of messages (most commonly, either by
adding to the content or modifying the existing content in some
fashion), defending against replay of a message with modified
content by a passive attacker is essentially the same level of
difficulty as defending against message modifications made by an
active attacker.

In environments where intermediaries do modify message content for
legitimate or at least quasi-legitimate reasons, the issue of
protecting the content is academic. A signature over the content
will be violated if the content is changed. There are relatively few
approaches to preventing intermediaries from violating these
signatures; a few examples include:

o  If the messages use MIME [8], it is possible to apply MIME-layer
   security to particular bodies in the content. If trivial
   additions are made by an intermediary (such as appending a few
   lines of text to the message), then they will fall out of the
   scope of the MIME body or bodies. If one is especially lucky, the
   intermediary might even be MIME-aware, and capable of
   understanding how to interact with the complex multipart bodies
   that MIME-layer security frequently requires.

o  If rather than merely adding to the content, the intermediary
   seeks to modify existing message content (filtering for content
   that appears inappropriate, perhaps), then the only recourse is to
   encrypt the content in its entirety. If it is unable to
   understand the content, an intermediary will not be able to make
   these sorts of alterations.

Neither of these solutions is applicable in all cases.
However,
given the use of envelope-based reference indicators described above
in an identity assertion, most impersonations that replayed an
assertion but changed the content would be perceived as duplicates
(based on the unique identifier), outdated, or potentially in
violation of a Subject header constraint; moreover, the attacker
could only impersonate the originator to a specific recipient or
specific set of recipients. It could be argued, therefore, that it is
not necessary to use the content as a reference indicator. But in
messaging systems and environments where it is safe to do so, the
value of including the content as a reference indicator is clear.

An account of the mutable and immutable elements in a SIP message is
given in RFC3261. The most complete analysis of reference indicators
in SIP is given in the Security Considerations of sip-identity [25].
Given the sheer number of possible headers used by email (see [22]),
a complete analysis of mutable and immutable elements is probably a
fool's errand.

The surfeit of possible reference indicators may tempt designers to
punt on deciding, at a protocol level, which ones are appropriate,
and simply to allow identity providers to make this decision based on
domain policy or even on a per-message basis. There are two
disadvantages that this flexibility incurs. In the first place, if
the assertion is based on canonicalization, the assertion must be
accompanied by some sort of description of the reference indicators
that have been used to generate the assertion. Determining how to
describe reference indicators precisely is a significant challenge.
In the second place, it leaves a great deal of ambiguity in
intermediary behavior. How can an identity provider anticipate which
elements an intermediary might want to modify?
If the standard is
firm about this matter, and all identity providers rely on the same
reference indicators, then operators of intermediaries will be
incented to phase out any practices that modify those elements. If,
on the other hand, each identity provider does things a little
differently, there could be significant operational turmoil that
could potentially lead to a rollback from the identity mechanism.

In the end analysis, for any given messaging system, there is
probably a finite set of identifiable elements that should be used as
reference indicators. At worst, there should be a set of fixed
reference indicators that can be supplemented with optional, dynamic
reference indicators as needed. Note that other constraints and
reference indicators that might be added by third parties to the
identity process are described in Appendix D, and should not be
considered a part of the identity assertion created by the identity
provider to identify the originator of a message.

9. Placement of Assertions and Keys in Messages

Most Internet messaging systems employ messages that are divided into
two major parts: envelope and contents. The envelope of a message is
typically made up of headers, like the traditional email RFC2822.From
header field, though some messaging systems use alternative schemes
(like XML for XMPP [13]). The contents consist of one or more
message bodies, typically, but not always, MIME bodies.

The division between envelope and contents is imprecise in most
messaging systems. As a general rule, the envelope is used by
endpoints and intermediaries in the addressing and routing of a
message, whereas the content is generated by the originator's
endpoint, consumed by the recipient's endpoint, and rendered to the
recipient's application in some fashion.
However, many elements in
the envelope are also rendered to the recipient (a classic example
being the Subject header field of email), and intermediaries have
numerous reasons to inspect or modify the contents of messages.

Given that an identity assertion needs to appear somewhere in a
message, there are two plausible alternatives:

o  it could appear in the message content

o  it could appear in the message envelope, as a value or parameter
   of a new or existing element

The attractiveness of one or another of these options is greatly
dependent on the nature of the assertion, and particularly on the
size and encoding of the assertion. Canonicalization will result in
a smaller assertion than replication. To speak in particulars
briefly, for a base64-encoded assertion based on an RSA signature
(1024 bit key) of a SHA1 hash of the canonical string, the resulting
assertion is 175 bytes long - varying the key length will make the
message proportionally larger or smaller, obviously. It is difficult
to gauge the likely size of an assertion based on replication, since
it is highly dependent on the number of reference indicators
included, but it would be significantly larger. An example in
RFC3893 of an S/MIME-based replication assertion for SIP (containing
six headers) is 913 bytes long, counting the multipart/MIME wrapper
and the signature. A base64 encoded version of that assertion is
1240 bytes long.

9.1 Assertions in the Envelope

An envelope is generally composed of a set of elements that describe
the originator and intended recipient of a message, the subject of
the message, the time the message was created, some unique
identifiers for the message, and so on.
It is a common practice for
intermediaries to inspect an envelope's elements in order to make
forwarding decisions, and to add additional elements to the envelope
to reflect various circumstances surrounding the delivery of the
message.

Provided that an assertion is short and syntactically manageable,
there's no reason why it couldn't appear in some new or existing
envelope element. Some messaging systems have a practical (if not
theoretical) limit on the size of envelope elements; in others this
is no cause for concern.

The syntax of the assertion is a more complicated issue. If identity
assertions based on replication are used, and are intended to be
stored in the envelope, it may be syntactically confusing to store a
set of envelope elements within a single envelope element. Worst
case, this confusion could be alleviated by encoding the entire
assertion in some fashion (such as base64), but this would result in
quite a large string. Even in cases where element length is limited,
it is possible that a very large string encoded in this fashion could
be split across multiple envelope elements, and internally ordered in
some way, to meet practical size limits.

Intermediaries generally have an easier time reading and writing
parts of the envelope than the content, and accordingly, if one
intends for intermediaries to instantiate the identity provider or
verifier roles, then placing assertions in the envelope has the
distinct advantage of requiring fewer changes to intermediary
behavior.

Also, some messaging architectures might not guarantee the survival
of particular portions of the message as they traverse
intermediaries. For example, if intermediaries customarily rewrite
or delete particular envelope elements, it would be a poor design
decision to store an identity assertion as a value in those elements.
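
The idea of splitting a large encoded assertion across several
envelope elements, with an internal ordering, might be sketched as
follows. This is only an illustration under stated assumptions: the
function names and the "n/total;chunk" value syntax are invented here
and are not taken from any messaging protocol.

```python
import base64


def split_assertion(assertion, max_len):
    """Base64-encode an assertion and cut it into ordered fragments,
    each small enough for one (hypothetical) envelope element."""
    encoded = base64.b64encode(assertion).decode("ascii")
    chunks = [encoded[i:i + max_len] for i in range(0, len(encoded), max_len)]
    total = len(chunks)
    # ';' never occurs in the base64 alphabet, so it is a safe separator.
    return [f"{n}/{total};{chunk}" for n, chunk in enumerate(chunks, start=1)]


def join_assertion(values):
    """Reassemble fragments that may arrive in any order."""
    parts = {}
    total = None
    for value in values:
        index, chunk = value.split(";", 1)
        n, t = index.split("/")
        parts[int(n)] = chunk
        total = int(t)
    if total is None or sorted(parts) != list(range(1, total + 1)):
        raise ValueError("missing fragment")
    encoded = "".join(parts[n] for n in range(1, total + 1))
    return base64.b64decode(encoded)
```

Carrying the index inside each value means the verifier does not have
to rely on intermediaries preserving the order of envelope elements,
although it still depends on none of the fragments being deleted.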

9.2 Assertions in the Content

Given an assertion of large size or cumbersome syntax, storing an
assertion in the envelope might be undesirable. Appending the
assertion to the contents of the message (perhaps using a multipart
MIME body) might therefore seem superior.

However, for some messaging systems, placing identity assertions in
the content may limit the set of entities in the messaging
architecture that can instantiate the identity provider role. It
might be illegal for an intermediary, for example, to modify content.
This is the case with SIP, where an intermediary cannot delete or
modify the contents of a SIP message.

Placing assertions in the content can also limit the set of entities
that can instantiate the verifier role. Email intermediaries are not
required to be capable of understanding or parsing the contents of
email messages (especially MIME bodies), and accordingly, they cannot
be expected to act as verifiers of an identity assertion that appears
as part of the message content without requiring significant changes
to their functionality.

Furthermore, an identity system should be compatible with end-to-end
security of message contents. If the identity system requires that
an intermediary add a body to a message, and the endpoints are using
some end-to-end integrity mechanism like S/MIME or PGP, appending the
assertion to the content may violate that end-to-end integrity. If
MIME is supported by these intermediaries, however, this problem
becomes less pressing, as intermediaries might add the assertion as a
complete MIME body by transposing the existing content into a new
multipart.

Placing assertions in the content is further complicated by the
manner in which the content of a message is rendered to the
recipient.
Ultimately, an identity assertion is not a component of
the content that should be blindly rendered to the user. It is more
appropriate for a recipient's endpoint to consume the assertion as an
input to an authorization decision, which may in turn change the
manner in which the message is rendered to the user. The assertion
itself, a collection of cryptographic bits, is not something that
should be intermingled with the content rendered to the recipient.
Endpoints that do not support the identity assertion scheme, however,
are likely to do just that, and accordingly, placing the assertion in
the content leads to serious backwards-compatibility concerns.

A messaging system based entirely on the use of MIME content,
however, can overcome these difficulties. Various
Content-Dispositions (see [10]) can inform the recipient's endpoint
that it should not render the content of a body to a user. Moreover,
a Content-Disposition can flag the body as specifically containing an
identity assertion. One such Content-Disposition for identity
assertions ("aib") is defined in RFC3893. In messaging systems where
multipart MIME support is not guaranteed in endpoints, however, this
would lead to backwards compatibility issues.

9.3 Distributing Keys by-Reference or by-Value

The various keying schemes described in Section 6 entail a few
high-level models by which keys can be incorporated into requests:
inclusion by-reference and inclusion by-value. In the by-reference
case, some resource in the network would hold the key, and the
message would either explicitly (with something like a URI) or
implicitly (through some understanding built into the identity
architecture and messaging protocols) indicate where the key for a
particular identity provider can be acquired. In the by-value case,
the key would accompany the identity assertion in the message.
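
The two models can be contrasted with a small sketch of how a
verifier might obtain a key under each. The names KeyInfo and
obtain_key are hypothetical, the fetch function stands in for
whatever retrieval protocol an architecture might choose, and the
cache is simply a dictionary; none of this is prescribed by any
protocol discussed here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class KeyInfo:
    """Hypothetical key information accompanying an assertion."""
    value: Optional[bytes] = None    # by-value: key carried in the message
    reference: Optional[str] = None  # by-reference: URI where the key lives


def obtain_key(info: KeyInfo,
               fetch: Callable[[str], bytes],
               cache: Dict[str, bytes]) -> bytes:
    """Return the key for an assertion, touching the network only when
    the key is referenced and has not been retrieved before."""
    if info.value is not None:
        # By-value: no network access is needed at verification time.
        return info.value
    if info.reference is None:
        raise ValueError("message carries neither a key nor a reference")
    if info.reference not in cache:
        # By-reference: the key store observes this request.
        cache[info.reference] = fetch(info.reference)
    return cache[info.reference]
```

The sketch deliberately leaves out validation of the retrieved key
(for example, checking a certificate chain or an expiry time), which
a real verifier would need to perform before trusting either form.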

When an originator acts as an identity provider, they may not be
capable of operating or contracting with a network service such as a
key store. In those cases, they have no alternative but to include
keys by-value. Intermediary-based identity providers would generally
have no trouble offering keys by-reference.

Including keys by-value is attractive if the keys are self-validating
(as is the case with public keys bound to certificates). If keys are
not self-validating, then clearly an impersonator could trivially
include a key of their own choosing with the request - this is an
instance of the leap-of-faith model described in Section 6.1.2.
Including certificates by-value, however, can be troublesome given
the comparatively greater size of certificates (though in many
messaging architectures, certificates can be incorporated into the
content of the message without dire ramifications). For the purposes
of comparison, an example self-signed certificate constitutes about
1100 bytes of data when base64 encoded; the public key it contains is
about 270 bytes of base64 data (for a 1024 bit RSA key).

The high-level problem with including keys by-reference is that the
verifier must have network access (if they do not already possess the
key) in order to validate the signature within an assertion; this is
not a requirement for verifying assertions whose keys are carried
by-value, and is important for recipient-based verifiers in
store-and-forward messaging architectures. There are also
potentially non-obvious consequences of including keys by-reference.
Consider, for example, that if the
message is not rendered to a recipient instantiating the verifier
role for a protracted period of time (weeks or months), it
is possible that the key used by the identity provider will expire or
change during that time; one interesting property of carrying
certificates by-value is that a verifier can determine, on the basis
of an expired certificate shipped with a message, if the assertion
was valid at the time it was created, provided that the assertion is
constrained by its creation time (though there would be good reasons
to be cautious about this practice).

Also, as another non-obvious consequence of by-reference key
distribution, note that the key store used by the identity provider
will be notified each time that a verifier acquires a key. This can
actually have important privacy implications, because in some cases,
this could reveal most or all of the recipients of the request to the
identity provider. The implications of this are further discussed in
Section 10.

It is important to recall that messages may be reflected to multiple
recipients - potentially, many thousands in some environments. While
it may seem to save message bandwidth to include keys by-reference,
thousands of requests to the key store may result in profoundly
greater network traffic. Note, however, that the impact for
domain-based keys is probably less than the impact for user-based
keys, since domain-based keys need to be acquired on a per-domain
basis, and a domain generally encompasses many users. There's no
question that the impact might be significant in either case.

Comparing the total bandwidth consumed by the two approaches, it is
also important to note that verifiers can cache credentials.
So, if
the verifier already has the key, and the key is still valid, a
reference to the key in a message will not necessitate the key's
retrieval. When keys are sent by-value, however, the originator has
no way to know whether or not a potential recipient already possesses
the key; this problem is compounded by the general difficulty of
anticipating who might conceivably receive a message. The only safe
policy is to send the key every time, when keys are distributed
by-value. When compared with the potential for thousands of
recipients to retrieve the key from a key store, however, this is
still comparatively a minor inconvenience.

It is furthermore the case that any network service which distributes
keys to verifiers will add new threats to the overall identity
architecture. The security properties of the protocols used to
implement the service become critical to the strength of the
assertion. Moreover, those services could be subject to
denial-of-service attacks intended to prevent verification of
messages with identity assertions.

Any network service that can provide keys by-reference to verifiers
might also provide keys to originators; originators, in contrast to
verifiers, would only need to access this local service very
infrequently, and at worst, only one originator would need to access
this service per message, which compares very favorably to the
unbounded set of verifiers to which a message might be distributed.
In fact, the manner in which originators authenticate themselves to
identity providers (which is outside the scope of this document) may
innately entail a key exchange - the originator may learn the keys of
their local domain as a matter of course.
Provided these keys are
bound in certificates, this could potentially serve as an attractive
manner for originators to learn their identity provider's keys in
order to include them in messages by-value. This may be important in
architectures where it is desirable to add the key to the content of
a message, but intermediaries lack the capability or permissions to
make useful additions to the content.

Too hard to choose? Ultimately, there is an easy way to be flexible
about the incorporation of keys into a message. If there is a field
in the message that provides a URI where the key can be acquired,
this can be purposed to include a key by-reference or by-value. It
can be used by-reference to indicate, for example, an HTTP or HTTPS
URI where the key can be acquired; alternatively, it could use some
form of DNS URL (such as [24]) to denote a particular DNS resource
record where the key is located. If the message uses MIME bodies as
content, it could use the CID URI scheme [9] to designate a
particular MIME body that contains the key. The only option
considered in this document for which a URI does not provide a
solution is carrying the key by-value in the envelope, but of
course, it wouldn't make much sense to have, for example, one header
in a SIP request contain a URI reference to another header in the
same message - a special-purpose header should be used to carry keys
by-value in the envelope.

9.4 Distributing Assertions by-Reference

It is also possible to distribute assertions by-reference; to force
the verifier to contact a service operated by the identity provider
in order to acquire an assertion that would be used to verify the
message. This is identical, from a security perspective, to a
dial-back identity scheme; see Appendix E.2.

10. Privacy and Anonymity

Anonymity plays an important part in communication systems. The
existence of an identity system should not preclude anonymous message
originators. However, it is possible to strike a balance in which
anonymized messages still contain identity assertions, and those
identity assertions are potentially still valuable.

Considering the classic case of an originator wishing to be anonymous
to recipients, there are numerous ways in which this could be
realized in the context of an identity system. If the domain of the
originator permits anonymous messaging, the originator could populate
their return address in the message with, say,
'anonymous@example.com', and send the message through the identity
provider of 'example.com'. This sort of anonymity is meaningful for
domains with a great many users, and less useful as the number of
users in the domain grows smaller. Alternatively, a message
anonymization service unrelated to the originator's usual domain
could act as an identity provider for a message. Receiving a message
signed by 'anonymous@anonymizer.example.org' is still, in all
likelihood, preferable from an authorization perspective to receiving
a message without any identity assertion whatsoever. An assertion
provides a pointer of accountability to the originating domain in
cases of abuse.

Another important form of privacy relates to preventing
intermediaries responsible for message transfer from reading the
identity assertion. Encryption of assertions entails a very
different key distribution problem than identity. In order to send
an encrypted message to a recipient, the recipient must possess a
corresponding decryption key. This key needs to be shared, in some
fashion, with the identity provider before the identity provider can
encrypt the assertion for that recipient.
The problem is complicated
by the potential existence of multiple recipients. If the identity
assertion is encrypted for one particular recipient, and ends up
being distributed to multiple recipients by a reflector, the
additional recipients will not be able to read or verify the
assertion.

There are at least two strategies for overcoming this problem:

o  Encrypt the assertion on a per-recipient basis (i.e., include
   multiple versions of the assertion, each one encrypted with a key
   corresponding to the decryption key of each recipient).

o  Force all recipients to share a common decryption key, and encrypt
   the assertion only once with that key.

Both of these approaches are limited by the fact that the identity
provider cannot anticipate who will receive the message. Moreover,
as the list of recipients grows larger, these strategies become
increasingly unmanageable. Even if a message is retargeted to only
one destination, the identity provider has no way to anticipate what
that destination might be. In the end analysis, encryption of
assertions is a very difficult practice to manage in messaging
identity architectures.

When a message is reflected to multiple recipients, this can give
rise to another privacy problem. If the identity provider's keying
material is included in the message by-reference, then the identity
provider will know who the verifiers are when they contact the key
store to acquire the key (given that the identity provider operates
or has some oversight of the key store). While not all reflectors
need to protect the privacy of their distribution list, it is very
probable that some do. This problem can even arise when a message is
forwarded by one recipient to another recipient, who subsequently
verifies the message, if the original recipient did not want to
reveal to the originator that their message was forwarded.
In an
identity architecture in which keys are always distributed by-value,
this problem never arises; if the originator or identity provider can
choose to include keys by-reference, however, this could be a
material concern. The concern lessens as the number of messages
assured by the identity provider grows larger (i.e., large domains
using domain-based assertions); any individual request becomes a
needle in a haystack. Nothing about a request for a key alone
identifies the message that the verifier is validating - although if
user-based assertions are used, it will reveal the originator of the
message. This is a major distinction between distributing keys
by-reference and distributing assertions by-reference; dial-back
identity schemes (see Appendix E.2) notify the identity provider of
the exact message that the verifier is inspecting.

11. Conclusion: Consensus Points and Questions

If the analysis in this document illustrates anything, it's the sheer
number of moving parts that must be fixed in order to arrive at an
identity solution for a messaging system. It does, however, identify
the core consensus points in arriving at an identity solution. The
following are the major points that require analysis:

o  keying: asymmetric keys vs. symmetric keys
o  asymmetric keys: certificates vs. uncertified
o  assertion structure: canonicalization vs. replication
o  reference indicators: static vs. dynamic
o  identity providers: originators vs. intermediaries
o  verifiers: recipients vs. intermediaries
o  content: a reference indicator vs. not a reference indicator
o  assertions: domain-based vs. user-based
o  assertion placement: envelope vs. content
o  key distribution: by-reference vs. by-value

In order to arrive at a consensus on those points, questions like the
following need to be asked.

Do your use cases include identity assertions being validated by
verifiers who have no previous association with the identity
provider? If so, this argues for using asymmetric keys rather than
symmetric keys, since symmetric keys assume some pre-arranged key
exchange between the identity provider and the verifier.

Is the privacy of the recipients of a message with respect to the
identity provider, when a message is forwarded to unanticipated
destinations, important? At a high level, if you believe so, this
argues for supplying keys in messages by-value, rather than
by-reference. Alternatively, if the by-reference key store is the
DNS, one could argue that requests for keys are likely to be lost in
the general mass of queries targeting the DNS server (though this may
not be the case in practice, depending on how the query strings are
formulated).

Do you want recipients of a message to be able to verify messages
off-line? If so, this also argues for supplying keys by-value. If
keys are supplied by-value, it is far better to use certificates than
uncertified public keys, especially if you want domain-based
assertions.

Is it critical that an identity provider be securely associated with
a particular domain? If you say 'yes' to this, this argues for
domain-based assertions. Furthermore, depending on exactly how
critical it is, this argues for using certificates rather than any
system relying on the DNS (given the current state of DNSSEC
deployment) or a leap-of-faith system.

Is it possible to arrive at a fixed set of reference indicators for
messages in your messaging system? If so, then this argues for using
canonicalization rather than replication in assertions. If not, then
replication is probably a better bet than dynamic canonicalization.
If you can use canonicalization, then placing assertions in the envelope is preferable for most messaging systems.

Do you want the use of identity assertions to be opportunistic for endpoints? If so, then you want intermediaries to instantiate the identity provider role.

Are you willing to try to prevent active attackers as well as passive attackers? If so, then you may be willing to try to use message content as a reference indicator.

12. Security Considerations

This document is entirely concerned with the security of Internet messaging systems. It provides a survey of existing mechanisms to provide identity in Internet messaging systems in order to counter the seminal threat of impersonation. Since it treats messaging in the abstract, rather than discussing any particular protocol, it makes no specific recommendation for advancing any particular approach for the problem. It does, however, show how some architectural decisions, at a high level, are likely to be more successful than others. It also suggests a way to divide-and-conquer decision-making about identity enhancements for applicable messaging systems.

13. IANA Considerations

This document contains no considerations for the IANA.

14. Informative References

[1] Postel, J., "Simple Mail Transfer Protocol", RFC 821, STD 10, August 1982.

[2] Crocker, D., "Standard for the format of ARPA Internet text messages", RFC 822, August 1982.

[3] Oikarinen, J. and D. Reed, "Internet Relay Chat Protocol", RFC 1459, May 1993.

[4] Klensin, J., "Simple Mail Transfer Protocol", RFC 2821, April 2001.

[5] Resnick, P., "Internet Message Format", RFC 2822, April 2001.

[6] Mockapetris, P., "Domain names - concepts and facilities", RFC 1034, STD 13, November 1987.
[7] Linn, J., "Privacy Enhancement for Internet Electronic Mail: Part I: Message Encryption and Authentication Procedures", RFC 1421, February 1993.

[8] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.

[9] Levinson, E., "Content-ID and Message-ID Uniform Resource Locators", RFC 2111, March 1997.

[10] Troost, R., Dorner, S. and K. Moore, "Communicating Presentation Information in Internet Messages: The Content-Disposition Header Field", RFC 2183, August 1997.

[11] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M. and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

[12] Housley, R., Polk, W., Ford, W. and D. Solo, "Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile", RFC 3280, April 2002.

[13] St. Andre, P., "Extensible Messaging and Presence Protocol: Instant Messaging and Presence", RFC 3921, October 2004.

[14] Peterson, J., "Session Initiation Protocol (SIP) Authenticated Identity Body (AIB) Format", RFC 3893, September 2004.

[15] Watson, M., "Short Term Requirements for Network Asserted Identity", RFC 3324, November 2002.

[16] Jennings, C., Peterson, J. and M. Watson, "Private Extensions to the Session Initiation Protocol (SIP) for Asserted Identity within Trusted Networks", RFC 3325, November 2002.

[17] Ramsdell, B., "Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.1: Message Specification", RFC 3851, July 2004.

[18] Elkins, M., Del Toro, D., Levien, R. and T. Roesler, "MIME Security with OpenPGP", RFC 3156, August 2001.

[19] Sparks, R., "The Session Initiation Protocol (SIP) REFER Method", RFC 3515, April 2003.
[20] Sparks, R., "The Session Initiation Protocol (SIP) Referred-by Mechanism", RFC 3892, September 2004.

[21] Crocker, D., "Internet Mail Architecture", draft-crocker-mail-arch-01 (work in progress), July 2004.

[22] Klyne, G. and J. Palme, "Registration of mail and MIME header fields", draft-klyne-hdrreg-mail-05 (work in progress), May 2004.

[23] Arends, R., Austein, R., Larson, M., Massey, D. and S. Rose, "Protocol Modifications for the DNS Security Extensions", draft-ietf-dnsext-dnssec-protocol-09 (work in progress), October 2004.

[24] Josefsson, S., "Domain Name System Uniform Resource Locators", draft-josefsson-dns-url-10 (work in progress), September 2004.

[25] Peterson, J. and C. Jennings, "Enhancements for Authenticated Identity Management in the Session Initiation Protocol (SIP)", draft-ietf-sip-identity-03 (work in progress), September 2004.

[26] Bradner, S., "Key words for use in RFCs to indicate requirement levels", RFC 2119, March 1997.

[27] Handley, M., Schulzrinne, H., Schooler, E. and J. Rosenberg, "SIP: Session Initiation Protocol", RFC 2543, March 1999.

Author's Address

Jon Peterson
NeuStar, Inc.
1800 Sutter St
Suite 570
Concord, CA 94520
US

Phone: +1 925/363-8720
EMail: jon.peterson@neustar.biz
URI: http://www.neustar.biz/

Appendix A. Acknowledgments

The author drew considerable inspiration for this document from the longstanding discussion of identity on the SIP mailing list. The IAB Workshop on Messaging in October of 2004 was also a valuable influence.

Appendix B. Verification Assertions

A verification assertion is a piece of information added to a message by an intermediary-based verifier which asserts that an identity assertion in the message was verified.
These assertions are most useful in architectures in which a recipient cannot be expected to instantiate the verifier role itself. However, it is also possible that verification assertions could be inspected by intermediaries between the verifier and the recipient.

Verification assertions may be cryptographic, but typically they are not. Usually, the recipient has some specific trust relationship with the verifier, which may include the use of some other form of security (for example, network or transport layer security) which guarantees that the verification assertion was created by the trusted verifier.

A verifier may strip any identity assertion from a message before adding a verification assertion, or it may leave the assertion in the message. The latter option is preferable, insofar as it is forwards-compatible with recipients instantiating the verifier role.

While verification assertions are probably important for some architectures, they are not strictly necessary to implement an identity service. In fact, by rendering the identity architecture less end-to-end, verification assertions may weaken the overall security of the architecture.

Appendix C. Messaging: Real-Time versus Store-and-Forward

In most respects, the high-level messaging architectures discussed in this document share common security properties regardless of whether they are real-time or store-and-forward. However, there are a few important respects in which the two differ:

Delay from Computation: The instantiation of the verifier and identity provider roles by the system (more or less irrespective of where they are located) will incur some delay corresponding to the complexity of the cryptosystems they employ.
While this delay is not likely to be noticeable in store-and-forward messaging systems, it may be perceptible (and undesirable) in real-time communications systems.

Offline Handling: Store-and-forward systems allow users to read their messages offline. Accordingly, if the recipient acts as a verifier, the verifier might not be online when it reads the message. This has important implications for any sort of keys or assertions that are carried by-reference (and for dial-back identity schemes).

Delivery Receipts: Real-time messaging protocols tend to provide real-time acknowledgements of message delivery by default. These acknowledgements in turn have important identity properties. While the same is true of various optional delivery acknowledgement mechanisms that can be used in store-and-forward systems, in real-time systems the responses returned to a message can invoke all sorts of behavior on the originator side, including resubmission of the request to alternate destinations and so on. Any sort of response identity is outside the scope of this document, and believed to be separable from the message identity work described in this document.

By-Reference Subversion: In order to subvert a request to acquire keys by-reference from a key store, it really helps if the attacker knows when the verifier will initiate the request. In real-time messaging architectures, this is relatively clear - it will be soon after the message has been sent. In store-and-forward architectures, since the verifier might not validate the message for hours or days or weeks, it can be very difficult for the attacker to make this determination.
Note that even in a store-and-forward architecture, if an intermediary acts as a verifier, this distinction becomes less acute - there is a comparatively smaller time-window in which an intermediary is likely to verify an assertion, and accordingly, it may be easier to subvert a request for a key when an intermediary is the target.

Creation Time as a Reference Indicator: In real-time messaging systems, the creation time of a message is a very strong reference indicator, since delivery of messages is expected to be very quick. Accordingly, passive attackers have only a small interval of time to mount a replay attack using an assertion with a creation time reference indicator. In store-and-forward architectures, the delivery window is extremely large, so creation time is a less valuable reference indicator (though not entirely useless).

Appendix D. Third-Party Assertions

Many messaging architectures assign important roles to third parties. To take a familiar example, email has the concept of a mailing list which sends messages on behalf of an originator. For the purposes of this document, a third-party assertion is differentiated from an ordinary identity assertion as follows: a third-party assertion is provided by an identity provider that is not authoritative for the namespace containing the name of the originator of the message (following the general constraints of Section 7.1).

Depending on the sorts of authorization decisions that a verifier might want to perform, the identity of the originator may be secondary, or even totally irrelevant, when a third-party is involved. A particular recipient might wish to accept any email message from a particular mailing list, for example, without regard to the identity of a particular originator.
Other practical examples include chat-rooms of instant messaging systems, and systems in which one endpoint can instruct another endpoint to send a message (such as the SIP REFER [19] method).

Clearly, the manner in which a third-party asserts something about a message is orthogonal to the broader question of how to identify the originator of a message. However, it is certainly possible that third-parties may want to add additional cryptographic information to a message in order to allow particular authorization decisions to be made available to recipients. The formulation of third-party assertions seems to be a problem that is entirely separable from the identification of the originator, and is thus out of scope of this document. Future work could identify a means of providing third-party assertions that was entirely supplemental to the identity work in this document.

An example of a third-party assertion is the Referred-by [20] token associated with the SIP REFER method.

Appendix E. Alternatives to Identity Assertions

Identity assertions are not the only means of increasing a recipient's surety of the identity of an originator of a request.

E.1 Trusted Intermediary Networks

It is important to note that identity assertions are primarily motivated by the interdomain nature of messaging. Within a single administrative domain, both the originator and the recipient of any message must trust the same domain in order for messaging to function at all. Accordingly, they can assume (perhaps without good justification) that the domain would not connect them if it had not properly authenticated them both.

Given this, some messaging architectures try to extend the boundaries of an administrative domain in order to treat interdomain messaging as an intradomain problem.
In contrast to cryptographic assertions, these identity systems rely on particular deployment architectures to guarantee the security properties of the assertion. The only assertion that is actually carried in the message is a separate envelope element that provides an 'authoritative' return address.

For example, consider the 'trust domain' concept defined in Section 2.3 of RFC3324. In this messaging architecture, a trusted network is a set of intermediaries that exchange messages with one another over a closed network (a network either logically or physically inaccessible from the Internet, over which intermediaries pass messages to one another).

Assuming such a trusted network, one can design a very simple identity assertion. For example, in an email network, one could introduce a new 'Trusted-From' header field whose contents could only be set by intermediaries in the trusted network. The identity information conveyed by such a system is the contents of this trusted header. Recipients treat this trusted header as the assured identity of the originator. An example of this sort of trusted assertion is RFC3325 [16], which defines the P-Asserted-Identity header field for SIP.

The traditional Internet Relay Chat (IRC [3]) service relied on a similar concept of trusted intermediaries. Intermediaries formed a meshed trust network over which messages passed, and each server was responsible for authenticating its users.

While this model has enjoyed considerable success in closed networks such as the telephone network, it has a number of limitations which render it incompatible with widespread Internet deployment of a messaging architecture.
Forming closed overlay networks of providers that agree on network or transport-layer security standards and practices does not agree with the general model of Internet messaging, in which domains may exchange messages without any previous association.

Other, more sophisticated forms of transitive trust are ad-hoc. For example, a message could contain an explicit indication that any intermediary that relays the message needs to use some form of transport or network-layer security when sending to the next hop. Assuming a proper keying architecture, intermediaries can mutually authenticate one another from the originating domain to the domain of the recipient. The SIPS URI scheme in RFC3261 has this property. The main drawback to such mechanisms is that it is impossible for any intermediary or recipient to verify that appropriate lower-layer security was used over any particular transit hop. This is, in fact, the main problem with trusted networks in general - a given domain must trust that the remainder of the domains in the network behave properly.

E.2 Dial-back Identity

A dial-back identity system for messaging works as follows: when a verifier receives a message, it inspects the name that identifies the originator (such as the RFC2822.From header for email), and then launches a dial-back request to that name. This dial-back request must contain reference indicators for the request, either by-value or possibly as a hash of a canonicalization of the reference indicators. In another variant, the message itself contains such a hash which is verifiable by the recipient (essentially, an unsigned identity assertion), and the recipient then sends that hash in the backwards direction to the identity provider.
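
The hashed form of the dial-back request can be illustrated with a minimal sketch. It is not drawn from any deployed protocol: the choice of reference indicators (From, Date, Message-ID), the use of SHA-256, the canonical form, and the names canonicalize, dialback_token, IdentityProvider and verify are all assumptions made for illustration. The sketch further assumes an identity provider that keeps state for each message it handled.

```python
import hashlib
from typing import Dict, Optional


def canonicalize(indicators: Dict[str, str]) -> bytes:
    """Hypothetical canonical form: lower-case names, collapsed whitespace,
    fields sorted by name, one "name:value" pair per line."""
    pairs = sorted(
        (name.lower(), " ".join(value.split()))
        for name, value in indicators.items()
    )
    return "\n".join(f"{name}:{value}" for name, value in pairs).encode("utf-8")


def dialback_token(indicators: Dict[str, str]) -> str:
    """Hash of the canonicalized reference indicators (SHA-256 is assumed)."""
    return hashlib.sha256(canonicalize(indicators)).hexdigest()


class IdentityProvider:
    """Toy identity provider that remembers every message it handled."""

    def __init__(self) -> None:
        self._handled = set()

    def record_sent(self, indicators: Dict[str, str]) -> None:
        # Called when the originator submits a message through the provider.
        self._handled.add(dialback_token(indicators))

    def handle_dialback(self, token: str) -> bool:
        # Positive response only if the described message is recognized.
        return token in self._handled


def verify(indicators: Dict[str, str],
           provider: Optional[IdentityProvider]) -> bool:
    """Verifier side: send the token in the backwards direction.
    An unreachable provider (None here) counts as a negative response."""
    if provider is None:
        return False
    return provider.handle_dialback(dialback_token(indicators))
```

Because the verifier sends only the hash, the identity provider can match it against stored state without receiving the reference indicators by-value. Both sides must agree on the canonical form, otherwise a legitimate message produces a negative response.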
Assuming the name of the originator is valid, an identity provider responsible for the namespace of the originator's name will receive the request. If this identity provider is the originator, it can reply to the request with a positive response if it did indeed send the message in question. If the identity provider is some intermediary, it would need some way to ascertain that the originator sent that message; possibly, the originator sent the message through the identity provider, and the identity provider keeps state for every message it handled. However the intermediary-based identity provider learns of the validity of a request, it returns a positive response if the request was in fact sent from the originator in question. If the identity provider does not recognize the described message, it sends a negative response. No response (because the domain of the originator's name doesn't exist, or exists but has no identity provider) is assumed to be a negative response.

Depending on the semantics of the request, it may be somewhat intensive for the identity provider to make a determination of whether or not the request was actually sent by the originator. If a message is forwarded to numerous recipients, obviously this per-message work becomes larger, and for cases like large email mailing lists, it may become unmanageable. The use of unsigned hashes in the message moves this work to a phase before the message is sent, rather than after the dial-back request is received.

In some respects, dial-back has similar properties to the DNS-based mechanisms of key distribution discussed in Section 6.1.2. Since these systems rely on a request being sent in the backwards direction using the name of the originator, they necessarily rely on the validity of the DNS to reach that name.
However, unlike the DNS-based uncertified keying mechanisms, dial-back requires no special modifications to the DNS.

Dial-back identity systems have enjoyed some success in real-time messaging systems, but clearly their applicability to store-and-forward systems is limited, especially when the identity provider role is instantiated by originators.

All in all, within their domain of applicability, dial-back identity systems improve security with little expenditure of design effort. They are not considered further in this document because they are not predicated on identity assertions as such.

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.
Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2004). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.