idnits 2.17.1 draft-zimmermann-avt-zrtp-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 18. -- Found old boilerplate from RFC 3978, Section 5.5 on line 2228. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 2239. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 2246. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 2252. ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 4 instances of too long lines in the document, the longest one being 5 characters in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). 
-- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (October 22, 2006) is 6389 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- -- Looks like a reference, but probably isn't: 'SHA-256' on line 1137 == Unused Reference: '5' is defined on line 2130, but no explicit reference was found in the text == Outdated reference: A later version (-01) exists of draft-mcgrew-srtp-big-aes-00 ** Obsolete normative reference: RFC 3309 (ref. '6') (Obsoleted by RFC 4960) ** Obsolete normative reference: RFC 4566 (ref. '11') (Obsoleted by RFC 8866) == Outdated reference: A later version (-02) exists of draft-wing-rtpsec-keying-eval-01 -- Obsolete informational reference (is this intentional?): RFC 4474 (ref. '23') (Obsoleted by RFC 8224) Summary: 6 errors (**), 0 flaws (~~), 5 warnings (==), 9 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 AVT WG P. Zimmermann 3 Internet-Draft Zfone Project 4 Intended status: Informational A. Johnston, Ed. 5 Expires: April 25, 2007 Avaya 6 J. 
Callas 7 PGP Corporation 8 October 22, 2006 10 ZRTP: Extensions to RTP for Diffie-Hellman Key Agreement for SRTP 11 draft-zimmermann-avt-zrtp-02 13 Status of this Memo 15 By submitting this Internet-Draft, each author represents that any 16 applicable patent or other IPR claims of which he or she is aware 17 have been or will be disclosed, and any of which he or she becomes 18 aware will be disclosed, in accordance with Section 6 of BCP 79. 20 Internet-Drafts are working documents of the Internet Engineering 21 Task Force (IETF), its areas, and its working groups. Note that 22 other groups may also distribute working documents as Internet- 23 Drafts. 25 Internet-Drafts are draft documents valid for a maximum of six months 26 and may be updated, replaced, or obsoleted by other documents at any 27 time. It is inappropriate to use Internet-Drafts as reference 28 material or to cite them other than as "work in progress." 30 The list of current Internet-Drafts can be accessed at 31 http://www.ietf.org/ietf/1id-abstracts.txt. 33 The list of Internet-Draft Shadow Directories can be accessed at 34 http://www.ietf.org/shadow.html. 36 This Internet-Draft will expire on April 25, 2007. 38 Copyright Notice 40 Copyright (C) The Internet Society (2006). 42 Abstract 44 This document defines ZRTP, RTP (Real-time Transport Protocol) header 45 extensions for a Diffie-Hellman exchange to agree on a session key 46 and parameters for establishing Secure RTP (SRTP) sessions. The ZRTP 47 protocol is completely self-contained in RTP and does not require 48 support in the signaling protocol or assume a Public Key 49 Infrastructure (PKI). For the media session, ZRTP 50 provides confidentiality, protection against Man in the Middle (MitM) 51 attacks, and, in cases where a secret is available from the signaling 52 protocol, authentication. 
ZRTP can utilize three Session Description 53 Protocol (SDP) attributes to provide discovery and authentication 54 through the signaling channel. 56 Table of Contents 58 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 59 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 8 60 3. ZRTP and RTP Keying Requirements . . . . . . . . . . . . . . . 8 61 4. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 62 4.1. Key Agreement Modes . . . . . . . . . . . . . . . . . . . 9 63 4.1.1. Diffie-Hellman Mode . . . . . . . . . . . . . . . . . 10 64 4.1.2. Multistream Mode . . . . . . . . . . . . . . . . . . . 11 65 5. Protocol Description . . . . . . . . . . . . . . . . . . . . . 12 66 5.1. Key Agreement and Derivation Algorithm . . . . . . . . . . 12 67 5.1.1. Discovery . . . . . . . . . . . . . . . . . . . . . . 12 68 5.1.2. Hash Commitment . . . . . . . . . . . . . . . . . . . 13 69 5.1.3. Diffie-Hellman Exchange . . . . . . . . . . . . . . . 14 70 5.1.4. Confirmation and Switch to SRTP . . . . . . . . . . . 18 71 5.2. Multistream Mode . . . . . . . . . . . . . . . . . . . . . 20 72 5.3. Random Number Generation . . . . . . . . . . . . . . . . . 21 73 5.4. CRC Protection of Messages . . . . . . . . . . . . . . . . 22 74 5.5. ZID and Cache Operation . . . . . . . . . . . . . . . . . 22 75 5.6. Terminating an SRTP Session or ZRTP Exchange . . . . . . . 23 76 6. RTP Header Extension . . . . . . . . . . . . . . . . . . . . . 24 77 6.1. ZRTP Message Formats . . . . . . . . . . . . . . . . . . . 24 78 6.1.1. Message Type Block . . . . . . . . . . . . . . . . . . 25 79 6.1.2. Hash Type Block . . . . . . . . . . . . . . . . . . . 26 80 6.1.3. Cipher Type Block . . . . . . . . . . . . . . . . . . 26 81 6.1.4. Auth Tag Length Block . . . . . . . . . . . . . . . . 26 82 6.1.5. Key Agreement Type Block . . . . . . . . . . . . . . . 27 83 6.1.6. SAS Type Block . . . . . . . . . . . . . . . . . . . . 27 84 6.2. Hello message . . . . . . . . . 
. . . . . . . . . . . . . 28 85 6.3. HelloACK message . . . . . . . . . . . . . . . . . . . . . 29 86 6.4. Commit message . . . . . . . . . . . . . . . . . . . . . . 30 87 6.5. DHPart1 message . . . . . . . . . . . . . . . . . . . . . 31 88 6.6. DHPart2 message . . . . . . . . . . . . . . . . . . . . . 32 89 6.7. Confirm1 message . . . . . . . . . . . . . . . . . . . . . 33 90 6.8. Confirm2 message . . . . . . . . . . . . . . . . . . . . . 35 91 6.9. Conf2ACK message . . . . . . . . . . . . . . . . . . . . . 36 92 6.10. GoClear message . . . . . . . . . . . . . . . . . . . . . 37 93 6.11. ClearACK message . . . . . . . . . . . . . . . . . . . . . 37 94 7. Retransmissions . . . . . . . . . . . . . . . . . . . . . . . 38 95 8. Short Authentication String . . . . . . . . . . . . . . . . . 39 96 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 41 97 10. Security Considerations . . . . . . . . . . . . . . . . . . . 42 98 11. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 43 99 12. Appendix A - ZRTP, SIP, and SDP . . . . . . . . . . . . . . . 43 100 13. Appendix B - The ZRTP Disclosure flag . . . . . . . . . . . . 46 101 14. Appendix C - Intermediary ZRTP Devices . . . . . . . . . . . . 48 102 15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 48 103 15.1. Normative References . . . . . . . . . . . . . . . . . . . 48 104 15.2. Informative References . . . . . . . . . . . . . . . . . . 49 105 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 50 106 Intellectual Property and Copyright Statements . . . . . . . . . . 52 108 1. Introduction 110 ZRTP is a key agreement protocol which performs Diffie-Hellman key 111 exchange during call setup in-band in the Real-time Transport 112 Protocol (RTP) [2] media stream which has been established using a 113 signaling protocol such as Session Initiation Protocol (SIP) [17]. 
114 This generates a shared secret which is then used to generate keys 115 and salt for a Secure RTP (SRTP) [3] session. ZRTP borrows ideas 116 from PGPfone [13]. A reference implementation of ZRTP is available 117 as Zfone [14]. 119 The ZRTP protocol has some nice cryptographic features lacking in 120 many other approaches to media session encryption. Although it uses 121 a public key algorithm, it does not rely on a public key 122 infrastructure (PKI). In fact, it does not use persistent public 123 keys at all. It uses ephemeral Diffie-Hellman (DH) with hash 124 commitment, and allows the detection of Man in the Middle (MitM) 125 attacks by displaying a short authentication string for the users to 126 read and compare over the phone. It has perfect forward secrecy, 127 meaning the keys are destroyed at the end of the call, which 128 precludes retroactively compromising the call by future disclosures 129 of key material. But even if the users are too lazy to bother with 130 short authentication strings, we still get fairly decent 131 authentication against a MitM attack, based on a form of key 132 continuity. It does this by caching some key material to use in the 133 next call, to be mixed in with the next call's DH shared secret, 134 giving it key continuity properties analogous to SSH. All this is 135 done without reliance on a PKI, key certification, trust models, 136 certificate authorities, or key management complexity that bedevils 137 the email encryption world. It also does not rely on SIP signaling 138 for the key management, and in fact does not rely on any servers at 139 all. It performs its key agreements and key management in a purely 140 peer-to-peer manner over the RTP packet stream. 142 Most secure phones rely on a Diffie-Hellman exchange to agree on a 143 common session key. But since DH is susceptible to a man-in-the- 144 middle (MitM) attack, it is common practice to provide a way to 145 authenticate the DH exchange. 
In some military systems, this is done 146 by depending on digital signatures backed by a centrally-managed PKI. 147 A decade of industry experience has shown that deploying centrally 148 managed PKIs can be a painful and often futile experience. PKIs are 149 just too messy, and require too much activation energy to get them 150 started. Setting up a PKI requires somebody to run it, which is not 151 practical for an equipment provider. A service provider like a 152 carrier might venture down this path, but even then you have to deal 153 with cross-carrier authentication, certificate revocation lists, and 154 other complexities. It is much simpler to avoid PKIs altogether, 155 especially when developing secure commercial products. It is 156 therefore more common for commercial secure phones in the PSTN world 157 to augment the DH exchange with a Short Authentication String (SAS) 158 combined with a hash commitment at the start of the key exchange, to 159 shorten the length of SAS material that must be read aloud. No PKI 160 is required for this approach to authenticating the DH exchange. The 161 AT&T 3600, Eric Blossom's COMSEC secure phones [15], PGPfone [13], 162 and CryptoPhone [16] are all examples of products that took this 163 simpler lightweight approach. 165 The main problem with this approach is inattentive users who may not 166 execute the voice authentication procedure, or unattended secure 167 phone calls to answering machines that cannot execute it. 168 Additionally, some people worry about voice spoofing (the "Rich 169 Little" attack), and some worry about trying to use it between people 170 who don't know each other's voices. This is not as much of a problem 171 as it seems, because it isn't necessary that they recognize each 172 other by their voice, it's only necessary that they detect that the 173 voice used for the SAS procedure matches the voice in the rest of the 174 phone call. 
These concerns are not enough reason to embrace PKIs as 175 an alternative, in my opinion. 177 A popular and field-proven approach is used by SSH (Secure Shell) 178 [18], which Peter Gutmann likes to call the "baby duck" security 179 model. SSH establishes a relationship by exchanging public keys in 180 the initial session, when we assume no attacker is present, and this 181 makes it possible to authenticate all subsequent sessions. A 182 successful MitM attacker has to have been present in all sessions all 183 the way back to the first one, which is assumed to be difficult for 184 the attacker. All this is accomplished without resorting to a 185 centrally-managed PKI. 187 We use an analogous baby duck security model to authenticate the DH 188 exchange in ZRTP. We don't need to exchange persistent public keys, 189 we can simply cache a shared secret and re-use it to authenticate a 190 long series of DH exchanges for secure phone calls over a long period 191 of time. If we read aloud just one SAS, and then cache a shared 192 secret for later calls to use for authentication, no new voice 193 authentication rituals need to be executed. We just have to remember 194 we did one already. 196 If we ever lose this cached shared secret, it is no longer available 197 for authentication of DH exchanges, so we would have to do a new SAS 198 procedure and start over with a new cached shared secret. Then we 199 could go back to omitting the voice authentication on later calls. 201 A particularly compelling reason why this approach is attractive is 202 that SAS is easiest to implement when a GUI or some sort of display 203 is available, which raises the question of what to do when no display 204 is available. We envision some products that implement secure VoIP 205 via a local network proxy, which lacks a display in many cases. If 206 we take an approach that greatly reduces the need for a SAS in each 207 and every call, we can operate in GUI-less products with greater 208 ease. 
210 It's a good idea to force your opponent to have to solve multiple 211 problems in order to mount a successful attack. Some examples of 212 widely differing problems we might like to present him with are: 213 stealing a shared secret from one of the parties, being present on 214 the very first session and every subsequent session to carry out an 215 active MitM attack, and solving the discrete log problem. We want to 216 force the opponent to solve more than one of these problems to 217 succeed. 219 The protocol can make use of different kinds of shared secrets. Each 220 type of shared secret is determined by a different method. All of 221 the shared secrets are hashed together to form a session key to 222 encrypt the call. An attacker must defeat all of the methods in 223 order to determine the session key. 225 First, there is the shared secret determined entirely by a Diffie- 226 Hellman key agreement. It changes with every call, based on random 227 numbers. An attacker may attempt a classic DH MitM attack on this 228 secret, but we can protect against this by displaying and reading 229 aloud a SAS, combined with adding a hash commitment at the beginning 230 of the DH exchange. 232 Second, there is an evolving shared secret, or ongoing shared secret 233 that is automatically changed and refreshed and cached with every new 234 session. We will call this the cached shared secret, or sometimes 235 the retained shared secret. Each new image of this ongoing secret is 236 a non-invertible function of its previous value and the new secret 237 derived by the new DH agreement. It's possible that no cached shared 238 secret is available, because there were no previous sessions to 239 inherit this value from, or because one side loses its cache. 241 There are other approaches for key agreement for SRTP that compute a 242 shared secret using information in the signaling. 
For example, [20] 243 describes how to carry a MIKEY (Multimedia Internet KEYing) [21] 244 payload in SDP [11]. Alternatively, [19] describes directly carrying SRTP keying 245 and configuration information in SDP. ZRTP does not rely on the 246 signaling to compute a shared secret, but if a client does produce a 247 shared secret via the signaling, and makes it available to the ZRTP 248 protocol, ZRTP can make use of this shared secret to augment the list 249 of shared secrets that will be hashed together to form a session key. 250 This way, any security weaknesses that might compromise the shared 251 secret contributed by the signaling will not harm the final resulting 252 session key. 254 There may also be a static shared secret that the two parties agree 255 on out-of-band in advance. A hashed passphrase would suffice. 257 The shared secret provided by the signaling (if available), the 258 shared secret computed by DH, and the cached shared secret are all 259 hashed together to compute the session key for a call. If the cached 260 shared secret is not available, it is omitted from the hash 261 computation. If the signaling provides no shared secret, it is also 262 omitted from the hash computation. 264 No DH MitM attack can succeed if the ongoing shared secret is 265 available to the two parties, but not to the attacker. This is 266 because the attacker cannot compute a common session key with either 267 party without knowing the cached secret component, even if he 268 correctly executes a classic DH MitM attack. Mixing in the cached 269 shared secret for the session key calculation allows it to act as an 270 implicit authenticator to protect the DH exchange, without requiring 271 additional explicit HMACs to be computed on the DH parameters. If 272 the cached shared secret is available, a MitM attack would be 273 instantly detected by the failure to achieve a shared session key, 274 resulting in undecryptable packets. The protocol can easily detect 275 this. 
It would be more accurate to say that the MitM attack is not 276 merely detected, but thwarted. 278 When adding the complexity of additional shared secrets beyond the 279 familiar DH key agreement, we must make sure the lack of availability 280 of the cached shared secret cannot prevent a call from going through, 281 and we must also prevent false alarms that claim an attack was 282 detected. 284 An added benefit of using these cached shared secrets to mix in with 285 the session keys is that it augments the entropy of the session key. 286 Even if limits on the size of the DH exchange produce a session key 287 with less than 256 bits of real work factor, the added entropy from 288 the cached shared secret can bring up all the subsequent session keys 289 to the full 256-bit AES key strength, assuming no attacker was 290 present in the first call. 292 We could have authenticated the DH exchange the same way SSH does it, 293 with digital signatures, caching public keys instead of shared 294 secrets. But this approach with caching shared secrets seemed a bit 295 simpler, and has the added benefit of adding more entropy to the 296 session keys. 298 The following sections provide an overview of the ZRTP protocol and 299 describe the key agreement algorithm and RTP header extensions. 301 2. Terminology 303 In this document, the key words "MUST", "MUST NOT", "REQUIRED", 304 "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", 305 and "OPTIONAL" are to be interpreted as described in RFC 2119 and 306 indicate requirement levels for compliant implementations [1]. 308 3. ZRTP and RTP Keying Requirements 310 This section discusses how ZRTP meets the RTP keying requirements 311 discussed in [12]. The section numbers referenced are those in this 312 document. 314 Due to the in-band key management approach, ZRTP meets the following 315 requirements: 4.1 Secure Retargeting and Secure Forking and 4.2 316 Clipping Media Before SDP Answer. 
318 Due to the built-in in-band discovery mechanisms, ZRTP meets the 5.3 319 Best Effort Encryption requirement. 321 The use of Diffie-Hellman ensures that ZRTP meets the 5.2 Perfect 322 Forward Secrecy requirement. 324 Since the supported SRTP algorithms are not exchanged in the 325 signaling but in the media path, there is no computational penalty in 326 allowing additional supported algorithms as described in 5.4 327 Upgrading Algorithms. 329 ZRTP does not require the SSRC or ROC be signaled per requirement 4.4 330 SSRC and ROC. 332 ZRTP does not currently use certificates for authentication so it 333 does not meet requirement 5.1 Public Key Infrastructure. However, 334 ZRTP could be extended to utilize a certificate to perform a digital 335 signature over the Diffie-Hellman values exchanged. 337 ZRTP does not support 4.3 Centralized Keying due to its point-to- 338 point design. 340 4. Overview 342 This section provides a description of how ZRTP works. This 343 description is non-normative in nature but is included to build 344 understanding of the protocol. 346 ZRTP is negotiated the same way a conventional RTP session is 347 negotiated in an offer/answer exchange using the AVP/RTP profile. 348 The ZRTP protocol begins after two endpoints have utilized a 349 signaling protocol such as SIP and are ready to send or have already 350 begun sending RTP packets. This specification defines a new RTP 351 extension header which is used to carry the ZRTP messages between the 352 endpoints. Since RTP endpoints ignore unknown extension headers, the 353 protocol is fully backwards compatible - a ZRTP endpoint attempting 354 to perform key agreement with a non-ZRTP endpoint will simply receive 355 normal RTP responses and can then inform the user that a secure 356 session is not possible and either continue with the insecure session 357 or terminate the session depending on the user's security policy. 
359 The ZRTP exchange begins at the same time that the first RTP packets 360 are exchanged between the endpoints. A ZRTP message is transported 361 in an RTP no-op packet. 363 A ZRTP endpoint initiates the exchange by sending a ZRTP Hello 364 message to the other endpoint. The purpose of the Hello message is 365 to discover if the other endpoint supports the protocol and to see 366 what algorithms the two ZRTP endpoints have in common. This 367 discovery can also be achieved if an a=zrtp attribute is present in an 368 SDP offer or answer, as described in Appendix A. 370 The Hello message contains the SRTP configuration options, and the 371 ZID. Each instance of ZRTP has a unique 96-bit random ZRTP ID or ZID 372 that is generated once at installation time. ZIDs are discovered 373 during the Hello message exchange. The received ZID is used to look 374 up, in a local cache, the shared secrets retained 375 from previous ZRTP 376 sessions with that endpoint. 378 A response to a ZRTP Hello message is a ZRTP HelloACK message. The 379 HelloACK message simply acknowledges receipt of the Hello message and 380 indicates support for the ZRTP protocol. Since RTP uses best effort 381 UDP transport, ZRTP has retransmission timers in case of lost 382 datagrams. There are two timers, both with exponential backoff 383 mechanisms. One timer is used for retransmissions of Hello messages 384 and the other is used for retransmissions of all other messages after 385 receipt of a HelloACK which indicates support of ZRTP by the other 386 endpoint. 388 4.1. Key Agreement Modes 390 After both endpoints exchange Hello and HelloACK messages, the key 391 agreement exchange can begin with the ZRTP Commit message. ZRTP 392 supports a number of key agreement modes including both Diffie- 393 Hellman and non-Diffie-Hellman as described in the following 394 sections. 396 4.1.1. 
Diffie-Hellman Mode 398 An example ZRTP call flow is shown in Figure 1 below. Note that the 399 order of the Hello/HelloACK exchanges in F1/F2 and F3/F4 may be 400 reversed. That is, either Alice or Bob might send the first Hello 401 message. Also, an endpoint that receives a Hello message and wishes 402 to immediately begin the ZRTP key agreement can omit the HelloACK and 403 send the Commit instead. In Figure 1, this would result in messages 404 F2, F3, and F4 being omitted. Note that the endpoint which sends the 405 Commit message is considered the initiator of the ZRTP session and 406 drives the key agreement exchange. The Diffie-Hellman public values 407 are exchanged in the DHPart1 and DHPart2 messages. SRTP keys and 408 salts are then calculated along with a ZRTP Session key. 410 Alice Bob 411 | | 412 | Alice and Bob establish a media session.| 413 | | 414 | RTP | 415 |<=======================================>| 416 | | 417 | Hello (version, options, Alice's ZID) F1| 418 |---------------------------------------->| 419 | HelloACK F2 | 420 |<----------------------------------------| 421 | Hello (version, options, Bob's ZID) F3 | 422 |<----------------------------------------| 423 | HelloACK F4 | 424 |---------------------------------------->| 425 | | 426 | Bob acts as the initiator | 427 | | 428 | Commit (Bob's ZID, options, hvi or nonce) F5 429 |<----------------------------------------| 430 | DHPart1 (pvr, shared secret hashes) F6 | 431 |---------------------------------------->| 432 | DHPart2 (pvi, shared secret hashes) F7 | 433 |<----------------------------------------| 434 | | 435 | Alice and Bob generate SRTP session key.| 436 | | 437 | SRTP begins | 438 |<=======================================>| 439 | | 440 | Confirm1 (plaintext, D,S,V flags, hmac) F8 441 |---------------------------------------->| 442 | Confirm2 (plaintext, D,S,V flags, hmac) F9 443 |<----------------------------------------| 444 | Conf2ACK F10 | 445 
|---------------------------------------->| 447 Figure 1. Establishment of an SRTP session using ZRTP 449 4.1.2. Multistream Mode 451 Multistream mode is an alternative key agreement method when two 452 endpoints have an established SRTP media stream between them and hence 453 an active ZRTP Session key. ZRTP can derive multiple SRTP keys from 454 a single DH exchange. For example, an established secure voice call 455 that adds a video stream could use Multistream mode to quickly 456 initiate the video stream without a second DH exchange. 458 When Multistream mode is indicated in the Commit message, a call flow 459 similar to Figure 1 is used, but no DH calculation is performed by 460 either endpoint and the DHPart1 and DHPart2 messages are omitted. In 461 this mode, multiple non-DH ZRTP exchanges can be performed in 462 parallel between two endpoints. 464 Alternatively, each stream can be handled independently using the 465 call flow of Figure 1, resulting in a DH exchange per media stream. 466 To keep the integrity of the retained shared secrets, only a single 467 DH exchange can be processed at a time between two endpoints. 469 5. Protocol Description 471 ZRTP uses RTP [2] to transport discovery and key agreement messages. 472 The messages are carried as RTP header extensions as defined in 473 Section 6. It is RECOMMENDED to use the no-op RTP/AVP payload type 474 [7]. No-op packets are ideal for ZRTP transport as it is permissible 475 to send no-op packets even for media streams marked 'recvonly' or 476 'inactive'. Also, no-op packets can be used with any media type. An 477 endpoint MAY use a different SSRC for ZRTP messages than for RTP 478 media. 480 Note: the use of separate SSRC numbers and hence separate sequence 481 number space allows for very loose coupling between the ZRTP 482 application and the RTP media application. 484 To support best effort encryption [12], ZRTP uses normal RTP/AVP 485 profile (AVP) media lines in the initial offer/answer exchange. 
The 486 ZRTP SDP attribute flag a=zrtp defined in Appendix A SHOULD be used 487 in all offers and answers to indicate support for the ZRTP protocol. 488 In subsequent offer/answer exchanges after a successful ZRTP exchange 489 has resulted in an SRTP session, the Secure RTP/AVP (SAVP) profile 490 MAY be used. 492 5.1. Key Agreement and Derivation Algorithm 494 The key agreement algorithm has four phases that are described 495 normatively in the following sections. 497 5.1.1. Discovery 499 During the discovery phase, a ZRTP endpoint discovers if the other 500 endpoint supports ZRTP and which ZRTP version, hash, cipher, auth tag 501 length, key agreement type, and SAS algorithms are supported. In 502 addition, each endpoint sends and discovers ZIDs. The received ZID 503 is used to retrieve previous retained shared secrets, rs1 and rs2. 504 If the endpoint has other secrets, then they are also collected. The 505 signaling secret (sigs), is passed from the signaling protocol used 506 to establish the RTP session. For SIP, it is the dialog identifier 507 of a Secure SIP (sips) session: a string composed of Call-ID, to tag, 508 and from tag. From the definitions in RFC 3261 [17]: 510 sigs = hash(call-id | tag1 | tag2) 512 Note: the dialog identifier of a non-secure SIP session should not be 513 considered a signaling secret as it has no confidentiality 514 protection. 516 For the SRTP secret (srtps), it is the SRTP master key and salt. 517 This information may have been passed in the signaling using MIKEY or 518 SDP Security Descriptions, for example: 520 srtps = hash(SRTP master key | SRTP master salt) 522 Additional shared secrets can be defined and used as other_secret. 523 If no secret of a given type is available, a random value is 524 generated and used for that secret to ensure a mismatch in the hash 525 comparisons in the DHPart1 and DHPart2 messages. This prevents an 526 eavesdropper from knowing how many shared secrets are available 527 between the endpoints. 
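The discovery-phase secrets described above can be sketched in Python as follows. This is a minimal illustration, not the draft's reference code: it assumes SHA-256 as the negotiated hash, treats "|" as plain byte concatenation, and assumes 32-byte random substitutes. The function names and the byte encoding of the SIP identifiers are hypothetical.

```python
import hashlib
import secrets


def signaling_secret(call_id: bytes, tag1: bytes, tag2: bytes) -> bytes:
    """sigs = hash(call-id | tag1 | tag2); '|' is assumed to be concatenation."""
    return hashlib.sha256(call_id + tag1 + tag2).digest()


def srtp_secret(master_key: bytes, master_salt: bytes) -> bytes:
    """srtps = hash(SRTP master key | SRTP master salt)."""
    return hashlib.sha256(master_key + master_salt).digest()


def secret_or_random(secret, length=32):
    """Return the secret, or a fresh random value when none is available.

    The random stand-in guarantees a mismatch in the later hash comparison,
    so an eavesdropper cannot tell how many secrets are really shared.
    The 32-byte length is an assumption of this sketch.
    """
    return secret if secret is not None else secrets.token_bytes(length)
```

For example, an endpoint with no cached secret for the peer would use secret_or_random(None) in place of rs1, while sigs would only be computed for a Secure SIP (sips) dialog.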
529 A Hello message can be sent at any time, but is usually sent at the 530 start of an RTP session to determine if the other endpoint supports 531 ZRTP, and also if the SRTP implementations are compatible. A Hello 532 message is retransmitted using timer T1 and an exponential backoff 533 mechanism detailed in Section 7 until the receipt of a HelloACK 534 message or a Commit message. 536 5.1.2. Hash Commitment 538 The hash commitment is performed by the initiator of the ZRTP 539 exchange. From the intersection of the algorithms in the sent and 540 received Hello messages, the initiator chooses a hash, cipher, auth 541 tag length, key agreement type, and sas algorithm to be used. 543 A Diffie-Hellman mode is selected by setting the Key Agreement Type 544 to DH4096 or DH3072 in the Commit. In this mode, the key agreement 545 begins with the initiator choosing a fresh random Diffie-Hellman (DH) 546 secret value (svi) based on the chosen key agreement type value, and 547 computing the public value. (Note that to speed up processing, this 548 computation can be done in advance.) For guidance on generating 549 random numbers, see the section on Random Number Generation. The 550 Diffie-Hellman secret value, svi, SHOULD be twice as long as the AES 551 key length. This means, if AES 128 is used, the DH secret value 552 SHOULD be 256 bits long. If AES 256 is used, the secret value SHOULD 553 be 512 bits long. 555 pvi = g^svi mod p 557 where g and p are determined by the key agreement type value, and a 558 hash, hvi, of the public value using the chosen hash algorithm. The 559 hvi includes the set of hash, cipher, atl, pkt, and sas types from 560 the responder's Hello in the following order: 562 hvi=hash(pvi | hashr1-5 | cipherr1-5 | atl1-5 | pktr1-5 | sasr1-5) 564 The information from the responder's Hello message is included in the 565 hash calculation to prevent a bid-down attack by modification of the 566 responder's Hello message. 
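The hash commitment step can be sketched in Python as shown below. The sketch makes several assumptions that are not taken from this document: the group (p = 2^521 - 1, g = 2) is a toy stand-in for the real DH3072/DH4096 parameters, SHA-256 stands in for the negotiated hash, the public value is encoded as a fixed-width big-endian integer, and the responder's Hello fields are passed as opaque byte strings. All names are hypothetical.

```python
import hashlib
import secrets

# Toy group for illustration only; NOT one of the ZRTP key agreement groups.
P = 2**521 - 1
G = 2
P_BYTES = (P.bit_length() + 7) // 8  # width used to encode public values


def dh_keypair(aes_key_bits=128):
    """Pick a secret value twice as long as the AES key and compute g^sv mod p."""
    secret_bits = 2 * aes_key_bits
    sv = 0
    while sv < 2:  # reject the degenerate exponents 0 and 1
        sv = secrets.randbits(secret_bits)
    return sv, pow(G, sv, P)


def hash_commitment(pvi, responder_hello_fields):
    """hvi = hash(pvi | hash types | cipher types | atl | pkt | sas types).

    responder_hello_fields must list the values from the responder's Hello
    in the order given in the text; the encoding here is an assumption.
    """
    data = pvi.to_bytes(P_BYTES, "big") + b"".join(responder_hello_fields)
    return hashlib.sha256(data).digest()
```

Because the responder's Hello fields are part of the hashed input, changing any of them in transit changes hvi, which is what lets the commitment expose a bid-down attempt.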
Note: If both sides send Commit messages initiating a secure session at the same time, the Commit message with the lowest hvi value is discarded and the other side is the initiator. This breaks the tie, allowing the protocol to proceed from this point with a clear definition of who is the initiator and who is the responder.

Because the DH exchange affects the state of the retained shared secret cache, only one in-process ZRTP DH exchange may occur at a time between two ZRTP endpoints. Otherwise, race conditions and cache integrity problems will result. When multiple media streams are established in parallel between the same pair of ZRTP endpoints (determined by the ZIDs in the Hello messages), only one can be processed. Once that exchange completes with Confirm2 and Conf2ACK messages, another ZRTP DH exchange can begin. In the event that Commit messages are sent by both ZRTP endpoints at the same time, but are received in different media streams, the same resolution rules apply: the Commit message with the lowest hvi value is discarded and the other side is the initiator. The media stream in which the retained Commit was sent will proceed through the ZRTP exchange while the media stream with the discarded Commit must wait for the completion of the other ZRTP exchange.

Note: This paragraph does not apply when Multistream mode key agreement is used since the cached shared secrets are not affected.

5.1.3. Diffie-Hellman Exchange

The purpose of the Diffie-Hellman exchange is for the two ZRTP endpoints to generate a new shared secret, s0. In addition, the endpoints discover if they have any shared secrets in common. If they do, this exchange allows them to discover how many and agree on an ordering for them: s1, s2, etc.

5.1.3.1. Responder Behavior

Upon receipt of the Commit message, the responder generates its own fresh random DH secret value, svr, and computes the public value. (Note that to speed up processing, this computation can be done in advance.) For guidance on random number generation, see the section on Random Number Generation. The Diffie-Hellman secret value, svr, SHOULD be twice as long as the AES key length. This means, if AES 128 is used, the DH secret value SHOULD be 256 bits long. If AES 256 is used, the secret value SHOULD be 512 bits long.

   pvr = g^svr mod p

The final shared secret, s0, is calculated by hashing the concatenation of the Diffie-Hellman shared secret (DHSS) followed by the (possibly empty) set of shared secrets that are actually shared between the initiator and responder. For computing the hash, the shared secrets are sorted by the order of the initiator's corresponding shared secret IDs. The remainder of this section describes an algorithm to accomplish this.

First, an HMAC keyed hash is calculated using the first retained shared secret, rs1, as the key on the string "Responder" which generates a retained secret ID, rs1IDr, which is truncated to 64 bits. HMACs are calculated in a similar way for additional shared secrets:

   rs1IDr = HMAC(rs1, "Responder")

   rs2IDr = HMAC(rs2, "Responder")

   sigsIDr = HMAC(sigs, "Responder")

   srtpsIDr = HMAC(srtps, "Responder")

   other_secretIDr = HMAC(other_secret, "Responder")

A ZRTP DHPart1 message is generated containing pvr and the set of keyed hashes (HMACs) derived from the possibly shared secrets.

Upon receipt of the DHPart2 message, the responder checks that the initiator's public DH value is not equal to 1 or p-1. An attacker might inject a false DHPart2 packet with a value of 1 or p-1 for g^svi mod p, which would cause a disastrously weak final DH result to be computed.
If pvi is 1 or p-1, the user should be alerted of the attack and the protocol exchange must be terminated. Otherwise, the responder computes the hash of the public DH value received in the DHPart2 message and compares it with the hash (hvi) from the Commit. If they are different, a MitM attack is taking place and the user is alerted and the protocol exchange terminated.

The responder then calculates the Diffie-Hellman result:

   DHResult = pvi^svr mod p

The responder then calculates the Diffie-Hellman shared secret:

   DHSS = hash(DHResult)

The HMACs of the possible shared secrets received are compared against the HMACs of the local set of possible shared secrets.

Note: When comparing the signaling secret sigs derived from SIP, both orderings of to-tag followed by from-tag, and from-tag followed by to-tag must be tried.

The expected HMAC values of the shared secrets are calculated (using the string "Initiator" instead of "Responder") and compared to the HMACs received in the DHPart2 message. The secrets corresponding to matching HMACs are kept while the secrets corresponding to the non-matching ones are replaced with a null. The set of up to five actual shared secrets is then s1, s2, s3, s4, and s5, in the order chosen by the initiator. The final shared secret, s0, is calculated by hashing the concatenation of the DHSS and the set of non-null shared secrets. As a result, the null secrets have no effect on the concatenation operation:

   s0 = hash(DHSS | s1 | s2 | s3 | s4 | s5)

For example, consider two ZRTP endpoints that share secrets rs1, rs2, and a hash of a secret passphrase other_secret. During the comparison, rs1ID, rs2ID, and other_secretID will match but sigsID and srtpsID will not. As a result, s1 = rs1, s2 = rs2, s5 = other_secret, while s3 and s4 will be nulls.
s0 for this exchange will be calculated as the hash of the concatenation of DHSS, rs1, rs2, and other_secret.

5.1.3.2. Initiator Behavior

Upon receipt of the DHPart1 message, the initiator checks that the responder's public DH value is not equal to 1 or p-1. An attacker might inject a false DHPart1 packet with a value of 1 or p-1 for g^svr mod p, which would cause a disastrously weak final DH result to be computed. If pvr is 1 or p-1, the user should be alerted of the attack and the protocol exchange must be terminated.

If pvr is not 1 or p-1, the initiator looks up any retained shared secrets associated with the responder's ZID. The final shared secret, s0, is calculated by hashing the concatenation of the DHSS followed by the (possibly empty) set of shared secrets that are actually shared between the initiator and responder. For computing the hash, the shared secrets are sorted by the order of the initiator's corresponding shared secret IDs. The remainder of this section describes an algorithm to accomplish this.

First, an HMAC keyed hash is calculated using the first retained shared secret, rs1, as the key on the string "Initiator" which generates a retained secret ID, rs1IDi, which is truncated to 64 bits. HMACs are calculated in a similar way for additional shared secrets:

   rs1IDi = HMAC(rs1, "Initiator")

   rs2IDi = HMAC(rs2, "Initiator")

   sigsIDi = HMAC(sigs, "Initiator")

   srtpsIDi = HMAC(srtps, "Initiator")

   other_secretIDi = HMAC(other_secret, "Initiator")

The initiator then sends a DHPart2 message containing the initiator's public DH value and the set of calculated retained secret IDs.
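The public value check and the secret ID calculation above can be sketched as follows. This is an illustration under stated assumptions: HMAC-SHA256 is used as the HMAC, and truncation to 64 bits keeps the leftmost 8 octets; the text above does not say which octets are kept. The function names are hypothetical.

```python
import hashlib
import hmac


def is_valid_public_value(pv, p):
    """Reject the degenerate DH public values 1 and p-1."""
    return pv not in (1, p - 1)


def secret_id(secret, role):
    # HMAC keyed with the secret over the role string ("Initiator" or
    # "Responder"), truncated to 64 bits (assumed: leftmost 8 octets).
    return hmac.new(secret, role, hashlib.sha256).digest()[:8]


def initiator_secret_ids(secrets_by_name):
    """Build the secret IDs carried in DHPart2, e.g. rs1 -> rs1IDi."""
    return {name + "IDi": secret_id(value, b"Initiator")
            for name, value in secrets_by_name.items()}


# Hypothetical secret values, for illustration only.
ids = initiator_secret_ids({"rs1": b"\x01" * 32, "rs2": b"\x02" * 32})
```

Using a different role string for each direction means the two sides never send the same ID for the same secret, so an observer cannot tell from the wire which IDs matched.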
The initiator calculates the same Diffie-Hellman result using:

   DHResult = pvr^svi mod p

The initiator then calculates the DH shared secret using:

   DHSS = hash(DHResult)

The initiator then calculates the set of secret IDs that are expected to be received from the responder in the DHPart1 message:

   rs1IDr = HMAC(rs1, "Responder")

   rs2IDr = HMAC(rs2, "Responder")

   sigsIDr = HMAC(sigs, "Responder")

   srtpsIDr = HMAC(srtps, "Responder")

   other_secretIDr = HMAC(other_secret, "Responder")

The HMACs of the possible shared secrets received are compared against the HMACs of the local set of possible shared secrets.

Note: When comparing the signaling secret sigs derived from SIP, both orderings of to-tag followed by from-tag, and from-tag followed by to-tag must be tried.

The expected HMAC values of the shared secrets are calculated (using the string "Responder" instead of "Initiator") and compared to the HMACs received in the DHPart1 message. The secrets corresponding to matching HMACs are kept while the secrets corresponding to the non-matching ones are replaced with a null. The set of up to five actual shared secrets is then s1, s2, s3, s4, and s5, in the order chosen by the initiator. The final shared secret, s0, is calculated by hashing the concatenation of the DHSS and the set of non-null shared secrets. As a result, the null secrets have no effect on the concatenation operation:

   s0 = hash(DHSS | s1 | s2 | s3 | s4 | s5)

5.1.4. Confirmation and Switch to SRTP

The SRTP master key and master salt are then generated using the shared secret. Separate SRTP keys and salts are used in each direction for each media stream. Unless otherwise specified, ZRTP uses SRTP with no MKI, 32 bit authentication using HMAC-SHA1, AES-CM 128 or 256 bit key length, 112 bit session salt key length, 2^48 key derivation rate, and SRTP prefix length 0.
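The s0 calculation from Section 5.1.3, which produces the shared secret used here, can be sketched as follows. The sketch assumes SHA-256 and HMAC-SHA256, a fixed-width big-endian encoding of DHResult, and leftmost-octet truncation of the secret IDs; these encoding choices and all names are illustrative assumptions. Null secrets are modeled as None.

```python
import hashlib
import hmac


def secret_id(secret, role):
    """64-bit secret ID (assumed: leftmost 8 octets of HMAC-SHA256)."""
    return hmac.new(secret, role, hashlib.sha256).digest()[:8]


def match_secrets(local_secrets, received_ids, peer_role):
    """Keep each local secret whose expected ID matches; else use None."""
    kept = []
    for secret, received in zip(local_secrets, received_ids):
        expected = secret_id(secret, peer_role)
        kept.append(secret if hmac.compare_digest(expected, received) else None)
    return kept


def compute_s0(dh_result, p_len, shared):
    # DHSS = hash(DHResult); s0 = hash(DHSS | s1 | ... | s5), where null
    # (None) secrets contribute nothing to the concatenation.
    dhss = hashlib.sha256(dh_result.to_bytes(p_len, "big")).digest()
    present = b"".join(s for s in shared if s is not None)
    return hashlib.sha256(dhss + present).digest()


# The example from Section 5.1.3.1: rs1, rs2 and other_secret are shared,
# sigs and srtps are not. Order is rs1, rs2, sigs, srtps, other_secret.
rs1, rs2, sigs, srtps, other = (bytes([i]) * 32 for i in range(1, 6))
peer = [rs1, rs2, b"\xaa" * 32, b"\xbb" * 32, other]
received = [secret_id(s, b"Responder") for s in peer]
shared = match_secrets([rs1, rs2, sigs, srtps, other], received, b"Responder")
s0 = compute_s0(12345, 384, shared)  # 12345 stands in for a real DHResult
```

Keeping the list positions fixed and replacing mismatches with None preserves the initiator's ordering, so both sides concatenate the surviving secrets in the same sequence.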
The ZRTP initiator encrypts and the ZRTP responder decrypts packets by using srtpkeyi and srtpsalti, which are generated by:

   srtpkeyi = HMAC(s0,"Initiator SRTP master key")

   srtpsalti = HMAC(s0,"Initiator SRTP master salt")

The key and salt values are truncated to the length determined by the chosen SRTP algorithm. The ZRTP responder encrypts and the ZRTP initiator decrypts packets by using srtpkeyr and srtpsaltr, which are generated by:

   srtpkeyr = HMAC(s0,"Responder SRTP master key")

   srtpsaltr = HMAC(s0,"Responder SRTP master salt")

A ZRTP Session Key is generated which then allows the ZRTP Multistream mode to be used to generate SRTP key and salt pairs for additional concurrent media streams between this pair of ZRTP endpoints. If a ZRTP Session Key has already been generated between this pair of endpoints, no new ZRTP Session Key is calculated.

   ZRTPsess = HMAC(s0,"ZRTP Session Key")

The ZRTPsess key is kept for the duration of the call signaling session between the two ZRTP endpoints. That is, if there are two separate calls between the endpoints (in SIP terms, separate SIP dialogs), then a ZRTP Session Key MUST NOT be used across the two call signaling sessions. At the end of the call signaling session, ZRTPsess is destroyed.

The HMAC keys are generated by:

   hmackeyi = HMAC(s0,"Initiator HMAC key")

   hmackeyr = HMAC(s0,"Responder HMAC key")

Note that these HMAC keys are used only by ZRTP and not by SRTP. A new rs1 is calculated from s0:

   rs1 = HMAC(s0,"retained secret")

The endpoints can now switch to SRTP and begin packet encryption. The ZRTP Initiator and Responder use their own keying material for the SRTP session. No MKI is used and a 32 bit authentication tag is used.

The ZRTP Confirm1 and Confirm2 messages are sent for two reasons.
First, they confirm that all the key agreement calculations were successful and the encryption is working, and they enable automatic detection of a DH MitM attack from a reckless attacker who does not know the retained shared secret. Second, they enable us to transmit the SAS Verified flag (V) under cover of SRTP encryption, shielding it from a passive observer who would like to know if the human users are in the habit of diligently verifying the SAS.

The Confirm1 and Confirm2 messages contain the cache expiration interval for the newly generated retained shared secret. Based on this, both sides now discard the rs2 value and store rs1 as rs2. The Confirm1 and Confirm2 messages also contain an HMAC of some known plaintext and the flagoctet. The flagoctet is an 8 bit unsigned integer made up of the Disclosure flag (D), Stay secure flag (S), SAS Verified flag (V):

   flagoctet = D * 2^2 + S * 2^1 + V * 2^0

The HMAC is explicitly included in the payload because we may not always be able to rely on the built-in authentication tag in SRTP, which might be configured to different sizes, including none.

   hmac = HMAC(hmackey, "known plaintext" | flagoctet)

This information is not carried in the extension header but inserted at the start of the SRTP payload.

The Conf2ACK message completes the exchange.

5.2. Multistream Mode

The Multistream key agreement mode can be used to generate SRTP keys and salts for additional media streams established between a pair of endpoints. Multistream mode cannot be used unless there is an active SRTP session established between the endpoints which means a ZRTP Session key is active. This ZRTP Session key can be used to generate keys and salts without performing another DH calculation. In this mode, the retained shared secret cache is not used or updated.
As a result, multiple ZRTP Multistream mode exchanges can be processed in parallel between two endpoints.

This mode is selected by setting the Key Agreement Type to "Multistr" in the Commit message. The Cipher Type and Auth Tag Length in Multistream mode MUST be the same as the values in the initial DH Mode Commit and MUST be ignored if different, making bid-down impossible. The SAS Type is ignored as there is no SAS authentication in this mode. In place of hvi in the Commit, a random number, nonce, 32 octets long is chosen. Its value MUST be unique for all nonce values chosen for active ZRTP sessions between a pair of endpoints. If a Commit is received with a reused nonce value, the ZRTP exchange MUST be immediately terminated.

Note: Since the nonce is used to calculate different SRTP key and salt pairs for each media stream, a duplication will result in the same key and salt being generated for the two media streams.

If a Commit is received selecting Multistream mode, but the responder does not have a ZRTP Session Key available, the exchange MUST be terminated.

In Multistream mode, neither the DHPart1 nor the DHPart2 message is sent. After the Commit, SRTP begins and the responder sends the Confirm1 message. The SRTP key and salt for the initiator and responder are calculated using the ZRTP Session Key and the nonce from the Commit message. For the nth media stream:

   s0n = HMAC(ZRTPsess, nonce)

The ZRTP initiator encrypts and the ZRTP responder decrypts packets for this nth stream by using srtpkeyin and srtpsaltin, which are generated by:

   srtpkeyin = HMAC(s0n,"Initiator SRTP master key")

   srtpsaltin = HMAC(s0n,"Initiator SRTP master salt")

The key and salt values are truncated to the length determined by the chosen SRTP algorithm.
The ZRTP responder encrypts and the ZRTP initiator decrypts packets for this nth stream by using srtpkeyrn and srtpsaltrn, which are generated by:

   srtpkeyrn = HMAC(s0n,"Responder SRTP master key")

   srtpsaltrn = HMAC(s0n,"Responder SRTP master salt")

The HMAC keys are generated by:

   hmackeyin = HMAC(s0n,"Initiator HMAC key")

   hmackeyrn = HMAC(s0n,"Responder HMAC key")

5.3. Random Number Generation

The ZRTP protocol uses random numbers for cryptographic key material, notably for the DH secret exponents and nonces, which must be freshly generated with each session. Whenever a random number is needed, all of the following criteria must be satisfied:

It MUST be derived from a physical entropy source, such as RF noise, acoustic noise, thermal noise, high resolution timings of environmental events, or other unpredictable physical sources of entropy. Chapter 10 of [8] gives a detailed explanation of cryptographic grade random numbers and provides guidance for collecting suitable entropy. The raw entropy must be distilled and processed through a deterministic random bit generator (DRBG). Examples of DRBGs may be found in NIST SP 800-90 [9], and in [8].

It MUST be freshly generated, meaning that it must not have been used in a previous calculation.

It MUST be greater than or equal to two, and less than or equal to 2^L - 1, where L is the number of random bits required.

It MUST be chosen with equal probability from the entire available number space, e.g., [2, 2^L - 1].

5.4. CRC Protection of Messages

The ZRTP protocol uses a 32 bit CRC checksum in each ZRTP message as defined in RFC 3309 [6] to detect transmission errors. ZRTP packets are carried by UDP, which carries its own built-in 16-bit checksum for integrity, but ZRTP does not rely on it. This is because of the effect of an undetected transmission error in a ZRTP message.
For example, an undetected error in the DH exchange could appear to be an active man-in-the-middle attack. The psychological effects of a false announcement of this by ZRTP clients cannot be overstated. The probability of such a false alarm hinges on a mere 16-bit checksum that usually protects UDP packets, so more error detection is needed. For these reasons, this belt-and-suspenders approach is used to minimize the chance of a transmission error affecting the ZRTP key agreement.

The CRC is calculated across the ZRTP message only, including the RTP Header extension (0x505A) and length field, followed by the ZRTP message itself, but not including the CRC field. The CRC does not include the normal RTP header (V, P, X, CC, M, PT, sequence number, timestamp, SSRC, CSRC) or payload. In the Confirm1 and Confirm2 messages, the CRC does not include the fields transported in the payload (plaintext, flags, hmac). If a ZRTP message fails the CRC check, it is silently discarded.

5.5. ZID and Cache Operation

Each instance of ZRTP has a unique 96-bit random ZRTP ID or ZID that is generated once at installation time. It is used to look up retained shared secrets in a local cache. A single global ZID for a single installation is the simplest way to implement ZIDs. However, it is specifically not precluded for an implementation to use multiple ZIDs, up to the limit of a separate one per callee. This then turns it into a long-lived "association ID" that does not apply to any other associations between a different pair of parties. It is a goal of this protocol to permit both options to interoperate freely.

Each time a new s0 is calculated, a new retained shared secret rs1 is generated and stored in the cache, indexed by the ZID of the other endpoint. The previous retained shared secret is then renamed rs2 and also stored in the cache.
For the new retained shared secret, each endpoint chooses a cache expiration value which is an unsigned 32 bit integer of the number of seconds that this secret should be retained in the cache. The time interval is relative to when the Confirm1 message is sent or received.

Note: The storage of two retained shared secrets ensures that even when a Commit is sent close to the expiration time of a retained shared secret, there is a high probability of the endpoints having at least one retained shared secret. The exception to this is if both retained shared secrets have identical or near identical expiration times.

The cache intervals are exchanged in the Confirm1 and Confirm2 messages. The actual cache interval used by both endpoints is the minimum of the values from the Confirm1 and Confirm2 messages. A value of 0 seconds means the secret should not be cached and the current values of rs1 and rs2 MUST be maintained. A value of 0xFFFFFFFF means the secret should be cached indefinitely and is the recommended value. If the ZRTP exchange results in no new shared secret generation (i.e. Multistream Mode), the field in the Confirm1 and Confirm2 is set to 0xFFFFFFFF and ignored.

Retained shared secret expiration times are checked at the time of their inclusion in a DHPart1 or DHPart2 message. Expired values are not included and are dropped from the cache.

5.6. Terminating an SRTP Session or ZRTP Exchange

The GoClear message is used to switch from SRTP to RTP or to terminate an in-progress ZRTP exchange. The GoClear message contains a reason string for human purposes and a clear_hmac field.
When used to switch from SRTP to RTP, ZRTP avoids relying on the optional SRTP authentication tag by using an HMAC of the string "GoClear" computed with the hmackey derived from the shared secret:

   clear_hmac = HMAC(hmackey, "GoClear")

A GoClear message which does not receive a ClearACK response indicates that the GoClear has failed authentication (the clear_hmac does not validate) and that the session must stay in secure mode.

When terminating an in-progress ZRTP exchange, no secret hmackey is available, so the clear_hmac field is set to all zeros and ignored. The reason string SHOULD indicate the reason for the failure (e.g. "No Session Key", "Nonce Reuse", "Invalid DH Value"). The termination of a ZRTP key agreement exchange results in no updates to the cached shared secrets and deletion of all crypto context.

A ZRTP endpoint that receives a GoClear authenticates the message by checking the clear_hmac. If the message authenticates, the endpoint stops sending SRTP packets, generates a ClearACK in response, and deletes the crypto context for the SRTP session. Until confirmation from the user is received (e.g. clicking a button, pressing a DTMF key, etc.), the ZRTP endpoint MUST NOT resume sending RTP packets. The endpoint then renders the reason string and an indication that the media session has switched to clear mode to the user and waits for confirmation from the user. To prevent pinholes from closing or NAT bindings from expiring, the ClearACK message MAY be resent at regular intervals (e.g. every 5 seconds) while waiting for confirmation from the user. After confirmation of the notification is received from the user, the sending of RTP packets may begin.

After sending a GoClear message, the ZRTP endpoint stops sending SRTP packets.
When a ClearACK is received, the ZRTP endpoint deletes the crypto context for the SRTP session and may then resume sending RTP packets. However, the ZRTP Session key is not deleted unless the signaling session is terminated as well.

A ZRTP endpoint MAY choose not to accept GoClear messages after the session has switched to SRTP. This is indicated in the Confirm1 or Confirm2 messages by setting the Stay secure flag (S).

6. RTP Header Extension

This specification defines a new RTP header extension used for all ZRTP messages. When used, the X bit is set in the RTP header to indicate the presence of the RTP header extension.

Section 5.3.1 in RFC 3550 defines the format of an RTP Header extension. The Header extension is appended to the RTP header. The first 16 bits are an identifier for the header extension, and the following 16 bits are the length of the extension header in 32 bit words. All word lengths referenced in this specification follow RFC 3550 and are 32 bits or 4 octets. All integer fields are carried in network byte order, that is, most significant byte (octet) first, commonly known as big-endian. Each ZRTP message is carried in a single RTP header extension which has the value of 0x505A.

6.1. ZRTP Message Formats

ZRTP messages are designed to simplify endpoint parsing requirements and to reduce the opportunities for buffer overflow attacks (a good goal of any security extension should be to not introduce new attack vectors).

ZRTP uses 8 octets (2 words) to encode many ZRTP parameters. These fixed-length blocks are used for Message Type, Hash Type, Cipher Type, and Key Agreement Type. For the Authentication Tag Length, 4 octets are used. The values in the blocks are ASCII strings which are extended with spaces (0x20) to make them 8 characters long.

Currently defined block values are listed in Tables 1-6 below.
Additional block values may be defined and used.

ZRTP uses this ASCII encoding to simplify debugging and make it "ethereal friendly".

6.1.1. Message Type Block

Currently ten Message Type Blocks are defined - they represent the set of ZRTP message primitives. ZRTP endpoints MUST support the Hello, HelloACK, Commit, DHPart1, DHPart2, Confirm1, Confirm2, Conf2ACK, GoClear and ClearACK block types.

   Message Type Block | Meaning
   ---------------------------------------------------
   "Hello   "         | Hello Message
                      | defined in Section 6.2
   ---------------------------------------------------
   "HelloACK"         | HelloACK Message
                      | defined in Section 6.3
   ---------------------------------------------------
   "Commit  "         | Commit Message
                      | defined in Section 6.4
   ---------------------------------------------------
   "DHPart1 "         | DHPart1 Message
                      | defined in Section 6.5
   ---------------------------------------------------
   "DHPart2 "         | DHPart2 Message
                      | defined in Section 6.6
   ---------------------------------------------------
   "Confirm1"         | Confirm1 Message
                      | defined in Section 6.7
   ---------------------------------------------------
   "Confirm2"         | Confirm2 Message
                      | defined in Section 6.8
   ---------------------------------------------------
   "Conf2ACK"         | Conf2ACK Message
                      | defined in Section 6.9
   ---------------------------------------------------
   "GoClear "         | GoClear Message
                      | defined in Section 6.10
   ---------------------------------------------------
   "ClearACK"         | ClearACK Message
                      | defined in Section 6.11
   ---------------------------------------------------

   Table 1. Message Block Type Values

6.1.2. Hash Type Block

Only one Hash Type is currently defined, SHA256, and all ZRTP endpoints MUST support this hash. Additional Hash Types can be registered and used.
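The fixed-length block encoding from Section 6.1 applies to the Hash Type Block as well as the other block types. A minimal sketch of encoding and decoding such blocks follows; the helper names are hypothetical, and rejecting over-long values is an assumption about reasonable error handling rather than something the text requires.

```python
def encode_block(value, width=8):
    """Encode an ASCII value as a fixed-width block padded with 0x20."""
    raw = value.encode("ascii")
    if len(raw) > width:
        raise ValueError("value does not fit in a %d-octet block" % width)
    return raw.ljust(width, b" ")


def decode_block(block):
    """Recover the value by stripping the trailing space padding."""
    return block.decode("ascii").rstrip(" ")


hash_block = encode_block("SHA256")   # 8-octet block
atl_block = encode_block("32", 4)     # Auth Tag Length uses 4 octets
```

Because every block has a known width, a parser can slice a message at fixed offsets without scanning for delimiters, which is the parsing simplification Section 6.1 aims for.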
   Hash Type Block | Meaning
   ---------------------------------------------------
   "SHA256  "      | SHA-256 Hash defined in [SHA-256]
   ---------------------------------------------------

   Table 2. Hash Block Type Values

6.1.3. Cipher Type Block

All ZRTP endpoints MUST support AES128 and MAY support AES256 [4] or other Cipher Types. Also, if AES 128 is used, DH3072 should be used. If AES 256 is used, DH4096 should be used.

   Cipher Type Block | Meaning
   ---------------------------------------------------
   "AES128  "        | AES-CM with 128 bit keys
                     | as defined in RFC 3711
   ---------------------------------------------------
   "AES256  "        | AES-CM with 256 bit keys
                     | as defined in RFC 3711
   ---------------------------------------------------

   Table 3. Cipher Block Type Values

6.1.4. Auth Tag Length Block

The Auth Tag Length Block is 4 octets (1 word) long. All ZRTP endpoints MUST support 32 bit and 80 bit authentication tags as defined in RFC 3711.

   Auth Tag Length Block | Meaning
   ---------------------------------------------------
   "32  "                | 32 bit authentication tag
                         | as defined in RFC 3711
   ---------------------------------------------------
   "80  "                | 80 bit authentication tag
                         | as defined in RFC 3711
   ---------------------------------------------------

   Table 4. Auth Tag Length Values

6.1.5. Key Agreement Type Block

All ZRTP endpoints MUST support DH3072 and MAY support DH4096. ZRTP endpoints MUST use the DH generator function g=2. The choice of AES key length is coupled to the choice of key agreement type. If AES 128 is chosen, DH3072 SHOULD be used. If AES 256 is chosen, DH4096 SHOULD be used. ZRTP also defines a non-DH mode, Multistream, which MUST be supported. In Multistream mode, the SRTP key is derived from a ZRTP Session key and a nonce.
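The coupling between cipher and key agreement type can be captured in a small lookup, sketched below. The table and function names are hypothetical; the values come from the SHOULD-level recommendations above and from the rule in Section 5.1.2 that the DH secret is twice as long as the AES key.

```python
# Recommended pairings: cipher -> (AES key bits, key agreement type).
RECOMMENDED = {
    "AES128": (128, "DH3072"),
    "AES256": (256, "DH4096"),
}

DH_GENERATOR = 2  # g = 2 for both DH modes


def recommended_parameters(cipher):
    """Return the recommended DH settings for a chosen cipher."""
    key_bits, key_agreement = RECOMMENDED[cipher]
    return {
        "key_agreement": key_agreement,
        "dh_secret_bits": 2 * key_bits,  # secret value twice the AES key length
        "generator": DH_GENERATOR,
    }


params = recommended_parameters("AES256")
```

These are recommendations rather than hard requirements, so an implementation might log or warn on other combinations instead of rejecting them.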
   Key Agreement Type Block | Meaning
   ---------------------------------------------------
   "DH3072  "               | DH mode with p=3072 bit prime
                            | as defined in RFC 3526
   ---------------------------------------------------
   "DH4096  "               | DH mode with p=4096 bit prime
                            | as defined in RFC 3526
   ---------------------------------------------------
   "Multistr"               | Multistream Non-DH mode
                            | uses ZRTP Session key
   ---------------------------------------------------

   Table 5. Key Agreement Block Type Values

6.1.6. SAS Type Block

All ZRTP endpoints SHOULD support the base32 and base256 Short Authentication String schemes or other SAS schemes. The optional ZRTP SAS is described in Section 7.

   SAS Type Block | Meaning
   ---------------------------------------------------
   "base32  "     | Short Authentication String using
                  | base32 encoding defined in Section 8.
   ---------------------------------------------------
   "base256 "     | Short Authentication String using
                  | base 256 encoding defined in Section 8.
   ---------------------------------------------------

   Table 6. SAS Block Type Values

6.2. Hello message

The Hello message has the format shown in Figure 2 below. The header extension payload contains the ZRTP version number and the list of algorithms supported by SRTP.

The Hello ZRTP message begins with the ZRTP header extension field followed by the 32 bit word count of the header field. Next is a word containing the version (ver) of ZRTP. For this specification, the version is the string "0.03". Next is the Client Identifier string (cid) which is 31 octets long and identifies the vendor and release of the ZRTP software. The Passive bit (P) is a Boolean normally set to False.
A ZRTP endpoint which is configured to never 1230 initiate secure sessions is regarded as passive, and would set the P 1231 bit to True. Next is a list of supported Hash Types, Cipher Types, 1232 Auth Tag length, Key Agreement Types, and SAS Type. Five possible 1233 algorithms are listed for each using the Blocks defined in Tables 2, 1234 3, 4, 5, and 6. If fewer than five algorithms are supported, spaces 1235 (0x20) are used to pad out the 10 words for each type. The last 1236 parameter is the ZID, the 96 bit long unique identifier for the ZRTP 1237 endpoint. 1239 0 1 2 3 1240 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1241 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1242 |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=60 words | 1243 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1244 | Message Type Block="Hello " (2 words) | 1245 | | 1246 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1247 | version (1 word) | 1248 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1249 | | 1250 | Client Identifier (31 octets) | 1251 | . . . | 1252 | +-+-+-+-+-+-+-+-+ 1253 | |0 0 0 0 0 0 0|P| 1254 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1255 | | 1256 | Hash Type Blocks 1-5 (10 words) | 1257 | . . . | 1258 | | 1259 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1260 | | 1261 | Cipher Type Blocks 1-5 (10 words) | 1262 | . . . | 1263 | | 1264 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1265 | | 1266 | Auth Tag Length Blocks 1-5 (5 words) | 1267 | . . . | 1268 | | 1269 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1270 | | 1271 | Key Agreement Type Blocks 1-5 (10 words) | 1272 | . . . | 1273 | | 1274 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1275 | | 1276 | SAS Type Blocks 1-5 (10 words) | 1277 | . . . 
| 1278 | | 1279 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1280 | | 1281 | ZID (3 words) | 1282 | | 1283 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1284 | CRC (1 word) | 1285 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1287 Figure 2. Extension header format for Hello message 1289 6.3. HelloACK message 1291 The HelloACK message is used to stop retransmissions of a Hello 1292 message. A HelloACK is sent regardless if the version number in the 1293 Hello is supported or the algorithm list supported. The receipt of a 1294 HelloACK stops retransmission of the Hello message. The format is 1295 shown in Figure 3 below. Note that a Commit message can be sent in 1296 place of a HelloACK by an initiator. 1298 0 1 2 3 1299 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1300 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1301 |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=3 words | 1302 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1303 | Message Type Block="HelloACK" (2 words) | 1304 | | 1305 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1306 | CRC (1 word) | 1307 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1309 Figure 3. Extension header format for HelloACK message 1311 6.4. Commit message 1313 The Commit message is sent to initiate the key agreement process 1314 after receiving a Hello message. The Commit message contains the 1315 initiator's ZID and a list of selected algorithms (hash, cipher, atl, 1316 pkt, sas), the ZRTP mode, and hvi, a hash of the public DH value of 1317 the initiator and the algorithm list from the responder's Hello 1318 message. If a non-DH mode is used, hvi is replaced by a random 1319 number, nonce. 
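The hvi and nonce fields described above can be illustrated with a minimal Python sketch. It is not a definitive implementation: the use of SHA-256, the plain concatenation order (pvi followed by the algorithm list), and the names compute_hvi and make_nonce are assumptions made for illustration. Only the 8-word (32-octet) field size comes from the Commit message format.

```python
import hashlib
import os

HVI_LEN = 32  # 8 words, the size of the "hvi or nonce" field in the Commit


def compute_hvi(pvi: bytes, hello_algorithms: bytes) -> bytes:
    """Hash the initiator's public DH value with the responder's algorithm list.

    Assumption: SHA-256 over pvi || hello_algorithms; the exact input
    encoding is not spelled out here, so treat this as a sketch.
    """
    return hashlib.sha256(pvi + hello_algorithms).digest()


def make_nonce() -> bytes:
    """Random value that takes the place of hvi when a non-DH mode is used."""
    return os.urandom(HVI_LEN)


# Illustrative inputs only: a zero-filled 384-octet pvi (the DH3072 size) and
# a space-padded 45-word block standing in for the five algorithm lists
# (10 + 10 + 5 + 10 + 10 words) taken from the responder's Hello.
pvi = bytes(384)
hello_algorithms = b" " * (45 * 4)
hvi = compute_hvi(pvi, hello_algorithms)
print(len(hvi), len(make_nonce()))
```

Because the responder only learns pvi later in DHPart2, it can recompute the same hash at that point and compare it with the hvi received in the Commit.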
1321 0 1 2 3 1322 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1323 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1324 |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=23 words | 1325 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1326 | Message Type Block="Commit " (2 words) | 1327 | | 1328 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1329 | | 1330 | ZID (3 words) | 1331 | | 1332 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1333 | Hash Type Blocks (2 words) | 1334 | | 1335 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1336 | Cipher Type Block (2 words) | 1337 | | 1338 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1339 | Auth Tag Length Block (1 word) | 1340 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1341 | Key Agreement Type Block (2 words) | 1342 | | 1343 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1344 | SAS Type Block (2 words) | 1345 | | 1346 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1347 | | 1348 | hvi or nonce (8 words) | 1349 | . . . | 1350 | | 1351 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1352 | CRC (1 word) | 1353 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1355 Figure 4. Extension header format for Commit message 1357 6.5. DHPart1 message 1359 The DHPart1 message begins the DH exchange. The format is shown in 1360 Figure 5 below. The DHPart1 message is sent if a valid Commit 1361 message is received. The length of the pvr value depends on the Key 1362 Agreement Type chosen. If DH4096 is used, the pvr will be 128 words 1363 (512 octets). If DH3072 is used, it is 96 words (384 octets). 1365 The next five parameters are HMACs of potential shared secrets used 1366 in generating the ZRTP secret. 
The first two, rs1IDr and rs2IDr, are 1367 the HMACs of the responder's two retained shared secrets, truncated 1368 to 64 bits. Next is sigsIDr, the HMAC of the responder's signaling 1369 secret, truncated to 64 bits. Next is srtpsIDr, the HMAC of the 1370 responder's SRTP secret, truncated to 64 bits. The last parameter is 1371 the HMAC of an additional shared secret. For example, if multiple 1372 SRTP secrets are available or some other secret is used, it can be 1373 used as the other_secret. 1375 0 1 2 3 1376 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1377 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1378 |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=depends on KA Type | 1379 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1380 | Message Type Block="DHPart1 " (2 words) | 1381 | | 1382 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1383 | | 1384 | pvr (length depends on KA Type) | 1385 | . . . | 1386 | | 1387 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1388 | rs1IDr (2 words) | 1389 | | 1390 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1391 | rs2IDr (2 words) | 1392 | | 1393 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1394 | sigsIDr (2 words) | 1395 | | 1396 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1397 | srtpsIDr (2 words) | 1398 | | 1399 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1400 | other_secretIDr (2 words) | 1401 | | 1402 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1403 | CRC (1 word) | 1404 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1406 Figure 5. Extension header format for DHPart1 message 1408 6.6. DHPart2 message 1410 The DHPart2 message completes the DH exchange. A DHPart2 message is 1411 sent if a valid DHPart1 message is received. The length of the pvi 1412 value depends on the Key Agreement Type chosen. 
If DH4096 is used,
1413    the pvi will be 128 words (512 octets). If DH3072 is used, it is 96
1414    words (384 octets).

1416    The next five parameters are HMACs of potential shared secrets used
1417    in generating the ZRTP secret. The first two, rs1IDi and rs2IDi, are
1418    the HMACs of the initiator's two retained shared secrets, truncated
1419    to 64 bits. Next is sigsIDi, the HMAC of the initiator's signaling
1420    secret, truncated to 64 bits. Next is srtpsIDi, the HMAC of the
1421    initiator's SRTP secret, truncated to 64 bits. The last parameter is
1422    the HMAC of an additional shared secret. For example, if multiple
1423    SRTP secrets are available or some other secret is used, it can be
1424    included.

1426     0                   1                   2                   3
1427     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1428    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1429    |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|   length=depends on KA Type   |
1430    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1431    |            Message Type Block="DHPart2 " (2 words)            |
1432    |                                                               |
1433    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1434    |                                                               |
1435    |                pvi (length depends on KA Type)                |
1436    |                             . . .                             |
1437    |                                                               |
1438    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1439    |                       rs1IDi (2 words)                        |
1440    |                                                               |
1441    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1442    |                       rs2IDi (2 words)                        |
1443    |                                                               |
1444    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1445    |                       sigsIDi (2 words)                       |
1446    |                                                               |
1447    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1448    |                      srtpsIDi (2 words)                       |
1449    |                                                               |
1450    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1451    |                   other_secretIDi (2 words)                   |
1452    |                                                               |
1453    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1454    |                         CRC (1 word)                          |
1455    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1457          Figure 6. Extension header format for DHPart2 message

1459 6.7.
Confirm1 message 1461 The Confirm1 message is sent in response to a valid DHPart2 message 1462 after the SRTP session key and parameters have been negotiated. As a 1463 result, it is always sent in an SRTP packet. The format is shown in 1464 Figure 7 below. The header extension itself has no parameters 1465 besides the Message Type Block and the CRC. The first 52 octets in 1466 the SRTP payload are used by ZRTP to securely exchange a number of 1467 parameters. The plaintext parameter contains the known plaintext 1468 "known plaintext". The Disclosure Flag (D) is a Boolean bit defined 1469 in Appendix B. The Stay secure flag (S) is a Boolean bit defined in 1470 Section 5.6. The SAS Verified flag (V) is a Boolean bit defined in 1471 Section 8. 1473 The cache expiration interval is an unsigned 32 bit integer of the 1474 number of seconds that the newly generated cached shared secret, rs1, 1475 should be stored. The hmac is a hash over the known plaintext "known 1476 plaintext" and the flagoctet. 1478 The parameters included in the SRTP payload MUST NOT be allowed to 1479 pass to the RTP stack or errors may occur with the media stream. 
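As an illustration of the 52-octet layout just described (15 octets of known plaintext, one flag octet, a one-word cache expiration interval, and an 8-word hmac), the following Python sketch separates the ZRTP parameters from the media payload so that only the remainder is handed to the RTP stack. It assumes the SRTP payload has already been decrypted and that the expiration interval is in network byte order. The names ConfirmFields and split_confirm_payload are hypothetical, and verification of the hmac itself is not shown.

```python
import struct
from typing import NamedTuple, Tuple

KNOWN_PLAINTEXT = b"known plaintext"  # 15 octets
ZRTP_PREFIX_LEN = 52                  # 15 + 1 + 4 + 32 octets


class ConfirmFields(NamedTuple):
    disclosure: bool           # D flag
    stay_secure: bool          # S flag
    sas_verified: bool         # V flag
    cache_expiry_seconds: int  # unsigned 32 bit integer
    hmac: bytes                # 8 words


def split_confirm_payload(payload: bytes) -> Tuple[ConfirmFields, bytes]:
    """Split a decrypted payload into ZRTP parameters and the media remainder."""
    if len(payload) < ZRTP_PREFIX_LEN:
        raise ValueError("payload shorter than the 52-octet ZRTP prefix")
    if payload[:15] != KNOWN_PLAINTEXT:
        raise ValueError("known plaintext mismatch")
    flags = payload[15]  # bit layout 0 0 0 0 0 D S V, V in the low-order bit
    # Assumption: the interval is carried in network (big-endian) byte order.
    (expiry,) = struct.unpack("!I", payload[16:20])
    fields = ConfirmFields(
        disclosure=bool(flags & 0x04),
        stay_secure=bool(flags & 0x02),
        sas_verified=bool(flags & 0x01),
        cache_expiry_seconds=expiry,
        hmac=payload[20:52],
    )
    # Only the bytes after the prefix may be passed on to the RTP stack.
    return fields, payload[ZRTP_PREFIX_LEN:]


# Illustrative payload: D and V set, one-hour cache expiry, zero-filled hmac.
payload = KNOWN_PLAINTEXT + bytes([0x05]) + struct.pack("!I", 3600) + bytes(32) + b"media"
fields, media = split_confirm_payload(payload)
print(fields.disclosure, fields.stay_secure, fields.sas_verified,
      fields.cache_expiry_seconds, media)
```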
1481 0 1 2 3 1482 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1483 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1484 |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=3 words | 1485 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1486 | Message Type Block="Confirm1" (2 words) | 1487 | | 1488 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1489 | CRC (1 word) | 1490 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1492 At the start of the SRTP payload: 1494 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1495 | | 1496 | | 1497 | "known plaintext" (15 octets) | 1498 | +-+-+-+-+-+-+-+-+ 1499 | |0 0 0 0 0|D|S|V| 1500 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1501 | cache expiration interval (1 word) | 1502 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1503 | | 1504 | hmac (8 words) | 1505 | . . . | 1506 | | 1507 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1509 Figure 7. Extension header format for Confirm1 message 1511 6.8. Confirm2 message 1513 The Confirm2 message is sent in response to a Confirm1 message after 1514 the SRTP session key and parameters have been negotiated. As a 1515 result, it is always sent in an SRTP packet. The format is shown in 1516 Figure 8 below. The header extension itself has no parameters 1517 besides the Message Type Block and the CRC. The first 52 octets in 1518 the SRTP payload are used by ZRTP to securely exchange a number of 1519 parameters. The plaintext parameter contains the known plaintext 1520 "known plaintext". The Disclosure Flag (D) is a Boolean bit defined 1521 in Appendix B. The Stay secure flag (S) is a Boolean bit defined in 1522 Section 5.6. The SAS Verified flag (V) is a Boolean bit defined in 1523 Section 8. 
1525 The cache expiration interval is an unsigned 32 bit integer of the 1526 number of seconds that the newly generated cached shared secret, rs1, 1527 should be stored. The hmac is a hash over the known plaintext "known 1528 plaintext" and the flagoctet. 1530 The parameters included in the SRTP payload MUST NOT be allowed to 1531 pass to the RTP stack or errors may occur with the media stream. 1533 0 1 2 3 1534 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 1535 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1536 |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=3 words | 1537 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1538 | Message Type Block="Confirm2" (2 words) | 1539 | | 1540 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1541 | CRC (1 word) | 1542 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1544 At the start of the SRTP payload: 1546 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1547 | | 1548 | "known plaintext" (15 octets) | 1549 | +-+-+-+-+-+-+-+-+ 1550 | |0 0 0 0 0|D|S|V| 1551 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1552 | cache expiration interval (1 word) | 1553 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1554 | | 1555 | hmac (8 words) | 1556 | . . . | 1557 | | 1558 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 1560 Figure 8. Extension header format for Confirm2 message 1562 6.9. Conf2ACK message 1564 The Conf2ACK message is sent in response to a valid Confirm2 message. 1565 The format is shown in Figure 9 below. The receipt of a Conf2ACK 1566 stops retransmission of the Confirm2 message. 
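Conf2ACK, like HelloACK and ClearACK, carries only a Message Type Block and a CRC, so its header extension is three words long. The Python sketch below packs such a message. The 16-bit pattern 0x505A is read off the first half-word shown in the message figures, while the function name build_ack is hypothetical. Computing the CRC is outside the scope of this sketch, so the caller supplies it as an integer.

```python
import struct

ZRTP_EXT_ID = 0x505A  # bit pattern 0101 0000 0101 1010 from the figures


def build_ack(message_type: str, crc: int) -> bytes:
    """Pack a three-word ZRTP header extension (Message Type Block + CRC)."""
    block = message_type.encode("ascii")
    if len(block) != 8:
        raise ValueError("Message Type Block must be exactly 8 octets")
    length_words = 3  # 2 words of Message Type Block + 1 word of CRC
    # "!" selects network byte order with no padding: 2 + 2 + 8 + 4 octets.
    return struct.pack("!HH8sI", ZRTP_EXT_ID, length_words, block, crc)


# crc=0 is a placeholder; a real sender computes it over the message.
packet = build_ack("Conf2ACK", crc=0)
print(packet.hex())
```

The same helper covers "HelloACK" and "ClearACK", since all three share this layout.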
1568     0                   1                   2                   3
1569     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1570    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1571    |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|         length=3 words        |
1572    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1573    |            Message Type Block="Conf2ACK" (2 words)            |
1574    |                                                               |
1575    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1576    |                         CRC (1 word)                          |
1577    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1579          Figure 9. Extension header format for Conf2ACK message

1581 6.10. GoClear message

1583    The GoClear message is sent to switch from SRTP back to RTP or to
1584    terminate an in-process ZRTP key agreement exchange. The format is
1585    shown in Figure 11 below. The Reason String is a 16 character string
1586    which contains the reason for the switch to clear. If the GoClear is
1587    sent due to a user interface selection, the reason is "User Request".
1588    If the GoClear is sent due to a protocol error, the reason phrase is
1589    generated to describe the reason. The Reason String can be logged or
1590    rendered for human consumption.

1592    If the GoClear is sent to switch from SRTP back to RTP, the
1593    clear_hmac is used to authenticate the GoClear message so that bogus
1594    GoClear messages introduced by an attacker can be detected and
1595    discarded.

1597     0                   1                   2                   3
1598     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1599    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1600    |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=15 words        |
1601    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1602    |            Message Type Block="GoClear " (2 words)            |
1603    |                                                               |
1604    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1605    |                                                               |
1606    |                    Reason String (4 words)                    |
1607    |                                                               |
1608    |                                                               |
1609    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1610    |                                                               |
1611    |                     clear_hmac (8 words)                      |
1612    |                             . . .                             |
1613    |                                                               |
1614    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1615    |                         CRC (1 word)                          |
1616    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1618          Figure 11. Extension header format for GoClear message

1620 6.11. ClearACK message

1622    The ClearACK message is sent to acknowledge receipt of a GoClear. A
1623    ClearACK is only sent if the clear_hmac from the GoClear message is
1624    authenticated. Otherwise, no response is returned. The format is
1625    shown in Figure 12 below.

1627     0                   1                   2                   3
1628     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1629    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1630    |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|         length=3 words        |
1631    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1632    |            Message Type Block="ClearACK" (2 words)            |
1633    |                                                               |
1634    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1635    |                         CRC (1 word)                          |
1636    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1638          Figure 12. Extension header format for ClearACK message

1640 7. Retransmissions

1642    ZRTP uses two retransmission timers, T1 and T2. T1 is used for
1643    retransmission of Hello messages, when the support of ZRTP by the
1644    other endpoint may not be known. T2 is used in retransmissions of
1645    all the other ZRTP messages with the exception of GoClear.

1647    Practical experience has shown that RTP packet loss at the start of
1648    an RTP session can be extremely high. Since the entire ZRTP message
1649    exchange occurs during this period, the retransmission scheme
1650    is defined to be aggressive. Since ZRTP packets with the exception
1651    of the DHPart1 and DHPart2 messages are small, this should have
1652    minimal effect on overall bandwidth utilization of the media session.

1654    Hello ZRTP requests are retransmitted at an interval that starts at
1655    T1 and doubles after every retransmission, capping at 200 ms.
1656    A Hello message is retransmitted 20 times before giving up. T1 has a
1657    recommended value of 50 ms. Retransmission of a Hello ends upon
1658    receipt of a HelloACK or Commit message.

1660    Non-Hello ZRTP requests are retransmitted only by the initiator -
1661    that is, only Commit, DHPart2, and Confirm2 are retransmitted if the
1662    corresponding messages from the responder (DHPart1, Confirm1, and
1663    Conf2ACK) are not received. Non-Hello ZRTP messages are
1664    retransmitted at an interval that starts at T2 and doubles
1665    after every retransmission, capping at 600 ms.
1666    Each message is retransmitted 10
1667    times before giving up and resuming a normal RTP session. T2 has a
1668    default value of 150 ms. Each message has a response message that
1669    stops retransmissions, as shown in Table 7. The high value of T2
1670    means that retransmissions will likely only occur with packet loss.

1672    A GoClear message is retransmitted at 500 ms intervals until a
1673    ClearACK message is received.

1675       Message        Acknowledgement Message
1676       -------        -----------------------
1677       Hello          HelloACK or Commit
1678       Commit         DHPart1 or Confirm1
1679       DHPart2        Confirm1
1680       Confirm1       Confirm2
1681       Confirm2       Conf2ACK
1682       GoClear        ClearACK

1684       Table 7. Retransmitted ZRTP Messages and Responses

1686 8. Short Authentication String

1688    This section discusses the implementation of the Short
1689    Authentication String (SAS) in ZRTP.

1691    The Short Authentication String (SAS) value is calculated as the hash
1692    of both DH public values and the string "Short Authentication
1693    String".

1695    sasvalue = last 32 bits of hash(pvi | pvr | "Short Authentication
1696    String")

1698    The rendering of the SAS value depends on the SAS Type agreed upon in
1699    the Commit message. For the SAS Type of base32, the last 20 bits of
1700    the sasvalue are rendered as a form of base32 encoding known as
1701    libbase32 [10].
The purpose of base32 is to represent arbitrary
1702    sequences of octets in a form that is as convenient as possible for
1703    human users to manipulate. As a result, the choice of characters is
1704    slightly different from base32 as defined in RFC 3548. The last 20
1705    bits of the sasvalue result in four base32 characters which are
1706    rendered to both ZRTP endpoints. Other SAS Types may be defined to
1707    render the SAS value in other ways.

1709    The SAS SHOULD be rendered to the user. In addition, the SAS SHOULD
1710    be sent in a subsequent offer/answer exchange (a re-INVITE in SIP)
1711    after the completion of the ZRTP exchange using the ZRTP SAS SDP
1712    attributes defined in Appendix A.

1714    The SAS Verified flag (V) is set based on the user indicating that
1715    SAS verification has been successfully performed. The SAS Verified flag is
1716    exchanged securely in the Confirm1 and Confirm2 messages of the next
1717    session. In other words, each party sends the SAS Verified flag from
1718    the previous session in the Confirm message of the current session.
1719    It is perfectly reasonable to have a ZRTP endpoint that never sets
1720    the SAS Verified flag, because it would require adding complexity to
1721    the user interface to allow the user to set it. The SAS Verified
1722    flag is not required to be set, but if it is available to the client
1723    software, it allows for the possibility that the client software
1724    could render to the user that the SAS verify procedure was carried
1725    out in a previous session.

1727    Regardless of whether there is a user interface element to allow the
1728    user to set the SAS Verified flag, it is worth caching a shared
1729    secret, because doing so reduces opportunities for an attacker in the
1730    next call.

1732    If at any time the users carry out the SAS procedure, and it actually
1733    fails to match, then this means there is a very resourceful man in
1734    the middle.
If this is the first call, the MitM was there on the
1735    first call, which is impressive enough. If it happens in a later
1736    call, it means the MitM must also know your cached shared
1737    secret, because you could not have carried out any voice traffic at
1738    all unless the session key was correctly computed and is also known
1739    to the attacker. This implies the MitM must have been present in all
1740    the previous sessions, since the initial establishment of the first
1741    shared secret. This is indeed a resourceful attacker. It also means
1742    that if at any time he ceases his participation as a MitM on one of
1743    your calls, the protocol will detect that the cached shared secret is
1744    no longer valid -- because it was really two different shared secrets
1745    all along, one of them between Alice and the attacker, and the other
1746    between the attacker and Bob. The continuity of the cached shared
1747    secrets makes it possible for us to detect the MitM when he inserts
1748    himself into the ongoing relationship, as well as when he leaves.
1749    Also, if the attacker tries to stay with a long lineage of calls, but
1750    fails to execute a DH MitM attack for even one missed call, he is
1751    permanently excluded. He can no longer resynchronize with the chain
1752    of cached shared secrets.

1754    Some sort of user interface element (maybe a checkbox) is needed to
1755    allow the user to tell the software the SAS verification was successful,
1756    causing the software to set the SAS Verified flag (V), which
1757    (together with our cached shared secret) obviates the need to perform
1758    the SAS procedure in the next call. An additional user interface
1759    element can be provided to let the user tell the software he detected
1760    an actual SAS mismatch, which indicates a MitM attack. The software
1761    can then take appropriate action, clearing the SAS Verified flag and
1762    erasing the cached shared secret from this session.
It is up to the
1763    implementer to decide if this added user interface complexity is
1764    warranted.

1766    If the SAS matches, it means there is no MitM, which also implies it
1767    is now safe to trust a cached shared secret for later calls. If
1768    inattentive users don't bother to check the SAS, it means we don't
1769    know whether there is or is not a MitM, so even if we do establish a
1770    new cached shared secret, there is a risk that our potential attacker
1771    may have a subsequent opportunity to continue inserting himself in
1772    the call, until we finally get around to checking the SAS. If the
1773    SAS matches, it means no attacker was present for any previous
1774    session since we started propagating cached shared secrets, because
1775    this session and all the previous sessions were also authenticated
1776    with a continuous lineage of shared secrets.

1778 9. IANA Considerations

1780    This specification defines three new SDP [11] attributes in Appendix
1781    A. The IANA registrations would be as follows:

1783    Contact name:             Phil Zimmermann

1785    Attribute name:           "zrtp".

1787    Type of attribute:        Session level or Media level.

1789    Subject to charset:       No.

1791    Purpose of attribute:     The 'zrtp' flag indicates that a UA supports the
1792                              ZRTP protocol.

1794    Allowed attribute values: None.

1796    IANA would register the ZRTP SAS SDP attribute:

1798    Contact name:             Phil Zimmermann

1800    Attribute name:           "zrtp-sas".

1802    Type of attribute:        Media level.

1804    Subject to charset:       Yes.

1806    Purpose of attribute:     The 'zrtp-sas' is used to convey the ZRTP SAS
1807                              string that would be rendered to the users. The
1808                              SAS is carried in the same format as it
1809                              would be rendered.

1811    Allowed attribute values: String.

1813    IANA would register the ZRTP SASvalue SDP attribute:

1815    Contact name:             Phil Zimmermann

1817    Attribute name:           "zrtp-sasvalue".

1819    Type of attribute:        Media level.

1821    Subject to charset:       No.
1823 Purpose of attribute: The 'zrtp-sasvalue' is used to convey the SASvalue 1824 used for deriving the SAS string. The SAS value is 1825 encoded as hexadecimal. 1827 Allowed attribute values: Hex. 1829 10. Security Considerations 1831 This document is all about securely keying SRTP sessions. As such, 1832 security is discussed in every section. The next version of this 1833 draft will have a summary of those security properties discussed 1834 throughout the document. 1836 The ZRTP SDP attributes convey information through the signaling that 1837 is already available in clear text through the media channel. For 1838 example, the ZRTP flag is equivalent to sending a ZRTP Hello message. 1839 The SAS is calculated from the public Diffie-Hellman values exchanged 1840 in the DHPart1 and DHPart2 messages and a known string. As a result, 1841 none of the ZRTP SDP attributes require confidentiality from the 1842 signaling. 1844 The ZRTP SAS attributes can use the signaling channel as an out-of- 1845 band authentication mechanism. This authentication is only useful if 1846 the signaling channel has end-to-end integrity protection. Note that 1847 the SIP Identity header field [23] provides middle-to-end integrity 1848 protection across SDP message bodies which provides useful protection 1849 for ZRTP SAS attributes. 1851 11. Acknowledgments 1853 The authors would like to thank Bryce Wilcox-O'Hearn for his 1854 contributions to the design of this protocol, and to thank Jon 1855 Peterson, Colin Plumb, and Hal Finney for their helpful comments and 1856 suggestions. Also thanks to David McGrew, Roni Even, Viktor Krikun, 1857 Werner Dittmann, Allen Pulsifer, Klaus Peters, and Abhishek Arya for 1858 their feedback and comments. 1860 12. Appendix A - ZRTP, SIP, and SDP 1862 This section discusses how ZRTP, SIP, and SDP work together. 1864 Note that ZRTP may be implemented without coupling with the SIP 1865 signaling. 
For example, ZRTP can be implemented as a "bump in the
1866    wire" or as a "bump in the stack" in which RTP sent by the SIP UA is
1867    converted to ZRTP. In these cases, the SIP UA will have no knowledge
1868    of ZRTP. As a result, the signaling path discovery mechanisms
1869    introduced in this section should not be treated as definitive; they
1870    are only a hint. Despite the absence of an indication of ZRTP support in an
1871    offer or answer, a ZRTP endpoint SHOULD still send Hello messages.

1873    ZRTP endpoints which have control over the signaling path include
1874    ZRTP SDP attributes in their SDP offers and answers. The ZRTP
1875    attribute, a=zrtp, is a flag to indicate support for ZRTP. There are
1876    a number of potential uses for this attribute. It is useful when
1877    signaling elements would like to know when ZRTP may be utilized by
1878    endpoints. It is also useful if endpoints support multiple methods
1879    of SRTP key management. The ZRTP attribute can be used to ensure
1880    that these key management approaches work together instead of against
1881    each other. For example, if only one endpoint supports ZRTP but both
1882    support another method to key SRTP, then the other method will be
1883    used instead. When used in parallel, an SRTP secret carried in an
1884    a=key-mgmt [20] or a=crypto [19] attribute can be used as a shared
1885    secret for the srtp_secret. The ZRTP attribute is also used to
1886    signal to an intermediary ZRTP device not to act as a ZRTP endpoint,
1887    as discussed in Appendix C.

1889    The a=zrtp attribute can be included at a media level or at the
1890    session level. When used at the media level, it indicates that ZRTP
1891    is supported on this media stream. When used at the session level,
1892    it indicates that ZRTP is supported in all media streams in the
1893    session described by the offer or answer.

1895    In some scenarios, it is desirable for a signaling intermediary to be
1896    able to validate the SAS on behalf of the user.
This could be due to
1897    an endpoint which has a user interface unable to render the SAS. Or,
1898    this could be a protection by an organization against lazy users who
1899    never check the SAS. Using either the ZRTP SAS or ZRTP SASvalue
1900    attribute, the SAS check can be performed without requiring the human
1901    users to speak the SAS. Note that this check can only be relied on
1902    if the signaling path has end-to-end integrity protection.

1904    The ZRTP SAS attribute a=zrtp-sas is a Media level SDP attribute that
1905    can be used to carry the SAS string which would be identical to that
1906    rendered to the user. The value passed depends on the negotiated SAS
1907    Type. Since the SAS is not known at the start of a session, the
1908    a=zrtp-sas attribute will never be present in the initial offer/
1909    answer exchange. After the ZRTP exchange has completed, the SAS is
1910    known and can be exchanged over the signaling using a second offer/
1911    answer exchange (a re-INVITE in SIP terms). Note that the SAS is not
1912    a secret and as such does not need confidentiality protection when
1913    sent over the signaling path.

1915    The ZRTP SASvalue attribute a=zrtp-sasvalue can be used to
1916    send the 32 bit SAS value encoded as hex. Note that this value is
1917    not the same as that rendered to the user and is independent of the
1918    negotiated SAS type. Since the SAS is not known at the start of a
1919    session, the a=zrtp-sasvalue attribute will never be present in the
1920    initial offer/answer exchange. After the ZRTP exchange has
1921    completed, the SAS is known and can be exchanged over the signaling
1922    using a second offer/answer exchange (a re-INVITE in SIP terms).
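To make the two SAS attributes concrete, the following Python sketch derives the 32 bit sasvalue from the two DH public values and formats both attribute lines. It is a sketch under stated assumptions rather than a normative procedure: SHA-256 is assumed as the hash, the base32 rendering assumes the z-base-32 alphabet with the last 20 bits taken most-significant group first, and the function names are hypothetical.

```python
import hashlib
from typing import List

# Assumption: the human-oriented base32 alphabet (z-base-32).
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"


def sas_value(pvi: bytes, pvr: bytes) -> int:
    """Last 32 bits of hash(pvi | pvr | "Short Authentication String")."""
    digest = hashlib.sha256(pvi + pvr + b"Short Authentication String").digest()
    return int.from_bytes(digest[-4:], "big")


def render_base32(value: int) -> str:
    """Render the last 20 bits of the sasvalue as four base32 characters."""
    low20 = value & 0xFFFFF
    return "".join(ZBASE32[(low20 >> shift) & 0x1F] for shift in (15, 10, 5, 0))


def sdp_lines(value: int) -> List[str]:
    """Media-level attribute lines for a subsequent offer or answer."""
    return [
        f"a=zrtp-sas:{render_base32(value)}",
        f"a=zrtp-sasvalue:{value:08x}",
    ]


# Illustrative, zero-filled public values of the DH3072 size (384 octets).
value = sas_value(bytes(384), bytes(384))
print("\n".join(sdp_lines(value)))
```

Formatting the sasvalue with a fixed width of eight hex digits keeps leading zeros, so the receiver always sees the full 32 bits.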
1924    The ABNF for the ZRTP attribute is as follows:

1926    zrtp-attribute          = "a=zrtp"

1928    The ABNF for the ZRTP SAS attribute is as follows:

1930    zrtp-sas-attribute      = "a=zrtp-sas:" sas-string

1932    sas-string              = non-ws-string

1934    non-ws-string           = 1*(VCHAR/%x80-FF)
1935                              ;string of visible characters

1937    The ABNF for the ZRTP SASvalue attribute is as follows:

1939    zrtp-sasvalue-attribute = "a=zrtp-sasvalue:" sas-value

1941    sas-value               = 1*(HEXDIG)

1943    Example of the ZRTP attribute in an initial SDP offer or answer used
1944    at the session level:

1946    v=0
1947    o=bob 2890844527 2890844527 IN IP4 client.biloxi.example.com
1948    s=
1949    c=IN IP4 client.biloxi.example.com
1950    a=zrtp
1951    t=0 0
1952    m=audio 3456 RTP/AVP 97 33
1953    a=rtpmap:97 iLBC/8000
1954    a=rtpmap:33 no-op/8000

1956    Example of the ZRTP SAS and SASvalue attributes in a subsequent SDP
1957    offer or answer used at the media level. Note that the a=zrtp
1958    attribute doesn't provide any additional information when used with
1959    the SAS and SASvalue attributes but does not do any harm:

1961    v=0
1962    o=bob 2890844527 2890844528 IN IP4 client.biloxi.example.com
1963    s=
1964    c=IN IP4 client.biloxi.example.com
1965    a=zrtp
1966    t=0 0
1967    m=audio 3456 RTP/AVP 97 33
1968    a=rtpmap:97 iLBC/8000
1969    a=rtpmap:33 no-op/8000
1970    a=zrtp-sas:opz
1971    a=zrtp-sasvalue:45e387ff

1973    Another example showing a second media stream being added to the
1974    session. A second DH exchange is performed (instead of using the
1975    Multistream mode) resulting in a second set of ZRTP SAS and SASvalue
1976    attributes.

1978    v=0
1979    o=bob 2890844527 2890844528 IN IP4 client.biloxi.example.com
1980    s=
1981    c=IN IP4 client.biloxi.example.com
1982    a=zrtp
1983    t=0 0
1984    m=audio 3456 RTP/AVP 97 33
1985    a=rtpmap:97 iLBC/8000
1986    a=rtpmap:33 no-op/8000
1987    a=zrtp-sas:opz
1988    a=zrtp-sasvalue:45e387ff
1989    m=video 51372 RTP/AVP 31 33
1990    a=rtpmap:31 H261/90000
1991    a=rtpmap:33 no-op/8000
1992    a=zrtp-sas:qvj
1993    a=zrtp-sasvalue:5e017f3a

1995 13.
Appendix B - The ZRTP Disclosure flag 1997 There are no back doors defined in the ZRTP protocol specification. 1998 The designers of ZRTP would like to discourage back doors in ZRTP- 1999 enabled products. However, despite the lack of back doors in the 2000 actual ZRTP protocol, it must be recognized that a ZRTP implementer 2001 might still deliberately create a rogue ZRTP-enabled product that 2002 implements a back door outside the scope of the ZRTP protocol. For 2003 example, they could create a product that discloses the SRTP session 2004 key generated using ZRTP out-of-band to a third party. They may even 2005 have a legitimate business reason to do this for some customers. 2007 For example, some environments have a need to monitor or record 2008 calls, such as stock brokerage houses who want to discourage insider 2009 trading, or special high security environments with special needs to 2010 monitor their own phone calls. We've all experienced automated 2011 messages telling us that "This call may be monitored for quality 2012 assurance". A ZRTP endpoint in such an environment might 2013 unilaterally disclose the session key to someone monitoring the call. 2014 ZRTP-enabled products that perform such out-of-band disclosures of 2015 the session key can undermine public confidence in the ZRTP protocol, 2016 unless we do everything we can in the protocol to alert the other 2017 user that this is happening. 2019 If one of the parties is using a product that is designed to disclose 2020 their session key, ZRTP requires them to confess this fact to the 2021 other party through a protocol message to the other party's ZRTP 2022 client, which can properly alert that user, perhaps by rendering it 2023 in a GUI. The disclosing party does this by sending a Disclosure 2024 flag (D) in Confirm1 and Confirm2 messages as described in Sections 2025 6.7 and 6.8. 
2027    Note that the intention here is to have the Disclosure flag identify
2028    products that are designed to disclose their session keys, not to
2029    identify which particular calls are compromised on a call-by-call
2030    basis. This is an important legal distinction, because most
2031    government sanctioned wiretap regulations require a VoIP service
2032    provider to not reveal which particular calls are wiretapped. But
2033    there is nothing illegal about revealing that a product is designed
2034    to be wiretap-friendly. The ZRTP protocol mandates that such a
2035    product "out" itself.

2037    You might be using a ZRTP-enabled product with no back doors, but if
2038    your own GUI tells you the call is (mostly) secure, except that the
2039    other party is using a product that is designed in such a way that it
2040    may have disclosed the session key for monitoring purposes, you might
2041    ask him what brand of secure telephone he is using, and make a mental
2042    note not to purchase that brand yourself. If we create a protocol
2043    environment that requires such back-doored phones to confess their
2044    nature, word will spread quickly, and the "unseen hand" of the free
2045    market will act. The free market has effectively dealt with this in
2046    the past.

2048    Of course, a ZRTP implementer can lie about his product having a back
2049    door, but the ZRTP standard mandates that ZRTP-compliant products
2050    MUST adhere to the requirement that a back door be confessed by
2051    sending the Disclosure flag to the other party.

2053    There will be inevitable comparisons to Steve Bellovin's 2003 April
2054    Fool's joke, when he submitted RFC 3514 [22] which defined the "Evil
2055    bit" in the IPv4 header, for packets with "evil intent". But we
2056    submit that a similar idea can actually have some merit for securing
2057    VoIP. Sure, one can always imagine that some implementer will not be
2058    fazed by the rules and will lie, but they would have lied anyway even
2059    without the Disclosure flag. There are good reasons to believe that
2060    it will improve the overall percentage of implementations that at
2061    least tell us if they put a back door in their products, and may even
2062    get some of them to decide not to put in a back door at all. From a
2063    civic hygiene perspective, we are better off with having the
2064    Disclosure flag in the protocol.

2066    If an endpoint stores or logs SRTP keys or information that can be
2067    used to reconstruct or recover SRTP keys after they are no longer in
2068    use (i.e., the session is no longer active), or otherwise discloses
2069    or passes SRTP keys or information that can be used to reconstruct or
2070    recover SRTP keys to another application or device, the Disclosure
2071    flag D MUST be set in the Confirm1 or Confirm2 message.

2073 14. Appendix C - Intermediary ZRTP Devices

2075    This section discusses the operation of a ZRTP endpoint which is
2076    actually an intermediary. For example, consider a device which
2077    proxies both signaling and media between endpoints. There are three
2078    possible ways in which such a device could support ZRTP.

2080    An intermediary device can act transparently to the ZRTP protocol.
2081    To do this, a device MUST pass RTP header extensions and payloads.
2082    This is the RECOMMENDED behavior for intermediaries as ZRTP and SRTP
2083    are best when done end-to-end.

2085    An intermediary device could implement the ZRTP protocol and act as a
2086    ZRTP endpoint on behalf of non-ZRTP endpoints behind the intermediary
2087    device. The intermediary could determine on a call-by-call basis
2088    whether the endpoint behind it supports ZRTP based on the presence or
2089    absence of the ZRTP SDP attribute flag (a=zrtp). For non-ZRTP
2090    endpoints, the intermediary device could act as the ZRTP endpoint
2091    using its own ZID and cache. This approach MUST only be used when
2092    there is some other security method protecting the confidentiality of
2093    the media between the intermediary and the inside endpoint, such as
2094    IPsec or physical security.

2096    The third mode, which is NOT RECOMMENDED, is for the intermediary
2097    device to attempt to back-to-back the ZRTP protocol. In this mode,
2098    the intermediary would attempt to act as a ZRTP endpoint towards both
2099    endpoints of the media session. This approach MUST NOT be used as it
2100    will always result in a detected Man-in-the-Middle attack and will
2101    generate alarms on both endpoints and likely result in the immediate
2102    termination of the session. It cannot be stated strongly enough that
2103    there are no usable back-to-back uses for the ZRTP protocol.

2105    It is possible that an intermediary device acting as a ZRTP endpoint
2106    might still receive ZRTP Hello and other messages from the inside
2107    endpoint. This could occur if there is another inline ZRTP device
2108    which does not include the ZRTP SDP attribute flag. If this occurs,
2109    the intermediary MUST NOT pass these ZRTP messages if it is acting as
2110    the ZRTP endpoint.

2112 15. References

2114 15.1. Normative References

2116    [1]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
2117          Levels", BCP 14, RFC 2119, March 1997.

2119    [2]   Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson,
2120          "RTP: A Transport Protocol for Real-Time Applications", STD 64,
2121          RFC 3550, July 2003.

2123    [3]   Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
2124          Norrman, "The Secure Real-time Transport Protocol (SRTP)",
2125          RFC 3711, March 2004.

2127    [4]   McGrew, D., "The use of AES-192 and AES-256 in Secure RTP",
2128          draft-mcgrew-srtp-big-aes-00 (work in progress), April 2006.

2130    [5]   Kivinen, T. and M. Kojo, "More Modular Exponential (MODP)
2131          Diffie-Hellman groups for Internet Key Exchange (IKE)",
2132          RFC 3526, May 2003.
2134    [6]   Stone, J., Stewart, R., and D. Otis, "Stream Control
2135          Transmission Protocol (SCTP) Checksum Change", RFC 3309,
2136          September 2002.

2138    [7]   Andreasen, F., "A No-Op Payload Format for RTP",
2139          draft-wing-avt-rtp-noop-03 (work in progress), May 2005.

2141    [8]   Ferguson, N. and B. Schneier, "Practical Cryptography", Wiley
2142          Publishing 2003.

2144    [9]   Barker, E. and J. Kelsey, "Recommendation for Random Number
2145          Generation Using Deterministic Random Bit Generators", NIST
2146          Special Publication 800-90 DRAFT (December 2005).

2148    [10]  Wilcox, B., "Human-oriented base-32 encoding", http://
2149          cvs.sourceforge.net/viewcvs.py/libbase32/libbase32/
2150          DESIGN?rev=HEAD .

2152    [11]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
2153          Description Protocol", RFC 4566, July 2006.

2155 15.2. Informative References

2157    [12]  Audet, F. and D. Wing, "Evaluation of SRTP Keying with SIP",
2158          draft-wing-rtpsec-keying-eval-01 (work in progress), June 2006.

2160    [13]  Zimmermann, P., "PGPfone",
2161          http://www.pgpi.org/products/pgpfone/ .

2163    [14]  Zimmermann, P., "Zfone", http://www.philzimmermann.com/zfone .

2165    [15]  Blossom, E., "The VP1 Protocol for Voice Privacy Devices
2166          Version 1.2", http://www.comsec.com/vp1-protocol.pdf .

2168    [16]  "CryptoPhone", http://www.cryptophone.de/ .

2170    [17]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A.,
2171          Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP:
2172          Session Initiation Protocol", RFC 3261, June 2002.

2174    [18]  Ylonen, T. and C. Lonvick, "The Secure Shell (SSH) Protocol
2175          Architecture", RFC 4251, January 2006.

2177    [19]  Andreasen, F., Baugher, M., and D. Wing, "Session Description
2178          Protocol (SDP) Security Descriptions for Media Streams",
2179          RFC 4568, July 2006.

2181    [20]  Arkko, J., Lindholm, F., Naslund, M., Norrman, K., and E.
2182          Carrara, "Key Management Extensions for Session Description
2183          Protocol (SDP) and Real Time Streaming Protocol (RTSP)",
2184          RFC 4567, July 2006.

2186    [21]  Arkko, J., Carrara, E., Lindholm, F., Naslund, M., and K.
2187          Norrman, "MIKEY: Multimedia Internet KEYing", RFC 3830,
2188          August 2004.

2190    [22]  Bellovin, S., "The Security Flag in the IPv4 Header", RFC 3514,
2191          April 1 2003.

2193    [23]  Peterson, J. and C. Jennings, "Enhancements for Authenticated
2194          Identity Management in the Session Initiation Protocol (SIP)",
2195          RFC 4474, August 2006.

2197 Authors' Addresses

2199    Philip Zimmermann
2200    Zfone Project

2202    Email: prz@mit.edu

2204    Alan Johnston (editor)
2205    Avaya
2206    St. Louis, MO 63124

2208    Email: alan@sipstation.com

2209    Jon Callas
2210    PGP Corporation

2212    Email: jon@pgp.com

2214 Full Copyright Statement

2216    Copyright (C) The Internet Society (2006).

2218    This document is subject to the rights, licenses and restrictions
2219    contained in BCP 78, and except as set forth therein, the authors
2220    retain all their rights.

2222    This document and the information contained herein are provided on an
2223    "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
2224    OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
2225    ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
2226    INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
2227    INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
2228    WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
2230 Intellectual Property

2232    The IETF takes no position regarding the validity or scope of any
2233    Intellectual Property Rights or other rights that might be claimed to
2234    pertain to the implementation or use of the technology described in
2235    this document or the extent to which any license under such rights
2236    might or might not be available; nor does it represent that it has
2237    made any independent effort to identify any such rights. Information
2238    on the procedures with respect to rights in RFC documents can be
2239    found in BCP 78 and BCP 79.

2241    Copies of IPR disclosures made to the IETF Secretariat and any
2242    assurances of licenses to be made available, or the result of an
2243    attempt made to obtain a general license or permission for the use of
2244    such proprietary rights by implementers or users of this
2245    specification can be obtained from the IETF on-line IPR repository at
2246    http://www.ietf.org/ipr.

2248    The IETF invites any interested party to bring to its attention any
2249    copyrights, patents or patent applications, or other proprietary
2250    rights that may cover technology that may be required to implement
2251    this standard. Please address the information to the IETF at
2252    ietf-ipr@ietf.org.

2254 Acknowledgment

2256    Funding for the RFC Editor function is provided by the IETF
2257    Administrative Support Activity (IASA).