Network Working Group                                      P. Zimmermann
Internet-Draft                                             Zfone Project
Intended status: Informational                          A. Johnston, Ed.
Expires: September 5, 2009                                         Avaya
                                                               J. Callas
                                                         PGP Corporation
                                                           March 4, 2009


             ZRTP: Media Path Key Agreement for Secure RTP
                      draft-zimmermann-avt-zrtp-15

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 5, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   This document defines ZRTP, a protocol for media path Diffie-Hellman
   exchange to agree on a session key and parameters for establishing
   Secure Real-time Transport Protocol (SRTP) sessions.  The ZRTP
   protocol is media path keying because it is multiplexed on the same
   port as RTP and does not require support in the signaling protocol.
   ZRTP does not assume a Public Key Infrastructure (PKI) or require the
   complexity of certificates in end devices.  For the media session,
   ZRTP provides confidentiality, protection against man-in-the-middle
   (MiTM) attacks, and, in cases where the signaling protocol provides
   end-to-end integrity protection, authentication.  ZRTP can utilize a
   Session Description Protocol (SDP) attribute to provide discovery and
   authentication through the signaling channel.  To provide best effort
   SRTP, ZRTP utilizes normal RTP/AVP profiles.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Overview
     3.1.  Key Agreement Modes
       3.1.1.  Diffie-Hellman Mode Overview
       3.1.2.  Preshared Mode Overview
       3.1.3.  Multistream Mode Overview
   4.  Protocol Description
     4.1.  Discovery
       4.1.1.  Protocol Version Negotiation
       4.1.2.  Algorithm Negotiation
     4.2.  Commit Contention
     4.3.  Matching Shared Secret Determination
       4.3.1.  Calculation and comparison of hashes of shared secrets
       4.3.2.  Handling a Shared Secret Cache Mismatch
     4.4.  DH and non-DH key agreements
       4.4.1.  Diffie-Hellman Mode
         4.4.1.1.  Hash Commitment in Diffie-Hellman Mode
         4.4.1.2.  Responder Behavior in Diffie-Hellman Mode
         4.4.1.3.  Initiator Behavior in Diffie-Hellman Mode
         4.4.1.4.  Shared Secret Calculation for DH Mode
       4.4.2.  Preshared Mode
         4.4.2.1.  Commitment in Preshared Mode
         4.4.2.2.  Initiator Behavior in Preshared Mode
         4.4.2.3.  Responder Behavior in Preshared Mode
         4.4.2.4.  Shared Secret Calculation for Preshared Mode
       4.4.3.  Multistream Mode
         4.4.3.1.  Commitment in Multistream Mode
         4.4.3.2.  Shared Secret Calculation for Multistream Mode
     4.5.  Key Derivations
       4.5.1.  The ZRTP Key Derivation Function
       4.5.2.  Deriving ZRTPSess Key and SAS in DH or Preshared modes
       4.5.3.  Deriving the rest of the keys from s0
     4.6.  Confirmation
       4.6.1.  Updating the Cache of Shared Secrets
     4.7.  Termination
       4.7.1.  Termination via Error message
       4.7.2.  Termination via GoClear message
         4.7.2.1.  Key Destruction for GoClear message
       4.7.3.  Key Destruction at Termination
     4.8.  Random Number Generation
     4.9.  ZID and Cache Operation
       4.9.1.  Cacheless implementations
   5.  ZRTP Messages
     5.1.  ZRTP Message Formats
       5.1.1.  Message Type Block
       5.1.2.  Hash Type Block
         5.1.2.1.  Implicit Hash and HMAC algorithm
       5.1.3.  Cipher Type Block
       5.1.4.  Auth Tag Type Block
       5.1.5.  Key Agreement Type Block
       5.1.6.  SAS Type Block
       5.1.7.  Signature Type Block
     5.2.  Hello message
     5.3.  HelloACK message
     5.4.  Commit message
     5.5.  DHPart1 message
     5.6.  DHPart2 message
     5.7.  Confirm1 and Confirm2 messages
     5.8.  Conf2ACK message
     5.9.  Error message
     5.10. ErrorACK message
     5.11. GoClear message
     5.12. ClearACK message
     5.13. SASrelay message
     5.14. RelayACK message
     5.15. Ping message
     5.16. PingACK message
   6.  Retransmissions
   7.  Short Authentication String
     7.1.  SAS Verified Flag
     7.2.  Signing the SAS
     7.3.  Relaying the SAS through a PBX
       7.3.1.  PBX Enrollment and the PBX Enrollment Flag
   8.  Signaling Interactions
     8.1.  Binding the media stream to the signaling layer via
           the Hello Hash
       8.1.1.  Integrity-protected signaling enables
               integrity-protected DH exchange
     8.2.  Deriving the SRTP secret (srtps) from the signaling layer
     8.3.  Codec Selection for Secure Media
   9.  False ZRTP Packet Rejection
   10. Intermediary ZRTP Devices
   11. The ZRTP Disclosure flag
     11.1. Guidelines on Proper Implementation of the Disclosure Flag
   12. RTP Header Extension Flag for ZRTP
   13. IANA Considerations
   14. Appendix - Media Security Requirements
   15. Security Considerations
     15.1. Self-healing Key Continuity Feature
   16. Acknowledgments
   17. References
     17.1. Normative References
     17.2. Informative References
   Authors' Addresses

1.  Introduction

   ZRTP is a key agreement protocol which performs Diffie-Hellman key
   exchange during call setup in the media path, and is transported over
   the same port as the Real-time Transport Protocol (RTP) [RFC3550]
   media stream which has been established using a signaling protocol
   such as Session Initiation Protocol (SIP) [RFC3261].  This generates
   a shared secret which is then used to generate keys and salt for a
   Secure RTP (SRTP) [RFC3711] session.  ZRTP borrows ideas from PGPfone
   [pgpfone].  A reference implementation of ZRTP is available as Zfone
   [zfone].

   The ZRTP protocol has some nice cryptographic features lacking in
   many other approaches to media session encryption.  Although it uses
   a public key algorithm, it does not rely on a public key
   infrastructure (PKI).  In fact, it does not use persistent public
   keys at all.  It uses ephemeral Diffie-Hellman (DH) with hash
   commitment, and allows the detection of man-in-the-middle (MiTM)
   attacks by displaying a short authentication string (SAS) for the
   users to read and verbally compare over the phone.  It has Perfect
   Forward Secrecy, meaning the keys are destroyed at the end of the
   call, which precludes retroactively compromising the call by future
   disclosures of key material.  But even if the users are too lazy to
   bother with short authentication strings, we still get reasonable
   authentication against a MiTM attack, based on a form of key
   continuity.
   It does this by caching some key material to use in the next call, to
   be mixed in with the next call's DH shared secret, giving it key
   continuity properties analogous to SSH.  All this is done without
   reliance on a PKI, key certification, trust models, certificate
   authorities, or key management complexity that bedevils the email
   encryption world.  It also does not rely on SIP signaling for the
   key management, and in fact does not rely on any servers at all.  It
   performs its key agreements and key management in a purely
   peer-to-peer manner over the RTP packet stream.

   In cases where the short authentication string (SAS) cannot be
   verbally compared by two human users, the SAS can be authenticated by
   exchanging an optional signature over the SAS (described in
   Section 7.2).

   ZRTP can be used and discovered without being declared or indicated
   in the signaling path.  This provides a best effort SRTP capability.
   Also, this reduces the complexity of implementations and minimizes
   interdependency between the signaling and media layers.  However,
   when ZRTP is indicated in the signaling via the zrtp-hash SDP
   attribute, ZRTP has additional useful properties.  By sending a hash
   of the ZRTP Hello message in the signaling, ZRTP provides a useful
   binding between the signaling and media paths, which is explained in
   Section 8.1.  When this is done through a signaling path that has
   end-to-end integrity protection, the DH exchange is automatically
   protected from a MiTM attack, which is explained in Section 8.1.1.

2.  Terminology

   In this document, the key words "MUST", "MUST NOT", "REQUIRED",
   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
   and "OPTIONAL" are to be interpreted as described in RFC 2119 and
   indicate requirement levels for compliant implementations [RFC2119].

3.  Overview

   This section provides a description of how ZRTP works.
   This description is non-normative in nature but is included to build
   understanding of the protocol.

   ZRTP is negotiated the same way a conventional RTP session is
   negotiated in an offer/answer exchange using the standard RTP/AVP
   profile.  The ZRTP protocol begins after two endpoints have utilized
   a signaling protocol such as SIP and are ready to exchange media.  If
   ICE [I-D.ietf-mmusic-ice] is being used, ZRTP begins after ICE has
   completed its connectivity checks.

   ZRTP is multiplexed on the same ports as RTP.  It uses a unique
   header that makes it clearly differentiable from RTP or STUN.

   In environments in which sending ZRTP packets to non-ZRTP endpoints
   might cause problems and signaling path discovery is not an option,
   ZRTP endpoints can include the RTP header extension flag for ZRTP in
   normal RTP packets sent at the start of a session as a probe to
   discover if the other endpoint supports ZRTP.  If the flag is
   received from the other endpoint, ZRTP messages can then be
   exchanged.

   A ZRTP endpoint initiates the exchange by sending a ZRTP Hello
   message to the other endpoint.  The purpose of the Hello message is
   to confirm the endpoint supports the protocol and to see what
   algorithms the two ZRTP endpoints have in common.

   The Hello message contains the SRTP configuration options, and the
   ZID.  Each instance of ZRTP has a unique 96-bit random ZRTP ID or ZID
   that is generated once at installation time.  ZIDs are discovered
   during the Hello message exchange.  The received ZID is used to look
   up retained shared secrets from previous ZRTP sessions with the
   endpoint.

   A response to a ZRTP Hello message is a ZRTP HelloACK message.  The
   HelloACK message simply acknowledges receipt of the Hello.  Since RTP
   commonly uses best effort UDP transport, ZRTP has retransmission
   timers in case of lost datagrams.
   There are two timers, both with exponential backoff mechanisms.  One
   timer is used for retransmissions of Hello messages and the other is
   used for retransmissions of all other messages after receipt of a
   HelloACK.

   If an integrity protected signaling channel is available, a hash of
   the Hello message can be sent.  This allows rejection of false ZRTP
   Hello messages injected by an attacker.

   Hello and other ZRTP messages also contain a hash image that is used
   to link the messages together.  This allows rejection of false ZRTP
   messages injected during an exchange.

3.1.  Key Agreement Modes

   After both endpoints exchange Hello and HelloACK messages, the key
   agreement exchange can begin with the ZRTP Commit message.  ZRTP
   supports a number of key agreement modes including both
   Diffie-Hellman and non-Diffie-Hellman modes as described in the
   following sections.

   The Commit message may be sent immediately after both endpoints have
   completed the Hello/HelloACK discovery handshake.  Or it may be
   deferred until later in the call, after the participants engage in
   some unencrypted conversation.  The Commit message may be manually
   activated by a user interface element, such as a GO SECURE button,
   which becomes enabled after the Hello/HelloACK discovery phase.  This
   emulates the user experience of a number of secure phones in the PSTN
   world [comsec].  However, it is expected that most simple ZRTP user
   agents will omit such buttons and proceed directly to secure mode by
   sending a Commit message immediately after the Hello/HelloACK
   handshake.

3.1.1.  Diffie-Hellman Mode Overview

   An example ZRTP call flow is shown in Figure 1 below.  Note that the
   order of the Hello/HelloACK exchanges in F1/F2 and F3/F4 may be
   reversed.  That is, either Alice or Bob might send the first Hello
   message.
   Note that the endpoint which sends the Commit message is considered
   the initiator of the ZRTP session and drives the key agreement
   exchange.  The Diffie-Hellman public values are exchanged in the
   DHPart1 and DHPart2 messages.  SRTP keys and salts are then
   calculated.

   Alice                                               Bob
    |                                                   |
    |     Alice and Bob establish a media session.      |
    |         They initiate ZRTP on media ports         |
    |                                                   |
    | F1 Hello (version, options, Alice's ZID)          |
    |-------------------------------------------------->|
    |                                       HelloACK F2 |
    |<--------------------------------------------------|
    |            Hello (version, options, Bob's ZID) F3 |
    |<--------------------------------------------------|
    | F4 HelloACK                                       |
    |-------------------------------------------------->|
    |                                                   |
    |             Bob acts as the initiator             |
    |                                                   |
    |               Commit (Bob's ZID, options, hvi) F5 |
    |<--------------------------------------------------|
    | F6 DHPart1 (pvr, shared secret hashes)            |
    |-------------------------------------------------->|
    |            DHPart2 (pvi, shared secret hashes) F7 |
    |<--------------------------------------------------|
    |                                                   |
    |     Alice and Bob generate SRTP session key.      |
    |                                                   |
    | F8 Confirm1 (HMAC, D,A,V,E flags, sig)            |
    |-------------------------------------------------->|
    |            Confirm2 (HMAC, D,A,V,E flags, sig) F9 |
    |<--------------------------------------------------|
    | F10 Conf2ACK                                      |
    |-------------------------------------------------->|
    |                    SRTP begins                    |
    |<=================================================>|
    |                                                   |

         Figure 1: Establishment of an SRTP session using ZRTP

   ZRTP authentication uses a Short Authentication String (SAS) which is
   ideally displayed for the human user.  Alternatively, the SAS can be
   authenticated by exchanging an OPTIONAL digital signature (sig) over
   the short authentication string in the Confirm1 or Confirm2 messages
   (described in Section 7.2).
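
   The direction of each message in Figure 1 after the Hello/HelloACK
   discovery follows from which endpoint sent the Commit.  The sketch
   below restates that ordering as data.  It is only an illustration:
   the function name and the tuple layout are assumptions made here,
   not something this document defines.

```python
def dh_mode_flow(initiator, responder):
    """Ordered (message, sender, receiver) tuples for DH mode.

    Covers only the messages that follow the Hello/HelloACK discovery
    in Figure 1. The initiator is whichever endpoint sent the Commit.
    """
    return [
        ("Commit", initiator, responder),
        ("DHPart1", responder, initiator),   # carries pvr
        ("DHPart2", initiator, responder),   # carries pvi
        ("Confirm1", responder, initiator),
        ("Confirm2", initiator, responder),
        ("Conf2ACK", responder, initiator),
    ]


# Figure 1 has Bob as the initiator.
for message, sender, receiver in dh_mode_flow("Bob", "Alice"):
    print(f"{sender} -> {receiver}: {message}")
```

   Swapping the two arguments gives the flow for a call in which Alice
   sends the Commit instead.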

   The ZRTP Confirm1 and Confirm2 messages are sent for a number of
   reasons, not the least of which is they confirm that all the key
   agreement calculations were successful and thus the encryption will
   work.  They also carry other information such as the Disclosure flag
   (D), the Allow Clear flag (A), the SAS Verified flag (V), and the PBX
   Enrollment flag (E).  All flags are encrypted to shield them from a
   passive observer.

3.1.2.  Preshared Mode Overview

   In the Preshared Mode, endpoints can skip the DH calculation if they
   have a shared secret from a previous ZRTP session.  Preshared mode is
   indicated in the Commit message and results in the same call flow as
   Multistream mode.  The principal difference between Multistream mode
   and Preshared mode is that Preshared mode uses a previously cached
   shared secret, rs1, instead of an active ZRTP Session key as the
   initial keying material.

   This mode could be useful for slow processor endpoints so that a DH
   calculation does not need to be performed every session.  Or, this
   mode could be used to rapidly re-establish an earlier session that
   was recently torn down or interrupted without the need to perform
   another DH calculation.

   Preshared mode has forward secrecy properties.  If a phone's cache is
   captured by an opponent, the cached shared secrets cannot be used to
   recover earlier encrypted calls, because the shared secrets are
   replaced with new ones in each new call, as in DH mode.  However, the
   captured secrets can be used by a passive wiretapper in the media
   path to decrypt the next call, if the next call is in Preshared mode.
   This differs from DH mode, which requires an active MiTM wiretapper
   to exploit captured secrets in the next call.  However, if the next
   call is missed by the wiretapper, he cannot wiretap any further
   calls.
   It thus preserves most of the self-healing properties (Section 15.1)
   of key continuity enjoyed by DH mode.

3.1.3.  Multistream Mode Overview

   Multistream mode is an alternative key agreement method when two
   endpoints have an established SRTP media stream between them and
   hence an active ZRTP Session key.  ZRTP can derive multiple SRTP keys
   from a single DH exchange.  For example, an established secure voice
   call that adds a video stream must use Multistream mode to quickly
   initiate the video stream without a second DH exchange.

   When Multistream mode is indicated in the Commit message, a call flow
   similar to Figure 1 is used, but no DH calculation is performed by
   either endpoint and the DHPart1 and DHPart2 messages are omitted.
   The Confirm1, Confirm2, and Conf2ACK messages are still sent.  Since
   the cache is not affected during this mode, multiple Multistream ZRTP
   exchanges can be performed in parallel between two endpoints.

   When adding additional media streams to an existing call, only
   Multistream mode is used.  Only one DH operation is performed, just
   for the first media stream.

4.  Protocol Description

   This section begins the normative description of the protocol.

   ZRTP MUST be multiplexed on the same ports as the RTP media packets.

   To support best effort encryption from the Media Security
   Requirements [I-D.ietf-sip-media-security-requirements], ZRTP uses
   normal RTP/AVP profile (AVP) media lines in the initial offer/answer
   exchange.  The ZRTP SDP attribute a=zrtp-hash defined in Section 8
   SHOULD be used in all offers and answers to indicate support for the
   ZRTP protocol.  The Secure RTP/AVP (SAVP) profile MAY be used in
   subsequent offer/answer exchanges after a successful ZRTP exchange
   has resulted in an SRTP session, or if it is known the other endpoint
   supports this profile.

      The use of the RTP/SAVP profile has caused failures in negotiating
      best effort SRTP due to the limitations on negotiating profiles
      using SDP.  This is why ZRTP supports the RTP/AVP profile and
      includes its own discovery mechanisms.

   In all key agreement modes, the initiator SHOULD NOT send RTP media
   after sending the Commit message, and MUST NOT send SRTP media before
   receiving either the Conf2ACK or the first SRTP media (with a valid
   SRTP auth tag) from the responder.  The responder SHOULD NOT send RTP
   media after receiving the Commit message, and MUST NOT send SRTP
   media before receiving the Confirm2 message.

4.1.  Discovery

   During the ZRTP discovery phase, a ZRTP endpoint discovers if the
   other endpoint supports ZRTP and the supported algorithms and
   options.  This information is transported in a Hello message,
   described in Section 5.2.

   ZRTP endpoints SHOULD include the SDP attribute a=zrtp-hash in offers
   and answers, as defined in Section 8.  ZRTP MAY use an RTP [RFC3550]
   extension field as a flag to indicate support for the ZRTP protocol
   in RTP packets as described in Section 12.

   The Hello message includes the ZRTP version, hash type, cipher type,
   authentication method and tag length, key agreement type, and Short
   Authentication String (SAS) algorithms that are supported.  The Hello
   message also includes a hash image as described in Section 9.  In
   addition, each endpoint sends and discovers ZIDs.  The received ZID
   is used later in the protocol as an index into a cache of shared
   secrets that were previously negotiated and retained between the two
   parties.

   A Hello message can be sent at any time, but is usually sent at the
   start of an RTP session to determine if the other endpoint supports
   ZRTP, and also if the SRTP implementations are compatible.
   A Hello message is retransmitted using timer T1 and an exponential
   backoff mechanism detailed in Section 6 until the receipt of a
   HelloACK message or a Commit message.

   The use of the a=zrtp-hash SDP attribute to authenticate the Hello
   message is described in Section 8.1.

4.1.1.  Protocol Version Negotiation

   This specification defines ZRTP version 1.10.  Since new versions of
   ZRTP may be developed in the future, this specification defines a
   protocol version negotiation in this section.

   Each party declares what version of the ZRTP protocol they support
   via the version field in the Hello message (Section 5.2).  If both
   parties have the same version number in their Hello messages, they
   can proceed with the rest of the protocol.  To facilitate both
   parties reaching this state of protocol version agreement in their
   Hello messages, ZRTP should use information provided in the signaling
   layer, if available.  If a ZRTP endpoint supports more than one
   version of the protocol, it SHOULD declare them all in a list of SIP
   SDP a=zrtp-hash attributes (defined in Section 8), listing separate
   hashes, with separate ZRTP version numbers in each item in the list.

   Both parties should inspect the list of ZRTP version numbers supplied
   by the other party in the SIP SDP a=zrtp-hash attributes.  Both
   parties should choose the highest version number that appears in both
   parties' lists of a=zrtp-hash version numbers, and use that version
   for their Hello messages.  If both parties use the SIP signaling in
   this manner, their initial Hello messages will have the same ZRTP
   version number, provided they both have at least one supported
   protocol version in common.  Before the ZRTP key agreement can
   proceed, an endpoint MUST have sent and received Hellos with the same
   protocol version.

   It is best if the signaling layer is used to negotiate the protocol
   version number.
   However, the a=zrtp-hash SDP attribute is not always present in the
   SIP packet, as explained in Section 8.1.  In the absence of any
   guidance from the signaling layer, an endpoint MUST send the highest
   supported version in initial Hello messages.  If the two parties
   send different protocol version numbers in their Hello messages,
   they can reach agreement to use a common version, if one exists.
   They iteratively apply the following rules until they both have
   matching version fields in their Hello messages and the key
   agreement can proceed:

   o  If an endpoint receives a Hello message with an unsupported
      version number that is higher than the endpoint's current Hello
      message version, the received Hello message MUST be ignored.  The
      endpoint continues to retransmit Hello messages on the standard
      retry schedule (Section 6).
   o  If an endpoint receives a Hello message with a version number that
      is lower than the endpoint's current Hello message, and the
      endpoint supports a version that is less than or equal to the
      received version number, the endpoint MUST stop retransmitting the
      old version number and MUST start sending a new Hello message with
      the highest supported version number that is less than or equal to
      the received version number.
   o  If an endpoint receives a Hello message with an unsupported
      version number that is lower than the endpoint's current Hello
      message, the endpoint MUST send an Error message (Section 5.9)
      indicating failure to support this ZRTP version.

   The above comparisons are iterated until the version numbers match,
   or until the negotiation exits on a failure to match.

      For example, assume that Alice supports protocol version 1.10 and
      2.00, and Bob supports version 1.10 and 1.20.  Alice initially
      sends a Hello with version 2.00, and Bob initially sends a Hello
      with version 1.20.  Bob ignores Alice's 2.00 Hello and continues
      to send his 1.20 Hello.
      Alice detects that Bob does not support 2.00 and she stops
      sending her 2.00 Hellos and starts sending a stream of 1.10
      Hellos.  Bob sees the 1.10 Hello from Alice and stops sending his
      1.20 Hellos and switches to sending 1.10 Hellos.  At that point,
      they have converged on using version 1.10 and the protocol
      proceeds on that basis.

   When comparing protocol versions, a ZRTP endpoint MUST include only
   the first three octets of the version field in the comparison.  The
   final octet is ignored, because it is not significant for
   interoperability.  For example, "1.1 ", "1.10", "1.11", or "1.1a" are
   all regarded as a version match, because they would all be
   interoperable versions.

   Changes in protocol version numbers are expected to be infrequent
   after version 1.10.  Supporting multiple versions adds code
   complexity and may introduce security weaknesses in the
   implementation.  The old adage about keeping it simple applies
   especially to implementing security protocols.  Endpoints SHOULD NOT
   support protocol versions earlier than version 1.10.

4.1.2.  Algorithm Negotiation

   A method is provided to allow the two parties to mutually and
   deterministically choose the same DH key size and algorithm before a
   Commit message is sent.

   Each Hello message lists the algorithms in the order of preference
   for that ZRTP endpoint.  Endpoints eliminate the non-intersecting
   choices from each of their own lists, resulting in each endpoint
   having a list of algorithms in common that might or might not be
   ordered the same as the other endpoint's list.  Each endpoint
   compares the first item on their own list with the first item on the
   other endpoint's list, and SHOULD choose the faster of the two
   algorithms.
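
   The selection rule just stated can be sketched in a few lines of
   Python.  This is a minimal illustration, not a normative algorithm:
   the function name is hypothetical, the speed ranking is the one this
   section gives (DH2K, EC25, DH3K, EC38, EC52, read here as fastest
   first, which matches the outcome of this section's Alice and Bob
   example), and returning None when nothing is in common is an
   assumption made only to keep the sketch total.

```python
# Speed ranking given in this section, read as fastest first.
SPEED_RANKING = ["DH2K", "EC25", "DH3K", "EC38", "EC52"]


def choose_key_agreement(own_hello, peer_hello):
    """Pick the key agreement type that both endpoints arrive at.

    own_hello / peer_hello: key agreement types in each endpoint's
    order of preference, as listed in its Hello message.
    """
    # Eliminate the non-intersecting choices, keeping each side's order.
    own_common = [alg for alg in own_hello if alg in peer_hello]
    peer_common = [alg for alg in peer_hello if alg in own_hello]
    if not own_common:
        return None  # assumption: no common algorithm, caller decides
    # Compare the two first preferences and take the faster one.
    return min(own_common[0], peer_common[0], key=SPEED_RANKING.index)


alice = ["DH2K", "DH3K", "EC25"]
bob = ["EC38", "EC25", "DH3K"]
print(choose_key_agreement(alice, bob))  # EC25
print(choose_key_agreement(bob, alice))  # EC25
```

   Because the result depends only on the two Hello lists and a fixed
   ranking, both endpoints compute the same answer regardless of which
   one later becomes the initiator.
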
   For example:

   o  Alice's full list: DH2K, DH3K, EC25
   o  Bob's full list: EC38, EC25, DH3K
   o  Alice's intersecting list: DH3K, EC25
   o  Bob's intersecting list: EC25, DH3K
   o  Alice's first preference is DH3K, and Bob's first preference is
      EC25.
   o  Thus, both parties choose EC25 (ECDH-256), because it's faster.

   To decide which DH algorithm is faster, the following ranking, from
   fastest to slowest, is defined: DH2K, EC25, DH3K, EC38, EC52. These
   are all drawn from Table 5.

   If both endpoints follow this method, they may each start their DH
   calculations as soon as they receive the Hello message, and there
   will be no need for either endpoint to discard their DH calculation
   if the other endpoint becomes the initiator.

   This method is used only to negotiate DH key size. For the rest of
   the algorithm choices, it's simply whatever the initiator selects
   from the algorithms in common. Note that the DH key size influences
   the size of the symmetric cipher key, as explained in Section 5.1.5.

   Unfavorable choices will never be made by this method, because each
   endpoint will omit from their respective lists choices that are too
   slow or not secure enough to meet their security policy.

4.2.  Commit Contention

   After both parties have received compatible Hello messages, a Commit
   message (Section 5.4) can be sent to begin the ZRTP key exchange.
   The endpoint that sends the Commit is known as the initiator, while
   the receiver of the Commit is known as the responder.

   If both sides send Commit messages initiating a secure session at the
   same time the following rules are used to break the tie:

   o  If one Commit is for a DH mode while the other is for Preshared
      mode, then the Preshared Commit MUST be discarded and the DH
      Commit proceeds.
   o  If the two Commits are both Preshared mode, and one party has set
      the MiTM (M) flag in the Hello message and the other has not, the
      Commit message from the party who set the (M) flag MUST be
      discarded, and the one who has not set the (M) flag becomes the
      initiator, regardless of the nonce values. In other words, for
      Preshared mode, the phone is the initiator and the PBX is the
      responder.
   o  If the two Commits are either both DH modes or both non-DH modes,
      then the Commit message with the lowest hvi value (for DH
      Commits), or lowest nonce value (for non-DH Commits), MUST be
      discarded and the other side is the initiator, and the protocol
      proceeds with the initiator's Commit. The two hvi or nonce values
      are compared as large unsigned integers in network byte order.

   If one Commit is for Multistream mode while the other is for
   non-Multistream (DH or Preshared) mode, a software error has occurred
   and the ZRTP negotiation should be terminated. This should never
   occur because of the constraints on Multistream mode described in
   Section 4.4.3.

   In the event that Commit messages are sent by both ZRTP endpoints at
   the same time, but are received in different media streams, the same
   resolution rules apply as if they were received on the same stream.
   The media stream in which the surviving Commit was sent will proceed
   through the ZRTP exchange, while the media stream with the discarded
   Commit must wait for the completion of the other ZRTP exchange.

   If a commit contention forces a DH Commit message to be discarded,
   the responder's DH public value should only be discarded if it does
   not match the initiator's DH key size.

4.3.  Matching Shared Secret Determination

   The following sections describe how ZRTP endpoints generate and/or
   use the set of shared secrets s1, auxsecret, and pbxsecret through
   the exchange of the DHPart1 and DHPart2 messages.
   This doesn't cover
   the Diffie-Hellman calculations. It only covers the method whereby
   the two parties determine if they already have shared secrets in
   common in their caches.

   Each ZRTP endpoint maintains a long-term cache of shared secrets that
   it has previously negotiated with the other party. The ZID of the
   other party, received in the other party's Hello message, is used as
   an index into this cache to find the set of shared secrets, if any
   exist. This cache entry may contain previously retained shared
   secrets, rs1 and rs2, which give ZRTP its key continuity features.
   If the other party is a PBX, the cache may also contain a trusted
   MiTM PBX shared secret, called pbxsecret, defined in Section 7.3.1.

   The DHPart1 and DHPart2 messages contain a list of hashes of these
   shared secrets to allow the two endpoints to compare the hashes with
   what they have in their caches to detect whether the two sides share
   any secrets that can be used in the calculation of the session key.
   The use of this shared secret cache is described in Section 4.9.

   If no secret of a given type is available, a random value is
   generated and used for that secret to ensure a mismatch in the hash
   comparisons in the DHPart1 and DHPart2 messages. This prevents an
   eavesdropper from knowing which types of shared secrets are available
   between the endpoints.

   Section 4.3.1 refers to the auxiliary shared secret auxsecret. The
   auxsecret shared secret may be defined by the VoIP user agent
   out-of-band from the ZRTP protocol. In some cases it may be provided
   by the signaling layer as srtps, which is defined in Section 8.2. If
   it is not provided by the signaling layer, the auxsecret shared
   secret may be manually provisioned in other application-specific ways
   that are out-of-band, such as computed from a hashed pass phrase by
   prior agreement between the two parties.
   Or it may be a family key used by
   an institution that the two parties both belong to. It is a
   generalized mechanism for providing a shared secret that is agreed to
   between the two parties out of scope of the ZRTP protocol. It is
   expected that most typical ZRTP endpoints will rarely use auxsecret.

   For both the initiator and the responder, the shared secrets s1, s2,
   and s3 will be calculated so that they can all be used later to
   calculate s0 in Section 4.4.1.4. Here is how s1, s2, and s3 are
   calculated by both parties:

   The shared secret s1 will be either the initiator's rs1 or the
   initiator's rs2, depending on which of them can be found in the
   responder's cache. If the initiator's rs1 matches the responder's
   rs1 or rs2, then s1 MUST be set to the initiator's rs1. If and only
   if that match fails, then if the initiator's rs2 matches the
   responder's rs1 or rs2, then s1 MUST be set to the initiator's rs2.
   If that match also fails, then s1 MUST be set to null. The
   complexity of the s1 calculation is to recover from any loss of cache
   sync from an earlier aborted session, due to the Byzantine Generals'
   Problem [Byzantine].

   The shared secret s2 MUST be set to the value of auxsecret if and
   only if both parties have matching values for auxsecret, as
   determined by comparing the hashes of auxsecret sent in the DH
   messages. If they don't match, s2 MUST be set to null.

   The shared secret s3 MUST be set to the value of pbxsecret if and
   only if both parties have matching values for pbxsecret, as
   determined by comparing the hashes of pbxsecret sent in the DH
   messages. If they don't match, s3 MUST be set to null.

   If s1, s2, or s3 have null values, they are assumed to have a zero
   length for the purposes of hashing them later during the s0
   calculation in Section 4.4.1.4.
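
   The selection rules for s1, s2, and s3 can be sketched from the
   initiator's point of view as follows. This is a non-normative
   illustration under stated assumptions: the function names are
   hypothetical, SHA-256 stands in for the negotiated hash, the role
   string is taken to be plain ASCII without a terminator, and the IDs
   are the truncated keyed hashes defined in Section 4.3.1. A missing
   secret is represented by None rather than by the random substitute
   value described above.

```python
import hashlib
import hmac


def secret_id(secret, role):
    """HMAC of a role string keyed by the secret, leftmost 64 bits."""
    # Assumption: SHA-256 stands in for the negotiated hash.
    return hmac.new(secret, role, hashlib.sha256).digest()[:8]


def initiator_select_s1(rs1, rs2, peer_rs1_id, peer_rs2_id):
    """Return the initiator's rs1 or rs2 as s1, or None for a null s1."""
    peer_ids = (peer_rs1_id, peer_rs2_id)
    for candidate in (rs1, rs2):  # rs2 is tried only if rs1 fails
        if candidate is None:
            continue
        own_id = secret_id(candidate, b"Responder")
        if any(hmac.compare_digest(own_id, pid) for pid in peer_ids):
            return candidate
    return None


def keep_if_matching(secret, peer_id, role=b"Responder"):
    """Rule for s2 and s3: keep the secret only when the IDs match."""
    if secret is not None and hmac.compare_digest(
            secret_id(secret, role), peer_id):
        return secret
    return None
```

   The responder would run the mirror image of this logic, computing
   its local IDs with the "Initiator" role string and comparing them
   with the values received in DHPart2.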

   The comparison of hashes of rs1, rs2, auxsecret, and pbxsecret is
   described below in Section 4.3.1.

4.3.1.  Calculation and comparison of hashes of shared secrets

   Both parties calculate a set of keyed hashes (HMACs) of shared
   secrets that may be present in each of their caches. These hashes
   are truncated to the leftmost 64 bits:

      rs1IDr = HMAC(rs1, "Responder")
      rs2IDr = HMAC(rs2, "Responder")
      auxsecretIDr = HMAC(auxsecret, "Responder")
      pbxsecretIDr = HMAC(pbxsecret, "Responder")
      rs1IDi = HMAC(rs1, "Initiator")
      rs2IDi = HMAC(rs2, "Initiator")
      auxsecretIDi = HMAC(auxsecret, "Initiator")
      pbxsecretIDi = HMAC(pbxsecret, "Initiator")

   The responder sends rs1IDr, rs2IDr, auxsecretIDr, and pbxsecretIDr in
   the DHPart1 message. The initiator sends rs1IDi, rs2IDi,
   auxsecretIDi, and pbxsecretIDi in the DHPart2 message.

   The responder uses the locally computed rs1IDi, rs2IDi, auxsecretIDi,
   and pbxsecretIDi to compare against the corresponding fields in the
   received DHPart2 message. The initiator uses the locally computed
   rs1IDr, rs2IDr, auxsecretIDr, and pbxsecretIDr to compare against the
   corresponding fields in the received DHPart1 message.

   From these comparisons, s1, s2, and s3 are calculated per the methods
   described above in Section 4.3. The secrets corresponding to
   matching HMACs are kept while the secrets corresponding to the
   non-matching ones are replaced with a null, which is assumed to have
   a zero length for the purposes of hashing them later. The resulting
   s1, s2, and s3 values are used later to calculate s0 in
   Section 4.4.1.4.

   For example, consider two ZRTP endpoints who share secrets rs1 and
   pbxsecret (defined in Section 7.3.1). During the comparison, rs1ID
   and pbxsecretID will match but auxsecretID will not. As a result, s1
   = rs1, s2 will be null, and s3 = pbxsecret.

4.3.2.  Handling a Shared Secret Cache Mismatch

   A shared secret cache mismatch is defined to mean that we expected a
   cache match because rs1 exists in our local cache, but we computed a
   null value for s1 (per the method described in Section 4.3).

   If one party has a cached shared secret and the other party does not,
   this indicates one of two possible situations. Either there is a
   man-in-the-middle (MiTM) attack, or one of the legitimate parties has
   lost their cached shared secret by some mishap. Perhaps they
   inadvertently deleted their cache, or their cache was lost or
   disrupted due to restoring their disk from an earlier backup copy.
   The party that has the surviving cache entry can easily detect that a
   cache mismatch has occurred, because they expect their own cached
   secret to match the other party's cached secret, but it does not
   match. It is possible for both parties to detect this condition if
   both parties have surviving cached secrets that have fallen out of
   sync, due perhaps to one party restoring from a disk backup.

   If either party discovers a cache mismatch, the user agent who makes
   this discovery must treat this as a possible security event and MUST
   alert their own user that there is a heightened risk of a MiTM
   attack, and that the user should verbally compare the SAS with the
   other party to ascertain that no MiTM attack has occurred. If a
   cache mismatch is detected and it is not possible to compare the SAS,
   either because the user interface does not support it or because one
   or both endpoints are unmanned devices, and no other SAS comparison
   mechanism is available, the session MAY be terminated.

   The session need not be terminated on a cache mismatch event if the
   mechanism described in Section 8.1.1 is available, which allows
   authentication of the DH exchange without human assistance. Or if
   any mechanism is available to determine if the SAS matches.
   This
   would require either circumstances that allow human verbal
   comparisons of the SAS, or the use of the OPTIONAL digital signature
   feature on the SAS hash, as described in Section 7.2. Even if the
   user interface does not permit an SAS comparison, the human user MUST
   be warned, and may elect to proceed with the call at their own risk.

   Here is a non-normative example of a cache-mismatch alert message
   from a ZRTP user agent (specifically, Zfone [zfone]), designed for a
   desktop PC graphical user interface environment. It is by no means
   required that the alert be this detailed:

      "We expected the other party to have a shared secret cached from a
      previous call, but they don't have it. This may mean your partner
      simply lost his cache of shared secrets, but it could also mean
      someone is trying to wiretap you. To resolve this question you
      must check the authentication string with your partner. If it
      doesn't match, it indicates the presence of a wiretapper."
      If the alert is rendered by a robot voice instead of a GUI,
      brevity may be more important: "Something's wrong. You must check
      the authentication string with your partner. If it doesn't match,
      it indicates the presence of a wiretapper."

4.4.  DH and non-DH key agreements

   The next step is the generation of a secret for deriving SRTP keying
   material. ZRTP uses Diffie-Hellman and two non-Diffie-Hellman modes,
   described in the following sections.

4.4.1.  Diffie-Hellman Mode

   The purpose of the Diffie-Hellman (either Finite Field Diffie-Hellman
   or Elliptic Curve Diffie-Hellman) exchange is for the two ZRTP
   endpoints to generate a new shared secret, s0. In addition, the
   endpoints discover if they have any cached or previously stored
   shared secrets in common, and use them as part of the calculation of
   the session keys.

   Because the DH exchange affects the state of the retained shared
   secret cache, only one in-process ZRTP DH exchange may occur at a
   time between two ZRTP endpoints. Otherwise, race conditions and
   cache integrity problems will result. When multiple media streams
   are established in parallel between the same pair of ZRTP endpoints
   (determined by the ZIDs in the Hello Messages), only one can be
   processed. Once that exchange completes with Confirm2 and Conf2ACK
   messages, another ZRTP DH exchange can begin. This constraint does
   not apply when Multistream mode key agreement is used since the
   cached shared secrets are not affected.

4.4.1.1.  Hash Commitment in Diffie-Hellman Mode

   From the intersection of the algorithms in the sent and received
   Hello messages, the initiator chooses a hash, cipher, auth tag, key
   agreement type, and SAS type to be used.

   A Diffie-Hellman mode is selected by setting the Key Agreement Type
   to one of the DH or ECDH values in Table 5 in the Commit. In this
   mode, the key agreement begins with the initiator choosing a fresh
   random Diffie-Hellman (DH) secret value (svi) based on the chosen key
   agreement type value, and computing the public value. (Note that to
   speed up processing, this computation can be done in advance.) For
   guidance on generating random numbers, see Section 4.8. The value
   for the DH generator g, the DH prime p, and the length of the DH
   secret value, svi, are defined in Section 5.1.5.

      pvi = g^svi mod p

   where g and p are determined by the key agreement type value. The
   pvi value is formatted as a big-endian octet string, fixed to the
   width of the DH prime, and leading zeros MUST NOT be truncated.

   The hash commitment is performed by the initiator of the ZRTP
   exchange.
   The hash value of the initiator, hvi, includes a hash of
   the entire DHPart2 message as shown in Figure 9 (which includes the
   Diffie-Hellman public value, pvi), and the responder's Hello message:

      hvi = hash(initiator's DHPart2 message || responder's Hello
      message)

   Note that the Hello message includes the fields shown in Figure 3.

   The information from the responder's Hello message is included in the
   hash calculation to prevent a bid-down attack by modification of the
   responder's Hello message.

   The initiator sends hvi in the Commit message.

   The use of hash commitment in the DH exchange constrains the attacker
   to only one guess to generate the correct short authentication string
   (SAS) (Section 7) in his attack, which means the SAS can be quite
   short. A 16-bit SAS, for example, provides the attacker only one
   chance out of 65536 of not being detected.

4.4.1.2.  Responder Behavior in Diffie-Hellman Mode

   Upon receipt of the Commit message, the responder generates its own
   fresh random DH secret value, svr, and computes the public value.
   (Note that to speed up processing, this computation can be done in
   advance.) For guidance on random number generation, see Section 4.8.
   The value for the DH generator g, the DH prime p, and the length of
   the DH secret value, svr, are defined in Section 5.1.5.

      pvr = g^svr mod p

   The pvr value is formatted as a big-endian octet string, fixed to the
   width of the DH prime, and leading zeros MUST NOT be truncated.

   Upon receipt of the DHPart2 message, the responder checks that the
   initiator's public DH value is not equal to 1 or p-1. An attacker
   might inject a false DHPart2 packet with a value of 1 or p-1 for
   g^svi mod p, which would cause a disastrously weak final DH result to
   be computed. If pvi is 1 or p-1, the user should be alerted of the
   attack and the protocol exchange MUST be terminated.
   Otherwise, the
   responder computes its own value for the hash commitment using the
   public DH value (pvi) received in the DHPart2 packet and its Hello
   packet and compares the result with the hvi received in the Commit
   packet. If they are different, a MiTM attack is taking place and the
   user is alerted and the protocol exchange terminated.

   The responder then calculates the Diffie-Hellman result:

      DHResult = pvi^svr mod p

4.4.1.3.  Initiator Behavior in Diffie-Hellman Mode

   Upon receipt of the DHPart1 message, the initiator checks that the
   responder's public DH value is not equal to 1 or p-1. An attacker
   might inject a false DHPart1 packet with a value of 1 or p-1 for
   g^svr mod p, which would cause a disastrously weak final DH result to
   be computed. If pvr is 1 or p-1, the user should be alerted of the
   attack and the protocol exchange MUST be terminated.

   The initiator then sends a DHPart2 message containing the initiator's
   public DH value and the set of calculated shared secret IDs as
   defined in Section 4.3.1.

   The initiator calculates the same Diffie-Hellman result using:

      DHResult = pvr^svi mod p

4.4.1.4.  Shared Secret Calculation for DH Mode

   A hash of the received and sent ZRTP messages in the current ZRTP
   exchange in the following order is calculated by both parties:

      total_hash = hash(Hello of responder || Commit || DHPart1 ||
      DHPart2)

   Note that only the ZRTP messages (Figure 3, Figure 5, Figure 8, and
   Figure 9), not the entire ZRTP packets, are included in the
   total_hash.

   For both the initiator and responder, the DHResult is formatted as a
   big-endian octet string, fixed to the width of the DH prime, and
   leading zeros MUST NOT be truncated. For example, for a 3072-bit p,
   DHResult would be a 384 octet value, with the first octet the most
   significant.
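
   The public value check and the fixed-width formatting of DHResult
   can be sketched for the finite field case as follows. This is a
   non-normative illustration: the function names are hypothetical, and
   the prime, generator, and secret values at the end are toy numbers
   chosen only so that the example runs quickly. They are far too small
   for real use; the actual parameters are defined in Section 5.1.5.

```python
def check_peer_public_value(pv, p):
    """Reject the degenerate public values 1 and p-1."""
    if pv == 1 or pv == p - 1:
        raise ValueError("degenerate DH public value; terminate exchange")


def to_fixed_width(value, p):
    """Big-endian octet string as wide as the prime, leading zeros kept."""
    width = (p.bit_length() + 7) // 8
    return value.to_bytes(width, "big")


def dh_result(peer_pv, own_sv, p):
    """Validate the peer's public value, then return DHResult octets."""
    check_peer_public_value(peer_pv, p)
    return to_fixed_width(pow(peer_pv, own_sv, p), p)


# Toy parameters for illustration only (hypothetical, not from Table 5).
p, g = 2**127 - 1, 3
svi, svr = 0x1234567, 0x7654321          # would be fresh random values
pvi, pvr = pow(g, svi, p), pow(g, svr, p)

# Initiator and responder arrive at the same octet string.
assert dh_result(pvr, svi, p) == dh_result(pvi, svr, p)
```

   Keeping the width tied to the prime, rather than to the numeric size
   of the result, is what preserves leading zero octets and keeps the
   later hash inputs identical on both sides.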

   The calculation of the final shared secret, s0, is in compliance with
   the recommendations in sections 5.8.1 and 6.1.2.1 of NIST SP 800-56A
   [SP800-56A]. This is done by hashing a concatenation of a number of
   items, including the DHResult, the ZIDs of the initiator (ZIDi) and
   the responder (ZIDr), the total_hash, and the set of non-null shared
   secrets as described in Section 4.3.

   In section 5.8.1 of NIST SP 800-56A [SP800-56A], NIST requires
   certain parameters to be hashed together in a particular order, which
   NIST refers to as: Z, AlgorithmID, PartyUInfo, PartyVInfo,
   SuppPubInfo, and SuppPrivInfo. In our implementation, our DHResult
   corresponds to Z, "ZRTP-HMAC-KDF" corresponds to AlgorithmID, our
   ZIDi and ZIDr correspond to PartyUInfo and PartyVInfo, our total_hash
   corresponds to SuppPubInfo, and the set of three shared secrets s1,
   s2, and s3 corresponds to SuppPrivInfo. NIST also requires a 32-bit
   big-endian integer counter to be included in the hash each time the
   hash is computed, which we have set to the fixed value of 1, because
   we only compute the hash once. NIST refers to the final hash output
   as DerivedKeyingMaterial, which corresponds to our s0 in this
   calculation.

      s0 = hash(counter || DHResult || "ZRTP-HMAC-KDF" || ZIDi || ZIDr
      || total_hash || len(s1) || s1 || len(s2) || s2 || len(s3) || s3)

   Note that temporary values s1, s2, and s3 were calculated per the
   methods described above in Section 4.3, and they are erased from
   memory immediately after they are used to calculate s0.

   The length of the DHResult field was implicitly agreed to by the
   negotiated DH prime size. The length of total_hash is implicitly
   determined by the negotiated hash algorithm. All of the explicit
   length fields, len(), in the above hash are 32-bit big-endian
   integers, giving the length in octets of the field that follows.
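
   The s0 formula above can be sketched as follows. This is a
   non-normative illustration: the function names are hypothetical,
   SHA-256 stands in for the negotiated hash, and a null shared secret
   is represented by None, which contributes only a zero length field.

```python
import hashlib
import struct


def len_prefixed(secret):
    """4-octet big-endian length, then the secret; null has length 0."""
    secret = secret or b""
    return struct.pack(">I", len(secret)) + secret


def compute_s0(dh_result, zid_i, zid_r, total_hash, s1, s2, s3,
               hash_fn=hashlib.sha256):
    """Hash the concatenation given by the s0 formula."""
    counter = struct.pack(">I", 1)  # fixed at 1; the hash is computed once
    data = (counter + dh_result + b"ZRTP-HMAC-KDF" + zid_i + zid_r
            + total_hash + len_prefixed(s1) + len_prefixed(s2)
            + len_prefixed(s3))
    return hash_fn(data).digest()
```

   Real code would also erase s1, s2, and s3 once s0 has been computed,
   which this sketch does not attempt.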
   Some members of the set of shared secrets (s1, s2, and s3) may have
   lengths of zero if they are null (not shared), and are each preceded
   by a 4-octet length field. For example, if s2 is null, len(s2) is
   0x00000000, and s2 itself would be absent from the hash calculation,
   which means len(s3) would immediately follow len(s2). While
   inclusion of ZIDi and ZIDr may be redundant, because they are
   implicitly included in the total_hash, we explicitly include them
   here to follow NIST SP 800-56A. The fixed-length string
   "ZRTP-HMAC-KDF" (not null-terminated) identifies what purpose the
   resulting s0 will be used for, which is to serve as the key
   derivation key for the ZRTP HMAC-based key derivation function (KDF)
   defined in Section 4.5.1 and used in Section 4.5.3.

   ZRTP DH mode is in full compliance with two relevant NIST documents
   that cover key derivations. First, section 5.8.1 of NIST SP 800-56A
   [SP800-56A] computes what NIST refers to as DerivedKeyingMaterial,
   which ZRTP refers to as s0. This s0 then serves as the key
   derivation key, which NIST refers to as KI in the key derivation
   function described in sections 5 and 5.1 of NIST SP 800-108
   [SP800-108], to derive all the rest of the subkeys needed by ZRTP.

   The ZRTP key derivation function (KDF) (Section 4.5.1) requires the
   use of a KDF Context field (per NIST SP 800-108 [SP800-108]
   guidelines) which should include the ZIDi, ZIDr, and a nonce value
   known to both parties. The total_hash qualifies as a nonce value,
   because its computation included nonce material from the initiator's
   Commit message and the responder's Hello message.

      KDF_Context = (ZIDi || ZIDr || total_hash)

   At this point in DH mode, the two endpoints proceed to the key
   derivations of ZRTPSess and the rest of the keys in Section 4.5.2,
   now that there is a defined s0.

4.4.2.  Preshared Mode

   The Preshared key agreement mode can be used to generate SRTP keys
   and salts without a DH calculation, instead relying on a shared
   secret from previous DH calculations between the endpoints.

   This key agreement mode is useful to rapidly re-establish a secure
   session between two parties who have recently started and ended a
   secure session that has already performed a DH key agreement, without
   performing another lengthy DH calculation, which may be desirable on
   slow processors in resource-limited environments. Preshared mode
   MUST NOT be used for adding additional media streams to an existing
   call. Multistream mode MUST be used for this purpose.

   In the most severe resource-limited environments, Preshared mode may
   be useful with processors that cannot perform a DH calculation in an
   ergonomically acceptable time limit. Shared key material may be
   manually provisioned between two such endpoints in advance and still
   allow a limited subset of functionality. Such a "better than
   nothing" implementation would have to be regarded as non-compliant
   with the ZRTP specification, but it could interoperate in Preshared
   (and if applicable, Multistream) mode with a compliant ZRTP endpoint.

   Because Preshared mode affects the state of the retained shared
   secret cache, only one in-process ZRTP Preshared exchange may occur
   at a time between two ZRTP endpoints. This rule is explained in more
   detail in Section 4.4.1, and applies for the same reasons as in DH
   mode.

   Preshared mode is only included in this specification to meet the
   R-REUSE requirement in the Media Security Requirements
   [I-D.ietf-sip-media-security-requirements] document. A series of
   preshared-keyed calls between two ZRTP endpoints should use a DH key
   exchange periodically.
   Preshared mode is only used if a cached
   shared secret has been established in an earlier session by a DH
   exchange, as discussed in Section 4.9.

4.4.2.1.  Commitment in Preshared Mode

   Preshared mode is selected by setting the Key Agreement Type to
   Preshared in the Commit message. This results in the same call flow
   as Multistream mode. The principal difference between Multistream
   mode and Preshared mode is that Preshared mode uses a previously
   cached shared secret, rs1, instead of an active ZRTP Session key,
   ZRTPSess, as the initial keying material.

   Preshared mode depends on having a reliable shared secret in its
   cache. Before Preshared mode is used, the initial DH exchange that
   gave rise to the shared secret SHOULD have used at least one of these
   anti-MiTM mechanisms: 1) A verbal comparison of the SAS, evidenced by
   the SAS Verified flag, or 2) an end-to-end integrity-protected
   delivery of the a=zrtp-hash in the signaling (Section 8.1.1), or 3) a
   digital signature on the sashash (Section 7.2).

4.4.2.2.  Initiator Behavior in Preshared Mode

   The Commit message (Figure 7) is sent by the initiator of the ZRTP
   exchange. From the intersection of the algorithms in the sent and
   received Hello messages, the initiator chooses a hash, cipher, auth
   tag, key agreement type, and SAS type to be used.

   To assemble a Preshared commit, we must first construct a temporary
   preshared_key, which is constructed from one of several possible
   combinations of cached key material, depending on what is available
   in the shared secret cache. If rs1 is not available in the
   initiator's cache, then Preshared mode MUST NOT be used.

      preshared_key = hash(len(rs1) || rs1 || len(auxsecret) ||
      auxsecret || len(pbxsecret) || pbxsecret)

   All of the explicit length fields, len(), in the above hash are
   32-bit big-endian integers, giving the length in octets of the field
   that follows. Some members of the set of shared secrets (rs1,
   auxsecret, and pbxsecret) may have lengths of zero if they are null
   (not available), and are each preceded by a 4-octet length field.
   For example, if auxsecret is null, len(auxsecret) is 0x00000000, and
   auxsecret itself would be absent from the hash calculation, which
   means len(pbxsecret) would immediately follow len(auxsecret).

   In place of hvi in the Commit message, two smaller fields are
   inserted by the initiator:

   -  A random nonce of length 4-words (16 octets).
   -  A keyID = HMAC(preshared_key, "Prsh") truncated to 64 bits.

      Note: Since the nonce is used to calculate different SRTP key and
      salt pairs for each session, a duplication will result in the same
      key and salt being generated for the two sessions, which would
      have disastrous security consequences.

4.4.2.3.  Responder Behavior in Preshared Mode

   The responder uses the received keyID to search for matching key
   material in its cache. It does this by computing a preshared_key
   value and keyID value using the same formula as the initiator,
   depending on what is available in the responder's local cache. If
   the locally computed keyID does not match the received keyID in the
   Commit, the responder recomputes a new preshared_key and keyID from a
   different subset of shared keys from the cache, dropping auxsecret or
   pbxsecret or both from the hash calculation, until a matching
   preshared_key is found or it runs out of possibilities. Note that
   rs2 is not included in the process.
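
   The preshared_key and keyID computations, together with the
   responder's search over subsets of its cached secrets, can be
   sketched as follows. This is a non-normative illustration: the
   function names are hypothetical, SHA-256 stands in for the
   negotiated hash, a null secret is represented by None, and the
   order in which subsets are tried is an arbitrary choice of this
   sketch, since the text above does not fix one.

```python
import hashlib
import hmac
import struct


def len_prefixed(secret):
    """4-octet big-endian length, then the secret; null has length 0."""
    secret = secret or b""
    return struct.pack(">I", len(secret)) + secret


def preshared_key(rs1, auxsecret, pbxsecret):
    """Hash of the length-prefixed secrets, per the formula above."""
    data = (len_prefixed(rs1) + len_prefixed(auxsecret)
            + len_prefixed(pbxsecret))
    return hashlib.sha256(data).digest()


def key_id(psk):
    """HMAC(preshared_key, "Prsh") truncated to 64 bits."""
    return hmac.new(psk, b"Prsh", hashlib.sha256).digest()[:8]


def responder_find_key(received_key_id, rs1, auxsecret, pbxsecret):
    """Return the matching preshared_key, or None if nothing matches."""
    if rs1 is None:
        return None  # rs1 is required; rs2 is never tried
    subsets = [(auxsecret, pbxsecret), (None, pbxsecret),
               (auxsecret, None), (None, None)]
    for aux, pbx in subsets:
        psk = preshared_key(rs1, aux, pbx)
        if hmac.compare_digest(key_id(psk), received_key_id):
            return psk
    return None
```

   A None result corresponds to the case where the responder has no
   usable cached secret and falls back as described in the remainder of
   this section.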

   If it finds the appropriate matching shared key material, it is used
   to derive s0 and a new ZRTPSess key, as described in the next section
   on Shared Secret Calculation, Section 4.4.2.4.

   If the responder determines that it does not have a cached shared
   secret from a previous DH exchange, or it fails to match the keyID
   hash from the initiator with any combination of its shared keys, it
   SHOULD respond with its own DH Commit message. This would reverse
   the roles and the responder would become the initiator, because the
   DH Commit must always "trump" the Preshared Commit message as
   described in Section 4.2. The key exchange would then proceed using
   DH mode. However, if a severely resource-limited responder lacks the
   computing resources to respond in a reasonable time with a DH Commit,
   it MAY respond with a ZRTP Error message (Section 5.9) indicating
   that no shared secret is available.

   If both sides send Preshared Commit messages initiating a secure
   session at the same time, the contention is resolved and the
   initiator/responder roles are settled according to Section 4.2, and
   the protocol proceeds.

   In Preshared mode, both the DHPart1 and DHPart2 messages are skipped.
   After receiving the Commit message from the initiator, the responder
   sends the Confirm1 message after calculating this stream's SRTP keys,
   as described below.

4.4.2.4.  Shared Secret Calculation for Preshared Mode

   Preshared mode requires that the s0 and ZRTPSess keys be derived from
   the preshared_key, and this must be done in a way that guarantees
   uniqueness for each session. This is done by using nonce material
   from both parties: the explicit nonce in the initiator's Preshared
   Commit message (Figure 7) and the H3 field in the responder's Hello
   message (Figure 3). Thus both parties force the resulting shared
   secret to be unique for each session.

   A hash of the received and sent ZRTP messages in the current ZRTP
   exchange for the current media stream is calculated:

      total_hash = hash(Hello of responder || Commit)

   Note that only the ZRTP messages (Figure 3 and Figure 7), not the
   entire ZRTP packets, are included in the total_hash.

   The ZRTP key derivation function (KDF) (Section 4.5.1) requires the
   use of a KDF Context field (per NIST SP 800-108 [SP800-108]
   guidelines) which should include the ZIDi, ZIDr, and a nonce value
   known to both parties. The total_hash qualifies as a nonce value,
   because its computation included nonce material from the initiator's
   Commit message and the responder's Hello message.

      KDF_Context = (ZIDi || ZIDr || total_hash)

   The s0 key is derived via the ZRTP key derivation function
   (Section 4.5.1) from preshared_key and the nonces implicitly included
   in the total_hash. The nonces also ensure KDF_Context is unique for
   each session, which is critical for security.

      s0 = KDF(preshared_key, "ZRTP PSK", KDF_Context, negotiated hash
      length)

   The preshared_key MUST be erased as soon as it has been used to
   calculate s0.

   At this point in Preshared mode, the two endpoints proceed to the key
   derivations of ZRTPSess and the rest of the keys in Section 4.5.2,
   now that there is a defined s0.

4.4.3.  Multistream Mode

   The Multistream key agreement mode can be used to generate SRTP keys
   and salts for additional media streams established between a pair of
   endpoints. Multistream mode cannot be used unless there is an active
   SRTP session established between the endpoints which means a ZRTP
   Session key is active. This ZRTP Session key can be used to generate
   keys and salts without performing another DH calculation. In this
   mode, the retained shared secret cache is not used or updated.
   As a result, multiple ZRTP Multistream mode exchanges can be
   processed in parallel between two endpoints.

   Multistream mode is also used to resume a secure call that has gone
   clear using a GoClear message as described in Section 4.7.2.1.

   When adding additional media streams to an existing call, Multistream
   mode MUST be used. The first media stream MUST use either DH mode or
   Preshared mode. Only one DH exchange or Preshared exchange is
   performed, just for the first media stream. The DH exchange or
   Preshared exchange MUST be completed for the first media stream
   before Multistream mode is used to add any other media streams. In a
   Multistream session, a ZRTP endpoint MUST use the same ZID for all
   media streams, matching the ZID used in the first media stream.

4.4.3.1. Commitment in Multistream Mode

   Multistream mode is selected by the initiator setting the Key
   Agreement Type to "Mult" in the Commit message (Figure 6). The
   Cipher Type, Auth Tag Length, and Hash in Multistream mode SHOULD be
   set by the initiator to the same values as in the initial DH mode
   Commit. The SAS Type is ignored as there is no SAS authentication
   in this mode.

      Note: This requirement is needed since some endpoints cannot
      support different SRTP algorithms for different media streams.
      However, in the case of Multistream mode being used to go secure
      after a GoClear, the requirement to use the same SRTP algorithms
      is relaxed if there are no other active SRTP sessions.

   In place of hvi in the Commit, a random nonce of length 4 words (16
   octets) is chosen. Its value MUST be unique for all nonce values
   chosen for active ZRTP sessions between a pair of endpoints. If a
   Commit is received with a reused nonce value, the ZRTP exchange MUST
   be immediately terminated.
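
   The nonce rule above might be enforced with bookkeeping along these
   lines. This is only a sketch: the class name MultistreamNonces and
   its methods are hypothetical, one instance is assumed per pair of
   endpoints, and Python's secrets module (the operating system's
   CSPRNG) stands in for the random number generator required by
   Section 4.8.

```python
import secrets

NONCE_LEN = 16  # 4 words of 32 bits each


class MultistreamNonces:
    """Tracks nonces in use by active ZRTP sessions with one peer (sketch)."""

    def __init__(self) -> None:
        self._active: set[bytes] = set()

    def new_nonce(self) -> bytes:
        """Pick a fresh nonce for an outgoing Multistream Commit."""
        while True:
            nonce = secrets.token_bytes(NONCE_LEN)
            if nonce not in self._active:
                self._active.add(nonce)
                return nonce

    def accept(self, nonce: bytes) -> bool:
        """Return False if a received Commit must be rejected and the
        exchange terminated (wrong length or reused nonce)."""
        if len(nonce) != NONCE_LEN or nonce in self._active:
            return False
        self._active.add(nonce)
        return True

    def release(self, nonce: bytes) -> None:
        """Forget a nonce once its ZRTP session is no longer active."""
        self._active.discard(nonce)
```

   Keeping sent and received nonces in the same set means a nonce
   reflected back by the peer is also treated as a reuse.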

      Note: Since the nonce is used to calculate different SRTP key and
      salt pairs for each media stream, a duplication will result in the
      same key and salt being generated for the two media streams, which
      would have disastrous security consequences.

   If a Commit is received selecting Multistream mode, but the responder
   does not have a ZRTP Session Key available, the exchange MUST be
   terminated. Otherwise, the responder proceeds to the next section on
   Shared Secret Calculation, Section 4.4.3.2.

   If both sides send Multistream Commit messages at the same time, the
   contention is resolved and the initiator/responder roles are settled
   according to Section 4.2, and the protocol proceeds.

   In Multistream mode, both the DHPart1 and DHPart2 messages are
   skipped. After receiving the Commit message from the initiator, the
   responder sends the Confirm1 message after calculating this stream's
   SRTP keys, as described below.

4.4.3.2. Shared Secret Calculation for Multistream Mode

   In Multistream mode, each media stream requires that a set of keys be
   derived from the ZRTPSess key, and this must be done in a way that
   guarantees uniqueness for each media stream. This is done by using
   nonce material from both parties: the explicit nonce in the
   initiator's Multistream Commit message (Figure 6) and the H3 field in
   the responder's Hello message (Figure 3). Thus both parties force
   the resulting shared secret to be unique for each media stream.

   A hash of the received and sent ZRTP messages in the current ZRTP
   exchange for the current media stream is calculated:

      total_hash = hash(Hello of responder || Commit)

   This refers to the Hello and Commit messages for the current media
   stream which is using Multistream mode, not the original media stream
   that included a full DH key agreement.
   Note that only the ZRTP messages (Figure 3 and Figure 6), not the
   entire ZRTP packets, are included in the hash.

   The ZRTP key derivation function (KDF) (Section 4.5.1) requires the
   use of a KDF Context field (per NIST SP 800-108 [SP800-108]
   guidelines) which should include the ZIDi, ZIDr, and a nonce value
   known to both parties. The total_hash qualifies as a nonce value,
   because its computation included nonce material from the initiator's
   Commit message and the responder's Hello message.

      KDF_Context = (ZIDi || ZIDr || total_hash)

   The current stream's SRTP keys and salts for the initiator and
   responder are calculated using the ZRTP Session Key ZRTPSess and the
   nonces implicitly included in the total_hash. The nonces also ensure
   KDF_Context will be unique for each media stream, which is critical
   for security. For each additional media stream, a separate s0 is
   derived from ZRTPSess via the ZRTP key derivation function
   (Section 4.5.1):

      s0 = KDF(ZRTPSess, "ZRTP MSK", KDF_Context, negotiated hash
      length)

   Note that the ZRTPSess key was previously derived from material that
   also includes a different and more inclusive total_hash from the
   entire packet sequence that performed the original DH exchange for
   the first media stream in this ZRTP session.

   At this point in Multistream mode, the two endpoints begin key
   derivations in Section 4.5.3.

4.5. Key Derivations

4.5.1. The ZRTP Key Derivation Function

   To derive keys from a shared secret, ZRTP uses an HMAC-based key
   derivation function, or KDF. It is used throughout Section 4.5.3 and
   in other sections. The HMAC function for the KDF is based on the
   negotiated hash algorithm defined in Section 5.1.2.

   The ZRTP KDF is in full compliance with the recommendations in NIST
   SP 800-108 [SP800-108].
   Section 7.5 of the NIST document describes "key separation", which
   is a security requirement for the cryptographic keys derived from
   the same key derivation key. The keys shall be separate in the sense
   that the compromise of some derived keys will not degrade the
   security strength of any of the other derived keys, or the security
   strength of the key derivation key. Strong preimage resistance is
   provided.

   The ZRTP KDF runs the NIST pseudorandom function (PRF) in counter
   mode, with only a single iteration of the counter. The NIST PRF is
   based on the HMAC function. The ZRTP KDF never has to generate more
   than 256 bits of output key material, so only a single invocation of
   the HMAC function is needed.

   The ZRTP KDF is defined in this manner, per sections 5 and 5.1 of
   NIST SP 800-108 [SP800-108]:

      KDF(KI, Label, Context, L) = HMAC(KI, i || Label || 0x00 ||
      Context || L)

   The HMAC in the KDF is keyed by KI, which is a secret key derivation
   key that is unknown to the wiretapper (for example, s0). The HMAC is
   computed on a concatenated set of nonsecret fields that are defined
   as follows. The first field is a 32-bit big-endian integer counter
   (i) required by NIST to be included in the HMAC each time the HMAC is
   computed, which we have set to the fixed value of 0x00000001, because
   we only compute the HMAC once. Label is a string of nonzero octets
   that identifies the purpose for the derived keying material. The
   octet 0x00 is a delimiter required by NIST. The NIST KDF formula has
   a "Context" field which includes ZIDi, ZIDr, and some optional nonce
   material known to both parties. L is a 32-bit big-endian positive
   integer, not to exceed the length in bits of the output of the HMAC.
   The output of the KDF is truncated to the leftmost L bits.
   If SHA-256 is the negotiated hash algorithm, the HMAC would be
   HMAC-SHA-256, thus the maximum value of L would be 256, the
   negotiated hash length.

   The ZRTP KDF is not to be confused with the SRTP KDF defined in
   [RFC3711].

4.5.2. Deriving ZRTPSess Key and SAS in DH or Preshared modes

   Both DH mode and Preshared mode (but not Multistream mode) come to
   this common point in the protocol to derive ZRTPSess and the SAS from
   s0, via the ZRTP Key Derivation Function (Section 4.5.1). At this
   point, s0 has been calculated, as well as KDF_Context. These
   calculations are done only for the first media stream, not for
   Multistream mode.

   The ZRTPSess key is used only for these two purposes: 1) to generate
   the additional s0 keys (Section 4.4.3.2) for adding additional media
   streams to this session in Multistream mode, and 2) to generate the
   pbxsecret (Section 7.3.1) that may be cached for use in future calls.
   The ZRTPSess key is kept for the duration of the call signaling
   session between the two ZRTP endpoints. That is, if there are two
   separate calls between the endpoints (in SIP terms, separate SIP
   dialogs), then a ZRTP Session Key MUST NOT be used across the two
   call signaling sessions. ZRTPSess MUST be destroyed no later than
   the end of the call signaling session.

      ZRTPSess = KDF(s0, "ZRTP Session Key", KDF_Context, negotiated
      hash length)

   Note that KDF_Context is unique for each media stream, but only the
   first media stream is permitted to calculate ZRTPSess.

   There is only one Short Authentication String (SAS) (Section 7)
   computed per call, which is applicable to all media streams derived
   from a single DH key agreement in a ZRTP session. KDF_Context is
   unique for each media stream, but only the first media stream is
   permitted to calculate sashash.

      sashash = KDF(s0, "SAS", KDF_Context, negotiated hash length)
      sasvalue = sashash [truncated to leftmost 32 bits]

   Despite the exposure of the SAS to the two parties, the rest of the
   keying material is protected by the key separation properties of the
   KDF (Section 4.5.1).

   ZRTP-enabled VoIP clients may need to support additional forms of
   communication, such as text chat, instant messaging, or file
   transfers. These other forms of communication may need to be
   encrypted, and would benefit from leveraging the ZRTP key exchange
   used for the VoIP part of the call. In that case, more key material
   MAY be derived and "exported" from the ZRTP protocol and provided as
   a shared secret to the VoIP client for these non-VoIP purposes. The
   application can use this exported key in application-specific ways,
   outside the scope of the ZRTP protocol. It can be used directly for
   encryption, or used to authenticate other key exchanges carried out
   by the application, protected by ZRTP's MiTM defense umbrella. This
   exported key may be used for as long as needed by the application,
   maintained in a separate crypto context that may outlast the VoIP
   session.

      ExportedKey = KDF(s0, "Exported key", KDF_Context, negotiated hash
      length)

   At this point in DH mode or Preshared mode, the two endpoints proceed
   on to the key derivations in Section 4.5.3, now that there is a
   defined s0 and ZRTPSess key.

4.5.3. Deriving the rest of the keys from s0

   DH mode, Multistream mode, and Preshared mode all come to this common
   point in the protocol to derive a set of keys from s0. It can be
   assumed that s0 has been calculated, as well as the ZRTPSess key and
   KDF_Context. A separate s0 key is associated with each media stream.

   Subkeys are not drawn directly from s0, as done in NIST SP 800-56A.
   To enhance key separation, ZRTP uses s0 to key a Key Derivation
   Function (Section 4.5.1) based on NIST SP 800-108 [SP800-108]. Since
   s0 already included total_hash in its derivation, it is redundant to
   use total_hash again in the KDF Context in all the invocations of the
   KDF keyed by s0. Nonetheless, NIST SP 800-108 always requires KDF
   Context to be defined for the KDF, and nonce material is required in
   some KDF invocations (especially for Multistream mode and Preshared
   mode), so total_hash is included as a nonce in the KDF Context.

   Separate SRTP master keys and master salts are derived for use in
   each direction for each media stream. Unless otherwise specified,
   ZRTP uses SRTP with no MKI, 32 bit authentication using HMAC-SHA1,
   AES-CM 128 or 256 bit key length, 112 bit session salt key length,
   2^48 key derivation rate, and SRTP prefix length 0.

   The ZRTP initiator encrypts and the ZRTP responder decrypts packets
   by using srtpkeyi and srtpsalti, while the ZRTP responder encrypts
   and the ZRTP initiator decrypts packets by using srtpkeyr and
   srtpsaltr. The SRTP key and salt values are truncated (taking the
   leftmost bits) to the length determined by the chosen SRTP profile.
   These are generated by:

      srtpkeyi = KDF(s0, "Initiator SRTP master key", KDF_Context,
      negotiated AES key length)
      srtpsalti = KDF(s0, "Initiator SRTP master salt", KDF_Context,
      112)
      srtpkeyr = KDF(s0, "Responder SRTP master key", KDF_Context,
      negotiated AES key length)
      srtpsaltr = KDF(s0, "Responder SRTP master salt", KDF_Context,
      112)

   The HMAC keys are the same length as the output of the underlying
   hash function in the KDF, and are thus generated without truncation.
   They are used only by ZRTP and not by SRTP.
   Different HMAC keys are needed for the initiator and the responder
   to ensure that GoClear messages in each direction are unique and can
   not be cached by an attacker and reflected back to the endpoint.

      hmackeyi = KDF(s0, "Initiator HMAC key", KDF_Context, negotiated
      hash length)
      hmackeyr = KDF(s0, "Responder HMAC key", KDF_Context, negotiated
      hash length)

   ZRTP keys are generated for the initiator and responder to use to
   encrypt the Confirm1 and Confirm2 messages. They are truncated to
   the same size as the negotiated SRTP key size.

      zrtpkeyi = KDF(s0, "Initiator ZRTP key", KDF_Context, negotiated
      AES key length)
      zrtpkeyr = KDF(s0, "Responder ZRTP key", KDF_Context, negotiated
      AES key length)

   All key material is destroyed as soon as it is no longer needed, no
   later than the end of the call. s0 is erased in Section 4.6.1, and
   the rest of the session key material is erased in Section 4.7.2.1 and
   Section 4.7.3.

4.6. Confirmation

   The Confirm1 and Confirm2 messages (Figure 10) contain the cache
   expiration interval (defined in Section 4.9) for the newly generated
   retained shared secret. The flagoctet is an 8 bit unsigned integer
   made up of these flags: the PBX Enrollment flag (E) defined in
   Section 7.3.1, SAS Verified flag (V) defined in Section 7.1, Allow
   Clear flag (A) defined in Section 4.7.2, and Disclosure flag (D)
   defined in Section 11.

      flagoctet = (E * 2^3) + (V * 2^2) + (A * 2^1) + (D * 2^0)

   Part of the Confirm1 and Confirm2 messages are encrypted using
   full-block Cipher Feedback Mode, and contain a 128-bit random CFB
   Initialization Vector (IV).
   The Confirm1 and Confirm2 messages also contain an HMAC covering the
   encrypted part of the Confirm1 or Confirm2 message which includes a
   string of zeros, the signature length, flag octet, cache expiration
   interval, signature type block (if present) and signature block
   (Section 7.2) (if present). For the responder:

      hmac = HMAC(hmackeyr, encrypted part of Confirm1)

   For the initiator:

      hmac = HMAC(hmackeyi, encrypted part of Confirm2)

   The hmackeyi and hmackeyr keys are computed in Section 4.5.3.

   The exchange is completed when the responder sends either the
   Conf2ACK message or the responder's first SRTP media packet (with a
   valid SRTP auth tag). The initiator MUST treat the first valid SRTP
   media from the responder as equivalent to receiving a Conf2ACK. The
   responder may respond to Confirm2 with either SRTP media or Conf2ACK,
   or both, in whichever order the responder chooses (or whichever order
   the "cloud" chooses to deliver them).

4.6.1. Updating the Cache of Shared Secrets

   After receiving the Confirm messages, both parties must now update
   their retained shared secret rs1 in their respective caches, provided
   the following conditions hold:

      1) This key exchange is either DH or Preshared mode, not
         Multistream mode, which does not update the cache.
      2) Depending on the values of the cache expiration intervals that
         are received in the two Confirm messages, there are some
         scenarios that do not update the cache, as explained in
         Section 4.9.
      3) The responder MUST receive the initiator's Confirm2 message
         before updating the responder's cache.
      4) The initiator MUST receive either the responder's Conf2ACK
         message or the responder's SRTP media (with a valid SRTP auth
         tag) before updating the initiator's cache.
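
   The four conditions above could be collected into a single check, as
   in the following sketch. The ExchangeState record and its field
   names are hypothetical. Condition 2 is simplified to the one case
   this sketch models, namely that an effective interval of zero (the
   minimum of the two Confirm values, per Section 4.9) means the new
   secret is not stored.

```python
from dataclasses import dataclass


@dataclass
class ExchangeState:
    """Hypothetical per-exchange bookkeeping for this sketch."""
    mode: str                    # "DH", "Preshared", or "Multistream"
    is_initiator: bool
    confirm1_interval: int       # cache expiration from Confirm1, in seconds
    confirm2_interval: int       # cache expiration from Confirm2, in seconds
    confirm2_received: bool = False
    conf2ack_or_srtp_received: bool = False  # Conf2ACK or valid SRTP media


def may_update_rs1(state: ExchangeState) -> bool:
    """Return True if this endpoint may update rs1 in its cache."""
    # Condition 1: Multistream mode never updates the cache.
    if state.mode not in ("DH", "Preshared"):
        return False
    # Condition 2 (simplified): a zero effective interval stores nothing.
    if min(state.confirm1_interval, state.confirm2_interval) == 0:
        return False
    # Condition 4: the initiator waits for Conf2ACK or valid SRTP media.
    if state.is_initiator:
        return state.conf2ack_or_srtp_received
    # Condition 3: the responder waits for the initiator's Confirm2.
    return state.confirm2_received
```

   Gating the update on the last message each side expects narrows the
   window in which one party has updated rs1 and the other has not.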

   For DH mode only, before updating the retained shared secret rs1 in
   the cache, each party first discards their old rs2 and copies their
   old rs1 to rs2. The old rs1 is saved to rs2 because of the risk of
   session interruption after one party has updated his own rs1 but
   before the other party has enough information to update her own rs1.
   If that happens, they may regain cache sync in the next session by
   using rs2 (per Section 4.3). This mitigates the well-known Byzantine
   Generals' Problem [Byzantine]. The old rs1 value is not saved in
   Preshared mode.

   For DH mode and Preshared mode, both parties compute a new rs1 value
   from s0 via the ZRTP key derivation function (Section 4.5.1):

      rs1 = KDF(s0, "retained secret", KDF_Context, negotiated hash
      length)

   Note that KDF_Context is unique for each media stream, but only the
   first media stream is permitted to update rs1.

   Each media stream has its own s0. At this point in the protocol for
   each media stream, the corresponding s0 MUST be erased.

4.7. Termination

   A ZRTP session is normally terminated at the end of a call, but it
   may be terminated early by either the Error message or the GoClear
   message.

4.7.1. Termination via Error message

   The Error message (Section 5.9) is used to terminate an in-progress
   ZRTP exchange due to an error. The Error message contains an integer
   Error Code for debugging purposes. The termination of a ZRTP key
   agreement exchange results in no updates to the cached shared secrets
   and deletion of all crypto context.

   The ZRTP Session key, ZRTPSess, is only deleted if the ZRTP session
   in which it was generated and all ZRTP sessions which are using it
   are terminated.

4.7.2. Termination via GoClear message

   The GoClear message (Section 5.11) is used to switch from SRTP to
   RTP, usually because the user has chosen to do that by pressing a
   button.
   The GoClear uses an HMAC of the Message Type Block sent in the
   GoClear Message computed with the hmackey derived from the shared
   secret. This HMAC is truncated to the leftmost 64 bits. When sent
   by the initiator:

      clear_hmac = HMAC(hmackeyi, "GoClear ")

   When sent by the responder:

      clear_hmac = HMAC(hmackeyr, "GoClear ")

   A GoClear message which does not receive a ClearACK response must be
   resent. If a GoClear message is received with a bad HMAC, it must be
   ignored, and no ClearACK is sent.

   A ZRTP endpoint MAY choose to accept GoClear messages after the
   session has switched to SRTP, allowing the session to revert to RTP.
   This is indicated in the Confirm1 or Confirm2 messages (Figure 10) by
   setting the Allow Clear flag (A). If an endpoint sets the Allow
   Clear (A) flag in their Confirm message, it indicates that they
   support receiving GoClear messages.

   A ZRTP endpoint that receives a GoClear MUST authenticate the message
   by checking the clear_hmac. If the message authenticates, the
   endpoint stops sending SRTP packets, and generates a ClearACK in
   response. It MUST also delete all the crypto key material for all
   the SRTP media streams, as defined in Section 4.7.2.1.

   Until confirmation from the user is received (e.g. clicking a button,
   pressing a DTMF key, etc.), the ZRTP endpoint MUST NOT resume sending
   RTP packets. The endpoint then renders to the user an indication
   that the media session has switched to clear mode, and waits for
   confirmation from the user. This blocks the flow of sensitive
   discourse until the user is forced to take notice that he's no longer
   protected by encryption. To prevent pinholes from closing or NAT
   bindings from expiring, the ClearACK message MAY be resent at regular
   intervals (e.g. every 5 seconds) while waiting for confirmation from
   the user.
   After confirmation of the notification is received from the user,
   the sending of RTP packets may begin.

   After sending a GoClear message, the ZRTP endpoint stops sending SRTP
   packets. When a ClearACK is received, the ZRTP endpoint deletes the
   crypto context for the SRTP session, as defined in Section 4.7.2.1,
   and may then resume sending RTP packets.

   In the event a ClearACK is not received before the retransmissions of
   GoClear are exhausted, the key material is deleted, as defined in
   Section 4.7.2.1.

   After the users have transitioned from SRTP media back to RTP media
   (clear mode), they may decide later to return to secure mode by
   manual activation, usually by pressing a GO SECURE button. In that
   case, a new secure session is initiated by the party that presses the
   button, by sending a new Commit packet, leading to a new session key
   negotiation. It is not necessary to send another Hello packet, as
   the two parties have already done that at the start of the call and
   thus have already discovered each other's ZRTP capabilities. It is
   possible for users to toggle back and forth between clear and secure
   modes multiple times in the same call, just as they could in the old
   days of secure PSTN phones.

4.7.2.1. Key Destruction for GoClear message

   All SRTP session key material MUST be erased by the receiver of the
   GoClear message upon receiving a properly authenticated GoClear. The
   same key destruction MUST be done by the sender of GoClear message,
   upon receiving the ClearACK. This must be done for the key material
   for all of the media streams.

   All key material that would have been erased at the end of the SIP
   session MUST be erased, as described in Section 4.7.3, with the
   single exception of ZRTPSess. In this case, ZRTPSess is destroyed in
   a manner different from the other key material.
   Both parties replace ZRTPSess with a hash of itself, without
   truncation:

      ZRTPSess = hash(ZRTPSess)

   This meets the requirements of Perfect Forward Secrecy (PFS), but
   preserves a new version of ZRTPSess, so that the user can later
   re-initiate secure mode during the same call without performing
   another Diffie-Hellman calculation, using Multistream mode, which
   requires and assumes the existence of ZRTPSess with the same value at
   both ZRTP endpoints. A new key negotiation after a GoClear SHOULD
   use a Multistream Commit message.

      Note: Multistream mode is preferred over a Diffie-Hellman mode
      since this does not require the generation of a new hash chain and
      a new signaling exchange to exchange new hash values.

   Later, at the end of the entire call, ZRTPSess is finally destroyed
   along with the other key material, as described in Section 4.7.3.

4.7.3. Key Destruction at Termination

   All SRTP session key material MUST be erased by both parties at the
   end of the call. In particular, the destroyed key material includes
   the SRTP session keys and salts, SRTP master keys and salts, and all
   material sufficient to reconstruct the SRTP keys and salts, including
   ZRTPSess and s0 (although s0 should have been destroyed earlier, in
   Section 4.6.1). This must be done for the key material for all of
   the media streams. The only exceptions are the cached shared secrets
   needed for future calls, including rs1, rs2, and pbxsecret.

4.8. Random Number Generation

   The ZRTP protocol uses random numbers for cryptographic key material,
   notably for the DH secret exponents and nonces, which must be freshly
   generated with each session. Whenever a random number is needed, all
   of the following criteria must be satisfied:

   Random numbers MUST be freshly generated, meaning that they must not
   have been used in a previous calculation.

   When generating a random number k of L bits in length, k MUST be
   chosen with equal probability from the range of [1 < k < 2^L].

   It MUST be derived from a physical entropy source, such as RF noise,
   acoustic noise, thermal noise, high resolution timings of
   environmental events, or other unpredictable physical sources of
   entropy. For a detailed explanation of cryptographic grade random
   numbers and guidance for collecting suitable entropy, see RFC 4086
   [RFC4086] and Chapter 10 of Practical Cryptography [Ferguson]. The
   raw entropy must be distilled and processed through a deterministic
   random bit generator (DRBG). Examples of DRBGs may be found in NIST
   SP 800-90 [SP800-90], and in [Ferguson]. Failure to use true entropy
   from the physical environment as a basis for generating random
   cryptographic key material would lead to a disastrous loss of
   security.

4.9. ZID and Cache Operation

   Each instance of ZRTP has a unique 96-bit random ZRTP ID or ZID that
   is generated once at installation time. It is used to look up
   retained shared secrets in a local cache. A single global ZID for a
   single installation is the simplest way to implement ZIDs. However,
   it is specifically not precluded for an implementation to use
   multiple ZIDs, up to the limit of a separate one per callee. This
   then turns it into a long-lived "association ID" that does not apply
   to any other associations between a different pair of parties. It is
   a goal of this protocol to permit both options to interoperate
   freely.

   Each time a new s0 is calculated, a new retained shared secret rs1 is
   generated and stored in the cache, indexed by the ZID of the other
   endpoint. This cache updating is described in Section 4.6.1.
   For the new retained shared secret, each endpoint chooses a cache
   expiration value which is an unsigned 32 bit integer of the number of
   seconds that this secret should be retained in the cache. The time
   interval is relative to when the Confirm1 message is sent or
   received.

   The cache intervals are exchanged in the Confirm1 and Confirm2
   messages (Figure 10). The actual cache interval used by both
   endpoints is the minimum of the values from the Confirm1 and Confirm2
   messages. A value of 0 seconds means the newly-computed shared
   secret SHOULD NOT be stored in the cache, and if a cache entry
   already exists from an earlier call, the stored cache interval should
   be set to 0. This means if either Confirm message contains a null
   cache expiration interval, and there is no cache entry already
   defined, no new cache entry is created. A value of 0xffffffff means
   the secret should be cached indefinitely and is the recommended
   value. If the ZRTP exchange is Multistream Mode, the field in the
   Confirm1 and Confirm2 is set to 0xffffffff and ignored, and the cache
   is not updated.

   The expiration interval need not be used to force the deletion of a
   shared secret from the cache when the interval has expired. It just
   means the shared secret MAY be deleted from that cache at any point
   after the interval has expired without causing the other party to
   note it as an unexpected security event when the next key negotiation
   occurs between the same two parties. This means there need not be
   perfectly synchronized deletion of expired secrets from the two
   caches, and makes it easy to avoid a race condition that might
   otherwise be caused by clock skew.

   If the expiration interval is not properly agreed to by both
   endpoints, it may later result in false alarms of MiTM attacks, due
   to apparent cache mismatches (Section 4.3.2).

4.9.1. Cacheless implementations

   It is possible to implement a simplified but nonetheless useful
   profile of the ZRTP protocol that does not support any caching of
   shared secrets. In this case the users would have to rely
   exclusively on the verbal SAS comparison for every call. That is,
   unless MiTM protection is provided by the mechanisms in Section 8.1.1
   or Section 7.2, which introduce their own forms of complexity.

   If a ZRTP endpoint does not support caching of shared secrets, it
   MUST set the cache expiration interval to zero, and MUST set the SAS
   Verified (V) flag (Section 7.1) to false. In addition, because the
   ZID serves mainly as a cache index, the ZID would not be required to
   maintain the same value across separate SIP sessions, although there
   is no reason why it should not.

   Cacheless operation would sacrifice the key continuity (Section 15.1)
   features, as well as Preshared mode (Section 4.4.2). There would
   also be no PBX trusted MiTM (Section 7.3) features, including the PBX
   security enrollment (Section 7.3.1) mechanism.

5. ZRTP Messages

   All ZRTP messages use the message format defined in Figure 2. All
   word lengths referenced in this specification are 32 bits or 4
   octets. All integer fields are carried in network byte order, that
   is, most significant byte (octet) first, commonly known as
   big-endian.

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 0 0 1|Not Used (set to zero) |        Sequence Number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               Magic Cookie 'ZRTP' (0x5a525450)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Identifier                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |         ZRTP Message (length depends on Message Type)         |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         CRC (1 word)                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 2: ZRTP Packet Format

   The Sequence Number is a count that is incremented for each ZRTP
   packet sent. The count is initialized to a random value. This is
   useful in estimating ZRTP packet loss and also detecting when ZRTP
   packets arrive out of sequence.

   The ZRTP Magic Cookie is a 32 bit string that uniquely identifies a
   ZRTP packet, and has the value 0x5a525450.

   Source Identifier is the SSRC number of the RTP stream that this ZRTP
   packet relates to. For cases of forking or forwarding, RTP and hence
   ZRTP may arrive at the same port from several different sources -
   each of these sources will have a different SSRC and may initiate an
   independent ZRTP protocol session.

   This format is clearly identifiable as non-RTP due to the first two
   bits being zero which looks like RTP version 0, which is not a valid
   RTP version number. It is clearly distinguishable from STUN since
   the magic cookies are different. The 12 not used bits are set to
   zero and MUST be ignored when received.

   The ZRTP Messages are defined in Figure 3 to Figure 17 and are of
   variable length.

   The ZRTP protocol uses a 32 bit CRC checksum in each ZRTP packet as
   defined in RFC 3309 [RFC3309] to detect transmission errors. ZRTP
   packets are typically transported by UDP, which carries its own
   built-in 16-bit checksum for integrity, but ZRTP does not rely on it.
   This is because of the effect of an undetected transmission error in
   a ZRTP message. For example, an undetected error in the DH exchange
   could appear to be an active man-in-the-middle attack. The
   psychological effects of a false announcement of this by ZRTP clients
   can not be overstated.
   The probability of such a false alarm hinges on a mere 16-bit
   checksum that usually protects UDP packets, so more error detection
   is needed.  For these reasons, this belt-and-suspenders approach is
   used to minimize the chance of a transmission error affecting the
   ZRTP key agreement.

   The CRC is calculated across the entire ZRTP packet shown in
   Figure 2, including the ZRTP Header and the ZRTP Message, but not
   including the CRC field.  If a ZRTP message fails the CRC check, it
   is silently discarded.

5.1.  ZRTP Message Formats

   ZRTP messages are designed to simplify endpoint parsing requirements
   and to reduce the opportunities for buffer overflow attacks (a good
   goal of any security extension should be to not introduce new attack
   vectors).

   ZRTP uses a block of 8 octets (2 words) to encode the Message Type.
   Blocks of 4 octets (1 word) are used to encode the Hash Type, Cipher
   Type, Authentication Tag Type, and Key Agreement Type.  The values in
   the blocks are ASCII strings which are extended with spaces (0x20) to
   make them the desired length.  Currently defined block values are
   listed in Tables 1-6 below.

   Additional block values may be defined and used.

   ZRTP uses this ASCII encoding to simplify debugging and make it
   "Wireshark (Ethereal) friendly".

5.1.1.  Message Type Block

   Currently 16 Message Type Blocks are defined - they represent the set
   of ZRTP message primitives.  ZRTP endpoints MUST support the Hello,
   HelloACK, Commit, DHPart1, DHPart2, Confirm1, Confirm2, Conf2ACK,
   SASrelay, RelayACK, Error, ErrorACK, and PingACK message types.  ZRTP
   endpoints MAY support the GoClear, ClearACK, and Ping messages.  In
   order to generate a PingACK message, it is necessary to parse a Ping
   message.  Additional messages may be defined in extensions to ZRTP.
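
   As an illustration of the packet layout in Figure 2 and the
   space-padded type blocks of Section 5.1, the following Python sketch
   pads a block value, frames a message, and computes a checksum over
   the result.  The names pad_block, crc32c, and frame_without_crc are
   hypothetical and not part of ZRTP.  The bitwise CRC uses the
   reflected CRC-32C polynomial 0x82F63B78, which is assumed here to be
   the checksum of RFC 3309.  The byte order in which the CRC word is
   written to the wire is deliberately left out and should be taken
   from RFC 3309.

```python
import struct

ZRTP_MAGIC_COOKIE = 0x5A525450  # ASCII 'ZRTP', as given in Figure 2


def pad_block(name: str, length: int) -> bytes:
    """Encode a type block as ASCII, extended with spaces to `length` octets."""
    raw = name.encode("ascii")
    if len(raw) > length:
        raise ValueError("block value is longer than the block size")
    return raw.ljust(length, b" ")


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (assumption: this is the checksum meant by RFC 3309)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def frame_without_crc(seq: int, ssrc: int, message: bytes) -> bytes:
    """Build the packet header of Figure 2 followed by the message (no CRC)."""
    if len(message) % 4:
        raise ValueError("a ZRTP message is a whole number of 32-bit words")
    # First 16 bits: 0001 followed by 12 unused zero bits, i.e. 0x1000.
    header = struct.pack("!HHII", 0x1000, seq & 0xFFFF, ZRTP_MAGIC_COOKIE, ssrc)
    return header + message


# A HelloACK message: preamble 0x505a, length of 3 words, 8-octet type block.
hello_ack = struct.pack("!HH", 0x505A, 3) + pad_block("HelloACK", 8)
frame = frame_without_crc(seq=7, ssrc=0x12345678, message=hello_ack)
checksum = crc32c(frame)  # covers header and message, not the CRC field
```

   A receiver following the text above would recompute the checksum
   over the same octets and silently discard the packet on a mismatch.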
1859 Message Type Block | Meaning 1860 --------------------------------------------------- 1861 "Hello " | Hello Message 1862 --------------------------------------------------- 1863 "HelloACK" | HelloACK Message 1864 --------------------------------------------------- 1865 "Commit " | Commit Message 1866 --------------------------------------------------- 1867 "DHPart1 " | DHPart1 Message 1868 --------------------------------------------------- 1869 "DHPart2 " | DHPart2 Message 1870 --------------------------------------------------- 1871 "Confirm1" | Confirm1 Message 1872 --------------------------------------------------- 1873 "Confirm2" | Confirm2 Message 1874 --------------------------------------------------- 1875 "Conf2ACK" | Conf2ACK Message 1876 --------------------------------------------------- 1877 "Error " | Error Message 1878 --------------------------------------------------- 1879 "ErrorACK" | ErrorACK Message 1880 --------------------------------------------------- 1881 "GoClear " | GoClear Message 1882 --------------------------------------------------- 1883 "ClearACK" | ClearACK Message 1884 --------------------------------------------------- 1885 "SASrelay" | SASrelay Message 1886 --------------------------------------------------- 1887 "RelayACK" | RelayACK Message 1888 --------------------------------------------------- 1889 "Ping " | Ping Message 1890 --------------------------------------------------- 1891 "PingACK " | PingACK Message 1892 --------------------------------------------------- 1894 Table 1. Message Type Block Values 1896 5.1.2. Hash Type Block 1898 Only one Hash Type is currently defined, SHA-256 [FIPS-180-3], and 1899 all ZRTP endpoints MUST support this hash. Additional Hash Types can 1900 be registered and used, such as the NIST SHA-3 hash [SHA-3] when it 1901 becomes available. 
Note that the Hash Type refers to the hash 1902 algorithm that will be used throughout the ZRTP key exchange, not the 1903 hash algorithm to be used in the SRTP Authentication Tag. 1905 ZRTP makes use of HMAC message authentication codes based on the 1906 negotiated Hash Type. The HMAC function is defined in [FIPS-198-1]. 1908 Test vectors for HMAC-SHA-256 may be found in [RFC4231]. The HMAC 1909 function based on the negotiated Hash Type is also used in the ZRTP 1910 key derivation function (Section 4.5.1). 1912 Hash Type Block | Meaning 1913 --------------------------------------------------- 1914 "S256" | SHA-256 Hash defined in FIPS 180-3 1915 --------------------------------------------------- 1917 Table 2. Hash Type Block Values 1919 All hashes and HMACs used throughout the ZRTP protocol will use the 1920 negotiated Hash Type, except for the special cases noted in 1921 Section 5.1.2.1. 1923 5.1.2.1. Implicit Hash and HMAC algorithm 1925 While most of the HMACs used in ZRTP are defined by the negotiated 1926 Hash Type (Section 5.1.2), some hashes and HMACs must be precomputed 1927 prior to negotiations, and thus cannot have their algorithms 1928 negotiated during the ZRTP exchange. They are implicitly 1929 predetermined to use SHA-256 [FIPS-180-3] and HMAC-SHA-256. 1931 These are the hashes and HMACs that MUST use the Implicit hash and 1932 HMAC algorithm: 1934 The hash chain H0-H3 defined in Section 9. 1935 The HMACs that are keyed by this hash chain, as defined in 1936 Section 8.1.1. 1937 The Hello Hash in the a=zrtp-hash attribute defined in 1938 Section 8.1. 1940 ZRTP defines a method for negotiating different ZRTP protocol 1941 versions (Section 4.1.1). SHA-256 is the Implicit Hash for ZRTP 1942 protocol version 1.10. Future ZRTP protocol versions may, if 1943 appropriate, use another hash algorithm as the Implicit Hash, such as 1944 the NIST SHA-3 hash [SHA-3] when it becomes available. 
For example, 1945 a future SIP packet may list two a=zrtp-hash SDP attributes, one 1946 based on SHA-256 for ZRTP version 1.10, and another based on SHA-3 1947 for ZRTP version 2.00. 1949 5.1.3. Cipher Type Block 1951 All ZRTP endpoints MUST support AES-128 (AES1) and MAY support AES- 1952 256 (AES3) or other Cipher Types. The choice of the AES key length 1953 is coupled to the Key Agreement type, as explained in Section 5.1.5. 1955 The use of AES-128 in SRTP is defined by [RFC3711]. The use of AES- 1956 256 in SRTP is defined by [I-D.ietf-avt-srtp-big-aes]. 1958 The Advanced Encryption Standard is defined in [FIPS-197]. 1960 Cipher Type Block | Meaning 1961 --------------------------------------------------- 1962 "AES1" | AES-CM with 128 bit keys 1963 | as defined in RFC 3711 1964 --------------------------------------------------- 1965 "AES3" | AES-CM with 256 bit keys 1966 | 1967 --------------------------------------------------- 1969 Table 3. Cipher Type Block Values 1971 5.1.4. Auth Tag Type Block 1973 All ZRTP endpoints MUST support HMAC-SHA1 authentication tags for 1974 SRTP, with both 32 bit and 80 bit length tags as defined in 1975 [RFC3711]. 1977 Auth Tag Type Block | Meaning 1978 --------------------------------------------------- 1979 "HS32" | HMAC-SHA1 32 bit authentication 1980 | tag as defined in RFC 3711 1981 --------------------------------------------------- 1982 "HS80" | HMAC-SHA1 80 bit authentication 1983 | tag as defined in RFC 3711 1984 --------------------------------------------------- 1986 Table 4. Auth Tag Type Values 1988 5.1.5. Key Agreement Type Block 1990 All ZRTP endpoints MUST support DH3k, SHOULD support Preshared, and 1991 MAY support EC25, EC38, EC52, and DH2k. 1993 If a ZRTP endpoint supports multiple concurrent media streams, such 1994 as audio and video, it MUST support Multistream (Section 4.4.3) mode. 
   Also, if a ZRTP endpoint supports the GoClear message
   (Section 4.7.2), it SHOULD support Multistream, to be used if the two
   parties choose to return to the secure state after going Clear (as
   explained in Section 4.7.2.1).

   For Finite Field Diffie-Hellman, ZRTP endpoints MUST use the DH
   parameters defined in RFC 3526 [RFC3526], as follows.  DH3k uses the
   3072-bit MODP group.  DH2k uses the 2048-bit MODP group.  The DH
   generator g is 2.  The random Diffie-Hellman secret exponent SHOULD
   be twice as long as the AES key length.  If AES-128 is used, the DH
   secret value SHOULD be 256 bits long.  If AES-256 is used, the secret
   value SHOULD be 512 bits long.

   If Elliptic Curve DH is used, the ECDH algorithm and key generation
   are taken from NIST SP 800-56A [SP800-56A].  The curves used are from
   NSA Suite B [NSA-Suite-B], which uses the same curves as ECDSA
   defined by FIPS 186-3 [FIPS-186-3], and can also be found in RFC 4753
   [RFC4753], sections 3.1 through 3.3.  The validation procedures are
   from NIST SP 800-56A [SP800-56A] section 5.6.2.6, method 3, ECC
   Partial Validation.  Both the X and Y coordinates of the point on the
   curve are sent, in the first and second half of the ECDH public
   value, respectively.

   The choice of AES key length is coupled to the choice of key
   agreement type.  If either EC38 or EC52 is chosen as the key
   agreement, AES-256 (AES3) SHOULD be used.  If DH3k or EC25 is chosen,
   either AES-128 (AES1) or AES-256 (AES3) MAY be used.  Note that SRTP
   as defined in RFC 3711 [RFC3711] only supports AES-128.

   DH2k is intended for low power applications, or for applications that
   require fast key negotiations, and may be used with AES-128.  DH2k is
   not recommended for high security applications.  Its security can be
   augmented by implementing ZRTP's key continuity features
   (Section 15.1).
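
   The sizing rules above can be written down as a short Python sketch.
   The names dh_secret_bits and pv_words and the two lookup tables are
   hypothetical.  The sketch assumes that each elliptic curve coordinate
   is carried in a whole number of octets (66 octets for P-521), which
   is an assumption chosen because it reproduces the pv lengths listed
   in Table 5.

```python
# AES key length in bits for each Cipher Type Block.
AES_KEY_BITS = {"AES1": 128, "AES3": 256}

# Size in bits of the prime (finite field) or of one coordinate (curve).
GROUP_BITS = {"DH3k": 3072, "DH2k": 2048, "EC25": 256, "EC38": 384, "EC52": 521}


def dh_secret_bits(cipher: str) -> int:
    """Recommended DH secret exponent length: twice the AES key length."""
    return 2 * AES_KEY_BITS[cipher]


def pv_words(key_agreement: str) -> int:
    """Length of the public value in 32-bit words."""
    bits = GROUP_BITS[key_agreement]
    if key_agreement.startswith("EC"):
        # X then Y, each rounded up to whole octets (assumption, see above).
        octets = 2 * ((bits + 7) // 8)
    else:
        octets = bits // 8
    return octets // 4
```

   For example, dh_secret_bits("AES3") evaluates to 512, and
   pv_words("EC52") evaluates to 33, since two 66-octet coordinates
   occupy 132 octets.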
2030 ECDH-521 is NOT RECOMMENDED, due to inconvenient computational 2031 delays. It should not be used except when both endpoints are known 2032 to have very fast hardware. Note that ECDH-521 is not part of NSA 2033 Suite B. 2035 ZRTP also defines two non-DH modes, Multistream and Preshared, in 2036 which the SRTP key is derived from a shared secret and some nonce 2037 material. 2039 Table 5 lists the pv length in words and DHPart1 and DHPart2 message 2040 length in words for each Key Agreement Type Block. 2042 Key Agreement | pv | message | Meaning 2043 Type Block | words | words | 2044 ----------------------------------------------------------- 2045 "DH3k" | 96 | 117 | DH mode with p=3072 bit prime 2046 | | | per RFC 3526, section 4. 2047 ----------------------------------------------------------- 2048 "DH2k" | 64 | 85 | DH mode with p=2048 bit prime 2049 | | | per RFC 3526, section 3. 2050 ----------------------------------------------------------- 2051 "EC25" | 16 | 37 | Elliptic Curve DH, P-256 2052 | | | per RFC 4753, section 3.1 2053 ----------------------------------------------------------- 2054 "EC38" | 24 | 45 | Elliptic Curve DH, P-384 2055 | | | per RFC 4753, section 3.2 2056 ----------------------------------------------------------- 2057 "EC52" | 33 | 54 | Elliptic Curve DH, P-521 2058 | | | per RFC 4753, section 3.3 2059 ----------------------------------------------------------- 2060 "Prsh" | - | - | Preshared Non-DH mode 2061 | | | 2062 ----------------------------------------------------------- 2063 "Mult" | - | - | Multistream Non-DH mode 2064 | | | 2065 ----------------------------------------------------------- 2067 Table 5. Key Agreement Type Block Values 2069 5.1.6. SAS Type Block 2071 The SAS Type determines how the SAS is rendered to the user so that 2072 the user may verbally compare it with his partner over the voice 2073 channel. This allows detection of a man-in-the-middle (MiTM) attack. 
   All ZRTP endpoints MUST support the base32 rendering scheme for the
   Short Authentication String, and MAY support the base256 scheme and
   other SAS rendering schemes.  The ZRTP SAS rendering schemes are
   described in Section 7.

      SAS Type Block    | Meaning
      ---------------------------------------------------
      "B32 "            | Short Authentication String using
                        | base32 encoding
      ---------------------------------------------------
      "B256"            | Short Authentication String using
                        | base256 encoding (PGP Word List)
      ---------------------------------------------------

                    Table 6.  SAS Type Block Values

5.1.7.  Signature Type Block

   The signature type block is a 4 octet (1 word) block used to
   represent the signature algorithm discussed in Section 7.2.
   Suggested signature algorithms and key lengths are a future subject
   of standardization.

5.2.  Hello message

   The Hello message has the format shown in Figure 3.

   All ZRTP messages begin with the preamble value 0x505a, followed by a
   16-bit length field expressed in 32-bit words.  This length includes
   only the ZRTP message (including the preamble and the length) but not
   the ZRTP packet header or CRC.  The 8-octet Message Type follows the
   length field.

   Next is a 4 character string containing the version (ver) of the ZRTP
   protocol, which is "1.10" for this specification.  Next is the Client
   Identifier string (cid), which is 4 words long and identifies the
   vendor and release of the ZRTP software.  The 256-bit hash image H3
   is defined in Section 9.  The next parameter is the ZID, the 96 bit
   long unique identifier for the ZRTP endpoint, defined in Section 4.9.

   The next four bits contain flag bits.  The MiTM flag (M) is a
   Boolean that is set to true if and only if this Hello message is sent
   from a device, usually a PBX, that has the capability to send an
   SASrelay message (Section 5.13).
   The Passive flag (P) is a Boolean normally set to False.  A ZRTP
   endpoint which is configured to never initiate secure sessions is
   regarded as passive, and would set the P bit to True.  The next 8
   bits are unused and SHOULD be set to zero when sent and MUST be
   ignored on receipt.

   Next is a list of supported Hash algorithms, Cipher algorithms, SRTP
   Auth Tag types, Key Agreement types, and SAS types.  The number of
   algorithms listed for each type is given by a count field: hc=hash
   count, cc=cipher count, ac=auth tag count, kc=key agreement count,
   and sc=sas count.  The values for these algorithms are defined in
   Tables 2, 3, 4, 5, and 6.  A count of zero means that only the
   mandatory to implement algorithms are supported.  Mandatory
   algorithms MAY be included in the list.  The order of the list
   indicates the preferences of the endpoint.  If a mandatory algorithm
   is not included in the list, it is added to the end of the list for
   preference.

   The 64-bit HMAC at the end of the message is computed across the
   whole message, not including the HMAC.  The HMAC key is the sender's
   H2 (defined in Section 9), and thus the HMAC cannot be checked by the
   receiving party until the sender's H2 value is known to the receiving
   party later in the protocol.
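
   The flag and count word and the preference rule described above can
   be sketched in Python as follows.  The names parse_flags_word and
   effective_preferences are hypothetical, and the bit positions are
   read off the layout in Figure 3 (two zero bits, M, P, 8 unused bits,
   then five 4-bit counts), so they are an interpretation of the figure
   rather than normative text.

```python
import struct


def parse_flags_word(word_bytes: bytes) -> dict:
    """Decode the 32-bit word holding the M and P flags and the five counts."""
    (word,) = struct.unpack("!I", word_bytes)
    fields = {
        "M": bool((word >> 29) & 1),   # MiTM flag
        "P": bool((word >> 28) & 1),   # Passive flag
        "hc": (word >> 16) & 0xF,      # hash count
        "cc": (word >> 12) & 0xF,      # cipher count
        "ac": (word >> 8) & 0xF,       # auth tag count
        "kc": (word >> 4) & 0xF,       # key agreement count
        "sc": word & 0xF,              # SAS count
    }
    # Figure 3 allows 0 to 7 values per list.
    for name in ("hc", "cc", "ac", "kc", "sc"):
        if fields[name] > 7:
            raise ValueError(name + " exceeds 7")
    return fields


def effective_preferences(listed, mandatory):
    """Listed algorithms in order, with any missing mandatory ones appended."""
    prefs = list(listed)
    for algorithm in mandatory:
        if algorithm not in prefs:
            prefs.append(algorithm)
    return prefs
```

   With this reading, an empty cipher list (cc=0) and a mandatory set of
   ["AES1"] yields ["AES1"], while a list of ["AES3"] yields
   ["AES3", "AES1"].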
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|            length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Hello   " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    version="1.10" (1 word)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                  Client Identifier (4 words)                  |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Hash image H3 (8 words)                    |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         ZID (3 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|0|M|P| unused (zeros)|  hc   |  cc   |  ac   |  kc   |  sc   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                hash algorithms (0 to 7 values)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               cipher algorithms (0 to 7 values)               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                auth tag types (0 to 7 values)                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              key agreement types (0 to 7 values)              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   SAS types (0 to 7 values)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        HMAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 3: Hello message format

5.3.  HelloACK message

   The HelloACK message is used to stop retransmissions of a Hello
   message.  A HelloACK is sent regardless of whether the version number
   or the algorithm list in the Hello is supported.
The receipt of a 2187 HelloACK stops retransmission of the Hello message. The format is 2188 shown in the Figure below. Note that a Commit message can be sent in 2189 place of a HelloACK by an Initiator. 2191 0 1 2 3 2192 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2193 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2194 |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=3 words | 2195 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2196 | Message Type Block="HelloACK" (2 words) | 2197 | | 2198 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2200 Figure 4: HelloACK message format 2202 5.4. Commit message 2204 The Commit message is sent to initiate the key agreement process 2205 after both sides have received a Hello message, which means it can 2206 only be sent after receiving both a Hello message and a HelloACK 2207 message. There are three subtypes of Commit messages, whose formats 2208 are shown in Figure 5, Figure 6, and Figure 7. 2210 The Commit message contains the Message Type Block, then the 256-bit 2211 hash image H2 which is defined in Section 9. The next parameter is 2212 the initiator's ZID, the 96 bit long unique identifier for the ZRTP 2213 endpoint, which must have the same value as was used in the Hello 2214 message. 2216 Next is a list of algorithms selected by the initiator (hash, cipher, 2217 auth tag type, key agreement, sas type). For a DH Commit, the hash 2218 value hvi is a hash of the DHPart2 of the Initiator and the 2219 Responder's Hello message, as explained in Section 4.4.1.1. 2221 The 64-bit HMAC at the end of the message is computed across the 2222 whole message, not including the HMAC. The HMAC key is the sender's 2223 H1 (defined in Section 9), and thus the HMAC cannot be checked by the 2224 receiving party until the sender's H1 value is known to the receiving 2225 party later in the protocol. 
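
   The trailing 64-bit HMAC described above, whose key is only revealed
   later in the exchange, can be sketched in Python as follows.  The
   names message_mac, append_mac, and verify_mac are hypothetical.  The
   sketch uses HMAC-SHA-256, the implicit algorithm named in
   Section 5.1.2.1, and assumes that truncation keeps the leftmost 64
   bits of the HMAC output; that choice is an assumption of this
   example, not something stated in this section.

```python
import hashlib
import hmac

MAC_OCTETS = 8  # the 64-bit HMAC field (2 words)


def message_mac(key: bytes, message_without_mac: bytes) -> bytes:
    """HMAC-SHA-256 over the message, truncated to 64 bits (leftmost assumed)."""
    digest = hmac.new(key, message_without_mac, hashlib.sha256).digest()
    return digest[:MAC_OCTETS]


def append_mac(key: bytes, message_without_mac: bytes) -> bytes:
    """Return the message with its trailing HMAC field filled in."""
    return message_without_mac + message_mac(key, message_without_mac)


def verify_mac(key: bytes, message: bytes) -> bool:
    """Check a stored message once the sender's key (e.g. H1) is known."""
    body, mac = message[:-MAC_OCTETS], message[-MAC_OCTETS:]
    return hmac.compare_digest(message_mac(key, body), mac)
```

   Because the key is disclosed only in a later message, a receiver
   would keep the raw octets of the received message and call something
   like verify_mac once the key arrives.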
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=29 words        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Commit  " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Hash image H2 (8 words)                    |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         ZID (3 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        hash algorithm                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       cipher algorithm                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         auth tag type                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      key agreement type                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           SAS type                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         hvi (8 words)                         |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        HMAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 5: DH Commit message format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=25 words        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Commit  " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Hash image H2 (8 words)                    |
   |                             . . .
                                                                   |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         ZID (3 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        hash algorithm                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       cipher algorithm                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         auth tag type                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  key agreement type = "Mult"                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           SAS type                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                        nonce (4 words)                        |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        HMAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 6: Multistream Commit message format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=27 words        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Commit  " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Hash image H2 (8 words)                    |
   |                             . . .
                                                                   |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         ZID (3 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        hash algorithm                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       cipher algorithm                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         auth tag type                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  key agreement type = "Prsh"                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           SAS type                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                        nonce (4 words)                        |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        keyID (2 words)                        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        HMAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 7: Preshared Commit message format

5.5.  DHPart1 message

   The DHPart1 message begins the DH exchange.  The format is shown in
   Figure 8 below.  The DHPart1 message is sent by the Responder if a
   valid Commit message is received from the Initiator.  The length of
   the pvr value and the length of the DHPart1 message depend on the
   Key Agreement Type chosen.  This information is contained in Table 5.

   Note that for both Multistream and Preshared modes, no DHPart1 or
   DHPart2 message will be sent.

   The 256-bit hash image H1 is defined in Section 9.

   The next four parameters are HMACs of potential shared secrets used
   in generating the ZRTP secret.  The first two, rs1IDr and rs2IDr, are
   the HMACs of the responder's two retained shared secrets, truncated
   to 64 bits.  Next is auxsecretIDr, the HMAC of the responder's
   auxsecret (defined in Section 4.3), truncated to 64 bits.
The last 2362 parameter is the HMAC of the trusted MiTM PBX shared secret 2363 pbxsecret, defined in Section 7.3.1. The Message format for the 2364 DHPart1 message is shown in Figure 8. 2366 The 64-bit HMAC at the end of the message is computed across the 2367 whole message, not including the HMAC. The HMAC key is the sender's 2368 H0 (defined in Section 9), and thus the HMAC cannot be checked by the 2369 receiving party until the sender's H0 value is known to the receiving 2370 party later in the protocol. 2372 0 1 2 3 2373 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2374 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2375 |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=depends on KA Type | 2376 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2377 | Message Type Block="DHPart1 " (2 words) | 2378 | | 2379 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2380 | | 2381 | Hash image H1 (8 words) | 2382 | . . . | 2383 | | 2384 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2385 | rs1IDr (2 words) | 2386 | | 2387 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2388 | rs2IDr (2 words) | 2389 | | 2390 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2391 | auxsecretIDr (2 words) | 2392 | | 2393 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2394 | pbxsecretIDr (2 words) | 2395 | | 2396 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2397 | | 2398 | pvr (length depends on KA Type) | 2399 | . . . | 2400 | | 2401 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2402 | HMAC (2 words) | 2403 | | 2404 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2406 Figure 8: DHPart1 message format 2408 5.6. DHPart2 message 2410 The DHPart2 message completes the DH exchange. A DHPart2 message is 2411 sent by the Initiator if a valid DHPart1 message is received from the 2412 Responder. 
   The length of the pvi value and the length of the DHPart2
   message depend on the Key Agreement Type chosen.  This information
   is contained in Table 5.  Note that for both Multistream and
   Preshared modes, no DHPart1 or DHPart2 message will be sent.

   The 256-bit hash image H1 is defined in Section 9.

   The next four parameters are HMACs of potential shared secrets used
   in generating the ZRTP secret.  The first two, rs1IDi and rs2IDi, are
   the HMACs of the initiator's two retained shared secrets, truncated
   to 64 bits.  Next is auxsecretIDi, the HMAC of the initiator's
   auxsecret (defined in Section 4.3), truncated to 64 bits.  The last
   parameter is the HMAC of the trusted MiTM PBX shared secret
   pbxsecret, defined in Section 7.3.1.  The message format for the
   DHPart2 message is shown in Figure 9.

   The 64-bit HMAC at the end of the message is computed across the
   whole message, not including the HMAC.  The HMAC key is the sender's
   H0 (defined in Section 9), and thus the HMAC cannot be checked by the
   receiving party until the sender's H0 value is known to the receiving
   party later in the protocol.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|   length=depends on KA Type   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="DHPart2 " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Hash image H1 (8 words)                    |
   |                             . . .
| 2445 | | 2446 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2447 | rs1IDi (2 words) | 2448 | | 2449 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2450 | rs2IDi (2 words) | 2451 | | 2452 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2453 | auxsecretIDi (2 words) | 2454 | | 2455 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2456 | pbxsecretIDi (2 words) | 2457 | | 2458 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2459 | | 2460 | pvi (length depends on KA Type) | 2461 | . . . | 2462 | | 2463 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2464 | HMAC (2 words) | 2465 | | 2466 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2467 Figure 9: DHPart2 message format 2469 5.7. Confirm1 and Confirm2 messages 2471 The Confirm1 message is sent by the Responder in response to a valid 2472 DHPart2 message after the SRTP session key and parameters have been 2473 negotiated. The Confirm2 message is sent by the Initiator in 2474 response to a Confirm1 message. The format is shown in Figure 10 2475 below. The message contains the Message Type Block "Confirm1" or 2476 "Confirm2". Next is the HMAC, a keyed hash over encrypted part of 2477 the message (shown enclosed by "====" in Figure 10). This HMAC is 2478 keyed and computed according to Section 4.6. The next 16 octets 2479 contain the CFB Initialization Vector. The rest of the message is 2480 encrypted using CFB and protected by the HMAC. 2482 The first field inside the encrypted region is the hash preimage H0, 2483 which is defined in detail in Section 9. 2485 The next 15 bits are not used and SHOULD be set to zero when sent and 2486 MUST be ignored when received in Confirm1 or Confirm2 messages. 2488 The next 9 bits contain the signature length. If no SAS signature 2489 (described in Section 7.2) is present, all bits are set to zero. 
   The signature length is in words and includes the signature type
   block.  If the calculated signature octet count is not a multiple of
   4, zeros are added to pad it out to a word boundary.  If no signature
   block is present, the overall length of the Confirm1 or Confirm2
   Message will be set to 19 words.

   The next 8 bits are used for flags.  Undefined flags are set to zero
   when sent and ignored on receipt.  Four flags are currently defined.
   The PBX Enrollment flag (E) is a Boolean bit defined in
   Section 7.3.1.  The SAS Verified flag (V) is a Boolean bit defined in
   Section 7.1.  The Allow Clear flag (A) is a Boolean bit defined in
   Section 4.7.2.  The Disclosure Flag (D) is a Boolean bit defined in
   Section 11.  The flags are followed by the cache expiration interval,
   which is defined in Section 4.9.

   If the signature length (in words) is non-zero, a signature type
   block is present, followed by the signature block.  The signature
   block includes the key used to generate the signature (Section 7.2).

   CFB [SP800-38A] mode is applied with a feedback length of 128 bits (a
   full cipher block), and the final block is truncated to match the
   exact length of the encrypted data.  The CFB Initialization Vector is
   a 128 bit random nonce.  The block cipher algorithm and the key size
   are the same as those negotiated for the media encryption.  CFB is
   used to encrypt the part of the Confirm1 or Confirm2 message
   beginning after the CFB IV to the end of the message (the encrypted
   region is enclosed by "====" in Figure 10).

   The responder uses the zrtpkeyr to encrypt the Confirm1 message.  The
   initiator uses the zrtpkeyi to encrypt the Confirm2 message.
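
   The 32-bit word that carries the signature length and the flags can
   be sketched in Python as follows.  The names pack_confirm_word,
   unpack_confirm_word, signature_length_words, and confirm_length_words
   are hypothetical.  The bit positions follow the description above (15
   unused bits, a 9-bit signature length, then 8 flag bits ending in E,
   V, A, D), and the length arithmetic is one reading of the 19-word
   base length, not a normative rule.

```python
import struct

CONFIRM_BASE_WORDS = 19  # Confirm1/Confirm2 length when no signature is present


def pack_confirm_word(sig_len_words, e=False, v=False, a=False, d=False):
    """Pack the 9-bit signature length and the E, V, A, D flags into one word."""
    if not 0 <= sig_len_words < 512:
        raise ValueError("signature length must fit in 9 bits")
    flags = (int(e) << 3) | (int(v) << 2) | (int(a) << 1) | int(d)
    return struct.pack("!I", (sig_len_words << 8) | flags)


def unpack_confirm_word(word_bytes):
    """Decode the word; the 15 unused bits and undefined flags are ignored."""
    (word,) = struct.unpack("!I", word_bytes)
    return {
        "sig_len_words": (word >> 8) & 0x1FF,
        "E": bool(word & 0x8),
        "V": bool(word & 0x4),
        "A": bool(word & 0x2),
        "D": bool(word & 0x1),
    }


def signature_length_words(signature_octets):
    """Signature type block (1 word) plus the signature padded to whole words."""
    return 1 + (signature_octets + 3) // 4


def confirm_length_words(sig_len_words):
    """Overall message length in words for a given signature length field."""
    return CONFIRM_BASE_WORDS + sig_len_words
```

   For a message with no signature and only the SAS Verified flag set,
   pack_confirm_word(0, v=True) produces the octets 00 00 00 04 and the
   overall length stays at 19 words.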
2521 0 1 2 3 2522 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2523 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2524 |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=variable | 2525 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2526 | Message Type Block="Confirm1" or "Confirm2" (2 words) | 2527 | | 2528 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2529 | HMAC (2 words) | 2530 | | 2531 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2532 | | 2533 | CFB Initialization Vector (4 words) | 2534 | | 2535 | | 2536 +===============================================================+ 2537 | | 2538 | Hash preimage H0 (8 words) | 2539 | . . . | 2540 | | 2541 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2542 | Unused (15 bits of zeros) | sig len (9 bits)|0 0 0 0|E|V|A|D| 2543 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2544 | cache expiration interval (1 word) | 2545 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2546 | optional signature type block (1 word if present) | 2547 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2548 | | 2549 | optional signature block (variable length) | 2550 | . . . | 2551 | | 2552 | | 2553 +===============================================================+ 2555 Figure 10: Confirm1 and Confirm2 message format 2557 5.8. Conf2ACK message 2559 The Conf2ACK message is sent by the Responder in response to a valid 2560 Confirm2 message. The message format for the Conf2ACK is shown in 2561 the Figure below. The receipt of a Conf2ACK stops retransmission of 2562 the Confirm2 message. Note that the first SRTP media (with a valid 2563 SRTP auth tag) from the responder also stops retransmission of the 2564 Confirm2 message. 

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=3 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Conf2ACK" (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 11: Conf2ACK message format

5.9.  Error message

   The Error message is sent to terminate an in-process ZRTP key
   agreement exchange due to an error. The format is shown in the
   Figure below. The use of the Error message is described in
   Section 4.7.1.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=4 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Error   " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Integer Error Code (1 word)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 12: Error message format

   Defined hexadecimal values for the Error Code are listed in Table 7.

   Error Code |  Meaning
   -----------------------------------------------------------
   0x10       |  Malformed packet (CRC OK, but wrong structure)
   -----------------------------------------------------------
   0x20       |  Critical software error
   -----------------------------------------------------------
   0x30       |  Unsupported ZRTP version
   -----------------------------------------------------------
   0x40       |  Hello components mismatch
   -----------------------------------------------------------
   0x51       |  Hash type not supported
   -----------------------------------------------------------
   0x52       |  Cipher type not supported
   -----------------------------------------------------------
   0x53       |  Public key exchange not supported
   -----------------------------------------------------------
   0x54       |  SRTP auth. tag not supported
   -----------------------------------------------------------
   0x55       |  SAS scheme not supported
   -----------------------------------------------------------
   0x56       |  No shared secret available, DH mode required
   -----------------------------------------------------------
   0x61       |  DH Error: bad pvi or pvr ( == 1, 0, or p-1)
   -----------------------------------------------------------
   0x62       |  DH Error: hvi != hashed data
   -----------------------------------------------------------
   0x63       |  Received relayed SAS from untrusted MiTM
   -----------------------------------------------------------
   0x70       |  Auth. Error: Bad Confirm pkt HMAC
   -----------------------------------------------------------
   0x80       |  Nonce reuse
   -----------------------------------------------------------
   0x90       |  Equal ZIDs in Hello
   -----------------------------------------------------------
   0xA0       |  Service unavailable
   -----------------------------------------------------------
   0xB0       |  Protocol timeout error
   -----------------------------------------------------------
   0x100      |  GoClear packet received, but not allowed
   -----------------------------------------------------------

                     Table 7. ZRTP Error Codes

5.10.  ErrorACK message

   The ErrorACK message is sent in response to an Error message. The
   receipt of an ErrorACK stops retransmission of the Error message.
   The format is shown in the Figure below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=3 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="ErrorACK" (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 13: ErrorACK message format

5.11.  GoClear message

   Support for the GoClear message is OPTIONAL in the protocol, and it
   is sent to switch from SRTP to RTP. The format is shown in the
   Figure below. The clear_hmac is used to authenticate the GoClear
   message so that bogus GoClear messages introduced by an attacker can
   be detected and discarded. The use of GoClear is described in
   Section 4.7.2.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=5 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="GoClear " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     clear_hmac (2 words)                      |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 14: GoClear message format

5.12.  ClearACK message

   Support for the ClearACK message is OPTIONAL in the protocol, and it
   is sent to acknowledge receipt of a GoClear. A ClearACK is only sent
   if the clear_hmac from the GoClear message is authenticated.
   Otherwise, no response is returned. The format is shown in the
   Figure below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=3 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="ClearACK" (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 15: ClearACK message format

5.13.  SASrelay message

   The SASrelay message is sent by a trusted Man in The Middle (MiTM),
   most often a PBX. It is not sent as a response to a packet, but is
   sent as a self-initiated packet by the trusted MiTM. It can only be
   sent after the rest of the ZRTP key negotiations have completed,
   after the Confirm packets and their ACKs. It can only be sent after
   the trusted MiTM has finished key negotiations with the other party,
   because it is the other party's SAS that is being relayed. It is
   sent with retry logic until a RelayACK message (Section 5.14) is
   received or the retry schedule has been exhausted.

   If a device, usually a PBX, sends an SASrelay message, it MUST have
   previously declared itself as a MiTM device by setting the MiTM (M)
   flag in the Hello message (Section 5.2). If the receiver of the
   SASrelay message did not previously receive a Hello message with the
   MiTM (M) flag set, the Relayed SAS SHOULD NOT be rendered. A
   RelayACK is still sent, but no Error message is sent.

   The SASrelay message format is shown in Figure 16 below. The message
   contains the Message Type Block "SASrelay". Next is the HMAC, a
   keyed hash over the encrypted part of the message (shown enclosed by
   "====" in Figure 16). This HMAC is keyed the same way as the HMAC in
   the Confirm messages (see Section 4.6). The next 16 octets contain
   the CFB Initialization Vector. The rest of the message is encrypted
   using CFB and protected by the HMAC.

   The next 15 bits are not used and SHOULD be set to zero when sent and
   MUST be ignored when received in SASrelay messages.

   The next 9 bits contain the signature length. The trusted MiTM MAY
   compute a digital signature on the SAS hash, as described in
   Section 7.2, using a persistent signing key owned by the trusted
   MiTM. If no SAS signature is present, all bits are set to zero. The
   signature length is in words and includes the signature type block.
   If the calculated signature octet count is not a multiple of 4, zeros
   are added to pad it out to a word boundary. If no signature block is
   present, the overall length of the SASrelay message will be set to 12
   words.

   The next 8 bits are used for flags. Undefined flags are set to zero
   and ignored. Three flags are currently defined. The Disclosure flag
   (D) is a Boolean bit defined in Section 11. The Allow Clear flag (A)
   is a Boolean bit defined in Section 4.7.2. The SAS Verified flag (V)
   is a Boolean bit defined in Section 7.1.
   These flags update the values of the same flags provided earlier in
   the Confirm packet, reflecting the new flag information relayed by
   the PBX from the other party.

   The next 32-bit word contains the rendering scheme for the relayed
   sasvalue, which will be the same rendering scheme used by the other
   party on the other side of the trusted MiTM. Section 7.3 describes
   how the PBX determines whether the ZRTP client regards the PBX as a
   trusted MiTM. If the PBX determines that the ZRTP client trusts the
   PBX, the next 32-bit word contains the binary sasvalue relayed from
   the other party. If this SASrelay packet is being sent to a ZRTP
   client that does not trust this MiTM, the next 32-bit word will be
   ignored by the recipient and should be set to zero by the PBX.

   If the signature length (in words) is non-zero, a signature type
   block is present, followed by the signature block. The signature
   block includes the key used to generate the signature (Section 7.2).

   CFB [SP800-38A] mode is applied with a feedback length of 128 bits, a
   full cipher block, and the final block is truncated to match the
   exact length of the encrypted data. The CFB Initialization Vector is
   a 128-bit random nonce. The block cipher algorithm and the key size
   are the same as those negotiated for the media encryption. CFB is
   used to encrypt the part of the SASrelay message beginning after the
   CFB IV to the end of the message (the encrypted region is enclosed by
   "====" in Figure 16).

   Depending on whether the trusted MiTM had taken the role of the
   initiator or the responder during the ZRTP key negotiation, the
   SASrelay message is encrypted with zrtpkeyi or zrtpkeyr.
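
   The plaintext that ends up inside the encrypted region of an SASrelay
   message can be sketched as below. This is an illustration under
   stated assumptions rather than a reference encoder: field order and
   bit positions follow Figure 16, network byte order is assumed, the
   four-octet rendering scheme value b"B32 " is taken to be the base32
   SAS Type code defined elsewhere in this specification, and all
   function names are hypothetical. Encryption and the HMAC are left
   out.

```python
import struct

# Sketch only. Flag bit positions follow Figure 16; names are hypothetical.
FLAG_V, FLAG_A, FLAG_D = 0x04, 0x02, 0x01
HEADER_WORDS = 9  # preamble/length (1) + type (2) + HMAC (2) + CFB IV (4)


def build_sasrelay_body(render_scheme: bytes, sasvalue: int, trusted: bool,
                        v=False, a=False, d=False, sig_blocks: bytes = b"") -> bytes:
    """Plaintext of the region that CFB later encrypts.

    sig_blocks is the signature type block followed by the zero-padded
    signature block, or empty when no signature is sent.
    """
    if len(render_scheme) != 4:
        raise ValueError("rendering scheme is one 32-bit word")
    if len(sig_blocks) % 4:
        raise ValueError("signature blocks must be padded to a word boundary")
    sig_len_words = len(sig_blocks) // 4
    if sig_len_words >= 512:
        raise ValueError("signature length must fit in 9 bits")
    flags = (FLAG_V * v) | (FLAG_A * a) | (FLAG_D * d)
    first_word = (sig_len_words << 8) | flags
    # A client that does not trust this MiTM ignores the sasvalue word,
    # so the PBX sends zero in that case.
    relayed = sasvalue if trusted else 0
    return struct.pack("!I4sI", first_word, render_scheme, relayed) + sig_blocks


def sasrelay_length_words(body: bytes) -> int:
    """Overall message length in words for the given plaintext body."""
    return HEADER_WORDS + len(body) // 4
```

   With no signature the body is three words, which gives the overall
   length of 12 words stated above.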

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=variable        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="SASrelay" (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        HMAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |              CFB Initialization Vector (4 words)              |
   |                                                               |
   |                                                               |
   +===============================================================+
   | Unused (15 bits of zeros)   | sig len (9 bits)|0 0 0 0|0|V|A|D|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         rendering scheme of relayed sasvalue (1 word)         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Trusted MiTM relayed sasvalue (1 word)             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       optional signature type block (1 word if present)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |          optional signature block (variable length)           |
   |                             . . .                             |
   |                                                               |
   |                                                               |
   +===============================================================+

                  Figure 16: SASrelay message format

5.14.  RelayACK message

   The RelayACK message is sent in response to a valid SASrelay message.
   The message format for the RelayACK is shown in the Figure below.
   The receipt of a RelayACK stops retransmission of the SASrelay
   message.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=3 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="RelayACK" (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 17: RelayACK message format

5.15.  Ping message

   The Ping and PingACK messages are unrelated to the rest of the ZRTP
   protocol. No ZRTP endpoint is required to generate a Ping message,
   but every ZRTP endpoint SHOULD respond to a Ping message with a
   PingACK message.

   Although Ping and PingACK messages have no effect on the rest of the
   ZRTP protocol, their inclusion in this specification simplifies the
   design of "bump-in-the-wire" ZRTP proxies (Section 10) (notably,
   Zfone [zfone]). It enables proxies to be designed that do not rely
   on assistance from the signaling layer to map out the associations
   between media streams and ZRTP endpoints.

   Before sending a ZRTP Hello message, a ZRTP proxy MAY send a Ping
   message as a means to sort out which RTP media streams are connected
   to particular ZRTP endpoints. Ping messages are generated only by
   ZRTP proxies. If neither party is a ZRTP proxy, no Ping messages
   will be encountered. Ping retransmission behavior is discussed in
   Section 6.

   The Ping message (Figure 18) contains an "EndpointHash", defined in
   Section 5.16.

   The Ping message contains a version number that defines what version
   of PingACK is requested. If that version number is supported by the
   Ping responder, a PingACK with a format that matches that version
   will be received. Otherwise, a PingACK with a lower version number
   may be received.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=6 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Ping    " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    version="1.10" (1 word)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    EndpointHash (2 words)                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 18: Ping message format

5.16.  PingACK message

   A PingACK message is sent only in response to a Ping. A ZRTP
   endpoint SHOULD respond to a Ping with a PingACK message. The
   version of PingACK requested is contained in the Ping message. If
   that version number is supported, a PingACK with a format that
   matches that version SHOULD be sent. Otherwise, if the version
   number of the Ping is not supported, a PingACK SHOULD be sent in the
   format of the highest supported version known to the Ping responder.
   Only version "1.10" is supported in this specification.

   The PingACK message carries its own 64-bit EndpointHash, distinct
   from the EndpointHash of the other party's Ping message. It is
   REQUIRED that it be highly improbable for two participants in a call
   to have the same EndpointHash, and that an EndpointHash maintains a
   persistent value between calls. For a normal ZRTP endpoint such as a
   ZRTP-enabled VoIP client, the EndpointHash can be just the truncated
   ZID. For a ZRTP endpoint such as a PBX that has multiple endpoints
   behind it, the EndpointHash must be a distinct value for each
   endpoint behind it.
   It is recommended that the EndpointHash be a truncated hash of the
   ZID of the ZRTP endpoint concatenated with something unique about
   the actual endpoint or phone behind the PBX. This may be the SIP URI
   of the phone, the PBX extension number, or the local IP address of
   the phone, whichever is more readily available in the application
   environment:

   o  EndpointHash = hash(ZID || SIP URI of the endpoint)
   o  EndpointHash = hash(ZID || PBX extension number of the endpoint)
   o  EndpointHash = hash(ZID || local IP address of the endpoint)

   Any of these formulae confers uniqueness for the simple case of
   terminating the ZRTP connection at the VoIP client, or the more
   complex case of a PBX terminating the ZRTP connection for multiple
   VoIP phones in a conference call, all sharing the PBX's ZID, but with
   separate IP addresses behind the PBX. There is no requirement for
   the same hash function to be used by both parties.

   The PingACK message contains the EndpointHash of the sender of the
   PingACK as well as the EndpointHash of the sender of the Ping. The
   Source Identifier (SSRC) received in the ZRTP header from the Ping
   packet (Figure 2) is copied into the PingACK message body
   (Figure 19). This SSRC is not the SSRC of the sender of the PingACK.
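
   The EndpointHash formulae above and the PingACK body layout can be
   sketched in Python as follows. The choice of SHA-256, the decision
   to keep the leftmost 8 octets when truncating, and all function
   names are assumptions made for this illustration. The text leaves
   the hash function open and does not require both parties to use the
   same one.

```python
import hashlib
import struct

# Sketch only: SHA-256 and leftmost-octet truncation are assumptions;
# all function names are hypothetical.


def endpoint_hash(zid: bytes, unique: bytes = b"") -> bytes:
    """Return a 64-bit EndpointHash.

    A plain endpoint passes no `unique` value and uses its truncated
    ZID. A PBX passes something unique to the phone behind it, such as
    its SIP URI, extension number, or local IP address.
    """
    if not unique:
        return zid[:8]
    return hashlib.sha256(zid + unique).digest()[:8]


def build_pingack_body(own_hash: bytes, ping_hash: bytes, ping_ssrc: int) -> bytes:
    """Fields that follow the Message Type Block in a PingACK.

    ping_ssrc is the SSRC from the ZRTP header of the received Ping,
    not the SSRC of the PingACK sender.
    """
    if len(own_hash) != 8 or len(ping_hash) != 8:
        raise ValueError("EndpointHash values are 64 bits")
    return b"1.10" + own_hash + ping_hash + struct.pack("!I", ping_ssrc)
```

   The resulting body is 6 words, which together with the 1-word
   preamble and length and the 2-word Message Type Block gives the
   9-word PingACK length shown in Figure 19.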

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=9 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="PingACK " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    version="1.10" (1 word)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           EndpointHash of PingACK Sender (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            EndpointHash of Received Ping (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Source Identifier (SSRC) of Received Ping (1 word)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 19: PingACK message format

6.  Retransmissions

   ZRTP uses two retransmission timers T1 and T2. T1 is used for
   retransmission of Hello messages, when the support of ZRTP by the
   other endpoint may not be known. T2 is used in retransmissions of
   all the other ZRTP messages.

   All message retransmissions MUST be identical to the initial message
   including nonces, public values, etc.; otherwise, hashes of the
   message sequences may not agree.

   Practical experience has shown that RTP packet loss at the start of
   an RTP session can be extremely high. Since the entire ZRTP message
   exchange occurs during this period, the retransmission scheme is
   defined to be aggressive. Since ZRTP packets with the exception
   of the DHPart1 and DHPart2 messages are small, this should have
   minimal effect on overall bandwidth utilization of the media session.

   ZRTP endpoints MUST NOT exceed the bandwidth of the resulting media
   session as determined by the offer/answer exchange in the signaling
   layer.
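
   The backoff schedule that this section recommends for the T1 and T2
   timers can be checked with a short Python sketch. It uses only the
   values given in this section (T1 = 50 ms capped at 200 ms with 20
   retransmissions; T2 = 150 ms capped at 1200 ms with 10
   retransmissions). The function name is hypothetical, and a real
   implementation would also stop as soon as the acknowledging message
   arrives.

```python
def retry_schedule(initial_ms: int, cap_ms: int, retransmissions: int) -> list:
    """Intervals (in ms) between sends: doubling each time up to cap_ms."""
    intervals = []
    interval = initial_ms
    for _ in range(retransmissions):
        intervals.append(interval)
        interval = min(interval * 2, cap_ms)
    return intervals


# T1 schedule for Hello messages.
hello_schedule = retry_schedule(initial_ms=50, cap_ms=200, retransmissions=20)

# T2 schedule for the other (non-Hello, non-Ping) messages.
other_schedule = retry_schedule(initial_ms=150, cap_ms=1200, retransmissions=10)

print(sum(hello_schedule) / 1000)  # total seconds before giving up on Hello
print(sum(other_schedule) / 1000)  # total seconds before giving up otherwise
```

   The two totals are 3.75 seconds (50 + 100 + 18*200 ms) and 9.45
   seconds (150 + 300 + 600 + 7*1200 ms), matching the figures quoted
   in this section.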

   The Ping message (Section 5.15) may follow the same retransmission
   schedule as the Hello message, but this is not required in this
   specification. Ping message retransmission is subject to
   application-specific ZRTP proxy heuristics.

   Hello ZRTP messages are retransmitted at an interval that starts at
   T1 and doubles after every retransmission, capping at 200 ms. T1 has
   a recommended initial value of 50 ms. A Hello message is
   retransmitted 20 times before giving up, which means the entire retry
   schedule for Hello messages is exhausted after 3.75 seconds (50 + 100
   + 18*200 ms). Retransmission of a Hello ends upon receipt of a
   HelloACK or Commit message.

   The post-Hello ZRTP messages are retransmitted only by the session
   initiator; that is, only Commit, DHPart2, and Confirm2 are
   retransmitted if the corresponding messages from the responder
   (DHPart1, Confirm1, and Conf2ACK) are not received. Note that the
   Confirm2 message retransmission can also be stopped by receiving the
   first SRTP media (with a valid SRTP auth tag) from the responder.

   The GoClear, Error, and SASrelay messages may be initiated and
   retransmitted by either party, and responded to by the other party,
   regardless of which party is the overall session initiator. They are
   retransmitted if the corresponding response messages (ClearACK,
   ErrorACK, and RelayACK) are not received.

   Non-Hello (and non-Ping) ZRTP messages are retransmitted at an
   interval that starts at T2 and doubles after every retransmission,
   capping at 1200 ms. T2 has a recommended initial value of 150 ms.
   Each non-Hello message is retransmitted 10 times before giving up,
   which means the entire retry schedule is exhausted after 9.45 seconds
   (150 + 300 + 600 + 7*1200 ms). Only the initiator performs
   retransmissions.
   Each message has a response message that stops retransmissions, as
   shown below in Table 8. The higher value of T2 means that
   retransmissions will likely occur only in the event of packet loss.

   These recommended retransmission intervals are designed for a typical
   broadband Internet connection. In some high latency communication
   channels, such as those provided by some mobile phone environments or
   geostationary satellites, a different retransmission schedule may be
   used. The initial value for the T1 or T2 retransmission timer should
   be increased to be no less than the round trip time provided by the
   communications channel. It should take into account the time
   required to transmit the entire message and the entire reply, as well
   as a reasonable time estimate to perform the DH calculation.

   After receiving a Commit message, but before receiving a Confirm2
   message, if a ZRTP responder receives no ZRTP messages for more than
   10 seconds, the responder MAY send a protocol timeout Error message
   and terminate the ZRTP protocol.

      Message      Acknowledgement Message
      -------      -----------------------
      Hello        HelloACK or Commit
      Commit       DHPart1 or Confirm1
      DHPart2      Confirm1
      Confirm2     Conf2ACK or SRTP media
      GoClear      ClearACK
      Error        ErrorACK
      SASrelay     RelayACK
      Ping         PingACK

          Table 8. Retransmitted ZRTP Messages and Responses

7.  Short Authentication String

   This section discusses the implementation of the Short
   Authentication String, or SAS, in ZRTP. The SAS can be verbally
   compared by the human users reading the string aloud, or by
   validating an OPTIONAL digital signature (described in Section 7.2)
   exchanged in the Confirm1 or Confirm2 messages.
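
   For the base32 SAS Type discussed in this section, where the
   leftmost 20 bits of the 32-bit sasvalue are rendered as four
   z-base-32 characters, the rendering step might be sketched as below.
   The alphabet string is the published z-base-32 alphabet. The
   function name is hypothetical, and the sketch does not cover the
   base256 (PGP Wordlist) SAS Type.

```python
# Sketch only: renders the leftmost 20 bits of a 32-bit sasvalue as four
# z-base-32 characters. The function name is hypothetical.
ZBASE32_ALPHABET = "ybndrfg8ejkmcpqxot1uwisza345h769"


def render_sas_base32(sasvalue: int) -> str:
    if not 0 <= sasvalue < 2**32:
        raise ValueError("sasvalue is a 32-bit quantity")
    top20 = sasvalue >> 12  # drop the rightmost 12 bits
    # Four 5-bit groups, most significant group first.
    return "".join(
        ZBASE32_ALPHABET[(top20 >> shift) & 0x1F] for shift in (15, 10, 5, 0)
    )
```

   Both endpoints derive the same sasvalue, so both render the same
   four characters; the rightmost 12 bits of the sasvalue do not affect
   the rendered string.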

   The use of hash commitment in the DH exchange (Section 4.4.1.1)
   constrains the attacker to only one guess to generate the correct SAS
   in his attack, which means the SAS can be quite short. A 16-bit SAS,
   for example, provides the attacker only one chance out of 65536 of
   not being detected.

   The rendering of the SAS value to the user depends on the SAS Type
   agreed upon in the Commit message. For the SAS Type of base32, the
   leftmost 20 bits of the 32-bit sasvalue are rendered as a form of
   base32 encoding known as z-base-32 [z-base-32]. The purpose of
   z-base-32 is to represent arbitrary sequences of octets in a form
   that is as convenient as possible for human users to manipulate. As
   a result, the choice of characters is slightly different from base32
   as defined in RFC 3548. The leftmost 20 bits of the sasvalue result
   in four base32 characters, which are rendered to both ZRTP endpoints.
   For the SAS Type of base256, the leftmost 16 bits of the 32-bit
   sasvalue are rendered using the PGP Wordlist [pgpwordlist]
   [Juola1][Juola2]. Other SAS Types may be defined to render the SAS
   value in other ways.

   The SAS SHOULD be rendered to the user for authentication.

   The SAS is not treated as a secret value, but it must be compared to
   see if it matches at both ends of the communications channel. The
   two users read it aloud to their partners to see if it matches. This
   allows detection of a man-in-the-middle (MiTM) attack.

   There is only one SAS value computed per call. That is the SAS value
   for the first media stream established, which is calculated in
   Section 4.5.2. This SAS applies to all media streams for the same
   call.

7.1.  SAS Verified Flag

   The SAS Verified flag (V) is set based on the user indicating that
   SAS comparison has been successfully performed.
   The SAS Verified flag is exchanged securely in the Confirm1 and
   Confirm2 messages (Figure 10) of the next session. In other words,
   each party sends the SAS Verified flag from the previous session in
   the Confirm message of the current session. It is perfectly
   reasonable to have a ZRTP endpoint that never sets the SAS Verified
   flag, because setting it would require adding complexity to the user
   interface to allow the user to set it. The SAS Verified flag is not
   required to be set, but if it is available to the client software,
   it allows for the possibility that the client software could render
   to the user that the SAS verify procedure was carried out in a
   previous session.

   Regardless of whether there is a user interface element to allow the
   user to set the SAS Verified flag, it is worth caching a shared
   secret, because doing so reduces opportunities for an attacker in the
   next call.

   If at any time the users carry out the SAS comparison procedure, and
   it actually fails to match, then this means there is a very
   resourceful man-in-the-middle. If this is the first call, the MiTM
   was there on the first call, which is impressive enough. If it
   happens in a later call, it also means the MiTM must know the
   cached shared secret, because you could not have carried out any
   voice traffic at all unless the session key was correctly computed
   and is also known to the attacker. This implies the MiTM must have
   been present in all the previous sessions, since the initial
   establishment of the first shared secret. This is indeed a
   resourceful attacker.
   It also means that if at any time he ceases his participation as a
   MiTM on one of your calls, the protocol will detect that the cached
   shared secret is no longer valid -- because it was really two
   different shared secrets all along, one of them between Alice and
   the attacker, and the other between the attacker and Bob. The
   continuity of the cached shared secrets makes it possible for us to
   detect the MiTM when he inserts himself into the ongoing
   relationship, as well as when he leaves. Also, if the attacker tries
   to stay with a long lineage of calls, but fails to execute a DH MiTM
   attack for even one missed call, he is permanently excluded. He can
   no longer resynchronize with the chain of cached shared secrets.

   Some sort of user interface element (maybe a checkbox) is needed to
   allow the user to tell the software the SAS verify was successful,
   causing the software to set the SAS Verified flag (V), which
   (together with our cached shared secret) obviates the need to perform
   the SAS procedure in the next call. An additional user interface
   element can be provided to let the user tell the software he detected
   an actual SAS mismatch, which indicates a MiTM attack. The software
   can then take appropriate action, clearing the SAS Verified flag and
   erasing the cached shared secret from this session. It is up to the
   implementer to decide if this added user interface complexity is
   warranted.

   If the SAS matches, it means there is no MiTM, which also implies it
   is now safe to trust a cached shared secret for later calls.
   If inattentive users don't bother to check the SAS, it means we don't
   know whether there is or is not a MiTM, so even if we do establish a
   new cached shared secret, there is a risk that our potential attacker
   may have a subsequent opportunity to continue inserting himself in
   the call, until we finally get around to checking the SAS. If the
   SAS matches, it means no attacker was present for any previous
   session since we started propagating cached shared secrets, because
   this session and all the previous sessions were also authenticated
   with a continuous lineage of shared secrets.

7.2.  Signing the SAS

   In some applications, it may be hard to arrange for two human users
   to verbally compare the SAS. To handle these cases, ZRTP allows for
   an OPTIONAL signature feature, which allows the SAS to be checked
   without human participation. The SAS MAY be signed and the signature
   sent inside the Confirm1, Confirm2 (Figure 10), or SASrelay
   (Figure 16) messages. The signature algorithm, length of the
   signature and the key used to create the signature are all sent along
   with the signature. The key types and signature algorithms are for
   future study. The signature is calculated over the entire SAS hash
   result (sashash), from which the sasvalue was derived. The
   signatures exchanged in the encrypted Confirm1, Confirm2, or SASrelay
   messages MAY be used to authenticate the ZRTP exchange.

   Although the signature is sent, the material that is signed, the
   sashash, is not sent with it, since both parties already know the
   sashash value.

7.3.  Relaying the SAS through a PBX

   ZRTP is designed to use end-to-end encryption. The two parties'
   verbal comparison of the short authentication string (SAS) depends on
   this assumption.
   But in some PBX environments, such as Asterisk, there are usage
   scenarios that have the PBX acting as a trusted man-in-the-middle
   (MiTM), which means there are two back-to-back ZRTP connections with
   separate session keys and separate SAS's.

   For example, imagine that Bob has a ZRTP-enabled VoIP phone that has
   been registered with his company's PBX, so that it is regarded as an
   extension of the PBX. Alice, whose phone is not associated with the
   PBX, might dial the PBX from the outside, and a ZRTP connection is
   negotiated between her phone and the PBX. She then selects Bob's
   extension from the company directory in the PBX. The PBX makes a
   call to Bob's phone (which might be offsite, many miles away from the
   PBX through the Internet) and a separate ZRTP connection is
   negotiated between the PBX and Bob's phone. The two ZRTP sessions
   have different session keys and different SAS's, which would render
   the SAS useless for verbal comparison between Alice and Bob. They
   might even mistakenly believe that a wiretapper is present because of
   the SAS mismatch, causing undue alarm.

   ZRTP has a mechanism for solving this problem by having the PBX relay
   the Alice/PBX SAS to Bob, sending it through to Bob in a special
   SASrelay packet as defined in Section 5.13, which is sent after the
   PBX/Bob ZRTP negotiation is complete, after the Confirm packets.
   Only the PBX, acting as a special trusted MiTM (trusted by the
   recipient of the SAS relay packet), will relay the SAS. The SASrelay
   packet protects the relayed SAS from tampering via an included HMAC,
   similar to how the Confirm packet is protected. Bob's ZRTP-enabled
   phone accepts the relayed SAS for rendering only because Bob's phone
   had previously been configured to trust the PBX. This special
   trusted relationship with the PBX can be established through a
   special security enrollment procedure.
   After that enrollment procedure, the PBX is treated by Bob as a
   special trusted MiTM. This results in Alice's SAS being rendered to
   Bob, so that Alice and Bob may verbally compare them and thus
   prevent a MiTM attack by any other untrusted MiTM.

   A real bad-guy MiTM cannot exploit this protocol feature to mount a
   MiTM attack and relay Alice's SAS to Bob, because Bob has not
   previously carried out a special registration ritual with the bad
   guy. The relayed SAS would not be rendered by Bob's phone, because
   it did not come from a trusted PBX. The recognition of the special
   trust relationship is achieved with the prior establishment of a
   special shared secret between Bob and his PBX, which is called
   pbxsecret (defined in Section 7.3.1), also known as the trusted MiTM
   key.

   The trusted MiTM key can be stored in a special cache at the time of
   the initial enrollment (which is carried out only once for Bob's
   phone), and Bob's phone associates this key with the ZID of the PBX,
   while the PBX associates it with the ZID of Bob's phone. After the
   enrollment has established and stored this trusted MiTM key, it can
   be detected during subsequent ZRTP call negotiations between the PBX
   and Bob's phone, because the PBX and the phone MUST pass the hash of
   the trusted MiTM key in the DH packet. It is then used as part of
   the key agreement to calculate s0.

   During a key agreement with two other ZRTP endpoints, the PBX may
   have a shared trusted MiTM key with both endpoints, only one
   endpoint, or neither endpoint. If the PBX has a shared trusted MiTM
   key with neither endpoint, the PBX SHOULD NOT relay the SAS. If the
   PBX has a shared trusted MiTM key with only one endpoint, the PBX
   SHOULD relay the SAS from one party to the other by sending an
   SASrelay message to the endpoint with which it shares a trusted MiTM
   key.
If the 3218 PBX has a shared trusted MiTM key with both endpoints, the PBX SHOULD 3219 relay the SAS from one party to the other by sending an SASrelay message 3220 to only one of the endpoints. 3222 Note: In the case of sharing a trusted MiTM key with both endpoints, 3223 it does not matter which endpoint receives the relayed SAS as long 3224 as only one endpoint receives it. 3226 The PBX can determine whether it is trusted by the ZRTP user agent of 3227 the caller or callee. The presence of a shared trusted MiTM key in 3228 the key negotiation sequence indicates that the phone has been 3229 enrolled with this PBX and therefore trusts it to act as a trusted 3230 MiTM. The PBX SHOULD relay the SAS from the other party in this 3231 case. 3233 The relayed SAS fields contain the SAS rendering type and the binary 3234 32-bit sasvalue. The receiver absolutely MUST NOT render the relayed 3235 SAS if it does not come from a specially trusted ZRTP endpoint. The 3236 security of the ZRTP protocol depends on not rendering a relayed SAS 3237 from an untrusted MiTM, because it may be relayed by a MiTM attacker. 3238 See the SASrelay packet definition (Figure 16) for further details. 3240 To ensure that both Alice and Bob will use the same SAS rendering 3241 scheme after the keys are negotiated, the PBX also sends the SASrelay 3242 message to the unenrolled party (which does not regard this PBX as a 3243 trusted MiTM), conveying the SAS rendering scheme, but not the SAS 3244 value, which it sets to zero. The unenrolled party will ignore the 3245 relayed SAS field, but will use the specified SAS rendering scheme. 3247 The next section describes the initial enrollment procedure that 3248 establishes a special shared secret between the PBX and Bob's phone, 3249 a trusted MiTM key, so that the phone will learn to recognize the PBX 3250 as a trusted MiTM. 3252 7.3.1.
PBX Enrollment and the PBX Enrollment Flag 3254 Both the PBX and the endpoint need to know when enrollment is taking 3255 place. One way of doing this is to set up an enrollment extension on 3256 the PBX that a newly configured endpoint would call to establish a 3257 ZRTP session. The PBX would then play audio media that offers the 3258 user an opportunity to configure his phone to trust this PBX as a 3259 trusted MiTM. The PBX calculates and stores the trusted MiTM shared 3260 secret in its cache and associates it with this phone, indexed by the 3261 phone's ZID. The trusted MiTM PBX shared secret is derived from 3262 ZRTPSess via the ZRTP key derivation function (Section 4.5.1) in this 3263 manner: 3265 pbxsecret = KDF(ZRTPSess, "Trusted MiTM key", (ZIDi || ZIDr), 3266 negotiated hash length) 3268 The pbxsecret is calculated for the whole ZRTP session, not for each 3269 stream within a session; thus the KDF Context field in this case does 3270 not include any stream-specific nonce material. 3272 The PBX signals the enrollment process by setting the PBX Enrollment 3273 flag (E) in the Confirm message (Figure 10). This flag is used to 3274 trigger the ZRTP endpoint's user interface to ask the user whether they 3275 want to trust this PBX and, if so, to calculate and store the pbxsecret in the 3276 cache. If the user decides to respond by activating the appropriate 3277 user interface element (a menu item, checkbox, or button), his ZRTP 3278 user agent calculates pbxsecret using the same formula and saves it 3279 in a special cache entry associated with this PBX. 3281 During a PBX enrollment, the GoClear features are disabled. If the 3282 (E) flag is set by the PBX, the PBX MUST NOT set the Allow Clear (A) 3283 flag. Thus, (E) implies not (A). If a received Confirm message has 3284 the (E) flag set, the (A) flag MUST be disregarded and treated as 3285 false.
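As an informal illustration of the enrollment steps above, the following Python sketch derives pbxsecret and applies the rule that (E) implies not (A). The KDF construction used here, HMAC(KI, i || Label || 0x00 || Context || L) with a 32-bit counter i = 1 and L expressed in bits, is an assumption about Section 4.5.1, which is not reproduced in this section. SHA-256 is likewise assumed as the negotiated hash, and all function names are hypothetical.

```python
import hashlib
import hmac
import struct


def kdf(ki: bytes, label: bytes, context: bytes, length_bits: int) -> bytes:
    """Sketch of a counter-mode HMAC KDF.

    ASSUMPTION: the layout i || Label || 0x00 || Context || L (i = 1,
    both integers 32-bit big-endian) and the use of HMAC-SHA-256 are
    assumed here; consult Section 4.5.1 for the normative definition.
    """
    data = (
        struct.pack(">I", 1)
        + label
        + b"\x00"
        + context
        + struct.pack(">I", length_bits)
    )
    digest = hmac.new(ki, data, hashlib.sha256).digest()
    # Keep only the leftmost length_bits bits (whole bytes in this sketch).
    return digest[: length_bits // 8]


def derive_pbxsecret(zrtp_sess: bytes, zid_i: bytes, zid_r: bytes,
                     hash_bits: int = 256) -> bytes:
    """pbxsecret = KDF(ZRTPSess, "Trusted MiTM key", ZIDi || ZIDr, hash length)."""
    return kdf(zrtp_sess, b"Trusted MiTM key", zid_i + zid_r, hash_bits)


def effective_allow_clear(e_flag: bool, a_flag: bool) -> bool:
    """A received (A) flag is treated as false whenever (E) is set."""
    return a_flag and not e_flag
```

Because the Context is only ZIDi || ZIDr, the PBX and the phone arrive at the same value from the same ZRTPSess without exchanging anything further, and the result does not vary per stream.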
3287 If the user elects not to enroll, perhaps because he dialed a wrong 3288 number or does not yet feel comfortable with this PBX, he can simply 3289 hang up and not save the pbxsecret in his cache. The PBX will have 3290 it saved in the PBX cache, but that will do no harm. The SASrelay 3291 scheme does not depend on the PBX trusting the phone. It only 3292 depends on the phone trusting the PBX. It is the phone (the user) 3293 who is at risk if the PBX abuses its MiTM privileges. 3295 An endpoint MUST NOT store the pbxsecret in the cache without 3296 explicit user authorization. 3298 After this enrollment process, the PBX and the ZRTP-enabled phone 3299 both share a secret that enables the phone to recognize the PBX as a 3300 trusted MiTM in future calls. This means that when a future call 3301 from an outside ZRTP-enabled caller is relayed through the PBX to 3302 this phone, the phone will render a relayed SAS from the PBX. If the 3303 SASrelay packet comes from a MiTM which does not know the pbxsecret, 3304 the phone treats it as a "bad guy" MiTM, and refuses to render the 3305 relayed SAS. Regardless of which party initiates any future phone 3306 calls through the PBX, the enrolled phone or the outside phone, the 3307 PBX will relay the SAS to the enrolled phone. 3309 There are other ways that ZRTP user agents can be configured to trust 3310 a PBX. Perhaps the pbxsecret can be configured into the phone by 3311 some automated provisioning process in large IT environments. This 3312 specification does not require that products be configured solely by 3313 this enrollment process. Any process that results in a pbxsecret 3314 being computed and shared between the PBX and the phone will suffice. 3315 This is one such method that has been shown to work. 3317 8. Signaling Interactions 3319 This section discusses how ZRTP, SIP, and SDP work together. 3321 Note that ZRTP may be implemented without coupling with the SIP 3322 signaling.
For example, ZRTP can be implemented as a "bump in the 3323 wire" or as a "bump in the stack" in which RTP sent by the SIP UA is 3324 converted to ZRTP. In these cases, the SIP UA will have no knowledge 3325 of ZRTP. As a result, the signaling path discovery mechanisms 3326 introduced in this section should not be treated as definitive; they are only a 3327 hint. Despite the absence of an indication of ZRTP support in an 3328 offer or answer, a ZRTP endpoint SHOULD still send Hello messages. 3330 ZRTP endpoints which have control over the signaling path include a 3331 ZRTP SDP attribute in their SDP offers and answers. The ZRTP 3332 attribute, a=zrtp-hash, is used to indicate support for ZRTP and to 3333 convey a hash of the Hello message. The hash is computed according 3334 to Section 8.1. 3336 Aside from the advantages described in Section 8.1, there are a 3337 number of potential uses for this attribute. It is useful when 3338 signaling elements would like to know when ZRTP may be utilized by 3339 endpoints. It is also useful if endpoints support multiple methods 3340 of SRTP key management. The ZRTP attribute can be used to ensure 3341 that these key management approaches work together instead of against 3342 each other. For example, if only one endpoint supports ZRTP but both 3343 support another method to key SRTP, then the other method will be 3344 used instead. When used in parallel, an SRTP secret carried in an 3345 a=keymgt [RFC4567] or a=crypto [RFC4568] attribute can be used as a 3346 shared secret for the srtps computation defined in Section 8.2. The 3347 ZRTP attribute is also used to signal to an intermediary ZRTP device 3348 not to act as a ZRTP endpoint, as discussed in Section 10. 3350 The a=zrtp-hash attribute can only be included in the SDP at the 3351 media level since Hello messages sent in different media streams will 3352 have unique hashes.
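As a rough illustration of how an endpoint with access to the signaling path might produce and check this media-level attribute, consider the Python sketch below. It assumes that the hash of Section 5.1.2.1 is SHA-256, that the input is the Hello message without ZRTP packet framing, and that the version token and hash value are separated by a single space. The version string "1.10" is only a placeholder default, and the function names are hypothetical.

```python
import hashlib


def zrtp_hash_attribute(hello_message: bytes, version: str = "1.10") -> str:
    """Build an a=zrtp-hash line for one media stream.

    ASSUMPTION: SHA-256 over the Hello message (no packet framing),
    rendered as lowercase hexadecimal.
    """
    digest = hashlib.sha256(hello_message).hexdigest()
    return f"a=zrtp-hash:{version} {digest}"


def hello_matches_sdp(hello_message: bytes, attribute_line: str) -> bool:
    """Return True if a received Hello matches the hash carried in SDP."""
    prefix = "a=zrtp-hash:"
    if not attribute_line.startswith(prefix):
        return False
    parts = attribute_line[len(prefix):].split()
    if len(parts) != 2:
        return False
    _version, hex_value = parts
    return hashlib.sha256(hello_message).hexdigest() == hex_value.lower()
```

Each media stream would call zrtp_hash_attribute with its own Hello message, which is why the attribute belongs under the corresponding m= line rather than at the session level.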
3354 The ABNF for the ZRTP attribute is as follows: 3356 zrtp-attribute = "a=zrtp-hash:" zrtp-version SP zrtp-hash-value 3358 zrtp-version = token 3360 zrtp-hash-value = 1*(HEXDIG) 3362 Example of the ZRTP attribute in an initial SDP offer or answer used 3363 at the media level: 3365 v=0 3366 o=bob 2890844527 2890844527 IN IP4 client.biloxi.example.com 3367 s= 3368 c=IN IP4 client.biloxi.example.com 3369 t=0 0 3370 m=audio 3456 RTP/AVP 97 33 3371 a=rtpmap:97 iLBC/8000 3372 a=rtpmap:33 no-op/8000 3373 a=zrtp-hash:1.10 fe30efd02423cb054e50efd0248742ac7a52c8f91bc2df881ae642c371ba46df 3375 A mechanism for carrying this same zrtp-hash information in the 3376 Jingle signaling protocol is defined in [XEP-0262]. 3378 8.1. Binding the media stream to the signaling layer via the Hello Hash 3380 It is desirable to tie the media stream to the signaling channel to 3381 prevent a third party from inserting false media packets. If the 3382 signaling layer contains information that ties it to the media 3383 stream, false media streams can be rejected. 3385 To accomplish this, a 256-bit hash (using the hash algorithm defined 3386 in Section 5.1.2.1) is computed across the entire Hello message 3387 (including everything shown in Figure 3). The hash does not include 3388 ZRTP packet framing from Figure 2. This hash image is made available 3389 to the signaling layer, where it is transmitted as a hexadecimal 3390 value in the SIP channel using the SDP attribute, a=zrtp-hash, defined 3391 in this specification. Each media stream (audio or video) will have 3392 a separate Hello packet, and thus will require a separate a=zrtp-hash 3393 in an SDP attribute. The recipient of the SIP/SDP message can then 3394 use this hash image to detect and reject false Hello packets in the 3395 media channel, as well as identify which media stream is associated 3396 with this SIP call.
Each Hello packet hashes uniquely, because it 3397 contains the H3 field derived from a random nonce, defined in 3398 Section 9. 3400 The Hello Hash as an SDP attribute is an OPTIONAL feature, because 3401 some ZRTP endpoints do not have the ability to add SDP attributes to 3402 the signaling. For example, if ZRTP is implemented in a hardware 3403 bump-in-the-wire device, it might only have the ability to modify the 3404 media packets, not the SIP packets, especially if the SIP packets are 3405 integrity protected and thus cannot be modified on the wire. If the 3406 SDP has no hash image of the ZRTP Hello message, the recipient's ZRTP 3407 user agent cannot check it, and thus will not be able to reject Hello 3408 messages based on this hash. 3410 After the Hello Hash is used to properly identify the ZRTP Hello 3411 message as belonging to this particular SIP call, the rest of the 3412 ZRTP message sequence is protected from false packet injection by 3413 other protection mechanisms, such as the hash chaining mechanism 3414 defined in Section 9. 3416 An attacker who controls only the signaling layer, such as an 3417 uncooperative VoIP service provider, may be able to deny service by 3418 corrupting the hash of the Hello message in the SDP attribute, which 3419 would force ZRTP to reject perfectly good Hello messages. If there 3420 is reason to believe this is happening, the ZRTP endpoint MAY allow 3421 Hello messages to be accepted that do not match the hash image in the 3422 SDP attribute. 3424 Even in the absence of SIP integrity protection, the inclusion of the 3425 a=zrtp-hash SDP attribute, when coupled with the hash chaining 3426 mechanism defined in Section 9, meets the R-ASSOC requirement in the 3427 Media Security Requirements 3428 [I-D.ietf-sip-media-security-requirements], which requires: 3430 "...a mechanism for associating key management messages with both 3431 the signaling traffic that initiated the session and with 3432 protected media traffic. 
Allowing such an association also allows 3433 the SDP offerer to avoid performing CPU-consuming operations 3434 (e.g., Diffie-Hellman or public key operations) with attackers 3435 that have not seen the signaling messages." 3437 The a=zrtp-hash SDP attribute becomes especially useful if the SDP is 3438 integrity-protected end-to-end by SIP Identity (RFC 4474) [RFC4474] 3439 or, better still, Dan Wing's SIP Identity using Media Path 3440 [I-D.wing-sip-identity-media]. This leads to an ability to stop MiTM 3441 attacks independent of ZRTP's SAS mechanism, as explained in 3442 Section 8.1.1 below. 3444 8.1.1. Integrity-protected signaling enables integrity-protected DH 3445 exchange 3447 If and only if the signaling path and the SDP are protected by some 3448 form of end-to-end integrity protection, such as one of the 3449 above-mentioned mechanisms, so that it can guarantee delivery of the 3450 a=zrtp-hash attribute without any tampering by a third party, and if 3451 there is good reason to trust the signaling layer to protect the 3452 interests of the end user, it is possible to authenticate the key 3453 exchange and prevent a MiTM attack. This can be done without 3454 requiring the users to verbally compare the SAS, by using the hash 3455 chaining mechanism defined in Section 9 to provide a series of HMAC 3456 keys that protect the entire ZRTP key exchange. Thus, an end-to-end 3457 integrity-protected signaling layer automatically enables an 3458 integrity-protected Diffie-Hellman exchange in ZRTP, which in turn 3459 means immunity from a MiTM attack. Here's how it works. 3461 The integrity-protected SIP SDP contains a hash commitment to the 3462 entire Hello message. The Hello message contains H3, which provides 3463 a hash commitment for the rest of the hash chain H0-H2 (Section 9). 3464 The Hello message is protected by a 64-bit HMAC, keyed by H2. The 3465 Commit message is protected by a 64-bit HMAC keyed by H1.
The 3466 DHPart1 or DHPart2 messages are protected by a 64-bit HMAC keyed by 3467 H0. The HMACs protecting the Confirm messages are computed with a 3468 different HMAC key derived from the resulting key agreement. Each 3469 message's HMAC is checked when the HMAC key is received in the next 3470 message. If a bad HMAC is discovered, it MUST be treated as a 3471 security exception indicating a MiTM attack, perhaps by logging or 3472 alerting the user, and MUST NOT be treated as a random error. Random 3473 errors are already discovered and quietly rejected by bad CRCs 3474 (Figure 2). 3476 The Hello message must be assembled before any hash algorithms are 3477 negotiated, so an implicit predetermined hash algorithm and HMAC 3478 algorithm (both defined in Section 5.1.2.1) must be used. All of the 3479 aforementioned HMACs keyed by the hashes in the aforementioned hash 3480 chain MUST be computed with the HMAC algorithm defined in 3481 Section 5.1.2.1, with the HMAC truncated to 64 bits. 3483 The Media Security Requirements 3484 [I-D.ietf-sip-media-security-requirements] R-EXISTING requirement can 3485 be fully met by leveraging a certificate-backed PKI in the signaling 3486 layer to integrity-protect the delivery of the a=zrtp-hash SDP 3487 attribute. This would thereby protect ZRTP against a MiTM attack, 3488 without requiring the user to check the SAS, without adding any 3489 explicit signatures or signature keys to the ZRTP key exchange, and 3490 without any extra public key operations or extra packets. 3492 Without an end-to-end integrity protection mechanism in the signaling 3493 layer to guarantee delivery of the a=zrtp-hash SDP attribute without 3494 modification by a third party, these HMACs alone will not prevent a 3495 MiTM attack. In that case, ZRTP's built-in SAS mechanism will still 3496 have to be used to authenticate the key exchange.
At the time of 3497 this writing, very few deployed VoIP clients offer a fully 3498 implemented SIP stack that provides end-to-end integrity protection 3499 for the delivery of SDP attributes. Also, end-to-end signaling 3500 integrity becomes more problematic if E.164 numbers [RFC3824] are 3501 used in SIP. Thus, real-world implementations of ZRTP endpoints will 3502 continue to depend on SAS authentication for quite some time. Even 3503 after there is widespread availability of SIP user agents that offer 3504 integrity protected delivery of SDP attributes, many users will still 3505 be faced with the fact that the signaling path may be controlled by 3506 institutions that do not have the best interests of the end user in 3507 mind. In those cases, SAS authentication will remain the gold 3508 standard for the prudent user. 3510 Even without SIP integrity protection, the Media Security 3511 Requirements [I-D.ietf-sip-media-security-requirements] R-ACT-ACT 3512 requirement can be met by ZRTP's SAS mechanism. Although ZRTP may 3513 benefit from an integrity-protected SIP layer, it is fortunate that 3514 ZRTP's self-contained MiTM defenses do not actually require an 3515 integrity-protected SIP layer. ZRTP can bypass the delays and 3516 problems that SIP integrity faces, such as E.164 number usage, and 3517 the complexity of building and maintaining a PKI. 3519 In contrast, DTLS-SRTP [I-D.ietf-avt-dtls-srtp] appears to depend 3520 heavily on end-to-end integrity protection in the SIP layer. 3521 Further, DTLS-SRTP must bear the additional cost of a signature 3522 calculation of its own, in addition to the signature calculation the 3523 SIP layer uses to achieve its integrity protection. ZRTP needs no 3524 signature calculation of its own to leverage the signature 3525 calculation carried out in the SIP layer. 3527 8.2. 
Deriving the SRTP secret (srtps) from the signaling layer 3529 The shared secret calculations defined in Section 4.3 make use of the 3530 SRTP secret (srtps), if it is provided by the signaling layer. 3532 It is desirable for only one SRTP key negotiation protocol to be 3533 used, and that protocol should be ZRTP. But in the event the 3534 signaling layer negotiates its own SRTP master key and salt, using 3535 the SDES [RFC4568] or [RFC4567], it can be passed from the signaling 3536 to the ZRTP layer and mixed into ZRTP's own shared secret 3537 calculations, without compromising security by creating a dependency 3538 on the signaling for media encryption. 3540 ZRTP computes srtps from the SRTP master key and salt parameters 3541 provided by the signaling layer in this manner: 3543 srtps = hash(SRTP master key || SRTP master salt) 3545 It is expected that the srtps parameter will be rarely computed or 3546 used in typical ZRTP endpoints, because it is likely and desirable 3547 that ZRTP will be the sole means of negotiating SRTP keys, needing no 3548 help from SDES [RFC4568] or [RFC4567]. If srtps is computed, it will 3549 be stored in the auxiliary shared secret auxsecret, defined in 3550 Section 4.3, and used in Section 4.3.1. 3552 8.3. Codec Selection for Secure Media 3554 Codec selection is negotiated in the signaling layer. If the 3555 signaling layer determines that ZRTP is supported by both endpoints, 3556 this should provide guidance in codec selection to avoid variable 3557 bit-rate (VBR) codecs that leak information. 3559 When voice is compressed with a VBR codec, the packet lengths vary 3560 depending on the types of sounds being compressed. This leaks a lot 3561 of information about the content even if the packets are encrypted, 3562 regardless of what encryption protocol is used [Wright1]. It is 3563 RECOMMENDED that VBR codecs be avoided in encrypted calls. 
It is not 3564 a problem if the codec adapts the bit rate to the available channel 3565 bandwidth. The vulnerable codecs are the ones that change their bit 3566 rate depending on the type of sound being compressed. 3568 It also appears that voice activity detection (VAD) leaks information 3569 about the content of the conversation, but to a lesser extent than 3570 VBR. This effect can be ameliorated by lengthening the VAD hangover 3571 time by about 1 to 2 seconds, if this is feasible in your 3572 application. This is a topic that requires further study. 3574 9. False ZRTP Packet Rejection 3576 An attacker who is not in the media path may attempt to inject false 3577 ZRTP protocol packets, possibly to effect a denial of service attack, 3578 or to inject his own media stream into the call. VoIP by its nature 3579 invites various forms of denial of service attacks and requires 3580 protocol features to reject such attacks. While bogus SRTP packets 3581 may be easily rejected via the SRTP auth tag field, that can only be 3582 applied after a key agreement is completed. During the ZRTP key 3583 negotiation phase, other false packet rejection mechanisms are 3584 needed. One such mechanism is the use of the total_hash in the final 3585 shared secret calculation, but that can only detect false packets 3586 after performing the computationally expensive Diffie-Hellman 3587 calculation. 3589 The VoIP developer community expects to see a lot of denial of 3590 service attacks, especially from attackers who are not in the media 3591 path. Such an attacker might inject false ZRTP packets to force a 3592 ZRTP endpoint to engage in an endless series of pointless and 3593 expensive DH calculations. To detect and reject false packets 3594 cheaply and rapidly as soon as they are received, ZRTP uses a hash 3595 chain, which is a series of successive hash images. 
Before each 3596 session, the following values are computed: 3598 H0 = 256-bit random nonce (different for each party) 3599 H1 = hash (H0) 3600 H2 = hash (H1) 3601 H3 = hash (H2) 3603 The hash chain MUST use the hash algorithm defined in 3604 Section 5.1.2.1. Each 256-bit hash image is the preimage of the 3605 next, and the sequence of images is sent in reverse order in the ZRTP 3606 packet sequence. The hash image H3 is sent in the Hello packet, H2 3607 is sent in the Commit packet, H1 is sent in the DHPart1 or DHPart2 3608 packets, and H0 is sent in the Confirm1 or Confirm2 packets. The 3609 initial random H0 nonces that each party generates MUST be 3610 unpredictable to an attacker and unique within a ZRTP call, which 3611 thereby forces the derived hash images H1-H3 to also be unique and 3612 unpredictable. 3614 The recipient checks if the packet has the correct hash preimage, by 3615 hashing it and comparing the result with the hash image for the 3616 preceding packet. Packets which contain an incorrect hash preimage 3617 MUST NOT be used by the recipient, but MAY be processed as security 3618 exceptions, perhaps by logging or alerting the user. As long as 3619 these bogus packets are not used, and correct packets are still being 3620 received, the protocol SHOULD be allowed to run to completion, 3621 thereby rendering ineffective this denial of service attack. 3623 Note that since H2 is sent in the Commit message, and the initiator 3624 does not receive a Commit message, the initiator computes the 3625 responder's missing H2 by hashing the responder's H1. An analogous 3626 interpolation is performed by both parties to handle the skipped 3627 DHPart1 and DHPart2 messages in Preshared (Section 3.1.2) or 3628 Multistream (Section 3.1.3) modes. 
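The hash chain and the preimage check described above can be sketched in Python as follows. This is a minimal illustration that assumes SHA-256 as the hash of Section 5.1.2.1; the helper names are hypothetical and the sketch ignores packet parsing entirely.

```python
import hashlib
import hmac
import secrets
from typing import List


def sha256(data: bytes) -> bytes:
    """ASSUMPTION: SHA-256 stands in for the hash of Section 5.1.2.1."""
    return hashlib.sha256(data).digest()


def make_hash_chain() -> List[bytes]:
    """Return [H0, H1, H2, H3], where H0 is a fresh 256-bit random nonce."""
    chain = [secrets.token_bytes(32)]
    for _ in range(3):
        chain.append(sha256(chain[-1]))
    return chain


def is_valid_preimage(revealed: bytes, earlier_image: bytes,
                      steps: int = 1) -> bool:
    """Hash `revealed` `steps` times and compare with an earlier image.

    steps=1 covers the normal case (for example H2 from Commit against
    H3 from Hello). steps=2 covers a skipped message, such as the
    initiator checking the responder's H1 against H3 because it never
    receives a Commit carrying the responder's H2.
    """
    value = revealed
    for _ in range(steps):
        value = sha256(value)
    return hmac.compare_digest(value, earlier_image)


h0, h1, h2, h3 = make_hash_chain()
print(is_valid_preimage(h2, h3))           # Commit checked against Hello
print(is_valid_preimage(h1, h3, steps=2))  # interpolating the missing H2
```

A packet whose revealed value fails this check is simply not used. Each check costs one or two hash computations, which is far cheaper than starting a Diffie-Hellman calculation for an injected packet.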
3630 Because these hash images alone do not protect the rest of the 3631 contents of the packet they reside in, this scheme assumes the 3632 attacker cannot modify the packet contents from a legitimate party, 3633 which is a reasonable assumption for an attacker who is not in the 3634 media path. This covers an important range of denial-of-service 3635 attacks. For dealing with the remaining set of attacks that involve 3636 packet modification, other mechanisms are used, such as the 3637 total_hash in the final shared secret calculation, and the hash 3638 commitment in the Commit packet. 3640 False Hello packets may be detected and rejected by the mechanism 3641 defined in Section 8.1. This mechanism requires that each Hello 3642 packet be unique, and the inclusion of the H3 hash image meets that 3643 requirement. 3645 If and only if an integrity-protected signaling channel is available, 3646 this hash chaining scheme can be used to key HMACs to authenticate 3647 the entire ZRTP key exchange, and thereby prevent a MiTM attack, 3648 without relying on the users verbally comparing the SAS. See 3649 Section 8.1.1 for details. 3651 Some ZRTP user agents allow the user to manually switch to clear mode 3652 (via the GoClear packet) in the middle of a secure call, and then 3653 later initiate secure mode again. Many consumer client products will 3654 omit this feature, but those that allow it may return to secure mode 3655 again in the same media stream. Although the same chain of hash 3656 images will be re-used and thus rendered ineffective the second time, 3657 no real harm is done because the new SRTP session keys will be 3658 derived in part from a cached shared secret, which was safely 3659 protected from the MiTM in the previous DH exchange earlier in the 3660 same call. 3662 10. Intermediary ZRTP Devices 3664 This section discusses the operation of a ZRTP endpoint which is 3665 actually an intermediary. 
For example, consider a device which 3666 proxies both signaling and media between endpoints. There are three 3667 possible ways in which such a device could support ZRTP. 3669 An intermediary device can act transparently to the ZRTP protocol. 3671 To do this, a device MUST pass RTP header extensions and payloads (to 3672 allow the ZRTP Flag) and non-RTP protocols multiplexed on the same 3673 port as RTP (to allow ZRTP and STUN). This is the RECOMMENDED 3674 behavior for intermediaries as ZRTP and SRTP are best when done end- 3675 to-end. 3677 An intermediary device could implement the ZRTP protocol and act as a 3678 ZRTP endpoint on behalf of non-ZRTP endpoints behind the intermediary 3679 device. The intermediary could determine on a call-by-call basis 3680 whether the endpoint behind it supports ZRTP based on the presence or 3681 absence of the ZRTP SDP attribute flag (a=zrtp-hash). For non-ZRTP 3682 endpoints, the intermediary device could act as the ZRTP endpoint 3683 using its own ZID and cache. This approach SHOULD only be used when 3684 there is some other security method protecting the confidentiality of 3685 the media between the intermediary and the inside endpoint, such as 3686 IPSec or physical security. 3688 The third mode, which is NOT RECOMMENDED, is for the intermediary 3689 device to attempt to back-to-back the ZRTP protocol. The only 3690 exception to this case is where the intermediary device is a trusted 3691 element providing services to one of the endpoints - e.g. a Private 3692 Branch Exchange or PBX. In this mode, the intermediary would attempt 3693 to act as a ZRTP endpoint towards both endpoints of the media 3694 session. This approach MUST NOT be used except as described in 3695 Section 7.3 as it will always result in a detected man-in-the-middle 3696 attack and will generate alarms on both endpoints and likely result 3697 in the immediate termination of the session. 
3699 In cases where centralized media mixing is taking place, the SAS will 3700 not match when compared by the humans. However, this situation is 3701 known in the SIP signaling by the presence of the isfocus feature tag 3702 [RFC4579]. As a result, when the isfocus feature tag is present, the 3703 DH exchange can be authenticated by the mechanism defined in 3704 Section 8.1.1 or by validating signatures (Section 7.2) in the 3705 Confirm or SASrelay messages. For example, consider an audio 3706 conference call with three participants Alice, Bob, and Carol hosted 3707 on a conference bridge in Dallas. There will be three ZRTP encrypted 3708 media streams, one encrypted stream between each participant and 3709 Dallas. Each will have a different SAS. Each participant will be 3710 able to validate their SAS with the conference bridge by using 3711 signatures optionally present in the Confirm messages (described in 3712 Section 7.2). Or, if the signaling path has end-to-end integrity 3713 protection, each DH exchange will have automatic MiTM protection by 3714 using the mechanism in Section 8.1.1. 3716 SIP feature tags can also be used to detect if a session is 3717 established with an automaton such as an IVR, voicemail system, or 3718 speech recognition system. The display of SAS strings to users 3719 should be disabled in these cases. 3721 It is possible that an intermediary device acting as a ZRTP endpoint 3722 might still receive ZRTP Hello and other messages from the inside 3723 endpoint. This could occur if there is another inline ZRTP device 3724 which does not include the ZRTP SDP attribute flag. An intermediary 3725 acting as a ZRTP endpoint receiving ZRTP Hello and other messages 3726 from the inside endpoint MUST NOT pass these ZRTP messages. 3728 11. The ZRTP Disclosure flag 3730 There are no back doors defined in the ZRTP protocol specification. 3731 The designers of ZRTP would like to discourage back doors in ZRTP- 3732 enabled products.
However, despite the lack of back doors in the 3733 actual ZRTP protocol, it must be recognized that a ZRTP implementer 3734 might still deliberately create a rogue ZRTP-enabled product that 3735 implements a back door outside the scope of the ZRTP protocol. For 3736 example, they could create a product that discloses the SRTP session 3737 key generated using ZRTP out-of-band to a third party. They may even 3738 have a legitimate business reason to do this for some customers. 3740 For example, some environments have a need to monitor or record 3741 calls, such as stock brokerage houses who want to discourage insider 3742 trading, or special high security environments with special needs to 3743 monitor their own phone calls. We've all experienced automated 3744 messages telling us that "This call may be monitored for quality 3745 assurance". A ZRTP endpoint in such an environment might 3746 unilaterally disclose the session key to someone monitoring the call. 3747 ZRTP-enabled products that perform such out-of-band disclosures of 3748 the session key can undermine public confidence in the ZRTP protocol, 3749 unless we do everything we can in the protocol to alert the other 3750 user that this is happening. 3752 If one of the parties is using a product that is designed to disclose 3753 their session key, ZRTP requires them to confess this fact to the 3754 other party through a protocol message to the other party's ZRTP 3755 client, which can properly alert that user, perhaps by rendering it 3756 in a graphical user interface. The disclosing party does this by 3757 sending a Disclosure flag (D) in Confirm1 and Confirm2 messages as 3758 described in Section 5.7. 3760 Note that the intention here is to have the Disclosure flag identify 3761 products that are designed to disclose their session keys, not to 3762 identify which particular calls are compromised on a call-by-call 3763 basis. 
This is an important legal distinction, because most 3764 government sanctioned wiretap regulations require a VoIP service 3765 provider to not reveal which particular calls are wiretapped. But 3766 there is nothing illegal about revealing that a product is designed 3767 to be wiretap-friendly. The ZRTP protocol mandates that such a 3768 product "out" itself. 3770 You might be using a ZRTP-enabled product with no back doors, but if 3771 your own graphical user interface tells you the call is (mostly) 3772 secure, except that the other party is using a product that is 3773 designed in such a way that it may have disclosed the session key for 3774 monitoring purposes, you might ask him what brand of secure telephone 3775 he is using, and make a mental note not to purchase that brand 3776 yourself. If we create a protocol environment that requires such 3777 back-doored phones to confess their nature, word will spread quickly, 3778 and the "invisible hand" of the free market will act. The free 3779 market has effectively dealt with this in the past. 3781 Of course, a ZRTP implementer can lie about his product having a back 3782 door, but the ZRTP standard mandates that ZRTP-compliant products 3783 MUST adhere to the requirement that a back door be confessed by 3784 sending the Disclosure flag to the other party. 3786 There will be inevitable comparisons to Steve Bellovin's 2003 April 3787 Fools' joke, when he submitted RFC 3514 [RFC3514] which defined the 3788 "Evil bit" in the IPv4 header, for packets with "evil intent". But 3789 we submit that a similar idea can actually have some merit for 3790 securing VoIP. Sure, one can always imagine that some implementer 3791 will not be fazed by the rules and will lie, but they would have lied 3792 anyway even without the Disclosure flag.
There are good reasons to 3793 believe that the flag will improve the overall percentage of 3794 implementations that at least tell us if they put a back door in 3795 their products, and may even get some of them to decide not to put in 3796 a back door at all. From a civic hygiene perspective, we are better 3797 off with having the Disclosure flag in the protocol. 3799 If an endpoint stores or logs SRTP keys or information that can be 3800 used to reconstruct or recover SRTP keys after they are no longer in 3801 use (i.e., the session is no longer active), or otherwise discloses or passes 3802 SRTP keys or information that can be used to reconstruct or recover 3803 SRTP keys to another application or device, the Disclosure flag D 3804 MUST be set in the Confirm1 or Confirm2 message. 3806 11.1. Guidelines on Proper Implementation of the Disclosure Flag 3808 Some implementers have asked for guidance on implementing the 3809 Disclosure Flag. Some people have incorrectly thought that a 3810 connection secured with ZRTP cannot be used in a call center, with 3811 voluntary voice recording, or even with a voicemail system. 3812 Similarly, some potential users of ZRTP have overestimated the 3813 protection that ZRTP can give them. These guidelines clarify both 3814 concerns. 3816 The ZRTP Disclosure Flag only governs the ZRTP/SRTP stream itself. 3817 It does not govern the underlying RTP media stream, nor the actual 3818 media itself. Consequently, a PBX that uses ZRTP may provide 3819 conference calls, call monitoring, call recording, voicemail, or 3820 other PBX features and still say that it does not disclose the ZRTP 3821 key material. A video system may provide DVR features and still say 3822 that it does not disclose the ZRTP key material. The ZRTP Disclosure 3823 Flag, when not set, means only that the ZRTP cryptographic key 3824 material stays within the bounds of the ZRTP subsystem.
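As a rough illustration of the rule above, the decision to set the Disclosure flag can be reduced to two questions about how an endpoint handles key material. The sketch below is not part of the protocol; the names KeyHandling and disclosure_flag_required are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class KeyHandling:
    """Hypothetical description of how an endpoint treats SRTP key material."""
    # Keys (or data from which keys can be recovered) are stored or logged
    # after the session has ended.
    retains_keys_after_session: bool = False
    # Keys (or such data) are passed to another application or device.
    passes_keys_to_other_component: bool = False


def disclosure_flag_required(handling: KeyHandling) -> bool:
    """Return True if the Disclosure flag D MUST be set in Confirm1/Confirm2."""
    return (handling.retains_keys_after_session
            or handling.passes_keys_to_other_component)


# A PBX that records the decrypted media but never exports key material
# stays outside the scope of the flag.
recording_pbx = KeyHandling()
# A monitoring product that hands session keys to a recorder does not.
key_exporting_phone = KeyHandling(passes_keys_to_other_component=True)
```

Note that recording the media itself is deliberately absent from KeyHandling, since the flag governs only the key material.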
3826 If an application has a need to disclose the ZRTP cryptographic key 3827 material, the easiest way to comply with the protocol is to set the 3828 flag to the proper value. The next easiest way is to overestimate 3829 disclosure. For example, a call center that commonly records calls 3830 might choose to set the disclosure flag even though all recording is 3831 an analog recording of a call (and thus outside the ZRTP scope) 3832 because it sets an expectation with clients that their calls might be 3833 recorded. 3835 Note also that the ZRTP Disclosure Flag does not require an 3836 implementation to preclude hacking or malware. Malware that leaks 3837 ZRTP cryptographic key material does not create a liability for the 3838 implementer from non-compliance with the ZRTP specification. 3840 A user of ZRTP should note that ZRTP is not a panacea against 3841 unauthorized recording. ZRTP does not and cannot protect against an 3842 untrustworthy partner who holds a microphone up to the speaker. It 3843 does not protect against someone else being in the room. It does not 3844 protect against analog wiretaps in the phone or in the room. It does 3845 not mean your partner has not been hacked with spyware. It does not 3846 mean that the software has no flaws. It means that the ZRTP 3847 subsystem is not knowingly leaking ZRTP cryptographic key material. 3849 12. RTP Header Extension Flag for ZRTP 3851 This specification defines a new RTP header extension used only for 3852 discovery of support for ZRTP. No ZRTP data is transported in the 3853 extension. When used, the X bit is set in the RTP header to indicate 3854 the presence of the RTP header extension. 3856 Section 5.3.1 in RFC 3550 [RFC3550] defines the format of an RTP 3857 Header extension. The Header extension is appended to the RTP 3858 header. The first 16 bits are an identifier for the header 3859 extension, and the following 16 bits are the length of the extension 3860 header in 32-bit words.
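The header extension layout can be sketched in a few lines. This is a minimal illustration, not a full RTP parser: it assumes the 12-byte fixed RTP header of RFC 3550 followed by any CSRC entries, and it uses the identifier 0x505A and length 0 that this section assigns to the ZRTP flag. The function names are hypothetical.

```python
import struct

ZRTP_FLAG_EXT_ID = 0x505A   # identifier given in this section
ZRTP_FLAG_EXT_LEN = 0       # length in 32-bit words; no data follows


def build_zrtp_flag_extension() -> bytes:
    """Return the 4-byte extension header: 16-bit identifier, 16-bit length."""
    return struct.pack("!HH", ZRTP_FLAG_EXT_ID, ZRTP_FLAG_EXT_LEN)


def has_zrtp_flag(packet: bytes) -> bool:
    """Check whether an RTP packet carries the ZRTP flag header extension."""
    if len(packet) < 12:
        return False
    first = packet[0]
    version = first >> 6            # top two bits
    x_bit = (first >> 4) & 0x01     # extension present
    csrc_count = first & 0x0F
    if version != 2 or not x_bit:
        return False
    offset = 12 + 4 * csrc_count    # extension follows the CSRC list
    if len(packet) < offset + 4:
        return False
    ext_id, ext_len = struct.unpack_from("!HH", packet, offset)
    return ext_id == ZRTP_FLAG_EXT_ID and ext_len == ZRTP_FLAG_EXT_LEN


# Example: V=2, X=1, CC=0, payload type 0, arbitrary sequence/timestamp/SSRC.
fixed_header = struct.pack("!BBHII", 0x90, 0, 1, 0, 0x12345678)
flagged_packet = fixed_header + build_zrtp_flag_extension() + b"payload"
plain_packet = struct.pack("!BBHII", 0x80, 0, 1, 0, 0x12345678) + b"payload"
```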
The ZRTP flag RTP header extension has the 3861 value of 0x505A and a length of 0. The format of the header 3862 extension is as shown in the figure below. 3864 0 1 2 3 3865 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 3866 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3867 |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0| 3868 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3870 Figure 20: RTP Extension header format for ZRTP Flag 3872 ZRTP endpoints MAY include the ZRTP Flag in RTP packets sent at the 3873 start of a session. For example, an endpoint may decide to include 3874 the flag in the first 2 seconds of RTP packets sent. The inclusion 3875 of the flag MAY be ended if a ZRTP message (such as Hello) is 3876 received. 3878 13. IANA Considerations 3880 This specification defines a new SDP [RFC4566] attribute in 3881 Section 8. 3883 Contact name: Philip Zimmermann 3885 Attribute name: "zrtp-hash". 3887 Type of attribute: Media level. 3889 Subject to charset: Not. 3891 Purpose of attribute: The 'zrtp-hash' indicates that a UA supports the 3892 ZRTP protocol and provides a hash of the ZRTP Hello 3893 message. The ZRTP protocol version number is also 3894 specified. 3896 Allowed attribute values: Hex. 3898 14. Appendix - Media Security Requirements 3900 This section discusses how ZRTP meets all RTP security requirements 3901 discussed in the Media Security Requirements 3902 [I-D.ietf-sip-media-security-requirements] document without any 3903 dependencies on other protocols or extensions, unlike DTLS-SRTP 3904 [I-D.ietf-avt-dtls-srtp] which requires additional protocols and 3905 mechanisms. 3907 R-FORK-RETARGET is met since ZRTP is a media path key agreement 3908 protocol. 3910 R-DISTINCT is met since ZRTP uses ZIDs and allows multiple 3911 independent ZRTP exchanges to proceed. 3913 R-HERFP is met since ZRTP is a media path key agreement protocol.
3915 R-REUSE is met using the Multistream and Preshared modes. 3917 R-AVOID-CLIPPING is met since ZRTP is a media path key agreement 3918 protocol. 3920 R-RTP-CHECK is met since the ZRTP packet format does not pass the 3921 RTP validity check. 3923 R-ASSOC is met using the a=zrtp-hash SDP attribute in INVITEs and 3924 responses (Section 8.1). 3926 R-NEGOTIATE is met using the Commit message. 3928 R-PSTN is met since ZRTP can be implemented in Gateways. 3930 R-PFS is met using ZRTP Diffie-Hellman key agreement methods. 3932 R-COMPUTE is met using the Hello/Commit ZRTP exchange. 3934 R-CERTS is met using the verbal comparison of the SAS. 3936 R-FIPS is met since ZRTP uses only FIPS-approved algorithms in all 3937 relevant categories. To meet the FIPS-140 validation requirements 3938 set by NIST FIPS PUB 140-2 Annex A [FIPS-140-2-Annex-A] and NIST 3939 FIPS PUB 140-2 Annex D [FIPS-140-2-Annex-D], ZRTP is compliant 3940 with NIST SP 800-56A [SP800-56A], NIST SP 800-108 [SP800-108], 3941 NIST FIPS PUB 198-1 [FIPS-198-1], NIST FIPS PUB 180-3 3942 [FIPS-180-3], NIST SP 800-38A [SP800-38A], NIST FIPS PUB 197 3943 [FIPS-197], and NSA Suite B [NSA-Suite-B]. 3945 R-DOS is met since ZRTP does not introduce any new denial of 3946 service attacks. 3948 R-EXISTING is met since ZRTP can support the use of certificates 3949 or keys. 3951 R-AGILITY is met since the set of hash, cipher, authentication tag 3952 length, key agreement method, SAS type, and signature type can all 3953 be extended and negotiated. 3955 R-DOWNGRADE is met since ZRTP has protection against downgrade 3956 attacks. 3958 R-PASS-MEDIA is met since ZRTP prevents a passive adversary with 3959 access to the media path from gaining access to keying material 3960 used to protect SRTP media packets. 3962 R-PASS-SIG is met since ZRTP prevents a passive adversary with 3963 access to the signaling path from gaining access to keying 3964 material used to protect SRTP media packets.
3966 R-SIG-MEDIA is met using the a=zrtp-hash SDP attribute in INVITEs 3967 and responses. 3969 R-ID-BINDING is met using the a=zrtp-hash SDP attribute 3970 (Section 8.1). 3972 R-ACT-ACT is met using the a=zrtp-hash SDP attribute in INVITEs 3973 and responses. 3975 R-BEST-SECURE is met since ZRTP utilizes the RTP/AVP profile and 3976 hence best effort SRTP in every case. 3978 R-OTHER-SIGNALING is met since ZRTP can utilize modes in which 3979 there is no dependency on the signaling path. 3981 R-RECORDING is met using the ZRTP Disclosure flag. 3983 R-TRANSCODER is met if the transcoder operates as a trusted MitM 3984 (i.e. a PBX). 3986 R-ALLOW-RTP is met due to ZRTP's best effort encryption. 3988 15. Security Considerations 3990 This document is all about securely keying SRTP sessions. As such, 3991 security is discussed in every section. 3993 Most secure phones rely on a Diffie-Hellman exchange to agree on a 3994 common session key. But since DH is susceptible to a man-in-the- 3995 middle (MiTM) attack, it is common practice to provide a way to 3996 authenticate the DH exchange. In some military systems, this is done 3997 by depending on digital signatures backed by a centrally-managed PKI. 3998 A decade of industry experience has shown that deploying centrally 3999 managed PKIs can be a painful and often futile experience. PKIs are 4000 just too messy, and require too much activation energy to get them 4001 started. Setting up a PKI requires somebody to run it, which is not 4002 practical for an equipment provider. A service provider like a 4003 carrier might venture down this path, but even then you have to deal 4004 with cross-carrier authentication, certificate revocation lists, and 4005 other complexities. It is much simpler to avoid PKIs altogether, 4006 especially when developing secure commercial products. 
It is 4007 therefore more common for commercial secure phones in the PSTN world 4008 to augment the DH exchange with a Short Authentication String (SAS) 4009 combined with a hash commitment at the start of the key exchange, to 4010 shorten the length of SAS material that must be read aloud. No PKI 4011 is required for this approach to authenticating the DH exchange. The 4012 AT&T TSD 3600, Eric Blossom's COMSEC secure phones [comsec], PGPfone 4013 [pgpfone], and CryptoPhone [cryptophone] are all examples of products 4014 that took this simpler lightweight approach. 4016 The main problem with this approach is inattentive users who may not 4017 execute the voice authentication procedure, or unattended secure 4018 phone calls to answering machines that cannot execute it. 4020 Additionally, some people worry about voice spoofing. But it is a 4021 mistake to think this is simply an exercise in voice impersonation 4022 (perhaps this could be called the "Rich Little" attack). Although 4023 there are digital signal processing techniques for changing a 4024 person's voice, that does not mean a man-in-the-middle attacker can 4025 safely break into a phone conversation and inject his own short 4026 authentication string (SAS) at just the right moment. He doesn't 4027 know exactly when or in what manner the users will choose to read 4028 aloud the SAS, or in what context they will bring it up or say it, or 4029 even which of the two speakers will say it, or if indeed they both 4030 will say it. In addition, some methods of rendering the SAS involve 4031 using a list of words such as the PGP word list [Juola2], in a manner 4032 analogous to how pilots use the NATO phonetic alphabet to convey 4033 information. This can make it even more complicated for the 4034 attacker, because these words can be worked into the conversation in 4035 unpredictable ways.
Remember that the attacker places a very high 4036 value on not being detected, and if he makes a mistake, he doesn't 4037 get to do it over. Some people have raised the question that even if 4038 the attacker lacks voice impersonation capabilities, it may be unsafe 4039 for people who don't know each other's voices to depend on the SAS 4040 procedure. This is not as much of a problem as it seems, because it 4041 isn't necessary that they recognize each other by their voice, it is 4042 only necessary that they detect that the voice used for the SAS 4043 procedure matches the voice in the rest of the phone conversation. 4045 A popular and field-proven approach is used by SSH (Secure Shell) 4046 [RFC4251], which Peter Gutmann likes to call the "baby duck" security 4047 model. SSH establishes a relationship by exchanging public keys in 4048 the initial session, when we assume no attacker is present, and this 4049 makes it possible to authenticate all subsequent sessions. A 4050 successful MiTM attacker has to have been present in all sessions all 4051 the way back to the first one, which is assumed to be difficult for 4052 the attacker. ZRTP's key continuity features are actually better 4053 than SSH, at least for VoIP, for reasons described in Section 15.1. 4054 All this is accomplished without resorting to a centrally-managed 4055 PKI. 4057 We use an analogous baby duck security model to authenticate the DH 4058 exchange in ZRTP. We don't need to exchange persistent public keys, 4059 we can simply cache a shared secret and re-use it to authenticate a 4060 long series of DH exchanges for secure phone calls over a long period 4061 of time. If we read aloud just one SAS, and then cache a shared 4062 secret for later calls to use for authentication, no new voice 4063 authentication rituals need to be executed. We just have to remember 4064 we did one already. 
4066 If one party ever loses this cached shared secret, it is no longer 4067 available for authentication of DH exchanges. This cache mismatch 4068 situation is easy to detect by the party that still has a surviving 4069 shared secret cache entry. If it fails to match, either there is a 4070 MiTM attack or one side has lost their shared secret cache entry. 4071 The user agent that discovers the cache mismatch must alert the user 4072 that a cache mismatch has been detected, and that he must do a verbal 4073 comparison of the SAS to distinguish if the mismatch is because of a 4074 MiTM attack or because of the other party losing her cache. From 4075 that point on, the two parties start over with a new cached shared 4076 secret. Then they can go back to omitting the voice authentication 4077 on later calls. 4079 A particularly compelling reason why this approach is attractive is 4080 that SAS is easiest to implement when a graphical user interface or 4081 some sort of display is available, which raises the question of what 4082 to do when a display is less conveniently available. For example, 4083 some devices that implement ZRTP might have a graphical user 4084 interface that is only visible through a web browser, such as a PBX 4085 or some other nearby device that implements ZRTP as a "bump-in-the- 4086 wire". If we take an approach that greatly reduces the need for a 4087 SAS in each and every call, we can operate in products without a 4088 graphical user interface with greater ease. Then the SAS can be 4089 compared less frequently through a web browser, or it might even be 4090 presented as needed to the local user through a locally generated 4091 voice prompt, which the local user hears and verbally repeats and 4092 compares with the remote party. Using a voice prompt in this way is 4093 purely for the local ZRTP user agent to render the SAS to the local 4094 user, and is not to be confused with the verbal comparison of the SAS 4095 between two human users. 
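The cache mismatch handling described earlier in this section can be sketched as a small decision function. This is only an illustration under stated assumptions: it supposes each side can derive a short identifier from its cached shared secret and compare it with the identifier received from the peer. The names SasAction and sas_action are hypothetical, and the way identifiers are derived is left out.

```python
import hmac
from enum import Enum
from typing import Optional


class SasAction(Enum):
    SKIP = "no verbal comparison needed"
    COMPARE = "no cached secret here: compare the SAS verbally"
    ALERT_AND_COMPARE = "cache mismatch: alert the user, compare the SAS verbally"


def sas_action(local_id: Optional[bytes], peer_id: Optional[bytes]) -> SasAction:
    """Decide what the user agent asks of its user for this call.

    local_id: identifier derived from our cached shared secret, or None
              if we hold no cache entry for this peer.
    peer_id:  identifier the peer presented, or None if it presented none.
    """
    if local_id is None:
        # Nothing survives on our side, so only the SAS can authenticate
        # this DH exchange.
        return SasAction.COMPARE
    if peer_id is not None and hmac.compare_digest(local_id, peer_id):
        # Both sides hold the same cached secret. This assumes the SAS was
        # compared on an earlier call, when the secret was first cached.
        return SasAction.SKIP
    # We still hold an entry but it does not match: either a MiTM attack
    # or the other party lost its cache.
    return SasAction.ALERT_AND_COMPARE
```

After a successful verbal comparison, both parties would start over with a new cached shared secret and could again omit the comparison on later calls.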
4097 It is a good idea to force your opponent to have to solve multiple 4098 problems in order to mount a successful attack. Some examples of 4099 widely differing problems we might like to present him with are: 4100 stealing a shared secret from one of the parties, being present on 4101 the very first session and every subsequent session to carry out an 4102 active MiTM attack, and solving the discrete log problem. We want to 4103 force the opponent to solve more than one of these problems to 4104 succeed. 4106 ZRTP can use different kinds of shared secrets. Each type of shared 4107 secret is determined by a different method. All of the shared 4108 secrets are hashed together to form a session key to encrypt the 4109 call. An attacker must defeat all of the methods in order to 4110 determine the session key. 4112 First, there is the shared secret determined entirely by a Diffie- 4113 Hellman key agreement. It changes with every call, based on random 4114 numbers. An attacker may attempt a classic DH MiTM attack on this 4115 secret, but we can protect against this by displaying and reading 4116 aloud an SAS, combined with adding a hash commitment at the beginning 4117 of the DH exchange. 4119 Second, there is an evolving shared secret, or ongoing shared secret 4120 that is automatically changed and refreshed and cached with every new 4121 session. We will call this the cached shared secret, or sometimes 4122 the retained shared secret. Each new image of this ongoing secret is 4123 a non-invertible function of its previous value and the new secret 4124 derived by the new DH agreement. It is possible that no cached 4125 shared secret is available, because there were no previous sessions 4126 to inherit this value from, or because one side loses its cache. 4128 There are other approaches for key agreement for SRTP that compute a 4129 shared secret using information in the signaling.
For example, 4130 [RFC4567] describes how to carry a MIKEY (Multimedia Internet KEYing) 4131 [RFC3830] payload in SDP [RFC4566]. Or RFC 4568 (SDES) [RFC4568] 4132 describes directly carrying SRTP keying and configuration information 4133 in SDP. ZRTP does not rely on the signaling to compute a shared 4134 secret, but if a client does produce a shared secret via the 4135 signaling, and makes it available to the ZRTP protocol, ZRTP can make 4136 use of this shared secret to augment the list of shared secrets that 4137 will be hashed together to form a session key. This way, any 4138 security weaknesses that might compromise the shared secret 4139 contributed by the signaling will not harm the final resulting 4140 session key. 4142 The shared secret provided by the signaling (if available), the 4143 shared secret computed by DH, and the cached shared secret are all 4144 hashed together to compute the session key for a call. If the cached 4145 shared secret is not available, it is omitted from the hash 4146 computation. If the signaling provides no shared secret, it is also 4147 omitted from the hash computation. 4149 No DH MiTM attack can succeed if the ongoing shared secret is 4150 available to the two parties, but not to the attacker. This is 4151 because the attacker cannot compute a common session key with either 4152 party without knowing the cached secret component, even if he 4153 correctly executes a classic DH MiTM attack. 4155 15.1. Self-healing Key Continuity Feature 4157 The key continuity features of ZRTP are analogous to those provided 4158 by SSH (Secure Shell) [RFC4251], but they differ in one respect. SSH 4159 caches public signature keys that never change, and uses a permanent 4160 private signature key that must be guarded from disclosure. If 4161 someone steals your SSH private signature key, they can impersonate 4162 you in all future sessions and mount a successful MiTM attack any 4163 time they want. 
4165 ZRTP caches symmetric key material used to compute secret session 4166 keys, and these values change with each session. If someone steals 4167 your ZRTP shared secret cache, they only get one chance to mount a 4168 MiTM attack, in the very next session. If they miss that chance, the 4169 retained shared secret is refreshed with a new value, and the window 4170 of vulnerability heals itself, which means they are locked out of any 4171 future opportunities to mount a MiTM attack. This gives ZRTP a 4172 "self-healing" feature if any cached key material is compromised. 4174 A MiTM attacker must always be in the media path. This presents a 4175 significant operational burden for the attacker in many VoIP usage 4176 scenarios, because being in the media path for every call is often 4177 harder than being in the signaling path. This will likely create 4178 coverage gaps in the attacker's opportunities to mount a MiTM attack. 4179 ZRTP's self-healing key continuity features are better than SSH at 4180 exploiting any temporary gaps in MiTM attack coverage. Thus, ZRTP 4181 quickly recovers from any disclosure of cached key material. 4183 The infamous Debian OpenSSL weak key vulnerability [dsa-1571] 4184 (discovered and patched in May 2008) offers a real-world example of 4185 why ZRTP's self-healing scheme is a good way to do key continuity. 4186 The Debian bug resulted in the production of a lot of weak SSH (and 4187 TLS/SSL) keys, which continued to compromise security even after the 4188 bug had been patched. In contrast, ZRTP's key continuity scheme adds 4189 new entropy to the cached key material with every call, so old 4190 deficiencies in entropy are washed away with each new session. 
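The idea of hashing the available shared secrets together, and of refreshing the cached secret through a one-way function on every call, can be sketched as follows. This is a simplified illustration and not the key derivation defined elsewhere in this specification: the use of SHA-256, the labels, the length prefixes, and the function names combine_secrets and next_cached_secret are all assumptions made for the example.

```python
import hashlib
import hmac
import os
from typing import Optional


def combine_secrets(dh_secret: bytes,
                    cached_secret: Optional[bytes] = None,
                    signaling_secret: Optional[bytes] = None) -> bytes:
    """Hash together whichever shared secrets are available."""
    h = hashlib.sha256()
    for label, secret in ((b"dh", dh_secret),
                          (b"cache", cached_secret),
                          (b"sig", signaling_secret)):
        if secret is None:
            continue  # an unavailable secret is omitted from the hash
        h.update(label)                            # which kind of secret
        h.update(len(secret).to_bytes(4, "big"))   # unambiguous framing
        h.update(secret)
    return h.digest()


def next_cached_secret(previous: Optional[bytes], dh_secret: bytes) -> bytes:
    """One-way refresh of the cached secret from its old value and the
    secret produced by the new DH agreement."""
    mixed = combine_secrets(dh_secret, cached_secret=previous)
    return hmac.new(mixed, b"illustrative cache update", hashlib.sha256).digest()


# A thief copies the cache but is absent from the next call.
stolen_cache = os.urandom(32)
fresh_dh = os.urandom(32)   # known only to the two endpoints
healed_cache = next_cached_secret(stolen_cache, fresh_dh)
```

Because fresh_dh never reaches the thief, the stolen value is of no help in computing healed_cache, which is the self-healing property described above. The same mixing is why entropy deficiencies in an old cached value are diluted by each new session.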
4192 It should be noted that the addition of shared secret entropy from 4193 previous sessions can extend the strength of the new session key to 4194 AES-256 levels, even if the new session uses Diffie-Hellman keys no 4195 larger than DH-3072 or ECDH-256, provided the cached shared secrets 4196 were initially established when the wiretapper was not present. This 4197 is why AES-256 MAY be used with the smaller DH key sizes in 4198 Section 5.1.5, despite the key strength comparisons in Table 2 of 4199 [SP800-57-Part1]. 4201 Caching shared symmetric key material is also less CPU intensive 4202 compared with using digital signatures, which may be important for 4203 low-power mobile platforms. 4205 16. Acknowledgments 4207 The authors would like to thank Bryce Wilcox-O'Hearn and Colin Plumb 4208 for their contributions to the design of this protocol, and to thank 4209 Hal Finney, Viktor Krikun, Werner Dittmann, Jon Peterson, Dan Wing, 4210 Sagar Pai, Lily Chen, Colin Perkins, David McGrew, and Roni Even for 4211 their helpful comments and suggestions. 4213 The use of hash chains to key HMACs in ZRTP is similar to Adrian 4214 Perrig's TESLA protocol [TESLA]. 4216 17. References 4218 17.1. Normative References 4220 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 4221 Requirement Levels", BCP 14, RFC 2119, March 1997. 4223 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 4224 Jacobson, "RTP: A Transport Protocol for Real-Time 4225 Applications", STD 64, RFC 3550, July 2003. 4227 [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. 4228 Norrman, "The Secure Real-time Transport Protocol (SRTP)", 4229 RFC 3711, March 2004. 4231 [RFC3526] Kivinen, T. and M. Kojo, "More Modular Exponential (MODP) 4232 Diffie-Hellman groups for Internet Key Exchange (IKE)", 4233 RFC 3526, May 2003. 4235 [RFC3309] Stone, J., Stewart, R., and D. Otis, "Stream Control 4236 Transmission Protocol (SCTP) Checksum Change", RFC 3309, 4237 September 2002. 
4239 [RFC4231] Nystrom, M., "Identifiers and Test Vectors for HMAC-SHA- 4240 224, HMAC-SHA-256, HMAC-SHA-384, and HMAC-SHA-512", 4241 RFC 4231, December 2005. 4243 [RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed- 4244 Hashing for Message Authentication", RFC 2104, 4245 February 1997. 4247 [SP800-90] 4248 Barker, E. and J. Kelsey, "Recommendation for Random 4249 Number Generation Using Deterministic Random Bit 4250 Generators", NIST Special Publication 800-90 (Revised) 4251 March 2007. 4253 [SP800-56A] 4254 Barker, E., Johnson, D., and M. Smid, "Recommendation for 4255 Pair-Wise Key Establishment Schemes Using Discrete 4256 Logarithm Cryptography", NIST Special Publication 800- 4257 56A Revision 1, March 2007. 4259 [SP800-108] 4260 Chen, L., "Recommendation for Key Derivation Using 4261 Pseudorandom Functions", NIST Special Publication 800- 4262 108 November 2008. 4264 [FIPS-197] 4265 "Advanced Encryption Standard (AES)", NIST FIPS PUB 4266 197 November 2001. 4268 [FIPS-180-3] 4269 "Secure Hash Standard (SHS)", NIST FIPS PUB 180-3 October 4270 2008. 4272 [FIPS-198-1] 4273 "The Keyed-Hash Message Authentication Code (HMAC)", NIST 4274 FIPS PUB 198-1 July 2008. 4276 [FIPS-140-2-Annex-A] 4277 "Annex A: Approved Security Functions for FIPS PUB 4278 140-2", NIST FIPS PUB 140-2 Annex A October 2008. 4280 [FIPS-140-2-Annex-D] 4281 "Annex D: Approved Key Establishment Techniques for FIPS 4282 PUB 140-2", NIST FIPS PUB 140-2 Annex D January 2008. 4284 [NSA-Suite-B] 4285 "NSA Suite B Cryptography", NSA Information Assurance 4286 Directorate NSA Suite B Cryptography. 4288 [RFC4753] Fu, D. and J. Solinas, "ECP Groups For IKE and IKEv2", 4289 RFC 4753, January 2007. 4291 [FIPS-186-3] 4292 "Digital Signature Standard (DSS)", NIST FIPS PUB 186- 4293 3 Draft, November 2008. 4295 [SP800-38A] 4296 Dworkin, M., "Recommendation for Block Cipher Modes of 4297 Operation", NIST Special Publication 800-38A 2001 Edition. 
4299 [XEP-0262] 4300 Saint-Andre, P., "XEP-0262: Use of ZRTP in Jingle RTP 4301 Sessions", XEP-0262 . 4303 [z-base-32] 4304 Wilcox, B., "Human-oriented base-32 encoding", 4305 http://zooko.com/repos/z-base-32/base32/DESIGN . 4307 [pgpwordlist] 4308 "PGP Words", http://en.wikipedia.org/wiki/PGP_Words . 4310 [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session 4311 Description Protocol", RFC 4566, July 2006. 4313 17.2. Informative References 4315 [I-D.ietf-sip-media-security-requirements] 4316 Wing, D., Fries, S., Tschofenig, H., and F. Audet, 4317 "Requirements and Analysis of Media Security Management 4318 Protocols", draft-ietf-sip-media-security-requirements-09 4319 (work in progress), January 2009. 4321 [SP800-57-Part1] 4322 Barker, E., Barker, W., Burr, W., Polk, W., and M. Smid, 4323 "Recommendation for Key Management - Part 1: General 4324 (Revised)", NIST Special Publication 800-57 - Part 4325 1 Revised March 2007. 4327 [Ferguson] 4328 Ferguson, N. and B. Schneier, "Practical Cryptography", 4329 Wiley Publishing 2003. 4331 [RFC4086] Eastlake, D., Schiller, J., and S. Crocker, "Randomness 4332 Requirements for Security", BCP 106, RFC 4086, June 2005. 4334 [Juola1] Juola, P. and P. Zimmermann, "Whole-Word Phonetic 4335 Distances and the PGPfone Alphabet", Proceedings of the 4336 International Conference of Spoken Language Processing 4337 (ICSLP-96) 1996. 4339 [Juola2] Juola, P., "Isolated Word Confusion Metrics and the 4340 PGPfone Alphabet", Proceedings of New Methods in Language 4341 Processing 1996. 4343 [pgpfone] Zimmermann, P., "PGPfone", 4344 http://philzimmermann.com/docs/pgpfone10b7.pdf . 4346 [zfone] Zimmermann, P., "Zfone", 4347 http://www.philzimmermann.com/zfone . 4349 [Byzantine] 4350 "The Two Generals' Problem", 4351 http://en.wikipedia.org/wiki/Two_Generals%27_Problem . 4353 [TESLA] Perrig, A., Canetti, R., Tygar, J., and D. 
Song, "The 4354 TESLA Broadcast Authentication Protocol", http:// 4355 www.ece.cmu.edu/~adrian/projects/tesla-cryptobytes/ 4356 tesla-cryptobytes.pdf . 4358 [SHA-3] "Cryptographic Hash Algorithm Competition", NIST Computer 4359 Security Resource Center Cryptographic Hash Project. 4361 [comsec] Blossom, E., "The VP1 Protocol for Voice Privacy Devices 4362 Version 1.2", http://www.comsec.com/vp1-protocol.pdf . 4364 [cryptophone] 4365 "CryptoPhone", http://www.cryptophone.de/ . 4367 [Wright1] Wright, C., Ballard, L., Coull, S., Monrose, F., and G. 4368 Masson, "Spot me if you can: Uncovering spoken phrases in 4369 encrypted VoIP conversations", Proceedings of the 2008 4370 IEEE Symposium on Security and Privacy 2008. 4372 [dsa-1571] 4373 "Debian Security Advisory - OpenSSL predictable random 4374 number generator", 4375 http://www.debian.org/security/2008/dsa-1571 . 4377 [I-D.ietf-avt-srtp-big-aes] 4378 McGrew, D., "The use of AES-192 and AES-256 in Secure 4379 RTP", http://www1.tools.ietf.org/html/ 4380 draft-ietf-avt-srtp-big-aes . 4382 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 4383 A., Peterson, J., Sparks, R., Handley, M., and E. 4385 Schooler, "SIP: Session Initiation Protocol", RFC 3261, 4386 June 2002. 4388 [RFC4251] Ylonen, T. and C. Lonvick, "The Secure Shell (SSH) 4389 Protocol Architecture", RFC 4251, January 2006. 4391 [RFC4568] Andreasen, F., Baugher, M., and D. Wing, "Session 4392 Description Protocol (SDP) Security Descriptions for Media 4393 Streams", RFC 4568, July 2006. 4395 [RFC4567] Arkko, J., Lindholm, F., Naslund, M., Norrman, K., and E. 4396 Carrara, "Key Management Extensions for Session 4397 Description Protocol (SDP) and Real Time Streaming 4398 Protocol (RTSP)", RFC 4567, July 2006. 4400 [RFC3830] Arkko, J., Carrara, E., Lindholm, F., Naslund, M., and K. 4401 Norrman, "MIKEY: Multimedia Internet KEYing", RFC 3830, 4402 August 2004. 
4404 [RFC3514] Bellovin, S., "The Security Flag in the IPv4 Header", 4405 RFC 3514, April 1 2003. 4407 [RFC4474] Peterson, J. and C. Jennings, "Enhancements for 4408 Authenticated Identity Management in the Session 4409 Initiation Protocol (SIP)", RFC 4474, August 2006. 4411 [I-D.ietf-mmusic-ice] 4412 Rosenberg, J., "Interactive Connectivity Establishment 4413 (ICE): A Protocol for Network Address Translator (NAT) 4414 Traversal for Offer/Answer Protocols", 4415 draft-ietf-mmusic-ice-19 (work in progress), October 2007. 4417 [RFC4579] Johnston, A. and O. Levin, "Session Initiation Protocol 4418 (SIP) Call Control - Conferencing for User Agents", 4419 BCP 119, RFC 4579, August 2006. 4421 [I-D.wing-sip-identity-media] 4422 Wing, D. and H. Kaplan, "SIP Identity using Media Path", 4423 draft-wing-sip-identity-media-02 (work in progress), 4424 February 2008. 4426 [RFC3824] Peterson, J., Liu, H., Yu, J., and B. Campbell, "Using 4427 E.164 numbers with the Session Initiation Protocol (SIP)", 4428 RFC 3824, June 2004. 4430 [I-D.ietf-avt-dtls-srtp] 4431 McGrew, D. and E. Rescorla, "Datagram Transport Layer 4432 Security (DTLS) Extension to Establish Keys for Secure 4433 Real-time Transport Protocol (SRTP)", 4434 draft-ietf-avt-dtls-srtp-07 (work in progress), 4435 February 2009. 4437 Authors' Addresses 4439 Philip Zimmermann 4440 Zfone Project 4442 Email: prz@mit.edu 4444 Alan Johnston (editor) 4445 Avaya 4446 St. Louis, MO 63124 4448 Email: alan@sipstation.com 4450 Jon Callas 4451 PGP Corporation 4453 Email: jon@pgp.com