Network Working Group                                      P. Zimmermann
Internet-Draft                                             Zfone Project
Intended status: Informational                          A. Johnston, Ed.
Expires: July 30, 2009                                             Avaya
                                                               J. Callas
                                                         PGP Corporation
                                                        January 26, 2009

             ZRTP: Media Path Key Agreement for Secure RTP
                      draft-zimmermann-avt-zrtp-13

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 30, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.

Abstract

   This document defines ZRTP, a protocol for media path Diffie-Hellman
   exchange to agree on a session key and parameters for establishing
   Secure Real-time Transport Protocol (SRTP) sessions. The ZRTP
   protocol is media path keying because it is multiplexed on the same
   port as RTP and does not require support in the signaling protocol.
   ZRTP does not assume a Public Key Infrastructure (PKI) or require the
   complexity of certificates in end devices. For the media session,
   ZRTP provides confidentiality, protection against man-in-the-middle
   (MiTM) attacks, and, in cases where the signaling protocol provides
   end-to-end integrity protection, authentication. ZRTP can utilize a
   Session Description Protocol (SDP) attribute to provide discovery and
   authentication through the signaling channel. To provide best effort
   SRTP, ZRTP utilizes normal RTP/AVP profiles.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Overview
     3.1.  Key Agreement Modes
       3.1.1.  Diffie-Hellman Mode Overview
       3.1.2.  Preshared Mode Overview
       3.1.3.  Multistream Mode Overview
   4.  Protocol Description
     4.1.  Discovery
       4.1.1.  Protocol Version Negotiation
     4.2.  Commit Contention
     4.3.  Matching Shared Secret Determination
       4.3.1.  Calculation and comparison of hashes of shared
               secrets
       4.3.2.  Handling a Shared Secret Cache Mismatch
     4.4.  DH and non-DH key agreements
       4.4.1.  Diffie-Hellman Mode
         4.4.1.1.  Hash Commitment in Diffie-Hellman Mode
         4.4.1.2.  Responder Behavior in Diffie-Hellman Mode
         4.4.1.3.  Initiator Behavior in Diffie-Hellman Mode
         4.4.1.4.  Shared Secret Calculation for DH Mode
       4.4.2.  Preshared Mode
         4.4.2.1.  Commitment in Preshared Mode
         4.4.2.2.  Initiator Behavior in Preshared Mode
         4.4.2.3.  Responder Behavior in Preshared Mode
         4.4.2.4.  Shared Secret Calculation for Preshared Mode
       4.4.3.  Multistream Mode
         4.4.3.1.  Commitment in Multistream Mode
         4.4.3.2.  Shared Secret Calculation for Multistream Mode
     4.5.  Key Derivations
       4.5.1.  The ZRTP Key Derivation Function
       4.5.2.  Deriving ZRTPSess Key and SAS in DH or Preshared
               modes
       4.5.3.  Deriving the rest of the keys from s0
     4.6.  Confirmation
       4.6.1.  Updating the Cache of Shared Secrets
     4.7.  Termination
       4.7.1.  Termination via Error message
       4.7.2.  Termination via GoClear message
         4.7.2.1.  Key Destruction for GoClear message
       4.7.3.  Key Destruction at Termination
     4.8.  Random Number Generation
     4.9.  ZID and Cache Operation
       4.9.1.  Cacheless implementations
   5.  ZRTP Messages
     5.1.  ZRTP Message Formats
       5.1.1.  Message Type Block
       5.1.2.  Hash Type Block
         5.1.2.1.  Implicit Hash and HMAC algorithm
       5.1.3.  Cipher Type Block
       5.1.4.  Auth Tag Type Block
       5.1.5.  Key Agreement Type Block
       5.1.6.  SAS Type Block
       5.1.7.  Signature Type Block
     5.2.  Hello message
     5.3.  HelloACK message
     5.4.  Commit message
     5.5.  DHPart1 message
     5.6.  DHPart2 message
     5.7.  Confirm1 and Confirm2 messages
     5.8.  Conf2ACK message
     5.9.  Error message
     5.10. ErrorACK message
     5.11. GoClear message
     5.12. ClearACK message
     5.13. SASrelay message
     5.14. RelayACK message
   6.  Retransmissions
   7.  Short Authentication String
     7.1.  SAS Verified Flag
     7.2.  Signing the SAS
     7.3.  Relaying the SAS through a PBX
       7.3.1.  PBX Enrollment and the PBX Enrollment Flag
   8.  Signaling Interactions
     8.1.  Binding the media stream to the signaling layer via
           the Hello Hash
       8.1.1.  Integrity-protected signaling enables
               integrity-protected DH exchange
     8.2.  Deriving the SRTP secret (srtps) from the signaling
           layer
     8.3.  Codec Selection for Secure Media
   9.  False ZRTP Packet Rejection
   10. Intermediary ZRTP Devices
   11. The ZRTP Disclosure flag
     11.1. Guidelines on Proper Implementation of the Disclosure
           Flag
   12. RTP Header Extension Flag for ZRTP
   13. IANA Considerations
   14. Appendix - Media Security Requirements
   15. Security Considerations
     15.1. Self-healing Key Continuity Feature
   16. Acknowledgments
   17. References
     17.1. Normative References
     17.2. Informative References
   Authors' Addresses

1.  Introduction

   ZRTP is a key agreement protocol which performs Diffie-Hellman key
   exchange during call setup in the media path, and is transported over
   the same port as the Real-time Transport Protocol (RTP) [RFC3550]
   media stream which has been established using a signaling protocol
   such as Session Initiation Protocol (SIP) [RFC3261]. This generates
   a shared secret which is then used to generate keys and salt for a
   Secure RTP (SRTP) [RFC3711] session. ZRTP borrows ideas from PGPfone
   [pgpfone]. A reference implementation of ZRTP is available as Zfone
   [zfone].

   The ZRTP protocol has some nice cryptographic features lacking in
   many other approaches to media session encryption. Although it uses
   a public key algorithm, it does not rely on a public key
   infrastructure (PKI). In fact, it does not use persistent public
   keys at all. It uses ephemeral Diffie-Hellman (DH) with hash
   commitment, and allows the detection of man-in-the-middle (MiTM)
   attacks by displaying a short authentication string (SAS) for the
   users to read and verbally compare over the phone. It has Perfect
   Forward Secrecy, meaning the keys are destroyed at the end of the
   call, which precludes retroactively compromising the call by future
   disclosures of key material. But even if the users are too lazy to
   bother with short authentication strings, we still get reasonable
   authentication against a MiTM attack, based on a form of key
   continuity. It does this by caching some key material to use in the
   next call, to be mixed in with the next call's DH shared secret,
   giving it key continuity properties analogous to SSH. All this is
   done without reliance on a PKI, key certification, trust models,
   certificate authorities, or key management complexity that bedevils
   the email encryption world. It also does not rely on SIP signaling
   for the key management, and in fact does not rely on any servers at
   all.
   It performs its key agreements and key management in a purely
   peer-to-peer manner over the RTP packet stream.

   In cases where the short authentication string (SAS) cannot be
   verbally compared by two human users, the SAS can be authenticated by
   exchanging an optional signature over the SAS (described in
   Section 7.2).

   ZRTP can be used and discovered without being declared or indicated
   in the signaling path. This provides a best effort SRTP capability.
   Also, this reduces the complexity of implementations and minimizes
   interdependency between the signaling and media layers. However,
   when ZRTP is indicated in the signaling via the zrtp-hash SDP
   attribute, ZRTP has additional useful properties. By sending a hash
   of the ZRTP Hello message in the signaling, ZRTP provides a useful
   binding between the signaling and media paths, which is explained in
   Section 8.1. When this is done through a signaling path that has
   end-to-end integrity protection, the DH exchange is automatically
   protected from a MiTM attack, which is explained in Section 8.1.1.

2.  Terminology

   In this document, the key words "MUST", "MUST NOT", "REQUIRED",
   "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY",
   and "OPTIONAL" are to be interpreted as described in RFC 2119 and
   indicate requirement levels for compliant implementations [RFC2119].

3.  Overview

   This section provides a description of how ZRTP works. This
   description is non-normative in nature but is included to build
   understanding of the protocol.

   ZRTP is negotiated the same way a conventional RTP session is
   negotiated in an offer/answer exchange using the standard AVP/RTP
   profile. The ZRTP protocol begins after two endpoints have utilized
   a signaling protocol such as SIP and are ready to exchange media. If
   ICE [I-D.ietf-mmusic-ice] is being used, ZRTP begins after ICE has
   completed its connectivity checks.

   ZRTP is multiplexed on the same ports as RTP. It uses a unique
   header that makes it clearly differentiable from RTP or STUN.

   In environments in which sending ZRTP packets to non-ZRTP endpoints
   might cause problems and signaling path discovery is not an option,
   ZRTP endpoints can include the RTP header extension flag for ZRTP in
   normal RTP packets sent at the start of a session as a probe to
   discover if the other endpoint supports ZRTP. If the flag is
   received from the other endpoint, ZRTP messages can then be
   exchanged.

   A ZRTP endpoint initiates the exchange by sending a ZRTP Hello
   message to the other endpoint. The purpose of the Hello message is
   to confirm the endpoint supports the protocol and to see what
   algorithms the two ZRTP endpoints have in common.

   The Hello message contains the SRTP configuration options, and the
   ZID. Each instance of ZRTP has a unique 96-bit random ZRTP ID or ZID
   that is generated once at installation time. ZIDs are discovered
   during the Hello message exchange. The received ZID is used to look
   up retained shared secrets from previous ZRTP sessions with the
   endpoint.

   A response to a ZRTP Hello message is a ZRTP HelloACK message. The
   HelloACK message simply acknowledges receipt of the Hello. Since RTP
   commonly uses best effort UDP transport, ZRTP has retransmission
   timers in case of lost datagrams. There are two timers, both with
   exponential backoff mechanisms. One timer is used for
   retransmissions of Hello messages and the other is used for
   retransmissions of all other messages after receipt of a HelloACK.

   If an integrity protected signaling channel is available, a hash of
   the Hello message can be sent. This allows rejection of false
   injected ZRTP Hello messages by an attacker.

   Hello and other ZRTP messages also contain a hash image that is used
   to link the messages together.
   This allows rejection of false
   injected ZRTP messages during an exchange.

3.1.  Key Agreement Modes

   After both endpoints exchange Hello and HelloACK messages, the key
   agreement exchange can begin with the ZRTP Commit message. ZRTP
   supports a number of key agreement modes including both
   Diffie-Hellman and non-Diffie-Hellman modes as described in the
   following sections.

   The Commit message may be sent immediately after both endpoints have
   completed the Hello/HelloAck discovery handshake. Or it may be
   deferred until later in the call, after the participants engage in
   some unencrypted conversation. The Commit message may be manually
   activated by a user interface element, such as a GO SECURE button,
   which becomes enabled after the Hello/HelloAck discovery phase. This
   emulates the user experience of a number of secure phones in the PSTN
   world [comsec]. However, it is expected that most simple ZRTP user
   agents will omit such buttons and proceed directly to secure mode by
   sending a Commit message immediately after the Hello/HelloAck
   handshake.

3.1.1.  Diffie-Hellman Mode Overview

   An example ZRTP call flow is shown in Figure 1 below. Note that the
   order of the Hello/HelloACK exchanges in F1/F2 and F3/F4 may be
   reversed. That is, either Alice or Bob might send the first Hello
   message. Note that the endpoint which sends the Commit message is
   considered the initiator of the ZRTP session and drives the key
   agreement exchange. The Diffie-Hellman public values are exchanged
   in the DHPart1 and DHPart2 messages. SRTP keys and salts are then
   calculated.

   Alice                                               Bob
    |                                                   |
    |     Alice and Bob establish a media session.      |
    |         They initiate ZRTP on media ports         |
    |                                                   |
    | F1 Hello (version, options, Alice's ZID)          |
    |-------------------------------------------------->|
    |                                       HelloACK F2 |
    |<--------------------------------------------------|
    |            Hello (version, options, Bob's ZID) F3 |
    |<--------------------------------------------------|
    | F4 HelloACK                                       |
    |-------------------------------------------------->|
    |                                                   |
    |             Bob acts as the initiator             |
    |                                                   |
    |               Commit (Bob's ZID, options, hvi) F5 |
    |<--------------------------------------------------|
    | F6 DHPart1 (pvr, shared secret hashes)            |
    |-------------------------------------------------->|
    |            DHPart2 (pvi, shared secret hashes) F7 |
    |<--------------------------------------------------|
    |                                                   |
    |     Alice and Bob generate SRTP session key.      |
    |                                                   |
    | F8 Confirm1 (HMAC, D,A,V,E flags, sig)            |
    |-------------------------------------------------->|
    |            Confirm2 (HMAC, D,A,V,E flags, sig) F9 |
    |<--------------------------------------------------|
    | F10 Conf2ACK                                      |
    |-------------------------------------------------->|
    |                    SRTP begins                    |
    |<=================================================>|
    |                                                   |

         Figure 1: Establishment of an SRTP session using ZRTP

   ZRTP authentication uses a Short Authentication String (SAS) which is
   ideally displayed for the human user. Alternatively, the SAS can be
   authenticated by exchanging an OPTIONAL digital signature (sig) over
   the short authentication string in the Confirm1 or Confirm2 messages
   (described in Section 7.2).

   The ZRTP Confirm1 and Confirm2 messages are sent for a number of
   reasons, not the least of which is they confirm that all the key
   agreement calculations were successful and thus the encryption will
   work. They also carry other information such as the Disclosure flag
   (D), the Allow Clear flag (A), the SAS Verified flag (V), and the PBX
   Enrollment flag (E). All flags are encrypted to shield them from a
   passive observer.

3.1.2.  Preshared Mode Overview

   In the Preshared Mode, endpoints can skip the DH calculation if they
   have a shared secret from a previous ZRTP session. Preshared mode is
   indicated in the Commit message and results in the same call flow as
   Multistream mode. The principal difference between Multistream mode
   and Preshared mode is that Preshared mode uses a previously cached
   shared secret, rs1, instead of an active ZRTP Session key as the
   initial keying material.

   This mode could be useful for slow processor endpoints so that a DH
   calculation does not need to be performed every session. Or, this
   mode could be used to rapidly re-establish an earlier session that
   was recently torn down or interrupted without the need to perform
   another DH calculation.

   Preshared mode has forward secrecy properties. If a phone's cache is
   captured by an opponent, the cached shared secrets cannot be used to
   recover earlier encrypted calls, because the shared secrets are
   replaced with new ones in each new call, as in DH mode. However, the
   captured secrets can be used by a passive wiretapper in the media
   path to decrypt the next call, if the next call is in Preshared mode.
   This differs from DH mode, which requires an active MiTM wiretapper
   to exploit captured secrets in the next call. However, if the next
   call is missed by the wiretapper, he cannot wiretap any further
   calls. It thus preserves most of the self-healing properties
   (Section 15.1) of key continuity enjoyed by DH mode.

3.1.3.  Multistream Mode Overview

   Multistream mode is an alternative key agreement method when two
   endpoints have an established SRTP media stream between them and
   hence an active ZRTP Session key. ZRTP can derive multiple SRTP keys
   from a single DH exchange.
   For example, an established secure voice
   call that adds a video stream must use Multistream mode to quickly
   initiate the video stream without a second DH exchange.

   When Multistream mode is indicated in the Commit message, a call flow
   similar to Figure 1 is used, but no DH calculation is performed by
   either endpoint and the DHPart1 and DHPart2 messages are omitted.
   The Confirm1, Confirm2, and Conf2ACK messages are still sent. Since
   the cache is not affected during this mode, multiple Multistream ZRTP
   exchanges can be performed in parallel between two endpoints.

   When adding additional media streams to an existing call, only
   Multistream mode is used. Only one DH operation is performed, just
   for the first media stream.

4.  Protocol Description

   This section begins the normative description of the protocol.

   ZRTP MUST be multiplexed on the same ports as the RTP media packets.

   To support best effort encryption from the Media Security
   Requirements [I-D.ietf-sip-media-security-requirements], ZRTP uses
   normal RTP/AVP profile (AVP) media lines in the initial offer/answer
   exchange. The ZRTP SDP attribute a=zrtp-hash defined in Section 8
   SHOULD be used in all offers and answers to indicate support for the
   ZRTP protocol. The Secure RTP/AVP (SAVP) profile MAY be used in
   subsequent offer/answer exchanges after a successful ZRTP exchange
   has resulted in an SRTP session, or if it is known the other endpoint
   supports this profile.

      The use of the RTP/SAVP profile has caused failures in negotiating
      best effort SRTP due to the limitations on negotiating profiles
      using SDP. This is why ZRTP supports the RTP/AVP profile and
      includes its own discovery mechanisms.
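
   As a non-normative illustration of the key agreement modes summarized
   in Section 3.1, the following Python sketch picks a mode for a new
   media stream. The function name choose_mode, the enum, and the
   allow_preshared policy switch are hypothetical and are not defined by
   this specification; the sketch only mirrors the prose (Multistream
   when an active ZRTP Session key exists, Preshared only when a cached
   rs1 is available, DH otherwise).

```python
from enum import Enum


class KeyAgreementMode(Enum):
    DH = "Diffie-Hellman"
    PRESHARED = "Preshared"
    MULTISTREAM = "Multistream"


def choose_mode(has_active_zrtp_session, has_cached_rs1, allow_preshared=False):
    """Pick a key agreement mode for a new media stream (illustrative only)."""
    # An active ZRTP Session key means a DH exchange already ran for this
    # call, so additional media streams use Multistream mode.
    if has_active_zrtp_session:
        return KeyAgreementMode.MULTISTREAM
    # Preshared mode needs a retained secret (rs1) from an earlier session.
    # Treating it as an opt-in local policy is an assumption of this sketch.
    if allow_preshared and has_cached_rs1:
        return KeyAgreementMode.PRESHARED
    # Otherwise perform a full Diffie-Hellman exchange.
    return KeyAgreementMode.DH
```

   For instance, adding a video stream to an established secure voice
   call corresponds to choose_mode(True, True), which selects
   Multistream mode without a second DH exchange.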

   In all key agreement modes, the initiator SHOULD NOT send RTP media
   after sending the Commit message, and MUST NOT send SRTP media before
   receiving either the Conf2ACK or the first SRTP media (with a valid
   SRTP auth tag) from the responder. The responder SHOULD NOT send RTP
   media after receiving the Commit message, and MUST NOT send SRTP
   media before receiving the Confirm2 message.

4.1.  Discovery

   During the ZRTP discovery phase, a ZRTP endpoint discovers if the
   other endpoint supports ZRTP and the supported algorithms and
   options. This information is transported in a Hello message,
   described in Section 5.2.

   ZRTP endpoints SHOULD include the SDP attribute a=zrtp-hash in offers
   and answers, as defined in Section 8. ZRTP MAY use an RTP [RFC3550]
   extension field as a flag to indicate support for the ZRTP protocol
   in RTP packets as described in Section 12.

   The Hello message includes the ZRTP version, hash type, cipher type,
   authentication method and tag length, key agreement type, and Short
   Authentication String (SAS) algorithms that are supported. The Hello
   message also includes a hash image as described in Section 9. In
   addition, each endpoint sends and discovers ZIDs. The received ZID
   is used later in the protocol as an index into a cache of shared
   secrets that were previously negotiated and retained between the two
   parties.

   A Hello message can be sent at any time, but is usually sent at the
   start of an RTP session to determine if the other endpoint supports
   ZRTP, and also if the SRTP implementations are compatible. A Hello
   message is retransmitted using timer T1 and an exponential backoff
   mechanism detailed in Section 6 until the receipt of a HelloACK
   message or a Commit message.

   The use of the a=zrtp-hash SDP attribute to authenticate the Hello
   message is described in Section 8.1.

4.1.1.  Protocol Version Negotiation

   This specification defines ZRTP version 1.10. Since new versions of
   ZRTP may be developed in the future, this specification defines a
   protocol version negotiation in this section.

   Each party declares what version of the ZRTP protocol they support
   via the version field in the Hello message (Section 5.2). If both
   parties have the same version number in their Hello messages, they
   can proceed with the rest of the protocol. To facilitate both
   parties reaching this state of protocol version agreement in their
   Hello messages, ZRTP should use information provided in the signaling
   layer, if available. If a ZRTP endpoint supports more than one
   version of the protocol, it SHOULD declare them all in a list of SIP
   SDP a=zrtp-hash attributes (defined in Section 8), listing separate
   hashes, with separate ZRTP version numbers in each item in the list.

   Both parties should inspect the list of ZRTP version numbers supplied
   by the other party in the SIP SDP a=zrtp-hash attributes. Both
   parties should choose the highest version number that appears in both
   parties' lists of a=zrtp-hash version numbers, and use that version
   for their Hello messages. If both parties use the SIP signaling in
   this manner, their initial Hello messages will have the same ZRTP
   version number, provided they both have at least one supported
   protocol version in common. Before the ZRTP key agreement can
   proceed, an endpoint MUST have sent and received Hellos with the same
   protocol version.

   It is best if the signaling layer is used to negotiate the protocol
   version number. However, the a=zrtp-hash SDP attribute is not always
   present in the SIP packet, as explained in Section 8.1. In the
   absence of any guidance from the signaling layer, an endpoint MUST
   send the highest supported version in initial Hello messages. If the
   two parties send different protocol version numbers in their Hello
   messages, they can reach agreement to use a common version, if one
   exists. They iteratively apply the following rules until they both
   have matching version fields in their Hello messages and the key
   agreement can proceed:

   o  If an endpoint receives a Hello message with an unsupported
      version number that is higher than the endpoint's current Hello
      message version, the received Hello message MUST be ignored. The
      endpoint continues to retransmit Hello messages on the standard
      retry schedule (Section 6).
   o  If an endpoint receives a Hello message with a version number that
      is lower than the endpoint's current Hello message, and the
      endpoint supports a version that is less than or equal to the
      received version number, the endpoint MUST stop retransmitting the
      old version number and MUST start sending a new Hello message with
      the highest supported version number that is less than or equal to
      the received version number.
   o  If an endpoint receives a Hello message with an unsupported
      version number that is lower than the endpoint's current Hello
      message, the endpoint MUST send an Error message (Section 5.9)
      indicating failure to support this ZRTP version.

   The above comparisons are iterated until the version numbers match,
   or until it exits on a failure to match.

      For example, assume that Alice supports protocol version 1.10 and
      2.00, and Bob supports version 1.10 and 1.20. Alice initially
      sends a Hello with version 2.00, and Bob initially sends a Hello
      with version 1.20. Bob ignores Alice's 2.00 Hello and continues
      to send his 1.20 Hello. Alice detects that Bob does not support
      2.00 and she stops sending her 2.00 Hellos and starts sending a
      stream of 1.10 Hellos. Bob sees the 1.10 Hello from Alice and
      stops sending his 1.20 Hellos and switches to sending 1.10 Hellos.
531 At that point, they have converged on using version 1.10 and the 532 protocol proceeds on that basis. 534 When comparing protocol versions, a ZRTP endpoint MUST include only 535 the first three octets of the version field in the comparison. The 536 final octet is ignored, because it is not significant for 537 interoperability. For example, "1.1 ", "1.10", "1.11", or "1.1a" are 538 all regarded as a version match, because they would all be 539 interoperable versions. 541 Changes in protocol version numbers are expected be infrequent after 542 version 1.10. Supporting multiple versions adds code complexity and 543 may introduce security weaknesses in the implementation. The old 544 adage about keeping it simple applies especially to implementing 545 security protocols. Endpoints SHOULD NOT support protocol versions 546 earlier than version 1.10. 548 4.2. Commit Contention 550 After both parties have received compatible Hello messages, a Commit 551 message (Section 5.4) can be sent to begin the ZRTP key exchange. 552 The endpoint that sends the Commit is known as the initiator, while 553 the receiver of the Commit is known as the responder. 555 If both sides send Commit messages initiating a secure session at the 556 same time the following rules are used to break the tie: 558 o If one Commit is for a DH mode while the other is for Preshared 559 mode, then the Preshared Commit MUST be discarded and the DH 560 Commit proceeds. 561 o If the two Commits are both Preshared mode, and one party has set 562 the MiTM (M) flag in the Hello message and the other has not, the 563 Commit message from the party who set the (M) flag MUST be 564 discarded, and the one who has not set the (M) flag becomes the 565 initiator, regardless of the nonce values. In other words, for 566 Preshared mode, the phone is the initiator and the PBX is the 567 responder. 
   o  If the two Commits are either both DH modes or both non-DH modes,
      then the Commit message with the lowest hvi value (for DH
      Commits), or lowest nonce value (for non-DH Commits), MUST be
      discarded and the other side is the initiator, and the protocol
      proceeds with the initiator's Commit. The two hvi or nonce values
      are compared as large unsigned integers in network byte order.

   If one Commit is for Multistream mode while the other is for
   non-Multistream (DH or Preshared) mode, a software error has occurred
   and the ZRTP negotiation should be terminated. This should never
   occur because of the constraints on Multistream mode described in
   Section 4.4.3.

   In the event that Commit messages are sent by both ZRTP endpoints at
   the same time, but are received in different media streams, the same
   resolution rules apply as if they were received on the same stream.
   The media stream whose Commit survives contention proceeds through
   the ZRTP exchange, while the media stream with the discarded Commit
   must wait for the completion of the other ZRTP exchange.

   If a commit contention forces a DH Commit message to be discarded,
   the responder's DH public value should only be discarded if it does
   not match the initiator's DH key size.

4.3. Matching Shared Secret Determination

   The following sections describe how ZRTP endpoints generate and/or
   use the set of shared secrets s1, auxsecret, and pbxsecret through
   the exchange of the DHPart1 and DHPart2 messages. This doesn't cover
   the Diffie-Hellman calculations. It only covers the method whereby
   the two parties determine if they already have shared secrets in
   common in their caches.

   Each ZRTP endpoint maintains a long-term cache of shared secrets that
   it has previously negotiated with the other party.
   The ZID of the other party, received in the other party's Hello
   message, is used as an index into this cache to find the set of
   shared secrets, if any exist. This cache entry may contain previously
   retained shared secrets, rs1 and rs2, which give ZRTP its key
   continuity features. If the other party is a PBX, the cache may also
   contain a trusted MiTM PBX shared secret, called pbxsecret, defined
   in Section 7.3.1.

   The DHPart1 and DHPart2 messages contain a list of hashes of these
   shared secrets to allow the two endpoints to compare the hashes with
   what they have in their caches to detect whether the two sides share
   any secrets that can be used in the calculation of the session key.
   The use of this shared secret cache is described in Section 4.9.

   If no secret of a given type is available, a random value is
   generated and used for that secret to ensure a mismatch in the hash
   comparisons in the DHPart1 and DHPart2 messages. This prevents an
   eavesdropper from knowing which types of shared secrets are available
   between the endpoints.

   Section 4.3.1 refers to the auxiliary shared secret auxsecret. The
   auxsecret shared secret may be defined by the VoIP user agent
   out-of-band from the ZRTP protocol. In some cases it may be provided
   by the signaling layer as srtps, which is defined in Section 8.2. If
   it is not provided by the signaling layer, the auxsecret shared
   secret may be manually provisioned in other application-specific ways
   that are out-of-band, such as computed from a hashed pass phrase by
   prior agreement between the two parties. Or it may be a family key
   used by an institution that the two parties both belong to. It is a
   generalized mechanism for providing a shared secret that is agreed to
   between the two parties out of scope of the ZRTP protocol. It is
   expected that most typical ZRTP endpoints will rarely use auxsecret.
   For both the initiator and the responder, the shared secrets s1, s2,
   and s3 will be calculated so that they can all be used later to
   calculate s0 in Section 4.4.1.4. Here is how s1, s2, and s3 are
   calculated by both parties:

   The shared secret s1 will be either the initiator's rs1 or the
   initiator's rs2, depending on which of them can be found in the
   responder's cache. If the initiator's rs1 matches the responder's
   rs1 or rs2, then s1 MUST be set to the initiator's rs1. If and only
   if that match fails, then if the initiator's rs2 matches the
   responder's rs1 or rs2, then s1 MUST be set to the initiator's rs2.
   If that match also fails, then s1 MUST be set to null. The
   complexity of the s1 calculation is to recover from any loss of cache
   sync from an earlier aborted session, due to the Byzantine Generals'
   Problem [Byzantine].

   The shared secret s2 MUST be set to the value of auxsecret if and
   only if both parties have matching values for auxsecret, as
   determined by comparing the hashes of auxsecret sent in the DH
   messages. If they don't match, s2 MUST be set to null.

   The shared secret s3 MUST be set to the value of pbxsecret if and
   only if both parties have matching values for pbxsecret, as
   determined by comparing the hashes of pbxsecret sent in the DH
   messages. If they don't match, s3 MUST be set to null.

   If s1, s2, or s3 have null values, they are assumed to have a zero
   length for the purposes of hashing them later during the s0
   calculation in Section 4.4.1.4.

   The comparison of hashes of rs1, rs2, auxsecret, and pbxsecret is
   described below in Section 4.3.1.

4.3.1. Calculation and comparison of hashes of shared secrets

   Both parties calculate a set of keyed hashes (HMACs) of shared
   secrets that may be present in each of their caches.
   These hashes are truncated to the leftmost 64 bits:

      rs1IDr = HMAC(rs1, "Responder")
      rs2IDr = HMAC(rs2, "Responder")
      auxsecretIDr = HMAC(auxsecret, "Responder")
      pbxsecretIDr = HMAC(pbxsecret, "Responder")
      rs1IDi = HMAC(rs1, "Initiator")
      rs2IDi = HMAC(rs2, "Initiator")
      auxsecretIDi = HMAC(auxsecret, "Initiator")
      pbxsecretIDi = HMAC(pbxsecret, "Initiator")

   The responder sends rs1IDr, rs2IDr, auxsecretIDr, and pbxsecretIDr in
   the DHPart1 message. The initiator sends rs1IDi, rs2IDi,
   auxsecretIDi, and pbxsecretIDi in the DHPart2 message.

   The responder uses the locally computed rs1IDi, rs2IDi, auxsecretIDi,
   and pbxsecretIDi to compare against the corresponding fields in the
   received DHPart2 message. The initiator uses the locally computed
   rs1IDr, rs2IDr, auxsecretIDr, and pbxsecretIDr to compare against the
   corresponding fields in the received DHPart1 message.

   From these comparisons, s1, s2, and s3 are calculated per the methods
   described above in Section 4.3. The secrets corresponding to
   matching HMACs are kept while the secrets corresponding to the
   non-matching ones are replaced with a null, which is assumed to have
   a zero length for the purposes of hashing them later. The resulting
   s1, s2, and s3 values are used later to calculate s0 in
   Section 4.4.1.4.

   For example, consider two ZRTP endpoints who share secrets rs1 and
   pbxsecret (defined in Section 7.3.1). During the comparison, rs1ID
   and pbxsecretID will match but auxsecretID will not. As a result, s1
   = rs1, s2 will be null, and s3 = pbxsecret.

4.3.2. Handling a Shared Secret Cache Mismatch

   A shared secret cache mismatch is defined to mean that we expected a
   cache match because rs1 exists in our local cache, but we computed a
   null value for s1 (per the method described in Section 4.3).
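   The ID calculation of Section 4.3.1, the s1 selection rule of
   Section 4.3, and the mismatch definition above can be sketched from
   the initiator's point of view as follows. This is a non-normative
   illustration: SHA-256 stands in for the negotiated hash, and the
   function names (secret_id, s1_for_initiator, cache_mismatch) are
   hypothetical helpers, not names defined by this document.

```python
import hashlib
import hmac
import os
from typing import Optional

ID_OCTETS = 8  # IDs are truncated to the leftmost 64 bits


def secret_id(secret: Optional[bytes], role: bytes) -> bytes:
    """Return HMAC(secret, role) truncated to 64 bits.

    If the secret is unavailable, a random key is substituted so that
    the ID cannot match and does not reveal which secrets exist.
    SHA-256 is an assumption standing in for the negotiated hash.
    """
    key = secret if secret is not None else os.urandom(32)
    return hmac.new(key, role, hashlib.sha256).digest()[:ID_OCTETS]


def s1_for_initiator(rs1: Optional[bytes], rs2: Optional[bytes],
                     rs1_id_r: bytes, rs2_id_r: bytes) -> Optional[bytes]:
    """Pick s1 using the rs1IDr and rs2IDr fields received in DHPart1.

    The initiator's rs1 takes precedence; rs2 is tried only if rs1
    matches neither of the responder's IDs. None represents null.
    """
    for candidate in (rs1, rs2):
        if candidate is None:
            continue
        cid = secret_id(candidate, b"Responder")
        if (hmac.compare_digest(cid, rs1_id_r)
                or hmac.compare_digest(cid, rs2_id_r)):
            return candidate
    return None


def cache_mismatch(rs1: Optional[bytes], s1: Optional[bytes]) -> bool:
    """True when rs1 exists locally but s1 came out null."""
    return rs1 is not None and s1 is None
```

   The second loop iteration is what lets the two caches recover after
   an aborted session has left one side a generation behind the other.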
   If one party has a cached shared secret and the other party does not,
   this indicates one of two possible situations. Either there is a
   man-in-the-middle (MiTM) attack, or one of the legitimate parties has
   lost their cached shared secret by some mishap. Perhaps they
   inadvertently deleted their cache, or their cache was lost or
   disrupted due to restoring their disk from an earlier backup copy.
   The party that has the surviving cache entry can easily detect that a
   cache mismatch has occurred, because they expect their own cached
   secret to match the other party's cached secret, but it does not
   match. It is possible for both parties to detect this condition if
   both parties have surviving cached secrets that have fallen out of
   sync, due perhaps to one party restoring from a disk backup.

   If either party discovers a cache mismatch, the user agent who makes
   this discovery must treat this as a possible security event and MUST
   alert their own user that there is a heightened risk of a MiTM
   attack, and that the user should verbally compare the SAS with the
   other party to ascertain that no MiTM attack has occurred. If a
   cache mismatch is detected and it is not possible to compare the SAS,
   either because the user interface does not support it or because one
   or both endpoints are unmanned devices, and no other SAS comparison
   mechanism is available, the session MAY be terminated.

   The session need not be terminated on a cache mismatch event if the
   mechanism described in Section 8.1.1 is available, which allows
   authentication of the DH exchange without human assistance, or if
   any other mechanism is available to determine whether the SAS
   matches. The latter requires either circumstances that allow a human
   verbal comparison of the SAS, or use of the OPTIONAL digital
   signature feature on the SAS hash, as described in Section 7.2.
   Even if the user interface does not permit an SAS comparison, the
   human user MUST be warned, and may elect to proceed with the call at
   their own risk.

   Here is a non-normative example of a cache-mismatch alert message
   from a ZRTP user agent (specifically, Zfone [zfone]), designed for a
   desktop PC graphical user interface environment. It is by no means
   required that the alert be this detailed:

      "We expected the other party to have a shared secret cached from a
      previous call, but they don't have it. This may mean your partner
      simply lost his cache of shared secrets, but it could also mean
      someone is trying to wiretap you. To resolve this question you
      must check the authentication string with your partner. If it
      doesn't match, it indicates the presence of a wiretapper."
      If the alert is rendered by a robot voice instead of a GUI,
      brevity may be more important: "Something's wrong. You must check
      the authentication string with your partner. If it doesn't match,
      it indicates the presence of a wiretapper."

4.4. DH and non-DH key agreements

   The next step is the generation of a secret for deriving SRTP keying
   material. ZRTP uses Diffie-Hellman and two non-Diffie-Hellman modes,
   described in the following sections.

4.4.1. Diffie-Hellman Mode

   The purpose of the Diffie-Hellman (either Finite Field Diffie-Hellman
   or Elliptic Curve Diffie-Hellman) exchange is for the two ZRTP
   endpoints to generate a new shared secret, s0. In addition, the
   endpoints discover if they have any cached or previously stored
   shared secrets in common, and use them as part of the calculation of
   the session keys.

   Because the DH exchange affects the state of the retained shared
   secret cache, only one in-process ZRTP DH exchange may occur at a
   time between two ZRTP endpoints. Otherwise, race conditions and
   cache integrity problems will result.
   When multiple media streams are established in parallel between the
   same pair of ZRTP endpoints (determined by the ZIDs in the Hello
   Messages), only one can be processed. Once that exchange completes
   with Confirm2 and Conf2ACK messages, another ZRTP DH exchange can
   begin. This constraint does not apply when Multistream mode key
   agreement is used since the cached shared secrets are not affected.

4.4.1.1. Hash Commitment in Diffie-Hellman Mode

   From the intersection of the algorithms in the sent and received
   Hello messages, the initiator chooses a hash, cipher, auth tag, key
   agreement type, and SAS type to be used.

   A Diffie-Hellman mode is selected by setting the Key Agreement Type
   to one of the DH or ECDH values in Table 5 in the Commit. In this
   mode, the key agreement begins with the initiator choosing a fresh
   random Diffie-Hellman (DH) secret value (svi) based on the chosen key
   agreement type value, and computing the public value. (Note that to
   speed up processing, this computation can be done in advance.) For
   guidance on generating random numbers, see Section 4.8. The value
   for the DH generator g, the DH prime p, and the length of the DH
   secret value, svi, are defined in Section 5.1.5.

      pvi = g^svi mod p

   where g and p are determined by the key agreement type value. The
   pvi value is formatted as a big-endian octet string, fixed to the
   width of the DH prime, and leading zeros MUST NOT be truncated.

   The hash commitment is performed by the initiator of the ZRTP
   exchange. The hash value of the initiator, hvi, includes a hash of
   the entire DHPart2 message as shown in Figure 9 (which includes the
   Diffie-Hellman public value, pvi), and the responder's Hello message:

      hvi = hash(initiator's DHPart2 message || responder's Hello
      message)

   Note that the Hello message includes the fields shown in Figure 3.
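   The public value encoding and the hash commitment above can be
   sketched as follows. This is a non-normative illustration under
   stated assumptions: the prime P and generator G are toy values chosen
   only so the example runs quickly (they are NOT one of the groups of
   Section 5.1.5), SHA-256 stands in for the negotiated hash, the way
   svi is drawn is a simplification, and the message byte strings are
   opaque placeholders rather than the layouts of Figure 3 and Figure 9.

```python
import hashlib
import secrets

# Toy parameters for illustration only; not a ZRTP key agreement group.
P = 2**127 - 1  # a small Mersenne prime, far too weak for real use
G = 3


def encode_fixed_width(value: int, prime: int) -> bytes:
    """Big-endian encoding padded to the width of the DH prime.

    Leading zero octets are kept, as the text requires.
    """
    width = (prime.bit_length() + 7) // 8
    return value.to_bytes(width, "big")


def hash_commitment(dhpart2_message: bytes, responder_hello: bytes) -> bytes:
    """hvi = hash(initiator's DHPart2 message || responder's Hello)."""
    return hashlib.sha256(dhpart2_message + responder_hello).digest()


# Initiator side: fresh secret value, public value, then the commitment.
svi = secrets.randbelow(P - 2) + 1      # assumption: any value in [1, P-2]
pvi = pow(G, svi, P)                    # pvi = g^svi mod p
pvi_octets = encode_fixed_width(pvi, P)

dhpart2 = b"<other DHPart2 fields>" + pvi_octets   # placeholder layout
hello_r = b"<responder Hello message>"             # placeholder content
hvi = hash_commitment(dhpart2, hello_r)
```

   Because hvi covers the responder's Hello, altering that Hello in
   transit changes the value the responder later recomputes.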
   The information from the responder's Hello message is included in the
   hash calculation to prevent a bid-down attack by modification of the
   responder's Hello message.

   The initiator sends hvi in the Commit message.

   The use of hash commitment in the DH exchange constrains the attacker
   to only one guess to generate the correct short authentication string
   (SAS) (Section 7) in his attack, which means the SAS can be quite
   short. A 16-bit SAS, for example, provides the attacker only one
   chance out of 65536 of not being detected.

4.4.1.2. Responder Behavior in Diffie-Hellman Mode

   Upon receipt of the Commit message, the responder generates its own
   fresh random DH secret value, svr, and computes the public value.
   (Note that to speed up processing, this computation can be done in
   advance.) For guidance on random number generation, see Section 4.8.
   The value for the DH generator g, the DH prime p, and the length of
   the DH secret value, svr, are defined in Section 5.1.5.

      pvr = g^svr mod p

   The pvr value is formatted as a big-endian octet string, fixed to the
   width of the DH prime, and leading zeros MUST NOT be truncated.

   Upon receipt of the DHPart2 message, the responder checks that the
   initiator's public DH value is not equal to 1 or p-1. An attacker
   might inject a false DHPart2 packet with a value of 1 or p-1 for
   g^svi mod p, which would cause a disastrously weak final DH result to
   be computed. If pvi is 1 or p-1, the user should be alerted of the
   attack and the protocol exchange MUST be terminated. Otherwise, the
   responder computes its own value for the hash commitment using the
   public DH value (pvi) received in the DHPart2 packet and its Hello
   packet and compares the result with the hvi received in the Commit
   packet. If they are different, a MiTM attack is taking place and the
   user is alerted and the protocol exchange terminated.
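   The two responder checks described above, followed by the DH result
   computation, can be sketched as follows. This is a non-normative
   illustration: ZrtpAbort and the function names are hypothetical,
   SHA-256 stands in for the negotiated hash, and the commitment is
   recomputed over the whole received DHPart2 message, following the hvi
   formula of Section 4.4.1.1.

```python
import hashlib
import hmac


class ZrtpAbort(Exception):
    """Hypothetical signal: alert the user and terminate the exchange."""


def check_public_value(pv: int, p: int) -> None:
    """Reject the degenerate public values 1 and p-1."""
    if pv == 1 or pv == p - 1:
        raise ZrtpAbort("degenerate DH public value received")


def verify_commitment(dhpart2_message: bytes, own_hello: bytes,
                      hvi_from_commit: bytes) -> None:
    """Recompute hvi and compare it with the value from the Commit."""
    expected = hashlib.sha256(dhpart2_message + own_hello).digest()
    if not hmac.compare_digest(expected, hvi_from_commit):
        raise ZrtpAbort("hash commitment mismatch")


def responder_dh_result(pvi: int, svr: int, p: int,
                        dhpart2_message: bytes, own_hello: bytes,
                        hvi_from_commit: bytes) -> int:
    """Run both checks, then compute DHResult = pvi^svr mod p."""
    check_public_value(pvi, p)
    verify_commitment(dhpart2_message, own_hello, hvi_from_commit)
    return pow(pvi, svr, p)
```

   Both checks run before any exponentiation with the secret value, so
   a rejected DHPart2 never influences keying material.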
   The responder then calculates the Diffie-Hellman result:

      DHResult = pvi^svr mod p

4.4.1.3. Initiator Behavior in Diffie-Hellman Mode

   Upon receipt of the DHPart1 message, the initiator checks that the
   responder's public DH value is not equal to 1 or p-1. An attacker
   might inject a false DHPart1 packet with a value of 1 or p-1 for
   g^svr mod p, which would cause a disastrously weak final DH result to
   be computed. If pvr is 1 or p-1, the user should be alerted of the
   attack and the protocol exchange MUST be terminated.

   The initiator then sends a DHPart2 message containing the initiator's
   public DH value and the set of calculated shared secret IDs as
   defined in Section 4.3.1.

   The initiator calculates the same Diffie-Hellman result using:

      DHResult = pvr^svi mod p

4.4.1.4. Shared Secret Calculation for DH Mode

   A hash of the received and sent ZRTP messages in the current ZRTP
   exchange in the following order is calculated by both parties:

      total_hash = hash(Hello of responder || Commit || DHPart1 ||
      DHPart2)

   Note that only the ZRTP messages (Figure 3, Figure 5, Figure 8, and
   Figure 9), not the entire ZRTP packets, are included in the
   total_hash.

   For both the initiator and responder, the DHResult is formatted as a
   big-endian octet string, fixed to the width of the DH prime, and
   leading zeros MUST NOT be truncated. For example, for a 3072-bit p,
   DHResult would be a 384 octet value, with the first octet the most
   significant.

   The calculation of the final shared secret, s0, is in compliance with
   the recommendations in sections 5.8.1 and 6.1.2.1 of NIST SP 800-56A
   [SP800-56A]. This is done by hashing a concatenation of a number of
   items, including the DHResult, the ZIDs of the initiator (ZIDi) and
   the responder (ZIDr), the total_hash, and the set of non-null shared
   secrets as described in Section 4.3.
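   As a rough, non-normative sketch of the total_hash and s0
   constructions that this section specifies: the field order follows
   the s0 formula of this section, SHA-256 stands in for the negotiated
   hash, and the helper names (u32, len_prefixed, total_hash,
   compute_s0) are hypothetical. Null secrets are represented by None.

```python
import hashlib
import struct
from typing import Optional


def u32(n: int) -> bytes:
    """32-bit big-endian integer, used for the counter and len() fields."""
    return struct.pack(">I", n)


def len_prefixed(secret: Optional[bytes]) -> bytes:
    """len(s) || s, where a null secret contributes only a zero length."""
    data = secret if secret is not None else b""
    return u32(len(data)) + data


def total_hash(hello_r: bytes, commit: bytes,
               dhpart1: bytes, dhpart2: bytes) -> bytes:
    """hash(Hello of responder || Commit || DHPart1 || DHPart2)."""
    return hashlib.sha256(hello_r + commit + dhpart1 + dhpart2).digest()


def compute_s0(dh_result: int, prime: int, zid_i: bytes, zid_r: bytes,
               th: bytes, s1: Optional[bytes], s2: Optional[bytes],
               s3: Optional[bytes]) -> bytes:
    """Hash the counter, DHResult, label, ZIDs, total_hash and secrets."""
    width = (prime.bit_length() + 7) // 8   # fixed to the prime's width
    material = (
        u32(1)                               # counter, fixed at 1
        + dh_result.to_bytes(width, "big")   # leading zeros preserved
        + b"ZRTP-HMAC-KDF"                   # not null-terminated
        + zid_i + zid_r
        + th
        + len_prefixed(s1) + len_prefixed(s2) + len_prefixed(s3)
    )
    return hashlib.sha256(material).digest()
```

   Both endpoints feed in the same DHResult (pvr^svi mod p on one side,
   pvi^svr mod p on the other), so they arrive at the same s0 as long as
   they agree on which of s1, s2, and s3 are null.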
   In section 5.8.1 of NIST SP 800-56A [SP800-56A], NIST requires
   certain parameters to be hashed together in a particular order, which
   NIST refers to as: Z, AlgorithmID, PartyUInfo, PartyVInfo,
   SuppPubInfo, and SuppPrivInfo. In our implementation, our DHResult
   corresponds to Z, "ZRTP-HMAC-KDF" corresponds to AlgorithmID, our
   ZIDi and ZIDr correspond to PartyUInfo and PartyVInfo, our total_hash
   corresponds to SuppPubInfo, and the set of three shared secrets s1,
   s2, and s3 corresponds to SuppPrivInfo. NIST also requires a 32-bit
   big-endian integer counter to be included in the hash each time the
   hash is computed, which we have set to the fixed value of 1, because
   we only compute the hash once. NIST refers to the final hash output
   as DerivedKeyingMaterial, which corresponds to our s0 in this
   calculation.

      s0 = hash(counter || DHResult || "ZRTP-HMAC-KDF" || ZIDi || ZIDr
      || total_hash || len(s1) || s1 || len(s2) || s2 || len(s3) || s3)

   Note that temporary values s1, s2, and s3 were calculated per the
   methods described above in Section 4.3, and they are erased from
   memory immediately after they are used to calculate s0.

   The length of the DHResult field was implicitly agreed to by the
   negotiated DH prime size. The length of total_hash is implicitly
   determined by the negotiated hash algorithm. All of the explicit
   length fields, len(), in the above hash are 32-bit big-endian
   integers, giving the length in octets of the field that follows.
   Some members of the set of shared secrets (s1, s2, and s3) may have
   lengths of zero if they are null (not shared), and are each preceded
   by a 4-octet length field. For example, if s2 is null, len(s2) is
   0x00000000, and s2 itself would be absent from the hash calculation,
   which means len(s3) would immediately follow len(s2).
   While inclusion of ZIDi and ZIDr may be redundant, because they are
   implicitly included in the total_hash, we explicitly include them
   here to follow NIST SP800-56A. The fixed-length string
   "ZRTP-HMAC-KDF" (not null-terminated) identifies what purpose the
   resulting s0 will be used for, which is to serve as the key
   derivation key for the ZRTP HMAC-based key derivation function (KDF)
   defined in Section 4.5.1 and used in Section 4.5.3.

   ZRTP DH mode is in full compliance with two relevant NIST documents
   that cover key derivations. First, section 5.8.1 of NIST SP 800-56A
   [SP800-56A] computes what NIST refers to as DerivedKeyingMaterial,
   which ZRTP refers to as s0. This s0 then serves as the key
   derivation key, which NIST refers to as KI in the key derivation
   function described in sections 5 and 5.1 of NIST SP 800-108
   [SP800-108], to derive all the rest of the subkeys needed by ZRTP.

   The ZRTP key derivation function (KDF) (Section 4.5.1) requires the
   use of a KDF Context field (per NIST SP 800-108 [SP800-108]
   guidelines) which should include the ZIDi, ZIDr, and a nonce value
   known to both parties. The total_hash qualifies as a nonce value,
   because its computation included nonce material from the initiator's
   Commit message and the responder's Hello message.

      KDF_Context = (ZIDi || ZIDr || total_hash)

   At this point in DH mode, the two endpoints proceed to the key
   derivations of ZRTPSess and the rest of the keys in Section 4.5.2,
   now that there is a defined s0.

4.4.2. Preshared Mode

   The Preshared key agreement mode can be used to generate SRTP keys
   and salts without a DH calculation, instead relying on a shared
   secret from previous DH calculations between the endpoints.
   This key agreement mode is useful to rapidly re-establish a secure
   session between two parties who have recently started and ended a
   secure session that has already performed a DH key agreement, without
   performing another lengthy DH calculation, which may be desirable on
   slow processors in resource-limited environments. Preshared mode
   MUST NOT be used for adding additional media streams to an existing
   call. Multistream mode MUST be used for this purpose.

   In the most severe resource-limited environments, Preshared mode may
   be useful with processors that cannot perform a DH calculation in an
   ergonomically acceptable time limit. Shared key material may be
   manually provisioned between two such endpoints in advance and still
   allow a limited subset of functionality. Such a "better than
   nothing" implementation would have to be regarded as non-compliant
   with the ZRTP specification, but it could interoperate in Preshared
   (and if applicable, Multistream) mode with a compliant ZRTP endpoint.

   Because Preshared mode affects the state of the retained shared
   secret cache, only one in-process ZRTP Preshared exchange may occur
   at a time between two ZRTP endpoints. This rule is explained in more
   detail in Section 4.4.1, and applies for the same reasons as in DH
   mode.

   Preshared mode is only included in this specification to meet the
   R-REUSE requirement in the Media Security Requirements
   [I-D.ietf-sip-media-security-requirements] document. A series of
   preshared-keyed calls between two ZRTP endpoints should use a DH key
   exchange periodically. Preshared mode is only used if a cached
   shared secret has been established in an earlier session by a DH
   exchange, as discussed in Section 4.9.

4.4.2.1. Commitment in Preshared Mode

   Preshared mode is selected by setting the Key Agreement Type to
   Preshared in the Commit message.
   This results in the same call flow as Multistream mode. The
   principal difference between Multistream mode and Preshared mode is
   that Preshared mode uses a previously cached shared secret, rs1,
   instead of an active ZRTP Session key, ZRTPSess, as the initial
   keying material.

   Because Preshared mode depends on having a reliable shared secret in
   its cache, it is RECOMMENDED that Preshared mode only be used when
   the SAS Verified flag has been previously set.

4.4.2.2. Initiator Behavior in Preshared Mode

   The Commit message (Figure 7) is sent by the initiator of the ZRTP
   exchange. From the intersection of the algorithms in the sent and
   received Hello messages, the initiator chooses a hash, cipher, auth
   tag, key agreement type, and SAS type to be used.

   To assemble a Preshared commit, we must first construct a temporary
   preshared_key, which is constructed from one of several possible
   combinations of cached key material, depending on what is available
   in the shared secret cache. If rs1 is not available in the
   initiator's cache, then Preshared mode MUST NOT be used.

      preshared_key = hash(len(rs1) || rs1 || len(auxsecret) ||
      auxsecret || len(pbxsecret) || pbxsecret)

   All of the explicit length fields, len(), in the above hash are
   32-bit big-endian integers, giving the length in octets of the field
   that follows. Some members of the set of shared secrets (rs1,
   auxsecret, and pbxsecret) may have lengths of zero if they are null
   (not available), and are each preceded by a 4-octet length field.
   For example, if auxsecret is null, len(auxsecret) is 0x00000000, and
   auxsecret itself would be absent from the hash calculation, which
   means len(pbxsecret) would immediately follow len(auxsecret).

   In place of hvi in the Commit message, two smaller fields are
   inserted by the initiator:

      - A random nonce of length 4-words (16 octets).
      - A keyID = HMAC(preshared_key, "Prsh") truncated to 64 bits.

      Note: Since the nonce is used to calculate different SRTP key and
      salt pairs for each session, a duplication will result in the same
      key and salt being generated for the two sessions, which would
      have disastrous security consequences.

4.4.2.3. Responder Behavior in Preshared Mode

   The responder uses the received keyID to search for matching key
   material in its cache. It does this by computing a preshared_key
   value and keyID value using the same formula as the initiator,
   depending on what is available in the responder's local cache. If
   the locally computed keyID does not match the received keyID in the
   Commit, the responder recomputes a new preshared_key and keyID from a
   different subset of shared keys from the cache, dropping auxsecret or
   pbxsecret or both from the hash calculation, until a matching
   preshared_key is found or it runs out of possibilities. Note that
   rs2 is not included in the process.

   If it finds the appropriate matching shared key material, it is used
   to derive s0 and a new ZRTPSess key, as described in the next section
   on Shared Secret Calculation, Section 4.4.2.4.

   If the responder determines that it does not have a cached shared
   secret from a previous DH exchange, or it fails to match the keyID
   hash from the initiator with any combination of its shared keys, it
   SHOULD respond with its own DH Commit message. This would reverse
   the roles and the responder would become the initiator, because the
   DH Commit must always "trump" the Preshared Commit message as
   described in Section 4.2. The key exchange would then proceed using
   DH mode.
   However, if a severely resource-limited responder lacks the
   computing resources to respond in a reasonable time with a DH Commit,
   it MAY respond with a ZRTP Error message (Section 5.9) indicating
   that no shared secret is available.

   If both sides send Preshared Commit messages initiating a secure
   session at the same time, the contention is resolved and the
   initiator/responder roles are settled according to Section 4.2, and
   the protocol proceeds.

   In Preshared mode, both the DHPart1 and DHPart2 messages are skipped.
   After receiving the Commit message from the initiator, the responder
   sends the Confirm1 message after calculating this stream's SRTP keys,
   as described below.

4.4.2.4. Shared Secret Calculation for Preshared Mode

   Preshared mode requires that the s0 and ZRTPSess keys be derived from
   the preshared_key, and this must be done in a way that guarantees
   uniqueness for each session. This is done by using nonce material
   from both parties: the explicit nonce in the initiator's Preshared
   Commit message (Figure 7) and the H3 field in the responder's Hello
   message (Figure 3). Thus both parties force the resulting shared
   secret to be unique for each session.

   A hash of the received and sent ZRTP messages in the current ZRTP
   exchange for the current media stream is calculated:

      total_hash = hash(Hello of responder || Commit)

   Note that only the ZRTP messages (Figure 3 and Figure 7), not the
   entire ZRTP packets, are included in the total_hash.

   The ZRTP key derivation function (KDF) (Section 4.5.1) requires the
   use of a KDF Context field (per NIST SP 800-108 [SP800-108]
   guidelines) which should include the ZIDi, ZIDr, and a nonce value
   known to both parties. The total_hash qualifies as a nonce value,
   because its computation included nonce material from the initiator's
   Commit message and the responder's Hello message.
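   Pulling together the Preshared-mode constructions defined so far
   (preshared_key and keyID from Section 4.4.2.2, the responder's search
   from Section 4.4.2.3, and the total_hash and KDF Context of this
   section), a non-normative sketch might look like the following.
   SHA-256 stands in for the negotiated hash, the function names are
   hypothetical, and the order in which the responder tries subsets is
   an assumption, since the text only says that auxsecret, pbxsecret, or
   both are dropped.

```python
import hashlib
import hmac
import struct
from typing import Optional


def len_prefixed(secret: Optional[bytes]) -> bytes:
    """len(s) || s with a 32-bit big-endian length; null means length 0."""
    data = secret if secret is not None else b""
    return struct.pack(">I", len(data)) + data


def preshared_key(rs1: bytes, auxsecret: Optional[bytes],
                  pbxsecret: Optional[bytes]) -> bytes:
    """hash(len(rs1)||rs1||len(auxsecret)||auxsecret||len(pbx)||pbx)."""
    return hashlib.sha256(len_prefixed(rs1) + len_prefixed(auxsecret)
                          + len_prefixed(pbxsecret)).digest()


def key_id(psk: bytes) -> bytes:
    """keyID = HMAC(preshared_key, "Prsh") truncated to 64 bits."""
    return hmac.new(psk, b"Prsh", hashlib.sha256).digest()[:8]


def find_preshared_key(received_key_id: bytes, rs1: Optional[bytes],
                       auxsecret: Optional[bytes],
                       pbxsecret: Optional[bytes]) -> Optional[bytes]:
    """Responder search: try every subset that still includes rs1."""
    if rs1 is None:
        return None                     # no usable cached secret
    subsets = ((auxsecret, pbxsecret), (auxsecret, None),
               (None, pbxsecret), (None, None))
    for aux, pbx in subsets:
        candidate = preshared_key(rs1, aux, pbx)
        if hmac.compare_digest(key_id(candidate), received_key_id):
            return candidate
    return None                         # caller falls back to DH or Error


def kdf_context(zid_i: bytes, zid_r: bytes,
                hello_r: bytes, commit: bytes) -> bytes:
    """ZIDi || ZIDr || total_hash, with total_hash = hash(Hello || Commit)."""
    th = hashlib.sha256(hello_r + commit).digest()
    return zid_i + zid_r + th
```

   A None result from find_preshared_key corresponds to the case where
   the responder SHOULD answer with its own DH Commit.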
      KDF_Context = (ZIDi || ZIDr || total_hash)

   The s0 key is derived via the ZRTP key derivation function
   (Section 4.5.1) from preshared_key and the nonces implicitly included
   in the total_hash. The nonces also ensure KDF_Context is unique for
   each session, which is critical for security.

      s0 = KDF(preshared_key, "ZRTP PSK", KDF_Context, negotiated hash
      length)

   The preshared_key MUST be erased as soon as it has been used to
   calculate s0.

   At this point in Preshared mode, the two endpoints proceed to the key
   derivations of ZRTPSess and the rest of the keys in Section 4.5.2,
   now that there is a defined s0.

4.4.3. Multistream Mode

   The Multistream key agreement mode can be used to generate SRTP keys
   and salts for additional media streams established between a pair of
   endpoints. Multistream mode cannot be used unless there is an active
   SRTP session established between the endpoints which means a ZRTP
   Session key is active. This ZRTP Session key can be used to generate
   keys and salts without performing another DH calculation. In this
   mode, the retained shared secret cache is not used or updated. As a
   result, multiple ZRTP Multistream mode exchanges can be processed in
   parallel between two endpoints.

   Multistream mode is also used to resume a secure call that has gone
   clear using a GoClear message as described in Section 4.7.2.1.

   When adding additional media streams to an existing call, Multistream
   mode MUST be used. The first media stream MUST use either DH mode or
   Preshared mode. Only one DH exchange or Preshared exchange is
   performed, just for the first media stream. The DH exchange or
   Preshared exchange MUST be completed for the first media stream
   before Multistream mode is used to add any other media streams.
   In a Multistream session, a ZRTP endpoint MUST use the same ZID for
   all media streams, matching the ZID used in the first media stream.

4.4.3.1. Commitment in Multistream Mode

   Multistream mode is selected by the initiator setting the Key
   Agreement Type to "Mult" in the Commit message (Figure 6). The
   Cipher Type, Auth Tag Length, and Hash in Multistream mode SHOULD be
   set by the initiator to the same values as in the initial DH Mode
   Commit. The SAS Type is ignored as there is no SAS authentication in
   this mode.

      Note: This requirement is needed since some endpoints cannot
      support different SRTP algorithms for different media streams.
      However, in the case of Multistream mode being used to go secure
      after a GoClear, the requirement to use the same SRTP algorithms
      is relaxed if there are no other active SRTP sessions.

   In place of hvi in the Commit, a random nonce of length 4-words (16
   octets) is chosen. Its value MUST be unique for all nonce values
   chosen for active ZRTP sessions between a pair of endpoints. If a
   Commit is received with a reused nonce value, the ZRTP exchange MUST
   be immediately terminated.

      Note: Since the nonce is used to calculate different SRTP key and
      salt pairs for each media stream, a duplication will result in the
      same key and salt being generated for the two media streams, which
      would have disastrous security consequences.

   If a Commit is received selecting Multistream mode, but the responder
   does not have a ZRTP Session Key available, the exchange MUST be
   terminated. Otherwise, the responder proceeds to the next section on
   Shared Secret Calculation, Section 4.4.3.2.

   If both sides send Multistream Commit messages at the same time, the
   contention is resolved and the initiator/responder roles are settled
   according to Section 4.2, and the protocol proceeds.
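   The Section 4.2 tie-break that this paragraph defers to can be
   sketched as follows. This is a non-normative illustration: the Commit
   class, its field names, and the mode constants are hypothetical, and
   raising an error on two identical hvi or nonce values is an
   assumption, since Section 4.2 does not cover that case.

```python
from dataclasses import dataclass

DH, PRESHARED, MULTISTREAM = "DH", "Preshared", "Multistream"


@dataclass(frozen=True)
class Commit:
    mode: str                     # DH, PRESHARED or MULTISTREAM
    value: bytes                  # hvi for DH, nonce otherwise
    sender_m_flag: bool = False   # sender set the MiTM (M) flag in Hello


def surviving_commit(a: Commit, b: Commit) -> Commit:
    """Return the Commit that proceeds; its sender is the initiator."""
    # Multistream against non-Multistream indicates a software error.
    if (a.mode == MULTISTREAM) != (b.mode == MULTISTREAM):
        raise ValueError("Multistream vs non-Multistream contention")
    # A DH Commit always wins over a Preshared Commit.
    if {a.mode, b.mode} == {DH, PRESHARED}:
        return a if a.mode == DH else b
    # Both Preshared with differing M flags: the party without M wins,
    # regardless of the nonce values.
    if a.mode == b.mode == PRESHARED and a.sender_m_flag != b.sender_m_flag:
        return b if a.sender_m_flag else a
    # Same class: the lowest value is discarded. Values are compared as
    # large unsigned integers in network byte order.
    av = int.from_bytes(a.value, "big")
    bv = int.from_bytes(b.value, "big")
    if av == bv:
        raise ValueError("identical values; not covered by the rules")
    return a if av > bv else b
```

   Each endpoint can evaluate the same function on the Commit it sent
   and the Commit it received, so both sides reach the same outcome
   without further signaling.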

In Multistream mode, both the DHPart1 and DHPart2 messages are
skipped. After receiving the Commit message from the initiator, the
responder sends the Confirm1 message after calculating this stream's
SRTP keys, as described below.

4.4.3.2. Shared Secret Calculation for Multistream Mode

In Multistream mode, each media stream requires that a set of keys be
derived from the ZRTPSess key, and this must be done in a way that
guarantees uniqueness for each media stream. This is done by using
nonce material from both parties: the explicit nonce in the
initiator's Multistream Commit message (Figure 6) and the H3 field in
the responder's Hello message (Figure 3). Thus both parties force
the resulting shared secret to be unique for each media stream.

A hash of the received and sent ZRTP messages in the current ZRTP
exchange for the current media stream is calculated:

   total_hash = hash(Hello of responder || Commit)

This refers to the Hello and Commit messages for the current media
stream which is using Multistream mode, not the original media stream
that included a full DH key agreement. Note that only the ZRTP
messages (Figure 3 and Figure 6), not the entire ZRTP packets, are
included in the hash.

The ZRTP key derivation function (KDF) (Section 4.5.1) requires the
use of a KDF Context field (per NIST SP 800-108 [SP800-108]
guidelines) which should include the ZIDi, ZIDr, and a nonce value
known to both parties. The total_hash qualifies as a nonce value,
because its computation included nonce material from the initiator's
Commit message and the responder's Hello message.

   KDF_Context = (ZIDi || ZIDr || total_hash)

The current stream's SRTP keys and salts for the initiator and
responder are calculated using the ZRTP Session Key ZRTPSess and the
nonces implicitly included in the total_hash.
The nonces also ensure KDF_Context will be unique for each media
stream, which is critical for security. For each additional media
stream, a separate s0 is derived from ZRTPSess via the ZRTP key
derivation function (Section 4.5.1):

   s0 = KDF(ZRTPSess, "ZRTP MSK", KDF_Context, negotiated hash
   length)

Note that the ZRTPSess key was previously derived from material that
also includes a different and more inclusive total_hash from the
entire packet sequence that performed the original DH exchange for
the first media stream in this ZRTP session.

At this point in Multistream mode, the two endpoints begin key
derivations in Section 4.5.3.

4.5. Key Derivations

4.5.1. The ZRTP Key Derivation Function

To derive keys from a shared secret, ZRTP uses an HMAC-based key
derivation function, or KDF. It is used throughout Section 4.5.3 and
in other sections. The HMAC function for the KDF is based on the
negotiated hash algorithm defined in Section 5.1.2.

The ZRTP KDF is designed to provide key separation, which is a
security requirement for the cryptographic keys derived from the same
key derivation key. The keys shall be separate in the sense that the
compromise of some derived keys will not degrade the security
strength of any of the other derived keys, or the security strength
of the key derivation key. Strong non-invertibility is required.

The ZRTP KDF is in compliance with the recommendations in NIST SP
800-108 [SP800-108], running the pseudorandom function (PRF) in
counter mode, with only a single iteration of the counter. The NIST
PRF is based on the HMAC function. The ZRTP KDF never has to
generate more than 256 bits of output key material, so only a single
invocation of the HMAC function is needed.

The ZRTP KDF is defined in this manner, per sections 5 and 5.1 of
NIST SP 800-108 [SP800-108]:

   KDF(KI, Label, Context, L) = HMAC(KI, i || Label || 0x00 ||
   Context || L)

The HMAC in the KDF is keyed by KI, which is a secret key derivation
key that is unknown to the wiretapper (for example, s0). The HMAC is
computed on a concatenated set of nonsecret fields that are defined
as follows. The first field is a 32-bit big-endian integer counter
(i) required by NIST to be included in the HMAC each time the HMAC is
computed, which we have set to the fixed value of 0x00000001, because
we only compute the HMAC once. Label is a string of nonzero octets
that identifies the purpose for the derived keying material. The
octet 0x00 is a delimiter required by NIST. The NIST KDF formula has
a "Context" field which includes ZIDi, ZIDr, and some optional nonce
material known to both parties. L is a 32-bit big-endian positive
integer, not to exceed the length in bits of the output of the HMAC.
The output of the KDF is truncated to the leftmost L bits. If
SHA-256 is the negotiated hash algorithm, the HMAC would be
HMAC-SHA-256, thus the maximum value of L would be 256, the
negotiated hash length.

The ZRTP KDF is not to be confused with the SRTP KDF defined in
[RFC3711].

4.5.2. Deriving ZRTPSess Key and SAS in DH or Preshared modes

Both DH mode and Preshared mode (but not Multistream mode) come to
this common point in the protocol to derive ZRTPSess and the SAS from
s0, via the ZRTP Key Derivation Function (Section 4.5.1). At this
point, s0 has been calculated, as well as KDF_Context.

These calculations are done only for the first media stream, not for
Multistream mode. KDF_Context is unique for each media stream, but
only the first media stream is permitted to calculate ZRTPSess.
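
The KDF of Section 4.5.1 and its use in this section can be sketched
in a few lines of Python. This is an illustrative sketch under stated
assumptions, not a reference implementation: it assumes SHA-256 as the
negotiated hash, handles only values of L that are a multiple of 8,
and the function name zrtp_kdf and all sample inputs (the ZIDs, the
messages hashed into total_hash, and s0) are placeholders invented for
the example.

```python
import hashlib
import hmac
import struct


def zrtp_kdf(ki: bytes, label: bytes, context: bytes, length_bits: int) -> bytes:
    """HMAC(KI, i || Label || 0x00 || Context || L), leftmost L bits."""
    if not 0 < length_bits <= 256:
        raise ValueError("L must be positive and at most the HMAC output length")
    if length_bits % 8:
        raise ValueError("this sketch only handles whole octets")
    if 0 in label:
        raise ValueError("Label must consist of nonzero octets")
    data = (
        struct.pack(">I", 1)                # counter i, 32-bit big-endian, fixed at 1
        + label
        + b"\x00"                           # delimiter octet
        + context
        + struct.pack(">I", length_bits)    # L, 32-bit big-endian
    )
    return hmac.new(ki, data, hashlib.sha256).digest()[: length_bits // 8]


# Placeholder inputs, for illustration only.
zid_i = bytes([0x11]) * 12                  # 96-bit ZID of the initiator
zid_r = bytes([0x22]) * 12                  # 96-bit ZID of the responder
total_hash = hashlib.sha256(b"example messages").digest()
kdf_context = zid_i + zid_r + total_hash
s0 = bytes([0x33]) * 32                     # stands in for the real s0

zrtp_sess = zrtp_kdf(s0, b"ZRTP Session Key", kdf_context, 256)
sashash = zrtp_kdf(s0, b"SAS", kdf_context, 256)
sasvalue = sashash[:4]                      # leftmost 32 bits
```

Because L is part of the HMAC input, a key derived with a shorter L is
not simply a prefix of the key derived with a longer L under the same
label.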

   ZRTPSess = KDF(s0, "ZRTP Session Key", KDF_Context, negotiated
   hash length)

The ZRTPSess key is used only for these two purposes: 1) to generate
the additional s0 keys (Section 4.4.3.2) for adding additional media
streams to this session in Multistream mode, and 2) to generate the
pbxsecret (Section 7.3.1) that may be cached for use in future calls.
The ZRTPSess key is kept for the duration of the call signaling
session between the two ZRTP endpoints. That is, if there are two
separate calls between the endpoints (in SIP terms, separate SIP
dialogs), then a ZRTP Session Key MUST NOT be used across the two
call signaling sessions. ZRTPSess MUST be destroyed no later than
the end of the call signaling session.

There is only one Short Authentication String (SAS) (Section 7)
computed per call, which is applicable to all media streams derived
from a single DH key agreement in a ZRTP session. KDF_Context is
unique for each media stream, but only the first media stream is
permitted to calculate sashash.

   sashash = KDF(s0, "SAS", KDF_Context, negotiated hash length)
   sasvalue = sashash [truncated to leftmost 32 bits]

The key separation properties of the KDF (Section 4.5.1) are
especially important for the SAS calculation, because of the exposure
of the SAS.

At this point in DH mode or Preshared mode, the two endpoints proceed
on to the key derivations in Section 4.5.3, now that there is a
defined s0 and ZRTPSess key.

4.5.3. Deriving the rest of the keys from s0

DH mode, Multistream mode, and Preshared mode all come to this common
point in the protocol to derive a set of keys from s0. It can be
assumed that s0 has been calculated, as well as the ZRTPSess key and
KDF_Context. A separate s0 key is associated with each media stream.

Subkeys are not drawn directly from s0, as done in NIST SP800-56A.
To enhance key separation, ZRTP uses s0 to key a Key Derivation
Function (Section 4.5.1) based on NIST SP 800-108 [SP800-108]. Since
s0 already included total_hash in its derivation, it is redundant to
use total_hash again in the KDF Context in all the invocations of the
KDF keyed by s0. Nonetheless, NIST SP 800-108 always requires KDF
Context to be defined for the KDF, and nonce material is required in
some KDF invocations (especially for Multistream mode and Preshared
mode), so total_hash is included as a nonce in the KDF Context.

Separate SRTP master keys and master salts are derived for use in
each direction for each media stream. Unless otherwise specified,
ZRTP uses SRTP with no MKI, 32 bit authentication using HMAC-SHA1,
AES-CM 128 or 256 bit key length, 112 bit session salt key length,
2^48 key derivation rate, and SRTP prefix length 0.

The ZRTP initiator encrypts and the ZRTP responder decrypts packets
by using srtpkeyi and srtpsalti, while the ZRTP responder encrypts
and the ZRTP initiator decrypts packets by using srtpkeyr and
srtpsaltr. The SRTP key and salt values are truncated (taking the
leftmost bits) to the length determined by the chosen SRTP profile.
These are generated by:

   srtpkeyi = KDF(s0, "Initiator SRTP master key", KDF_Context,
   negotiated AES key length)
   srtpsalti = KDF(s0, "Initiator SRTP master salt", KDF_Context,
   112)
   srtpkeyr = KDF(s0, "Responder SRTP master key", KDF_Context,
   negotiated AES key length)
   srtpsaltr = KDF(s0, "Responder SRTP master salt", KDF_Context,
   112)

The HMAC keys are the same length as the output of the underlying
hash function in the KDF, and are thus generated without truncation.
They are used only by ZRTP and not by SRTP.
Different HMAC keys are needed for the initiator and the responder to
ensure that GoClear messages in each direction are unique and cannot
be cached by an attacker and reflected back to the endpoint.

   hmackeyi = KDF(s0, "Initiator HMAC key", KDF_Context, negotiated
   hash length)
   hmackeyr = KDF(s0, "Responder HMAC key", KDF_Context, negotiated
   hash length)

ZRTP keys are generated for the initiator and responder to use to
encrypt the Confirm1 and Confirm2 messages. They are truncated to
the same size as the negotiated SRTP key size.

   zrtpkeyi = KDF(s0, "Initiator ZRTP key", KDF_Context, negotiated
   AES key length)
   zrtpkeyr = KDF(s0, "Responder ZRTP key", KDF_Context, negotiated
   AES key length)

All key material is destroyed as soon as it is no longer needed, no
later than the end of the call. s0 is erased in Section 4.6.1, and
the rest of the session key material is erased in Section 4.7.2.1 and
Section 4.7.3.

4.6. Confirmation

The Confirm1 and Confirm2 messages (Figure 10) contain the cache
expiration interval (defined in Section 4.9) for the newly generated
retained shared secret. The flagoctet is an 8 bit unsigned integer
made up of these flags: the PBX Enrollment flag (E) defined in
Section 7.3.1, SAS Verified flag (V) defined in Section 7.1, Allow
Clear flag (A) defined in Section 4.7.2, and Disclosure flag (D)
defined in Section 11.

   flagoctet = (E * 2^3) + (V * 2^2) + (A * 2^1) + (D * 2^0)

Part of the Confirm1 and Confirm2 messages are encrypted using
full-block Cipher Feedback Mode, and contain a 128-bit random CFB
Initialization Vector (IV).
The Confirm1 and Confirm2 messages also contain an HMAC covering the
encrypted part of the Confirm1 or Confirm2 message which includes a
string of zeros, the signature length, flag octet, cache expiration
interval, signature type block (if present) and signature block
(Section 7.2) (if present). For the responder:

   hmac = HMAC(hmackeyr, encrypted part of Confirm1)

For the initiator:

   hmac = HMAC(hmackeyi, encrypted part of Confirm2)

The hmackeyi and hmackeyr keys are computed in Section 4.5.3.

The exchange is completed when the responder sends either the
Conf2ACK message or the responder's first SRTP media packet (with a
valid SRTP auth tag). The initiator MUST treat the first valid SRTP
media from the responder as equivalent to receiving a Conf2ACK. The
responder may respond to Confirm2 with either SRTP media or Conf2ACK,
or both, in whichever order the responder chooses (or whichever order
the "cloud" chooses to deliver them).

4.6.1. Updating the Cache of Shared Secrets

After receiving the Confirm messages, both parties must now update
their retained shared secret rs1 in their respective caches, provided
the following conditions hold:

   1) This key exchange is either DH or Preshared mode, not
      Multistream mode, which does not update the cache.
   2) Depending on the values of the cache expiration intervals that
      are received in the two Confirm messages, there are some
      scenarios that do not update the cache, as explained in
      Section 4.9.
   3) The responder MUST receive the initiator's Confirm2 message
      before updating the responder's cache.
   4) The initiator MUST receive either the responder's Conf2ACK
      message or the responder's SRTP media (with a valid SRTP auth
      tag) before updating the initiator's cache.
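
The four conditions above can be read as a single predicate. The
following Python sketch is one possible reading, offered as an
illustration only: the function name, its parameters, and the string
values for mode and role are hypothetical, and condition 2 is reduced
to the zero-interval case of Section 4.9 (a full implementation would
also reset the stored interval of an existing cache entry in that
case).

```python
def should_store_new_rs1(mode, role, confirm1_interval, confirm2_interval,
                         got_confirm2=False, got_conf2ack_or_valid_srtp=False):
    """Hypothetical check of the Section 4.6.1 cache update conditions."""
    # 1) Only DH and Preshared mode update the cache; Multistream does not.
    if mode not in ("DH", "Preshared"):
        return False
    # 2) Simplified: the effective interval is the minimum of the two
    #    Confirm values, and 0 means the new secret is not stored.
    if min(confirm1_interval, confirm2_interval) == 0:
        return False
    # 3) The responder waits for the initiator's Confirm2.
    if role == "responder":
        return got_confirm2
    # 4) The initiator waits for Conf2ACK or valid SRTP media.
    if role == "initiator":
        return got_conf2ack_or_valid_srtp
    raise ValueError("role must be 'initiator' or 'responder'")
```

For example, an initiator in DH mode that has seen valid SRTP media
from the responder, with both intervals set to 0xffffffff, would be
allowed to store the new rs1.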

For DH mode only, before updating the retained shared secret rs1 in
the cache, each party first discards their old rs2 and copies their
old rs1 to rs2. The old rs1 is saved to rs2 because of the risk of
session interruption after one party has updated his own rs1 but
before the other party has enough information to update her own rs1.
If that happens, they may regain cache sync in the next session by
using rs2 (per Section 4.3). This mitigates the well-known Byzantine
Generals' Problem [Byzantine]. The old rs1 value is not saved in
Preshared mode.

For DH mode and Preshared mode, both parties compute a new rs1 value
from s0 via the ZRTP key derivation function (Section 4.5.1):

   rs1 = KDF(s0, "retained secret", KDF_Context, negotiated hash
   length)

Note that KDF_Context is unique for each media stream, but only the
first media stream is permitted to update rs1.

Each media stream has its own s0. At this point in the protocol for
each media stream, the corresponding s0 MUST be erased.

4.7. Termination

A ZRTP session is normally terminated at the end of a call, but it
may be terminated early by either the Error message or the GoClear
message.

4.7.1. Termination via Error message

The Error message (Section 5.9) is used to terminate an in-progress
ZRTP exchange due to an error. The Error message contains an integer
Error Code for debugging purposes. The termination of a ZRTP key
agreement exchange results in no updates to the cached shared secrets
and deletion of all crypto context.

The ZRTP Session key, ZRTPSess, is only deleted if the ZRTP session
in which it was generated and all ZRTP sessions which are using it
are terminated.

4.7.2. Termination via GoClear message

The GoClear message (Section 5.11) is used to switch from SRTP to
RTP, usually because the user has chosen to do that by pressing a
button.
The GoClear uses an HMAC of the Message Type Block sent in the
GoClear Message computed with the hmackey derived from the shared
secret. This HMAC is truncated to the leftmost 64 bits. When sent
by the initiator:

   clear_hmac = HMAC(hmackeyi, "GoClear ")

When sent by the responder:

   clear_hmac = HMAC(hmackeyr, "GoClear ")

A GoClear message which does not receive a ClearACK response must be
resent. If a GoClear message is received with a bad HMAC, it must be
ignored, and no ClearACK is sent.

A ZRTP endpoint MAY choose to accept GoClear messages after the
session has switched to SRTP, allowing the session to revert to RTP.
This is indicated in the Confirm1 or Confirm2 messages (Figure 10) by
setting the Allow Clear flag (A). If an endpoint sets the Allow
Clear (A) flag in their Confirm message, it indicates that they
support receiving GoClear messages.

A ZRTP endpoint that receives a GoClear MUST authenticate the message
by checking the clear_hmac. If the message authenticates, the
endpoint stops sending SRTP packets, and generates a ClearACK in
response. It MUST also delete all the crypto key material for all
the SRTP media streams, as defined in Section 4.7.2.1.

Until confirmation from the user is received (e.g. clicking a button,
pressing a DTMF key, etc.), the ZRTP endpoint MUST NOT resume sending
RTP packets. The endpoint then renders to the user an indication
that the media session has switched to clear mode, and waits for
confirmation from the user. This blocks the flow of sensitive
discourse until the user is forced to take notice that he's no longer
protected by encryption. To prevent pinholes from closing or NAT
bindings from expiring, the ClearACK message MAY be resent at regular
intervals (e.g. every 5 seconds) while waiting for confirmation from
the user.
After confirmation of the notification is received from the user, the
sending of RTP packets may begin.

After sending a GoClear message, the ZRTP endpoint stops sending SRTP
packets. When a ClearACK is received, the ZRTP endpoint deletes the
crypto context for the SRTP session, as defined in Section 4.7.2.1,
and may then resume sending RTP packets.

In the event a ClearACK is not received before the retransmissions of
GoClear are exhausted, the key material is deleted, as defined in
Section 4.7.2.1.

After the users have transitioned from SRTP media back to RTP media
(clear mode), they may decide later to return to secure mode by
manual activation, usually by pressing a GO SECURE button. In that
case, a new secure session is initiated by the party that presses the
button, by sending a new Commit packet, leading to a new session key
negotiation. It is not necessary to send another Hello packet, as
the two parties have already done that at the start of the call and
thus have already discovered each other's ZRTP capabilities. It is
possible for users to toggle back and forth between clear and secure
modes multiple times in the same call, just as they could in the old
days of secure PSTN phones.

4.7.2.1. Key Destruction for GoClear message

All SRTP session key material MUST be erased by the receiver of the
GoClear message upon receiving a properly authenticated GoClear. The
same key destruction MUST be done by the sender of GoClear message,
upon receiving the ClearACK. This must be done for the key material
for all of the media streams.

All key material that would have been erased at the end of the SIP
session MUST be erased, as described in Section 4.7.3, with the
single exception of ZRTPSess. In this case, ZRTPSess is destroyed in
a manner different from the other key material.
Both parties replace ZRTPSess with a hash of itself, without
truncation:

   ZRTPSess = hash(ZRTPSess)

This meets the requirements of Perfect Forward Secrecy (PFS), but
preserves a new version of ZRTPSess, so that the user can later
re-initiate secure mode during the same call without performing
another Diffie-Hellman calculation, by using Multistream mode, which
requires and assumes the existence of ZRTPSess with the same value at
both ZRTP endpoints. A new key negotiation after a GoClear SHOULD
use a Multistream Commit message.

   Note: Multistream mode is preferred over a Diffie-Hellman mode
   since this does not require the generation of a new hash chain and
   a new signaling exchange to exchange new hash values.

Later, at the end of the entire call, ZRTPSess is finally destroyed
along with the other key material, as described in Section 4.7.3.

4.7.3. Key Destruction at Termination

All SRTP session key material MUST be erased by both parties at the
end of the call. In particular, the destroyed key material includes
the SRTP session keys and salts, SRTP master keys and salts, and all
material sufficient to reconstruct the SRTP keys and salts, including
ZRTPSess and s0 (although s0 should have been destroyed earlier, in
Section 4.6.1). This must be done for the key material for all of
the media streams. The only exceptions are the cached shared secrets
needed for future calls, including rs1, rs2, and pbxsecret.

4.8. Random Number Generation

The ZRTP protocol uses random numbers for cryptographic key material,
notably for the DH secret exponents and nonces, which must be freshly
generated with each session. Whenever a random number is needed, all
of the following criteria must be satisfied:

   Random numbers MUST be freshly generated, meaning that they must
   not have been used in a previous calculation.

   When generating a random number k of L bits in length, k MUST be
   chosen with equal probability from the range of [1 < k < 2^L].

   It MUST be derived from a physical entropy source, such as RF
   noise, acoustic noise, thermal noise, high resolution timings of
   environmental events, or other unpredictable physical sources of
   entropy. For a detailed explanation of cryptographic grade random
   numbers and guidance for collecting suitable entropy, see RFC 4086
   [RFC4086] and Chapter 10 of Practical Cryptography [Ferguson].
   The raw entropy must be distilled and processed through a
   deterministic random bit generator (DRBG). Examples of DRBGs may
   be found in NIST SP 800-90 [SP800-90], and in [Ferguson]. Failure
   to use true entropy from the physical environment as a basis for
   generating random cryptographic key material would lead to a
   disastrous loss of security.

4.9. ZID and Cache Operation

Each instance of ZRTP has a unique 96-bit random ZRTP ID or ZID that
is generated once at installation time. It is used to look up
retained shared secrets in a local cache. A single global ZID for a
single installation is the simplest way to implement ZIDs. However,
it is specifically not precluded for an implementation to use
multiple ZIDs, up to the limit of a separate one per callee. This
then turns it into a long-lived "association ID" that does not apply
to any other associations between a different pair of parties. It is
a goal of this protocol to permit both options to interoperate
freely.

Each time a new s0 is calculated, a new retained shared secret rs1 is
generated and stored in the cache, indexed by the ZID of the other
endpoint. This cache updating is described in Section 4.6.1.
For the new retained shared secret, each endpoint chooses a cache
expiration value which is an unsigned 32 bit integer of the number of
seconds that this secret should be retained in the cache. The time
interval is relative to when the Confirm1 message is sent or
received.

The cache intervals are exchanged in the Confirm1 and Confirm2
messages (Figure 10). The actual cache interval used by both
endpoints is the minimum of the values from the Confirm1 and Confirm2
messages. A value of 0 seconds means the newly-computed shared
secret SHOULD NOT be stored in the cache, and if a cache entry
already exists from an earlier call, the stored cache interval should
be set to 0. This means if either Confirm message contains a null
cache expiration interval, and there is no cache entry already
defined, no new cache entry is created. A value of 0xffffffff means
the secret should be cached indefinitely and is the recommended
value. If the ZRTP exchange is Multistream Mode, the field in the
Confirm1 and Confirm2 is set to 0xffffffff and ignored, and the cache
is not updated.

The expiration interval need not be used to force the deletion of a
shared secret from the cache when the interval has expired. It just
means the shared secret MAY be deleted from that cache at any point
after the interval has expired without causing the other party to
note it as an unexpected security event when the next key negotiation
occurs between the same two parties. This means there need not be
perfectly synchronized deletion of expired secrets from the two
caches, and makes it easy to avoid a race condition that might
otherwise be caused by clock skew.

If the expiration interval is not properly agreed to by both
endpoints, it may later result in false alarms of MiTM attacks, due
to apparent cache mismatches (Section 4.3.2).

4.9.1. Cacheless implementations

It is possible to implement a simplified but nonetheless useful
profile of the ZRTP protocol that does not support any caching of
shared secrets. In this case the users would have to rely
exclusively on the verbal SAS comparison for every call. That is,
unless MiTM protection is provided by the mechanisms in Section 8.1.1
or Section 7.2, which introduce their own forms of complexity.

If a ZRTP endpoint does not support caching of shared secrets, it
MUST set the cache expiration interval to zero, and MUST set the SAS
Verified (V) flag (Section 7.1) to false. In addition, because the
ZID serves mainly as a cache index, the ZID would not be required to
maintain the same value across separate SIP sessions, although there
is no reason why it should not.

Cacheless operation would sacrifice the key continuity (Section 15.1)
features, as well as Preshared mode (Section 4.4.2). There would
also be no PBX trusted MiTM (Section 7.3) features, including the PBX
security enrollment (Section 7.3.1) mechanism.

5. ZRTP Messages

All ZRTP messages use the message format defined in Figure 2. All
word lengths referenced in this specification are 32 bits or 4
octets. All integer fields are carried in network byte order, that
is, most significant byte (octet) first, commonly known as
big-endian.

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 0 0 1|Not Used (set to zero) |         Sequence Number       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 ZRTP Magic Cookie (0x5a525450)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Identifier                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |         ZRTP Message (length depends on Message Type)         |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          CRC (1 word)                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 2: ZRTP Packet Format

The Sequence Number is a count that is incremented for each ZRTP
packet sent. The count is initialized to a random value. This is
useful in estimating ZRTP packet loss and also detecting when ZRTP
packets arrive out of sequence.

The ZRTP Magic Cookie is a 32 bit string that uniquely identifies a
ZRTP packet, and has the value 0x5a525450.

Source Identifier is the SSRC number of the RTP stream that this ZRTP
packet relates to. For cases of forking or forwarding, RTP and hence
ZRTP may arrive at the same port from several different sources -
each of these sources will have a different SSRC and may initiate an
independent ZRTP protocol session.

This format is clearly identifiable as non-RTP due to the first two
bits being zero which looks like RTP version 0, which is not a valid
RTP version number. It is clearly distinguishable from STUN since
the magic cookies are different. The 12 not used bits are set to
zero and MUST be ignored when received.

The ZRTP Messages are defined in Figure 3 to Figure 17 and are of
variable length.

The ZRTP protocol uses a 32 bit CRC checksum in each ZRTP packet as
defined in RFC 3309 [RFC3309] to detect transmission errors. ZRTP
packets are typically transported by UDP, which carries its own
built-in 16-bit checksum for integrity, but ZRTP does not rely on it.
This is because of the effect of an undetected transmission error in
a ZRTP message. For example, an undetected error in the DH exchange
could appear to be an active man-in-the-middle attack. The
psychological effects of a false announcement of this by ZRTP clients
cannot be overstated.
The probability of such a false alarm hinges on a mere 16-bit
checksum that usually protects UDP packets, so more error detection
is needed. For these reasons, this belt-and-suspenders approach is
used to minimize the chance of a transmission error affecting the
ZRTP key agreement.

The CRC is calculated across the entire ZRTP packet shown in
Figure 2, including the ZRTP Header and the ZRTP Message, but not
including the CRC field. If a ZRTP message fails the CRC check, it
is silently discarded.

5.1. ZRTP Message Formats

ZRTP messages are designed to simplify endpoint parsing requirements
and to reduce the opportunities for buffer overflow attacks (a good
goal of any security extension should be to not introduce new attack
vectors).

ZRTP uses a block of 8 octets (2 words) to encode the Message Type.
Blocks of 4 octets (1 word) are used to encode Hash Type, Cipher
Type, Key Agreement Type, and Authentication Tag Type. The values in
the blocks are ASCII strings which are extended with spaces (0x20) to
make them the desired length. Currently defined block values are
listed in Tables 1-6 below.

Additional block values may be defined and used.

ZRTP uses this ASCII encoding to simplify debugging and make it
"Wireshark (Ethereal) friendly".

5.1.1. Message Type Block

Currently 14 Message Type Blocks are defined - they represent the set
of ZRTP message primitives. ZRTP endpoints MUST support the Hello,
HelloACK, Commit, DHPart1, DHPart2, Confirm1, Confirm2, Conf2ACK,
SASrelay, RelayACK, Error and ErrorACK message types. ZRTP endpoints
MAY support the GoClear and ClearACK messages. Additional messages
may be defined in extensions to ZRTP.
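
The space-padded block encoding of Section 5.1 and the fixed part of
the packet header in Figure 2 can be illustrated with a short Python
sketch. It is a sketch under assumptions, not a complete packet
builder: the function names encode_block and pack_header are
hypothetical, the ZRTP message body is not built, and the CRC of
RFC 3309 is deliberately left out.

```python
import struct

ZRTP_MAGIC_COOKIE = 0x5A525450  # value given for the Magic Cookie field


def encode_block(name: str, size: int) -> bytes:
    """Encode an ASCII block value, extended with spaces (0x20) to size octets."""
    raw = name.encode("ascii")
    if len(raw) > size:
        raise ValueError("block value longer than the block")
    return raw.ljust(size, b" ")


def pack_header(sequence_number: int, ssrc: int) -> bytes:
    """Pack the first three words of Figure 2 in network byte order."""
    first16 = 0b0001 << 12  # bits 0 0 0 1, then 12 unused bits set to zero
    return struct.pack(">HHII", first16, sequence_number & 0xFFFF,
                       ZRTP_MAGIC_COOKIE, ssrc)


message_type = encode_block("Hello", 8)   # Message Type Block, 2 words
hash_type = encode_block("S256", 4)       # Hash Type Block, 1 word
header = pack_header(sequence_number=1, ssrc=0xDEADBEEF)
```

One consequence of the chosen cookie value is that its four octets
spell "ZRTP" in ASCII, which makes the packets easy to spot in a
capture.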

   Message Type Block    | Meaning
   ---------------------------------------------------
   "Hello   "            | Hello Message
   ---------------------------------------------------
   "HelloACK"            | HelloACK Message
   ---------------------------------------------------
   "Commit  "            | Commit Message
   ---------------------------------------------------
   "DHPart1 "            | DHPart1 Message
   ---------------------------------------------------
   "DHPart2 "            | DHPart2 Message
   ---------------------------------------------------
   "Confirm1"            | Confirm1 Message
   ---------------------------------------------------
   "Confirm2"            | Confirm2 Message
   ---------------------------------------------------
   "Conf2ACK"            | Conf2ACK Message
   ---------------------------------------------------
   "Error   "            | Error Message
   ---------------------------------------------------
   "ErrorACK"            | ErrorACK Message
   ---------------------------------------------------
   "GoClear "            | GoClear Message
   ---------------------------------------------------
   "ClearACK"            | ClearACK Message
   ---------------------------------------------------
   "SASrelay"            | SASrelay Message
   ---------------------------------------------------
   "RelayACK"            | RelayACK Message
   ---------------------------------------------------

             Table 1. Message Type Block Values

5.1.2. Hash Type Block

Only one Hash Type is currently defined, SHA-256 [FIPS-180-2], and
all ZRTP endpoints MUST support this hash. Additional Hash Types can
be registered and used, such as the NIST SHA-3 hash [SHA-3] when it
becomes available. Note that the Hash Type refers to the hash
algorithm that will be used throughout the ZRTP key exchange, not the
hash algorithm to be used in the SRTP Authentication Tag.

ZRTP makes use of HMAC message authentication codes based on the
negotiated Hash Type. The HMAC function is defined in [FIPS-198-1].
   Test vectors for HMAC-SHA-256 may be found in [RFC4231]. The HMAC
   function based on the negotiated Hash Type is also used in the ZRTP
   key derivation function (Section 4.5.1).

   Hash Type Block | Meaning
   ---------------------------------------------------
   "S256"          | SHA-256 Hash defined in FIPS 180-2
   ---------------------------------------------------

   Table 2. Hash Type Block Values

   All hashes and HMACs used throughout the ZRTP protocol will use the
   negotiated Hash Type, except for the special cases noted in
   Section 5.1.2.1.

5.1.2.1.  Implicit Hash and HMAC algorithm

   While most of the HMACs used in ZRTP are defined by the negotiated
   Hash Type (Section 5.1.2), some hashes and HMACs must be precomputed
   prior to negotiations, and thus cannot have their algorithms
   negotiated during the ZRTP exchange. They are implicitly
   predetermined to use SHA-256 [FIPS-180-2] and HMAC-SHA-256.

   These are the hashes and HMACs that MUST use the Implicit hash and
   HMAC algorithm:

      The hash chain H0-H3 defined in Section 9.

      The HMACs that are keyed by this hash chain, as defined in
      Section 8.1.1.

      The Hello Hash in the a=zrtp-hash attribute defined in
      Section 8.1.

   ZRTP defines a method for negotiating different ZRTP protocol
   versions (Section 4.1.1). SHA-256 is the Implicit Hash for ZRTP
   protocol version 1.10. Future ZRTP protocol versions may, if
   appropriate, use another hash algorithm as the Implicit Hash, such as
   the NIST SHA-3 hash [SHA-3] when it becomes available. For example,
   a future SIP packet may list two a=zrtp-hash SDP attributes, one
   based on SHA-256 for ZRTP version 1.10, and another based on SHA-3
   for ZRTP version 2.00.

5.1.3.  Cipher Type Block

   All ZRTP endpoints MUST support AES-128 (AES1) and MAY support
   AES-256 (AES3) or other Cipher Types.
   The choice of the AES key length is coupled to the Key Agreement
   type, as explained in Section 5.1.5.

   The use of AES-128 in SRTP is defined by [RFC3711]. The use of
   AES-256 in SRTP is defined by [I-D.ietf-avt-srtp-big-aes].

   Cipher Type Block | Meaning
   ---------------------------------------------------
   "AES1"            | AES-CM with 128 bit keys
                     | as defined in RFC 3711
   ---------------------------------------------------
   "AES3"            | AES-CM with 256 bit keys
                     |
   ---------------------------------------------------

   Table 3. Cipher Type Block Values

5.1.4.  Auth Tag Type Block

   All ZRTP endpoints MUST support HMAC-SHA1 authentication tags for
   SRTP, with both 32 bit and 80 bit length tags as defined in
   [RFC3711].

   Auth Tag Type Block | Meaning
   ---------------------------------------------------
   "HS32"              | HMAC-SHA1 32 bit authentication
                       | tag as defined in RFC 3711
   ---------------------------------------------------
   "HS80"              | HMAC-SHA1 80 bit authentication
                       | tag as defined in RFC 3711
   ---------------------------------------------------

   Table 4. Auth Tag Type Values

5.1.5.  Key Agreement Type Block

   All ZRTP endpoints MUST support DH3k, SHOULD support Preshared, and
   MAY support EC25, EC38, EC52, and DH2k.

   If a ZRTP endpoint supports multiple concurrent media streams, such
   as audio and video, it MUST support Multistream (Section 4.4.3) mode.
   Also, if a ZRTP endpoint supports the GoClear message
   (Section 4.7.2), it SHOULD support Multistream, to be used if the two
   parties choose to return to the secure state after going Clear (as
   explained in Section 4.7.2.1).

   For Finite Field Diffie-Hellman, ZRTP endpoints MUST use the DH
   parameters defined in RFC 3526 [RFC3526], as follows. DH3k uses the
   3072-bit MODP group. DH2k uses the 2048-bit MODP group. The DH
   generator g is 2.
   The random Diffie-Hellman secret exponent SHOULD be twice as long as
   the AES key length. If AES-128 is used, the DH secret value SHOULD
   be 256 bits long. If AES-256 is used, the secret value SHOULD be
   512 bits long.

   If Elliptic Curve DH is used, the ECDH algorithm and key generation
   are from NIST SP 800-56A [SP800-56A]. The curves used are from NSA
   Suite B [NSA-Suite-B], which uses the same curves as ECDSA defined by
   FIPS 186-3 [FIPS-186-3], and can also be found in RFC 4753 [RFC4753],
   sections 3.1 through 3.3. The validation procedures are from NIST SP
   800-56A [SP800-56A] section 5.6.2.6, method 3, ECC Partial
   Validation. Both the X and Y coordinates of the point on the curve
   are sent, in the first and second half of the ECDH public value,
   respectively.

   The choice of AES key length is coupled to the choice of key
   agreement type. If either EC38 or EC52 is chosen as the key
   agreement, AES-256 (AES3) SHOULD be used. If DH3k or EC25 is chosen,
   either AES-128 (AES1) or AES-256 (AES3) MAY be used. Note that SRTP
   as defined in RFC 3711 [RFC3711] only supports AES-128.

   DH2k is intended for low power applications, or for applications that
   require fast key negotiations, and may be used with AES-128. DH2k is
   not recommended for high security applications. Its security can be
   augmented by implementing ZRTP's key continuity features
   (Section 15.1).

   ECDH-521 is not recommended for most applications, due to
   inconvenient computational delays. It should not be used except when
   both endpoints are known to have very fast hardware. Note that
   ECDH-521 is not part of NSA Suite B.

   ZRTP also defines two non-DH modes, Multistream and Preshared, in
   which the SRTP key is derived from a shared secret and some nonce
   material.
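
   The two sizing rules in this section (a DH secret exponent twice the
   AES key length, and the coupling between key agreement type and AES
   key length) can be sketched in Python as follows. This is only an
   illustration under stated assumptions: the function and table names
   are hypothetical, the SHOULD for EC38 and EC52 is treated as the only
   suggested choice, and the non-DH modes are left out of scope.

```python
# AES key sizes in bits for the Cipher Type Blocks "AES1" and "AES3".
AES_KEY_BITS = {"AES1": 128, "AES3": 256}


def dh_exponent_bits(cipher: str) -> int:
    """Recommended DH secret exponent length: twice the AES key length."""
    return 2 * AES_KEY_BITS[cipher]


def recommended_ciphers(key_agreement: str) -> list[str]:
    """Cipher Type Blocks suggested for a given Key Agreement Type Block."""
    if key_agreement in ("EC38", "EC52"):
        return ["AES3"]            # AES-256 SHOULD be used
    if key_agreement in ("DH3k", "EC25"):
        return ["AES1", "AES3"]    # either MAY be used
    if key_agreement == "DH2k":
        return ["AES1"]            # DH2k may be used with AES-128
    # Assumption of this sketch: other types are not handled here.
    raise ValueError(f"no recommendation for {key_agreement!r}")
```

   For example, dh_exponent_bits("AES3") yields the 512-bit secret value
   that the text above recommends for AES-256.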

   Table 5 lists the pv length in words and DHPart1 and DHPart2 message
   length in words for each Key Agreement Type Block.

   Key Agreement |  pv   | message | Meaning
   Type Block    | words |  words  |
   -----------------------------------------------------------
   "DH3k"        |  96   |  117    | DH mode with p=3072 bit prime
                 |       |         | per RFC 3526, section 4.
   -----------------------------------------------------------
   "DH2k"        |  64   |   85    | DH mode with p=2048 bit prime
                 |       |         | per RFC 3526, section 3.
   -----------------------------------------------------------
   "EC25"        |  16   |   37    | Elliptic Curve DH, P-256
                 |       |         | per RFC 4753, section 3.1
   -----------------------------------------------------------
   "EC38"        |  24   |   45    | Elliptic Curve DH, P-384
                 |       |         | per RFC 4753, section 3.2
   -----------------------------------------------------------
   "EC52"        |  33   |   54    | Elliptic Curve DH, P-521
                 |       |         | per RFC 4753, section 3.3
   -----------------------------------------------------------
   "Prsh"        |   -   |    -    | Preshared Non-DH mode
                 |       |         |
   -----------------------------------------------------------
   "Mult"        |   -   |    -    | Multistream Non-DH mode
                 |       |         |
   -----------------------------------------------------------

   Table 5. Key Agreement Type Block Values

5.1.6.  SAS Type Block

   The SAS Type determines how the SAS is rendered to the user so that
   the user may verbally compare it with his partner over the voice
   channel. This allows detection of a man-in-the-middle (MiTM) attack.

   All ZRTP endpoints MUST support the base32 rendering scheme for the
   Short Authentication String and MAY support the base256 scheme or
   other SAS rendering schemes. The ZRTP SAS rendering schemes are
   described in Section 7.

   SAS Type Block  | Meaning
   ---------------------------------------------------
   "B32 "          | Short Authentication String using
                   | base32 encoding
   ---------------------------------------------------
   "B256"          | Short Authentication String using
                   | base256 encoding (PGP Word List)
   ---------------------------------------------------

   Table 6. SAS Type Block Values

5.1.7.  Signature Type Block

   The signature type block is a 4 octet (1 word) block used to
   represent the signature algorithm discussed in Section 7.2.
   Suggested signature algorithms and key lengths are a future subject
   of standardization.

5.2.  Hello message

   The Hello message has the format shown in Figure 3.

   All ZRTP messages begin with the preamble value 0x505a, then a 16 bit
   length in 32 bit words. This length includes only the ZRTP message
   (including the preamble and the length) but not the ZRTP packet
   header or CRC. The 8-octet Message Type follows the length field.

   Next is a 4 character string containing the version (ver) of the ZRTP
   protocol, which is "1.10" for this specification. Next is the Client
   Identifier string (cid), which is 4 words long and identifies the
   vendor and release of the ZRTP software. The 256-bit hash image H3
   is defined in Section 9. The next parameter is the ZID, the 96 bit
   long unique identifier for the ZRTP endpoint, defined in Section 4.9.

   The next four bits contain flag bits. The MiTM flag (M) is a
   Boolean that is set to true if and only if this Hello message is sent
   from a device, usually a PBX, that has the capability to send an
   SASrelay message (Section 5.13). The Passive flag (P) is a Boolean
   normally set to False. A ZRTP endpoint which is configured to never
   initiate secure sessions is regarded as passive, and would set the P
   bit to True.
   The next 8 bits are unused and SHOULD be set to zero when sent and
   MUST be ignored on receipt.

   Next is a list of supported Hash algorithms, Cipher algorithms, SRTP
   Auth Tag types, Key Agreement types, and SAS types. The number of
   algorithms listed is given for each type: hc=hash count, cc=cipher
   count, ac=auth tag count, kc=key agreement count, and sc=sas count.
   The values for these algorithms are defined in Tables 2, 3, 4, 5, and
   6. A count of zero means that only the mandatory-to-implement
   algorithms are supported. Mandatory algorithms MAY be included in
   the list. The order of the list indicates the preferences of the
   endpoint. If a mandatory algorithm is not included in the list, it
   is added to the end of the list for preference.

   The 64-bit HMAC at the end of the message is computed across the
   whole message, not including the HMAC. The HMAC key is the sender's
   H2 (defined in Section 9), and thus the HMAC cannot be checked by the
   receiving party until the sender's H2 value is known to the receiving
   party later in the protocol.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|             length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Hello   " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    version="1.10" (1 word)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                  Client Identifier (4 words)                  |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Hash image H3 (8 words)                    |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         ZID (3 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|0|M|P| unused (zeros)|  hc   |  cc   |  ac   |  kc   |  sc   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                hash algorithms (0 to 7 values)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               cipher algorithms (0 to 7 values)               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                auth tag types (0 to 7 values)                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              key agreement types (0 to 7 values)              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   SAS types (0 to 7 values)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        HMAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 3: Hello message format

5.3.  HelloACK message

   The HelloACK message is used to stop retransmissions of a Hello
   message. A HelloACK is sent regardless of whether the version number
   in the Hello is supported or the algorithm list is supported. The
   receipt of a HelloACK stops retransmission of the Hello message. The
   format is shown in Figure 4 below. Note that a Commit message can be
   sent in place of a HelloACK by an Initiator.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=3 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="HelloACK" (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 4: HelloACK message format

5.4.  Commit message

   The Commit message is sent to initiate the key agreement process
   after both sides have received a Hello message, which means it can
   only be sent after receiving both a Hello message and a HelloACK
   message. There are three subtypes of Commit messages, whose formats
   are shown in Figure 5, Figure 6, and Figure 7.

   The Commit message contains the Message Type Block, then the 256-bit
   hash image H2 which is defined in Section 9. The next parameter is
   the initiator's ZID, the 96 bit long unique identifier for the ZRTP
   endpoint, which must have the same value as was used in the Hello
   message.

   Next is a list of algorithms selected by the initiator (hash, cipher,
   auth tag type, key agreement, sas type). For a DH Commit, the hash
   value hvi is a hash of the DHPart2 of the Initiator and the
   Responder's Hello message, as explained in Section 4.4.1.1.

   The 64-bit HMAC at the end of the message is computed across the
   whole message, not including the HMAC. The HMAC key is the sender's
   H1 (defined in Section 9), and thus the HMAC cannot be checked by the
   receiving party until the sender's H1 value is known to the receiving
   party later in the protocol.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=29 words        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Commit  " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Hash image H2 (8 words)                    |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         ZID (3 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        hash algorithm                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       cipher algorithm                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         auth tag type                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      key agreement type                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           SAS type                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         hvi (8 words)                         |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        HMAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 5: DH Commit message format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=25 words        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Commit  " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Hash image H2 (8 words)                    |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         ZID (3 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        hash algorithm                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       cipher algorithm                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         auth tag type                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  key agreement type = "Mult"                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           SAS type                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                        nonce (4 words)                        |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        HMAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 6: Multistream Commit message format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=27 words        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Commit  " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Hash image H2 (8 words)                    |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         ZID (3 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        hash algorithm                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       cipher algorithm                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         auth tag type                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  key agreement type = "Prsh"                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           SAS type                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                        nonce (4 words)                        |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        keyID (2 words)                        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        HMAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 7: Preshared Commit message format

5.5.  DHPart1 message

   The DHPart1 message begins the DH exchange. The format is shown in
   Figure 8 below. The DHPart1 message is sent by the Responder if a
   valid Commit message is received from the Initiator. The length of
   the pvr value and the length of the DHPart1 message depend on the
   Key Agreement Type chosen. This information is contained in Table 5.

   Note that for both Multistream and Preshared modes, no DHPart1 or
   DHPart2 message will be sent.

   The 256-bit hash image H1 is defined in Section 9.

   The next four parameters are HMACs of potential shared secrets used
   in generating the ZRTP secret. The first two, rs1IDr and rs2IDr, are
   the HMACs of the responder's two retained shared secrets, truncated
   to 64 bits. Next is auxsecretIDr, the HMAC of the responder's
   auxsecret (defined in Section 4.3), truncated to 64 bits.
   The last parameter is the HMAC of the trusted MiTM PBX shared secret
   pbxsecret, defined in Section 7.3.1. The message format for the
   DHPart1 message is shown in Figure 8.

   The 64-bit HMAC at the end of the message is computed across the
   whole message, not including the HMAC. The HMAC key is the sender's
   H0 (defined in Section 9), and thus the HMAC cannot be checked by the
   receiving party until the sender's H0 value is known to the receiving
   party later in the protocol.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|   length=depends on KA Type   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="DHPart1 " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Hash image H1 (8 words)                    |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       rs1IDr (2 words)                        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       rs2IDr (2 words)                        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    auxsecretIDr (2 words)                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    pbxsecretIDr (2 words)                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                pvr (length depends on KA Type)                |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        HMAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 8: DHPart1 message format

5.6.  DHPart2 message

   The DHPart2 message completes the DH exchange. A DHPart2 message is
   sent by the Initiator if a valid DHPart1 message is received from the
   Responder.
   The length of the pvi value and the length of the DHPart2 message
   depend on the Key Agreement Type chosen. This information is
   contained in Table 5. Note that for both Multistream and Preshared
   modes, no DHPart1 or DHPart2 message will be sent.

   The 256-bit hash image H1 is defined in Section 9.

   The next four parameters are HMACs of potential shared secrets used
   in generating the ZRTP secret. The first two, rs1IDi and rs2IDi, are
   the HMACs of the initiator's two retained shared secrets, truncated
   to 64 bits. Next is auxsecretIDi, the HMAC of the initiator's
   auxsecret (defined in Section 4.3), truncated to 64 bits. The last
   parameter is the HMAC of the trusted MiTM PBX shared secret
   pbxsecret, defined in Section 7.3.1. The message format for the
   DHPart2 message is shown in Figure 9.

   The 64-bit HMAC at the end of the message is computed across the
   whole message, not including the HMAC. The HMAC key is the sender's
   H0 (defined in Section 9), and thus the HMAC cannot be checked by the
   receiving party until the sender's H0 value is known to the receiving
   party later in the protocol.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|   length=depends on KA Type   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="DHPart2 " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Hash image H1 (8 words)                    |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       rs1IDi (2 words)                        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       rs2IDi (2 words)                        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    auxsecretIDi (2 words)                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    pbxsecretIDi (2 words)                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                pvi (length depends on KA Type)                |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        HMAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 9: DHPart2 message format

5.7.  Confirm1 and Confirm2 messages

   The Confirm1 message is sent by the Responder in response to a valid
   DHPart2 message after the SRTP session key and parameters have been
   negotiated. The Confirm2 message is sent by the Initiator in
   response to a Confirm1 message. The format is shown in Figure 10
   below. The message contains the Message Type Block "Confirm1" or
   "Confirm2". Next is the HMAC, a keyed hash over the encrypted part
   of the message (shown enclosed by "===" in Figure 10). This HMAC is
   keyed and computed according to Section 4.6. The next 16 octets
   contain the CFB Initialization Vector. The rest of the message is
   encrypted using CFB and protected by the HMAC.

   The first field inside the encrypted region is the hash pre-image H0,
   which is defined in detail in Section 9.

   The next 15 bits are not used and SHOULD be set to zero when sent and
   MUST be ignored when received in Confirm1 or Confirm2 messages.

   The next 9 bits contain the signature length. If no SAS signature
   (described in Section 7.2) is present, all bits are set to zero.
   The signature length is in words and includes the signature type
   block. If the calculated signature octet count is not a multiple of
   4, zeros are added to pad it out to a word boundary. If no signature
   block is present, the overall length of the Confirm1 or Confirm2
   Message will be set to 19 words.

   The next 8 bits are used for flags. Undefined flags are set to zero
   and ignored. Four flags are currently defined. The PBX Enrollment
   flag (E) is a Boolean bit defined in Section 7.3.1. The SAS Verified
   flag (V) is a Boolean bit defined in Section 7.1. The Allow Clear
   flag (A) is a Boolean bit defined in Section 4.7.2. The Disclosure
   Flag (D) is a Boolean bit defined in Section 11. The cache
   expiration interval, which occupies the following word, is defined
   in Section 4.9.

   If the signature length (in words) is non-zero, a signature type
   block will be present along with a signature block. Next is the
   signature block. The signature block includes the key used to
   generate the signature (Section 7.2).

   CFB [SP800-38A] mode is applied with a feedback length of 128 bits
   (a full cipher block), and the final block is truncated to match the
   exact length of the encrypted data. The CFB Initialization Vector is
   a 128 bit random nonce. The block cipher algorithm and the key size
   are the same as those negotiated for the media encryption. CFB is
   used to encrypt the part of the Confirm1 message beginning after the
   CFB IV to the end of the message (the encrypted region is enclosed by
   "======" in Figure 10).

   The responder uses the zrtpkeyr to encrypt the Confirm1 message. The
   initiator uses the zrtpkeyi to encrypt the Confirm2 message.
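
   As an illustration of the word that follows the hash pre-image H0 in
   Figure 10 (15 unused bits, the 9-bit signature length, and 8 flag
   bits), the following Python sketch packs and unpacks that word. It
   assumes network byte order and takes the E, V, A, and D bit positions
   from the figure; the constant and function names are hypothetical.

```python
import struct

FLAG_E = 0x08  # PBX Enrollment
FLAG_V = 0x04  # SAS Verified
FLAG_A = 0x02  # Allow Clear
FLAG_D = 0x01  # Disclosure


def pack_confirm_flags(sig_len_words: int, flags: int) -> bytes:
    """Build the 32-bit word: 15 zero bits, 9-bit sig len, 8 flag bits."""
    if not 0 <= sig_len_words < 512:
        raise ValueError("signature length must fit in 9 bits")
    if flags & ~0x0F:
        raise ValueError("only the E, V, A, and D flags are defined")
    return struct.pack("!I", (sig_len_words << 8) | flags)


def unpack_confirm_flags(word: bytes) -> tuple[int, int]:
    """Return (signature length in words, defined flag bits)."""
    (value,) = struct.unpack("!I", word)
    sig_len_words = (value >> 8) & 0x1FF  # upper 15 bits are ignored
    flags = value & 0x0F                  # undefined flag bits are ignored
    return sig_len_words, flags
```

   With a signature length of zero, the fixed fields of Figure 10 add up
   to 1 + 2 + 2 + 4 + 8 + 1 + 1 = 19 words, which matches the overall
   length stated above.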

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=variable        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Message Type Block="Confirm1" or "Confirm2" (2 words)     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        HMAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |              CFB Initialization Vector (4 words)              |
   |                                                               |
   |                                                               |
   +===============================================================+
   |                                                               |
   |                  Hash pre-image H0 (8 words)                  |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Unused (15 bits of zeros)   | sig len (9 bits)|0 0 0 0|E|V|A|D|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              cache expiration interval (1 word)               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       optional signature type block (1 word if present)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |          optional signature block (variable length)           |
   |                             . . .                             |
   |                                                               |
   |                                                               |
   +===============================================================+

   Figure 10: Confirm1 and Confirm2 message format

5.8.  Conf2ACK message

   The Conf2ACK message is sent by the Responder in response to a valid
   Confirm2 message. The message format for the Conf2ACK is shown in
   the Figure below. The receipt of a Conf2ACK stops retransmission of
   the Confirm2 message. Note that the first SRTP media (with a valid
   SRTP auth tag) from the responder also stops retransmission of the
   Confirm2 message.
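
   The fixed part of every ZRTP message described in Section 5.2 (the
   preamble 0x505a, a 16-bit length in 32-bit words, and the 8-octet
   Message Type Block) can be sketched in Python as follows. The sketch
   assumes network byte order and covers only the ZRTP message, not the
   ZRTP packet header or CRC; build_message and parse_header are
   hypothetical names.

```python
import struct

ZRTP_PREAMBLE = 0x505A  # first 16 bits of every ZRTP message


def build_message(msg_type: str, body: bytes = b"") -> bytes:
    """Prefix a body with the preamble, word length, and type block."""
    type_block = msg_type.encode("ascii").ljust(8, b" ")
    if len(type_block) != 8:
        raise ValueError("Message Type Block must fit in 8 octets")
    if len(body) % 4:
        raise ValueError("body must be a whole number of 32-bit words")
    # The length also counts the preamble/length word and the type block.
    length_words = (4 + len(type_block) + len(body)) // 4
    return struct.pack("!HH", ZRTP_PREAMBLE, length_words) + type_block + body


def parse_header(message: bytes) -> tuple[str, int]:
    """Return (message type, length in words) after basic sanity checks."""
    if len(message) < 12:
        raise ValueError("message shorter than the 3-word minimum")
    preamble, length_words = struct.unpack("!HH", message[:4])
    if preamble != ZRTP_PREAMBLE:
        raise ValueError("bad ZRTP preamble")
    if length_words * 4 != len(message):
        raise ValueError("length field does not match message size")
    return message[4:12].decode("ascii").rstrip(" "), length_words


conf2ack = build_message("Conf2ACK")                     # 3 words, no body
error = build_message("Error", struct.pack("!I", 0x30))  # 4 words
```

   The same helper yields the 3-word HelloACK, ErrorACK, and ClearACK
   messages, which differ from Conf2ACK only in the Message Type Block.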

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=3 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Conf2ACK" (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 11: Conf2ACK message format

5.9.  Error message

   The Error message is sent to terminate an in-process ZRTP key
   agreement exchange due to an error. The format is shown in the
   Figure below. The use of the Error message is described in
   Section 4.7.1.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=4 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Error   " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Integer Error Code (1 word)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 12: Error message format

   Defined hexadecimal values for the Error Code are listed in Table 7.

   Error Code | Meaning
   -----------------------------------------------------------
   0x10       | Malformed packet (CRC OK, but wrong structure)
   -----------------------------------------------------------
   0x20       | Critical software error
   -----------------------------------------------------------
   0x30       | Unsupported ZRTP version
   -----------------------------------------------------------
   0x40       | Hello components mismatch
   -----------------------------------------------------------
   0x51       | Hash type not supported
   -----------------------------------------------------------
   0x52       | Cipher type not supported
   -----------------------------------------------------------
   0x53       | Public key exchange not supported
   -----------------------------------------------------------
   0x54       | SRTP auth. tag not supported
   -----------------------------------------------------------
   0x55       | SAS scheme not supported
   -----------------------------------------------------------
   0x56       | No shared secret available, DH mode required
   -----------------------------------------------------------
   0x61       | DH Error: bad pvi or pvr ( == 1, 0, or p-1)
   -----------------------------------------------------------
   0x62       | DH Error: hvi != hashed data
   -----------------------------------------------------------
   0x63       | Received relayed SAS from untrusted MiTM
   -----------------------------------------------------------
   0x70       | Auth. Error: Bad Confirm pkt HMAC
   -----------------------------------------------------------
   0x80       | Nonce reuse
   -----------------------------------------------------------
   0x90       | Equal ZIDs in Hello
   -----------------------------------------------------------
   0xA0       | Service unavailable
   -----------------------------------------------------------
   0x100      | GoClear packet received, but not allowed
   -----------------------------------------------------------

   Table 7. ZRTP Error Codes

5.10.  ErrorACK message

   The ErrorACK message is sent in response to an Error message. The
   receipt of an ErrorACK stops retransmission of the Error message.
   The format is shown in the Figure below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=3 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="ErrorACK" (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 13: ErrorACK message format

5.11.  GoClear message

   Support for the GoClear message is OPTIONAL in the protocol, and it
   is sent to switch from SRTP to RTP. The format is shown in the
   Figure below. The clear_hmac is used to authenticate the GoClear
   message so that bogus GoClear messages introduced by an attacker can
   be detected and discarded. The use of GoClear is described in
   Section 4.7.2.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=5 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="GoClear " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     clear_hmac (2 words)                      |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 14: GoClear message format

5.12.  ClearACK message

   Support for the ClearACK message is OPTIONAL in the protocol, and it
   is sent to acknowledge receipt of a GoClear.  A ClearACK is only sent
   if the clear_hmac from the GoClear message is authenticated.
   Otherwise, no response is returned.  The format is shown in the
   Figure below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=3 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="ClearACK" (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 15: ClearACK message format

5.13.  SASrelay message

   The SASrelay message is sent by a trusted Man in The Middle (MiTM),
   most often a PBX.  It is not sent as a response to a packet, but is
   sent as a self-initiated packet by the trusted MiTM.  It can only be
   sent after the rest of the ZRTP key negotiations have completed,
   after the Confirm packets and their ACKs.  It can only be sent after
   the trusted MiTM has finished key negotiations with the other party,
   because it is the other party's SAS that is being relayed.  It is
   sent with retry logic until a RelayACK message (Section 5.14) is
   received or the retry schedule has been exhausted.

   If a device, usually a PBX, sends an SASrelay message, it MUST have
   previously declared itself as a MiTM device by setting the MiTM (M)
   flag in the Hello message (Section 5.2).  If the receiver of the
   SASrelay message did not previously receive a Hello message with the
   MiTM (M) flag set, the Relayed SAS SHOULD NOT be rendered.  A
   RelayACK is still sent, but no Error message is sent.

   The SASrelay message format is shown in Figure 16 below.  The message
   contains the Message Type Block "SASrelay".  Next is the HMAC, a
   keyed hash over the encrypted part of the message (shown enclosed by
   "===" in Figure 16).  This HMAC is keyed the same way as the HMAC in
   the Confirm messages (see Section 4.6).  The next 16 octets contain
   the CFB Initialization Vector.  The rest of the message is encrypted
   using CFB and protected by the HMAC.

   The next 15 bits are not used and SHOULD be set to zero when sent and
   MUST be ignored when received in SASrelay messages.

   The next 9 bits contain the signature length.  The trusted MiTM MAY
   compute a digital signature on the SAS hash, as described in
   Section 7.2, using a persistent signing key owned by the trusted
   MiTM.  If no SAS signature is present, all bits of the signature
   length are set to zero.  The signature length is in words and
   includes the signature type block.  If the calculated signature
   octet count is not a multiple of 4, zeros are added to pad it out to
   a word boundary.  If no signature block is present, the overall
   length of the SASrelay Message will be set to 12 words.

   The next 8 bits are used for flags.  Undefined flags are set to zero
   and ignored.  Three flags are currently defined.  The Disclosure Flag
   (D) is a Boolean bit defined in Section 11.  The Allow Clear flag (A)
   is a Boolean bit defined in Section 4.7.2.  The SAS Verified flag (V)
   is a Boolean bit defined in Section 7.1.
   These flags are updated
   values to the same flags provided earlier in the Confirm packet, but
   they are updated to reflect the new flag information relayed by the
   PBX from the other party.

   The next 32-bit word contains the rendering scheme for the relayed
   sasvalue, which will be the same rendering scheme used by the other
   party on the other side of the trusted MiTM.  Section 7.3 describes
   how the PBX determines whether the ZRTP client regards the PBX as a
   trusted MiTM.  If the PBX determines that the ZRTP client trusts the
   PBX, the next 32-bit word contains the binary sasvalue relayed from
   the other party.  If this SASrelay packet is being sent to a ZRTP
   client that does not trust this MiTM, the next 32-bit word will be
   ignored by the recipient and should be set to zero by the PBX.

   If the signature length (in words) is non-zero, a signature type
   block will be present along with a signature block.  Next is the
   signature block.  The signature block includes the key used to
   generate the signature (Section 7.2).

   CFB [SP800-38A] mode is applied with a feedback length of 128 bits, a
   full cipher block, and the final block is truncated to match the
   exact length of the encrypted data.  The CFB Initialization Vector is
   a 128-bit random nonce.  The block cipher algorithm and the key size
   are the same as what was negotiated for the media encryption.  CFB is
   used to encrypt the part of the SASrelay message beginning after the
   CFB IV to the end of the message (the encrypted region is enclosed by
   "======" in Figure 16).

   Depending on whether the trusted MiTM had taken the role of the
   initiator or the responder during the ZRTP key negotiation, the
   SASrelay message is encrypted with zrtpkeyi or zrtpkeyr.
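
   As an illustration only, the 32-bit word that opens the encrypted
   region can be packed and parsed as in the Python sketch below.  The
   sketch assumes the field widths given above (15 unused bits, a 9-bit
   signature length, 8 flag bits) and assumes that D, A, and V occupy
   the three least significant bits in that order, as drawn in
   Figure 16.  The function and constant names are hypothetical and are
   not defined by this specification.

```python
import struct

# Flag bit positions are an assumption read off the layout in Figure 16,
# where D is the least significant bit of the flags octet.
FLAG_DISCLOSURE = 0x01    # (D)
FLAG_ALLOW_CLEAR = 0x02   # (A)
FLAG_SAS_VERIFIED = 0x04  # (V)


def pack_sasrelay_flags_word(sig_len_words, sas_verified, allow_clear, disclosure):
    """Build the first 32-bit word of the encrypted region (network order)."""
    if not 0 <= sig_len_words <= 0x1FF:
        raise ValueError("signature length must fit in 9 bits")
    flags = 0
    if sas_verified:
        flags |= FLAG_SAS_VERIFIED
    if allow_clear:
        flags |= FLAG_ALLOW_CLEAR
    if disclosure:
        flags |= FLAG_DISCLOSURE
    # The 15 unused high-order bits are sent as zero.
    return struct.pack("!I", (sig_len_words << 8) | flags)


def unpack_sasrelay_flags_word(data):
    """Return (sig_len_words, sas_verified, allow_clear, disclosure)."""
    (word,) = struct.unpack("!I", data)
    sig_len_words = (word >> 8) & 0x1FF  # unused high-order bits are ignored
    flags = word & 0xFF                  # undefined flag bits are ignored
    return (
        sig_len_words,
        bool(flags & FLAG_SAS_VERIFIED),
        bool(flags & FLAG_ALLOW_CLEAR),
        bool(flags & FLAG_DISCLOSURE),
    )
```

   A receiver built this way tolerates non-zero unused bits, which
   matches the "MUST be ignored when received" rule above.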

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=variable        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="SASrelay" (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        HMAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |              CFB Initialization Vector (4 words)              |
   |                                                               |
   |                                                               |
   +===============================================================+
   |  Unused (15 bits of zeros)  | sig len (9 bits)|0 0 0 0|0|V|A|D|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         rendering scheme of relayed sasvalue (1 word)         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Trusted MiTM relayed sasvalue (1 word)             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       optional signature type block (1 word if present)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |          optional signature block (variable length)           |
   |                             . . .                             |
   |                                                               |
   |                                                               |
   +===============================================================+

                 Figure 16: SASrelay message format

5.14.  RelayACK message

   The RelayACK message is sent in response to a valid SASrelay message.
   The message format for the RelayACK is shown in the Figure below.
   The receipt of a RelayACK stops retransmission of the SASrelay
   message.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=3 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="RelayACK" (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 17: RelayACK message format

6.  Retransmissions

   ZRTP uses two retransmission timers T1 and T2.  T1 is used for
   retransmission of Hello messages, when the support of ZRTP by the
   other endpoint may not be known.  T2 is used in retransmissions of
   all the other ZRTP messages.

   All message retransmissions MUST be identical to the initial message
   including nonces, public values, etc.; otherwise, hashes of the
   message sequences may not agree.

   Practical experience has shown that RTP packet loss at the start of
   an RTP session can be extremely high.  Since the entire ZRTP message
   exchange occurs during this period, the retransmission scheme is
   defined to be aggressive.  Since ZRTP packets with the exception
   of the DHPart1 and DHPart2 messages are small, this should have
   minimal effect on overall bandwidth utilization of the media session.

   ZRTP endpoints MUST NOT exceed the bandwidth of the resulting media
   session as determined by the offer/answer exchange in the signaling
   layer.

   Hello ZRTP messages are retransmitted at an interval that starts at
   T1 and doubles after every retransmission, capping at 200 ms.
   T1 has a recommended initial value of 50 ms.  A Hello message is
   retransmitted 20 times before giving up, which means the entire retry
   schedule for Hello messages is exhausted after 3.75 seconds (50 + 100
   + 18*200 ms).  Retransmission of a Hello ends upon receipt of a
   HelloACK or Commit message.

   The post-Hello ZRTP messages are retransmitted only by the session
   initiator - that is, only Commit, DHPart2, and Confirm2 are
   retransmitted if the corresponding message from the responder
   (DHPart1, Confirm1, or Conf2ACK) is not received.  Note that the
   Confirm2 message retransmission can also be stopped by receiving the
   first SRTP media (with a valid SRTP auth tag) from the responder.

   The GoClear, Error, and SASrelay messages may be initiated and
   retransmitted by either party, and responded to by the other party,
   regardless of which party is the overall session initiator.  They are
   retransmitted if the corresponding response message (ClearACK,
   ErrorACK, or RelayACK) is not received.

   Non-Hello ZRTP messages are retransmitted at an interval that starts
   at T2 and doubles after every retransmission, capping at 600 ms.
   T2 has a recommended initial value of 150 ms.  Each non-Hello
   message is retransmitted 10 times before giving up, which means the
   entire retry schedule is exhausted after 5.25 seconds (150 + 300 +
   8*600 ms).  Only the initiator performs retransmissions.  Each
   message has a response message that stops retransmissions, as shown
   below in Table 8.  The higher value of T2 means that retransmissions
   will likely only occur with packet loss.

   These recommended retransmission intervals are designed for a typical
   broadband Internet connection.  In some high latency communication
   channels, such as those provided by some mobile phone environments or
   geostationary satellites, the initial value for the T1 or T2
   retransmission timer should be increased to be no less than the round
   trip time provided by the communications channel.  It should take
   into account the time required to transmit the entire message and the
   entire reply, as well as a reasonable time estimate to perform the DH
   calculation.
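
   The two retry schedules in this section can be checked with a short
   calculation.  The Python sketch below is illustrative only; the
   function name is an assumption and is not part of the protocol.  It
   lists the wait before each retransmission and reproduces the 3.75
   second and 5.25 second totals quoted above from the recommended T1
   and T2 values.

```python
from typing import List


def retry_intervals_ms(initial_ms: int, cap_ms: int, retries: int) -> List[int]:
    """Wait before each retransmission: start at initial_ms, double, cap."""
    intervals = []
    interval = initial_ms
    for _ in range(retries):
        intervals.append(interval)
        interval = min(interval * 2, cap_ms)
    return intervals


# Recommended values from this section.
hello_schedule = retry_intervals_ms(initial_ms=50, cap_ms=200, retries=20)
other_schedule = retry_intervals_ms(initial_ms=150, cap_ms=600, retries=10)

print(hello_schedule[:4], sum(hello_schedule) / 1000)  # [50, 100, 200, 200] 3.75
print(other_schedule[:4], sum(other_schedule) / 1000)  # [150, 300, 600, 600] 5.25
```

   On a high latency channel, the same helper can be called with a
   larger initial value to see how the total schedule grows.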

   Message      Acknowledgement Message
   -------      -----------------------
   Hello        HelloACK or Commit
   Commit       DHPart1 or Confirm1
   DHPart2      Confirm1
   Confirm2     Conf2ACK or SRTP media
   GoClear      ClearACK
   Error        ErrorACK
   SASrelay     RelayACK

         Table 8.  Retransmitted ZRTP Messages and Responses

7.  Short Authentication String

   This section discusses the implementation of the Short
   Authentication String, or SAS, in ZRTP.  The SAS can be compared
   verbally, by the human users reading the string aloud, or checked by
   validating an OPTIONAL digital signature (described in Section 7.2)
   exchanged in the Confirm1 or Confirm2 messages.

   The use of hash commitment in the DH exchange (Section 4.4.1.1)
   constrains the attacker to only one guess to generate the correct SAS
   in his attack, which means the SAS can be quite short.  A 16-bit SAS,
   for example, provides the attacker only one chance out of 65536 of
   not being detected.

   The rendering of the SAS value to the user depends on the SAS Type
   agreed upon in the Commit message.  For the SAS Type of base32, the
   leftmost 20 bits of the 32-bit sasvalue are rendered as a form of
   base32 encoding known as z-base-32 [z-base-32].  The purpose of
   z-base-32 is to represent arbitrary sequences of octets in a form
   that is as convenient as possible for human users to manipulate.  As
   a result, the choice of characters is slightly different from base32
   as defined in RFC 3548.  The leftmost 20 bits of the sasvalue result
   in four base32 characters which are rendered to both ZRTP endpoints.
   For the SAS Type of base256, the leftmost 16 bits of the 32-bit
   sasvalue are rendered using the PGP Wordlist [pgpwordlist]
   [Juola1][Juola2].  Other SAS Types may be defined to render the SAS
   value in other ways.

   The SAS SHOULD be rendered to the user for authentication.
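
   The base32 rendering described above can be sketched in a few lines
   of Python.  This is a minimal illustration, not a reference
   implementation: the function name is hypothetical, and the alphabet
   string is the z-base-32 character set as commonly published, which
   should be checked against [z-base-32].  The base256 (PGP Wordlist)
   rendering is not shown.

```python
# z-base-32 alphabet (assumed from the published z-base-32 description).
ZBASE32_ALPHABET = "ybndrfg8ejkmcpqxot1uwisza345h769"


def render_sas_base32(sasvalue: int) -> str:
    """Render the leftmost 20 bits of a 32-bit sasvalue as 4 characters."""
    if not 0 <= sasvalue <= 0xFFFFFFFF:
        raise ValueError("sasvalue must be a 32-bit unsigned integer")
    top20 = sasvalue >> 12  # drop the rightmost 12 bits
    # Four 5-bit groups, most significant group first.
    return "".join(
        ZBASE32_ALPHABET[(top20 >> shift) & 0x1F] for shift in (15, 10, 5, 0)
    )


print(render_sas_base32(0x00000000))  # yyyy
print(render_sas_base32(0x08421FFF))  # bbbb (rightmost 12 bits are ignored)
```

   Both endpoints apply the same function to the same sasvalue, so the
   four characters read aloud match unless a MiTM is present.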

   The SAS is not treated as a secret value, but it must be compared to
   see if it matches at both ends of the communications channel.  The
   two users read it aloud to their partners to see if it matches.  This
   allows detection of a man-in-the-middle (MiTM) attack.

   There is only one SAS value computed per call.  That is the SAS value
   for the first media stream established, which is calculated in
   Section 4.5.2.  This SAS applies to all media streams for the same
   call.

7.1.  SAS Verified Flag

   The SAS Verified flag (V) is set based on the user indicating that
   SAS comparison has been successfully performed.  The SAS Verified
   flag is exchanged securely in the Confirm1 and Confirm2 messages
   (Figure 10) of the next session.  In other words, each party sends
   the SAS Verified flag from the previous session in the Confirm
   message of the current session.  It is perfectly reasonable to have a
   ZRTP endpoint that never sets the SAS Verified flag, because it would
   require adding complexity to the user interface to allow the user to
   set it.  The SAS Verified flag is not required to be set, but if it
   is available to the client software, it allows for the possibility
   that the client software could render to the user that the SAS verify
   procedure was carried out in a previous session.

   Regardless of whether there is a user interface element to allow the
   user to set the SAS Verified flag, it is worth caching a shared
   secret, because doing so reduces opportunities for an attacker in the
   next call.

   If at any time the users carry out the SAS comparison procedure, and
   it actually fails to match, then this means there is a very
   resourceful man-in-the-middle.  If this is the first call, the MiTM
   was there on the first call, which is impressive enough.
   If it happens in a later call, it means the MiTM must also know the
   cached shared secret, because you could not have carried out any
   voice traffic at all unless the session key was correctly computed
   and is also known to the attacker.  This implies the MiTM must have
   been present in all the previous sessions, since the initial
   establishment of the first shared secret.  This is indeed a
   resourceful attacker.  It also means that if at any time he ceases
   his participation as a MiTM on one of your calls, the protocol will
   detect that the cached shared secret is no longer valid -- because it
   was really two different shared secrets all along, one of them
   between Alice and the attacker, and the other between the attacker
   and Bob.  The continuity of the cached shared secrets makes it
   possible for us to detect the MiTM when he inserts himself into the
   ongoing relationship, as well as when he leaves.  Also, if the
   attacker tries to stay with a long lineage of calls, but fails to
   execute a DH MiTM attack for even one missed call, he is permanently
   excluded.  He can no longer resynchronize with the chain of cached
   shared secrets.

   Some sort of user interface element (maybe a checkbox) is needed to
   allow the user to tell the software the SAS verify was successful,
   causing the software to set the SAS Verified flag (V), which
   (together with our cached shared secret) obviates the need to perform
   the SAS procedure in the next call.  An additional user interface
   element can be provided to let the user tell the software he detected
   an actual SAS mismatch, which indicates a MiTM attack.  The software
   can then take appropriate action, clearing the SAS Verified flag and
   erasing the cached shared secret from this session.  It is up to the
   implementer to decide if this added user interface complexity is
   warranted.
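
   The two user interface actions described above map onto a small
   amount of per-peer state.  The following Python sketch is one
   possible arrangement under stated assumptions: the class, field, and
   method names are hypothetical, and a real cache entry would hold
   more than is shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeerCacheEntry:
    """Hypothetical per-peer cache record, indexed elsewhere by the peer's ZID."""

    cached_secret: Optional[bytes] = None
    sas_verified: bool = False  # sent as the (V) flag in the next session

    def user_confirmed_sas_match(self) -> None:
        # The user ticked the "SAS verified" control after comparing aloud.
        self.sas_verified = True

    def user_reported_sas_mismatch(self) -> None:
        # A mismatch indicates a MiTM: clear (V) and discard the secret
        # so that it is not trusted in later calls.
        self.sas_verified = False
        self.cached_secret = None


entry = PeerCacheEntry(cached_secret=b"\x01" * 32)
entry.user_confirmed_sas_match()
entry.user_reported_sas_mismatch()
```

   An endpoint without either user interface element would simply never
   call these methods and would leave the flag unset.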

   If the SAS matches, it means there is no MiTM, which also implies it
   is now safe to trust a cached shared secret for later calls.  If
   inattentive users don't bother to check the SAS, it means we don't
   know whether there is or is not a MiTM, so even if we do establish a
   new cached shared secret, there is a risk that our potential attacker
   may have a subsequent opportunity to continue inserting himself in
   the call, until we finally get around to checking the SAS.  If the
   SAS matches, it means no attacker was present for any previous
   session since we started propagating cached shared secrets, because
   this session and all the previous sessions were also authenticated
   with a continuous lineage of shared secrets.

7.2.  Signing the SAS

   In some applications, it may be hard to arrange for two human users
   to verbally compare the SAS.  To handle these cases, ZRTP allows for
   an OPTIONAL signature feature, which allows the SAS to be checked
   without human participation.  The SAS MAY be signed and the signature
   sent inside the Confirm1, Confirm2 (Figure 10), or SASrelay
   (Figure 16) messages.  The signature algorithm, length of the
   signature and the key used to create the signature are all sent along
   with the signature.  The key types and signature algorithms are for
   future study.  The signature is calculated over the entire SAS hash
   result (sashash) that was truncated down to derive the sasvalue.  The
   signatures exchanged in the encrypted Confirm1, Confirm2, or SASrelay
   messages MAY be used to authenticate the ZRTP exchange.

7.3.  Relaying the SAS through a PBX

   ZRTP is designed to use end-to-end encryption.  The two parties'
   verbal comparison of the short authentication string (SAS) depends on
   this assumption.
   But in some PBX environments, such as Asterisk, there are usage
   scenarios that have the PBX acting as a trusted man-in-the-middle
   (MiTM), which means there are two back-to-back ZRTP connections with
   separate session keys and separate SAS's.

   For example, imagine that Bob has a ZRTP-enabled VoIP phone that has
   been registered with his company's PBX, so that it is regarded as an
   extension of the PBX.  Alice, whose phone is not associated with the
   PBX, might dial the PBX from the outside, and a ZRTP connection is
   negotiated between her phone and the PBX.  She then selects Bob's
   extension from the company directory in the PBX.  The PBX makes a
   call to Bob's phone (which might be offsite, many miles away from the
   PBX through the Internet) and a separate ZRTP connection is
   negotiated between the PBX and Bob's phone.  The two ZRTP sessions
   have different session keys and different SAS's, which would render
   the SAS useless for verbal comparison between Alice and Bob.  They
   might even mistakenly believe that a wiretapper is present because of
   the SAS mismatch, causing undue alarm.

   ZRTP has a mechanism for solving this problem by having the PBX relay
   the Alice/PBX SAS to Bob, sending it through to Bob in a special
   SASrelay packet as defined in Section 5.13, which is sent after the
   PBX/Bob ZRTP negotiation is complete, after the Confirm packets.
   Only the PBX, acting as a special trusted MiTM (trusted by the
   recipient of the SAS relay packet), will relay the SAS.  The SASrelay
   packet protects the relayed SAS from tampering via an included HMAC,
   similar to how the Confirm packet is protected.  Bob's ZRTP-enabled
   phone accepts the relayed SAS for rendering only because Bob's phone
   had previously been configured to trust the PBX.  This special
   trusted relationship with the PBX can be established through a
   special security enrollment procedure.
   After that enrollment procedure, the PBX is treated by Bob as a
   special trusted MiTM.  This results in Alice's SAS being rendered to
   Bob, so that Alice and Bob may verbally compare it and thus prevent
   a MiTM attack by any other untrusted MiTM.

   A real bad-guy MiTM cannot exploit this protocol feature to mount a
   MiTM attack and relay Alice's SAS to Bob, because Bob has not
   previously carried out a special registration ritual with the bad
   guy.  The relayed SAS would not be rendered by Bob's phone, because
   it did not come from a trusted PBX.  The recognition of the special
   trust relationship is achieved with the prior establishment of a
   special shared secret between Bob and his PBX, which is called
   pbxsecret (defined in Section 7.3.1), also known as the trusted MiTM
   key.

   The trusted MiTM key can be stored in a special cache at the time of
   the initial enrollment (which is carried out only once for Bob's
   phone), and Bob's phone associates this key with the ZID of the PBX,
   while the PBX associates it with the ZID of Bob's phone.  After the
   enrollment has established and stored this trusted MiTM key, it can
   be detected during subsequent ZRTP call negotiations between the PBX
   and Bob's phone, because the PBX and the phone MUST pass the hash of
   the trusted MiTM key in the DH packet.  It is then used as part of
   the key agreement to calculate s0.

   During a key agreement with two other ZRTP endpoints, the PBX may
   have a shared trusted MiTM key with both endpoints, only one
   endpoint, or neither endpoint.  If the PBX has a shared trusted MiTM
   key with neither endpoint, the PBX SHOULD NOT relay the SAS.  If the
   PBX has a shared trusted MiTM key with only one endpoint, the PBX
   SHOULD relay the SAS from one party to the other by sending an
   SASrelay message to the endpoint with which it shares a trusted MiTM
   key.
   If the PBX has a shared trusted MiTM key with both endpoints, the
   PBX SHOULD relay the SAS from one party to the other by sending an
   SASrelay message to only one of the endpoints.

      Note: In the case of sharing trusted MiTM key with both endpoints,
      it does not matter which endpoint receives the relayed SAS as long
      as only one endpoint receives it.

   The PBX can determine whether it is trusted by the ZRTP user agent of
   the caller or callee.  The presence of a shared trusted MiTM key in
   the key negotiation sequence indicates that the phone has been
   enrolled with this PBX and therefore trusts it to act as a trusted
   MiTM.  The PBX SHOULD relay the SAS from the other party in this
   case.

   The relayed SAS fields contain the SAS rendering type and the binary
   32-bit sasvalue.  The receiver absolutely MUST NOT render the relayed
   SAS if it does not come from a specially trusted ZRTP endpoint.  The
   security of the ZRTP protocol depends on not rendering a relayed SAS
   from an untrusted MiTM, because it may be relayed by a MiTM attacker.
   See the SASrelay packet definition (Figure 16) for further details.

   To ensure that both Alice and Bob will use the same SAS rendering
   scheme after the keys are negotiated, the PBX also sends the SASrelay
   message to the unenrolled party (which does not regard this PBX as a
   trusted MiTM), conveying the SAS rendering scheme, but not the SAS
   value, which it sets to zero.  The unenrolled party will ignore the
   relayed SAS field, but will use the specified SAS rendering scheme.

   The next section describes the initial enrollment procedure that
   establishes a special shared secret between the PBX and Bob's phone,
   a trusted MiTM key, so that the phone will learn to recognize the PBX
   as a trusted MiTM.

7.3.1.  PBX Enrollment and the PBX Enrollment Flag

   Both the PBX and the endpoint need to know when enrollment is taking
   place.  One way of doing this is to set up an enrollment extension on
   the PBX which a newly configured endpoint would call and establish a
   ZRTP session.  The PBX would then play audio media that offers the
   user an opportunity to configure his phone to trust this PBX as a
   trusted MiTM.  The PBX calculates and stores the trusted MiTM shared
   secret in its cache and associates it with this phone, indexed by the
   phone's ZID.  The trusted MiTM PBX shared secret is derived from
   ZRTPSess via the ZRTP key derivation function (Section 4.5.1) in this
   manner:

      pbxsecret = KDF(ZRTPSess, "Trusted MiTM key", (ZIDi || ZIDr),
                      negotiated hash length)

   The pbxsecret is calculated for the whole ZRTP session, not for each
   stream within a session, thus the KDF Context field in this case does
   not include any stream-specific nonce material.

   The PBX signals the enrollment process by setting the PBX Enrollment
   flag (E) in the Confirm message (Figure 10).  This flag is used to
   trigger the ZRTP endpoint's user interface to ask the user whether
   they want to trust this PBX and calculate and store the pbxsecret in
   the cache.  If the user decides to respond by activating the
   appropriate user interface element (a menu item, checkbox, or
   button), his ZRTP user agent calculates pbxsecret using the same
   formula and saves it in a special cache entry associated with this
   PBX.

   During a PBX enrollment, the GoClear features are disabled.  If the
   (E) flag is set by the PBX, the PBX MUST NOT set the Allow Clear (A)
   flag.  Thus, (E) implies not (A).  If a received Confirm message has
   the (E) flag set, the (A) flag MUST be disregarded and treated as
   false.
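
   The pbxsecret formula above can be illustrated with the Python
   sketch below.  The KDF itself is defined in Section 4.5.1, which
   should be treated as authoritative; this sketch assumes a single
   HMAC-based counter-mode step in the style of NIST SP 800-108, with a
   32-bit counter of 1 and a 32-bit output length in bits.  It also
   assumes SHA-256 as the negotiated hash and 12-octet ZIDs.  All
   function names are hypothetical.

```python
import hashlib
import hmac
import struct


def kdf_sketch(ki: bytes, label: bytes, context: bytes, length_bits: int) -> bytes:
    """Assumed KDF shape: HMAC(KI, counter || Label || 0x00 || Context || L)."""
    if length_bits > 256:
        raise ValueError("this sketch supports a single SHA-256 block only")
    message = (
        struct.pack("!I", 1)              # assumed 32-bit counter, fixed at 1
        + label
        + b"\x00"
        + context
        + struct.pack("!I", length_bits)  # assumed 32-bit length field
    )
    digest = hmac.new(ki, message, hashlib.sha256).digest()
    return digest[: length_bits // 8]


def derive_pbxsecret(zrtp_sess: bytes, zid_i: bytes, zid_r: bytes) -> bytes:
    """pbxsecret = KDF(ZRTPSess, "Trusted MiTM key", ZIDi || ZIDr, hash length)."""
    return kdf_sketch(zrtp_sess, b"Trusted MiTM key", zid_i + zid_r, 256)


# Placeholder inputs for illustration; real values come from the ZRTP session.
pbxsecret = derive_pbxsecret(b"\x11" * 32, b"A" * 12, b"B" * 12)
```

   Because the Context contains only the two ZIDs, the PBX and the
   phone compute the same value for the whole session regardless of
   which media stream carried the enrollment.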

   If the user elects not to enroll, perhaps because he dialed a wrong
   number or does not yet feel comfortable with this PBX, he can simply
   hang up and not save the pbxsecret in his cache.  The PBX will have
   it saved in the PBX cache, but that will do no harm.  The SASrelay
   scheme does not depend on the PBX trusting the phone.  It only
   depends on the phone trusting the PBX.  It is the phone (the user)
   who is at risk if the PBX abuses its MiTM privileges.

   An endpoint MUST NOT store the pbxsecret in the cache without
   explicit user authorization.

   After this enrollment process, the PBX and the ZRTP-enabled phone
   both share a secret that enables the phone to recognize the PBX as a
   trusted MiTM in future calls.  This means that when a future call
   from an outside ZRTP-enabled caller is relayed through the PBX to
   this phone, the phone will render a relayed SAS from the PBX.  If the
   SASrelay packet comes from a MiTM which does not know the pbxsecret,
   the phone treats it as a "bad guy" MiTM, and refuses to render the
   relayed SAS.  Regardless of which party initiates any future phone
   calls through the PBX, the enrolled phone or the outside phone, the
   PBX will relay the SAS to the enrolled phone.

   There are other ways that ZRTP user agents can be configured to trust
   a PBX.  Perhaps the pbxsecret can be configured into the phone by
   some automated provisioning process in large IT environments.  This
   specification does not require that products be configured solely by
   this enrollment process.  Any process that results in a pbxsecret
   being computed and shared between the PBX and the phone will suffice.
   This is one such method that has been shown to work.

8.  Signaling Interactions

   This section discusses how ZRTP, SIP, and SDP work together.

   Note that ZRTP may be implemented without coupling with the SIP
   signaling.
   For example, ZRTP can be implemented as a "bump in the wire" or as
   a "bump in the stack" in which RTP sent by the SIP UA is converted
   to ZRTP.  In these cases, the SIP UA will have no knowledge of ZRTP.
   As a result, the signaling path discovery mechanisms introduced in
   this section should not be definitive - they are a hint.  Despite
   the absence of an indication of ZRTP support in an offer or answer,
   a ZRTP endpoint SHOULD still send Hello messages.

   ZRTP endpoints which have control over the signaling path include a
   ZRTP SDP attribute in their SDP offers and answers.  The ZRTP
   attribute, a=zrtp-hash, is used to indicate support for ZRTP and to
   convey a hash of the Hello message.  The hash is computed according
   to Section 8.1.

   Aside from the advantages described in Section 8.1, there are a
   number of potential uses for this attribute.  It is useful when
   signaling elements would like to know when ZRTP may be utilized by
   endpoints.  It is also useful if endpoints support multiple methods
   of SRTP key management.  The ZRTP attribute can be used to ensure
   that these key management approaches work together instead of against
   each other.  For example, if only one endpoint supports ZRTP but both
   support another method to key SRTP, then the other method will be
   used instead.  When used in parallel, an SRTP secret carried in an
   a=keymgt [RFC4567] or a=crypto [RFC4568] attribute can be used as a
   shared secret for the srtps computation defined in Section 8.2.  The
   ZRTP attribute is also used to signal to an intermediary ZRTP device
   not to act as a ZRTP endpoint, as discussed in Section 10.

   The a=zrtp-hash attribute can only be included in the SDP at the
   media level since Hello messages sent in different media streams will
   have unique hashes.

   The ABNF for the ZRTP attribute is as follows:

      zrtp-attribute   = "a=zrtp-hash:" zrtp-version zrtp-hash-value

      zrtp-version     = token

      zrtp-hash-value  = 1*(HEXDIG)

   Example of the ZRTP attribute in an initial SDP offer or answer used
   at the media level:

   v=0
   o=bob 2890844527 2890844527 IN IP4 client.biloxi.example.com
   s=
   c=IN IP4 client.biloxi.example.com
   t=0 0
   m=audio 3456 RTP/AVP 97 33
   a=rtpmap:97 iLBC/8000
   a=rtpmap:33 no-op/8000
   a=zrtp-hash:1.10 fe30efd02423cb054e50efd0248742ac7a52c8f91bc2df881ae642c371ba46df

8.1.  Binding the media stream to the signaling layer via the Hello Hash

   It is desirable to tie the media stream to the signaling channel to
   prevent a third party from inserting false media packets.  If the
   signaling layer contains information that ties it to the media
   stream, false media streams can be rejected.

   To accomplish this, a 256-bit hash (using the hash algorithm defined
   in Section 5.1.2.1) is computed across the entire Hello message
   (including everything shown in Figure 3).  The hash does not include
   ZRTP packet framing from Figure 2.  This hash image is made available
   to the signaling layer, where it is transmitted as a hexadecimal
   value in the SIP channel using the SDP attribute, a=zrtp-hash defined
   in this specification.  Each media stream (audio or video) will have
   a separate Hello packet, and thus will require a separate a=zrtp-hash
   in an SDP attribute.  The recipient of the SIP/SDP message can then
   use this hash image to detect and reject false Hello packets in the
   media channel, as well as identify which media stream is associated
   with this SIP call.  Each Hello packet hashes uniquely, because it
   contains the H3 field derived from a random nonce, defined in
   Section 9.
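
   The binding described above can be sketched in Python as follows.
   This is an illustration under stated assumptions: SHA-256 stands in
   for the 256-bit hash of Section 5.1.2.1, the input is assumed to be
   the Hello message octets with the packet framing already removed,
   and the function names are hypothetical.

```python
import hashlib


def zrtp_hash_attribute(hello_message: bytes, version: str = "1.10") -> str:
    """Format the SDP attribute line carrying the hash of a Hello message."""
    hello_hash = hashlib.sha256(hello_message).hexdigest()
    return "a=zrtp-hash:%s %s" % (version, hello_hash)


def hello_matches_sdp(hello_message: bytes, attribute_line: str) -> bool:
    """Check a received Hello against the a=zrtp-hash line from the SDP."""
    prefix = "a=zrtp-hash:"
    if not attribute_line.startswith(prefix):
        return False
    fields = attribute_line[len(prefix):].split()
    if len(fields) != 2:
        return False
    _version, hex_value = fields
    expected = hashlib.sha256(hello_message).hexdigest()
    return hex_value.lower() == expected


# Placeholder octets; a real Hello message is built as shown in Figure 3.
line = zrtp_hash_attribute(b"example Hello message octets")
print(hello_matches_sdp(b"example Hello message octets", line))  # True
print(hello_matches_sdp(b"forged Hello message octets", line))   # False
```

   One attribute line is produced per media stream, since each stream
   has its own Hello message.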

The Hello Hash as an SDP attribute is an OPTIONAL feature, because some ZRTP endpoints do not have the ability to add SDP attributes to the signaling. For example, if ZRTP is implemented in a hardware bump-in-the-wire device, it might only have the ability to modify the media packets, not the SIP packets, especially if the SIP packets are integrity protected and thus cannot be modified on the wire. If the SDP has no hash image of the ZRTP Hello message, the recipient's ZRTP user agent cannot check it, and thus will not be able to reject Hello messages based on this hash.

After the Hello Hash is used to properly identify the ZRTP Hello message as belonging to this particular SIP call, the rest of the ZRTP message sequence is protected from false packet injection by other protection mechanisms, such as the hash chaining mechanism defined in Section 9.

An attacker who controls only the signaling layer, such as an uncooperative VoIP service provider, may be able to deny service by corrupting the hash of the Hello message in the SDP attribute, which would force ZRTP to reject perfectly good Hello messages. If there is reason to believe this is happening, the ZRTP endpoint MAY allow Hello messages to be accepted that do not match the hash image in the SDP attribute.

Even in the absence of SIP integrity protection, the inclusion of the a=zrtp-hash SDP attribute, when coupled with the hash chaining mechanism defined in Section 9, meets the R-ASSOC requirement in the Media Security Requirements [I-D.ietf-sip-media-security-requirements], which requires:

   "...a mechanism for associating key management messages with both the signaling traffic that initiated the session and with protected media traffic.
   Allowing such an association also allows the SDP offerer to avoid performing CPU-consuming operations (e.g., Diffie-Hellman or public key operations) with attackers that have not seen the signaling messages."

The a=zrtp-hash SDP attribute becomes especially useful if the SDP is integrity-protected end-to-end by SIP Identity (RFC 4474) [RFC4474] or, better still, Dan Wing's SIP Identity using Media Path [I-D.wing-sip-identity-media]. This leads to an ability to stop MiTM attacks independent of ZRTP's SAS mechanism, as explained in Section 8.1.1 below.

8.1.1.  Integrity-protected signaling enables integrity-protected DH exchange

If and only if the signaling path and the SDP are protected by some form of end-to-end integrity protection, such as one of the above-mentioned mechanisms, so that it can guarantee delivery of the a=zrtp-hash attribute without any tampering by a third party, and if there is good reason to trust the signaling layer to protect the interests of the end user, it is possible to authenticate the key exchange and prevent a MiTM attack. This can be done without requiring the users to verbally compare the SAS, by using the hash chaining mechanism defined in Section 9 to provide a series of HMAC keys that protect the entire ZRTP key exchange. Thus, an end-to-end integrity-protected signaling layer automatically enables an integrity-protected Diffie-Hellman exchange in ZRTP, which in turn means immunity from a MiTM attack. Here's how it works.

The integrity-protected SIP SDP contains a hash commitment to the entire Hello message. The Hello message contains H3, which provides a hash commitment for the rest of the hash chain H0-H2 (Section 9). The Hello message is protected by a 64-bit HMAC, keyed by H2. The Commit message is protected by a 64-bit HMAC keyed by H1.
The DHPart1 or DHPart2 messages are protected by a 64-bit HMAC keyed by H0. The HMACs protecting the Confirm messages are computed with a different HMAC key derived from the resulting key agreement. Each message's HMAC is checked when the HMAC key is received in the next message. If a bad HMAC is discovered, it MUST be treated as a security exception indicating a MiTM attack, perhaps by logging or alerting the user, and MUST NOT be treated as a random error. Random errors are already discovered and quietly rejected by bad CRCs (Figure 2).

The Hello message must be assembled before any hash algorithms are negotiated, so an implicit predetermined hash algorithm and HMAC algorithm (both defined in Section 5.1.2.1) must be used. All of the aforementioned HMACs keyed by the hashes in the aforementioned hash chain MUST be computed with the HMAC algorithm defined in Section 5.1.2.1, with the HMAC truncated to 64 bits.

The Media Security Requirements [I-D.ietf-sip-media-security-requirements] R-EXISTING requirement can be fully met by leveraging a certificate-backed PKI in the signaling layer to integrity-protect the delivery of the a=zrtp-hash SDP attribute. This would thereby protect ZRTP against a MiTM attack, without requiring the user to check the SAS, without adding any explicit signatures or signature keys to the ZRTP key exchange, and without any extra public key operations or extra packets.

Without an end-to-end integrity protection mechanism in the signaling layer to guarantee delivery of the a=zrtp-hash SDP attribute without modification by a third party, these HMACs alone will not prevent a MiTM attack. In that case, ZRTP's built-in SAS mechanism will still have to be used to authenticate the key exchange.
At the time of this writing, very few deployed VoIP clients offer a fully implemented SIP stack that provides end-to-end integrity protection for the delivery of SDP attributes. Also, end-to-end signaling integrity becomes more problematic if E.164 numbers [RFC3824] are used in SIP. Thus, real-world implementations of ZRTP endpoints will continue to depend on SAS authentication for quite some time. Even after there is widespread availability of SIP user agents that offer integrity-protected delivery of SDP attributes, many users will still be faced with the fact that the signaling path may be controlled by institutions that do not have the best interests of the end user in mind. In those cases, SAS authentication will remain the gold standard for the prudent user.

Even without SIP integrity protection, the Media Security Requirements [I-D.ietf-sip-media-security-requirements] R-ACT-ACT requirement can be met by ZRTP's SAS mechanism. Although ZRTP may benefit from an integrity-protected SIP layer, it is fortunate that ZRTP's self-contained MiTM defenses do not actually require an integrity-protected SIP layer. ZRTP can bypass the delays and problems that SIP integrity faces, such as E.164 number usage, and the complexity of building and maintaining a PKI.

In contrast, DTLS-SRTP [I-D.ietf-avt-dtls-srtp] appears to depend heavily on end-to-end integrity protection in the SIP layer. Further, DTLS-SRTP must bear the additional cost of a signature calculation of its own, in addition to the signature calculation the SIP layer uses to achieve its integrity protection. ZRTP needs no signature calculation of its own to leverage the signature calculation carried out in the SIP layer.

8.2.  Deriving the SRTP secret (srtps) from the signaling layer

The shared secret calculations defined in Section 4.3 make use of the SRTP secret (srtps), if it is provided by the signaling layer.

It is desirable for only one SRTP key negotiation protocol to be used, and that protocol should be ZRTP. But in the event the signaling layer negotiates its own SRTP master key and salt, using SDES [RFC4568] or the key management extensions of [RFC4567], these can be passed from the signaling to the ZRTP layer and mixed into ZRTP's own shared secret calculations, without compromising security by creating a dependency on the signaling for media encryption.

ZRTP computes srtps from the SRTP master key and salt parameters provided by the signaling layer in this manner:

   srtps = hash(SRTP master key || SRTP master salt)

It is expected that the srtps parameter will be rarely computed or used in typical ZRTP endpoints, because it is likely and desirable that ZRTP will be the sole means of negotiating SRTP keys, needing no help from SDES [RFC4568] or [RFC4567]. If srtps is computed, it will be stored in the auxiliary shared secret auxsecret, defined in Section 4.3, and used in Section 4.3.1.

8.3.  Codec Selection for Secure Media

Codec selection is negotiated in the signaling layer. If the signaling layer determines that ZRTP is supported by both endpoints, this should provide guidance in codec selection to avoid variable bit-rate (VBR) codecs that leak information.

When voice is compressed with a VBR codec, the packet lengths vary depending on the types of sounds being compressed. This leaks a lot of information about the content even if the packets are encrypted, regardless of what encryption protocol is used [Wright1]. It is RECOMMENDED that VBR codecs be avoided in encrypted calls.
It is not a problem if the codec adapts the bit rate to the available channel bandwidth. The vulnerable codecs are the ones that change their bit rate depending on the type of sound being compressed.

It also appears that voice activity detection (VAD) leaks information about the content of the conversation, but to a lesser extent than VBR. This effect can be ameliorated by lengthening the VAD hangover time by about 1 to 2 seconds, if this is feasible in your application. This is a topic that requires further study.

9.  False ZRTP Packet Rejection

An attacker who is not in the media path may attempt to inject false ZRTP protocol packets, possibly to effect a denial of service attack, or to inject his own media stream into the call. VoIP by its nature invites various forms of denial of service attacks and requires protocol features to reject such attacks. While bogus SRTP packets may be easily rejected via the SRTP auth tag field, that can only be applied after a key agreement is completed. During the ZRTP key negotiation phase, other false packet rejection mechanisms are needed. One such mechanism is the use of the total_hash in the final shared secret calculation, but that can only detect false packets after performing the computationally expensive Diffie-Hellman calculation.

The VoIP developer community expects to see a lot of denial of service attacks, especially from attackers who are not in the media path. Such an attacker might inject false ZRTP packets to force a ZRTP endpoint to engage in an endless series of pointless and expensive DH calculations. To detect and reject false packets cheaply and rapidly as soon as they are received, ZRTP uses a hash chain, which is a series of successive hash images.
Before each session, the following values are computed:

   H0 = 256-bit random nonce (different for each party)
   H1 = hash (H0)
   H2 = hash (H1)
   H3 = hash (H2)

The hash chain MUST use the hash algorithm defined in Section 5.1.2.1. Each 256-bit hash image is the pre-image of the next, and the sequence of images is sent in reverse order in the ZRTP packet sequence. The hash image H3 is sent in the Hello packet, H2 is sent in the Commit packet, H1 is sent in the DHPart1 or DHPart2 packets, and H0 is sent in the Confirm1 or Confirm2 packets. The initial random H0 nonces that each party generates MUST be unpredictable to an attacker and unique within a ZRTP call, which thereby forces the derived hash images H1-H3 to also be unique and unpredictable.

The recipient checks if the packet has the correct hash pre-image, by hashing it and comparing the result with the hash image for the preceding packet. Packets which contain an incorrect hash pre-image MUST NOT be used by the recipient, but MAY be processed as security exceptions, perhaps by logging or alerting the user. As long as these bogus packets are not used, and correct packets are still being received, the protocol SHOULD be allowed to run to completion, thereby rendering ineffective this denial of service attack.

Because these hash images alone do not protect the rest of the contents of the packet they reside in, this scheme assumes the attacker cannot modify the packet contents from a legitimate party, which is a reasonable assumption for an attacker who is not in the media path. This covers an important range of denial-of-service attacks. For dealing with the remaining set of attacks that involve packet modification, other mechanisms are used, such as the total_hash in the final shared secret calculation, and the hash commitment in the Commit packet.
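
The hash chain and the pre-image check described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive implementation. It assumes SHA-256 and HMAC-SHA-256 as the algorithms of Section 5.1.2.1, and it assumes that truncation to 64 bits keeps the leftmost 8 bytes of the HMAC output. Which bytes of a message are covered by its HMAC is defined elsewhere in this specification, so the message arguments here are placeholders. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import List


def make_hash_chain() -> List[bytes]:
    """Return [H0, H1, H2, H3], where H0 is a fresh 256-bit random nonce."""
    chain = [secrets.token_bytes(32)]  # H0 must be unpredictable
    for _ in range(3):
        chain.append(hashlib.sha256(chain[-1]).digest())
    return chain


def is_valid_preimage(revealed: bytes, previous_image: bytes) -> bool:
    """True if hashing the newly revealed value yields the earlier image."""
    return hmac.compare_digest(hashlib.sha256(revealed).digest(), previous_image)


def mac64(key: bytes, message: bytes) -> bytes:
    """HMAC truncated to 64 bits (assumption: leftmost 8 bytes are kept)."""
    return hmac.new(key, message, hashlib.sha256).digest()[:8]


def check_deferred_mac(revealed_key: bytes, previous_image: bytes,
                       earlier_message: bytes, earlier_mac: bytes) -> bool:
    """Verify an earlier message once its HMAC key arrives in the next one.

    Example: the Hello message is keyed by H2, which is revealed in Commit
    and must hash to the H3 carried in Hello (Section 8.1.1).
    """
    if not is_valid_preimage(revealed_key, previous_image):
        return False
    return hmac.compare_digest(mac64(revealed_key, earlier_message), earlier_mac)
```

The pre-image check costs one hash computation per packet, which is what allows false packets to be rejected before any Diffie-Hellman calculation is attempted.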

False Hello packets may be detected and rejected by the mechanism defined in Section 8.1. This mechanism requires that each Hello packet be unique, and the inclusion of the H3 hash image meets that requirement.

If and only if an integrity-protected signaling channel is available, this hash chaining scheme can be used to key HMACs to authenticate the entire ZRTP key exchange, and thereby prevent a MiTM attack, without relying on the users verbally comparing the SAS. See Section 8.1.1 for details.

Some ZRTP user agents allow the user to manually switch to clear mode (via the GoClear packet) in the middle of a secure call, and then later initiate secure mode again. Many consumer client products will omit this feature, but those that allow it may return to secure mode again in the same media stream. Although the same chain of hash images will be re-used and thus rendered ineffective the second time, no real harm is done because the new SRTP session keys will be derived in part from a cached shared secret, which was safely protected from the MiTM in the previous DH exchange earlier in the same call.

10.  Intermediary ZRTP Devices

This section discusses the operation of a ZRTP endpoint which is actually an intermediary. For example, consider a device which proxies both signaling and media between endpoints. There are three possible ways in which such a device could support ZRTP.

An intermediary device can act transparently to the ZRTP protocol. To do this, a device MUST pass RTP header extensions and payloads (to allow the ZRTP Flag) and non-RTP protocols multiplexed on the same port as RTP (to allow ZRTP and STUN). This is the RECOMMENDED behavior for intermediaries as ZRTP and SRTP are best when done end-to-end.

An intermediary device could implement the ZRTP protocol and act as a ZRTP endpoint on behalf of non-ZRTP endpoints behind the intermediary device. The intermediary could determine on a call-by-call basis whether the endpoint behind it supports ZRTP based on the presence or absence of the ZRTP SDP attribute flag (a=zrtp-hash). For non-ZRTP endpoints, the intermediary device could act as the ZRTP endpoint using its own ZID and cache. This approach SHOULD only be used when there is some other security method protecting the confidentiality of the media between the intermediary and the inside endpoint, such as IPsec or physical security.

The third mode, which is NOT RECOMMENDED, is for the intermediary device to attempt to back-to-back the ZRTP protocol. The only exception to this case is where the intermediary device is a trusted element providing services to one of the endpoints - e.g., a Private Branch Exchange or PBX. In this mode, the intermediary would attempt to act as a ZRTP endpoint towards both endpoints of the media session. This approach MUST NOT be used except as described in Section 7.3 as it will always result in a detected man-in-the-middle attack and will generate alarms on both endpoints and likely result in the immediate termination of the session.

In cases where centralized media mixing is taking place, the SAS will not match when compared by the humans. However, this situation is known in the SIP signaling by the presence of the isfocus feature tag [RFC4579]. As a result, when the isfocus feature tag is present, the DH exchange can be authenticated by the mechanism defined in Section 8.1.1 or by validating signatures (Section 7.2) in the Confirm or SASrelay messages. For example, consider an audio conference call with three participants Alice, Bob, and Carol hosted on a conference bridge in Dallas.
There will be three ZRTP encrypted media streams, one encrypted stream between each participant and Dallas. Each will have a different SAS. Each participant will be able to validate their SAS with the conference bridge by using signatures optionally present in the Confirm messages (described in Section 7.2). Or, if the signaling path has end-to-end integrity protection, each DH exchange will have automatic MiTM protection by using the mechanism in Section 8.1.1.

SIP feature tags can also be used to detect if a session is established with an automaton such as an IVR, voicemail system, or speech recognition system. The display of SAS strings to users should be disabled in these cases.

It is possible that an intermediary device acting as a ZRTP endpoint might still receive ZRTP Hello and other messages from the inside endpoint. This could occur if there is another inline ZRTP device which does not include the ZRTP SDP attribute flag. An intermediary acting as a ZRTP endpoint receiving ZRTP Hello and other messages from the inside endpoint MUST NOT pass these ZRTP messages.

11.  The ZRTP Disclosure flag

There are no back doors defined in the ZRTP protocol specification. The designers of ZRTP would like to discourage back doors in ZRTP-enabled products. However, despite the lack of back doors in the actual ZRTP protocol, it must be recognized that a ZRTP implementer might still deliberately create a rogue ZRTP-enabled product that implements a back door outside the scope of the ZRTP protocol. For example, they could create a product that discloses the SRTP session key generated using ZRTP out-of-band to a third party. They may even have a legitimate business reason to do this for some customers.

For example, some environments have a need to monitor or record calls, such as stock brokerage houses that want to discourage insider trading, or special high security environments with special needs to monitor their own phone calls. We've all experienced automated messages telling us that "This call may be monitored for quality assurance". A ZRTP endpoint in such an environment might unilaterally disclose the session key to someone monitoring the call. ZRTP-enabled products that perform such out-of-band disclosures of the session key can undermine public confidence in the ZRTP protocol, unless we do everything we can in the protocol to alert the other user that this is happening.

If one of the parties is using a product that is designed to disclose their session key, ZRTP requires them to confess this fact to the other party through a protocol message to the other party's ZRTP client, which can properly alert that user, perhaps by rendering it in a graphical user interface. The disclosing party does this by sending a Disclosure flag (D) in Confirm1 and Confirm2 messages as described in Section 5.7.

Note that the intention here is to have the Disclosure flag identify products that are designed to disclose their session keys, not to identify which particular calls are compromised on a call-by-call basis. This is an important legal distinction, because most government-sanctioned wiretap regulations require a VoIP service provider to not reveal which particular calls are wiretapped. But there is nothing illegal about revealing that a product is designed to be wiretap-friendly. The ZRTP protocol mandates that such a product "out" itself.

You might be using a ZRTP-enabled product with no back doors, but if your own graphical user interface tells you the call is (mostly) secure, except that the other party is using a product that is designed in such a way that it may have disclosed the session key for monitoring purposes, you might ask him what brand of secure telephone he is using, and make a mental note not to purchase that brand yourself. If we create a protocol environment that requires such back-doored phones to confess their nature, word will spread quickly, and the "invisible hand" of the free market will act. The free market has effectively dealt with this in the past.

Of course, a ZRTP implementer can lie about his product having a back door, but the ZRTP standard mandates that ZRTP-compliant products MUST adhere to the requirement that a back door be confessed by sending the Disclosure flag to the other party.

There will be inevitable comparisons to Steve Bellovin's 2003 April Fools' joke, when he submitted RFC 3514 [RFC3514] which defined the "Evil bit" in the IPv4 header, for packets with "evil intent". But we submit that a similar idea can actually have some merit for securing VoIP. Sure, one can always imagine that some implementer will not be fazed by the rules and will lie, but they would have lied anyway even without the Disclosure flag. There are good reasons to believe that it will improve the overall percentage of implementations that at least tell us if they put a back door in their products, and may even get some of them to decide not to put in a back door at all. From a civic hygiene perspective, we are better off with having the Disclosure flag in the protocol.

If an endpoint stores or logs SRTP keys or information that can be used to reconstruct or recover SRTP keys after they are no longer in use (i.e.
the session is active), or otherwise discloses or passes SRTP keys or information that can be used to reconstruct or recover SRTP keys to another application or device, the Disclosure flag D MUST be set in the Confirm1 or Confirm2 message.

11.1.  Guidelines on Proper Implementation of the Disclosure Flag

Some implementers have asked for guidance on implementing the Disclosure Flag. Some people have incorrectly thought that a connection secured with ZRTP cannot be used in a call center, with voluntary voice recording, or even with a voicemail system. Similarly, some potential users of ZRTP have overestimated the protection that ZRTP can give them. These guidelines clarify both concerns.

The ZRTP Disclosure Flag only governs the ZRTP/SRTP stream itself. It does not govern the underlying RTP media stream, nor the actual media itself. Consequently, a PBX that uses ZRTP may provide conference calls, call monitoring, call recording, voicemail, or other PBX features and still say that it does not disclose the ZRTP key material. A video system may provide DVR features and still say that it does not disclose the ZRTP key material. The ZRTP Disclosure Flag, when not set, means only that the ZRTP cryptographic key material stays within the bounds of the ZRTP subsystem.

If an application has a need to disclose the ZRTP cryptographic key material, the easiest way to comply with the protocol is to set the flag to the proper value. The next easiest way is to overestimate disclosure. For example, a call center that commonly records calls might choose to set the disclosure flag even though all recording is an analog recording of a call (and thus outside the ZRTP scope) because it sets an expectation with clients that their calls might be recorded.

Note also that the ZRTP Disclosure Flag does not require an implementation to preclude hacking or malware. Malware that leaks ZRTP cryptographic key material does not create a liability for the implementer from non-compliance with the ZRTP specification.

A user of ZRTP should note that ZRTP is not a panacea against unauthorized recording. ZRTP does not and cannot protect against an untrustworthy partner who holds a microphone up to the speaker. It does not protect against someone else being in the room. It does not protect against analog wiretaps in the phone or in the room. It does not mean your partner has not been hacked with spyware. It does not mean that the software has no flaws. It means that the ZRTP subsystem is not knowingly leaking ZRTP cryptographic key material.

12.  RTP Header Extension Flag for ZRTP

This specification defines a new RTP header extension used only for discovery of support for ZRTP. No ZRTP data is transported in the extension. When used, the X bit is set in the RTP header to indicate the presence of the RTP header extension.

Section 5.3.1 in RFC 3550 [RFC3550] defines the format of an RTP Header extension. The Header extension is appended to the RTP header. The first 16 bits are an identifier for the header extension, and the following 16 bits are the length of the extension header in 32-bit words. The ZRTP flag RTP header extension has the value of 0x505A and a length of 0. The format of the header extension is as shown in the Figure below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 18: RTP Extension header format for ZRTP Flag

ZRTP endpoints MAY include the ZRTP Flag in RTP packets sent at the start of a session. For example, an endpoint may decide to include the flag in the first 2 seconds of RTP packets sent. The inclusion of the flag MAY be ended if a ZRTP message (such as Hello) is received.

13.  IANA Considerations

This specification defines a new SDP [RFC4566] attribute in Section 8.

   Contact name:              Philip Zimmermann

   Attribute name:            "zrtp-hash".

   Type of attribute:         Media level.

   Subject to charset:        Not.

   Purpose of attribute:      The 'zrtp-hash' indicates that a UA supports the ZRTP protocol and provides a hash of the ZRTP Hello message. The ZRTP protocol version number is also specified.

   Allowed attribute values:  Hex.

14.  Appendix - Media Security Requirements

This section discusses how ZRTP meets all RTP security requirements discussed in the Media Security Requirements [I-D.ietf-sip-media-security-requirements] document without any dependencies on other protocols or extensions, unlike DTLS-SRTP [I-D.ietf-avt-dtls-srtp] which requires additional protocols and mechanisms.

   R-FORK-RETARGET is met since ZRTP is a media path key agreement protocol.

   R-DISTINCT is met since ZRTP uses ZIDs and allows multiple independent ZRTP exchanges to proceed.

   R-REUSE is met using the Multistream and Preshared modes.

   R-AVOID-CLIPPING is met since ZRTP is a media path key agreement protocol.

   R-RTP-VALID is met since the ZRTP packet format does not pass the RTP validity check.

   R-ASSOC is met using the a=zrtp-hash SDP attribute in INVITEs and responses.

   R-NEGOTIATE is met using the Commit message.

   R-PSTN is met since ZRTP can be implemented in Gateways.

   R-PFS is met using ZRTP Diffie-Hellman key agreement methods.

   R-COMPUTE is met using the Hello/Commit ZRTP exchange.

   R-CERTS is met using the optional signature field in ZRTP Confirm messages.

   R-FIPS is met since ZRTP uses algorithms that allow FIPS certification.

   R-DOS is met since ZRTP does not introduce any new denial of service attacks.

   R-EXISTING is met since ZRTP can support the use of certificates or keys.

   R-AGILITY is met since the set of hash, cipher, authentication tag length, key agreement method, SAS type, and signature type can all be extended and negotiated.

   R-DOWNGRADE is met since ZRTP has protection against downgrade attacks.

   R-PASS-MEDIA is met since ZRTP prevents a passive adversary with access to the media path from gaining access to keying material used to protect SRTP media packets.

   R-PASS-SIG is met since ZRTP prevents a passive adversary with access to the signaling path from gaining access to keying material used to protect SRTP media packets.

   R-SIG-MEDIA is met using the a=zrtp-hash SDP attribute in INVITEs and responses.

   R-ID-BINDING is met using the a=zrtp-hash SDP attribute.

   R-ACT-ACT is met using the a=zrtp-hash SDP attribute in INVITEs and responses.

   R-BEST-SECURE is met since ZRTP utilizes the RTP/AVP profile and hence best effort SRTP in every case.

   R-OTHER-SIGNALING is met since ZRTP can utilize modes in which there is no dependency on the signaling path.

   R-RECORDING is met using the ZRTP Disclosure flag.

   R-TRANSCODER is met if the transcoder operates as a trusted MiTM (i.e., a PBX).

   R-ALLOW-RTP is met due to ZRTP's best effort encryption.

15.  Security Considerations

This document is all about securely keying SRTP sessions. As such, security is discussed in every section.

Most secure phones rely on a Diffie-Hellman exchange to agree on a common session key. But since DH is susceptible to a man-in-the-middle (MiTM) attack, it is common practice to provide a way to authenticate the DH exchange. In some military systems, this is done by depending on digital signatures backed by a centrally-managed PKI. A decade of industry experience has shown that deploying centrally managed PKIs can be a painful and often futile experience. PKIs are just too messy, and require too much activation energy to get them started. Setting up a PKI requires somebody to run it, which is not practical for an equipment provider. A service provider like a carrier might venture down this path, but even then you have to deal with cross-carrier authentication, certificate revocation lists, and other complexities. It is much simpler to avoid PKIs altogether, especially when developing secure commercial products. It is therefore more common for commercial secure phones in the PSTN world to augment the DH exchange with a Short Authentication String (SAS) combined with a hash commitment at the start of the key exchange, to shorten the length of SAS material that must be read aloud. No PKI is required for this approach to authenticating the DH exchange. The AT&T TSD 3600, Eric Blossom's COMSEC secure phones [comsec], PGPfone [pgpfone], and CryptoPhone [cryptophone] are all examples of products that took this simpler lightweight approach.

The main problem with this approach is inattentive users who may not execute the voice authentication procedure, or unattended secure phone calls to answering machines that cannot execute it.

Additionally, some people worry about voice spoofing. But it is a mistake to think this is simply an exercise in voice impersonation (perhaps this could be called the "Rich Little" attack). Although there are digital signal processing techniques for changing a person's voice, that does not mean a man-in-the-middle attacker can safely break into a phone conversation and inject his own short authentication string (SAS) at just the right moment. He doesn't know exactly when or in what manner the users will choose to read aloud the SAS, or in what context they will bring it up or say it, or even which of the two speakers will say it, or if indeed they both will say it. In addition, some methods of rendering the SAS involve using a list of words such as the PGP word list [Juola2], in a manner analogous to how pilots use the NATO phonetic alphabet to convey information. This can make it even more complicated for the attacker, because these words can be worked into the conversation in unpredictable ways. Remember that the attacker places a very high value on not being detected, and if he makes a mistake, he doesn't get to do it over. Some people have raised the question that even if the attacker lacks voice impersonation capabilities, it may be unsafe for people who don't know each other's voices to depend on the SAS procedure. This is not as much of a problem as it seems, because it isn't necessary that they recognize each other by their voice, it is only necessary that they detect that the voice used for the SAS procedure matches the voice in the rest of the phone conversation.
3824 A popular and field-proven approach is used by SSH (Secure Shell) 3825 [RFC4251], which Peter Gutmann likes to call the "baby duck" security 3826 model. SSH establishes a relationship by exchanging public keys in 3827 the initial session, when we assume no attacker is present, and this 3828 makes it possible to authenticate all subsequent sessions. A 3829 successful MiTM attacker has to have been present in all sessions all 3830 the way back to the first one, which is assumed to be difficult for 3831 the attacker. ZRTP's key continuity features are actually better 3832 than SSH's, at least for VoIP, for reasons described in Section 15.1. 3833 All this is accomplished without resorting to a centrally-managed 3834 PKI. 3836 We use an analogous baby duck security model to authenticate the DH 3837 exchange in ZRTP. We don't need to exchange persistent public keys; 3838 we can simply cache a shared secret and re-use it to authenticate a 3839 long series of DH exchanges for secure phone calls over a long period 3840 of time. If we read aloud just one SAS, and then cache a shared 3841 secret for later calls to use for authentication, no new voice 3842 authentication rituals need to be executed. We just have to remember 3843 we did one already. 3845 If one party ever loses this cached shared secret, it is no longer 3846 available for authentication of DH exchanges. This cache mismatch 3847 situation is easy to detect by the party that still has a surviving 3848 shared secret cache entry. If it fails to match, either there is a 3849 MiTM attack or one side has lost their shared secret cache entry. 3850 The user agent that discovers the cache mismatch must alert the user 3851 that a cache mismatch has been detected, and that he must do a verbal 3852 comparison of the SAS to determine whether the mismatch is because of a 3853 MiTM attack or because the other party lost her cache. From 3854 that point on, the two parties start over with a new cached shared 3855 secret.
Then they can go back to omitting the voice authentication 3856 on later calls. 3858 A particularly compelling reason why this approach is attractive is 3859 that SAS is easiest to implement when a graphical user interface or 3860 some sort of display is available, which raises the question of what 3861 to do when a display is less conveniently available. For example, 3862 some devices that implement ZRTP might have a graphical user 3863 interface that is only visible through a web browser, such as a PBX 3864 or some other nearby device that implements ZRTP as a "bump-in-the- 3865 wire". If we take an approach that greatly reduces the need for a 3866 SAS in each and every call, we can operate in products without a 3867 graphical user interface with greater ease. Then the SAS can be 3868 compared less frequently through a web browser, or it might even be 3869 presented as needed to the local user through a locally generated 3870 voice prompt, which the local user hears and verbally repeats and 3871 compares with the remote party. Using a voice prompt in this way is 3872 purely for the local ZRTP user agent to render the SAS to the local 3873 user, and is not to be confused with the verbal comparison of the SAS 3874 between two human users. 3876 It is a good idea to force your opponent to have to solve multiple 3877 problems in order to mount a successful attack. Some examples of 3878 widely differing problems we might like to present him with are: 3879 Stealing a shared secret from one of the parties, being present on 3880 the very first session and every subsequent session to carry out an 3881 active MiTM attack, and solving the discrete log problem. We want to 3882 force the opponent to solve more than one of these problems to 3883 succeed. 3885 ZRTP can use different kinds of shared secrets. Each type of shared 3886 secret is determined by a different method. All of the shared 3887 secrets are hashed together to form a session key to encrypt the 3888 call. 
An attacker must defeat all of the methods in order to 3889 determine the session key. 3891 First, there is the shared secret determined entirely by a Diffie- 3892 Hellman key agreement. It changes with every call, based on random 3893 numbers. An attacker may attempt a classic DH MiTM attack on this 3894 secret, but we can protect against this by displaying and reading 3895 aloud an SAS, combined with adding a hash commitment at the beginning 3896 of the DH exchange. 3898 Second, there is an evolving shared secret, or ongoing shared secret 3899 that is automatically changed and refreshed and cached with every new 3900 session. We will call this the cached shared secret, or sometimes 3901 the retained shared secret. Each new image of this ongoing secret is 3902 a non-invertible function of its previous value and the new secret 3903 derived by the new DH agreement. It is possible that no cached 3904 shared secret is available, because there were no previous sessions 3905 to inherit this value from, or because one side loses its cache. 3907 There are other approaches for key agreement for SRTP that compute a 3908 shared secret using information in the signaling. For example, 3909 [RFC4567] describes how to carry a MIKEY (Multimedia Internet KEYing) 3910 [RFC3830] payload in SDP [RFC4566]. Similarly, RFC 4568 (SDES) [RFC4568] 3911 describes directly carrying SRTP keying and configuration information 3912 in SDP. ZRTP does not rely on the signaling to compute a shared 3913 secret, but if a client does produce a shared secret via the 3914 signaling, and makes it available to the ZRTP protocol, ZRTP can make 3915 use of this shared secret to augment the list of shared secrets that 3916 will be hashed together to form a session key. This way, any 3917 security weaknesses that might compromise the shared secret 3918 contributed by the signaling will not harm the final resulting 3919 session key.
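The combination of shared secrets described above can be illustrated with a minimal sketch. This is not the key derivation defined elsewhere in this document: the function names (derive_session_key, next_cached_secret), the field labels, the length-prefixed encoding, and the choice of SHA-256 and HMAC-SHA-256 are all assumptions made for illustration. It only shows that the DH secret, the cached secret, and a signaling-provided secret can be hashed together, that absent secrets are simply left out, and that the cached secret is refreshed by a one-way function of its previous value and the new DH secret.

```python
import hashlib
import hmac
import os


def _lp(value: bytes) -> bytes:
    """Length-prefix a field so that concatenated inputs stay unambiguous."""
    return len(value).to_bytes(4, "big") + value


def derive_session_key(dh_secret, cached_secret=None, signaling_secret=None):
    """Hash together every shared secret that is available.

    Illustrative only: the labels and encoding are assumptions,
    not the derivation specified by ZRTP.
    """
    h = hashlib.sha256()
    h.update(_lp(b"dh") + _lp(dh_secret))
    if cached_secret is not None:
        # Omitted from the hash when no cached secret exists.
        h.update(_lp(b"cached") + _lp(cached_secret))
    if signaling_secret is not None:
        # Omitted from the hash when the signaling provides nothing.
        h.update(_lp(b"signaling") + _lp(signaling_secret))
    return h.digest()


def next_cached_secret(dh_secret, previous_cached=None):
    """One-way update of the retained secret from its previous value
    and the secret produced by the new DH agreement."""
    previous = previous_cached if previous_cached is not None else b""
    message = _lp(b"retained") + _lp(previous)
    return hmac.new(dh_secret, message, hashlib.sha256).digest()


if __name__ == "__main__":
    cached = os.urandom(32)       # known to both parties, not to the attacker
    dh_with_mitm = os.urandom(32)  # DH secret the attacker shares with one victim

    victim_key = derive_session_key(dh_with_mitm, cached)
    attacker_key = derive_session_key(dh_with_mitm)  # attacker lacks the cache
    print("attacker derives the victim's key:", victim_key == attacker_key)

    # After the call, the retained secret is replaced by a fresh value.
    cached = next_cached_secret(dh_with_mitm, cached)
```

In this sketch an attacker who completes a classic DH MiTM exchange still lacks the cached input, so the key he computes does not match the one computed by either party.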
3921 The shared secret provided by the signaling (if available), the 3922 shared secret computed by DH, and the cached shared secret are all 3923 hashed together to compute the session key for a call. If the cached 3924 shared secret is not available, it is omitted from the hash 3925 computation. If the signaling provides no shared secret, it is also 3926 omitted from the hash computation. 3928 No DH MiTM attack can succeed if the ongoing shared secret is 3929 available to the two parties, but not to the attacker. This is 3930 because the attacker cannot compute a common session key with either 3931 party without knowing the cached secret component, even if he 3932 correctly executes a classic DH MiTM attack. 3934 15.1. Self-healing Key Continuity Feature 3936 The key continuity features of ZRTP are analogous to those provided 3937 by SSH (Secure Shell) [RFC4251], but they differ in one respect. SSH 3938 caches public signature keys that never change, and uses a permanent 3939 private signature key that must be guarded from disclosure. If 3940 someone steals your SSH private signature key, they can impersonate 3941 you in all future sessions and mount a successful MiTM attack any 3942 time they want. 3944 ZRTP caches symmetric key material used to compute secret session 3945 keys, and these values change with each session. If someone steals 3946 your ZRTP shared secret cache, they only get one chance to mount a 3947 MiTM attack, in the very next session. If they miss that chance, the 3948 retained shared secret is refreshed with a new value, and the window 3949 of vulnerability heals itself, which means they are locked out of any 3950 future opportunities to mount a MiTM attack. This gives ZRTP a 3951 "self-healing" feature if any cached key material is compromised. 3953 A MiTM attacker must always be in the media path. 
This presents a 3954 significant operational burden for the attacker in many VoIP usage 3955 scenarios, because being in the media path for every call is often 3956 harder than being in the signaling path. This will likely create 3957 coverage gaps in the attacker's opportunities to mount a MiTM attack. 3958 ZRTP's self-healing key continuity features are better than SSH at 3959 exploiting any temporary gaps in MiTM attack coverage. Thus, ZRTP 3960 quickly recovers from any disclosure of cached key material. 3962 The infamous Debian OpenSSL weak key vulnerability [dsa-1571] 3963 (discovered and patched in May 2008) offers a real-world example of 3964 why ZRTP's self-healing scheme is a good way to do key continuity. 3965 The Debian bug resulted in the production of a lot of weak SSH (and 3966 TLS/SSL) keys, which continued to compromise security even after the 3967 bug had been patched. In contrast, ZRTP's key continuity scheme adds 3968 new entropy to the cached key material with every call, so old 3969 deficiencies in entropy are washed away with each new session. 3971 It should be noted that the addition of shared secret entropy from 3972 previous sessions can extend the strength of the new session key to 3973 AES-256 levels, even if the new session uses Diffie-Hellman keys no 3974 larger than DH-3072 or ECDH-256, provided the cached shared secrets 3975 were initially established when the wiretapper was not present. This 3976 is why AES-256 MAY be used with the smaller DH key sizes in 3977 Section 5.1.5, despite the key strength comparisons in Table 2 of 3978 [SP800-57-Part1]. 3980 Caching shared symmetric key material is also less CPU intensive 3981 compared with using digital signatures, which may be important for 3982 low-power mobile platforms. 3984 16. 
Acknowledgments 3986 The authors would like to thank Bryce Wilcox-O'Hearn and Colin Plumb 3987 for their contributions to the design of this protocol, and to thank 3988 Hal Finney, Viktor Krikun, Werner Dittmann, Jon Peterson, Dan Wing, 3989 Sagar Pai, Lily Chen, Colin Perkins, David McGrew, and Roni Even for 3990 their helpful comments and suggestions. 3992 The use of hash chains to key HMACs in ZRTP is similar to Adrian 3993 Perrig's TESLA protocol [TESLA]. 3995 17. References 3997 17.1. Normative References 3999 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 4000 Requirement Levels", BCP 14, RFC 2119, March 1997. 4002 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 4003 Jacobson, "RTP: A Transport Protocol for Real-Time 4004 Applications", STD 64, RFC 3550, July 2003. 4006 [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. 4007 Norrman, "The Secure Real-time Transport Protocol (SRTP)", 4008 RFC 3711, March 2004. 4010 [RFC3526] Kivinen, T. and M. Kojo, "More Modular Exponential (MODP) 4011 Diffie-Hellman groups for Internet Key Exchange (IKE)", 4012 RFC 3526, May 2003. 4014 [RFC3309] Stone, J., Stewart, R., and D. Otis, "Stream Control 4015 Transmission Protocol (SCTP) Checksum Change", RFC 3309, 4016 September 2002. 4018 [RFC4231] Nystrom, M., "Identifiers and Test Vectors for HMAC-SHA- 4019 224, HMAC-SHA-256, HMAC-SHA-384, and HMAC-SHA-512", 4020 RFC 4231, December 2005. 4022 [SP800-90] 4023 Barker, E. and J. Kelsey, "Recommendation for Random 4024 Number Generation Using Deterministic Random Bit 4025 Generators", NIST Special Publication 800-90 (Revised) 4026 March 2007. 4028 [SP800-56A] 4029 Barker, E., Johnson, D., and M. Smid, "Recommendation for 4030 Pair-Wise Key Establishment Schemes Using Discrete 4031 Logarithm Cryptography", NIST Special Publication 800- 4032 56A Revision 1, March 2007. 
4034 [SP800-108] 4035 Chen, L., "Recommendation for Key Derivation Using 4036 Pseudorandom Functions", NIST Special Publication 800- 4037 108 November 2008. 4039 [FIPS-180-2] 4040 "Secure Hash Signature Standard (SHS)", NIST FIPS PUB 180- 4041 2 August 2002. 4043 [FIPS-198-1] 4044 "The Keyed-Hash Message Authentication Code (HMAC)", NIST 4045 FIPS PUB 198-1 July 2008. 4047 [NSA-Suite-B] 4048 "Fact Sheet NSA Suite B Cryptography", NSA Information 4049 Assurance Directorate Fact Sheet NSA Suite B. 4051 [RFC4753] Fu, D. and J. Solinas, "ECP Groups For IKE and IKEv2", 4052 RFC 4753, January 2007. 4054 [FIPS-186-3] 4055 "Digital Signature Standard (DSS)", NIST FIPS PUB 186- 4056 3 Draft, November 2008. 4058 [SP800-38A] 4059 Dworkin, M., "Recommendation for Block Cipher: Methods and 4060 Techniques", NIST Special Publication 800-38A 2001 4061 Edition. 4063 [z-base-32] 4064 Wilcox, B., "Human-oriented base-32 encoding", 4065 http://zooko.com/repos/z-base-32/base32/DESIGN . 4067 [pgpwordlist] 4068 "PGP Words", http://en.wikipedia.org/wiki/PGP_Words . 4070 [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session 4071 Description Protocol", RFC 4566, July 2006. 4073 17.2. Informative References 4075 [I-D.ietf-sip-media-security-requirements] 4076 Wing, D., Fries, S., Tschofenig, H., and F. Audet, 4077 "Requirements and Analysis of Media Security Management 4078 Protocols", draft-ietf-sip-media-security-requirements-09 4079 (work in progress), January 2009. 4081 [SP800-57-Part1] 4082 Barker, E., Barker, W., Burr, W., Polk, W., and M. Smid, 4083 "Recommendation for Key Management - Part 1: General 4084 (Revised)", NIST Special Publication 800-57 - Part 4085 1 Revised March 2007. 4087 [Ferguson] 4088 Ferguson, N. and B. Schneier, "Practical Cryptography", 4089 Wiley Publishing 2003. 4091 [RFC4086] Eastlake, D., Schiller, J., and S. Crocker, "Randomness 4092 Requirements for Security", BCP 106, RFC 4086, June 2005. 4094 [Juola1] Juola, P. and P. 
Zimmermann, "Whole-Word Phonetic 4095 Distances and the PGPfone Alphabet", Proceedings of the 4096 International Conference of Spoken Language Processing 4097 (ICSLP-96) 1996. 4099 [Juola2] Juola, P., "Isolated Word Confusion Metrics and the 4100 PGPfone Alphabet", Proceedings of New Methods in Language 4101 Processing 1996. 4103 [pgpfone] Zimmermann, P., "PGPfone", 4104 http://philzimmermann.com/docs/pgpfone10b7.pdf . 4106 [zfone] Zimmermann, P., "Zfone", 4107 http://www.philzimmermann.com/zfone . 4109 [Byzantine] 4110 "The Two Generals' Problem", 4111 http://en.wikipedia.org/wiki/Two_Generals%27_Problem . 4113 [TESLA] Perrig, A., Canetti, R., Tygar, J., and D. Song, "The 4114 TESLA Broadcast Authentication Protocol", http:// 4115 www.ece.cmu.edu/~adrian/projects/tesla-cryptobytes/ 4116 tesla-cryptobytes.pdf . 4118 [SHA-3] "Cryptographic Hash Algorithm Competition", NIST Computer 4119 Security Resource Center Cryptographic Hash Project. 4121 [comsec] Blossom, E., "The VP1 Protocol for Voice Privacy Devices 4122 Version 1.2", http://www.comsec.com/vp1-protocol.pdf . 4124 [cryptophone] 4125 "CryptoPhone", http://www.cryptophone.de/ . 4127 [Wright1] Wright, C., Ballard, L., Coull, S., Monrose, F., and G. 4128 Masson, "Spot me if you can: Uncovering spoken phrases in 4129 encrypted VoIP conversations", Proceedings of the 2008 4130 IEEE Symposium on Security and Privacy 2008. 4132 [dsa-1571] 4133 "Debian Security Advisory - OpenSSL predictable random 4134 number generator", 4135 http://www.debian.org/security/2008/dsa-1571 . 4137 [I-D.ietf-avt-srtp-big-aes] 4138 McGrew, D., "The use of AES-192 and AES-256 in Secure 4139 RTP", http://www1.tools.ietf.org/html/ 4140 draft-ietf-avt-srtp-big-aes . 4142 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 4143 A., Peterson, J., Sparks, R., Handley, M., and E. 4144 Schooler, "SIP: Session Initiation Protocol", RFC 3261, 4145 June 2002. 4147 [RFC4251] Ylonen, T. and C. 
Lonvick, "The Secure Shell (SSH) 4148 Protocol Architecture", RFC 4251, January 2006. 4150 [RFC4568] Andreasen, F., Baugher, M., and D. Wing, "Session 4151 Description Protocol (SDP) Security Descriptions for Media 4152 Streams", RFC 4568, July 2006. 4154 [RFC4567] Arkko, J., Lindholm, F., Naslund, M., Norrman, K., and E. 4155 Carrara, "Key Management Extensions for Session 4156 Description Protocol (SDP) and Real Time Streaming 4157 Protocol (RTSP)", RFC 4567, July 2006. 4159 [RFC3830] Arkko, J., Carrara, E., Lindholm, F., Naslund, M., and K. 4160 Norrman, "MIKEY: Multimedia Internet KEYing", RFC 3830, 4161 August 2004. 4163 [RFC3514] Bellovin, S., "The Security Flag in the IPv4 Header", 4164 RFC 3514, April 1 2003. 4166 [RFC4474] Peterson, J. and C. Jennings, "Enhancements for 4167 Authenticated Identity Management in the Session 4168 Initiation Protocol (SIP)", RFC 4474, August 2006. 4170 [I-D.ietf-mmusic-ice] 4171 Rosenberg, J., "Interactive Connectivity Establishment 4172 (ICE): A Protocol for Network Address Translator (NAT) 4173 Traversal for Offer/Answer Protocols", 4174 draft-ietf-mmusic-ice-19 (work in progress), October 2007. 4176 [RFC4579] Johnston, A. and O. Levin, "Session Initiation Protocol 4177 (SIP) Call Control - Conferencing for User Agents", 4178 BCP 119, RFC 4579, August 2006. 4180 [I-D.wing-sip-identity-media] 4181 Wing, D. and H. Kaplan, "SIP Identity using Media Path", 4182 draft-wing-sip-identity-media-02 (work in progress), 4183 February 2008. 4185 [RFC3824] Peterson, J., Liu, H., Yu, J., and B. Campbell, "Using 4186 E.164 numbers with the Session Initiation Protocol (SIP)", 4187 RFC 3824, June 2004. 4189 [I-D.ietf-avt-dtls-srtp] 4190 McGrew, D. and E. Rescorla, "Datagram Transport Layer 4191 Security (DTLS) Extension to Establish Keys for Secure 4192 Real-time Transport Protocol (SRTP)", 4193 draft-ietf-avt-dtls-srtp-06 (work in progress), 4194 October 2008. 
4196 Authors' Addresses 4198 Philip Zimmermann 4199 Zfone Project 4201 Email: prz@mit.edu 4203 Alan Johnston (editor) 4204 Avaya 4205 St. Louis, MO 63124 4207 Email: alan@sipstation.com 4209 Jon Callas 4210 PGP Corporation 4212 Email: jon@pgp.com