Network Working Group                                      P. Zimmermann
Internet-Draft                                             Silent Circle
Intended status: Informational                          A. Johnston, Ed.
Expires: January 9, 2017                                    Unaffiliated
                                                                T. Cross
                                                              OfficeTone
                                                            July 8, 2016

         ZRTP: Media Path Key Agreement for Unicast Secure RTP
                     draft-zimmermann-rfc6189bis-00

Abstract

   This document defines ZRTP, a protocol for media path Diffie-Hellman
   exchange to agree on a session key and parameters for establishing
   unicast Secure Real-time Transport Protocol (SRTP) sessions for Voice
   over IP (VoIP) applications. The ZRTP protocol is media path keying
   because it is multiplexed on the same port as RTP and does not
   require support in the signaling protocol. ZRTP does not assume a
   Public Key Infrastructure (PKI) or require the complexity of
   certificates in end devices. For the media session, ZRTP provides
   confidentiality, protection against man-in-the-middle (MiTM) attacks,
   and, in cases where the signaling protocol provides end-to-end
   integrity protection, authentication. ZRTP can utilize a Session
   Description Protocol (SDP) attribute to provide discovery and
   authentication through the signaling channel. To provide best effort
   SRTP, ZRTP utilizes normal RTP/AVP (Audio-Visual Profile) profiles.
   ZRTP secures media sessions that include a voice media stream and can
   also secure media sessions that do not include voice by using an
   optional digital signature.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-Drafts is at
   http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 9, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Overview
     3.1.  Key Agreement Modes
       3.1.1.  Diffie-Hellman Mode Overview
       3.1.2.  Preshared Mode Overview
       3.1.3.  Multistream Mode Overview
   4.  Protocol Description
     4.1.  Discovery
       4.1.1.  Protocol Version Negotiation
       4.1.2.  Algorithm Negotiation
     4.2.  Commit Contention
     4.3.  Matching Shared Secret Determination
       4.3.1.  Calculation and Comparison of Hashes of Shared
               Secrets
       4.3.2.  Handling a Shared Secret Cache Mismatch
     4.4.  DH and Non-DH Key Agreements
       4.4.1.  Diffie-Hellman Mode
         4.4.1.1.  Hash Commitment in Diffie-Hellman Mode
         4.4.1.2.  Responder Behavior in Diffie-Hellman Mode
         4.4.1.3.  Initiator Behavior in Diffie-Hellman Mode
         4.4.1.4.  Shared Secret Calculation for DH Mode
       4.4.2.  Preshared Mode
         4.4.2.1.  Commitment in Preshared Mode
         4.4.2.2.  Initiator Behavior in Preshared Mode
         4.4.2.3.  Responder Behavior in Preshared Mode
         4.4.2.4.  Shared Secret Calculation for Preshared Mode
       4.4.3.  Multistream Mode
         4.4.3.1.  Commitment in Multistream Mode
         4.4.3.2.  Shared Secret Calculation for Multistream Mode
     4.5.  Key Derivations
       4.5.1.  The ZRTP Key Derivation Function
       4.5.2.  Deriving ZRTPSess Key and SAS in DH or Preshared
               Modes
       4.5.3.  Deriving the Rest of the Keys from s0
     4.6.  Confirmation
       4.6.1.  Updating the Cache of Shared Secrets
         4.6.1.1.  Cache Update Following a Cache Mismatch
         4.6.1.2.  Cache Update for a PBX Following a Cache Mismatch
     4.7.  Termination
       4.7.1.  Termination via Error Message
       4.7.2.  Termination via GoClear Message
         4.7.2.1.  Key Destruction for GoClear Message
       4.7.3.  Key Destruction at Termination
     4.8.  Random Number Generation
     4.9.  ZID and Cache Operation
       4.9.1.  Cacheless Implementations
   5.  ZRTP Messages
     5.1.  ZRTP Message Formats
       5.1.1.  Message Type Block
       5.1.2.  Hash Type Block
         5.1.2.1.  Negotiated Hash and MAC Algorithm
         5.1.2.2.  Implicit Hash and MAC Algorithm
       5.1.3.  Cipher Type Block
       5.1.4.  Auth Tag Type Block
       5.1.5.  Key Agreement Type Block
       5.1.6.  SAS Type Block
       5.1.7.  Signature Type Block
     5.2.  Hello Message
     5.3.  HelloACK Message
     5.4.  Commit Message
     5.5.  DHPart1 Message
     5.6.  DHPart2 Message
     5.7.  Confirm1 and Confirm2 Messages
     5.8.  Conf2ACK Message
     5.9.  Error Message
     5.10. ErrorACK Message
     5.11. GoClear Message
     5.12. ClearACK Message
     5.13. SASrelay Message
     5.14. RelayACK Message
     5.15. Ping Message
       5.15.1.  Rationale for Ping messages
     5.16. PingACK Message
   6.  Retransmissions
   7.  Short Authentication String
     7.1.  SAS Verified Flag
     7.2.  Signing the SAS
       7.2.1.  OpenPGP Signatures
       7.2.2.  ECDSA Signatures with X.509v3 Certs
       7.2.3.  Signing the SAS without a PKI
     7.3.  Relaying the SAS through a PBX
       7.3.1.  PBX Enrollment and the PBX Enrollment Flag
     7.4.  Automated Methods of Authenticating the DH Exchange
   8.  Signaling Interactions
     8.1.  Binding the Media Stream to the Signaling Layer via the
           Hello Hash
       8.1.1.  Integrity-Protected Signaling Enables
               Integrity-Protected DH Exchange
     8.2.  Combining ZRTP With SDP Security Descriptions (SDES)
       8.2.1.  Deriving auxsecret from SDP Security Descriptions Key
               Material
     8.3.  Codec Selection for Secure Media
   9.  False ZRTP Packet Rejection
   10. Intermediary ZRTP Devices
     10.1.  On Reducing PBX MiTM Behavior
   11. The ZRTP Disclosure Flag
     11.1.  Guidelines on Proper Implementation of the Disclosure
            Flag
   12. Mapping between ZID and AOR (SIP URI)
   13. IANA Considerations
   14. Media Security Requirements
   15. Changes From RFC 6189
   16. Security Considerations
     16.1.  Self-Healing Key Continuity Feature
   17. Acknowledgments
   18. References
     18.1.  Normative References
     18.2.  Informative References
   Authors' Addresses

1.  Introduction

   ZRTP is a key agreement protocol that performs a Diffie-Hellman key
   exchange during call setup in the media path and is transported over
   the same port as the Real-time Transport Protocol (RTP) [RFC3550]
   media stream which has been established using a signaling protocol
   such as Session Initiation Protocol (SIP) [RFC3261]. This generates
   a shared secret, which is then used to generate keys and salt for a
   Secure RTP (SRTP) [RFC3711] session. ZRTP borrows ideas from
   [PGPfone]. A reference implementation of ZRTP is available in
   [Zfone]. This document updates and revises RFC 6189 [RFC6189].

   The ZRTP protocol has some nice cryptographic features lacking in
   many other approaches to media session encryption. Although it uses
   a public key algorithm, it does not rely on a public key
   infrastructure (PKI). In fact, it does not use persistent public
   keys at all. It uses ephemeral Diffie-Hellman (DH) with hash
   commitment and allows the detection of man-in-the-middle (MiTM)
   attacks by displaying a short authentication string (SAS) for the
   users to read and verbally compare over the phone.
   It has Perfect Forward Secrecy, meaning the keys are destroyed at
   the end of the call, which precludes retroactively compromising the
   call by future disclosures of key material. But even if the users
   are too lazy to bother with short authentication strings, we still
   get reasonable authentication against a MiTM attack, based on a form
   of key continuity. It does this by caching some key material to use
   in the next call, to be mixed in with the next call's DH shared
   secret, giving it key continuity properties analogous to Secure
   SHell (SSH). All this is done without reliance on a PKI, key
   certification, trust models, certificate authorities, or key
   management complexity that bedevils the email encryption world. It
   also does not rely on SIP signaling for the key management, and in
   fact, it does not rely on any servers at all. It performs its key
   agreements and key management in a purely peer-to-peer manner over
   the RTP packet stream.

   ZRTP can be used and discovered without being declared or indicated
   in the signaling path. This provides a best effort SRTP capability.
   Also, this reduces the complexity of implementations and minimizes
   interdependency between the signaling and media layers. However,
   when ZRTP is indicated in the signaling via the zrtp-hash SDP
   attribute, ZRTP has additional useful properties. By sending a hash
   of the ZRTP Hello message in the signaling, ZRTP provides a useful
   binding between the signaling and media paths, which is explained in
   Section 8.1. When this is done through a signaling path that has
   end-to-end integrity protection, the DH exchange is automatically
   protected from a MiTM attack, which is explained in Section 8.1.1.

   ZRTP is designed for unicast media sessions in which there is a voice
   media stream.
   For multiparty secure conferencing, separate ZRTP sessions may be
   negotiated between each party and the conference bridge. For
   sessions lacking a voice media stream, MiTM protection may be
   provided by the mechanisms in Sections 8.1.1 or 7.2. In terms of the
   RTP topologies defined in [RFC5117], ZRTP is designed for
   Point-to-Point topologies only.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   [RFC2119].

   In this document, a "call" is synonymous with a "session".

3.  Overview

   This section provides a description of how ZRTP works. This
   description is non-normative in nature but is included to build
   understanding of the protocol.

   ZRTP is negotiated the same way a conventional RTP session is
   negotiated in an offer/answer exchange using the standard RTP/AVP
   profile. The ZRTP protocol begins after two endpoints have utilized
   a signaling protocol, such as SIP, and are ready to exchange media.
   If Interactive Connectivity Establishment (ICE) [RFC5245] is being
   used, ZRTP begins after ICE has completed its connectivity checks.

   ZRTP is multiplexed on the same ports as RTP. It uses a unique
   header that makes it clearly differentiable from RTP or Session
   Traversal Utilities for NAT (STUN).

   ZRTP support can be discovered in the signaling path by the presence
   of a ZRTP SDP attribute. However, even in cases where this is not
   received in the signaling, an endpoint can still send ZRTP Hello
   messages to see if a response is received. If a response is not
   received, no more ZRTP messages will be sent during this session.
   This is safe because ZRTP has been designed to be clearly different
   from RTP and have a similar structure to STUN packets received
   (sometimes by non-supporting endpoints) during an ICE exchange.

   Both ZRTP endpoints begin the ZRTP exchange by sending a ZRTP Hello
   message to the other endpoint. The purpose of the Hello message is
   to confirm that the endpoint supports the protocol and to see what
   algorithms the two ZRTP endpoints have in common.

   The Hello message contains the SRTP configuration options and the
   ZID. Each instance of ZRTP has a unique 96-bit random ZRTP ID or ZID
   that is generated once at installation time. ZIDs are discovered
   during the Hello message exchange. The received ZID is used to look
   up retained shared secrets from previous ZRTP sessions with the
   endpoint.

   A response to a ZRTP Hello message is a ZRTP HelloACK message. The
   HelloACK message simply acknowledges receipt of the Hello. Since RTP
   commonly uses best effort UDP transport, ZRTP has retransmission
   timers in case of lost datagrams. There are two timers, both with
   exponential backoff mechanisms. One timer is used for
   retransmissions of Hello messages and the other is used for
   retransmissions of all other messages after receipt of a HelloACK.

   If an integrity-protected signaling channel is available, a hash of
   the Hello message can be sent. This allows rejection of false ZRTP
   Hello messages injected by an attacker.

   Hello and other ZRTP messages also contain a hash image that is used
   to link the messages together. This allows rejection of false ZRTP
   messages injected during an exchange.

3.1.  Key Agreement Modes

   After both endpoints exchange Hello and HelloACK messages, the key
   agreement exchange can begin with the ZRTP Commit message.
   ZRTP supports a number of key agreement modes including both
   Diffie-Hellman and non-Diffie-Hellman modes as described in the
   following sections.

   The Commit message may be sent immediately after both endpoints have
   completed the Hello/HelloACK discovery handshake, or it may be
   deferred until later in the call, after the participants engage in
   some unencrypted conversation. The Commit message may be manually
   activated by a user interface element, such as a GO SECURE button,
   which becomes enabled after the Hello/HelloACK discovery phase. This
   emulates the user experience of a number of secure phones in the
   Public Switched Telephone Network (PSTN) world [comsec]. However, it
   is expected that most simple ZRTP user agents will omit such buttons
   and proceed directly to secure mode by sending a Commit message
   immediately after the Hello/HelloACK handshake.

3.1.1.  Diffie-Hellman Mode Overview

   An example ZRTP call flow is shown in Figure 1. Note that the order
   of the Hello/HelloACK exchanges in F1/F2 and F3/F4 may be reversed.
   That is, either Alice or Bob might send the first Hello message.
   Note that the endpoint that sends the Commit message is considered
   the initiator of the ZRTP session and drives the key agreement
   exchange. The Diffie-Hellman public values are exchanged in the
   DHPart1 and DHPart2 messages. SRTP keys and salts are then
   calculated.

   The initiator needs to generate its ephemeral key pair before sending
   the Commit, and the responder generates its key pair before sending
   DHPart1.

   Alice                                                Bob
    |                                                   |
    |      Alice and Bob establish a media session.     |
    |         They initiate ZRTP on media ports         |
    |                                                   |
    | F1 Hello (version, options, Alice's ZID)          |
    |-------------------------------------------------->|
    |                                       HelloACK F2 |
    |<--------------------------------------------------|
    |            Hello (version, options, Bob's ZID) F3 |
    |<--------------------------------------------------|
    | F4 HelloACK                                       |
    |-------------------------------------------------->|
    |                                                   |
    |             Bob acts as the initiator.            |
    |                                                   |
    |        Commit (Bob's ZID, options, hash value) F5 |
    |<--------------------------------------------------|
    | F6 DHPart1 (pvr, shared secret hashes)            |
    |-------------------------------------------------->|
    |            DHPart2 (pvi, shared secret hashes) F7 |
    |<--------------------------------------------------|
    |                                                   |
    |      Alice and Bob generate SRTP session key.     |
    |                                                   |
    | F8 Confirm1 (MAC, D,A,V,E flags, sig)             |
    |-------------------------------------------------->|
    |             Confirm2 (MAC, D,A,V,E flags, sig) F9 |
    |<--------------------------------------------------|
    | F10 Conf2ACK                                      |
    |-------------------------------------------------->|
    |                    SRTP begins                    |
    |<=================================================>|
    |                                                   |

          Figure 1: Establishment of an SRTP Session Using ZRTP

   ZRTP authentication uses a Short Authentication String (SAS), which
   is ideally displayed for the human user. Alternatively, the SAS can
   be authenticated by exchanging an optional digital signature (sig)
   over the SAS in the Confirm1 or Confirm2 messages (described in
   Section 7.2).

   The ZRTP Confirm1 and Confirm2 messages are sent for a number of
   reasons, not the least of which is that they confirm that all the key
   agreement calculations were successful and thus the encryption will
   work. They also carry other information such as the Disclosure flag
   (D), the Allow Clear flag (A), the SAS Verified flag (V), and the
   Private Branch Exchange (PBX) Enrollment flag (E).
   All flags are encrypted to shield them from a passive observer.

3.1.2.  Preshared Mode Overview

   In the Preshared mode, endpoints can skip the DH calculation if they
   have a shared secret from a previous ZRTP session. Preshared mode is
   indicated in the Commit message and results in the same call flow as
   Multistream mode. The principal difference between Multistream mode
   and Preshared mode is that Preshared mode uses a previously cached
   shared secret, rs1, instead of an active ZRTP Session key as the
   initial keying material.

   This mode could be useful for slow processor endpoints so that a DH
   calculation does not need to be performed every session. Or, this
   mode could be used to rapidly re-establish an earlier session that
   was recently torn down or interrupted without the need to perform
   another DH calculation.

   Preshared mode has forward secrecy properties. If a phone's cache is
   captured by an opponent, the cached shared secrets cannot be used to
   recover earlier encrypted calls, because the shared secrets are
   replaced with new ones in each new call, as in DH mode. However, the
   captured secrets can be used by a passive wiretapper in the media
   path to decrypt the next call, if the next call is in Preshared mode.
   This differs from DH mode, which requires an active MiTM wiretapper
   to exploit captured secrets in the next call. However, if the next
   call is missed by the wiretapper, he cannot wiretap any further
   calls. Thus, it preserves most of the self-healing properties
   (Section 16.1) of key continuity enjoyed by DH mode.

3.1.3.  Multistream Mode Overview

   Multistream mode is an alternative key agreement method used when two
   endpoints have an established SRTP media stream between them with an
   active ZRTP Session key. ZRTP can derive multiple SRTP keys from a
   single DH exchange.
   For example, an established secure voice call that adds a video
   stream uses Multistream mode to quickly initiate the video stream
   without a second DH exchange.

   When Multistream mode is indicated in the Commit message, a call flow
   similar to Figure 1 is used, but no DH calculation is performed by
   either endpoint and the DHPart1 and DHPart2 messages are omitted.
   The Confirm1, Confirm2, and Conf2ACK messages are still sent. Since
   the cache is not affected during this mode, multiple Multistream ZRTP
   exchanges can be performed in parallel between two endpoints.

   When adding additional media streams to an existing call, only
   Multistream mode is used. Only one DH operation is performed, just
   for the first media stream. Consequently, all the media streams in
   the session share the same SAS (Section 7).

4.  Protocol Description

   This section begins the normative description of the protocol.

   ZRTP MUST be multiplexed on the same ports as the RTP media packets.

   To support best effort encryption from the Media Security
   Requirements [RFC5479], ZRTP uses normal RTP/AVP profile (AVP) media
   lines in the initial offer/answer exchange. The ZRTP SDP attribute
   a=zrtp-hash defined in Section 8 SHOULD be used in all offers and
   answers to indicate support for the ZRTP protocol.

      ZRTP can be utilized by endpoints that do not have a common
      signaling protocol but both support SRTP and are relying on a
      gateway for conversion. As such, it is not always possible for
      the signaling protocol to relay the zrtp-hash as can be done using
      SIP.

   The Secure RTP/AVP (SAVP) profile MAY be used in subsequent offer/
   answer exchanges after a successful ZRTP exchange has resulted in an
   SRTP session, or if it is known that the other endpoint supports this
   profile. Other profiles MAY also be used.

      The use of the RTP/SAVP profile has caused failures in negotiating
      best effort SRTP due to the limitations on negotiating profiles
      using SDP. This is why ZRTP supports the RTP/AVP profile and
      includes its own discovery mechanisms.

   In all key agreement modes, the initiator SHOULD NOT send RTP media
   after sending the Commit message, and it MUST NOT send SRTP media
   before receiving either the Conf2ACK or the first SRTP media (with a
   valid SRTP auth tag) from the responder. The responder SHOULD NOT
   send RTP media after receiving the Commit message, and MUST NOT send
   SRTP media before receiving the Confirm2 message.

4.1.  Discovery

   During the ZRTP discovery phase, a ZRTP endpoint discovers if the
   other endpoint supports ZRTP and the supported algorithms and
   options. This information is transported in a Hello message, which
   is described in Section 5.2.

   ZRTP endpoints SHOULD include the SDP attribute a=zrtp-hash in offers
   and answers, as defined in Section 8.

   The Hello message includes the ZRTP version, Hash Type, Cipher Type,
   SRTP authentication tag type, Key Agreement Type, and Short
   Authentication String (SAS) algorithms that are supported. The Hello
   message also includes a hash image as described in Section 9. In
   addition, each endpoint sends and discovers ZIDs. The received ZID
   is used later in the protocol as an index into a cache of shared
   secrets that were previously negotiated and retained between the two
   parties.

   A Hello message can be sent at any time, but it is usually sent at
   the start of an RTP session to determine if the other endpoint
   supports ZRTP and also if the SRTP implementations are compatible. A
   Hello message is retransmitted using timer T1 and an exponential
   backoff mechanism detailed in Section 6 until the receipt of a
   HelloACK message or a Commit message.

   The use of the a=zrtp-hash SDP attribute to authenticate the Hello
   message is described in Section 8.1.

   If a Hello message, or any other ZRTP message, indicates that there
   is a synchronization source (SSRC) collision, an Error message
   (Section 5.9) MUST be sent with the Error Code indicating SSRC
   collision, and the ZRTP negotiation MUST be terminated. The
   procedures of RFC 3550, Section 8.2 [RFC3550], SHOULD be followed by
   both endpoints to resolve this condition, and if it is resolved, a
   new ZRTP secure session SHOULD be negotiated.

4.1.1.  Protocol Version Negotiation

   This specification defines ZRTP version 1.10. Since new versions of
   ZRTP may be developed in the future, this specification defines a
   protocol version negotiation in this section.

   Each party declares what version of the ZRTP protocol they support
   via the version field in the Hello message (Section 5.2). If both
   parties have the same version number in their Hello messages, they
   can proceed with the rest of the protocol. To facilitate both
   parties reaching this state of protocol version agreement in their
   Hello messages, ZRTP should use information provided in the signaling
   layer, if available. If a ZRTP endpoint supports more than one
   version of the protocol, it SHOULD declare them all in a list of SIP
   SDP a=zrtp-hash attributes (defined in Section 8), listing separate
   hashes, with separate ZRTP version numbers in each item in the list.

   Both parties should inspect the list of ZRTP version numbers supplied
   by the other party in the SIP SDP a=zrtp-hash attributes. Both
   parties SHOULD choose the highest version number that appears in both
   parties' list of a=zrtp-hash version numbers, and use that version
   for their Hello messages.
   If both parties use the SIP signaling in this manner, their initial
   Hello messages will have the same ZRTP version number, provided they
   both have at least one supported protocol version in common. Before
   the ZRTP key agreement can proceed, an endpoint MUST have sent and
   received Hellos with the same protocol version.

   It is best if the signaling layer is used to negotiate the protocol
   version number. However, the a=zrtp-hash SDP attribute is not always
   present in the SIP packet, as explained in Section 8.1. In the
   absence of any guidance from the signaling layer, an endpoint MUST
   send the highest supported version in initial Hello messages. If the
   two parties send different protocol version numbers in their Hello
   messages, they can reach an agreement to use a common version, if one
   exists. They iteratively apply the following rules until they both
   have matching version fields in their Hello messages and the key
   agreement can proceed:

   o  If an endpoint receives a Hello message with an unsupported
      version number that is higher than the endpoint's current Hello
      message version, the received Hello message MUST be ignored. The
      endpoint continues to retransmit Hello messages on the standard
      retry schedule (Section 6).

   o  If an endpoint receives a Hello message with a version number that
      is lower than the endpoint's current Hello message, and the
      endpoint supports a version that is less than or equal to the
      received version number, the endpoint MUST stop retransmitting the
      old version number and MUST start sending a Hello message with the
      highest supported version number that is less than or equal to the
      received version number.
   o  If an endpoint receives a Hello message with an unsupported
      version number that is lower than the endpoint's current Hello
      message, the endpoint MUST send an Error message (Section 5.9)
      indicating failure to support this ZRTP version.

   The above comparisons are iterated until the version numbers match,
   or until the negotiation exits on a failure to match.

      For example, assume that Alice supports protocol versions 1.10 and
      2.00, and Bob supports versions 1.10 and 1.20. Alice initially
      sends a Hello with version 2.00, and Bob initially sends a Hello
      with version 1.20. Bob ignores Alice's 2.00 Hello and continues
      to send his 1.20 Hellos. Alice detects that Bob does not support
      2.00 and she stops sending her 2.00 Hellos and starts sending a
      stream of 1.10 Hellos. Bob sees the 1.10 Hello from Alice and
      stops sending his 1.20 Hellos and switches to sending 1.10 Hellos.
      At that point, they have converged on using version 1.10 and the
      protocol proceeds on that basis.

   When comparing protocol versions, a ZRTP endpoint MUST include only
   the first three octets of the version field in the comparison. The
   final octet is ignored, because it is not significant for
   interoperability. For example, "1.1 ", "1.10", "1.11", or "1.1a" are
   all regarded as a version match, because they would all be
   interoperable versions.

   Changes in protocol version numbers are expected to be infrequent
   after version 1.10. Supporting multiple versions adds code
   complexity and may introduce security weaknesses in the
   implementation. The old adage about keeping it simple applies
   especially to implementing security protocols. Endpoints SHOULD NOT
   support protocol versions earlier than version 1.10.

4.1.2. Algorithm Negotiation

   A method is provided to allow the two parties to mutually and
   deterministically choose the same DH key size and algorithm before a
   Commit message is sent.

   Each Hello message lists the algorithms in the order of preference
   for that ZRTP endpoint. Endpoints eliminate the non-intersecting
   choices from each of their own lists, resulting in each endpoint
   having a list of algorithms in common that might or might not be
   ordered the same as the other endpoint's list. Each endpoint
   compares the first item on their own list with the first item on the
   other endpoint's list and SHOULD choose the faster of the two
   algorithms. For example:

   o  Alice's full list: DH2k, DH3k, EC25

   o  Bob's full list: EC38, EC25, DH3k

   o  Alice's intersecting list: DH3k, EC25

   o  Bob's intersecting list: EC25, DH3k

   o  Alice's first choice is DH3k, and Bob's first choice is EC25.

   o  Thus, both parties choose EC25 (ECDH-256) because it's faster.

   To decide which DH algorithm is faster, the following ranking, from
   fastest to slowest, is defined: DH-2048, ECDH-256, DH-3072, ECDH-384,
   ECDH-521. These are all defined in Section 5.1.5.

   If both endpoints follow this method, they may each start their DH
   calculations as soon as they receive the Hello message, and there
   will be no need for either endpoint to discard their DH calculation
   if the other endpoint becomes the initiator.

   This method is used only to negotiate DH key size. For the rest of
   the algorithm choices, it's simply whatever the initiator selects
   from the algorithms in common. Note that the DH key size influences
   the Hash Type and the size of the symmetric cipher key, as explained
   in Section 5.1.5.

   Unfavorable choices will never be made by this method, because each
   endpoint will omit from their respective lists choices that are too
   slow or not secure enough to meet their security policy.

4.2. Commit Contention

   After both parties have received compatible Hello messages, a Commit
   message (Section 5.4) can be sent to begin the ZRTP key exchange.
   The endpoint that sends the Commit is known as the initiator, while
   the receiver of the Commit is known as the responder.

   If both sides send Commit messages initiating a secure session at the
   same time, the following rules are used to break the tie:

   o  If one Commit is for a DH mode while the other is for Preshared
      mode, then the Preshared Commit MUST be discarded and the DH
      Commit proceeds.

   o  If the two Commits are both Preshared mode, and one party has set
      the MiTM (M) flag in the Hello message and the other has not, the
      Commit message from the party who set the (M) flag MUST be
      discarded, and the one who has not set the (M) flag becomes the
      initiator, regardless of the nonce values. In other words, for
      Preshared mode, the phone is the initiator and the PBX is the
      responder.

   o  If the two Commits are either both DH modes or both non-DH modes,
      then the Commit message with the lowest hvi (hash value of
      initiator) value (for DH Commits), or lowest nonce value (for
      non-DH Commits), MUST be discarded and the other side is the
      initiator, and the protocol proceeds with the initiator's Commit.
      The two hvi or nonce values are compared as large unsigned
      integers in network byte order.

   If one Commit is for Multistream mode while the other is for
   non-Multistream (DH or Preshared) mode, a software error has occurred
   and the ZRTP negotiation should be terminated. This should never
   occur because of the constraints on Multistream mode described in
   Section 4.4.3.

   In the event that Commit messages are sent by both ZRTP endpoints at
   the same time, but are received in different media streams, the same
   resolution rules apply as if they were received on the same stream.
   The media stream in which the Commit was received or sent will
   proceed through the ZRTP exchange while the media stream with the
   discarded Commit must wait for the completion of the other ZRTP
   exchange.

   If a commit contention forces a DH Commit message to be discarded,
   the responder's DH public value should only be discarded if it does
   not match the initiator's DH key size. This will not happen if both
   endpoints choose a common key size via the method described in
   Section 4.1.2.

4.3. Matching Shared Secret Determination

   The following sections describe how ZRTP endpoints generate and/or
   use the set of shared secrets s1, auxsecret, and pbxsecret through
   the exchange of the DHPart1 and DHPart2 messages. This doesn't cover
   the Diffie-Hellman calculations. It only covers the method whereby
   the two parties determine if they already have shared secrets in
   common in their caches.

   Each ZRTP endpoint maintains a long-term cache of shared secrets that
   it has previously negotiated with the other party. The ZID of the
   other party, received in the other party's Hello message, is used as
   an index into this cache to find the set of shared secrets, if any
   exist. This cache entry may contain previously retained shared
   secrets, rs1 and rs2, which give ZRTP its key continuity features.
   If the other party is a PBX, the cache may also contain a trusted
   MiTM PBX shared secret, called pbxsecret, defined in Section 7.3.1.

   The DHPart1 and DHPart2 messages contain a list of hashes of these
   shared secrets to allow the two endpoints to compare the hashes with
   what they have in their caches to detect whether the two sides share
   any secrets that can be used in the calculation of the session key.
   The use of this shared secret cache is described in Section 4.9.

   If no secret of a given type is available, a random value is
   generated and used for that secret to ensure a mismatch in the hash
   comparisons in the DHPart1 and DHPart2 messages. This prevents an
   eavesdropper from knowing which types of shared secrets are available
   between the endpoints.

   Section 4.3.1 refers to the auxiliary shared secret auxsecret. The
   auxsecret shared secret may be defined by the VoIP user agent
   out-of-band from the ZRTP protocol. It may be manually provisioned in
   application-specific ways, such as computed from a hashed pass phrase
   by prior agreement between the two parties or supplied by a hardware
   token. Or, it may be a family key used by an institution to which
   the two parties both belong. It is a generalized mechanism for
   providing a shared secret that is agreed to between the two parties
   out of scope of the ZRTP protocol. It is expected that most typical
   ZRTP endpoints will rarely use auxsecret. However, in some use
   cases, auxsecret can be used to authenticate another protocol, such
   as a WebRTC DTLS-SRTP exchange [I-D.johnston-rtcweb-zrtp].

   For both the initiator and the responder, the shared secrets s1, s2,
   and s3 will be calculated so that they can all be used later to
   calculate s0 in Section 4.4.1.4. Here is how s1, s2, and s3 are
   calculated by both parties.

   The shared secret s1 will be either the initiator's rs1 or the
   initiator's rs2, depending on which of them can be found in the
   responder's cache. If the initiator's rs1 matches the responder's
   rs1 or rs2, then s1 MUST be set to the initiator's rs1. If and only
   if that match fails, then if the initiator's rs2 matches the
   responder's rs1 or rs2, then s1 MUST be set to the initiator's rs2.
   If that match also fails, then s1 MUST be set to null.
   The complexity of the s1 calculation is to recover from any loss of
   cache sync from an earlier aborted session, due to the Two Generals'
   Problem [Byzantine].

   The shared secret s2 MUST be set to the value of auxsecret if and
   only if both parties have matching values for auxsecret, as
   determined by comparing the hashes of auxsecret sent in the DH
   messages. If they don't match, s2 MUST be set to null.

   The shared secret s3 MUST be set to the value of pbxsecret if and
   only if both parties have matching values for pbxsecret, as
   determined by comparing the hashes of pbxsecret sent in the DH
   messages. If they don't match, s3 MUST be set to null.

   If s1, s2, or s3 have null values, they are assumed to have a zero
   length for the purposes of hashing them later during the s0
   calculation in Section 4.4.1.4.

   The comparison of hashes of rs1, rs2, auxsecret, and pbxsecret is
   described in Section 4.3.1.

4.3.1. Calculation and Comparison of Hashes of Shared Secrets

   Both parties calculate a set of non-invertible hashes (implemented
   via the MAC defined in Section 5.1.2.1) of shared secrets that may be
   present in each of their caches. These hashes are truncated to the
   leftmost 64 bits:

      rs1IDr = MAC(rs1, "Responder")

      rs2IDr = MAC(rs2, "Responder")

      auxsecretIDr = MAC(auxsecret, Responder's H3)

      pbxsecretIDr = MAC(pbxsecret, "Responder")

      rs1IDi = MAC(rs1, "Initiator")

      rs2IDi = MAC(rs2, "Initiator")

      auxsecretIDi = MAC(auxsecret, Initiator's H3)

      pbxsecretIDi = MAC(pbxsecret, "Initiator")

   The responder sends rs1IDr, rs2IDr, auxsecretIDr, and pbxsecretIDr in
   the DHPart1 message. The initiator sends rs1IDi, rs2IDi,
   auxsecretIDi, and pbxsecretIDi in the DHPart2 message.

   The responder uses the locally computed rs1IDi, rs2IDi, auxsecretIDi,
   and pbxsecretIDi to compare against the corresponding fields in the
   received DHPart2 message.
   The initiator uses the locally computed
   rs1IDr, rs2IDr, auxsecretIDr, and pbxsecretIDr to compare against the
   corresponding fields in the received DHPart1 message.

   From these comparisons, s1, s2, and s3 are calculated per the methods
   described in Section 4.3. The secrets corresponding to matching
   hashes are kept while the secrets corresponding to the non-matching
   ones are replaced with a null, which is assumed to have a zero length
   for the purposes of hashing them later. The resulting s1, s2, and s3
   values are used later to calculate s0 in Section 4.4.1.4.

   For example, consider two ZRTP endpoints that share secrets rs1 and
   pbxsecret (defined in Section 7.3.1). During the comparison, rs1ID
   and pbxsecretID will match but auxsecretID will not. As a result, s1
   = rs1, s2 will be null, and s3 = pbxsecret.

4.3.2. Handling a Shared Secret Cache Mismatch

   A shared secret cache mismatch is defined to mean that we expected a
   cache match because rs1 exists in our local cache, but we computed a
   null value for s1 (per the method described in Section 4.3).

   If one party has a cached shared secret and the other party does not,
   this indicates one of two possible situations. Either there is a
   MiTM attack or one of the legitimate parties has lost their cached
   shared secret by some mishap. Perhaps they inadvertently deleted
   their cache or their cache was lost or disrupted due to restoring
   their disk from an earlier backup copy. The party that has the
   surviving cache entry can easily detect that a cache mismatch has
   occurred, because they expect their own cached secret to match the
   other party's cached secret, but it does not match. It is possible
   for both parties to detect this condition if both parties have
   surviving cached secrets that have fallen out of sync, due perhaps to
   one party restoring from a disk backup.

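   As a non-normative illustration, the s1 selection rule of Section 4.3
   and the mismatch definition above might be sketched from the
   initiator's side as follows. The sketch assumes the MAC is
   HMAC-SHA-256 (the real MAC follows the negotiated hash, per
   Section 5.1.2.1), and the function names are hypothetical, not part
   of this specification.

```python
import hashlib
import hmac


def secret_id(secret: bytes, role: bytes) -> bytes:
    """Leftmost 64 bits of MAC(secret, role); HMAC-SHA-256 is assumed."""
    return hmac.new(secret, role, hashlib.sha256).digest()[:8]


def initiator_select_s1(rs1, rs2, rs1_id_r: bytes, rs2_id_r: bytes) -> bytes:
    """Return s1 as seen by the initiator, or b"" (null) on no match.

    rs1 and rs2 are the initiator's cached secrets (None if absent).
    rs1_id_r and rs2_id_r are the IDs received in the DHPart1 message.
    """
    received = (rs1_id_r, rs2_id_r)
    for own in (rs1, rs2):  # rs1 is tried first, rs2 only if rs1 fails
        if own is None:
            continue
        own_id = secret_id(own, b"Responder")
        if any(hmac.compare_digest(own_id, r) for r in received):
            return own
    return b""


def cache_mismatch(rs1, s1: bytes) -> bool:
    """True when rs1 exists locally but s1 was computed as null."""
    return rs1 is not None and s1 == b""
```

   When cache_mismatch() returns True, the user agent would raise the
   alert described in the remainder of this section.
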
   If either party discovers a cache mismatch, the user agent who makes
   this discovery must treat this as a possible security event and MUST
   alert their own user that there is a heightened risk of a MiTM
   attack, and that the user should verbally compare the SAS with the
   other party to ascertain that no MiTM attack has occurred. If a
   cache mismatch is detected and it is not possible to compare the SAS,
   either because the user interface does not support it or because one
   or both endpoints are unmanned devices, and no other SAS comparison
   mechanism is available, the session MAY be terminated.

   The session need not be terminated on a cache mismatch event if:

   o  the mechanism described in Section 8.1.1 is available, which
      allows authentication of the DH exchange without human assistance,
      or

   o  any mechanism is available to determine if the SAS matches. This
      would require either circumstances that allow human verbal
      comparisons of the SAS or use of the OPTIONAL digital signature
      feature on the SAS hash, as described in Section 7.2.

   Even if the user interface does not permit an SAS comparison, the
   human user MUST be warned and may elect to proceed with the call at
   their own risk.

   If and only if a cache mismatch event occurs, the cache update
   mechanism in Section 4.6.1 is affected, requiring the user to verify
   the SAS before the cache is updated. The user will thus be alerted
   of this security condition on every call until the SAS is verified.
   This is described in Section 4.6.1.1.

   Here is a non-normative example of a cache-mismatch alert message
   from a ZRTP user agent (specifically, [Zfone]), designed for a
   desktop PC graphical user interface environment. It is by no means
   required that the alert be this detailed:

      We expected the other party to have a shared secret cached from a
      previous call, but they don't have it.
      This may mean your partner
      simply lost his cache of shared secrets, but it could also mean
      someone is trying to wiretap you. To resolve this question you
      must check the authentication string with your partner. If it
      doesn't match, it indicates the presence of a wiretapper.

   If the alert is rendered by a robot voice instead of a GUI, brevity
   may be more important:

      Something's wrong. You must check the authentication string with
      your partner. If it doesn't match, it indicates the presence of a
      wiretapper.

   A mismatch of auxsecret is handled differently than a mismatch of
   rs1. An auxsecret mismatch is defined to mean that auxsecret exists
   locally, but we computed a null value for s2 (per the method
   described in Section 4.3). This mismatch should be made visible to
   whichever user has auxsecret defined. The mismatch should be made
   visible to both users if they both have auxsecret defined but they
   fail to match. The severity of the user notification is
   implementation dependent. Aborting the session is not required. If
   auxsecret matches, it should not excuse a mismatch of rs1, which
   still requires a strong warning to the user.

4.4. DH and Non-DH Key Agreements

   The next step is the generation of a secret for deriving SRTP keying
   material. ZRTP uses Diffie-Hellman and two non-Diffie-Hellman modes,
   described in the following subsections.

4.4.1. Diffie-Hellman Mode

   The purpose of the Diffie-Hellman (either Finite Field Diffie-Hellman
   or Elliptic Curve Diffie-Hellman) exchange is for the two ZRTP
   endpoints to generate a new shared secret, s0. In addition, the
   endpoints discover if they have any cached or previously stored
   shared secrets in common, and they use them as part of the
   calculation of the session keys.

   Because the DH exchange affects the state of the retained shared
   secret cache, only one in-process ZRTP DH exchange may occur at a
   time between two ZRTP endpoints. Otherwise, race conditions and
   cache integrity problems will result. When multiple media streams
   are established in parallel between the same pair of ZRTP endpoints
   (determined by the ZIDs in the Hello messages), only one can be
   processed. Once that exchange completes with Confirm2 and Conf2ACK
   messages, another ZRTP DH exchange can begin. This constraint does
   not apply when Multistream mode key agreement is used since the
   cached shared secrets are not affected.

4.4.1.1. Hash Commitment in Diffie-Hellman Mode

   From the intersection of the algorithms in the sent and received
   Hello messages, the initiator chooses a hash, cipher, auth tag, Key
   Agreement Type, and SAS Type to be used.

   A Diffie-Hellman mode is selected by setting the Key Agreement Type
   in the Commit to one of the DH or Elliptic Curve Diffie-Hellman
   (ECDH) values from the table in Section 5.1.5. In this mode, the key
   agreement begins with the initiator choosing a fresh random
   Diffie-Hellman (DH) secret value (svi) based on the chosen Key
   Agreement Type value, and computing the public value. (Note that to
   speed up processing, this computation can be done in advance.) For
   guidance on generating random numbers, see Section 4.8.

   For Finite Field Diffie-Hellman, the value for the DH generator g,
   the DH prime p, and the length of the DH secret value, svi, are
   defined in Section 5.1.5.

      pvi = g^svi mod p

   where g and p are determined by the Key Agreement Type value. The DH
   public value pvi is formatted as a big-endian octet string and
   fixed to the bit-length of the DH prime; leading zeros MUST NOT be
   truncated.

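   A minimal, non-normative sketch of the finite field computation and
   the fixed-width encoding is shown below. The prime, generator, and
   secret length are toy values chosen only so the example runs quickly;
   they are assumptions of this sketch and are not the groups or lengths
   defined in Section 5.1.5. The helper name is hypothetical.

```python
import secrets

# Toy parameters for illustration only: a 127-bit Mersenne prime, g = 2.
# A real endpoint uses the group selected by the Key Agreement Type.
P = 2**127 - 1
G = 2


def dh_public_value(secret_value: int, g: int = G, p: int = P) -> bytes:
    """Compute g^sv mod p, encoded big-endian at the width of p.

    int.to_bytes pads with leading zero octets, so the result always has
    the same length as the prime, as the text requires.
    """
    width = (p.bit_length() + 7) // 8
    return pow(g, secret_value, p).to_bytes(width, "big")


# Hypothetical secret length for the toy group; forced non-zero.
svi = secrets.randbits(120) | 1
pvi = dh_public_value(svi)
```

   The same helper applies to the responder's pvr, since only the secret
   value differs.
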
   For Elliptic Curve DH, pvi is calculated and formatted according to
   the ECDH specification in Section 5.1.5, which refers in detail to
   certain sections of NIST SP 800-56A [NIST-SP800-56A].

   The hash commitment is performed by the initiator of the ZRTP
   exchange. The hash value of the initiator, hvi, includes a hash of
   the entire DHPart2 message as shown in Figure 9 (which includes the
   Diffie-Hellman public value, pvi), and the responder's Hello message
   (where '||' means concatenation). The hvi hash is truncated to 256
   bits:

      hvi = hash(initiator's DHPart2 message ||
                 responder's Hello message)

   Note that the Hello message includes the fields shown in Figure 3.

   The information from the responder's Hello message is included in the
   hash calculation to prevent a bid-down attack by modification of the
   responder's Hello message.

   The initiator sends the hvi in the Commit message.

   The use of hash commitment in the DH exchange constrains the attacker
   to only one guess to generate the correct Short Authentication String
   (SAS) (Section 7) in his attack, which means the SAS can be quite
   short. A 16-bit SAS, for example, provides the attacker only one
   chance out of 65536 of not being detected. Without this hash
   commitment feature, a MiTM attacker would acquire both the pvi and
   pvr public values from the two parties before having to choose his
   own two DH public values for his MiTM attack. He could then use that
   information to quickly perform a bunch of trial DH calculations for
   both sides until he finds two with a matching SAS. To raise the cost
   of this birthday attack, the SAS would have to be much longer. The
   Short Authentication String would have to become a Long
   Authentication String, which would be unacceptable to the user.
   A hash commitment precludes this attack by forcing the MiTM to choose
   his own two DH public values before learning the public values of
   either of the two parties.

4.4.1.2. Responder Behavior in Diffie-Hellman Mode

   Upon receipt of the Commit message, the responder generates its own
   fresh random DH secret value, svr, and computes the public value.
   (Note that to speed up processing, this computation can be done in
   advance, with no need to discard this computation if both endpoints
   chose the same algorithm via Section 4.1.2.) For guidance on random
   number generation, see Section 4.8.

   For Finite Field Diffie-Hellman, the value for the DH generator g,
   the DH prime p, and the length of the DH secret value, svr, are
   defined in Section 5.1.5.

      pvr = g^svr mod p

   The pvr value is formatted as a big-endian octet string, fixed to the
   bit-length of the DH prime; leading zeros MUST NOT be truncated.

   For Elliptic Curve DH, pvr is calculated and formatted according to
   the ECDH specification in Section 5.1.5, which refers in detail to
   certain sections of NIST SP 800-56A.

   Upon receipt of the DHPart2 message, the responder checks that the
   initiator's DH public value is not equal to 1 or p-1. An attacker
   might inject a false DHPart2 message with a value of 1 or p-1 for
   g^svi mod p, which would cause a disastrously weak final DH result to
   be computed. If pvi is 1 or p-1, the user SHOULD be alerted of the
   attack and the protocol exchange MUST be terminated. Otherwise, the
   responder computes its own value for the hash commitment using the DH
   public value (pvi) received in the DHPart2 message and its own Hello
   message and compares the result with the hvi received in the Commit
   message. If they are different, a MiTM attack is taking place and
   the user is alerted and the protocol exchange terminated.

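   The two responder checks just described might be sketched as follows.
   This is a non-normative illustration that assumes SHA-256 as the
   negotiated hash and treats the ZRTP messages as opaque byte strings;
   the function names are hypothetical.

```python
import hashlib
import hmac


def public_value_is_acceptable(pv: bytes, p: int) -> bool:
    """Reject the degenerate DH public values 1 and p-1."""
    value = int.from_bytes(pv, "big")
    return value not in (1, p - 1)


def commitment_matches(hvi: bytes, dhpart2_msg: bytes,
                       own_hello_msg: bytes) -> bool:
    """Recompute hvi = hash(DHPart2 || responder's Hello) and compare.

    The digest is truncated to 256 bits; with SHA-256 (assumed here)
    the truncation leaves the digest unchanged.
    """
    expected = hashlib.sha256(dhpart2_msg + own_hello_msg).digest()[:32]
    return hmac.compare_digest(expected, hvi)
```

   If either function returns False, the responder would alert the user
   and terminate the exchange as specified above.
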
   The responder then calculates the Diffie-Hellman result:

      DHResult = pvi^svr mod p

4.4.1.3. Initiator Behavior in Diffie-Hellman Mode

   Upon receipt of the DHPart1 message, the initiator checks that the
   responder's DH public value is not equal to 1 or p-1. An attacker
   might inject a false DHPart1 message with a value of 1 or p-1 for
   g^svr mod p, which would cause a disastrously weak final DH result to
   be computed. If pvr is 1 or p-1, the user should be alerted of the
   attack and the protocol exchange MUST be terminated.

   The initiator then sends a DHPart2 message containing the initiator's
   DH public value and the set of calculated shared secret IDs as
   defined in Section 4.3.1.

   The initiator calculates the same Diffie-Hellman result using:

      DHResult = pvr^svi mod p

4.4.1.4. Shared Secret Calculation for DH Mode

   A hash of the received and sent ZRTP messages in the current ZRTP
   exchange in the following order is calculated by both parties:

      total_hash = hash(Hello of responder || Commit || DHPart1 ||
                        DHPart2)

   Note that only the ZRTP messages (Figures 3, 5, 8, and 9), not the
   entire ZRTP packets, are included in the total_hash.

   For both the initiator and responder, the DHResult is formatted as a
   big-endian octet string and fixed to the width of the DH prime;
   leading zeros MUST NOT be truncated. For example, for a 3072-bit p,
   DHResult would be a 384 octet value, with the first octet the most
   significant. DHResult may also be the result of an ECDH calculation,
   which is discussed in Section 5.1.5.

         Key           |  Size of
         Agreement     |  DHResult
         ---------------------------
         DH-3072       |  384 octets
         ---------------------------
         DH-2048       |  256 octets
         ---------------------------
         ECDH P-256    |  32 octets
         ---------------------------
         ECDH P-384    |  48 octets
         ---------------------------

   The authors believe the calculation of the final shared secret, s0,
   is in compliance with the recommendations in Sections 5.8.1 and
   6.1.2.1 of NIST SP 800-56A [NIST-SP800-56A]. This is done by hashing
   a concatenation of a number of items, including the DHResult, the
   ZIDs of the initiator (ZIDi) and the responder (ZIDr), the
   total_hash, and the set of non-null shared secrets as described in
   Section 4.3.

   In Section 5.8.1 of [NIST-SP800-56A], NIST requires certain
   parameters to be hashed together in a particular order, which NIST
   refers to as: Z, AlgorithmID, PartyUInfo, PartyVInfo, SuppPubInfo,
   and SuppPrivInfo. In our implementation, our DHResult corresponds to
   Z, "ZRTP-HMAC-KDF" corresponds to AlgorithmID, our ZIDi and ZIDr
   correspond to PartyUInfo and PartyVInfo, our total_hash corresponds
   to SuppPubInfo, and the set of three shared secrets s1, s2, and s3
   corresponds to SuppPrivInfo. NIST also requires a 32-bit big-endian
   integer counter to be included in the hash each time the hash is
   computed, which we have set to the fixed value of 1 because we only
   compute the hash once. NIST refers to the final hash output as
   DerivedKeyingMaterial, which corresponds to our s0 in this
   calculation.

      s0 = hash(counter || DHResult || "ZRTP-HMAC-KDF" || ZIDi ||
                ZIDr || total_hash || len(s1) || s1 || len(s2) ||
                s2 || len(s3) || s3)

   Note that temporary values s1, s2, and s3 were calculated per the
   methods described in Section 4.3. DHResult, s1, s2, and s3 MUST all
   be erased from memory immediately after they are used to calculate
   s0.

   The length of the DHResult field was implicitly agreed to by the
   negotiated DH prime size. The length of total_hash is implicitly
   determined by the negotiated hash algorithm. All of the explicit
   length fields, len(), in the above hash are 32-bit big-endian
   integers, giving the length in octets of the field that follows.
   Some members of the set of shared secrets (s1, s2, and s3) may have
   lengths of zero if they are null (not shared) and are each preceded
   by a 4-octet length field. For example, if s2 is null, len(s2) is
   0x00000000, and s2 itself would be absent from the hash calculation,
   which means len(s3) would immediately follow len(s2). While
   inclusion of ZIDi and ZIDr may be redundant, because they are
   implicitly included in the total_hash, we explicitly include them
   here to follow NIST SP 800-56A. The fixed-length string
   "ZRTP-HMAC-KDF" (not null-terminated) identifies for what purpose the
   resulting s0 will be used, which is to serve as the key derivation
   key for the ZRTP HMAC-based key derivation function (KDF) defined in
   Section 4.5.1 and used in Section 4.5.3.

   The authors believe ZRTP DH mode is in full compliance with two
   relevant NIST documents that cover key derivations. First,
   Section 5.8.1 of [NIST-SP800-56A] computes what NIST refers to as
   DerivedKeyingMaterial, which ZRTP refers to as s0. This s0 then
   serves as the key derivation key, which NIST refers to as KI in the
   key derivation function described in Sections 5 and 5.1 of
   [NIST-SP800-108], to derive all the rest of the subkeys needed by
   ZRTP. For ECDH mode, the authors believe the s0 calculation is also
   in compliance with Section 3.1 of the National Security Agency's
   (NSA's) Suite B Implementer's Guide to NIST SP 800-56A
   [NSA-Suite-B-Guide-56A].

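   The s0 calculation and its length-prefix convention can be
   illustrated with the following non-normative sketch. It assumes
   SHA-256 as the negotiated hash and represents null secrets as empty
   byte strings; the function names are hypothetical.

```python
import hashlib
import struct


def len_prefixed(secret: bytes) -> bytes:
    """32-bit big-endian length in octets, followed by the secret.

    A null secret is passed as b"", which yields only the four zero
    octets of its length field.
    """
    return struct.pack(">I", len(secret)) + secret


def compute_s0(dh_result: bytes, zid_i: bytes, zid_r: bytes,
               total_hash: bytes, s1: bytes = b"", s2: bytes = b"",
               s3: bytes = b"") -> bytes:
    """Hash the s0 inputs in the order given by the formula above."""
    counter = struct.pack(">I", 1)  # fixed at 1: the hash is computed once
    data = (counter + dh_result + b"ZRTP-HMAC-KDF" + zid_i + zid_r
            + total_hash
            + len_prefixed(s1) + len_prefixed(s2) + len_prefixed(s3))
    return hashlib.sha256(data).digest()  # SHA-256 is an assumption here
```

   With s2 null, the four zero octets of len(s2) are followed directly
   by len(s3), matching the example in the text. Python cannot reliably
   erase immutable byte strings, so an implementation that must meet the
   erasure requirement would need mutable buffers or a lower-level
   language.
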
   The ZRTP key derivation function (KDF) (Section 4.5.1) requires the
   use of a KDF Context field (per [NIST-SP800-108] guidelines), which
   should include the ZIDi, ZIDr, and a nonce value known to both
   parties. The total_hash qualifies as a nonce value, because its
   computation included nonce material from the initiator's Commit
   message and the responder's Hello message.

      KDF_Context = (ZIDi || ZIDr || total_hash)

   At this point in DH mode, the two endpoints proceed to the key
   derivations of ZRTPSess and the rest of the keys in Section 4.5.2,
   now that there is a defined s0.

4.4.2. Preshared Mode

   The Preshared key agreement mode can be used to generate SRTP keys
   and salts without a DH calculation, instead relying on a shared
   secret from previous DH calculations between the endpoints.

   This key agreement mode is useful to rapidly re-establish a secure
   session between two parties who have recently started and ended a
   secure session that has already performed a DH key agreement, without
   performing another lengthy DH calculation, which may be desirable on
   slow processors in resource-limited environments. Preshared mode
   MUST NOT be used for adding additional media streams to an existing
   call. Multistream mode MUST be used for this purpose.

   In the most severe resource-limited environments, Preshared mode may
   be useful with processors that cannot perform a DH calculation in an
   ergonomically acceptable time limit. Shared key material may be
   manually provisioned between two such endpoints in advance and still
   allow a limited subset of functionality. Such a "better than
   nothing" implementation would have to be regarded as non-compliant
   with the ZRTP specification, but it could interoperate in Preshared
   (and if applicable, Multistream) mode with a compliant ZRTP endpoint.

   Because Preshared mode affects the state of the retained shared
   secret cache, only one in-process ZRTP Preshared exchange may occur
   at a time between two ZRTP endpoints. This rule is explained in more
   detail in Section 4.4.1, and applies for the same reasons as in DH
   mode.

   Preshared mode is only included in this specification to meet the
   R-REUSE requirement in the Media Security Requirements [RFC5479]
   document. A series of preshared-keyed calls between two ZRTP
   endpoints should use a DH key exchange periodically. Preshared mode
   is only used if a cached shared secret has been established in an
   earlier session by a DH exchange, as discussed in Section 4.9.

4.4.2.1. Commitment in Preshared Mode

   Preshared mode is selected by setting the Key Agreement Type to
   Preshared in the Commit message. This results in the same call flow
   as Multistream mode. The principal difference between Multistream
   mode and Preshared mode is that Preshared mode uses a previously
   cached shared secret, rs1, instead of an active ZRTP Session key,
   ZRTPSess, as the initial keying material.

   Preshared mode depends on having a reliable shared secret in its
   cache. Before Preshared mode is used, the initial DH exchange that
   gave rise to the shared secret SHOULD have used at least one of these
   anti-MiTM mechanisms: 1) A verbal comparison of the SAS, evidenced by
   the SAS Verified flag, or 2) an end-to-end integrity-protected
   delivery of the a=zrtp-hash in the signaling (Section 8.1.1), or 3) a
   digital signature on the sashash (Section 7.2).

4.4.2.2. Initiator Behavior in Preshared Mode

   The Commit message (Figure 7) is sent by the initiator of the ZRTP
   exchange. From the intersection of the algorithms in the sent and
   received Hello messages, the initiator chooses a hash, cipher, auth
   tag, Key Agreement Type, and SAS Type to be used.

1197 To assemble a Preshared commit, we must first construct a temporary 1198 preshared_key, which is constructed from one of several possible 1199 combinations of cached key material, depending on what is available 1200 in the shared secret cache. If rs1 is not available in the 1201 initiator's cache, then Preshared mode MUST NOT be used. 1203 preshared_key = hash(len(rs1) || rs1 || len(auxsecret) || 1204 auxsecret || len(pbxsecret) || pbxsecret) 1206 All of the explicit length fields, len(), in the above hash are 1207 32-bit big-endian integers, giving the length in octets of the field 1208 that follows. Some members of the set of shared secrets (rs1, 1209 auxsecret, and pbxsecret) may have lengths of zero if they are null 1210 (not available), and are each preceded by a 4-octet length field. 1211 For example, if auxsecret is null, len(auxsecret) is 0x00000000, and 1212 auxsecret itself would be absent from the hash calculation, which 1213 means len(pbxsecret) would immediately follow len(auxsecret). 1215 In place of hvi in the Commit message, two smaller fields are 1216 inserted by the initiator: 1218 - A random nonce of length 4 words (16 octets). 1220 - A keyID = MAC(preshared_key, "Prsh") truncated to 64 bits. 1222 Note: Since the nonce is used to calculate different SRTP key and 1223 salt pairs for each session, a duplication will result in the same 1224 key and salt being generated for the two sessions, which would 1225 have disastrous security consequences. 1227 4.4.2.3. Responder Behavior in Preshared Mode 1229 The responder uses the received keyID to search for matching key 1230 material in its cache. It does this by computing a preshared_key 1231 value and keyID value using the same formula as the initiator, 1232 depending on what is available in the responder's local cache. 
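The preshared_key and keyID computations of Section 4.4.2.2, which the responder repeats with its own cached values, can be sketched as follows. This is a minimal illustration rather than a normative implementation. It assumes that SHA-256 is the negotiated hash, that MAC() is HMAC keyed with that hash, and that the keyID truncation keeps the leftmost 64 bits. The function names are hypothetical and do not come from this specification.

```python
import hashlib
import hmac
import struct
from typing import Optional


def _length_prefixed(secret: Optional[bytes]) -> bytes:
    """Return len(secret) as a 32-bit big-endian integer, followed by secret.

    A null (unavailable) secret contributes only the 4-octet zero length.
    """
    secret = secret or b""
    return struct.pack(">I", len(secret)) + secret


def preshared_key(rs1: bytes,
                  auxsecret: Optional[bytes] = None,
                  pbxsecret: Optional[bytes] = None) -> bytes:
    """hash(len(rs1) || rs1 || len(auxsecret) || auxsecret ||
            len(pbxsecret) || pbxsecret), assuming SHA-256."""
    if not rs1:
        # Preshared mode MUST NOT be used when rs1 is not in the cache.
        raise ValueError("rs1 is required for Preshared mode")
    material = (_length_prefixed(rs1)
                + _length_prefixed(auxsecret)
                + _length_prefixed(pbxsecret))
    return hashlib.sha256(material).digest()


def key_id(psk: bytes) -> bytes:
    """MAC(preshared_key, "Prsh"), assumed truncated to the leftmost 64 bits."""
    return hmac.new(psk, b"Prsh", hashlib.sha256).digest()[:8]
```

Because each length field is always present, a null auxsecret and a null pbxsecret produce different hash inputs than the same secret placed in the other position, so the two cannot be confused.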
If 1233 the locally computed keyID does not match the received keyID in the 1234 Commit, the responder recomputes a new preshared_key and keyID from a 1235 different subset of shared keys from the cache, dropping auxsecret, 1236 pbxsecret, or both from the hash calculation, until a matching 1237 preshared_key is found or it runs out of possibilities. Note that 1238 rs2 is not included in the process. 1240 If it finds the appropriate matching shared key material, it is used 1241 to derive s0 and a new ZRTPSess key, as described in the next section 1242 on shared secret calculation, Section 4.4.2.4. 1244 If the responder determines that it does not have a cached shared 1245 secret from a previous DH exchange, or it fails to match the keyID 1246 hash from the initiator with any combination of its shared keys, it 1247 SHOULD respond with its own DH Commit message. This would reverse 1248 the roles and the responder would become the initiator, because the 1249 DH Commit must always "trump" the Preshared Commit message as 1250 described in Section 4.2. The key exchange would then proceed using 1251 DH mode. However, if a severely resource-limited responder lacks the 1252 computing resources to respond in a reasonable time with a DH Commit, 1253 it MAY respond with a ZRTP Error message (Section 5.9) indicating 1254 that no shared secret is available. 1256 If both sides send Preshared Commit messages initiating a secure 1257 session at the same time, the contention is resolved and the 1258 initiator/responder roles are settled according to Section 4.2, and 1259 the protocol proceeds. 1261 In Preshared mode, both the DHPart1 and DHPart2 messages are skipped. 1262 After receiving the Commit message from the initiator, the responder 1263 sends the Confirm1 message after calculating this stream's SRTP keys, 1264 as described below. 1266 4.4.2.4. 
Shared Secret Calculation for Preshared Mode 1268 Preshared mode requires that the s0 and ZRTPSess keys be derived from 1269 the preshared_key, and this must be done in a way that guarantees 1270 uniqueness for each session. This is done by using nonce material 1271 from both parties: the explicit nonce in the initiator's Preshared 1272 Commit message (Figure 7) and the H3 field in the responder's Hello 1273 message (Figure 3). Thus, both parties force the resulting shared 1274 secret to be unique for each session. 1276 A hash of the received and sent ZRTP messages in the current ZRTP 1277 exchange for the current media stream is calculated: 1279 total_hash = hash(Hello of responder || Commit) 1281 Note that only the ZRTP messages (Figures 3 and 7), not the entire 1282 ZRTP packets, are included in the total_hash. 1284 The ZRTP key derivation function (KDF) (Section 4.5.1) requires the 1285 use of a KDF Context field (per [NIST-SP800-108] guidelines), which 1286 should include the ZIDi, ZIDr, and a nonce value known to both 1287 parties. The total_hash qualifies as a nonce value, because its 1288 computation included nonce material from the initiator's Commit 1289 message and the responder's Hello message. 1291 KDF_Context = (ZIDi || ZIDr || total_hash) 1293 The s0 key is derived via the ZRTP key derivation function 1294 (Section 4.5.1) from preshared_key and the nonces implicitly included 1295 in the total_hash. The nonces also ensure KDF_Context is unique for 1296 each session, which is critical for security. 1298 s0 = KDF(preshared_key, "ZRTP PSK", KDF_Context, 1299 negotiated hash length) 1301 The preshared_key MUST be erased as soon as it has been used to 1302 calculate s0. 1304 At this point in Preshared mode, the two endpoints proceed to the key 1305 derivations of ZRTPSess and the rest of the keys in Section 4.5.2, 1306 now that there is a defined s0. 1308 4.4.3. 
Multistream Mode 1310 The Multistream key agreement mode can be used to generate SRTP keys 1311 and salts for additional media streams established between a pair of 1312 endpoints. Multistream mode cannot be used unless there is an active 1313 SRTP session established between the endpoints, which means a ZRTP 1314 Session key is active. This ZRTP Session key can be used to generate 1315 keys and salts without performing another DH calculation. In this 1316 mode, the retained shared secret cache is not used or updated. As a 1317 result, multiple ZRTP Multistream mode exchanges can be processed in 1318 parallel between two endpoints. 1320 Multistream mode is also used to resume a secure call that has gone 1321 clear using a GoClear message as described in Section 4.7.2.1. 1323 When adding additional media streams to an existing call, Multistream 1324 mode MUST be used. The first media stream MUST use either DH mode or 1325 Preshared mode. Only one DH exchange or Preshared exchange is 1326 performed, just for the first media stream. The DH exchange or 1327 Preshared exchange MUST be completed for the first media stream 1328 before Multistream mode is used to add any other media streams. In a 1329 Multistream session, a ZRTP endpoint MUST use the same ZID for all 1330 media streams, matching the ZID used in the first media stream. 1332 4.4.3.1. Commitment in Multistream Mode 1334 Multistream mode is selected by the initiator setting the Key 1335 Agreement Type to "Mult" in the Commit message (Figure 6). The 1336 Cipher Type, Auth Tag Length, and Hash in Multistream mode SHOULD be 1337 set by the initiator to the same values as in the initial DH 1338 Mode Commit. The SAS Type is ignored as there is no SAS 1339 authentication in this mode. 1341 Note: This requirement is needed since some endpoints cannot 1342 support different SRTP algorithms for different media streams.
1343 However, in the case of Multistream mode being used to go secure 1344 after a GoClear, the requirement to use the same SRTP algorithms 1345 is relaxed if there are no other active SRTP sessions. 1347 In place of hvi in the Commit, a random nonce of length 4 words (16 1348 octets) is chosen. Its value MUST be unique for all nonce values 1349 chosen for active ZRTP sessions between a pair of endpoints. If a 1350 Commit is received with a reused nonce value, the ZRTP exchange MUST 1351 be immediately terminated. 1353 Note: Since the nonce is used to calculate different SRTP key and 1354 salt pairs for each media stream, a duplication will result in the 1355 same key and salt being generated for the two media streams, which 1356 would have disastrous security consequences. 1358 If a Commit is received selecting Multistream mode, but the responder 1359 does not have a ZRTP Session Key available, the exchange MUST be 1360 terminated. Otherwise, the responder proceeds to the next section on 1361 shared secret calculation, Section 4.4.3.2. 1363 If both sides send Multistream Commit messages at the same time, the 1364 contention is resolved and the initiator/responder roles are settled 1365 according to Section 4.2, and the protocol proceeds. 1367 In Multistream mode, both the DHPart1 and DHPart2 messages are 1368 skipped. After receiving the Commit message from the initiator, the 1369 responder sends the Confirm1 message after calculating this stream's 1370 SRTP keys, as described below. 1372 4.4.3.2. Shared Secret Calculation for Multistream Mode 1374 In Multistream mode, each media stream requires that a set of keys be 1375 derived from the ZRTPSess key, and this must be done in a way that 1376 guarantees uniqueness for each media stream. This is done by using 1377 nonce material from both parties: the explicit nonce in the 1378 initiator's Multistream Commit message (Figure 6) and the H3 field in 1379 the responder's Hello message (Figure 3). 
Thus, both parties force 1380 the resulting shared secret to be unique for each media stream. 1382 A hash of the received and sent ZRTP messages in the current ZRTP 1383 exchange for the current media stream is calculated: 1385 total_hash = hash(Hello of responder || Commit) 1387 This refers to the Hello and Commit messages for the current media 1388 stream, which is using Multistream mode, not the original media 1389 stream that included a full DH key agreement. Note that only the 1390 ZRTP messages (Figures 3 and 6), not the entire ZRTP packets, are 1391 included in the hash. 1393 The ZRTP key derivation function (KDF) (Section 4.5.1) requires the 1394 use of a KDF Context field (per [NIST-SP800-108] guidelines), which 1395 should include the ZIDi, ZIDr, and a nonce value known to both 1396 parties. The total_hash qualifies as a nonce value, because its 1397 computation included nonce material from the initiator's Commit 1398 message and the responder's Hello message. 1400 KDF_Context = (ZIDi || ZIDr || total_hash) 1402 The current stream's SRTP keys and salts for the initiator and 1403 responder are calculated using the ZRTP Session Key ZRTPSess and the 1404 nonces implicitly included in the total_hash. The nonces also ensure 1405 that KDF_Context will be unique for each media stream, which is 1406 critical for security. For each additional media stream, a separate 1407 s0 is derived from ZRTPSess via the ZRTP key derivation function 1408 (Section 4.5.1): 1410 s0 = KDF(ZRTPSess, "ZRTP MSK", KDF_Context, 1411 negotiated hash length) 1413 Note that the ZRTPSess key was previously derived from material that 1414 also includes a different and more inclusive total_hash from the 1415 entire packet sequence that performed the original DH exchange for 1416 the first media stream in this ZRTP session. 1418 At this point in Multistream mode, the two endpoints begin key 1419 derivations in Section 4.5.3. 1421 4.5. Key Derivations 1423 4.5.1. 
The ZRTP Key Derivation Function 1425 To derive keys from a shared secret, ZRTP uses an HMAC-based key 1426 derivation function, or KDF. It is used throughout Section 4.5.3 and 1427 in other sections. The HMAC function for the KDF is based on the 1428 negotiated hash algorithm defined in Section 5.1.2. 1430 The authors believe the ZRTP KDF is in full compliance with the 1431 recommendations in NIST SP 800-108 [NIST-SP800-108]. Section 7.5 of 1432 the NIST document describes "key separation", which is a security 1433 requirement for the cryptographic keys derived from the same key 1434 derivation key. The keys shall be separate in the sense that the 1435 compromise of some derived keys will not degrade the security 1436 strength of any of the other derived keys or the security strength of 1437 the key derivation key. Strong preimage resistance is provided. 1439 The ZRTP KDF runs the NIST pseudorandom function (PRF) in counter 1440 mode, with only a single iteration of the counter. The NIST PRF is 1441 based on the HMAC function. The ZRTP KDF never has to generate more 1442 than 256 bits (or 384 bits for Suite B applications) of output key 1443 material, so only a single invocation of the HMAC function is needed. 1445 The ZRTP KDF is defined in this manner, per Sections 5 and 5.1 of 1446 [NIST-SP800-108]: 1448 KDF(KI, Label, Context, L) = HMAC(KI, i || Label || 1449 0x00 || Context || L) 1451 The HMAC in the KDF is keyed by KI, which is a secret key derivation 1452 key that is unknown to the wiretapper (for example, s0). The HMAC is 1453 computed on a concatenated set of nonsecret fields that are defined 1454 as follows. The first field is a 32-bit big-endian integer counter 1455 (i) required by NIST to be included in the HMAC each time the HMAC is 1456 computed, which we have set to the fixed value of 0x00000001 because we 1457 only compute the HMAC once. Label is a string of nonzero octets that 1458 identifies the purpose for the derived keying material.
The octet 1459 0x00 is a delimiter required by NIST. The NIST KDF formula has a 1460 "Context" field that includes ZIDi, ZIDr, and some optional nonce 1461 material known to both parties. L is a 32-bit big-endian positive 1462 integer, not to exceed the length in bits of the output of the HMAC. 1463 The output of the KDF is truncated to the leftmost L bits. If 1464 SHA-384 is the negotiated hash algorithm, the HMAC would be HMAC-SHA- 1465 384; thus, the maximum value of L would be 384, the negotiated hash 1466 length. 1468 The ZRTP KDF is not to be confused with the SRTP KDF defined in 1469 [RFC3711]. 1471 4.5.2. Deriving ZRTPSess Key and SAS in DH or Preshared Modes 1473 Both DH mode and Preshared mode (but not Multistream mode) come to 1474 this common point in the protocol to derive ZRTPSess and the SAS from 1475 s0, via the ZRTP Key Derivation Function (Section 4.5.1). At this 1476 point, s0 has been calculated, as well as KDF_Context. These 1477 calculations are done only for the first media stream, not for 1478 Multistream mode. 1480 The ZRTPSess key is used only for these two purposes: 1) to generate 1481 the additional s0 keys (Section 4.4.3.2) for adding additional media 1482 streams to this session in Multistream mode, and 2) to generate the 1483 pbxsecret (Section 7.3.1) that may be cached for use in future 1484 sessions. The ZRTPSess key is kept for the duration of the call 1485 signaling session between the two ZRTP endpoints. That is, if there 1486 are two separate calls between the endpoints (in SIP terms, separate 1487 SIP dialogs), then a ZRTP Session Key MUST NOT be used across the two 1488 call signaling sessions. ZRTPSess MUST be destroyed no later than 1489 the end of the call signaling session. 1491 ZRTPSess = KDF(s0, "ZRTP Session Key", KDF_Context, 1492 negotiated hash length) 1494 Note that KDF_Context is unique for each media stream, but only the 1495 first media stream is permitted to calculate ZRTPSess. 
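The KDF of Section 4.5.1 and the ZRTPSess derivation above can be sketched as follows. This is an illustrative sketch, not a reference implementation. It assumes that Label is encoded as ASCII octets with no terminator, that L is a whole number of octets, and that the negotiated hash is named by a hashlib algorithm name such as "sha256" or "sha384". The function name zrtp_kdf and the placeholder inputs are hypothetical.

```python
import hashlib
import hmac
import struct


def zrtp_kdf(ki: bytes, label: bytes, context: bytes, length_bits: int,
             hash_name: str = "sha256") -> bytes:
    """KDF(KI, Label, Context, L) = HMAC(KI, i || Label || 0x00 || Context || L)."""
    max_bits = hashlib.new(hash_name).digest_size * 8
    if not 0 < length_bits <= max_bits:
        raise ValueError("L must be positive and must not exceed the HMAC output length")
    if length_bits % 8 != 0:
        raise ValueError("this sketch only handles L as a whole number of octets")
    if 0 in label:
        raise ValueError("Label must be a string of nonzero octets")

    counter = struct.pack(">I", 1)                 # i: 32-bit big-endian, fixed at 1
    length_field = struct.pack(">I", length_bits)  # L: 32-bit big-endian, in bits
    data = counter + label + b"\x00" + context + length_field

    mac = hmac.new(ki, data, hash_name).digest()
    return mac[: length_bits // 8]                 # leftmost L bits


# Placeholder inputs for illustration only; these are not real key material.
s0 = bytes(32)
kdf_context = b"ZIDi-12bytes" + b"ZIDr-12bytes" + bytes(32)  # ZIDi || ZIDr || total_hash
zrtp_sess = zrtp_kdf(s0, b"ZRTP Session Key", kdf_context, 256)
```

Because L is part of the HMAC input, requesting a shorter output yields an unrelated value, not a prefix of the longer output. Distinct Label strings keep the derived keys separate in the same way.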
1497 There is only one Short Authentication String (SAS) (Section 7) 1498 computed per call, which is applicable to all media streams derived 1499 from a single DH key agreement in a ZRTP session. KDF_Context is 1500 unique for each media stream, but only the first media stream is 1501 permitted to calculate sashash. 1503 sashash = KDF(s0, "SAS", KDF_Context, 256) 1505 sasvalue = sashash [truncated to leftmost 32 bits] 1507 Despite the exposure of the SAS to the two parties, the rest of the 1508 keying material is protected by the key separation properties of the 1509 KDF (Section 4.5.1). 1511 ZRTP-enabled VoIP clients may need to support additional forms of 1512 communication, such as text chat, instant messaging, or file 1513 transfers. These other forms of communication may need to be 1514 encrypted, and would benefit from leveraging the ZRTP key exchange 1515 used for the VoIP part of the call. In that case, more key material 1516 MAY be derived and "exported" from the ZRTP protocol and provided as 1517 a shared secret to the VoIP client for these non-VoIP purposes. The 1518 application can use this exported key in application-specific ways, 1519 outside the scope of the ZRTP protocol. 1521 ExportedKey = KDF(s0, "Exported key", KDF_Context, 1522 negotiated hash length) 1524 Only one ExportedKey is computed per call. KDF_Context is unique for 1525 each media stream, but only the first media stream is permitted to 1526 calculate ExportedKey. 1528 The application may use this exported key to derive other subkeys for 1529 various non-ZRTP purposes, via a KDF using separate KDF label strings 1530 defined by the application. This key or its derived subkeys can be 1531 used for encryption, or used to authenticate other key exchanges 1532 carried out by the application, protected by ZRTP's MiTM defense 1533 umbrella. 
The exported key and its descendants may be used for as 1534 long as needed by the application, maintained in a separate crypto 1535 context that may outlast the VoIP session. 1537 At this point in DH mode or Preshared mode, the two endpoints proceed 1538 on to the key derivations in Section 4.5.3, now that there is a 1539 defined s0 and ZRTPSess key. 1541 4.5.3. Deriving the Rest of the Keys from s0 1543 DH mode, Multistream mode, and Preshared mode all come to this common 1544 point in the protocol to derive a set of keys from s0. It can be 1545 assumed that s0 has been calculated, as well as the ZRTPSess key and 1546 KDF_Context. A separate s0 key is associated with each media stream. 1548 Subkeys are not drawn directly from s0, as done in NIST SP 800-56A. 1549 To enhance key separation, ZRTP uses s0 to key a Key Derivation 1550 Function (Section 4.5.1) based on [NIST-SP800-108]. Since s0 already 1551 included total_hash in its derivation, it is redundant to use 1552 total_hash again in the KDF Context in all the invocations of the KDF 1553 keyed by s0. Nonetheless, NIST SP 800-108 always requires KDF 1554 Context to be defined for the KDF, and nonce material is required in 1555 some KDF invocations (especially for Multistream mode and Preshared 1556 mode), so total_hash is included as a nonce in the KDF Context. 1558 Separate SRTP master keys and master salts are derived for use in 1559 each direction for each media stream. Unless otherwise specified, 1560 ZRTP uses SRTP with no Master Key Identifier (MKI), 32-bit 1561 authentication using HMAC-SHA1, AES-CM 128 or 256-bit key length, 1562 112-bit session salt key length, 2^48 key derivation rate, and SRTP 1563 prefix length 0. Secure RTCP (SRTCP) is also used, deriving the 1564 SRTCP keys from the same master keys and salts as SRTP, using the 1565 mechanisms specified in [RFC3711], without requiring a separate ZRTP 1566 negotiation for RTCP.
1568 The ZRTP initiator encrypts and the ZRTP responder decrypts packets 1569 by using srtpkeyi and srtpsalti, while the ZRTP responder encrypts 1570 and the ZRTP initiator decrypts packets by using srtpkeyr and 1571 srtpsaltr. The SRTP key and salt values are truncated (taking the 1572 leftmost bits) to the length determined by the chosen SRTP profile. 1573 These are generated by: 1575 srtpkeyi = KDF(s0, "Initiator SRTP master key", KDF_Context, 1576 negotiated AES key length) 1578 srtpsalti = KDF(s0, "Initiator SRTP master salt", KDF_Context, 112) 1580 srtpkeyr = KDF(s0, "Responder SRTP master key", KDF_Context, 1581 negotiated AES key length) 1583 srtpsaltr = KDF(s0, "Responder SRTP master salt", KDF_Context, 112) 1585 The MAC keys are the same length as the output of the underlying hash 1586 function in the KDF and are thus generated without truncation. They 1587 are used only by ZRTP and not by SRTP. Different MAC keys are needed 1588 for the initiator and the responder to ensure that GoClear messages 1589 in each direction are unique and can not be cached by an attacker and 1590 reflected back to the endpoint. 1592 mackeyi = KDF(s0, "Initiator HMAC key", KDF_Context, 1593 negotiated hash length) 1595 mackeyr = KDF(s0, "Responder HMAC key", KDF_Context, 1596 negotiated hash length) 1598 ZRTP keys are generated for the initiator and responder to use to 1599 encrypt the Confirm1 and Confirm2 messages. They are truncated to 1600 the same size as the negotiated SRTP key size. 1602 zrtpkeyi = KDF(s0, "Initiator ZRTP key", KDF_Context, 1603 negotiated AES key length) 1605 zrtpkeyr = KDF(s0, "Responder ZRTP key", KDF_Context, 1606 negotiated AES key length) 1608 All key material is destroyed as soon as it is no longer needed, no 1609 later than the end of the call. s0 is erased in Section 4.6.1, and 1610 the rest of the session key material is erased in Sections 4.7.2.1 1611 and 4.7.3. 1613 4.6. 
Confirmation 1615 The Confirm1 and Confirm2 messages (Figure 10) contain the cache 1616 expiration interval (defined in Section 4.9) for the newly generated 1617 retained shared secret. The flagoctet is an 8-bit unsigned integer 1618 made up of these flags: the PBX Enrollment flag (E) defined in 1619 Section 7.3.1, the SAS Verified flag (V) defined in Section 7.1, the 1620 Allow Clear flag (A) defined in Section 4.7.2, and the Disclosure 1621 flag (D) defined in Section 11. 1623 flagoctet = (E * 2^3) + (V * 2^2) + (A * 2^1) + (D * 2^0) 1625 Part of each Confirm1 and Confirm2 message is encrypted using 1626 full-block Cipher Feedback Mode, and each message contains a 128-bit random Cipher 1627 FeedBack (CFB) Initialization Vector (IV). The Confirm1 and Confirm2 1628 messages also contain a MAC covering the encrypted part of the 1629 Confirm1 or Confirm2 message that includes a string of zeros, the 1630 signature length, flag octet, cache expiration interval, signature 1631 type block (if present), and signature (Section 7.2) (if present). 1632 For the responder: 1634 confirm_mac = MAC(mackeyr, encrypted part of Confirm1) 1636 For the initiator: 1638 confirm_mac = MAC(mackeyi, encrypted part of Confirm2) 1640 The mackeyi and mackeyr keys are computed in Section 4.5.3. 1642 The exchange is completed when the responder sends either the 1643 Conf2ACK message or the responder's first SRTP media packet (with a 1644 valid SRTP auth tag). The initiator MUST treat the first valid SRTP 1645 media from the responder as equivalent to receiving a Conf2ACK. The 1646 responder may respond to Confirm2 with either SRTP media, Conf2ACK, 1647 or both, in whichever order the responder chooses (or whichever order 1648 the "cloud" chooses to deliver them). 1650 4.6.1.
Updating the Cache of Shared Secrets 1652 After receiving the Confirm messages, both parties must now update 1653 their retained shared secret rs1 in their respective caches, provided 1654 the following conditions hold: 1656 (1) This key exchange is either DH or Preshared mode, not 1657 Multistream mode, which does not update the cache. 1659 (2) Depending on the values of the cache expiration intervals that 1660 are received in the two Confirm messages, there are some 1661 scenarios that do not update the cache, as explained in 1662 Section 4.9. 1664 (3) The responder MUST receive the initiator's Confirm2 message 1665 before updating the responder's cache. 1667 (4) The initiator MUST receive either the responder's Conf2ACK 1668 message or the responder's SRTP media (with a valid SRTP auth 1669 tag) before updating the initiator's cache. 1671 The cache update may also be affected by a cache mismatch, according 1672 to Section 4.6.1.1 or Section 4.6.1.2. 1674 For DH mode only, before updating the retained shared secret rs1 in 1675 the cache, each party first discards their old rs2 and copies their 1676 old rs1 to rs2. The old rs1 is saved to rs2 because of the risk of 1677 session interruption after one party has updated his own rs1 but 1678 before the other party has enough information to update her own rs1. 1679 If that happens, they may regain cache sync in the next session by 1680 using rs2 (per Section 4.3). This mitigates the well-known Two 1681 Generals' Problem [Byzantine]. The old rs1 value is not saved in 1682 Preshared mode. 1684 For DH mode and Preshared mode, both parties compute a new rs1 value 1685 from s0 via the ZRTP key derivation function (Section 4.5.1): 1687 rs1 = KDF(s0, "retained secret", KDF_Context, 256) 1689 Note that KDF_Context is unique for each media stream, but only the 1690 first media stream is permitted to update rs1. 1692 Each media stream has its own s0. 
At this point in the protocol for 1693 each media stream, the corresponding s0 MUST be erased. 1695 If a cache update is appropriate, subject to the above conditions and 1696 not delayed by a cache mismatch, it should be done as follows. Both 1697 ZRTP endpoints SHOULD commit the new rs1 to nonvolatile storage 1698 immediately upon receiving the remote party's Confirm message. The 1699 initiator should write the new rs1 before sending the Confirm2 1700 message, and the responder should write the new rs1 before sending 1701 any SRTP media. This means no SRTP media will be sent by either 1702 party until the new rs1 is saved by both parties. After receiving 1703 evidence that the remote party has committed the new rs1 to 1704 nonvolatile storage, rs2 (the old value of rs1) SHOULD be discarded. 1705 Receiving a few packets of properly formed SRTP media after the 1706 Confirm message would be evidence that the remote party has remained 1707 functioning long enough to commit the new rs1 to nonvolatile storage. 1708 A brief interval (about one second of encrypted media) should be 1709 sufficient for rs1 to be properly saved across a cluster of 1710 distributed load-sharing PBXs that share a common cache. A good 1711 strategy is to hold back from committing rs2 to nonvolatile storage 1712 for this brief interval, and commit it to nonvolatile storage only if 1713 the connection is lost during that interval, or if encrypted media 1714 fails to appear within a reasonable time. Since this would be a rare 1715 event, in most cases rs2 would not be saved. If rs2 is saved 1716 unconditionally, it would have the undesirable effect of lengthening 1717 the window of vulnerability for a MiTM attack if the cache is 1718 captured by an attacker, as described in Section 15.1. 1720 4.6.1.1. Cache Update Following a Cache Mismatch 1722 If a shared secret cache mismatch (as defined in Section 4.3.2) is 1723 detected in the current session, it indicates a possible MiTM attack. 
1724 However, there may be evidence to the contrary, if either one of the 1725 following conditions is met: 1727 o Successful use of the mechanism described in Section 8.1.1, but 1728 only if fully supported by end-to-end integrity-protected delivery 1729 of the a=zrtp-hash in the signaling via SIP Identity [RFC4474] or 1730 better still, Dan Wing's SIP Identity using Media Path 1731 [SIP-IDENTITY]. This allows authentication of the DH exchange 1732 without human assistance. 1734 o A good signature is received and verified using the digital 1735 signature feature on the SAS hash, as described in Section 7.2, if 1736 this feature is supported. 1738 If there is a cache mismatch in the absence of the aforementioned 1739 mitigating evidence, the cache update MUST be delayed in the current 1740 session until the user verbally compares the SAS with his partner 1741 during the call and confirms a successful SAS verify via his user 1742 interface as described in Section 7.1. If the session ends before 1743 that happens, the cache update is not performed, leaving the rs1/rs2 1744 values unmodified in the cache. The local SAS Verified (V) flag is 1745 also left unmodified in this case. 1747 This means the caches will continue to be mismatched on subsequent 1748 calls, and the user will thus be alerted of this security condition 1749 on every call until the SAS is verified. Or, if the cache mismatches 1750 are caused by an actual MiTM attack instead of a cache mishap, the 1751 alerts will continue on every call until the caches match again 1752 because the MiTM attacker ceased his attacks. In that case, the 1753 cache entries and related (V) flags are unscathed by the MiTM 1754 attacker when the attacks cease. The MiTM attacker is thus foiled 1755 from even having a denial-of-service effect on the caches.
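The decision described above for an endpoint with a human user can be sketched as a small predicate. This is only an illustration of the Section 4.6.1.1 rule. It does not cover the other conditions in Section 4.6.1 (key agreement mode, cache expiration intervals, receipt of Confirm2 or Conf2ACK) or the PBX rule of Section 4.6.1.2. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MismatchState:
    """Facts known to the endpoint during the current session (hypothetical names)."""
    cache_mismatch: bool            # shared secret cache mismatch detected (Section 4.3.2)
    zrtp_hash_authenticated: bool   # a=zrtp-hash delivered with end-to-end integrity (Section 8.1.1)
    sas_signature_verified: bool    # good digital signature on the SAS hash (Section 7.2)
    user_verified_sas: bool         # user confirmed a verbal SAS comparison (Section 7.1)


def cache_update_allowed(state: MismatchState) -> bool:
    """Return True if the mismatch rule permits updating rs1 now.

    Without a mismatch, the update is not delayed by this rule. With a
    mismatch, the update proceeds only on mitigating evidence or after the
    user has verified the SAS; otherwise it stays delayed, and if the session
    ends first, rs1/rs2 and the V flag are left unmodified.
    """
    if not state.cache_mismatch:
        return True
    if state.zrtp_hash_authenticated or state.sas_signature_verified:
        return True
    return state.user_verified_sas
```

An implementation would re-evaluate this predicate when the user confirms the SAS during the call, since that event can turn a delayed update into a permitted one.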
1757 If the user verbally compares the SAS with his partner during the 1758 call and confirms a successful SAS verify via his user interface, the 1759 local cache is then updated. Note that in this case rs2 (the old 1760 value of rs1) must also be saved, to mitigate the possibility of the 1761 remote user failing to update. 1763 Regardless of whether a cache mismatch occurs, s0 must still be 1764 erased. 1766 If no cache entry exists, as is the case in the initial call, the 1767 cache update is handled in the normal fashion. 1769 4.6.1.2. Cache Update for a PBX Following a Cache Mismatch 1771 In the event of a cache mismatch, a PBX MUST NOT update the cache if 1772 there is a pbxsecret defined on the PBX, but it does not match the 1773 pbxsecret of the remote endpoint. Otherwise, the PBX MUST update the 1774 cache, notwithstanding Section 4.6.1.1. 1776 If a ZRTP endpoint is enrolled with a PBX, it is desirable that 1777 the PBX's cache is not easily disrupted by an attempted MiTM 1778 attack. The enrolled phone should also not update the cache per 1779 Section 4.6.1.1. A PBX has no human to verify the SAS, so the PBX 1780 assumes the cache should be updated unless a pbxsecret mismatch 1781 suggests otherwise. Note that unenrolled phones will lose cache 1782 sync after an attempted MiTM attack, because the PBX will update 1783 the cache during the attack. 1785 However, this loss of cache sync for an unenrolled phone may be 1786 easily remedied by calling an enrolled phone behind the PBX (with the 1787 PBX acting as a MiTM) and re-verifying the SAS with a human. That 1788 would update the cache on both the unenrolled phone and the PBX, re- 1789 establishing cache sync. 1791 The PBX's lack of human assisted SAS verification following a cache 1792 mismatch is one more reason to reduce the PBX's MiTM role whenever 1793 possible, as explained in Section 10.1. 1795 4.7. 
Termination 1797 A ZRTP session is normally terminated at the end of a call, but it 1798 may be terminated early by either the Error message or the GoClear 1799 message. 1801 4.7.1. Termination via Error Message 1803 The Error message (Section 5.9) is used to terminate an in-progress 1804 ZRTP exchange due to an error. The Error message contains an integer 1805 Error Code for debugging purposes. The termination of a ZRTP key 1806 agreement exchange results in no updates to the cached shared secrets 1807 and deletion of all crypto context for that media stream. The ZRTP 1808 Session key, ZRTPSess, is only deleted if all ZRTP media streams that 1809 are using it are terminated. 1811 Because no key agreement has been reached, the Error message cannot 1812 use the same MAC protection as the GoClear message. A denial of 1813 service is possible by injecting fake Error messages. (However, even 1814 if the Error message were somehow designed with integrity protection, 1815 it would raise other questions. What would a badly formed Error 1816 message mean if it were sent to report a badly formed message? A 1817 good message?) 1819 4.7.2. Termination via GoClear Message 1821 The GoClear message (Section 5.11) is used to switch from SRTP to 1822 RTP, usually because the user has chosen to do that by pressing a 1823 button. The GoClear uses a MAC of the Message Type Block sent in the 1824 GoClear message computed with the mackey derived from the shared 1825 secret. This MAC is truncated to the leftmost 64 bits. When sent by 1826 the initiator: 1828 clear_mac = MAC(mackeyi, "GoClear ") 1830 When sent by the responder: 1832 clear_mac = MAC(mackeyr, "GoClear ") 1834 Both of these MACs are calculated across the 8-octet "GoClear " 1835 Message Type Block, including the trailing space. 1837 A GoClear message that does not receive a ClearACK response must be 1838 resent. 
If a GoClear message is received with a bad MAC, ClearACK 1839 MUST NOT be sent and the GoClear MUST NOT be acted on by the 1840 recipient, but it MAY be processed as a security exception, perhaps 1841 by logging or alerting the user. 1843 A ZRTP endpoint MAY choose to accept GoClear messages after the 1844 session has switched to SRTP, allowing the session to revert to RTP. 1845 This is indicated in the Confirm1 or Confirm2 messages (Figure 10) by 1846 setting the Allow Clear flag (A). If an endpoint sets the Allow 1847 Clear (A) flag in their Confirm message, it indicates that they 1848 support receiving GoClear messages. 1850 A ZRTP endpoint that receives a GoClear MUST authenticate the message 1851 by checking the clear_mac. If the message authenticates, the 1852 endpoint stops sending SRTP packets, and generates a ClearACK in 1853 response. It MUST also delete all the crypto key material for all 1854 the SRTP media streams, as defined in Section 4.7.2.1. 1856 Until confirmation from the user is received (e.g., clicking a 1857 button, pressing a dual-tone multi-frequency (DTMF) key, etc.), the 1858 ZRTP endpoint MUST NOT resume sending RTP packets. The endpoint then 1859 renders to the user an indication that the media session has switched 1860 to clear mode and waits for confirmation from the user. This blocks 1861 the flow of sensitive discourse until the user is forced to take 1862 notice that he's no longer protected by encryption. To prevent 1863 pinholes from closing or NAT bindings from expiring, the ClearACK 1864 message MAY be resent at regular intervals (e.g., every 5 seconds) 1865 while waiting for confirmation from the user. After confirmation of 1866 the notification is received from the user, the sending of RTP 1867 packets may begin. 1869 After sending a GoClear message, the ZRTP endpoint stops sending SRTP 1870 packets. 
When a ClearACK is received, the ZRTP endpoint deletes the 1871 crypto context for the SRTP session, as defined in Section 4.7.2.1, 1872 and may then resume sending RTP packets. 1874 In the event a ClearACK is not received before the retransmissions of 1875 GoClear are exhausted, the key material is deleted, as defined in 1876 Section 4.7.2.1. 1878 After the users have transitioned from SRTP media back to RTP media 1879 (clear mode), they may decide later to return to secure mode by 1880 manual activation, usually by pressing a GO SECURE button. In that 1881 case, a new secure session is initiated by the party that presses the 1882 button, by sending a new Commit message, leading to a new session key 1883 negotiation. It is not necessary to send another Hello message, as 1884 the two parties have already done that at the start of the call and 1885 thus have already discovered each other's ZRTP capabilities. It is 1886 possible for users to toggle back and forth between clear and secure 1887 modes multiple times in the same session, just as they could in the 1888 old days of secure PSTN phones. 1890 4.7.2.1. Key Destruction for GoClear Message 1892 All SRTP session key material MUST be erased by the receiver of the 1893 GoClear message upon receiving a properly authenticated GoClear. The 1894 same key destruction MUST be done by the sender of GoClear message, 1895 upon receiving the ClearACK. This must be done for the key material 1896 for all of the media streams. 1898 All key material that would have been erased at the end of the SIP 1899 session MUST be erased, as described in Section 4.7.3, with the 1900 single exception of ZRTPSess. In this case, ZRTPSess is destroyed in 1901 a manner different from the other key material. 
Both parties replace 1902 ZRTPSess with a KDF-derived non-invertible function of itself: 1904 ZRTPSess = KDF(ZRTPSess, "New ZRTP Session", (ZIDi || ZIDr), 1905 negotiated hash length) 1907 ZRTPSess will be replaced twice if a session generates separate 1908 GoClear messages for both audio and video streams, and the two 1909 endpoints need not carry out the replacements in the same order. 1911 The destruction of key material meets the requirements of Perfect 1912 Forward Secrecy (PFS), but still preserves a new version of ZRTPSess, 1913 so that the user can later re-initiate secure mode during the same 1914 session without performing another Diffie-Hellman calculation using 1915 Multistream mode, which requires and assumes the existence of 1916 ZRTPSess with the same value at both ZRTP endpoints. A new key 1917 negotiation after a GoClear SHOULD use a Multistream Commit message. 1919 Note: Multistream mode is preferred over a Diffie-Hellman mode 1920 since this does not require the generation of a new hash chain and 1921 a new signaling exchange to exchange new Hello Hash values. 1923 Later, at the end of the entire call, ZRTPSess is finally destroyed 1924 along with the other key material, as described in Section 4.7.3. 1926 4.7.3. Key Destruction at Termination 1928 All SRTP session key material MUST be erased by both parties at the 1929 end of the call. In particular, the destroyed key material includes 1930 the SRTP session keys and salts, SRTP master keys and salts, and all 1931 material sufficient to reconstruct the SRTP keys and salts, including 1932 ZRTPSess and s0 (although s0 should have been destroyed earlier, in 1933 Section 4.6.1). This must be done for the key material for all of 1934 the media streams. The only exceptions are the cached shared secrets 1935 needed for future sessions, including rs1, rs2, and pbxsecret. 1937 4.8. 
Random Number Generation 1939 The ZRTP protocol uses random numbers for cryptographic key material, 1940 notably for the DH secret exponents and nonces, which must be freshly 1941 generated with each session. Whenever a random number is needed, all 1942 of the following criteria must be satisfied: 1944 Random numbers MUST be freshly generated, meaning that they must not 1945 have been used in a previous calculation. 1947 When generating a random number k of L bits in length, k MUST be 1948 chosen with equal probability from the range of [1 < k < 2^L]. 1950 It MUST be derived from a physical entropy source, such as radio 1951 frequency (RF) noise, acoustic noise, thermal noise, high-resolution 1952 timings of environmental events, or other unpredictable physical 1953 sources of entropy. One possible source of entropy for a VoIP client 1954 would be microphone noise. For a detailed explanation of 1955 cryptographic grade random numbers and guidance for collecting 1956 suitable entropy, see [RFC4086] and Chapter 10 of "Practical 1957 Cryptography" [Ferguson]. The raw entropy must be distilled and 1958 processed through a deterministic random-bit generator (DRBG). 1959 Examples of DRBGs may be found in [NIST-SP800-90], in [Ferguson], and 1960 in [RFC5869]. Failure to use true entropy from the physical 1961 environment as a basis for generating random cryptographic key 1962 material would lead to a disastrous loss of security. 1964 4.9. ZID and Cache Operation 1966 Each instance of ZRTP has a unique 96-bit random ZRTP ID, or ZID, 1967 that is generated once at installation time. It is used to look up 1968 retained shared secrets in a local cache. A single global ZID for a 1969 single installation is the simplest way to implement ZIDs. However, 1970 it is specifically not precluded for an implementation to use 1971 multiple ZIDs, up to the limit of a separate one per callee. 
This 1972 then turns it into a long-lived "association ID" that does not apply 1973 to any other associations between a different pair of parties. It is 1974 a goal of this protocol to permit both options to interoperate 1975 freely. A PBX acting as a trusted man in the middle will also 1976 generate a single ZID and use that ZID for all endpoints behind it, 1977 as described in Section 10. 1979 There is no protocol mechanism to invalidate a previously used ZID. 1980 An endpoint wishing to change ZIDs would simply generate a new one 1981 and begin using it. 1983 The ZID should not be hard coded or hard defined in the firmware of a 1984 product. It should be randomly generated by the software and stored 1985 at installation or initialization time. It should be randomly 1986 generated rather than allocated from a preassigned range of ZID 1987 values, because 96 bits should be enough to avoid birthday collisions 1988 in realistic scenarios. 1990 Each time a new s0 is calculated, a new retained shared secret rs1 is 1991 generated and stored in the cache, indexed by the ZID of the other 1992 endpoint. This cache updating is described in Section 4.6.1. For 1993 the new retained shared secret, each endpoint chooses a cache 1994 expiration value that is an unsigned 32-bit integer of the number of 1995 seconds that this secret should be retained in the cache. The time 1996 interval is relative to when the Confirm1 message is sent or 1997 received. 1999 The cache intervals are exchanged in the Confirm1 and Confirm2 2000 messages (Figure 10). The actual cache interval used by both 2001 endpoints is the minimum of the values from the Confirm1 and Confirm2 2002 messages. A value of 0 seconds means the newly computed shared 2003 secret SHOULD NOT be stored in the cache, and if a cache entry 2004 already exists from an earlier call, the stored cache interval should 2005 be set to 0. 
This means if either Confirm message contains a null 2006 cache expiration interval, and there is no cache entry already 2007 defined, no new cache entry is created. A value of 0xffffffff means 2008 the secret should be cached indefinitely and is the recommended 2009 value. If the ZRTP exchange is Multistream mode, the field in the 2010 Confirm1 and Confirm2 is set to 0xffffffff and is ignored; the cache 2011 is not updated. 2013 The expiration interval need not be used to force the deletion of a 2014 shared secret from the cache when the interval has expired. It just 2015 means the shared secret MAY be deleted from that cache at any point 2016 after the interval has expired without causing the other party to 2017 note it as an unexpected security event when the next key negotiation 2018 occurs between the same two parties. This means there need not be 2019 perfectly synchronized deletion of expired secrets from the two 2020 caches, and makes it easy to avoid a race condition that might 2021 otherwise be caused by clock skew. 2023 If the expiration interval is not properly agreed to by both 2024 endpoints, it may later result in false alarms of MiTM attacks, due 2025 to apparent cache mismatches (Section 4.3.2). 2027 It is essential that each cache entry have some form of human- 2028 readable name associated with it. If cache entries are stored 2029 without human-readable names, a MiTM attack is possible for an 2030 attacker who has previously established cache entries with both 2031 parties, as explained in Section 12. Users would have to do a verbal 2032 SAS compare for every call, greatly diminishing the value of caching. 2034 The relationship between a ZID and a SIP AOR is explained in 2035 Section 12. 2037 4.9.1. Cacheless Implementations 2039 It is possible to implement a simplified but nonetheless useful (and 2040 still compliant) profile of the ZRTP protocol that does not support 2041 any caching of shared secrets. 
In this case, the users would have to
2042     rely exclusively on the verbal SAS comparison for every call. That
2043     is, unless MiTM protection is provided by the mechanisms in
2044     Section 8.1.1 or 7.2, which introduce their own forms of complexity.

2046     If a ZRTP endpoint does not support the caching of shared secrets, it
2047     MUST set the cache expiration interval to zero, and MUST set the SAS
2048     Verified (V) flag (Section 7.1) to false. In addition, because the
2049     ZID serves mainly as a cache index, the ZID would not be required to
2050     maintain the same value across separate SIP sessions, although there
2051     is no reason why it should not.

2053     Cacheless operation would sacrifice the key continuity (Section 16.1)
2054     features, as well as Preshared mode (Section 4.4.2). Further, if the
2055     pbxsecret is also not cached, there would be no PBX trusted MiTM
2056     (Section 7.3) features, including the PBX security enrollment
2057     (Section 7.3.1) mechanism.

2059  5. ZRTP Messages

2061     All ZRTP messages use the message format defined in Figure 2. All
2062     word lengths referenced in this specification are 32 bits, or 4
2063     octets. All integer fields are carried in network byte order, that
2064     is, most-significant byte (octet) first, commonly known as
2065     big-endian.

2067      0                   1                   2                   3
2068      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
2069     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2070     |0 0 0 1 0 0| Not Used (All 0's)|         Sequence Number       |
2071     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2072     |                 Magic Cookie 'ZRTP' (0x5a525450)              |
2073     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2074     |                       Source Identifier                       |
2075     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2076     |                                                               |
2077     |         ZRTP Message (length depends on Message Type)         |
2078     |                             . . .                             |
2079     |                                                               |
2080     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2081     |                         CRC (1 word)                          |
2082     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

2084                       Figure 2: ZRTP Packet Format

2086     The Sequence Number is a count that is incremented for each ZRTP
2087     packet sent. The count is initialized to a random value. This is
2088     useful in estimating ZRTP packet loss and also detecting when ZRTP
2089     packets arrive out of sequence.

2091     The ZRTP Magic Cookie is a 32-bit string that uniquely identifies a
2092     ZRTP packet and has the value 0x5a525450.

2094     Source Identifier is the SSRC number of the RTP stream to which this
2095     ZRTP packet relates. For cases of forking or forwarding, RTP, and
2096     hence ZRTP, may arrive at the same port from several different
2097     sources -- each of these sources will have a different SSRC and may
2098     initiate an independent ZRTP protocol session. SSRC collisions would
2099     be disruptive to ZRTP. SSRC collision handling procedures are
2100     described in Section 4.1.

2102     This format is clearly distinguishable from RTP, STUN, and DTLS due
2103     to the first six bit settings, as discussed in
2104     [I-D.ietf-avtcore-rfc5764-mux-fixes]. The next 10 bits are unused
2105     and are set to zero and MUST be ignored when received.

2107        In early versions of this spec, ZRTP messages were encapsulated in
2108        RTP header extensions, which made ZRTP an eponymous variant of
2109        RTP. In later versions, the packet format changed to make it
2110        syntactically distinguishable from RTP. In RFC 6189, bits 4 and 5
2111        were considered unused but set to zero.

2113     The ZRTP messages are defined in Figures 3 to 17 and are of variable
2114     length.

2116     The ZRTP protocol uses a 32-bit Cyclic Redundancy Check (CRC) as
2117     defined in RFC 4960, Appendix B [RFC4960], in each ZRTP packet to
2118     detect transmission errors.
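The fixed header fields of Figure 2 can be checked and extracted along the following lines. This is a hedged sketch only: the struct zrtp_hdr and the function zrtp_parse_hdr are hypothetical names introduced for illustration, the minimum length used here covers just the three header words plus the CRC word, and the CRC itself is deliberately not verified in this fragment.

```c
#include <stddef.h>
#include <stdint.h>

#define ZRTP_MAGIC   0x5a525450u  /* Magic Cookie 'ZRTP' */
#define ZRTP_HDR_LEN 12u          /* three 32-bit words before the message */
#define ZRTP_CRC_LEN 4u           /* one 32-bit CRC word at the end */

/* Hypothetical container for the fixed header fields of Figure 2. */
struct zrtp_hdr {
    uint16_t seq;   /* Sequence Number */
    uint32_t ssrc;  /* Source Identifier */
};

/* Read big-endian (network byte order) integers. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 1 and fills *out if buf has a ZRTP packet header, else 0.
   The CRC is NOT checked here. */
int zrtp_parse_hdr(const uint8_t *buf, size_t len, struct zrtp_hdr *out)
{
    if (len < ZRTP_HDR_LEN + ZRTP_CRC_LEN)
        return 0;
    if ((buf[0] & 0xFC) != 0x10)        /* first six bits: 0 0 0 1 0 0 */
        return 0;
    /* The next 10 bits are unused and are ignored on receipt. */
    if (rd32(buf + 4) != ZRTP_MAGIC)
        return 0;
    out->seq  = rd16(buf + 2);
    out->ssrc = rd32(buf + 8);
    return 1;
}
```

Masking the first octet with 0xFC means a packet with nonzero unused bits is still accepted, which matches the "MUST be ignored when received" rule.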
ZRTP packets are typically transported 2119 by UDP, which carries its own built-in 16-bit checksum for integrity, 2120 but ZRTP does not rely on it. This is because of the effect of an 2121 undetected transmission error in a ZRTP message. For example, an 2122 undetected error in the DH exchange could appear to be an active man- 2123 in-the-middle attack. A false announcement of this by ZRTP clients 2124 can be psychologically distressing. The probability of such a false 2125 alarm hinges on a mere 16-bit checksum that usually protects UDP 2126 packets, so more error detection is needed. For these reasons, this 2127 belt-and-suspenders approach is used to minimize the chance of a 2128 transmission error affecting the ZRTP key agreement. 2130 The CRC is calculated across the entire ZRTP packet shown in 2131 Figure 2, including the ZRTP header and the ZRTP message, but not 2132 including the CRC field. If a ZRTP message fails the CRC check, it 2133 is silently discarded. 2135 5.1. ZRTP Message Formats 2137 ZRTP messages are designed to simplify endpoint parsing requirements 2138 and to reduce the opportunities for buffer overflow attacks (a good 2139 goal of any security extension should be to not introduce new attack 2140 vectors). 2142 ZRTP uses a block of 8 octets (2 words) to encode the Message Type. 2143 4-octet (1 word) blocks are used to encode Hash Type, Cipher Type, 2144 Key Agreement Type, and Authentication Tag Type. The values in the 2145 blocks are ASCII strings that are extended with spaces (0x20) to make 2146 them the desired length. Currently defined block values are listed 2147 in Tables 1-6. 2149 Additional block values may be defined and used. 2151 ZRTP uses this ASCII encoding to simplify debugging and make it 2152 "Wireshark (Ethereal) friendly". 2154 5.1.1. Message Type Block 2156 Currently, 16 Message Type Blocks are defined -- they represent the 2157 set of ZRTP message primitives. 
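The space-padded block encoding described in Section 5.1 can be sketched in C as follows. The function name zrtp_encode_block is a hypothetical helper introduced for illustration; the only behavior taken from this document is that block values are ASCII strings extended with spaces (0x20) to 8 octets for Message Type Blocks and 4 octets for the other block types.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: write `name` into out[0..width-1], padded on the
 * right with ASCII spaces (0x20).  No terminating NUL is written.
 * Returns 0 on success, -1 if name is empty or longer than width.
 */
int zrtp_encode_block(const char *name, unsigned char *out, size_t width)
{
    size_t n = strlen(name);

    if (n == 0 || n > width)
        return -1;
    memcpy(out, name, n);
    memset(out + n, 0x20, width - n);
    return 0;
}
```

For example, encoding "Hello" with a width of 8 yields the octets of "Hello" followed by three spaces, and encoding "B32" with a width of 4 yields "B32" followed by one space.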
ZRTP endpoints MUST support the
2158     Hello, HelloACK, Commit, DHPart1, DHPart2, Confirm1, Confirm2,
2159     Conf2ACK, SASrelay, RelayACK, Error, ErrorACK, and PingACK message
2160     types. ZRTP endpoints MAY support the GoClear, ClearACK, and Ping
2161     messages. In order to generate a PingACK message, it is necessary to
2162     parse a Ping message. Additional messages may be defined in
2163     extensions to ZRTP.

2165       Message Type Block    |  Meaning
2166       ---------------------------------------------------
2167       "Hello   "            |  Hello Message
2168       ---------------------------------------------------
2169       "HelloACK"            |  HelloACK Message
2170       ---------------------------------------------------
2171       "Commit  "            |  Commit Message
2172       ---------------------------------------------------
2173       "DHPart1 "            |  DHPart1 Message
2174       ---------------------------------------------------
2175       "DHPart2 "            |  DHPart2 Message
2176       ---------------------------------------------------
2177       "Confirm1"            |  Confirm1 Message
2178       ---------------------------------------------------
2179       "Confirm2"            |  Confirm2 Message
2180       ---------------------------------------------------
2181       "Conf2ACK"            |  Conf2ACK Message
2182       ---------------------------------------------------
2183       "Error   "            |  Error Message
2184       ---------------------------------------------------
2185       "ErrorACK"            |  ErrorACK Message
2186       ---------------------------------------------------
2187       "GoClear "            |  GoClear Message
2188       ---------------------------------------------------
2189       "ClearACK"            |  ClearACK Message
2190       ---------------------------------------------------
2191       "SASrelay"            |  SASrelay Message
2192       ---------------------------------------------------
2193       "RelayACK"            |  RelayACK Message
2194       ---------------------------------------------------
2195       "Ping    "            |  Ping Message
2196       ---------------------------------------------------
2197       "PingACK "            |  PingACK Message
2198       ---------------------------------------------------

2200                  Table 1. Message Type Block Values

2202  5.1.2.
Hash Type Block

2204     The hash algorithm and its related MAC algorithm are negotiated via
2205     the Hash Type Block found in the Hello message (Section 5.2) and the
2206     Commit message (Section 5.4).

2208     All ZRTP endpoints MUST support a Hash Type of SHA-256 [FIPS-180-3].
2209     SHA-384 SHOULD be supported and MUST be supported if ECDH-384 is
2210     used. Additional Hash Types MAY be used, such as the NIST SHA-3 hash
2211     [SHA-3] when it becomes available. Note that the Hash Type refers to
2212     the hash algorithm that will be used throughout the ZRTP key
2213     exchange, not the hash algorithm to be used in the SRTP
2214     Authentication Tag.

2216     The choice of the negotiated Hash Type is coupled to the Key
2217     Agreement Type, as explained in Section 5.1.5.

2219       Hash Type Block |  Meaning
2220       ----------------------------------------------------------
2221       "S256"          |  SHA-256 Hash defined in FIPS 180-3
2222       ----------------------------------------------------------
2223       "S384"          |  SHA-384 Hash defined in FIPS 180-3
2224       ----------------------------------------------------------
2225       "N256"          |  NIST SHA-3 256-bit hash (when published)
2226       ----------------------------------------------------------
2227       "N384"          |  NIST SHA-3 384-bit hash (when published)
2228       ----------------------------------------------------------

2230                    Table 2. Hash Type Block Values

2232     At the time of this writing, the NIST SHA-3 hashes [SHA-3] are not
2233     yet available. NIST is expected to publish SHA-3 in 2012, as a
2234     successor to the SHA-2 hashes in [FIPS-180-3].

2236  5.1.2.1. Negotiated Hash and MAC Algorithm

2238     ZRTP makes use of message authentication codes (MACs) that are keyed
2239     hashes based on the negotiated Hash Type. For the SHA-2 and SHA-3
2240     hashes, the negotiated MAC is the HMAC based on the negotiated hash.
2241     This MAC function is also used in the ZRTP key derivation function
2242     (Section 4.5.1).

2244     The HMAC function is defined in [FIPS-198-1].
A discussion of the 2245 general security of the HMAC construction may be found in [RFC2104]. 2246 Test vectors for HMAC-SHA-256 and HMAC-SHA-384 may be found in 2247 [RFC4231]. 2249 The negotiated Hash Type does not apply to the hash used in the 2250 digital signature defined in Section 7.2. For example, even if the 2251 negotiated Hash Type is SHA-256, the digital signature may use 2252 SHA-384 if an Elliptic Curve Digital Signature Algorithm (ECDSA) 2253 P-384 signature key is used. Digital signatures are optional in 2254 ZRTP. 2256 Except for the aforementioned digital signatures, and the special 2257 cases noted in Section 5.1.2.2, all the other hashes and MACs used 2258 throughout the ZRTP protocol will use the negotiated Hash Type. 2260 A future hash may include its own built-in MAC, not based on the HMAC 2261 construct, for example, the Skein hash function [Skein]. If NIST 2262 chooses such a hash as the SHA-3 winner, Hash Types "N256", and 2263 "N384" will still use the related HMAC as the negotiated MAC. If an 2264 implementer wishes to use Skein and its built-in MAC as the 2265 negotiated MAC, new Hash Types must be used. 2267 5.1.2.2. Implicit Hash and MAC Algorithm 2269 While most of the hash and MAC usage in ZRTP is defined by the 2270 negotiated Hash Type (Section 5.1.2), some hashes and MACs must be 2271 precomputed prior to negotiations, and thus cannot have their 2272 algorithms negotiated during the ZRTP exchange. They are implicitly 2273 predetermined to use SHA-256 [FIPS-180-3] and HMAC-SHA-256. 2275 These are the hashes and MACs that MUST use the Implicit hash and MAC 2276 algorithm: 2278 The hash chain H0-H3 defined in Section 9. 2280 The MACs that are keyed by this hash chain, as defined in 2281 Section 8.1.1. 2283 The Hello Hash in the a=zrtp-hash attribute defined in 2284 Section 8.1. 2286 ZRTP defines a method for negotiating different ZRTP protocol 2287 versions (Section 4.1.1). 
SHA-256 is the Implicit Hash and HMAC- 2288 SHA-256 is the Implicit MAC for ZRTP protocol version 1.10. Future 2289 ZRTP protocol versions may, if appropriate, use another hash 2290 algorithm as the Implicit Hash, such as the NIST SHA-3 hash [SHA-3], 2291 when it becomes available. For example, a future SIP packet may list 2292 two a=zrtp-hash SDP attributes, one based on SHA-256 for ZRTP version 2293 1.10, and another based on SHA-3 for ZRTP version 2.00. 2295 5.1.3. Cipher Type Block 2297 The block cipher algorithm is negotiated via the Cipher Type Block 2298 found in the Hello message (Section 5.2) and the Commit message 2299 (Section 5.4). 2301 All ZRTP endpoints MUST support AES-128 (AES1) and MAY support 2302 AES-192 (AES2), AES-256 (AES3), or other Cipher Types. The Advanced 2303 Encryption Standard is defined in [FIPS-197]. 2305 The use of AES-128 in SRTP is defined by [RFC3711]. The use of 2306 AES-192 and AES-256 in SRTP is defined by [RFC6188]. All ZRTP 2307 endpoints must support AES in counter mode for SRTP. The choice of 2308 the AES key length is coupled to the Key Agreement Type, as explained 2309 in Section 5.1.5. 2311 Other block ciphers may be supported that have the same block size 2312 and key sizes as AES. If implemented, they may be used anywhere in 2313 ZRTP or SRTP in place of the AES, in the same modes of operation and 2314 key size. Notably, in counter mode to replace AES-CM in [RFC3711] 2315 and [RFC6188], as well as in CFB mode to encrypt a portion of the 2316 Confirm message (Figure 10) and SASrelay message (Figure 16). ZRTP 2317 endpoints MAY support the TwoFish [TwoFish] block cipher. 
2319       Cipher Type Block |  Meaning
2320       -------------------------------------------------
2321       "AES1"            |  AES with 128-bit keys
2322       -------------------------------------------------
2323       "AES2"            |  AES with 192-bit keys
2324       -------------------------------------------------
2325       "AES3"            |  AES with 256-bit keys
2326       -------------------------------------------------
2327       "2FS1"            |  TwoFish with 128-bit keys
2328       -------------------------------------------------
2329       "2FS2"            |  TwoFish with 192-bit keys
2330       -------------------------------------------------
2331       "2FS3"            |  TwoFish with 256-bit keys
2332       -------------------------------------------------

2334                  Table 3. Cipher Type Block Values

2336  5.1.4. Auth Tag Type Block

2338     All ZRTP endpoints MUST support HMAC-SHA1 authentication tags for
2339     SRTP, with both 32-bit and 80-bit length tags as defined in
2340     [RFC3711].

2342     ZRTP endpoints MAY support 32-bit and 64-bit SRTP authentication tags
2343     based on the Skein hash function [Skein]. The Skein-512-MAC key
2344     length is fixed at 256 bits for this application, and the output
2345     length is adjustable. The Skein MAC is defined in Sections 2.6 and
2346     4.3 of [Skein] and is not based on the HMAC construct. Reference
2347     implementations for Skein may be found at [Skein1]. A Skein-based
2348     MAC is significantly more efficient than HMAC-SHA1, especially for
2349     short SRTP payloads.

2351     The Skein MAC key is computed by the SRTP key derivation function,
2352     which is also referred to as the AES-CM PRF, or pseudorandom
2353     function. This is defined either in [RFC3711] or in [RFC6188],
2354     depending on the selected SRTP AES key length. To compute a Skein
2355     MAC key, the SRTP PRF output for the authentication key is left
2356     untruncated at 256 bits, instead of the usual truncated length of 160
2357     bits (the key length used by HMAC-SHA1).

2359     In [RFC3711], Section 9.5 prohibits the use of 32-bit auth tags for
2360     SRTCP, regardless of the SRTP auth tag length.
Accordingly, if Skein
2361     is used for SRTP auth tags, SRTCP MUST use Skein 64-bit auth tags,
2362     regardless of the negotiated SRTP auth tag length.

2364       Auth Tag Type Block |  Meaning
2365       ----------------------------------------------------------
2366       "HS32"              |  32-bit authentication tag based on
2367                           |  HMAC-SHA1 as defined in RFC 3711.
2368       ----------------------------------------------------------
2369       "HS80"              |  80-bit authentication tag based on
2370                           |  HMAC-SHA1 as defined in RFC 3711.
2371       ----------------------------------------------------------
2372       "SK32"              |  32-bit authentication tag based on
2373                           |  Skein-512-MAC as defined in [Skein],
2374                           |  with 256-bit key, 32-bit MAC length.
2375       ----------------------------------------------------------
2376       "SK64"              |  64-bit authentication tag based on
2377                           |  Skein-512-MAC as defined in [Skein],
2378                           |  with 256-bit key, 64-bit MAC length.
2379       ----------------------------------------------------------

2381                     Table 4. Auth Tag Type Values

2383     Implementers should be aware that AES-GCM and AES-CCM for SRTP are
2384     expected to become available when [SRTP-AES-GCM] is published as an
2385     RFC. If an implementer wishes to use these modes when they become
2386     available, new Auth Tag Types must be added.

2388  5.1.5. Key Agreement Type Block

2390     All ZRTP endpoints MUST support DH3k, SHOULD support Preshared, and
2391     MAY support EC25, EC38, and DH2k.

2393     If a ZRTP endpoint supports multiple concurrent media streams, such
2394     as audio and video, it MUST support Multistream (Section 4.4.3) mode.
2395     Also, if a ZRTP endpoint supports the GoClear message
2396     (Section 4.7.2), it SHOULD support Multistream, to be used if the two
2397     parties choose to return to the secure state after going Clear (as
2398     explained in Section 4.7.2.1).

2400     For Finite Field Diffie-Hellman, ZRTP endpoints MUST use the DH
2401     parameters defined in [RFC3526], as follows. DH3k uses the 3072-bit
2402     modular exponentiation group (MODP).
DH2k uses the 2048-bit MODP 2403 group. The DH generator g is 2. The random Diffie-Hellman secret 2404 exponent SHOULD be twice as long as the AES key length. If AES-128 2405 is used, the DH secret value SHOULD be 256 bits long. If AES-256 is 2406 used, the secret value SHOULD be 512 bits long. 2408 If Elliptic Curve DH is used, the ECDH algorithm and key generation 2409 is from [NIST-SP800-56A]. The curves used are from [NSA-Suite-B], 2410 which uses the same curves as ECDSA defined by [FIPS-186-3], and can 2411 also be found in RFC 5114, Sections 2.6 through 2.8 [RFC5114]. ECDH 2412 test vectors may be found in RFC 5114, appendices A.6 through A.8 2413 [RFC5114]. The validation procedures are from [NIST-SP800-56A], 2414 Section 5.6.2.6, method 3, Elliptic Curve Cryptography (ECC) Partial 2415 Validation. Both the X and Y coordinates of the point on the curve 2416 are sent, in the first and second half of the ECDH public value, 2417 respectively. The ECDH result returns only the X coordinate, as 2418 specified in SP 800-56A. Useful strategies for implementing ECC may 2419 be found in [RFC6090]. 2421 The choice of the negotiated hash algorithm (Section 5.1.2) is 2422 coupled to the choice of Key Agreement Type. If ECDH-384 (EC38) is 2423 chosen as the key agreement, the negotiated hash algorithm MUST be 2424 either SHA-384 or the corresponding SHA-3 successor. 2426 The choice of AES key length is coupled to the choice of Key 2427 Agreement Type. If EC38 is chosen as the key agreement, AES-256 2428 (AES3) SHOULD be used but AES-192 MAY be used. If DH3k or EC25 is 2429 chosen, any AES key size MAY be used. Note that SRTP as defined in 2430 [RFC3711] only supports AES-128. 2432 DH2k is intended to provide acceptable security for low power 2433 applications, or for applications that require faster key 2434 negotiations. NIST asserts in Table 4 of [NIST-SP800-131A] that 2435 DH-2048 is safe to use through 2013. 
The security of DH2k can be
2436     augmented by implementing ZRTP's key continuity features
2437     (Section 16.1). DH2k SHOULD use AES-128. If an implementor must use
2438     slow hardware, DH2k should precede DH3k in the Hello message.

2440     ECDH-521 SHOULD NOT be used, due to disruptive computational delays.
2441     These delays may lead to exhaustion of the retransmission schedule,
2442     unless both endpoints have very fast hardware. Note that ECDH-521 is
2443     not part of NSA Suite B.

2445     ZRTP also defines two non-DH modes, Multistream and Preshared, in
2446     which the SRTP key is derived from a shared secret and some nonce
2447     material.

2449     The table below lists the pv length in words and DHPart1 and DHPart2
2450     message length in words for each Key Agreement Type Block.

2452       Key Agreement | pv    | message | Meaning
2453       Type Block    | words | words   |
2454       -----------------------------------------------------------
2455       "DH3k"        |  96   |  117    | DH mode with p=3072 bit prime
2456                     |       |         | per RFC 3526, Section 4.
2457       -----------------------------------------------------------
2458       "DH2k"        |  64   |   85    | DH mode with p=2048 bit prime
2459                     |       |         | per RFC 3526, Section 3.
2460       -----------------------------------------------------------
2461       "EC25"        |  16   |   37    | Elliptic Curve DH, P-256
2462                     |       |         | per RFC 5114, Section 2.6
2463       -----------------------------------------------------------
2464       "EC38"        |  24   |   45    | Elliptic Curve DH, P-384
2465                     |       |         | per RFC 5114, Section 2.7
2466       -----------------------------------------------------------
2467       "EC52"        |  33   |   54    | Elliptic Curve DH, P-521
2468                     |       |         | per RFC 5114, Section 2.8
2469                     |       |         | (deprecated - do not use)
2470       -----------------------------------------------------------
2471       "Prsh"        |   -   |    -    | Preshared Non-DH mode
2472       -----------------------------------------------------------
2473       "Mult"        |   -   |    -    | Multistream Non-DH mode
2474       -----------------------------------------------------------

2476                Table 5. Key Agreement Type Block Values

2478  5.1.6.
SAS Type Block 2480 The SAS Type determines how the SAS is rendered to the user so that 2481 the user may verbally compare it with his partner over the voice 2482 channel. This allows detection of a MiTM attack. 2484 All ZRTP endpoints MUST support the base32 and MAY support the 2485 base256 rendering schemes for the Short Authentication String, and 2486 other SAS rendering schemes. See Section 4.5.2 for how the sasvalue 2487 is computed and Section 7 for how the SAS is used. 2489 SAS Type Block | Meaning 2490 --------------------------------------------------- 2491 "B32 " | Short Authentication String using 2492 | base32 encoding 2493 --------------------------------------------------- 2494 "B256" | Short Authentication String using 2495 | base256 encoding (PGP Word List) 2496 --------------------------------------------------- 2498 Table 6. SAS Type Block Values 2499 For the SAS Type of "B256", the most-significant (leftmost) 16 bits 2500 of the 32-bit sasvalue are rendered in network byte order using the 2501 PGP Word List [pgpwordlist] [Juola1][Juola2]. 2503 For the SAS Type of "B32 ", the most-significant (leftmost) 20 bits 2504 of the 32-bit sasvalue are rendered as a form of base32 encoding. 2505 The leftmost 20 bits of the sasvalue results in four base32 2506 characters that are rendered, most-significant quintet first, to both 2507 ZRTP endpoints. Here is a normative pseudocode implementation of the 2508 base32 function: 2510 char[4] base32(uint32 bits) 2511 { int i, n, shift; 2512 char result[4]; 2513 for (i=0,shift=27; i!=4; ++i,shift-=5) 2514 { n = (bits>>shift) & 31; 2515 result[i] = "ybndrfg8ejkmcpqxot1uwisza345h769"[n]; 2516 } 2517 return result; 2518 } 2520 This base32 encoding scheme differs from RFC 4648, and was designed 2521 (by Bryce Wilcox-O'Hearn) to represent bit sequences in a form that 2522 is convenient for human users to manipulate with minimal ambiguity. 
   The unusually permuted character ordering was designed for other
   applications that use bit sequences that do not end on quintet
   boundaries.

5.1.7. Signature Type Block

   The Signature Type Block specifies what signature algorithm is used
   to sign the SAS as discussed in Section 7.2. The 4-octet Signature
   Type Block, along with the accompanying signature block, are OPTIONAL
   and may be present in the Confirm message (Figure 10) or the SASrelay
   message (Figure 16). The signature types are given in the table
   below.

    Signature  | Meaning
    Type Block |
    ------------------------------------------------
    "PGP "     | OpenPGP Signature, per RFC 4880
               |
    ------------------------------------------------
    "X509"     | ECDSA, with X.509v3 cert
               | per RFC 5759 and FIPS-186-3
    ------------------------------------------------

              Table 7. Signature Type Block Values

   Additional details on the signature and signing key format may be
   found in Section 7.2. OpenPGP signatures (Signature Type "PGP ") are
   discussed in Section 7.2.1. The ECDSA curves are over prime fields
   only, drawn from Appendix D.1.2 of [FIPS-186-3]. X.509v3 ECDSA
   Signatures (Signature Type "X509") are discussed in Section 7.2.2.

5.2. Hello Message

   The Hello message has the format shown in Figure 3.

   All ZRTP messages begin with the preamble value 0x505a, then a 16-bit
   length in 32-bit words. This length includes only the ZRTP message
   (including the preamble and the length) but not the ZRTP packet
   header or CRC. The 8-octet Message Type follows the length field.

   Next, there is a 4-character string containing the version (ver) of
   the ZRTP protocol, which is "1.10" for this specification. Next,
   there is the Client Identifier string (cid), which is 4 words long
   and identifies the vendor and release of the ZRTP software. The
   256-bit hash image H3 is defined in Section 9.
   The next parameter is the ZID, the 96-bit-long unique identifier for
   the ZRTP endpoint, defined in Section 4.9.

   The next four bits consist of a zero bit followed by three flag
   bits:

   o  The Signature-capable flag (S) indicates this Hello message is
      sent from a ZRTP endpoint which is able to parse and verify
      digital signatures, as described in Section 7.2. If signatures
      are not supported, the (S) flag MUST be set to zero.

   o  The MiTM flag (M) is a Boolean that is set to true if and only if
      this Hello message is sent from a device, usually a PBX, that has
      the capability to send an SASrelay message (Section 5.13).

   o  The Passive flag (P) is a Boolean normally set to false, and is
      set to true if and only if this Hello message is sent from a
      device that is configured to never send a Commit message
      (Section 5.4). This would mean it cannot initiate secure
      sessions, but may act as a responder.

   The next 8 bits are unused and SHOULD be set to zero when sent and
   MUST be ignored on receipt.

   Next is a list of supported Hash algorithms, Cipher algorithms, SRTP
   Auth Tag Types, Key Agreement Types, and SAS Types. The number of
   algorithms listed for each type is given by a count field: hc=hash
   count, cc=cipher count, ac=auth tag count, kc=key agreement count,
   and sc=sas count. The values for these algorithms are defined in
   Tables 2, 3, 4, 5, and 6. A count of zero means that only the
   mandatory-to-implement algorithms are supported. Mandatory
   algorithms MAY be included in the list. The order of the list
   indicates the preferences of the endpoint. If a mandatory algorithm
   is not included in the list, it is implicitly added to the end of
   the list for preference.

   The 64-bit MAC at the end of the message is computed across the whole
   message, not including the MAC, using the MAC algorithm defined in
   Section 5.1.2.2.
   The MAC key is the sender's H2 (defined in Section 9), and thus the
   MAC cannot be checked by the receiving party until the sender's H2
   value is known to the receiving party later in the protocol.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|             length            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Hello   " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    version="1.10" (1 word)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                  Client Identifier (4 words)                  |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Hash image H3 (8 words)                    |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         ZID (3 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|S|M|P| unused (zeros)|  hc   |  cc   |  ac   |  kc   |  sc   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                hash algorithms (0 to 7 values)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               cipher algorithms (0 to 7 values)               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                auth tag types (0 to 7 values)                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              Key Agreement Types (0 to 7 values)              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   SAS Types (0 to 7 values)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         MAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 3: Hello Message Format

5.3. HelloACK Message

   The HelloACK message is used to stop retransmissions of a Hello
   message. A HelloACK is sent regardless of whether the version number
   or the algorithm list in the Hello is supported. The receipt of a
   HelloACK stops retransmission of the Hello message. The format is
   shown in the figure below. A Commit message may be sent in place of
   a HelloACK by an Initiator, if a Commit message is ready to be sent
   promptly.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=3 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="HelloACK" (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 4: HelloACK Message Format

5.4. Commit Message

   The Commit message is sent to initiate the key agreement process
   after both sides have received a Hello message, which means it can
   only be sent after receiving both a Hello message and a HelloACK
   message. There are three subtypes of Commit messages, whose formats
   are shown in Figures 5, 6, and 7.

   The Commit message contains the Message Type Block, then the 256-bit
   hash image H2, which is defined in Section 9. The next parameter is
   the initiator's ZID, the 96-bit-long unique identifier for the ZRTP
   endpoint, which MUST have the same value as was used in the Hello
   message.

   Next, there is a list of algorithms selected by the initiator (hash,
   cipher, auth tag type, key agreement, sas type). For a DH Commit,
   the hash value hvi is a hash of the DHPart2 of the Initiator and the
   Responder's Hello message, as explained in Section 4.4.1.1.
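
   As a rough cross-check of the field list above against the
   length=29 words given for the DH Commit in Figure 5, the fields can
   be laid out as a C structure. This is only a sketch: the type name
   zrtp_dh_commit, the constant ZRTP_DH_COMMIT_WORDS, and the helper
   zrtp_dh_commit_init() are hypothetical names introduced for
   illustration, and the structure is assumed to be used as a plain
   octet buffer (all members are octet arrays, so no padding is
   expected).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical octet-level view of a DH Commit message (Figure 5).
 * Every member is a byte array, so the compiler inserts no padding. */
typedef struct {
    uint8_t preamble[2];       /* 0x50 0x5a */
    uint8_t length[2];         /* length in 32-bit words, big-endian */
    uint8_t msg_type[8];       /* "Commit  ", space padded to 8 octets */
    uint8_t h2[32];            /* hash image H2 */
    uint8_t zid[12];           /* initiator's ZID */
    uint8_t hash_alg[4];
    uint8_t cipher_alg[4];
    uint8_t auth_tag_type[4];
    uint8_t key_agreement[4];
    uint8_t sas_type[4];
    uint8_t hvi[32];
    uint8_t mac[8];
} zrtp_dh_commit;

/* 116 octets, i.e. 29 words, matching length=29 in Figure 5. */
enum { ZRTP_DH_COMMIT_WORDS = sizeof(zrtp_dh_commit) / 4 };

/* Zero the message and fill in preamble, length, and message type. */
static void zrtp_dh_commit_init(zrtp_dh_commit *m)
{
    assert(m != NULL);
    memset(m, 0, sizeof *m);
    m->preamble[0] = 0x50;
    m->preamble[1] = 0x5a;
    m->length[0] = (uint8_t)(ZRTP_DH_COMMIT_WORDS >> 8);
    m->length[1] = (uint8_t)(ZRTP_DH_COMMIT_WORDS & 0xff);
    memcpy(m->msg_type, "Commit  ", 8);
}
```

   The same approach would give 25 and 27 words for the Multistream
   and Preshared variants if hvi is replaced by the 4-word nonce (plus
   the 2-word keyID in the Preshared case).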

   The 64-bit MAC at the end of the message is computed across the whole
   message, not including the MAC, using the MAC algorithm defined in
   Section 5.1.2.2. The MAC key is the sender's H1 (defined in
   Section 9), and thus the MAC cannot be checked by the receiving party
   until the sender's H1 value is known to the receiving party later in
   the protocol.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=29 words        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Commit  " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Hash image H2 (8 words)                    |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         ZID (3 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        hash algorithm                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       cipher algorithm                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         auth tag type                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Key Agreement Type                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           SAS Type                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         hvi (8 words)                         |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         MAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 5: DH Commit Message Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=25 words        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Commit  " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Hash image H2 (8 words)                    |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         ZID (3 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        hash algorithm                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       cipher algorithm                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         auth tag type                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Key Agreement Type = "Mult"                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           SAS Type                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                        nonce (4 words)                        |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         MAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 6: Multistream Commit Message Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=27 words        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Commit  " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Hash image H2 (8 words)                    |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         ZID (3 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        hash algorithm                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       cipher algorithm                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         auth tag type                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Key Agreement Type = "Prsh"                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           SAS Type                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                        nonce (4 words)                        |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        keyID (2 words)                        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         MAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 7: Preshared Commit Message Format

5.5. DHPart1 Message

   The DHPart1 message shown in Figure 8 begins the DH exchange. It is
   sent by the Responder if a valid Commit message is received from the
   Initiator.
   The length of the pvr value and the length of the DHPart1 message
   depend on the Key Agreement Type chosen. This information is
   contained in the table in Section 5.1.5. Note that for both
   Multistream and Preshared modes, no DHPart1 or DHPart2 message will
   be sent.

   The 256-bit hash image H1 is defined in Section 9.

   The next four parameters are non-invertible hashes (computed in
   Section 4.3.1) of potential shared secrets used in generating the
   ZRTP secret s0. The first two, rs1IDr and rs2IDr, are the hashes of
   the responder's two retained shared secrets, truncated to 64 bits.
   Next, there is auxsecretIDr, a hash of the responder's auxsecret
   (defined in Section 4.3), truncated to 64 bits. The last parameter,
   pbxsecretIDr, is a hash of the trusted MiTM PBX shared secret
   pbxsecret, defined in Section 7.3.1.

   The 64-bit MAC at the end of the message is computed across the whole
   message, not including the MAC, using the MAC algorithm defined in
   Section 5.1.2.2. The MAC key is the sender's H0 (defined in
   Section 9), and thus the MAC cannot be checked by the receiving party
   until the sender's H0 value is known to the receiving party later in
   the protocol.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|   length=depends on KA Type   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="DHPart1 " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Hash image H1 (8 words)                    |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       rs1IDr (2 words)                        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       rs2IDr (2 words)                        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    auxsecretIDr (2 words)                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    pbxsecretIDr (2 words)                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                pvr (length depends on KA Type)                |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         MAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 8: DHPart1 Message Format

5.6. DHPart2 Message

   The DHPart2 message, shown in Figure 9, completes the DH exchange.
   It is sent by the Initiator if a valid DHPart1 message is received
   from the Responder. The length of the pvi value and the length of
   the DHPart2 message depend on the Key Agreement Type chosen. This
   information is contained in the table in Section 5.1.5. Note that
   for both Multistream and Preshared modes, no DHPart1 or DHPart2
   message will be sent.

   The 256-bit hash image H1 is defined in Section 9.

   The next four parameters are non-invertible hashes (computed in
   Section 4.3.1) of potential shared secrets used in generating the
   ZRTP secret s0. The first two, rs1IDi and rs2IDi, are the hashes of
   the initiator's two retained shared secrets, truncated to 64 bits.
   Next, there is auxsecretIDi, a hash of the initiator's auxsecret
   (defined in Section 4.3), truncated to 64 bits. The last parameter,
   pbxsecretIDi, is a hash of the trusted MiTM PBX shared secret
   pbxsecret, defined in Section 7.3.1.
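
   The message lengths listed in the table in Section 5.1.5 appear to
   follow from the DHPart1 and DHPart2 field lists: 3 words of header,
   8 words of H1, four 2-word secret IDs, and a 2-word MAC give 21
   fixed words, to which the pv length is added. A minimal sketch of
   that arithmetic follows. The function names are hypothetical, and
   the return value of -1 for types that have no DHPart exchange is a
   convention of this example, not part of the protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Words in DHPart1/DHPart2 other than the public value:
 * header (3) + H1 (8) + four 2-word secret IDs (8) + MAC (2). */
enum { ZRTP_DHPART_FIXED_WORDS = 3 + 8 + 4 * 2 + 2 };

/* pv length in words for a 4-octet Key Agreement Type Block, or -1
 * when the type has no DHPart messages or is not recognized. */
static int zrtp_pv_words(const char *ka_type)
{
    static const struct { const char *name; int pv_words; } table[] = {
        { "DH3k", 96 }, { "DH2k", 64 }, { "EC25", 16 },
        { "EC38", 24 }, { "EC52", 33 },
    };
    size_t i;

    assert(ka_type != NULL);
    for (i = 0; i < sizeof table / sizeof table[0]; ++i)
        if (strncmp(ka_type, table[i].name, 4) == 0)
            return table[i].pv_words;
    return -1;
}

/* Total DHPart1/DHPart2 message length in words, or -1 as above. */
static int zrtp_dhpart_words(const char *ka_type)
{
    int pv = zrtp_pv_words(ka_type);
    return pv < 0 ? -1 : ZRTP_DHPART_FIXED_WORDS + pv;
}
```

   For example, zrtp_dhpart_words("DH3k") gives 96 + 21 = 117, the
   value listed for DH3k.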

   The 64-bit MAC at the end of the message is computed across the whole
   message, not including the MAC, using the MAC algorithm defined in
   Section 5.1.2.2. The MAC key is the sender's H0 (defined in
   Section 9), and thus the MAC cannot be checked by the receiving party
   until the sender's H0 value is known to the receiving party later in
   the protocol.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|   length=depends on KA Type   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="DHPart2 " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Hash image H1 (8 words)                    |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       rs1IDi (2 words)                        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       rs2IDi (2 words)                        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    auxsecretIDi (2 words)                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    pbxsecretIDi (2 words)                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                pvi (length depends on KA Type)                |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         MAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 9: DHPart2 Message Format

5.7. Confirm1 and Confirm2 Messages

   The Confirm1 message is sent by the Responder in response to a valid
   DHPart2 message after the SRTP session key and parameters have been
   negotiated. The Confirm2 message is sent by the Initiator in
   response to a Confirm1 message. The format is shown in Figure 10.
   The message contains the Message Type Block "Confirm1" or "Confirm2".
   Next, there is the confirm_mac, a MAC computed over the encrypted
   part of the message (shown enclosed by "====" in Figure 10). This
   confirm_mac is keyed and computed according to Section 4.6. The next
   16 octets contain the CFB Initialization Vector. The rest of the
   message is encrypted using CFB and protected by the confirm_mac.

   The first field inside the encrypted region is the hash preimage H0,
   which is defined in detail in Section 9.

   The next 15 bits are not used and SHOULD be set to zero when sent and
   MUST be ignored when received in Confirm1 or Confirm2 messages.

   The next 9 bits contain the signature length. If no SAS signature
   (described in Section 7.2) is present, all bits are set to zero. The
   signature length is in words and includes the signature type block.
   If the calculated signature octet count is not a multiple of 4, zeros
   are added to pad it out to a word boundary. If no signature is
   present, the overall length of the Confirm1 or Confirm2 message will
   be set to 19 words.

   The next 8 bits are used for flags. Undefined flags are set to zero
   when sent and ignored on receipt. Four flags are currently defined.
   The PBX Enrollment flag (E) is a Boolean bit defined in
   Section 7.3.1. The SAS Verified flag (V) is a Boolean bit defined in
   Section 7.1. The Allow Clear flag (A) is a Boolean bit defined in
   Section 4.7.2. The Disclosure Flag (D) is a Boolean bit defined in
   Section 11.

   The next 32-bit word is the cache expiration interval, defined in
   Section 4.9.

   If the signature length (in words) is non-zero, a signature type
   block is present, followed by the signature block. The signature
   block includes the signature and the key (or a link to the key) used
   to generate the signature (Section 7.2).
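
   The 32-bit word that follows H0 can be packed and unpacked as
   sketched below, assuming the word has already been converted from
   network to host byte order. All names are hypothetical. The
   total-length helper assumes that a message carrying a signature is
   19 words plus the signature length; this is inferred from the
   statements above that the signature length includes the signature
   type block and that the message is 19 words when no signature is
   present.

```c
#include <assert.h>
#include <stdint.h>

/* Flag bits in the low octet of the word after H0 (Figure 10). */
enum {
    ZRTP_FLAG_D = 0x01,   /* Disclosure */
    ZRTP_FLAG_A = 0x02,   /* Allow Clear */
    ZRTP_FLAG_V = 0x04,   /* SAS Verified */
    ZRTP_FLAG_E = 0x08    /* PBX Enrollment */
};

enum { ZRTP_CONFIRM_BASE_WORDS = 19 };

/* Build the word: 15 unused bits, 9-bit sig len, 8 flag bits. */
static uint32_t zrtp_confirm_pack(unsigned sig_len_words, unsigned flags)
{
    assert(sig_len_words <= 0x1ff);
    return ((uint32_t)sig_len_words << 8) | (flags & 0x0f);
}

/* Signature length in words (0 when no signature is present). */
static unsigned zrtp_confirm_sig_len(uint32_t word)
{
    return (unsigned)((word >> 8) & 0x1ff);
}

/* Defined flags only; undefined flag bits are ignored. */
static unsigned zrtp_confirm_flags(uint32_t word)
{
    return (unsigned)(word & 0x0f);
}

/* Overall Confirm1/Confirm2 length in words (assumption, see above). */
static unsigned zrtp_confirm_words(uint32_t word)
{
    return ZRTP_CONFIRM_BASE_WORDS + zrtp_confirm_sig_len(word);
}
```

   A receiver would typically test individual flags with expressions
   such as (zrtp_confirm_flags(word) & ZRTP_FLAG_V) != 0.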

   CFB mode [NIST-SP800-38A] is applied with a feedback length of 128
   bits, a full cipher block, and the final block is truncated to match
   the exact length of the encrypted data. The CFB Initialization
   Vector is a 128-bit random nonce. The block cipher algorithm and the
   key size are the same as the negotiated block cipher (Section 5.1.3)
   for media encryption. CFB is used to encrypt the part of the
   Confirm1 message beginning after the CFB IV to the end of the message
   (the encrypted region is enclosed by "====" in Figure 10).

   The responder uses the zrtpkeyr to encrypt the Confirm1 message. The
   initiator uses the zrtpkeyi to encrypt the Confirm2 message.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=variable        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Message Type Block="Confirm1" or "Confirm2" (2 words)     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     confirm_mac (2 words)                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |              CFB Initialization Vector (4 words)              |
   |                                                               |
   |                                                               |
   +===============================================================+
   |                                                               |
   |                  Hash preimage H0 (8 words)                   |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Unused (15 bits of zeros)   | sig len (9 bits)|0 0 0 0|E|V|A|D|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              cache expiration interval (1 word)               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       optional signature type block (1 word if present)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |          optional signature block (variable length)           |
   |                             . . .                             |
   |                                                               |
   |                                                               |
   +===============================================================+

             Figure 10: Confirm1 and Confirm2 Message Format

5.8. Conf2ACK Message

   The Conf2ACK message is sent by the Responder in response to a valid
   Confirm2 message. The message format for the Conf2ACK is shown in
   the figure below. The receipt of a Conf2ACK stops retransmission of
   the Confirm2 message. Note that the first SRTP media (with a valid
   SRTP auth tag) from the responder also stops retransmission of the
   Confirm2 message.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=3 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Conf2ACK" (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 11: Conf2ACK Message Format

5.9. Error Message

   The Error message is sent to terminate an in-process ZRTP key
   agreement exchange due to an error. The format is shown in the
   figure below. The use of the Error message is described in
   Section 4.7.1.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=4 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Error   " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Integer Error Code (1 word)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 12: Error Message Format

   Defined hexadecimal values for the Error Code are listed in the table
   below.

    Error Code | Meaning
    -----------------------------------------------------------
    0x10       | Malformed packet (CRC OK, but wrong structure)
    -----------------------------------------------------------
    0x20       | Critical software error
    -----------------------------------------------------------
    0x30       | Unsupported ZRTP version
    -----------------------------------------------------------
    0x40       | Hello components mismatch
    -----------------------------------------------------------
    0x51       | Hash Type not supported
    -----------------------------------------------------------
    0x52       | Cipher Type not supported
    -----------------------------------------------------------
    0x53       | Public key exchange not supported
    -----------------------------------------------------------
    0x54       | SRTP auth tag not supported
    -----------------------------------------------------------
    0x55       | SAS rendering scheme not supported
    -----------------------------------------------------------
    0x56       | No shared secret available, DH mode required
    -----------------------------------------------------------
    0x61       | DH Error: bad pvi or pvr ( == 1, 0, or p-1)
    -----------------------------------------------------------
    0x62       | DH Error: hvi != hashed data
    -----------------------------------------------------------
    0x63       | Received relayed SAS from untrusted MiTM
    -----------------------------------------------------------
    0x70       | Auth Error: Bad Confirm pkt MAC
    -----------------------------------------------------------
    0x80       | Nonce reuse
    -----------------------------------------------------------
    0x90       | Equal ZIDs in Hello
    -----------------------------------------------------------
    0x91       | SSRC collision
    -----------------------------------------------------------
    0xA0       | Service unavailable
    -----------------------------------------------------------
    0xB0       | Protocol timeout error
    -----------------------------------------------------------
    0x100      | GoClear message received, but not allowed
    -----------------------------------------------------------

                       Table 8. ZRTP Error Codes

5.10. ErrorACK Message

   The ErrorACK message is sent in response to an Error message. The
   receipt of an ErrorACK stops retransmission of the Error message.
   The format is shown in the figure below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=3 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="ErrorACK" (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 13: ErrorACK Message Format

5.11. GoClear Message

   Support for the GoClear message is OPTIONAL in the protocol, and it
   is sent to switch from SRTP to RTP. The format is shown in the
   figure below. The clear_mac is used to authenticate the GoClear
   message so that bogus GoClear messages introduced by an attacker can
   be detected and discarded. The use of GoClear is described in
   Section 4.7.2.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=5 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="GoClear " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      clear_mac (2 words)                      |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 14: GoClear Message Format

5.12. ClearACK Message

   Support for the ClearACK message is OPTIONAL in the protocol, and it
   is sent to acknowledge receipt of a GoClear. A ClearACK is only sent
   if the clear_mac from the GoClear message is authenticated.
   Otherwise, no response is returned. The format is shown in the
   figure below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=3 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="ClearACK" (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 15: ClearACK Message Format

5.13. SASrelay Message

   The SASrelay message is sent by a trusted MiTM, most often a PBX. It
   is not sent as a response to a packet, but is sent as a self-
   initiated packet by the trusted MiTM (Section 7.3). It can only be
   sent after the rest of the ZRTP key negotiations have completed,
   after the Confirm messages and their ACKs. It can only be sent after
   the trusted MiTM has finished key negotiations with the other party,
   because it is the other party's SAS that is being relayed. It is
   sent with retry logic until a RelayACK message (Section 5.14) is
   received or the retry schedule has been exhausted.

   If a device, usually a PBX, sends an SASrelay message, it MUST have
   previously declared itself as a MiTM device by setting the MiTM (M)
   flag in the Hello message (Section 5.2). If the receiver of the
   SASrelay message did not previously receive a Hello message with the
   MiTM (M) flag set, the Relayed SAS SHOULD NOT be rendered. A
   RelayACK is still sent, but no Error message is sent.

   The SASrelay message format is shown in Figure 16. The message
   contains the Message Type Block "SASrelay".
   Next, there is a MAC computed over the encrypted part of the message
   (shown enclosed by "====" in Figure 16). This MAC is keyed the same
   way as the confirm_mac in the Confirm messages (see Section 4.6).
   The next 16 octets contain the CFB Initialization Vector. The rest
   of the message is encrypted using CFB and protected by the MAC.

   The next 15 bits are not used and SHOULD be set to zero when sent,
   and they MUST be ignored when received in SASrelay messages.

   The next 9 bits contain the signature length. The trusted MiTM MAY
   compute a digital signature on the SAS hash, as described in
   Section 7.2, using a persistent signing key owned by the trusted
   MiTM. If no SAS signature is present, all bits are set to zero. The
   signature length is in words and includes the signature type block.
   If the calculated signature octet count is not a multiple of 4, zeros
   are added to pad it out to a word boundary. If no signature block is
   present, the overall length of the SASrelay message will be set to 19
   words.

   The next 8 bits are used for flags. Undefined flags are set to zero
   when sent and ignored on receipt. Three flags are currently defined.
   The Disclosure Flag (D) is a Boolean bit defined in Section 11. The
   Allow Clear flag (A) is a Boolean bit defined in Section 4.7.2. The
   SAS Verified flag (V) is a Boolean bit defined in Section 7.1. These
   flags update the values of the same flags provided earlier in the
   Confirm message, and reflect the flag information relayed by the PBX
   from the other party.

   The relayed V flag comes from the ZRTP endpoint on the other side of
   the PBX. If this relayed V flag is zero, the local ZRTP user agent
   should render a conspicuous display of the SAS to prompt the human to
   verbally verify it. However, a relayed V flag should not affect the
   local V flag, unlike the V flag received in the Confirm message.
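
   The receiver-side rules above for the MiTM (M) flag and the relayed
   V flag can be summarized as a small decision function. This sketch
   assumes the SASrelay message has already passed its MAC check and
   leaves out the trust decision of Section 7.3. The structure and
   function names are hypothetical.

```c
#include <stdbool.h>

/* What the receiver does with an authenticated SASrelay message. */
typedef struct {
    bool send_relay_ack;       /* RelayACK is sent in both cases */
    bool render_relayed_sas;   /* only if the sender's Hello had M set */
    bool prompt_verification;  /* conspicuous SAS display when V is 0 */
    bool local_v_flag;         /* never changed by the relayed V flag */
} zrtp_sasrelay_action;

static zrtp_sasrelay_action zrtp_on_sasrelay(bool peer_hello_m_flag,
                                             bool relayed_v_flag,
                                             bool local_v_flag)
{
    zrtp_sasrelay_action a;

    a.send_relay_ack = true;
    a.render_relayed_sas = peer_hello_m_flag;
    a.prompt_verification = peer_hello_m_flag && !relayed_v_flag;
    a.local_v_flag = local_v_flag;
    return a;
}
```

   Note that no Error message appears among the actions: when the M
   flag was not seen, the relayed SAS is simply not rendered.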

The next 32-bit word contains the SAS rendering scheme for the
relayed sashash, which will be the same rendering scheme used by the
other party on the other side of the trusted MiTM. Section 7.3
describes how the PBX determines whether the ZRTP client regards the
PBX as a trusted MiTM. If the PBX determines that the ZRTP client
trusts the PBX, the next 8 words contain the sashash relayed from the
other party. The first 32-bit word of the sashash contains the
sasvalue, which may be rendered to the user using the specified SAS
rendering scheme. If this SASrelay message is being sent to a ZRTP
client that does not trust this MiTM, the sashash will be ignored by
the recipient and should be set to zeros by the PBX.

If the signature length (in words) is non-zero, a signature type
block will be present, followed by the signature block. The
signature block includes the signature and the key (or a link to the
key) used to generate the signature (Section 7.2).

CFB mode [NIST-SP800-38A] is applied with a feedback length of 128
bits, a full cipher block, and the final block is truncated to match
the exact length of the encrypted data. The CFB Initialization
Vector is a 128-bit random nonce. The block cipher algorithm and key
size are the same as those of the negotiated block cipher
(Section 5.1.3) for media encryption. CFB is used to encrypt the
part of the SASrelay message beginning after the CFB IV to the end of
the message (the encrypted region is enclosed by "====" in
Figure 16).

Depending on whether the trusted MiTM took the role of the
initiator or the responder during the ZRTP key negotiation, the
SASrelay message is encrypted with zrtpkeyi or zrtpkeyr.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=variable        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Message Type Block="SASrelay" (2 words)            |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         MAC (2 words)                         |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|              CFB Initialization Vector (4 words)              |
|                                                               |
|                                                               |
+===============================================================+
|  Unused (15 bits of zeros)  | sig len (9 bits)|0 0 0 0|0|V|A|D|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           rendering scheme of relayed SAS (1 word)            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|            Trusted MiTM relayed sashash (8 words)             |
|                             . . .                             |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      optional signature type block (1 word if present)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|          optional signature block (variable length)           |
|                             . . .                             |
|                                                               |
|                                                               |
+===============================================================+

               Figure 16: SASrelay Message Format

5.14. RelayACK Message

The RelayACK message is sent in response to a valid SASrelay message.
The message format for the RelayACK is shown in the figure below.
The receipt of a RelayACK stops retransmission of the SASrelay
message.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|         length=3 words        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Message Type Block="RelayACK" (2 words)            |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 17: RelayACK Message Format

5.15. Ping Message

The Ping and PingACK messages are unrelated to the rest of the ZRTP
protocol. No ZRTP endpoint is required to generate a Ping message,
but every ZRTP endpoint MUST respond to a Ping message with a PingACK
message.

Although Ping and PingACK messages have no effect on the rest of the
ZRTP protocol, their inclusion in this specification simplifies the
design of "bump-in-the-wire" ZRTP proxies (Section 10) (notably,
[Zfone]). It enables proxies to be designed that do not rely on
assistance from the signaling layer to map out the associations
between media streams and ZRTP endpoints.

Before sending a ZRTP Hello message, a ZRTP proxy MAY send a Ping
message as a means to sort out which RTP media streams are connected
to particular ZRTP endpoints. Ping messages are generated only by
ZRTP proxies. If neither party is a ZRTP proxy, no Ping messages
will be encountered. Ping retransmission behavior is discussed in
Section 6.

The Ping message (Figure 18) contains an "EndpointHash", defined in
Section 5.16.

The Ping message contains a version number that defines what version
of PingACK is requested. If that version number is supported by the
Ping responder, a PingACK with a format that matches that version
will be received. Otherwise, a PingACK with a lower version number
may be received.
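As an informal illustration only, the Ping message body of Figure 18
could be assembled as sketched below. The sketch makes several
assumptions: fields are in network byte order, the Message Type Block
is padded with spaces to 8 octets, and the length field counts the
words of the message body starting with the preamble word (which is
consistent with length=6 for Ping). The enclosing ZRTP packet header
and CRC are not shown, and the function names are hypothetical.

```python
import struct

ZRTP_PREAMBLE = 0x505A  # bit pattern 0101 0000 0101 1010 shown in the figures


def build_message(msg_type: str, payload: bytes) -> bytes:
    """Prefix a payload with the preamble, length (in words), and type block."""
    type_block = msg_type.ljust(8).encode("ascii")  # assumed space padding
    if len(type_block) != 8:
        raise ValueError("message type must be at most 8 characters")
    if len(payload) % 4 != 0:
        raise ValueError("payload must be a whole number of 32-bit words")
    length_words = 1 + 2 + len(payload) // 4  # preamble word + type block + payload
    return struct.pack("!HH", ZRTP_PREAMBLE, length_words) + type_block + payload


def build_ping(endpoint_hash: bytes, version: bytes = b"1.10") -> bytes:
    """Build a Ping body: version (1 word) followed by EndpointHash (2 words)."""
    if len(endpoint_hash) != 8 or len(version) != 4:
        raise ValueError("EndpointHash is 8 octets and version is 4 octets")
    return build_message("Ping", version + endpoint_hash)
```

The same hypothetical build_message helper, called with an empty
payload, yields the 3-word ClearACK and RelayACK bodies.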

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|         length=6 words        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Message Type Block="Ping    " (2 words)            |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    version="1.10" (1 word)                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    EndpointHash (2 words)                     |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 18: Ping Message Format

5.15.1. Rationale for Ping messages

Ping messages are useful for implementing ZRTP proxies. A ZRTP proxy
(Section 10) is a "bump-in-the-wire" that sits between a (usually
non-ZRTP-enabled) VoIP client and the Internet. It attempts to
secure the VoIP call by examining the RTP media streams, detecting
the call, and intervening to encrypt the call "on the fly".

This is not always easy to do, as it may have to be done without help
from the signaling layer. The VoIP client may make internal
decisions on how to do NAT traversal, which are not readily apparent
to the proxy. The proxy has to reverse engineer this knowledge by
inspecting all the RTP streams. The RTP stream from Alice to Bob
might not follow the same path, through the same ports, as the RTP
stream from Bob to Alice. One stream may go directly peer to peer,
while the reverse stream may take a detour through a media relay.
The two parties may have both audio and video streams between them,
and may also be simultaneously talking to others in a conference
call, and some of those parties may be behind the same PBX. All of
these RTP streams have to be sorted out and associated with the
correct ZRTP endpoints.
Related audio and video streams have to be
matched up between two parties, and not confused with other streams
to nearby parties behind the same PBX. Ping and PingACK messages
make this possible.

5.16. PingACK Message

A PingACK message is sent only in response to a Ping. A ZRTP
endpoint MUST respond to a Ping with a PingACK message. The version
of PingACK requested is contained in the Ping message. If that
version number is supported, a PingACK with a format that matches
that version MUST be sent. Otherwise, if the version number of the
Ping is not supported, a PingACK SHOULD be sent in the format of the
highest supported version known to the Ping responder. Only version
"1.10" is supported in this specification.

The PingACK message carries its own 64-bit EndpointHash, distinct
from the EndpointHash of the other party's Ping message. It is
REQUIRED that it be highly improbable for two participants in a call
to have the same EndpointHash and that an EndpointHash maintains a
persistent value between calls. For a normal ZRTP endpoint, such as
a ZRTP-enabled VoIP client, the EndpointHash can be just the
truncated ZID. For a ZRTP endpoint such as a PBX that has multiple
endpoints behind it, the EndpointHash must be a distinct value for
each endpoint behind it. It is recommended that the EndpointHash be
a truncated hash of the ZID of the ZRTP endpoint concatenated with
something unique about the actual endpoint or phone behind the PBX.
This may be the SIP URI of the phone, the PBX extension number, or
the local IP address of the phone, whichever is more readily
available in the application environment:

   EndpointHash = hash(ZID || SIP URI of the endpoint)

   EndpointHash = hash(ZID || PBX extension number of the endpoint)

   EndpointHash = hash(ZID || local IP address of the endpoint)

Any of these formulae confers uniqueness for the simple case of
terminating the ZRTP connection at the VoIP client, or the more
complex case of a PBX terminating the ZRTP connection for multiple
VoIP phones in a conference call, all sharing the PBX's ZID, but with
separate IP addresses behind the PBX. There is no requirement for
the same hash function to be used by both parties.

The PingACK message contains the EndpointHash of the sender of the
PingACK as well as the EndpointHash of the sender of the Ping. The
Source Identifier (SSRC) received in the ZRTP header from the Ping
packet (Figure 2) is copied into the PingACK message body
(Figure 19). This SSRC is not the SSRC of the sender of the PingACK.
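As an informal illustration only, the EndpointHash formulae and the
PingACK body of Figure 19 could be realized as sketched below. The
choice of SHA-256 and of truncation to the leftmost 64 bits is an
assumption made for the example (the text above leaves the hash
function open), as are network byte order, the length field counting
words from the preamble, and all function names.

```python
import hashlib
import struct


def endpoint_hash(zid: bytes, unique: bytes = b"") -> bytes:
    """Return a 64-bit EndpointHash.

    With no per-endpoint data this is just the truncated ZID; otherwise it
    is a truncated hash of ZID || unique (SIP URI, extension, or IP address).
    """
    if not unique:
        return zid[:8]
    return hashlib.sha256(zid + unique).digest()[:8]  # SHA-256 is an assumption


def build_pingack(own_hash: bytes, ping_hash: bytes, ping_ssrc: int,
                  version: bytes = b"1.10") -> bytes:
    """Build a 9-word PingACK body following the layout of Figure 19."""
    if len(own_hash) != 8 or len(ping_hash) != 8 or len(version) != 4:
        raise ValueError("hashes are 8 octets and version is 4 octets")
    header = struct.pack("!HH", 0x505A, 9)  # preamble, length=9 words
    return (header + b"PingACK " + version + own_hash + ping_hash
            + struct.pack("!I", ping_ssrc))  # SSRC copied from the received Ping
```

Because each party may use its own hash function, a receiver should
treat the peer's EndpointHash as an opaque 64-bit identifier and only
compare it for equality.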

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|         length=9 words        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Message Type Block="PingACK " (2 words)            |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    version="1.10" (1 word)                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           EndpointHash of PingACK Sender (2 words)            |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            EndpointHash of Received Ping (2 words)            |
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      Source Identifier (SSRC) of Received Ping (1 word)       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 19: PingACK Message Format

6. Retransmissions

ZRTP uses two retransmission timers, T1 and T2. T1 is used for
retransmission of Hello messages, when the support of ZRTP by the
other endpoint may not be known. T2 is used in retransmissions of
all the other ZRTP messages.

All message retransmissions MUST be identical to the initial message
including nonces, public values, etc.; otherwise, hashes of the
message sequences may not agree.

Practical experience has shown that RTP packet loss at the start of
an RTP session can be extremely high. Since the entire ZRTP message
exchange occurs during this period, the retransmission scheme is
defined to be aggressive. Since ZRTP packets with the exception
of the DHPart1 and DHPart2 messages are small, this should have
minimal effect on overall bandwidth utilization of the media session.

ZRTP endpoints MUST NOT exceed the bandwidth of the resulting media
session as determined by the offer/answer exchange in the signaling
layer.

The Ping message (Section 5.15) may follow the same retransmission
schedule as the Hello message, but this is not required in this
specification. Ping message retransmission is subject to
application-specific ZRTP proxy heuristics.

ZRTP Hello messages are retransmitted at an interval that starts at
T1 and doubles after every retransmission, capping at 200 ms.
T1 has a recommended initial value of 50 ms. A Hello message is
retransmitted 20 times before giving up, which means the entire retry
schedule for Hello messages is exhausted after 3.75 seconds (50 + 100
+ 18*200 ms). Retransmission of a Hello ends upon receipt of a
HelloACK or Commit message.

The post-Hello ZRTP messages are retransmitted only by the session
initiator -- that is, only Commit, DHPart2, and Confirm2 are
retransmitted if the corresponding message from the responder
(DHPart1, Confirm1, or Conf2ACK, respectively) is not received. Note
that the Confirm2 message retransmission can also be stopped by
receiving the first SRTP media (with a valid SRTP auth tag) from the
responder.

The GoClear, Error, and SASrelay messages may be initiated and
retransmitted by either party, and responded to by the other party,
regardless of which party is the overall session initiator. They are
retransmitted if the corresponding response message (ClearACK,
ErrorACK, or RelayACK, respectively) is not received.

Non-Hello (and non-Ping) ZRTP messages are retransmitted at an
interval that starts at T2 and doubles after every
retransmission, capping at 1200 ms. T2 has a recommended initial
value of 150 ms. Each non-Hello message is retransmitted 10 times
before giving up, which means the entire retry schedule is exhausted
after 9.45 seconds (150 + 300 + 600 + 7*1200 ms). Only the initiator
performs retransmissions.
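As an informal check of the arithmetic above, the two retry schedules
can be generated as sketched below. The function name is hypothetical;
the initial values, caps, and retry counts are the recommended values
from the text.

```python
def retry_intervals_ms(initial_ms: int, cap_ms: int, count: int) -> list:
    """Return the waits before each retransmission: doubling, then capped."""
    intervals = []
    interval = initial_ms
    for _ in range(count):
        intervals.append(interval)
        interval = min(interval * 2, cap_ms)
    return intervals


# T1 schedule for Hello: 50 ms initial, 200 ms cap, 20 retransmissions.
hello_schedule = retry_intervals_ms(50, 200, 20)

# T2 schedule for other messages: 150 ms initial, 1200 ms cap, 10 retransmissions.
other_schedule = retry_intervals_ms(150, 1200, 10)

print(sum(hello_schedule) / 1000)  # total Hello retry time in seconds
print(sum(other_schedule) / 1000)  # total non-Hello retry time in seconds
```

The sums come to 3750 ms and 9450 ms, matching the 3.75-second and
9.45-second totals stated above.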
Each message has a response message that
stops retransmissions, as shown in the table below. The higher
value of T2 means that retransmissions will likely occur only in the
event of packet loss.

   Message      Acknowledgement Message
   -------      -----------------------
   Hello        HelloACK or Commit
   Commit       DHPart1 or Confirm1
   DHPart2      Confirm1
   Confirm2     Conf2ACK or SRTP media
   GoClear      ClearACK
   Error        ErrorACK
   SASrelay     RelayACK
   Ping         PingACK

        Table 9. Retransmitted ZRTP Messages and Responses

The retry schedule must handle not only packet loss, but also slow or
heavily loaded peers that need additional time to perform their DH
calculations. The following mitigations are recommended:

o  Slow or heavily loaded ZRTP endpoints that are at risk of taking
   too long to perform their DH calculation SHOULD use a HelloACK
   message instead of a Commit message to reply to a Hello from the
   other party.

o  If a ZRTP endpoint has evidence that the other party is a ZRTP
   endpoint, by receiving a Hello message or Ping message, or by
   receiving a Hello Hash in the signaling layer, it SHOULD extend
   its own Hello retry schedule to span at least 12 seconds of
   retries. If this extended Hello retry schedule is exhausted
   without receiving a HelloACK or Commit message, a late Commit
   message from the peer SHOULD still be accepted.

These recommended retransmission intervals are designed for a typical
broadband Internet connection. In some high-latency communication
channels, such as those provided by some mobile phone environments or
geostationary satellites, a different retransmission schedule may be
used. The initial value for the T1 or T2 retransmission timer should
be increased to be no less than the round-trip time provided by the
communications channel.
It should take into account the time
required to transmit the entire message and the entire reply, as well
as a reasonable time estimate to perform the DH calculation.

ZRTP has its own retransmission schedule because it is carried along
with RTP, usually over UDP. In unusual cases, RTP can run over a
non-UDP transport, such as TCP or DCCP, which provides its own
built-in retransmission mechanism. It may be hard for the ZRTP
endpoint to detect that TCP is being used if media relays are
involved. The ZRTP endpoint may be sending only UDP, but there may
be a media relay along the media path that converts from UDP to TCP
for part of the journey. Or, if the ZRTP endpoint is sending TCP,
the media relay might be converting from TCP to UDP. There have been
empirical observations of this in the wild. In cases where TCP is
used, ZRTP and TCP might together generate some extra
retransmissions. It is tempting to avoid this effect by eliminating
the ZRTP retransmission schedule when connected to a TCP channel, but
that would risk failure of the protocol, because it may not be TCP
all the way to the remote ZRTP endpoint. It only takes a few packets
to complete a ZRTP exchange, so trying to optimize out the extra
retransmissions in that scenario is not worth the risk.

After receiving a Commit message, but before receiving a Confirm2
message, if a ZRTP responder receives no ZRTP messages for more than
10 seconds, the responder MAY send a protocol timeout Error message
and terminate the ZRTP protocol.

7. Short Authentication String

This section will discuss the implementation of the Short
Authentication String, or SAS, in ZRTP.
The SAS can be verbally
compared by the human users reading the string aloud, or it can be
compared by validating an OPTIONAL digital signature (described in
Section 7.2) exchanged in the Confirm1 or Confirm2 messages.

The use of hash commitment in the DH exchange (Section 4.4.1.1)
constrains the attacker to only one guess to generate the correct SAS
in his attack, which means the SAS can be quite short. A 16-bit SAS,
for example, provides the attacker only one chance out of 65536 of
not being detected. How the hash commitment enables the SAS to be so
short is explained in Section 4.4.1.1.

There is only one SAS value computed per call. That is the SAS value
for the first media stream established, which is calculated in
Section 4.5.2. This SAS applies to all media streams for the same
session.

The SAS SHOULD be rendered to the user for authentication. The
rendering of the SAS value through the user interface at both
endpoints depends on the SAS Type agreed upon in the Commit message.
See Section 5.1.6 for a description of how the SAS is rendered to the
user.

The SAS is not treated as a secret value, but it must be compared to
see if it matches at both ends of the communications channel. The
two users verbally compare it using their human voices, human ears,
and human judgement. If it doesn't match, it indicates the presence
of a MiTM attack.

It is worse than useless and absolutely unsafe to rely on a robot
voice from the remote endpoint to compare the SAS, because a robot
voice can be trivially forged by a MiTM. The SAS verbal comparison
can only be done with a real live human at the remote endpoint.

7.1. SAS Verified Flag

The SAS Verified flag (V) is set based on the user indicating that
SAS comparison has been successfully performed.
The SAS Verified
flag is exchanged securely in the Confirm1 and Confirm2 messages
(Figure 10) of the next session. In other words, each party sends
the SAS Verified flag from the previous session in the Confirm
message of the current session. It is perfectly reasonable to have a
ZRTP endpoint that never sets the SAS Verified flag, because it would
require adding complexity to the user interface to allow the user to
set it. The SAS Verified flag is not required to be set, but if it
is available to the client software, it allows for the possibility
that the client software could render to the user that the SAS verify
procedure was carried out in a previous session.

Regardless of whether there is a user interface element to allow the
user to set the SAS Verified flag, it is worth caching a shared
secret, because doing so reduces opportunities for an attacker in the
next call.

If at any time the users carry out the SAS comparison procedure, and
it actually fails to match, then this indicates a very resourceful
MiTM. If the SAS comparison fails on the very first call, that would
indicate an attacker who had some foresight, agility, and fortuitous
positioning, but he is still caught by the SAS comparison. If the
MiTM misses the first call and attacks later, this will trigger a
cache mismatch alarm. If the SAS fails to match without a cache
mismatch alarm, it means the MiTM knows the cached shared secret.
This either implies the MiTM attacker has somehow stolen the cached
shared secret from one of the two parties, or it implies the MiTM
must have been present in all the previous sessions, since the
initial establishment of the first shared secret. This is indeed a
resourceful attacker.
It also means that if at any time he ceases
his participation as a MiTM on one of the calls, the protocol will
detect that the cached shared secret is no longer valid -- because it
was really two different shared secrets all along, one of them
between Alice and the attacker, and the other between the attacker
and Bob. The continuity of the cached shared secrets makes it
possible to detect the MiTM when he inserts himself into the ongoing
relationship, as well as when he leaves. Also, if the attacker tries
to stay with a long lineage of calls, but fails to execute a DH MiTM
attack for even one missed call, he is permanently excluded. He can
no longer resynchronize with the chain of cached shared secrets.
This is discussed further in Section 15.1.

A user interface element (e.g., a checkbox or button) is needed to
allow the user to tell the software the SAS verify was successful,
causing the software to set the SAS Verified flag (V), which
(together with our cached shared secret) obviates the need to perform
the SAS procedure in the next call. An additional user interface
element can be provided to let the user tell the software he detected
an actual SAS mismatch, which indicates a MiTM attack. The software
can then take appropriate action, clearing the SAS Verified flag and
erasing the cached shared secret from this session. It is up to the
implementer to decide if this added user interface complexity is
warranted.

If the SAS matches, it means there is no MiTM, which also implies it
is now safe to trust a cached shared secret for later calls.
If inattentive users don't bother to check the SAS, it means we don't
know whether there is or is not a MiTM, so even if we do establish a
new cached shared secret, there is a risk that our potential attacker
may have a subsequent opportunity to continue inserting himself in
the call, until we finally get around to checking the SAS. If the
SAS matches, it means no attacker was present for any previous
session since we started propagating cached shared secrets, because
this session and all the previous sessions were also authenticated
with a continuous lineage of shared secrets.

7.2. Signing the SAS

In most applications, it is desirable to avoid the added complexity
of a PKI-backed digital signature, which is why ZRTP is designed not
to require it. Nonetheless, in some applications, it may be hard to
arrange for two human users to verbally compare the SAS. Or, an
application may already be using an existing PKI and wants to use it
to augment ZRTP.

To handle these cases, ZRTP allows for an OPTIONAL signature feature,
which allows the SAS to be checked without human participation. The
SAS MAY be signed and the signature sent inside the Confirm1,
Confirm2 (Figure 10), or SASrelay (Figure 16) messages. The
signature type (Section 5.1.7), length of the signature, and the key
used to create the signature (or a link to it) are all sent along
with the signature. The signature is calculated across the entire
SAS hash result (sashash), from which the sasvalue was derived. The
signatures exchanged in the encrypted Confirm1, Confirm2, or SASrelay
messages MAY be used to authenticate the ZRTP exchange. A signature
may be sent only in the initial media stream in a DH or ECDH ZRTP
exchange, not in Multistream mode.

Although the signature is sent, the material that is signed, the
sashash, is not sent with it in the Confirm message, since both
parties have already independently calculated the sashash. That is
not the case for the SASrelay message, which must relay the sashash.

To avoid unnecessary signature calculations, a signature SHOULD NOT
be sent if the other ZRTP endpoint did not set the (S) flag in the
Hello message (Section 5.2).

Note that the choice of hash algorithm used in the digital signature
is independent of the hash used in the sashash. The sashash is
determined by the negotiated Hash Type (Section 5.1.2), while the
hash used by the digital signature is separately defined by the
digital signature algorithm. For example, the sashash may be based
on SHA-256, while the digital signature might use SHA-384, if an
ECDSA P-384 key is used.

If the sashash (which is always truncated to 256 bits) is shorter
than the signature hash, the security is not weakened because the
hash commitment precludes the attacker from searching for sashash
collisions, as explained in Section 4.4.1.1.

ECDSA algorithms may be used with either OpenPGP-formatted keys, or
X.509v3 certificates. If the ZRTP key exchange is ECDH, and the SAS
is signed, then the signature SHOULD be ECDSA, and SHOULD use the
same size curve as the ECDH exchange if an ECDSA key of that size is
available.

If a ZRTP endpoint supports incoming signatures (evidenced by setting
the (S) flag in the Hello message), it SHOULD be able to parse
signatures from the other endpoint in OpenPGP format and MUST be able
to parse them in X.509v3 format.
If the incoming signature is in an
unsupported format, or the trust model does not lead to a trusted
introducer or a trusted certificate authority (CA), another
authentication method may be used if available, such as the SAS
compare, or a cached shared secret from a previous session. If none
of these methods are available, it is up to the ZRTP user agent and
the user to decide whether to proceed with the call, after the user
is informed.

Both ECDSA and DSA [FIPS-186-3] have a feature that allows most of
the signature calculation to be done in advance of the session,
reducing latency during call setup. This is useful for low-power
mobile handsets.

ECDSA is preferred because it has compact keys as well as compact
signatures. If the signature along with its public key certificate
are insufficiently compact, the Confirm message may become too long
for the maximum transmission unit (MTU) size, and UDP fragmentation
may result. Some firewalls and NATs may discard fragmented UDP
packets, which would cause the ZRTP exchange to fail. It is
RECOMMENDED that a ZRTP endpoint avoid sending signatures if they
would cause UDP fragmentation. For a discussion on MTU size and PMTU
discovery, see [RFC1191] and [RFC1981].

From a packet-size perspective, ECDSA and DSA both produce equally
compact signatures for a given signature strength. DSA keys are much
bigger than ECDSA keys, but in the case of OpenPGP signatures, the
public key is not sent along with the signature.

All signatures generated MUST use only NIST-approved hash algorithms,
and MUST avoid using SHA1. This applies to both OpenPGP and X.509v3
signatures. NIST-approved hash algorithms are found in [FIPS-180-3]
or its SHA-3 successor. All ECDSA curves used throughout this spec
are over prime fields, drawn from Appendix D.1.2 of [FIPS-186-3].

7.2.1. OpenPGP Signatures

If the SAS Signature Type (Section 5.1.7) specifies an OpenPGP
signature ("PGP "), the signature-related fields are arranged as
follows.

The first field after the 4-octet Signature Type Block is the OpenPGP
signature. The format of this signature and the algorithms that
create it are specified by [RFC4880] and [RFC6637]. The signature
consists of a complete OpenPGP version 4 signature in binary form
(not Radix-64), as specified in RFC 4880, Section 5.2.3, enclosed in
the full OpenPGP packet syntax. The length of the OpenPGP signature
is parseable from the signature, and depends on the type and length
of the signing key.

If OpenPGP signatures are supported, an implementation SHOULD NOT
generate signatures using any signature algorithm other than DSA or
ECDSA (ECDSA in OpenPGP is defined in [RFC6637]), but MAY accept
other signature types from the other party. DSA signatures with keys
shorter than 2048 bits or longer than 3072 bits MUST NOT be
generated.

Any use of ECDSA signatures in ZRTP SHOULD NOT generate signatures
using ECDSA key sizes other than P-224, P-256, and P-384, as defined
in [FIPS-186-3].

RFC 4880, Section 5.2.3.18, specifies a way to embed, in an OpenPGP
signature, a URI of the preferred key server. The URI should be
fully specified to obtain the public key of the signing key that
created the signature. This URI MUST be present. It is up to the
recipient of the signature to obtain the public key of the signing
key and determine its validity status using the OpenPGP trust model
discussed in [RFC4880].

The contents of Figure 20 lie inside the encrypted region of the
Confirm message (Figure 10) or the SASrelay message (Figure 16).

The total length of all the material in Figure 20, including the key
server URI, must not exceed 511 32-bit words (2044 octets).
This length, in words, is stored in the signature length field in the
Confirm or SASrelay message containing the signature. It is
desirable to avoid UDP fragmentation, so the URI should be kept
short.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Signature Type Block = "PGP " (1 word)             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
|                       OpenPGP signature                       |
|                       (variable length)                       |
|                             . . .                             |
|                                                               |
+===============================================================+

               Figure 20: OpenPGP Signature Format

7.2.2. ECDSA Signatures with X.509v3 Certs

If the SAS Signature Type (Section 5.1.7) is "X509", the ECDSA
signature-related fields are arranged as follows.

The first field after the 4-octet Signature Type Block is the
DER-encoded X.509v3 certificate (the signed public key) of the ECDSA
signing key that created the signature. The format of this
certificate is specified by the NSA's Suite B Certificate and CRL
Profile [RFC5759].

Following the X.509v3 certificate at the next word boundary is the
ECDSA signature itself. The size of this field depends on the size
and type of the public key in the aforementioned certificate. The
format of this signature and the algorithms that create it are
specified by [FIPS-186-3]. The signature consists of the ECDSA
signature output parameters (r, s) in binary form, concatenated, in
network byte order, with no truncation of leading zeros. The first
half of the signature is r and the second half is s. If ECDSA P-256
is specified, the signature fills 16 words (64 octets), 32 octets
each for r and s. If ECDSA P-384 is specified, the signature fills
24 words (96 octets), 48 octets each for r and s.
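As an informal illustration only, the fixed-width (r, s) encoding
described above could be produced and parsed as sketched below. The
sketch covers only the serialization of already computed signature
values; it does not perform any ECDSA computation, and the function
and table names are hypothetical.

```python
# Octets per signature component for the two curves named in the text.
CURVE_OCTETS = {"P-256": 32, "P-384": 48}


def encode_ecdsa_signature(r: int, s: int, curve: str) -> bytes:
    """Concatenate r and s as fixed-width big-endian integers.

    Leading zero octets are kept, so the result is always 2 * n octets.
    """
    n = CURVE_OCTETS[curve]
    return r.to_bytes(n, "big") + s.to_bytes(n, "big")


def decode_ecdsa_signature(sig: bytes, curve: str):
    """Split a fixed-width signature into its (r, s) integer components."""
    n = CURVE_OCTETS[curve]
    if len(sig) != 2 * n:
        raise ValueError("unexpected signature length for " + curve)
    return int.from_bytes(sig[:n], "big"), int.from_bytes(sig[n:], "big")
```

Because both halves have a fixed width, the signature always occupies
a whole number of words (16 or 24) and needs no extra padding.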
3852 It is up to the recipient of the signature to use information in the 3853 certificate and path discovery mechanisms to trace the chain back to 3854 the root CA. It is recommended that end user certificates issued for 3855 secure telephony should contain appropriate path discovery links to 3856 facilitate this. 3858 Figure 21 shows a certificate and an ECDSA signature. All this 3859 material lies inside the encrypted region of the Confirm message 3860 (Figure 10) or the SASrelay message (Figure 16). 3862 The total length of all the material in Figure 21, including the 3863 X.509v3 certificate, must not exceed 511 32-bit words (2044 octets). 3864 This length, in words, is stored in the signature length field in the 3865 Confirm or SASrelay message containing the signature. It is 3866 desirable to avoid UDP fragmentation, so the certificate material 3867 should be kept to a much smaller size than this. End user certs 3868 issued for this purpose should minimize the size of extraneous 3869 material such as legal notices. 3871 0 1 2 3 3872 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 3873 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3874 | Signature Type Block = "X509" (1 word) | 3875 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3876 | | 3877 | Signing key's X.509v3 certificate | 3878 | (variable length) | 3879 | . . . | 3880 | | 3881 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3882 | | 3883 | ECDSA P-256 or P-384 signature | 3884 | (16 words or 24 words) | 3885 | . . . | 3886 | | 3887 +===============================================================+ 3889 Figure 21: X.509v3 ECDSA Signature Format 3891 7.2.3. Signing the SAS without a PKI 3893 It's not strictly necessary to use a PKI to back the public key that 3894 signs the SAS. For example, it is possible to use a self-signed 3895 X.509v3 certificate or an OpenPGP key that is not signed by any other 3896 key. 
In this scenario, the same key continuity technique used by SSH 3897 [RFC4251] may be used. The public key is cached locally the first 3898 time it is encountered, and when the same public key is encountered 3899 again in subsequent sessions, it's deemed not to be a MiTM attack. 3900 If there is no MiTM attack in the first session, there cannot be a 3901 MiTM attack in any subsequent session. This is exactly how SSH does 3902 it. 3904 Of course, the security rests on the assumption that the MiTM did not 3905 attack in the first session. That assumption seems to work most of 3906 the time in the SSH world. The user would have to be warned the 3907 first time a public key is encountered, just as in SSH. If possible, 3908 the SAS should be checked before the user consents to caching the new 3909 public key. If the SAS matches in the first session, there is no 3910 MiTM, and it's safe to cache the public key. If no SAS comparison is 3911 possible, it's up to the user, or up to the application, to decide 3912 whether to take a leap of faith and proceed. That's how SSH works 3913 most of the time, because SSH users don't have the chance to verbally 3914 compare an SAS with anyone. 3916 For a phone that is SIP-registered to a PBX, it may be provisioned 3917 with the public key of the PBX, using a trusted automated 3918 provisioning process. Even without a PKI, the phone knows that the 3919 public key is the correct one, since it was provisioned into the 3920 phone by a trusted provisioning mechanism. This makes it easy for 3921 the phone to access several automated services commonly offered by a 3922 PBX, such as voice mail or a conference bridge, where there is no 3923 human at the PBX to do a verbal SAS compare. The same provisioning 3924 may be used to preload the pbxsecret into the phone, which is 3925 discussed in Section 7.3.1. 3927 7.3. Relaying the SAS through a PBX 3929 ZRTP is designed to use end-to-end encryption. 
The two parties' 3930 verbal comparison of the short authentication string (SAS) depends on 3931 this assumption. But in some PBX environments, such as Asterisk, 3932 there are usage scenarios that have the PBX acting as a trusted MiTM, 3933 which means there are two back-to-back ZRTP connections with separate 3934 session keys and separate SASs. 3936 For example, imagine that Bob has a ZRTP-enabled VoIP phone that has 3937 been registered with his company's PBX, so that it is regarded as an 3938 extension of the PBX. Alice, whose phone is not associated with the 3939 PBX, might dial the PBX from the outside, and a ZRTP connection is 3940 negotiated between her phone and the PBX. She then selects Bob's 3941 extension from the company directory in the PBX. The PBX makes a 3942 call to Bob's phone (which might be offsite, many miles away from the 3943 PBX through the Internet) and a separate ZRTP connection is 3944 negotiated between the PBX and Bob's phone. The two ZRTP sessions 3945 have different session keys and different SASs, which would render 3946 the SAS useless for verbal comparison between Alice and Bob. They 3947 might even mistakenly believe that a wiretapper is present because of 3948 the SAS mismatch, causing undue alarm. 3950 ZRTP has a mechanism for solving this problem by having the PBX relay 3951 the Alice/PBX SAS to Bob, sending it through to Bob in a special 3952 SASrelay message as defined in Section 5.13, which is sent after the 3953 PBX/Bob ZRTP negotiation is complete, after the Confirm messages. 3954 Only the PBX, acting as a special trusted MiTM (trusted by the 3955 recipient of the SASrelay message), will relay the SAS. The SASrelay 3956 message protects the relayed SAS from tampering via an included MAC, 3957 similar to how the Confirm message is protected. Bob's ZRTP-enabled 3958 phone accepts the relayed SAS for rendering only because Bob's phone 3959 had previously been configured to trust the PBX. 
This special 3960 trusted relationship with the PBX can be established through a 3961 special security enrollment procedure (Section 7.3.1). After that 3962 enrollment procedure, the PBX is treated by Bob as a special trusted 3963 MiTM. This results in Alice's SAS being rendered to Bob, so that 3964 Alice and Bob may verbally compare them and thus prevent a MiTM 3965 attack by any other untrusted MiTM. 3967 A real "bad-guy" MiTM cannot exploit this protocol feature to mount a 3968 MiTM attack and relay Alice's SAS to Bob, because Bob has not 3969 previously carried out a special registration ritual with the bad 3970 guy. The relayed SAS would not be rendered by Bob's phone, because 3971 it did not come from a trusted PBX. The recognition of the special 3972 trust relationship is achieved with the prior establishment of a 3973 special shared secret between Bob and his PBX, which is called 3974 pbxsecret (defined in Section 7.3.1), also known as the trusted MiTM 3975 key. 3977 The trusted MiTM key can be stored in a special cache at the time of 3978 the initial enrollment (which is carried out only once for Bob's 3979 phone), and Bob's phone associates this key with the ZID of the PBX, 3980 while the PBX associates it with the ZID of Bob's phone. After the 3981 enrollment has established and stored this trusted MiTM key, it can 3982 be detected during subsequent ZRTP session negotiations between the 3983 PBX and Bob's phone, because the PBX and the phone MUST pass the hash 3984 of the trusted MiTM key in the DH message. It is then used as part 3985 of the key agreement to calculate s0. 3987 The PBX can determine whether it is trusted by the ZRTP user agent of 3988 a phone. The presence of a shared trusted MiTM key in the key 3989 negotiation sequence indicates that the phone has been enrolled with 3990 this PBX and therefore trusts it to act as a trusted MiTM. 
During a 3991 key agreement with two other ZRTP endpoints, the PBX may have a 3992 shared trusted MiTM key with both endpoints, only one endpoint, or 3993 neither endpoint. If the PBX has a shared trusted MiTM key with 3994 neither endpoint, the PBX MUST NOT relay the SAS. If the PBX has a 3995 shared trusted MiTM key with only one endpoint, the PBX MUST relay 3996 the SAS from one party to the other by sending an SASrelay message to 3997 the endpoint with which it shares a trusted MiTM key. If the PBX has 3998 a (separate) shared trusted MiTM key with each of the endpoints, the 3999 PBX MUST relay the SAS to only one endpoint, not both endpoints. 4001 Note: In the case of a PBX sharing trusted MiTM keys with both 4002 endpoints, it does not matter which endpoint receives the relayed 4003 SAS as long as only one endpoint receives it. 4005 The relayed SAS fields contain the SAS rendering type and the 4006 complete sashash. The receiver absolutely MUST NOT render the 4007 relayed SAS if it does not come from a specially trusted ZRTP 4008 endpoint. The security of the ZRTP protocol depends on not rendering 4009 a relayed SAS from an untrusted MiTM, because it may be relayed by a 4010 MiTM attacker. See the SASrelay message definition (Figure 16) for 4011 further details. 4013 To ensure that both Alice and Bob will use the same SAS rendering 4014 scheme after the keys are negotiated, the PBX also sends the SASrelay 4015 message to the unenrolled party (which does not regard this PBX as a 4016 trusted MiTM), conveying the SAS rendering scheme, but not the 4017 sashash, which it sets to zero. The unenrolled party will ignore the 4018 relayed SAS field, but will use the specified SAS rendering scheme. 4019 If both endpoints are enrolled, one of them will still receive an 4020 "empty" SASrelay message. If and only if a PBX relays an SAS to one 4021 endpoint, it MUST also send an "empty" SASrelay to the other 4022 endpoint, containing a null sashash. 
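The relay rules above (neither, one, or both endpoints enrolled) can be summarized in a short sketch. The function and parameter names are hypothetical, and the choice of endpoint "a" when both are enrolled is an arbitrary assumption; the text only requires that exactly one endpoint receive the real sashash.

```python
from typing import Dict, Optional


def plan_sas_relay(enrolled_a: bool, enrolled_b: bool,
                   relayed_sashash: bytes) -> Dict[str, Optional[bytes]]:
    """Decide what sashash, if any, the PBX puts in each SASrelay message.

    Returns a mapping from endpoint label to the sashash to send:
    None means no SASrelay message is sent to that endpoint, and an
    all-zero value is the "empty" SASrelay carrying only the SAS
    rendering scheme.
    """
    empty = bytes(len(relayed_sashash))  # null sashash of the same length
    if not enrolled_a and not enrolled_b:
        # No trusted MiTM key with either side: MUST NOT relay.
        return {"a": None, "b": None}
    if enrolled_a:
        # Covers both "only a" and "both enrolled" (arbitrary pick of a).
        return {"a": relayed_sashash, "b": empty}
    return {"a": empty, "b": relayed_sashash}
```

In every relaying case exactly one endpoint receives the full sashash and the other receives the zeroed one, which matches the "if and only if" rule stated above.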
4024 It is possible to route a call through two ZRTP-enabled PBXs using 4025 this scheme. Assume Alice is a ZRTP endpoint who trusts her local 4026 PBX in Atlanta, and Bob is a ZRTP endpoint who trusts his local PBX 4027 in Biloxi. The call is routed from Alice to the Atlanta PBX to the 4028 Biloxi PBX to Bob. Atlanta would relay the Atlanta-Biloxi SAS to 4029 Alice because Alice is enrolled with Atlanta, and Biloxi would relay 4030 the Atlanta-Biloxi SAS to Bob because Bob is enrolled with Biloxi. 4031 The two PBXs are not assumed to be enrolled with each other in this 4032 example. Both Alice and Bob would view and verbally compare the same 4033 relayed SAS, the Atlanta-Biloxi SAS. No more than two trusted MiTM 4034 nodes can be traversed with this relaying scheme. This behavior is 4035 extended to two PBXs that are enrolled with each other, via this 4036 rule: In the case of a PBX sharing trusted MiTM keys with both 4037 endpoints (i.e., both enrolled with this PBX), one of which is 4038 another PBX (evidenced by the M-flag) and one of which is a non-PBX, 4039 the MiTM PBX MUST always relay the PBX-to-PBX SAS to the non-PBX 4040 endpoint. 4042 A ZRTP endpoint phone that trusts a PBX to act as a trusted MiTM is 4043 effectively delegating its own policy decisions of algorithm 4044 negotiation to the PBX. 4046 When a PBX is between two ZRTP endpoints and is terminating their 4047 media streams at the PBX, the PBX presents its own ZID to the two 4048 parties, eclipsing the ZIDs of the two parties from each other. For 4049 example, if several different calls are routed through such a PBX to 4050 several different ZRTP-enabled phones behind the PBX, only a single 4051 ZID is presented to the calling party in every case -- the ZID of the 4052 PBX itself. 4054 This SAS relay mechanism imposes a cognitive burden on the user, and 4055 the number of intermediaries does not scale up beyond two PBXs 4056 trusted by their respective local users. 
The ZRTP ecosystem becomes 4057 more elegant if all PBXs and other media intermediaries avoid the 4058 MiTM role whenever possible, as explained in Section 10.1. 4060 The next section describes the initial enrollment procedure that 4061 establishes a special shared secret, a trusted MiTM key, between a 4062 PBX and a phone, so that the phone will learn to recognize the PBX as 4063 a trusted MiTM. 4065 7.3.1. PBX Enrollment and the PBX Enrollment Flag 4067 Both the PBX and the endpoint need to know when enrollment is taking 4068 place. One way of doing this is to set up an enrollment extension on 4069 the PBX that a newly configured endpoint would call and establish a 4070 ZRTP session. The PBX would then play audio media that offers the 4071 user an opportunity to configure his phone to trust this PBX as a 4072 trusted MiTM. The PBX calculates and stores the trusted MiTM shared 4073 secret in its cache and associates it with this phone, indexed by the 4074 phone's ZID. The trusted MiTM PBX shared secret is derived from 4075 ZRTPSess via the ZRTP key derivation function (Section 4.5.1) in this 4076 manner: 4078 pbxsecret = KDF(ZRTPSess, "Trusted MiTM key", (ZIDi || ZIDr), 256) 4080 The pbxsecret is calculated for the whole ZRTP session, not for each 4081 stream within a session, thus the KDF Context field in this case does 4082 not include any stream-specific nonce material. 4084 The PBX signals the enrollment process by setting the PBX Enrollment 4085 flag (E) in the Confirm message (Figure 10). This flag is used to 4086 trigger the ZRTP endpoint's user interface to prompt the user to see 4087 if it wants to trust this PBX and calculate and store the pbxsecret 4088 in the cache. If the user decides to respond by activating the 4089 appropriate user interface element (a menu item, checkbox, or 4090 button), his ZRTP user agent calculates pbxsecret using the same 4091 formula, and saves it in a special cache entry associated with this 4092 PBX. 
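The pbxsecret derivation can be sketched as follows. This is an illustration under stated assumptions, not a normative implementation: Section 4.5.1 is not reproduced in this part of the document, so the KDF construction shown here (an HMAC over a 32-bit counter fixed at 1, the label, a zero octet, the context, and the 32-bit output length in bits, truncated to the leftmost L bits) should be checked against that section. SHA-256 is assumed as the negotiated hash, and the function names are hypothetical.

```python
import hashlib
import hmac
import struct


def zrtp_kdf(ki: bytes, label: bytes, context: bytes,
             length_bits: int, hash_name: str = "sha256") -> bytes:
    """Assumed form: HMAC(KI, i || Label || 0x00 || Context || L)."""
    data = (struct.pack(">I", 1) + label + b"\x00" + context
            + struct.pack(">I", length_bits))
    digest = hmac.new(ki, data, hash_name).digest()
    if length_bits % 8 or length_bits > len(digest) * 8:
        raise ValueError("unsupported output length for this sketch")
    return digest[:length_bits // 8]  # leftmost L bits


def derive_pbxsecret(zrtp_sess: bytes, zid_i: bytes, zid_r: bytes) -> bytes:
    """pbxsecret = KDF(ZRTPSess, "Trusted MiTM key", (ZIDi || ZIDr), 256)."""
    return zrtp_kdf(zrtp_sess, b"Trusted MiTM key", zid_i + zid_r, 256)
```

Both the PBX and the phone would run the same derivation over the same ZRTPSess and ZID pair, so no secret needs to cross the wire during enrollment.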
4094 During a PBX enrollment, the GoClear features are disabled. If the 4095 (E) flag is set by the PBX, the PBX MUST NOT set the Allow Clear (A) 4096 flag. Thus, (E) implies not (A). If a received Confirm message has 4097 the (E) flag set, the (A) flag MUST be disregarded and treated as 4098 false. 4100 If the user elects not to enroll, perhaps because he dialed a wrong 4101 number or does not yet feel comfortable with this PBX, he can simply 4102 hang up and not save the pbxsecret in his cache. The PBX will have 4103 it saved in the PBX cache, but that will do no harm. The SASrelay 4104 scheme does not depend on the PBX trusting the phone. It only 4105 depends on the phone trusting the PBX. It is the phone (the user) 4106 who is at risk if the PBX abuses its MiTM privileges. 4108 An endpoint MUST NOT store the pbxsecret in the cache without 4109 explicit user authorization. 4111 After this enrollment process, the PBX and the ZRTP-enabled phone 4112 both share a secret that enables the phone to recognize the PBX as a 4113 trusted MiTM in future calls. This means that when a future call 4114 from an outside ZRTP-enabled caller is relayed through the PBX to 4115 this phone, the phone will render a relayed SAS from the PBX. If the 4116 SASrelay message comes from a MiTM that does not know the pbxsecret, 4117 the phone treats it as a bad-guy MiTM, and refuses to render the 4118 relayed SAS. Regardless of which party initiates any future phone 4119 calls through the PBX, the enrolled phone or the outside phone, the 4120 PBX will relay the SAS to the enrolled phone. 4122 This enrollment procedure is designed primarily for phones that are 4123 already associated with the PBX -- enterprise phones that are 4124 "behind" the PBX. It is not intended for the countless outside 4125 phones that are not registered to this PBX's SIP server. It should 4126 be regarded as part of the installation and provisioning process for 4127 a new phone in the organization. 
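Two of the enrollment rules stated earlier in this section, that (E) implies not (A) and that the pbxsecret is never cached without explicit user authorization, can be expressed as a small sketch. The class and function names are hypothetical, and the flags are modeled as booleans rather than as bit positions in the Confirm message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfirmFlags:
    enrollment: bool   # PBX Enrollment flag (E)
    allow_clear: bool  # Allow Clear flag (A)


def effective_allow_clear(flags: ConfirmFlags) -> bool:
    """Treat (A) as false whenever (E) is set in a received Confirm."""
    return flags.allow_clear and not flags.enrollment


def may_store_pbxsecret(flags: ConfirmFlags, user_authorized: bool) -> bool:
    """Cache the pbxsecret only during enrollment and only with consent."""
    return flags.enrollment and user_authorized
```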
4129 There are more streamlined methods to configure ZRTP user agents to 4130 trust a PBX. In large-scale deployments, the pbxsecret may be 4131 configured into the phone by an automated provisioning process, which 4132 may be less burdensome for the users and less error-prone. This 4133 specification does not require a manual enrollment process. Any 4134 process that results in a pbxsecret being computed and shared between 4135 the PBX and the phone will suffice, as long as the user is made aware 4136 that this puts the PBX in a position to wiretap the calls. 4138 It is recommended that a ZRTP client not proceed with the PBX 4139 enrollment procedure without evidence that a MiTM attack is not 4140 taking place during the enrollment session. It would be especially 4141 damaging if a MiTM tricks the client into enrolling with the wrong 4142 PBX. That would enable the malevolent MiTM to wiretap all future 4143 calls without arousing suspicion, because he would appear to be 4144 trusted. 4146 To this end, the client ZRTP endpoint should not proceed with PBX 4147 enrollment unless at least one of the following conditions applies: 4149 o An automated mechanism is used, from Section 7.4. TLS-protected 4150 signaling may be especially well-suited in this special case, for 4151 reasons explained in Section 8.1.1. 4153 o The SAS is verified with a live human on the PBX side during the 4154 enrollment session. 4156 o It is the judgement of the administrator supervising the 4157 enrollment that the threat model and the circumstances indicate a 4158 low probability of a MiTM being present, perhaps because this is 4159 the first call to the PBX, or because the enrollment is conducted 4160 over a relatively safe network. For example, a mobile smart phone 4161 can be enrolled through a protected WiFi local network near the 4162 PBX, before issuing it to an employee for international travel. 4163 This leap of faith is usually justified in benign environments. 4165 7.4.
Automated Methods of Authenticating the DH Exchange 4167 Alternate methods of authenticating the DH exchange may be used when 4168 interacting with an automated remote system, when no human is 4169 available at the remote endpoint to verbally compare the SAS. Usage 4170 scenarios include leaving or retrieving voicemail, interacting with a 4171 conference bridge, or the PBX security enrollment procedure 4172 (Section 7.3.1). 4174 Here are the automated ways to have ZRTP authenticate the DH 4175 exchange: 4177 o Successful use of the mechanism described in Section 8.1.1, but 4178 only if fully supported by end-to-end integrity-protected delivery 4179 of the a=zrtp-hash in the signaling. This might be achieved via 4180 [RFC4474] or better still, Dan Wing's SIP Identity using Media 4181 Path [SIP-IDENTITY]. This allows authentication of the DH 4182 exchange without human assistance. However, in most usage 4183 scenarios that access an automated system, the entire end-to-end 4184 path is comprised of only one hop, so TLS provides sufficient 4185 integrity protection in this special case. This is explained in 4186 detail in Section 8.1.1. 4188 o The SAS was previously verified with the remote system in an 4189 earlier session, evidenced by the SAS verified flag (V) 4190 (Section 7.1) at both ends and a matching cache entry. If 4191 circumstances permit this method, it has the advantage of not 4192 requiring a PKI. 4194 o A good signature is received and verified using the digital 4195 signature feature on the SAS hash, as described in Section 7.2, if 4196 this feature is supported. Note that for PBX enrollment, only the 4197 PBX endpoint needs to supply the signature, because the trust 4198 decision is made on the client side only. 4200 In any PKI-backed scheme, there is the disadvantage of having to 4201 decide what to do if the connection fails to authenticate because of 4202 a certificate problem. 
Warning messages may not be effective because 4203 users become habituated to security warnings [Sunshine] about PKI 4204 certificates. Implementors should carefully weigh the cognitive 4205 burden on the user before they invoke such a heavyweight mechanism. 4206 ZRTP is intended to be a lightweight protocol with a low activation 4207 energy and minimal cognitive burden. 4209 When calling an automated system for the first time, the threat model 4210 and circumstances should be examined to decide if a PKI is the only 4211 way to protect against a MiTM. A reasonable alternative to a PKI 4212 would be to rely on the leap of faith that a MiTM attack is less 4213 likely in the initial session, an assumption that seems to work well 4214 enough for SSH. After the first session, cached shared secrets 4215 should suffice. 4217 8. Signaling Interactions 4219 This section discusses how ZRTP, SIP, and SDP work together. 4221 Note that ZRTP may be implemented without coupling with the SIP 4222 signaling. For example, ZRTP can be implemented as a "bump in the 4223 wire" or as a "bump in the stack" in which RTP sent by the SIP User 4224 Agent (UA) is converted to ZRTP. In these cases, the SIP UA will 4225 have no knowledge of ZRTP. As a result, the signaling path discovery 4226 mechanisms introduced in this section should not be treated as definitive -- 4227 they are a hint. Despite the absence of an indication of ZRTP 4228 support in an offer or answer, a ZRTP endpoint SHOULD still send 4229 Hello messages. 4231 ZRTP endpoints that have control over the signaling path include a 4232 ZRTP SDP attribute in their SDP offers and answers. The ZRTP 4233 attribute, a=zrtp-hash, is used to indicate support for ZRTP and to 4234 convey a hash of the Hello message. The hash is computed according 4235 to Section 8.1. 4237 Aside from the advantages described in Section 8.1, there are a 4238 number of potential uses for this attribute.
It is useful when 4239 signaling elements would like to know when ZRTP may be utilized by 4240 endpoints. It is also useful if endpoints support multiple methods 4241 of SRTP key management. The ZRTP attribute can be used to ensure 4242 that these key management approaches work together instead of against 4243 each other. For example, if only one endpoint supports ZRTP, but 4244 both support another method to key SRTP, then the other method will 4245 be used instead. When the a=crypto [RFC4568] attribute and the 4246 a=zrtp-hash attribute are both used in parallel, the media can 4247 transition from SDP Security Descriptions-keyed SRTP to ZRTP-keyed 4248 SRTP, as described in Section 8.2. The ZRTP attribute is also used 4249 to signal to an intermediary ZRTP device not to act as a ZRTP 4250 endpoint, as discussed in Section 10 and Section 10.1. 4252 The a=zrtp-hash attribute can only be included in the SDP at the 4253 media level since Hello messages sent in different media streams will 4254 have unique hashes. A separate a=zrtp-hash attribute should be 4255 included for each media stream. Both ZRTP endpoints should provide 4256 a=zrtp-hash attributes in their SDP. 4258 The ABNF for the ZRTP attribute is as follows: 4260 zrtp-attribute = "a=zrtp-hash:" zrtp-version zrtp-hash-value 4262 zrtp-version = token 4264 zrtp-hash-value = 1*(HEXDIG) 4266 Here's an example of the ZRTP attribute in an initial SDP offer or 4267 answer used at the media level, using the <allOneLine> convention 4268 defined in RFC 4475, Section 2.1 [RFC4475]: 4270 v=0 4271 o=bob 2890844527 2890844527 IN IP4 client.biloxi.example.com 4272 s= 4273 c=IN IP4 client.biloxi.example.com 4274 t=0 0 4275 m=audio 3456 RTP/AVP 97 33 4276 a=rtpmap:97 iLBC/8000 4277 a=rtpmap:33 no-op/8000 4278 <allOneLine> 4279 a=zrtp-hash:1.10 fe30efd02423cb054e50efd0248742ac7a52c8f91bc2 4280 df881ae642c371ba46df 4281 </allOneLine> 4283 A mechanism for carrying this same zrtp-hash information in the 4284 Jingle signaling protocol is defined in [XEP-0262].
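As an illustration of the attribute syntax, the following sketch parses one a=zrtp-hash line into its version and hash octets. The function name is hypothetical, and the parser assumes the version and hash value are separated by whitespace, as in the example above; the hash value in that example is folded across two lines for display only and is joined here.

```python
import re
from typing import Optional, Tuple

# Version is taken as any run of non-space characters; the hash value
# must be hexadecimal digits, per zrtp-hash-value = 1*(HEXDIG).
_ZRTP_HASH = re.compile(r"^a=zrtp-hash:(\S+)\s+([0-9A-Fa-f]+)$")


def parse_zrtp_hash(line: str) -> Optional[Tuple[str, bytes]]:
    """Return (version, hash octets), or None if the line does not match."""
    match = _ZRTP_HASH.match(line.strip())
    if match is None:
        return None
    version, hex_value = match.groups()
    if len(hex_value) % 2:
        return None  # an odd number of hex digits cannot form whole octets
    return version, bytes.fromhex(hex_value)


example = ("a=zrtp-hash:1.10 fe30efd02423cb054e50efd0248742ac7a52c8f91bc2"
           "df881ae642c371ba46df")
version, digest = parse_zrtp_hash(example)
```

For the example attribute, the parsed value is 64 hexadecimal digits, that is, 32 octets.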
4286 It should be safe to send ZRTP messages even when there is no 4287 evidence in the signaling that the other party supports it, because 4288 ZRTP has been designed to be clearly different from RTP, having a 4289 similar structure to STUN packets sent during an ICE exchange. 4291 8.1. Binding the Media Stream to the Signaling Layer via the Hello Hash 4293 Tying the media stream to the signaling channel can help prevent a 4294 third party from inserting false media packets. If the signaling 4295 layer contains information that ties it to the media stream, false 4296 media streams can be rejected. 4298 To accomplish this, the entire Hello message (Figure 3) is hashed, 4299 using the hash algorithm defined in Section 5.1.2.2. The ZRTP packet 4300 framing from Figure 2 is not included in the hash. The resulting 4301 hash image is made available without truncation to the signaling 4302 layer, where it is transmitted as a hexadecimal value in the SIP 4303 channel using the SDP attribute a=zrtp-hash, defined in this 4304 specification. Assuming Section 5.1.2.2 defines a 256-bit hash 4305 length, the a=zrtp-hash field in the SDP attribute carries 64 4306 hexadecimal digits. Each media stream (audio or video) will have a 4307 separate Hello message, and thus will require a separate a=zrtp-hash 4308 in an SDP attribute. The recipient of the SIP/SDP message can then 4309 use this hash image to detect and reject false Hello messages in the 4310 media channel, as well as identify which media stream is associated 4311 with this SIP call. Each Hello message hashes uniquely, because it 4312 contains the H3 field derived from a random nonce, defined in 4313 Section 9. 4315 The Hello Hash as an SDP attribute is not a REQUIRED feature, because 4316 some ZRTP endpoints do not have the ability to add SDP attributes to 4317 the signaling. 
For example, if ZRTP is implemented in a hardware 4318 bump-in-the-wire device, it might only have the ability to modify the 4319 media packets, not the SIP packets, especially if the SIP packets are 4320 integrity protected and thus cannot be modified on the wire. If the 4321 SDP has no hash image of the ZRTP Hello message, the recipient's ZRTP 4322 user agent cannot check it, and thus will not be able to reject Hello 4323 messages based on this hash. 4325 After the Hello Hash is used to properly identify the ZRTP Hello 4326 message as belonging to this particular SIP call, the rest of the 4327 ZRTP message sequence is protected from false packet injection by 4328 other protection mechanisms, such as the hash chaining mechanism 4329 defined in Section 9. 4331 An attacker who controls only the signaling layer, such as an 4332 uncooperative VoIP service provider, may be able to deny service by 4333 corrupting the hash of the Hello message in the SDP attribute, which 4334 would force ZRTP to reject perfectly good Hello messages. If there 4335 is reason to believe this is happening, the ZRTP endpoint MAY allow 4336 Hello messages to be accepted that do not match the hash image in the 4337 SDP attribute. 4339 Even in the absence of SIP integrity protection, the inclusion of the 4340 a=zrtp-hash SDP attribute, when coupled with the hash chaining 4341 mechanism defined in Section 9, meets the R-ASSOC requirement in the 4342 Media Security Requirements [RFC5479], which requires: 4344 ...a mechanism for associating key management messages with both 4345 the signaling traffic that initiated the session and with 4346 protected media traffic. It is useful to associate key management 4347 messages with call signaling messages, as this allows the SDP 4348 offerer to avoid performing CPU-consuming operations (e.g., 4349 Diffie-Hellman or public key operations) with attackers that have 4350 not seen the signaling messages. 
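The Hello Hash check described in this section can be sketched as follows. The sketch assumes that the implicit hash of Section 5.1.2.2 is SHA-256, consistent with the 64 hexadecimal digits mentioned above, and that the caller passes the Hello message octets without the ZRTP packet framing of Figure 2. The function names are hypothetical.

```python
import hashlib
import hmac
from typing import Optional


def hello_hash_hex(hello_message: bytes) -> str:
    """Hash the entire Hello message (no packet framing), untruncated."""
    return hashlib.sha256(hello_message).hexdigest()


def hello_matches_sdp(hello_message: bytes,
                      sdp_hash_hex: Optional[str]) -> Optional[bool]:
    """Compare a received Hello against the a=zrtp-hash value from SDP.

    Returns None when the SDP carried no hash, because in that case
    there is nothing to check and the Hello cannot be rejected on this
    basis.
    """
    if sdp_hash_hex is None:
        return None
    computed = hello_hash_hex(hello_message)
    # Hex digits are case-insensitive, so normalize before comparing.
    return hmac.compare_digest(computed, sdp_hash_hex.lower())
```

A False result normally means the Hello should be rejected, although, as noted above, an endpoint MAY accept mismatching Hello messages if it has reason to believe the signaling layer is corrupting the hash.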
4352 The a=zrtp-hash SDP attribute becomes especially useful if the SDP is 4353 integrity-protected end-to-end by SIP Identity [RFC4474] or better 4354 still, Dan Wing's SIP Identity using Media Path [SIP-IDENTITY]. This 4355 leads to an ability to stop MiTM attacks independent of ZRTP's SAS 4356 mechanism, as explained in Section 8.1.1. 4358 8.1.1. Integrity-Protected Signaling Enables Integrity-Protected DH 4359 Exchange 4361 If and only if the signaling path and the SDP are protected by some 4362 form of end-to-end integrity protection, such as one of the 4363 above-mentioned mechanisms, so that it can guarantee delivery of the 4364 a=zrtp-hash attribute without any tampering by a third party, and if 4365 there is good reason to trust the signaling layer to protect the 4366 interests of the end user, it is possible to authenticate the key 4367 exchange and prevent a MiTM attack. This can be done without 4368 requiring the users to verbally compare the SAS, by using the hash 4369 chaining mechanism defined in Section 9 to provide a series of MAC 4370 keys that protect the entire ZRTP key exchange. Thus, an end-to-end 4371 integrity-protected signaling layer automatically enables an 4372 integrity-protected Diffie-Hellman exchange in ZRTP, which in turn 4373 means immunity from a MiTM attack. Here's how it works. 4375 The integrity-protected SIP SDP contains a hash commitment to the 4376 entire Hello message. The Hello message contains H3, which provides 4377 a hash commitment for the rest of the hash chain H0-H2 (Section 9). 4378 The Hello message is protected by a 64-bit MAC, keyed by H2. The 4379 Commit message is protected by a 64-bit MAC, keyed by H1. The 4380 DHPart1 or DHPart2 messages are protected by a 64-bit MAC, keyed by 4381 H0. The MAC protecting the Confirm messages is computed with a 4382 different MAC key derived from the resulting key agreement. Each 4383 message's MAC is checked when the MAC key is received in the next 4384 message.
If a bad MAC is discovered, it MUST be treated as a 4385 security exception indicating a MiTM attack, perhaps by logging or 4386 alerting the user, and MUST NOT be treated as a random error. Random 4387 errors are already discovered and quietly rejected by bad CRCs 4388 (Figure 2). 4390 The Hello message must be assembled before any hash algorithms are 4391 negotiated, so an implicit predetermined hash algorithm and MAC 4392 algorithm (both defined in Section 5.1.2.2) must be used. All of the 4393 aforementioned MACs keyed by the hashes in the aforementioned hash 4394 chain MUST be computed with the MAC algorithm defined in 4395 Section 5.1.2.2, with the MAC truncated to 64 bits. 4397 The Media Security Requirements [RFC5479] R-EXISTING requirement can 4398 be fully met by leveraging a certificate-backed PKI in the signaling 4399 layer to integrity protect the delivery of the a=zrtp-hash SDP 4400 attribute. This would thereby protect ZRTP against a MiTM attack, 4401 without requiring the user to check the SAS, without adding any 4402 explicit signatures or signature keys to the ZRTP key exchange and 4403 without any extra public key operations or extra packets. 4405 Without an end-to-end integrity-protection mechanism in the signaling 4406 layer to guarantee delivery of the a=zrtp-hash SDP attribute without 4407 modification by a third party, these MACs alone will not prevent a 4408 MiTM attack. In that case, ZRTP's built-in SAS mechanism will still 4409 have to be used to authenticate the key exchange. At the time of 4410 this writing, very few deployed VoIP clients offer a fully 4411 implemented SIP stack that provides end-to-end integrity protection 4412 for the delivery of SDP attributes. Also, end-to-end signaling 4413 integrity becomes more problematic if E.164 numbers [RFC3824] are 4414 used in SIP. Thus, real-world implementations of ZRTP endpoints will 4415 continue to depend on SAS authentication for quite some time. 
Even 4416 after there is widespread availability of SIP user agents that offer 4417 integrity protected delivery of SDP attributes, many users will still 4418 be faced with the fact that the signaling path may be controlled by 4419 institutions that do not have the best interests of the end user in 4420 mind. In those cases, SAS authentication will remain the gold 4421 standard for the prudent user. 4423 The SIP layer can obtain hop-wise integrity protection simply by 4424 using TLS [RFC5246], but this does not achieve full end-to-end 4425 integrity protection of the a=zrtp-hash attribute in the multi-hop 4426 general case. However, if the entire end-to-end signaling path is 4427 comprised of only one hop, TLS is good enough, provided the 4428 associated PKI complexity can be contained. This usually covers the 4429 use cases where a client is traversing one TLS hop to access the 4430 automated remote services of its own PBX, where no human is available 4431 to verbally compare the SAS. Examples include leaving or retrieving 4432 voicemail, interacting with an IVR or conference bridge, or 4433 performing the PBX security enrollment procedure (Section 7.3.1). 4434 Note that the risk of trusting the SIP server or PBX becomes moot 4435 when the PBX itself is the intended ZRTP endpoint. Thus, TLS- 4436 protected signaling is recommended and preferred for these special 4437 use cases. TLS-protected signaling is usually justified for its own 4438 separate reasons, to mitigate exposure to traffic analysis, which 4439 means the signaling layer already would have borne the additional 4440 cost of TLS. 4442 Even without SIP end-to-end integrity protection, the Media Security 4443 Requirements [RFC5479] R-ACT-ACT requirement can be met by ZRTP's SAS 4444 mechanism. Although ZRTP may benefit from an integrity-protected SIP 4445 layer, it is fortunate that ZRTP's self-contained MiTM defenses do 4446 not actually require an integrity-protected SIP layer. 
ZRTP can 4447 bypass the delays and problems that SIP integrity faces, such as 4448 E.164 number usage, and the complexity of building and maintaining a 4449 PKI. 4451 In contrast, DTLS-SRTP [RFC5764] appears to depend heavily on end-to- 4452 end integrity protection in the SIP layer. Further, DTLS-SRTP must 4453 bear the additional cost of a signature calculation of its own, in 4454 addition to the signature calculation the SIP layer uses to achieve 4455 its integrity protection. ZRTP needs no signature calculation of its 4456 own to leverage the signature calculation carried out in the SIP 4457 layer. 4459 8.2. Combining ZRTP With SDP Security Descriptions (SDES) 4461 The signaling layer may negotiate its own SRTP master key and salt, 4462 using the SDP Security Descriptions (SDES [RFC4568]) or Key 4463 Management for SDP [RFC4567]. This section describes how ZRTP may be 4464 used in combination with SDP Security Descriptions, which uses the 4465 SDP extension attribute a=crypto. 4467 Most ZRTP endpoints are expected to use TLS [RFC5246] to protect the 4468 signaling layer, just because it's a good idea to hide the signaling 4469 from eavesdroppers who want to see who you are calling. If TLS is 4470 used for the signaling, SDP Security Descriptions incurs no 4471 additional cost in packets or computation. 4473 However, SDP Security Descriptions has significant security 4474 vulnerabilities if used alone. Because the SDP Security Descriptions 4475 keying material is known to the SIP server, SDP Security Descriptions 4476 is vulnerable to any SIP server controlled by a wiretapper. For that 4477 reason, SDP Security Descriptions must be regarded as a "wiretap- 4478 friendly" protocol. ZRTP does not reveal key material to the 4479 signaling layer. Further, some TLS cipher suites found in the wild 4480 lack Perfect Forward Secrecy (PFS), so SDP Security Descriptions 4481 would inherit that deficiency. 
Conversely, ZRTP's a=zrtp-hash 4482 attribute, which is also communicated in the signaling, does not 4483 depend on PFS as this value is already known to the attacker. 4484 Despite these deficiencies of SDP Security Descriptions, it is useful 4485 against other threat models, and can complement ZRTP's strengths. 4487 The advantages of combining SDP Security Descriptions with ZRTP are: 4489 Protects media in the RTP session that precedes a ZRTP exchange. 4490 For example, the first few packets of video may expose sensitive 4491 information and may be transmitted before a ZRTP exchange 4492 completes. 4494 If ZRTP fails for any reason (e.g. an opponent blocks it in the 4495 media layer), the media remains protected by SDP Security 4496 Descriptions-keyed SRTP, which may provide better confidentiality 4497 than having no media encryption at all. 4499 If and only if SDP Security Descriptions is chosen in the SDP answer 4500 and both the SDP offer and answer for the media session contain the 4501 a=zrtp-hash attribute, the SRTP stack MUST, upon completion of the 4502 ZRTP exchange, replace its keying from SDP Security Descriptions- 4503 provided key material to ZRTP-provided key material. In this case, 4504 both ZRTP endpoints MUST clear the Allow Clear flag (A) in their 4505 respective Confirm messages (Figure 10), which disables the GoClear 4506 mechanism (Section 4.7.2). Also in this case, ZRTP MAY include 4507 imported SDP Security Descriptions key material via auxsecret, as 4508 described in Section 8.2.1. 4510 If either endpoint fails to explicitly provide the a=zrtp-hash 4511 attribute via SDP, the SRTP stack MUST NOT be rekeyed by the ZRTP 4512 exchange. Instead, the plaintext media MUST continue to be encrypted 4513 with the keys negotiated via SDP Security Descriptions. This SDP 4514 Security Descriptions-keyed ciphertext media MUST then be treated as 4515 though it were plaintext RTP and enciphered with a second, 4516 independent SRTP context keyed by ZRTP. 
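The two cases above (rekeying when both sides signal a=zrtp-hash, and layering otherwise) can be summarized as a small decision function. This is only an illustrative sketch: the names SrtpKeying, KeyingDecision, and decide_keying are hypothetical and do not appear in this specification, and the ZRTP_ONLY branch (no SDP Security Descriptions in the answer) is an assumption added to make the function total.

```python
from dataclasses import dataclass
from enum import Enum, auto


class SrtpKeying(Enum):
    """Hypothetical labels for how the SRTP stack is keyed once ZRTP completes."""
    ZRTP_ONLY = auto()               # assumption: SDES was not chosen at all
    REPLACE_SDES_WITH_ZRTP = auto()  # rekey the existing SRTP context with ZRTP keys
    ZRTP_OVER_SDES = auto()          # inner layer keyed by SDES, outer layer by ZRTP


@dataclass(frozen=True)
class KeyingDecision:
    keying: SrtpKeying
    # True when both endpoints MUST clear the Allow Clear flag (A) in Confirm.
    # False means only that this rule imposes no such requirement.
    must_clear_allow_clear_flag: bool


def decide_keying(sdes_chosen_in_answer: bool,
                  offer_has_zrtp_hash: bool,
                  answer_has_zrtp_hash: bool) -> KeyingDecision:
    """Map the SDP offer/answer observations onto the keying behavior."""
    if not sdes_chosen_in_answer:
        return KeyingDecision(SrtpKeying.ZRTP_ONLY, False)
    if offer_has_zrtp_hash and answer_has_zrtp_hash:
        # "If and only if" case: replace SDES keys and disable GoClear.
        return KeyingDecision(SrtpKeying.REPLACE_SDES_WITH_ZRTP, True)
    # Either side omitted a=zrtp-hash: do not rekey, add a second SRTP layer.
    return KeyingDecision(SrtpKeying.ZRTP_OVER_SDES, False)
```

A caller would evaluate this once per media session, after the SDP answer is known and before the ZRTP exchange completes.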
The result is that the media 4517 will pass through two layers of SRTP encryption, with the inner layer 4518 keyed by SDP Security Descriptions, and the outer layer keyed by 4519 ZRTP. This relatively inefficient scenario is expected to be rare, 4520 and applies mainly to "bump-in-the-wire" ZRTP proxies (Section 10) 4521 that have no access to the signaling layer, such as [Zfone]. Note 4522 that this paragraph breaks backward compatibility with RFC 6189 for 4523 any ZRTP devices which negotiate SDP Security Descriptions via SDP 4524 but fail to send the a=zrtp-hash attribute in their SDP. 4526 8.2.1. Deriving auxsecret from SDP Security Descriptions Key Material 4528 The shared secret calculations defined in Section 4.3 make use of the 4529 auxsecret, which may be optionally provided by various out-of-band 4530 sources. In this section, we show how auxsecret may be derived from 4531 SDP Security Descriptions [RFC4568] keying information that may be 4532 present in the signaling layer. 4534 If only one SRTP key negotiation protocol is to be used, that 4535 protocol should be ZRTP. But in the event the signaling layer 4536 negotiates its own SRTP master key and salt, using SDP Security 4537 Descriptions (SDES [RFC4568]) or Key Management for SDP [RFC4567], that 4538 key material can be passed from the signaling to the ZRTP layer and mixed into 4539 ZRTP's own shared secret calculations, without compromising security 4540 by creating a dependency on the signaling for media encryption. ZRTP 4541 endpoints may make use of SDP Security Descriptions parameters from 4542 any signaling protocol that provides them. 4544 If SDP Security Descriptions is used in the signaling layer, there 4545 are two separate SRTP master keys and salts provided by SDP Security 4546 Descriptions, one for each direction of media flow. These two keys 4547 and salts are combined here into a single shared secret, auxsecret, 4548 to feed into the mix of ZRTP shared secret calculations.
4550 auxsecret = KDF(hash(len(srtpmki) || srtpmki || 4551 len(srtpmkr) || srtpmkr), 4552 "SRTP Secret", 4553 (ZIDi || ZIDr || 4554 srtpmsi || srtpmsr), 4555 negotiated hash length) 4557 In the above formula, the parameters srtpmki and srtpmsi are 4558 extracted from the SDP Security Descriptions transmitted in the 4559 signaling by the SIP initiator, while srtpmkr and srtpmsr are 4560 extracted from the SDP Security Descriptions transmitted in the 4561 signaling by the SIP responder. These keys and salts are in binary 4562 form, not the base64 representation used by SDP Security 4563 Descriptions. The explicit length fields, len(), in the above hash 4564 are 32-bit big-endian integers, giving the length in octets of the 4565 field that follows. The length in octets of srtpmki or srtpmkr can 4566 only be 16, 24, or 32 if AES is used. srtpmki is the SIP 4567 initiator's SRTP master key, srtpmkr is the SIP responder's SRTP 4568 master key, srtpmsi is the SIP initiator's SRTP master salt, and 4569 srtpmsr is the SIP responder's SRTP master salt. The length of each 4570 SRTP master salt is defined as 112 bits in [RFC3711]. ZIDi is the 4571 ZRTP initiator's ZID, and ZIDr is the ZRTP responder's ZID. 4573 This mechanism only provides a way to import the associated SDP 4574 Security Descriptions keying material from the first media stream in 4575 a ZRTP exchange. Any additional media stream would be keyed by 4576 ZRTP's Multistream mode (Section 4.4.3), and thus would not import 4577 any additional SDP Security Descriptions keying material associated 4578 with the additional media stream. 4580 The inclusion of SDP Security Descriptions keying material is 4581 optional for a ZRTP endpoint. Even if only one endpoint computes 4582 auxsecret from the SDP Security Descriptions material, ZRTP protocol 4583 completion is still possible if security policy permits a non- 4584 matching auxsecret, as can be seen in Section 4.3.
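As an illustration of the auxsecret formula above, the following sketch computes it from binary SRTP master keys and salts. Several things here are assumptions rather than text from this section: the KDF construction (HMAC over a 32-bit counter of 1, the label, a zero octet, the context, and the 32-bit output length in bits) is recalled from the ZRTP key derivation section and should be checked against it; SHA-256 stands in for the negotiated hash; and the function names zrtp_kdf, len_prefixed, and derive_auxsecret are hypothetical.

```python
import hashlib
import hmac
import struct


def zrtp_kdf(ki: bytes, label: bytes, context: bytes, length_bits: int) -> bytes:
    """Assumed KDF shape: HMAC(KI, i || Label || 0x00 || Context || L), i = 1."""
    message = (struct.pack(">I", 1) + label + b"\x00" + context
               + struct.pack(">I", length_bits))
    # Keep the leftmost length_bits of the HMAC output.
    return hmac.new(ki, message, hashlib.sha256).digest()[: length_bits // 8]


def len_prefixed(field: bytes) -> bytes:
    """Prefix a field with its length in octets as a 32-bit big-endian integer."""
    return struct.pack(">I", len(field)) + field


def derive_auxsecret(srtpmki: bytes, srtpmsi: bytes,
                     srtpmkr: bytes, srtpmsr: bytes,
                     zid_i: bytes, zid_r: bytes,
                     hash_length_bits: int = 256) -> bytes:
    """Combine both directions' SDES keys and salts into one auxsecret."""
    for key in (srtpmki, srtpmkr):
        if len(key) not in (16, 24, 32):  # AES key sizes named in the text
            raise ValueError("SRTP master key must be 16, 24, or 32 octets")
    for salt in (srtpmsi, srtpmsr):
        if len(salt) != 14:  # 112-bit master salt
            raise ValueError("SRTP master salt must be 14 octets")
    # Inputs are raw binary, not the base64 form carried in a=crypto.
    ki = hashlib.sha256(len_prefixed(srtpmki) + len_prefixed(srtpmkr)).digest()
    context = zid_i + zid_r + srtpmsi + srtpmsr
    return zrtp_kdf(ki, b"SRTP Secret", context, hash_length_bits)
```

Because the initiator's and responder's values occupy fixed positions in both the hashed key and the context, swapping the two directions yields a different auxsecret, so both endpoints must agree on which side is the SIP initiator.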
SDP Security 4585 Descriptions key material MUST NOT be imported into ZRTP except in 4586 circumstances defined in Section 8.2, when the a=zrtp-hash attribute 4587 is also present in the signaling. 4589 There are no security enhancements conferred by importing SDP 4590 Security Descriptions material into ZRTP, that are not already 4591 conferred by using the a=zrtp-hash attribute. Both enhance security 4592 only if the SIP server is trustworthy. For this reason, this section 4593 may be deprecated in future versions of this specification. 4595 8.3. Codec Selection for Secure Media 4597 Codec selection is negotiated in the signaling layer. If the 4598 signaling layer determines that ZRTP is supported by both endpoints, 4599 this should provide guidance in codec selection to avoid variable 4600 bitrate (VBR) codecs that leak information. 4602 When voice is compressed with a VBR codec, the packet lengths vary 4603 depending on the types of sounds being compressed. This leaks a lot 4604 of information about the content even if the packets are encrypted, 4605 regardless of what encryption protocol is used [Wright1]. It is 4606 RECOMMENDED that VBR codecs be avoided in encrypted calls. It is not 4607 a problem if the codec adapts the bitrate to the available channel 4608 bandwidth. The vulnerable codecs are the ones that change their 4609 bitrate depending on the type of sound being compressed. 4611 It also appears that voice activity detection (VAD) leaks information 4612 about the content of the conversation, but to a lesser extent than 4613 VBR. This effect can be mitigated by lengthening the VAD hangover 4614 time by a random amount between 1 and 2 seconds, if this is feasible 4615 in your application. Only short bursts of speech would benefit from 4616 lengthening the VAD hangover time. 4618 The security problems of VBR and VAD are addressed in detail by the 4619 guidelines in [RFC6562]. It is RECOMMENDED that ZRTP endpoints 4620 follow these guidelines. 4622 9. 
False ZRTP Packet Rejection 4624 An attacker who is not in the media path may attempt to inject false 4625 ZRTP protocol packets, possibly to effect a denial-of-service attack 4626 or to inject his own media stream into the call. VoIP, by its 4627 nature, invites various forms of denial-of-service attacks and 4628 requires protocol features to reject such attacks. While bogus SRTP 4629 packets may be easily rejected via the SRTP auth tag field, that can 4630 only be applied after a key agreement is completed. During the ZRTP 4631 key negotiation phase, other false packet rejection mechanisms are 4632 needed. One such mechanism is the use of the total_hash in the final 4633 shared secret calculation, but that can only detect false packets 4634 after performing the computationally expensive Diffie-Hellman 4635 calculation. 4637 A lot of work has been done on the analysis of denial-of-service 4638 attacks, especially from attackers who are not in the media path. 4639 Such an attacker might inject false ZRTP packets to force a ZRTP 4640 endpoint to engage in an endless series of pointless and expensive DH 4641 calculations. To detect and reject false packets cheaply and rapidly 4642 as soon as they are received, ZRTP uses a one-way hash chain, which 4643 is a series of successive hash images. Before each session, the 4644 following values are computed: 4646 H0 = 256-bit random nonce (different for each party) 4648 H1 = hash (H0) 4650 H2 = hash (H1) 4652 H3 = hash (H2) 4654 This one-way hash chain MUST use the hash algorithm defined in 4655 Section 5.1.2.2, truncated to 256 bits. Each 256-bit hash image is 4656 the preimage of the next, and the sequence of images is sent in 4657 reverse order in the ZRTP packet sequence. The hash image H3 is sent 4658 in the Hello message, H2 is sent in the Commit message, H1 is sent in 4659 the DHPart1 or DHPart2 messages, and H0 is sent in the Confirm1 or 4660 Confirm2 messages. 
The initial random H0 nonces that each party 4661 generates MUST be unpredictable to an attacker and unique within a 4662 ZRTP session, which thereby forces the derived hash images H1-H3 to 4663 also be unique and unpredictable. 4665 The recipient checks if the packet has the correct hash preimage, by 4666 hashing it and comparing the result with the hash image for the 4667 preceding packet. Packets that contain an incorrect hash preimage 4668 MUST NOT be used by the recipient, but they MAY be processed as 4669 security exceptions, perhaps by logging or alerting the user. As 4670 long as these bogus packets are not used, and correct packets are 4671 still being received, the protocol SHOULD be allowed to run to 4672 completion, thereby rendering ineffective this denial-of-service 4673 attack. 4675 Note that since H2 is sent in the Commit message, and the initiator 4676 does not receive a Commit message, the initiator computes the 4677 responder's missing H2 by hashing the responder's H1. An analogous 4678 interpolation is performed by both parties to handle the skipped 4679 DHPart1 and DHPart2 messages in Preshared (Section 3.1.2) or 4680 Multistream (Section 3.1.3) modes. 4682 Because these hash images alone do not protect the rest of the 4683 contents of the packet they reside in, this scheme assumes the 4684 attacker cannot modify the packet contents from a legitimate party, 4685 which is a reasonable assumption for an attacker who is not in the 4686 media path. This covers an important range of denial-of-service 4687 attacks. For dealing with the remaining set of attacks that involve 4688 packet modification, other mechanisms are used, such as the 4689 total_hash in the final shared secret calculation, and the hash 4690 commitment in the Commit message. 4692 Hello messages injected by an attacker may be detected and rejected 4693 by the inclusion of a hash of the Hello message in the signaling, as 4694 described in Section 8. 
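The hash chain and the preimage check described above can be sketched as follows. This is a minimal illustration, assuming SHA-256 as the implicit hash algorithm (the text only says it is defined in Section 5.1.2.2); the function names make_hash_chain, is_preimage_of, and h1_matches_h3 are hypothetical.

```python
import hashlib
import hmac
import secrets


def sha256(data: bytes) -> bytes:
    """Assumed implicit hash; its 256-bit output needs no truncation."""
    return hashlib.sha256(data).digest()


def make_hash_chain() -> list[bytes]:
    """Return [H0, H1, H2, H3], where H0 is a fresh unpredictable 256-bit nonce."""
    chain = [secrets.token_bytes(32)]
    for _ in range(3):
        chain.append(sha256(chain[-1]))
    return chain


def is_preimage_of(candidate: bytes, known_image: bytes) -> bool:
    """Check that hashing the newly received value gives the earlier image."""
    return hmac.compare_digest(sha256(candidate), known_image)


def h1_matches_h3(h1_from_dhpart: bytes, h3_from_hello: bytes) -> bool:
    """Initiator's view: no Commit was received from the responder, so the
    missing H2 is interpolated as hash(H1) before comparing against H3."""
    interpolated_h2 = sha256(h1_from_dhpart)
    return is_preimage_of(interpolated_h2, h3_from_hello)
```

A receiver would store the image from each accepted message (H3 from Hello, then H2, then H1) and discard any later message whose value fails is_preimage_of, which costs one hash rather than a Diffie-Hellman calculation.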
This mechanism requires that each Hello 4695 message be unique, and the inclusion of the H3 hash image meets that 4696 requirement. 4698 If and only if an integrity-protected signaling channel is available, 4699 the MACs that are keyed by this hash chaining scheme can be used to 4700 authenticate the entire ZRTP key exchange, and thereby prevent a MiTM 4701 attack, without relying on the users verbally comparing the SAS. See 4702 Section 8.1.1 for details. 4704 Some ZRTP user agents allow the user to manually switch to clear mode 4705 (via the GoClear message) in the middle of a secure call, and then 4706 later initiate secure mode again. Many consumer client products will 4707 omit this feature, but those that allow it may return to secure mode 4708 again in the same media stream. Although the same chain of hash 4709 images will be reused and thus rendered ineffective the second time, 4710 no real harm is done because the new SRTP session keys will be 4711 derived in part from a cached shared secret, which was safely 4712 protected from the MiTM in the previous DH exchange earlier in the 4713 same session. 4715 10. Intermediary ZRTP Devices 4717 This section discusses the operation of a ZRTP endpoint that is 4718 actually an intermediary. For example, consider a device that 4719 proxies both signaling and media between endpoints. There are three 4720 possible ways in which such a device could support ZRTP. 4722 An intermediary device can act transparently to the ZRTP protocol. 4723 To do this, a device MUST pass non-RTP protocols multiplexed on the 4724 same port as RTP (to allow ZRTP and STUN). This is the RECOMMENDED 4725 behavior for intermediaries as ZRTP and SRTP are best when done end- 4726 to-end. 4728 An intermediary device could implement the ZRTP protocol and act as a 4729 ZRTP endpoint on behalf of non-ZRTP endpoints behind the intermediary 4730 device. 
The intermediary could determine on a call-by-call basis 4731 whether the endpoint behind it supports ZRTP based on the presence or 4732 absence of the ZRTP SDP attribute flag (a=zrtp-hash). For non-ZRTP 4733 endpoints, the intermediary device could act as the ZRTP endpoint 4734 using its own ZID and cache. This approach SHOULD only be used when 4735 there is some other security method protecting the confidentiality of 4736 the media between the intermediary and the inside endpoint, such as 4737 IPsec or physical security. 4739 The third mode, which is NOT RECOMMENDED, is for the intermediary 4740 device to attempt to back-to-back the ZRTP protocol. The only 4741 exception to this case is where the intermediary device is a trusted 4742 element providing services to one of the endpoints -- e.g., a Private 4743 Branch Exchange or PBX. In this mode, the intermediary would attempt 4744 to act as a ZRTP endpoint towards both endpoints of the media 4745 session. This approach MUST NOT be used except as described in 4746 Section 7.3 as it will always result in a detected MiTM attack and 4747 will generate alarms on both endpoints and likely result in the 4748 immediate termination of the session. The PBX MUST use a single ZID 4749 for all endpoints behind it. 4751 In cases where centralized media mixing is taking place, the SAS will 4752 not match when compared by the humans. This situation can sometimes 4753 be detected in the SIP signaling by the presence of the isfocus feature 4754 tag [RFC4579]. As a result, when the isfocus feature tag is present, 4755 the DH exchange can be authenticated by the mechanism defined in 4756 Section 8.1.1 or by validating signatures (Section 7.2) in the 4757 Confirm or SASrelay messages. For example, consider an audio 4758 conference call with three participants Alice, Bob, and Carol hosted 4759 on a conference bridge in Dallas.
There will be three ZRTP encrypted 4760 media streams, one encrypted stream between each participant and 4761 Dallas. Each will have a different SAS. Each participant will be 4762 able to validate their SAS with the conference bridge by using 4763 signatures optionally present in the Confirm messages (described in 4764 Section 7.2). Or, if the signaling path has end-to-end integrity 4765 protection, each DH exchange will have automatic MiTM protection by 4766 using the mechanism in Section 8.1.1. 4768 SIP feature tags can also be used to detect if a session is 4769 established with an automaton such as an Interactive Voice Response 4770 (IVR), voicemail system, or speech recognition system. The display 4771 of SAS strings to users should be disabled in these cases. 4773 It is possible that an intermediary device acting as a ZRTP endpoint 4774 might still receive ZRTP Hello and other messages from the inside 4775 endpoint. This could occur if there is another inline ZRTP device 4776 that does not include the ZRTP SDP attribute flag. An intermediary 4777 acting as a ZRTP endpoint receiving ZRTP Hello and other messages 4778 from the inside endpoint MUST NOT pass these ZRTP messages. 4780 10.1. On Reducing PBX MiTM Behavior 4782 ZRTP is designed to negotiate session keys directly between two 4783 users, and to detect a man-in-the-middle (MiTM) attack. A PBX often 4784 tries to be a MiTM, as part of its natural functionality. This 4785 creates a conflict between the objectives of a ZRTP client and the 4786 objectives of a PBX. This conflict may be resolved by using the 4787 trusted MiTM mechanism (Section 7.3), but this adds complexity and 4788 only works well between users of a single trusted PBX. It can be 4789 stretched further to handle calls between two PBXs trusted by their 4790 respective local users, but breaks down if more intermediaries are 4791 involved. 
It also imposes a cognitive burden on the user, who may 4792 not be aware of the security properties or trustworthiness of all the 4793 intermediaries. 4795 The client usually prefers to negotiate ZRTP end-to-end with the 4796 other client, without exposing the keys or plaintext to the PBX, and 4797 use the PBX as a trusted MiTM only when necessary. A PBX should 4798 allow this whenever possible, even if the clients trust the PBX. 4800 The PBX may avoid acting as a MiTM either by allowing the media to 4801 completely bypass the PBX, with the two clients routing their media 4802 peer-to-peer, or by acting as a media relay in a manner similar to a 4803 TURN server. The advantages of the latter approach are mainly to 4804 facilitate NAT traversal. If only one of the two parties is a ZRTP 4805 endpoint, and the PBX is capable of serving as a ZRTP endpoint, the 4806 PBX MUST attempt to negotiate a ZRTP session with the client that 4807 supports ZRTP, so that at least one leg of the call is secure. This 4808 is a far better choice than directly connecting the media streams 4809 between a ZRTP client and a non-ZRTP client, and having the ZRTP 4810 negotiation fail completely. 4812 The PBX SHOULD make best efforts to not act as a MiTM if the PBX has 4813 evidence that both VoIP clients support ZRTP. Evidence of ZRTP 4814 support is best indicated by the presence of the optional a=zrtp-hash 4815 attribute (Section 8) in the signaling layer of both the caller and 4816 callee. Evidence of ZRTP support or non-support in the clients may 4817 also be available to the PBX in the form of configuration information 4818 stored in the PBX. 4820 If the client sends the a=zrtp-hash attribute, and the PBX acts as a 4821 MiTM nonetheless, the client SHOULD alert the user to the fact that 4822 the security level is less than expected. The client can readily 4823 detect this condition by receiving an SASrelay message (Figure 16) 4824 from the PBX. 
The severity of the alert is left to the application, 4825 which would be relying on the trusted MiTM mechanism. 4827 A PBX should not act as a MiTM unless there is a compelling reason to 4828 do so. Transcoding is fundamentally incompatible with end-to-end 4829 secure media. It should be done only when there is no alternative, 4830 such as when the two ZRTP endpoints do not share a common codec. 4831 ZRTP clients should implement a repertoire of codecs sufficient to 4832 minimize the need for PBX transcoding. Transcoding between two ZRTP 4833 clients forces a PBX to act as a MiTM. If only one media stream 4834 needs transcoding in a multimedia session, all of the media streams 4835 in that session must be handled in MiTM mode. 4837 If there is more than one media stream in a session between two ZRTP 4838 endpoints, a PBX MUST either act as a MiTM for all of them, or for 4839 none of them. This is because all the media streams between two ZRTP 4840 endpoints must share the same SAS (Section 7), due to the use of 4841 Multistream mode (Section 3.1.3). This includes the related RTCP/ 4842 SRTCP streams. 4844 A PBX may forgo end-to-end security and choose the MiTM mode for 4845 policy reasons. An institution may choose to present a single ZRTP 4846 endpoint to the outside world, through its locally trusted PBX. Or, 4847 a client application may explicitly request a PBX to act as a MiTM 4848 for a particular call, for example via a special dial prefix. 4850 It's especially harmful if a PBX that lacks its own ZRTP stack 4851 performs unnecessary transcoding between two ZRTP endpoints, ruling 4852 out the possibility of any secure connection at all. Not even the 4853 trusted MiTM mechanism is available, because this PBX is incapable of 4854 acting as a back-to-back ZRTP MiTM. Even if the PBX avoids 4855 transcoding, it might terminate the media streams for other reasons, 4856 reasons that are likely to be less important than the clients' need 4857 for a secure call. 
If this kind of PBX sees the a=zrtp-hash 4858 attribute in the caller's signaling, and the two clients share at 4859 least one common codec, the PBX should at least attempt to do no 4860 harm, and get out of the way of ZRTP. Let the users speak Navajo 4861 with each other if they want. 4863 A common usage scenario for a ZRTP-enabled PBX is for a VoIP client 4864 to call a PBX trusted by the client, in order to bridge to a PSTN 4865 gateway in or near the PBX. In such a case, the PBX SHOULD act as a 4866 ZRTP endpoint so that the VoIP leg of the call is secured. The call 4867 should be regarded as not secure past the ZRTP endpoint closest to 4868 the PSTN gateway. If the PSTN gateway is distant from the PBX, the 4869 PBX should provide a secure connection to the PSTN gateway, perhaps 4870 through a VPN connection. Even then, the call becomes vulnerable 4871 when it enters the PSTN. Nonetheless, this would be appropriate for 4872 a caller who originates his ZRTP session from a hostile environment, 4873 but is less concerned about the wiretap threat near the PSTN gateway. 4875 11. The ZRTP Disclosure Flag 4877 There are no back doors defined in the ZRTP protocol specification. 4878 The designers of ZRTP would like to discourage back doors in ZRTP- 4879 enabled products. However, despite the lack of back doors in the 4880 actual ZRTP protocol, it must be recognized that a ZRTP implementer 4881 might still deliberately create a rogue ZRTP-enabled product that 4882 implements a back door outside the scope of the ZRTP protocol. For 4883 example, they could create a product that discloses the SRTP session 4884 key generated using ZRTP out-of-band to a third party. They may even 4885 have a legitimate business reason to do this for some customers. 
4887 For example, some environments have a need to monitor or record 4888 calls, such as stock brokerage houses who want to discourage insider 4889 trading, or special high-security environments with special needs to 4890 monitor their own phone calls. We've all experienced automated 4891 messages telling us that "This call may be monitored for quality 4892 assurance". A ZRTP endpoint in such an environment might 4893 unilaterally disclose the session key to someone monitoring the call. 4894 ZRTP-enabled products that perform such out-of-band disclosures of 4895 the session key can undermine public confidence in the ZRTP protocol, 4896 unless we do everything we can in the protocol to alert the other 4897 user that this is happening. 4899 If one of the parties is using a product that is designed to disclose 4900 their session key, ZRTP requires them to confess this fact to the 4901 other party through a protocol message to the other party's ZRTP 4902 client, which can properly alert that user, perhaps by rendering it 4903 in a graphical user interface. The disclosing party does this by 4904 sending a Disclosure flag (D) in Confirm1 and Confirm2 messages as 4905 described in Section 5.7. 4907 Note that the intention here is to have the Disclosure flag identify 4908 products that are designed to disclose their session keys, not to 4909 identify which particular calls are compromised on a call-by-call 4910 basis. This is an important legal distinction, because most 4911 government sanctioned wiretap regulations require a VoIP service 4912 provider to not reveal which particular calls are wiretapped. But 4913 there is nothing illegal about revealing that a product is designed 4914 to be wiretap-friendly. The ZRTP protocol mandates that such a 4915 product "out" itself. 
4917 You might be using a ZRTP-enabled product with no back doors, but if 4918 your own graphical user interface tells you the call is (mostly) 4919 secure, except that the other party is using a product that is 4920 designed in such a way that it may have disclosed the session key for 4921 monitoring purposes, you might ask him what brand of secure telephone 4922 he is using, and make a mental note not to purchase that brand 4923 yourself. If we create a protocol environment that requires such 4924 back-doored phones to confess their nature, word will spread quickly, 4925 and the "invisible hand" of the free market will act. The free 4926 market has effectively dealt with this in the past. 4928 Of course, a ZRTP implementer can lie about his product having a back 4929 door, but the ZRTP standard mandates that ZRTP-compliant products 4930 MUST adhere to the requirement that a back door be confessed by 4931 sending the Disclosure flag to the other party. 4933 There will be inevitable comparisons to Steve Bellovin's 2003 April 4934 fool joke, when he submitted RFC 3514 [RFC3514], which defined the 4935 "Evil bit" in the IPv4 header, for packets with "evil intent". But 4936 we submit that a similar idea can actually have some merit for 4937 securing VoIP. Sure, one can always imagine that some implementer 4938 will not be fazed by the rules and will lie, but they would have lied 4939 anyway even without the Disclosure flag. There are good reasons to 4940 believe that it will improve the overall percentage of 4941 implementations that at least tell us if they put a back door in 4942 their products, and may even get some of them to decide not to put in 4943 a back door at all. From a civic hygiene perspective, we are better 4944 off with having the Disclosure flag in the protocol. 
4946 If an endpoint stores or logs SRTP keys or information that can be 4947 used to reconstruct or recover SRTP keys after they are no longer in 4948 use (i.e., the session is no longer active), or otherwise discloses or passes 4949 SRTP keys or information that can be used to reconstruct or recover 4950 SRTP keys to another application or device, the Disclosure flag D 4951 MUST be set in the Confirm1 or Confirm2 message. 4953 11.1. Guidelines on Proper Implementation of the Disclosure Flag 4955 Some implementers have asked for guidance on implementing the 4956 Disclosure flag. Some people have incorrectly thought that a 4957 connection secured with ZRTP cannot be used in a call center, with 4958 voluntary voice recording, or even with a voicemail system. 4959 Similarly, some potential users of ZRTP have overestimated the 4960 protection that ZRTP can give them. These guidelines clarify both 4961 concerns. 4963 The ZRTP Disclosure flag only governs the ZRTP/SRTP stream itself. 4964 It does not govern the underlying RTP media stream, nor the actual 4965 media itself. Consequently, a PBX that uses ZRTP may provide 4966 conference calls, call monitoring, call recording, voicemail, or 4967 other PBX features and still say that it does not disclose the ZRTP 4968 key material. A video system may provide DVR features and still say 4969 that it does not disclose the ZRTP key material. The ZRTP Disclosure 4970 flag, when not set, means only that the ZRTP cryptographic key 4971 material stays within the bounds of the ZRTP subsystem. 4973 If an application has a need to disclose the ZRTP cryptographic key 4974 material, the easiest way to comply with the protocol is to set the 4975 flag to the proper value. The next easiest way is to overestimate 4976 disclosure.
For example, a call center that commonly records calls 4977 might choose to set the Disclosure flag even though all recording is 4978 an analog recording of a call (and thus outside the ZRTP scope) 4979 because it sets an expectation with clients that their calls might be 4980 recorded. 4982 Note also that the ZRTP Disclosure Flag does not require an 4983 implementation to preclude hacking or malware. Malware that leaks 4984 ZRTP cryptographic key material does not create a liability for the 4985 implementer from non-compliance with the ZRTP specification. 4987 A user of ZRTP should note that ZRTP is not a panacea against 4988 unauthorized recording. ZRTP does not and cannot protect against an 4989 untrustworthy partner who holds a microphone up to the speaker. It 4990 does not protect against someone else being in the room. It does not 4991 protect against analog wiretaps in the phone or in the room. It does 4992 not mean your partner has not been hacked with spyware. It does not 4993 mean that the software has no flaws. It means that the ZRTP 4994 subsystem is not knowingly leaking ZRTP cryptographic key material. 4996 12. Mapping between ZID and AOR (SIP URI) 4998 The role of the ZID in the management of the local cache of shared 4999 secrets is explained in Section 4.9. A particular ZID is associated 5000 with a particular ZRTP endpoint, typically a VoIP client. A single 5001 SIP URI (also known as an Address-of-Record, or AOR) may be hosted on 5002 several different soft VoIP clients, desktop phones, and mobile 5003 handsets, and each of them will have a different ZID. Further, a 5004 single VoIP client may have several SIP URIs configured into its 5005 profiles, but only one ZID. There is not a one-to-one mapping 5006 between a ZID and a SIP URI. A single SIP URI may be associated with 5007 several ZIDs, and a single ZID may be associated with several SIP 5008 URIs on the same client. 
5010 Not only that, but ZRTP is independent of which signaling protocol is 5011 used. It works equally well with SIP, Jingle, H.323, or any 5012 proprietary signaling protocol. Thus, a ZRTP ZID has little to do 5013 with SIP, per se, which means it has little to do with a SIP URI. 5015 Even though a ZID is associated with a device, not a human, it is 5016 often the case that a ZRTP endpoint is controlled mainly by a 5017 particular human. For example, it may be a mobile phone. For the 5018 key continuity features (Section 15.1) to be effective, a local cache 5019 entry (and thus a ZID) should be associated with some sort of name of 5020 the remote party. That name could be a human name, or it could be 5021 made more precise by specifying which ZRTP endpoint he's using. For 5022 example "Jon Callas", or "Jon Callas on his iPhone", or "Jon on his 5023 iPad", or "Alice on her office phone". These name strings can be 5024 stored in the local cache, indexed by ZID, and may have been 5025 initially provided by the local user by hand. Or the local cache 5026 entry may contain a pointer to an entry in the local address book. 5027 When a secure session is established, if a prior session has 5028 established a cache entry, and the new session has a matching cache 5029 entry indexed by the same ZID, and the SAS has been previously 5030 verified, the person's name stored in that cache entry should be 5031 displayed. 5033 It is absolutely essential to have these human-readable names 5034 associated with cache entries. If the cache is implemented without 5035 them, it opens the door to a simple form of MiTM attack. An attacker 5036 who has previously established a cache entry with both parties (or 5037 simply captures a phone that has) can later act as a MiTM between 5038 those two parties without triggering a cache mismatch, which means 5039 the users will not be alerted to do an SAS compare. 
This MiTM attack would be easily detected if the name stored with the cache entry is displayed for the user, so that the user can readily see that he is not connected to the remote party he expected.

If the remote ZID originates from a PBX, the displayed name would be the name of that PBX, which might be the name of the company that owns that PBX.

If it is desirable to associate some key material with a particular AOR, digital signatures (Section 7.2) may be used, with public key certificates that associate the signature key with an AOR. If more than one ZRTP endpoint shares the same AOR, they may all use the same signature key and provide the same public key certificate with their signatures.

13. IANA Considerations

This specification defines a new SDP [RFC4566] attribute in Section 8.

Contact name:              Philip Zimmermann

Attribute name:            "zrtp-hash"

Type of attribute:         Media level

Subject to charset:        No

Purpose of attribute:      The 'zrtp-hash' indicates that a UA supports the ZRTP protocol and provides a hash of the ZRTP Hello message. The ZRTP protocol version number is also specified.

Allowed attribute values:  Hex

14. Media Security Requirements

This section discusses how ZRTP meets all RTP security requirements discussed in the Media Security Requirements [RFC5479] document without any dependencies on other protocols or extensions, unlike DTLS-SRTP [RFC5764], which requires additional protocols and mechanisms.

R-FORK-RETARGET is met since ZRTP is a media path key agreement protocol.

R-DISTINCT is met since ZRTP uses ZIDs and allows multiple independent ZRTP exchanges to proceed.

R-HERFP is met since ZRTP is a media path key agreement protocol.

R-REUSE is met using the Multistream and Preshared modes.

R-AVOID-CLIPPING is met since ZRTP is a media path key agreement protocol.
5096 R-RTP-CHECK is met since the ZRTP packet format does not pass the 5097 RTP validity check. 5099 R-ASSOC is met using the a=zrtp-hash SDP attribute in INVITEs and 5100 responses (Section 8.1). 5102 R-NEGOTIATE is met using the Commit message. 5104 R-PSTN is met since ZRTP can be implemented in Gateways. 5106 R-PFS is met using ZRTP Diffie-Hellman key agreement methods. 5108 R-COMPUTE is met using the Hello/Commit ZRTP exchange. 5110 R-CERTS is met using the verbal comparison of the SAS. 5112 R-FIPS is met since ZRTP uses only FIPS-approved algorithms in all 5113 relevant categories. The authors believe ZRTP is compliant with 5114 [NIST-SP800-56A], [NIST-SP800-108], [FIPS-198-1], [FIPS-180-3], 5115 [NIST-SP800-38A], [FIPS-197], and [NSA-Suite-B], which should meet 5116 the FIPS-140 validation requirements set by [FIPS-140-2-Annex-A] 5117 and [FIPS-140-2-Annex-D]. 5119 R-DOS is met since ZRTP does not introduce any new denial-of- 5120 service attacks. 5122 R-EXISTING is met since ZRTP can support the use of certificates 5123 or keys. 5125 R-AGILITY is met since the set of hash, cipher, SRTP 5126 authentication tag type, key agreement method, SAS type, and 5127 signature type can all be extended and negotiated. 5129 R-DOWNGRADE is met since ZRTP has protection against downgrade 5130 attacks. 5132 R-PASS-MEDIA is met since ZRTP prevents a passive adversary with 5133 access to the media path from gaining access to keying material 5134 used to protect SRTP media packets. 5136 R-PASS-SIG is met since ZRTP prevents a passive adversary with 5137 access to the signaling path from gaining access to keying 5138 material used to protect SRTP media packets. 5140 R-SIG-MEDIA is met using the a=zrtp-hash SDP attribute in INVITEs 5141 and responses. 5143 R-ID-BINDING is met using the a=zrtp-hash SDP attribute 5144 (Section 8.1). 5146 R-ACT-ACT is met using the a=zrtp-hash SDP attribute in INVITEs 5147 and responses. 
R-BEST-SECURE is met since ZRTP utilizes the RTP/AVP profile and hence best effort SRTP in every case.

R-OTHER-SIGNALING is met since ZRTP can utilize modes in which there is no dependency on the signaling path.

R-RECORDING is met using the ZRTP Disclosure flag.

R-TRANSCODER is met if the transcoder operates as a trusted MiTM (i.e., a PBX).

R-ALLOW-RTP is met due to ZRTP's best effort encryption.

15. Changes From RFC 6189

This section summarizes the most important changes between this document and RFC 6189. Note that this document uses the same version, version "1.10", as RFC 6189 since the changes are backwards compatible, with one exception noted below.

Additional details on cache updating. (See Section 4.6.1 and Section 12)

Bits 4 and 5 of the ZRTP message format changed from "unused and set to zero" to "set to zero". This is to help demultiplex ZRTP from RTP and STUN. (See Section 5)

Skein auth tag clarification. (See Section 5.1.4)

Relayed V flag clarification. (See Section 5.13)

New section on Rationale for Ping messages. (See Section 5.15.1)

Clarification on empty SASrelay messages. (See Section 7.3)

Clarification on PBX enrollment. (See Section 7.3.1)

New section on Automated Methods of Authenticating the DH Exchange. (See Section 7.4)

Clarification on leveraging an integrity protected SIP signaling channel. (See Section 8.1.1)

New section on Combining ZRTP and SDP Security Descriptions (SDES). Note that this section has the only non-backwards compatible change from RFC 6189 in this document. (See Section 8.2)

New section on Reducing PBX MiTM Behavior. (See Section 10.1)

16. Security Considerations

This document is all about securely keying SRTP sessions. As such, security is discussed in every section.
5204 Most secure phones rely on a Diffie-Hellman exchange to agree on a 5205 common session key. But since DH is susceptible to a MiTM attack, it 5206 is common practice to provide a way to authenticate the DH exchange. 5207 In some military systems, this is done by depending on digital 5208 signatures backed by a centrally managed PKI. A decade of industry 5209 experience has shown that deploying centrally managed PKIs can be a 5210 painful and often futile experience. PKIs are just too messy and 5211 require too much activation energy to get them started. Setting up a 5212 PKI requires somebody to run it, which is not practical for an 5213 equipment provider. A service provider, like a carrier, might 5214 venture down this path, but even then you have to deal with cross- 5215 carrier authentication, certificate revocation lists, and other 5216 complexities. It is much simpler to avoid PKIs altogether, 5217 especially when developing secure commercial products. It is 5218 therefore more common for commercial secure phones in the PSTN world 5219 to augment the DH exchange with a Short Authentication String (SAS) 5220 combined with a hash commitment at the start of the key exchange, to 5221 shorten the length of SAS material that must be read aloud. No PKI 5222 is required for this approach to authenticating the DH exchange. The 5223 AT&T TSD 3600, Eric Blossom's COMSEC secure phones [comsec], 5224 [PGPfone], and the GSMK CryptoPhone are all examples of products that 5225 took this simpler lightweight approach. The main problem with this 5226 approach is inattentive users who may not execute the voice 5227 authentication procedure. 5229 Some questions have been raised about voice spoofing during the short 5230 authentication string (SAS) comparison. But it is a mistake to think 5231 this is simply an exercise in voice impersonation (perhaps this could 5232 be called the "Rich Little" attack). 
Although there are digital signal processing techniques for changing a person's voice, that does not mean a MiTM attacker can safely break into a phone conversation and inject his own SAS at just the right moment. He doesn't know exactly when or in what manner the users will choose to read aloud the SAS, or in what context they will bring it up or say it, or even which of the two speakers will say it, or if indeed they both will say it. In addition, some methods of rendering the SAS involve using a list of words such as the PGP word list [Juola2], in a manner analogous to how pilots use the NATO phonetic alphabet to convey information. This can make it even more complicated for the attacker, because these words can be worked into the conversation in unpredictable ways. If the session also includes video (an increasingly common usage scenario), the MiTM may be further deterred by the difficulty of making the lips sync with the voice-spoofed SAS.

The PGP word list is designed to make each word phonetically distinct, which also tends to create distinctive lip movements. Remember that the attacker places a very high value on not being detected, and if he makes a mistake, he doesn't get to do it over.

A question has been raised regarding the safety of the SAS procedure for people who don't know each other's voices, because it may allow an attack from a MiTM even if he lacks voice impersonation capabilities. This is not as much of a problem as it seems, because it isn't necessary that users recognize each other by their voice. It is only necessary that they detect that the voice used for the SAS procedure doesn't match the voice in the rest of the phone conversation.
Special consideration must be given to secure phone calls with automated systems that cannot perform a verbal SAS comparison between two humans (e.g., a voice mail system). If a well-functioning PKI is available to all parties, it is recommended that credentials be provisioned at the automated system sufficient to use one of the automatic MiTM detection mechanisms from Section 8.1.1 or Section 7.2. Alternatively, the endpoints can rely on a previously established cached shared secret (pbxsecret or rs1 or both), backed by a human-executed SAS comparison during an initial call. Note that it is worse than useless and absolutely unsafe to rely on a robot voice from the remote endpoint to compare the SAS, because a robot voice can be trivially forged by a MiTM. However, a robot voice may be safe to use strictly locally for a different purpose. A ZRTP user agent may render its locally computed SAS to the local user via a robot voice if no visual display is available, provided the user can readily determine that the robot voice is generated locally, not from the remote endpoint.

A popular and field-proven approach to MiTM protection is used by SSH (Secure Shell) [RFC4251], which Peter Gutmann likes to call the "baby duck" security model. SSH establishes a relationship by exchanging public keys in the initial session, when we assume no attacker is present, and this makes it possible to authenticate all subsequent sessions. A successful MiTM attacker has to have been present in all sessions all the way back to the first one, which is assumed to be difficult for the attacker. ZRTP's key continuity features are actually better than those of SSH, at least for VoIP, for reasons described in Section 16.1. All this is accomplished without resorting to a centrally managed PKI.

We use an analogous baby duck security model to authenticate the DH exchange in ZRTP.
We don't need to exchange persistent public keys, 5294 we can simply cache a shared secret and re-use it to authenticate a 5295 long series of DH exchanges for secure phone calls over a long period 5296 of time. If we verbally compare just one SAS, and then cache a 5297 shared secret for later calls to use for authentication, no new voice 5298 authentication rituals need to be executed. We just have to remember 5299 we did one already. 5301 If one party ever loses this cached shared secret, it is no longer 5302 available for authentication of DH exchanges. This cache mismatch 5303 situation is easy to detect by the party that still has a surviving 5304 shared secret cache entry. If it fails to match, either there is a 5305 MiTM attack or one side has lost their shared secret cache entry. 5306 The user agent that discovers the cache mismatch must alert the user 5307 that a cache mismatch has been detected, and that he must do a verbal 5308 comparison of the SAS to distinguish if the mismatch is because of a 5309 MiTM attack or because of the other party losing her cache (normative 5310 language is in Section 4.3.2). Voice confirmation is absolutely 5311 essential in this situation. From that point on, the two parties 5312 start over with a new cached shared secret. Then, they can go back 5313 to omitting the voice authentication on later calls. 5315 Precautions must be observed when using a trusted MiTM device such as 5316 a trusted PBX, as described in Section 7.3. Make sure you really 5317 trust that this PBX will never be compromised before establishing it 5318 as a trusted MiTM, because it is in a position to wiretap calls for 5319 any phone that trusts it. It is "licensed" to be in a position to 5320 wiretap. You are safer to try to arrange the connection topology to 5321 route the media directly between the two ZRTP peers, not through a 5322 trusted PBX. Real end-to-end encryption is preferred. 
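The cache mismatch handling described earlier in this section can be summarized as a small decision function. The Python sketch below is an illustration under stated assumptions, not the normative procedure (which is in Section 4.3.2): the names Outcome, check_continuity, and must_compare_sas are hypothetical, and the two byte strings stand for whatever opaque values an implementation compares to learn whether both sides hold the same cached shared secret.

```python
import hmac
from enum import Enum
from typing import Optional


class Outcome(Enum):
    NO_CACHE = "no local cache entry for this peer"
    MATCH = "cached shared secret matches"
    MISMATCH = "cache mismatch: MiTM attack or the peer lost its cache"


def check_continuity(local_id: Optional[bytes],
                     peer_id: Optional[bytes]) -> Outcome:
    """Classify a session from the point of view of the local endpoint.

    local_id and peer_id are hypothetical opaque identifiers derived from
    each side's cached shared secret; None means that side has no entry.
    """
    if local_id is None:
        # Without a surviving local entry there is nothing to mismatch.
        return Outcome.NO_CACHE
    if peer_id is not None and hmac.compare_digest(local_id, peer_id):
        return Outcome.MATCH
    # The party that still has a cache entry is the one that detects this.
    return Outcome.MISMATCH


def must_compare_sas(outcome: Outcome) -> bool:
    """Only a matching cached secret lets the users skip the verbal SAS check."""
    return outcome is not Outcome.MATCH
```

On Outcome.MISMATCH, a user agent following the text above would alert the user and require a verbal SAS comparison before the two parties start over with a new cached shared secret. This sketch deliberately leaves out the SAS verified flag (V) and treats the comparison itself as a black box.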
5324 The security of the SAS mechanism depends on the user verifying it 5325 verbally with his peer at the other endpoint. There is some risk the 5326 user will not be so diligent and may ignore the SAS. For a 5327 discussion on how users become habituated to security warnings in the 5328 PKI certificate world, see [Sunshine]. Part of the problems 5329 discussed in that paper are from the habituation syndrome common to 5330 most warning messages, and part of them are from the fact that users 5331 simply don't understand trust models. Fortunately, ZRTP doesn't need 5332 a trust model to use the SAS mechanism, so it's easier for the user 5333 to grasp the idea of comparing the SAS verbally with the other party; 5334 it's easier than understanding a trust model, at least. Also, the 5335 verbal comparison of the SAS gets both users involved, and they will 5336 notice a mismatch of the SAS. Also, the ZRTP user agent will know 5337 when the SAS has been previously verified because of the SAS verified 5338 flag (V) (Section 7.1), and only ask the user to verify it when 5339 needed. After it has been verified once, the key continuity features 5340 make it unnecessary to verify it again. 5342 16.1. Self-Healing Key Continuity Feature 5344 The key continuity features of ZRTP are analogous to those provided 5345 by SSH (Secure Shell) [RFC4251], but they differ in one respect. SSH 5346 caches public signature keys that never change, and uses a permanent 5347 private signature key that must be guarded from disclosure. If 5348 someone steals your SSH private signature key, they can impersonate 5349 you in all future sessions and can mount a successful MiTM attack any 5350 time they want. 5352 ZRTP caches symmetric key material used to compute secret session 5353 keys, and these values change with each session. If someone steals 5354 your ZRTP shared secret cache, they only get one chance to mount a 5355 MiTM attack, in the very next session. 
If they miss that chance, the 5356 retained shared secret is refreshed with a new value, and the window 5357 of vulnerability heals itself, which means they are locked out of any 5358 future opportunities to mount a MiTM attack. This gives ZRTP a 5359 "self-healing" feature if any cached key material is compromised. 5361 A MiTM attacker must always be in the media path. This presents a 5362 significant operational burden for the attacker in many VoIP usage 5363 scenarios, because being in the media path for every call is often 5364 harder than being in the signaling path. This will likely create 5365 coverage gaps in the attacker's opportunities to mount a MiTM attack. 5366 ZRTP's self-healing key continuity features are better than SSH at 5367 exploiting any temporary gaps in MiTM attack opportunities. Thus, 5368 ZRTP quickly recovers from any disclosure of cached key material. 5370 In systems that use a persistent private signature key, such as SSH, 5371 the stored signature key is usually protected from disclosure by 5372 encryption that requires a user-supplied high-entropy passphrase. 5373 This arrangement may be acceptable for a diligent user with a desktop 5374 computer sitting in an office with a full ASCII keyboard. But it 5375 would be prohibitively inconvenient and unsafe to type a high-entropy 5376 passphrase on a mobile phone's numeric keypad while driving a car. 5377 Users will reject any scheme that requires the use of a passphrase on 5378 such a platform, which means mobile phones carry an elevated risk of 5379 compromise of stored key material, and thus would especially benefit 5380 from the self-healing aspects of ZRTP's key continuity features. 5382 The infamous Debian OpenSSL weak key vulnerability [dsa-1571] 5383 (discovered and patched in May 2008) offers a real-world example of 5384 why ZRTP's self-healing scheme is a good way to do key continuity. 
The Debian bug resulted in the production of a lot of weak SSH (and TLS/SSL) keys, which continued to compromise security even after the bug had been patched. In contrast, ZRTP's key continuity scheme adds new entropy to the cached key material with every call, so old deficiencies in entropy are washed away with each new session.

It should be noted that the addition of shared secret entropy from previous sessions can extend the strength of the new session key to AES-256 levels, even if the new session uses Diffie-Hellman keys no larger than DH-3072 or ECDH-256, provided the cached shared secrets were initially established when the wiretapper was not present. This is why AES-256 MAY be used with the smaller DH key sizes in Section 5.1.5, despite the key strength comparisons in Table 2 of [NIST-SP800-57-Part1].

Caching shared symmetric key material is also less CPU intensive compared with using digital signatures, which may be important for low-power mobile platforms.

Unlike the long-lived non-updated key material used by SSH, the dynamically updated shared secrets of ZRTP may lose sync if traditional backup/restore mechanisms are used. This limitation is a consequence of the otherwise beneficial aspects of this approach to key continuity, and it is partially mitigated by ZRTP's built-in cache backup logic (Section 4.6.1).

17. Acknowledgments

Most of the text in this note comes from the original ZRTP specification, [RFC6189]. The authors would like to thank Jon Callas, co-author of the original ZRTP specification, and everyone who contributed to that document.

The authors would like to thank Bryce "Zooko" Wilcox-O'Hearn and Colin Plumb for their contributions to the design of this protocol.
5420 Also, thanks to Hal Finney, Viktor Krikun, Werner Dittmann, Dan Wing, 5421 Sagar Pai, David McGrew, Colin Perkins, Dan Harkins, David Black, Tim 5422 Polk, Richard Harris, Roni Even, Jon Peterson, and Robert Sparks for 5423 their helpful comments and suggestions. Thanks to Lily Chen at NIST 5424 for her assistance in ensuring compliance with NIST SP800-56A and 5425 SP800-108. 5427 The use of one-way hash chains to key HMACs in ZRTP is similar to 5428 Adrian Perrig's TESLA protocol [TESLA]. 5430 18. References 5432 18.1. Normative References 5434 [RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed- 5435 Hashing for Message Authentication", RFC 2104, 5436 DOI 10.17487/RFC2104, February 1997, 5437 . 5439 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 5440 Requirement Levels", BCP 14, RFC 2119, 5441 DOI 10.17487/RFC2119, March 1997, 5442 . 5444 [RFC3526] Kivinen, T. and M. Kojo, "More Modular Exponential (MODP) 5445 Diffie-Hellman groups for Internet Key Exchange (IKE)", 5446 RFC 3526, DOI 10.17487/RFC3526, May 2003, 5447 . 5449 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 5450 Jacobson, "RTP: A Transport Protocol for Real-Time 5451 Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, 5452 July 2003, . 5454 [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. 5455 Norrman, "The Secure Real-time Transport Protocol (SRTP)", 5456 RFC 3711, DOI 10.17487/RFC3711, March 2004, 5457 . 5459 [RFC4231] Nystrom, M., "Identifiers and Test Vectors for HMAC-SHA- 5460 224, HMAC-SHA-256, HMAC-SHA-384, and HMAC-SHA-512", 5461 RFC 4231, DOI 10.17487/RFC4231, December 2005, 5462 . 5464 [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session 5465 Description Protocol", RFC 4566, DOI 10.17487/RFC4566, 5466 July 2006, . 5468 [RFC4880] Callas, J., Donnerhacke, L., Finney, H., Shaw, D., and R. 5469 Thayer, "OpenPGP Message Format", RFC 4880, 5470 DOI 10.17487/RFC4880, November 2007, 5471 . 
5473 [RFC4960] Stewart, R., Ed., "Stream Control Transmission Protocol", 5474 RFC 4960, DOI 10.17487/RFC4960, September 2007, 5475 . 5477 [RFC5114] Lepinski, M. and S. Kent, "Additional Diffie-Hellman 5478 Groups for Use with IETF Standards", RFC 5114, 5479 DOI 10.17487/RFC5114, January 2008, 5480 . 5482 [RFC5479] Wing, D., Ed., Fries, S., Tschofenig, H., and F. Audet, 5483 "Requirements and Analysis of Media Security Management 5484 Protocols", RFC 5479, DOI 10.17487/RFC5479, April 2009, 5485 . 5487 [RFC5759] Solinas, J. and L. Zieglar, "Suite B Certificate and 5488 Certificate Revocation List (CRL) Profile", RFC 5759, 5489 DOI 10.17487/RFC5759, January 2010, 5490 . 5492 [RFC6188] McGrew, D., "The Use of AES-192 and AES-256 in Secure 5493 RTP", RFC 6188, DOI 10.17487/RFC6188, March 2011, 5494 . 5496 [RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security 5497 (TLS) Protocol Version 1.2", RFC 5246, 5498 DOI 10.17487/RFC5246, August 2008, 5499 . 5501 [RFC6562] Perkins, C. and JM. Valin, "Guidelines for the Use of 5502 Variable Bit Rate Audio with Secure RTP", RFC 6562, 5503 DOI 10.17487/RFC6562, March 2012, 5504 . 5506 [FIPS-140-2-Annex-A] 5507 "Annex A: Approved Security Functions for FIPS PUB 140-2", 5508 NIST FIPS PUB 140-2 Annex A, January 2011. 5510 [FIPS-140-2-Annex-D] 5511 "Annex D: Approved Key Establishment Techniques for FIPS 5512 PUB 140-2", NIST FIPS PUB 140-2 Annex D, January 2011. 5514 [FIPS-180-3] 5515 "Secure Hash Standard (SHS)", NIST FIPS PUB 180-3, October 5516 2008. 5518 [FIPS-186-3] 5519 "Digital Signature Standard (DSS)", NIST FIPS PUB 5520 186-3, June 2009. 5522 [FIPS-197] 5523 "Advanced Encryption Standard (AES)", NIST FIPS PUB 5524 197, November 2001. 5526 [FIPS-198-1] 5527 "The Keyed-Hash Message Authentication Code (HMAC)", NIST 5528 FIPS PUB 198-1, July 2008. 5530 [NIST-SP800-38A] 5531 Dworkin, M., "Recommendation for Block Cipher Modes of 5532 Operation", NIST Special Publication 800-38A, 2001 5533 Edition. 
5535 [NIST-SP800-56A] 5536 Barker, E., Johnson, D., and M. Smid, "Recommendation for 5537 Pair-Wise Key Establishment Schemes Using Discrete 5538 Logarithm Cryptography", NIST Special Publication 5539 800-56A Revision 1, March 2007. 5541 [NIST-SP800-90] 5542 Barker, E. and J. Kelsey, "Recommendation for Random 5543 Number Generation Using Deterministic Random Bit 5544 Generators", NIST Special Publication 800-90 (Revised), 5545 March 2007. 5547 [NIST-SP800-108] 5548 Chen, L., "Recommendation for Key Derivation Using 5549 Pseudorandom Functions", NIST Special Publication 5550 800-108, October 2009. 5552 [NSA-Suite-B] 5553 "NSA Suite B Cryptography", NSA Information Assurance 5554 Directorate, NSA Suite B Cryptography. 5556 [NSA-Suite-B-Guide-56A] 5557 "Suite B Implementer's Guide to NIST SP 800-56A", Suite B 5558 Implementer's Guide to NIST SP 800-56A, 28 July 2009. 5560 [TwoFish] Schneier, B., Kelsey, J., Whiting, D., Hall, C., and N. 5561 Ferguson, "Twofish: A 128-Bit Block Cipher", June 1998, 5562 . 5564 [Skein] Ferguson, N., Lucks, S., Schneier, B., Whiting, D., 5565 Bellare, M., Kohno, T., Callas, J., and J. Walker, "The 5566 Skein Hash Function Family, Version 1.3 - 1 Oct 2010", 5567 . 5570 [pgpwordlist] 5571 "PGP Word List", December 2010, 5572 . 5575 [I-D.ietf-avtcore-rfc5764-mux-fixes] 5576 Petit-Huguenin, M. and G. Salgueiro, "Multiplexing Scheme 5577 Updates for Secure Real-time Transport Protocol (SRTP) 5578 Extension for Datagram Transport Layer Security (DTLS)", 5579 draft-ietf-avtcore-rfc5764-mux-fixes-10 (work in 5580 progress), July 2016. 5582 18.2. Informative References 5584 [RFC6189] Zimmermann, P., Johnston, A., Ed., and J. Callas, "ZRTP: 5585 Media Path Key Agreement for Unicast Secure RTP", 5586 RFC 6189, DOI 10.17487/RFC6189, April 2011, 5587 . 5589 [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, 5590 DOI 10.17487/RFC1191, November 1990, 5591 . 5593 [RFC1981] McCann, J., Deering, S., and J. 
Mogul, "Path MTU Discovery 5594 for IP version 6", RFC 1981, DOI 10.17487/RFC1981, August 5595 1996, . 5597 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 5598 A., Peterson, J., Sparks, R., Handley, M., and E. 5599 Schooler, "SIP: Session Initiation Protocol", RFC 3261, 5600 DOI 10.17487/RFC3261, June 2002, 5601 . 5603 [RFC3514] Bellovin, S., "The Security Flag in the IPv4 Header", 5604 RFC 3514, DOI 10.17487/RFC3514, April 2003, 5605 . 5607 [RFC3824] Peterson, J., Liu, H., Yu, J., and B. Campbell, "Using 5608 E.164 numbers with the Session Initiation Protocol (SIP)", 5609 RFC 3824, DOI 10.17487/RFC3824, June 2004, 5610 . 5612 [RFC4086] Eastlake 3rd, D., Schiller, J., and S. Crocker, 5613 "Randomness Requirements for Security", BCP 106, RFC 4086, 5614 DOI 10.17487/RFC4086, June 2005, 5615 . 5617 [RFC4251] Ylonen, T. and C. Lonvick, Ed., "The Secure Shell (SSH) 5618 Protocol Architecture", RFC 4251, DOI 10.17487/RFC4251, 5619 January 2006, . 5621 [RFC4474] Peterson, J. and C. Jennings, "Enhancements for 5622 Authenticated Identity Management in the Session 5623 Initiation Protocol (SIP)", RFC 4474, 5624 DOI 10.17487/RFC4474, August 2006, 5625 . 5627 [RFC4475] Sparks, R., Ed., Hawrylyshen, A., Johnston, A., Rosenberg, 5628 J., and H. Schulzrinne, "Session Initiation Protocol (SIP) 5629 Torture Test Messages", RFC 4475, DOI 10.17487/RFC4475, 5630 May 2006, . 5632 [RFC4567] Arkko, J., Lindholm, F., Naslund, M., Norrman, K., and E. 5633 Carrara, "Key Management Extensions for Session 5634 Description Protocol (SDP) and Real Time Streaming 5635 Protocol (RTSP)", RFC 4567, DOI 10.17487/RFC4567, July 5636 2006, . 5638 [RFC4568] Andreasen, F., Baugher, M., and D. Wing, "Session 5639 Description Protocol (SDP) Security Descriptions for Media 5640 Streams", RFC 4568, DOI 10.17487/RFC4568, July 2006, 5641 . 5643 [RFC4579] Johnston, A. and O. 
Levin, "Session Initiation Protocol 5644 (SIP) Call Control - Conferencing for User Agents", 5645 BCP 119, RFC 4579, DOI 10.17487/RFC4579, August 2006, 5646 . 5648 [RFC5117] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117, 5649 DOI 10.17487/RFC5117, January 2008, 5650 . 5652 [RFC5245] Rosenberg, J., "Interactive Connectivity Establishment 5653 (ICE): A Protocol for Network Address Translator (NAT) 5654 Traversal for Offer/Answer Protocols", RFC 5245, 5655 DOI 10.17487/RFC5245, April 2010, 5656 . 5658 [RFC5764] McGrew, D. and E. Rescorla, "Datagram Transport Layer 5659 Security (DTLS) Extension to Establish Keys for the Secure 5660 Real-time Transport Protocol (SRTP)", RFC 5764, 5661 DOI 10.17487/RFC5764, May 2010, 5662 . 5664 [RFC5869] Krawczyk, H. and P. Eronen, "HMAC-based Extract-and-Expand 5665 Key Derivation Function (HKDF)", RFC 5869, 5666 DOI 10.17487/RFC5869, May 2010, 5667 . 5669 [RFC6090] McGrew, D., Igoe, K., and M. Salter, "Fundamental Elliptic 5670 Curve Cryptography Algorithms", RFC 6090, 5671 DOI 10.17487/RFC6090, February 2011, 5672 . 5674 [RFC6637] Jivsov, A., "Elliptic Curve Cryptography (ECC) in 5675 OpenPGP", RFC 6637, DOI 10.17487/RFC6637, June 2012, 5676 . 5678 [SRTP-AES-GCM] 5679 McGrew, D., "AES-GCM and AES-CCM Authenticated Encryption 5680 in Secure RTP (SRTP)", Work in Progress, January 2011. 5682 [SIP-IDENTITY] 5683 Wing, D. and H. Kaplan, "SIP Identity using Media Path", 5684 Work in Progress, February 2008. 5686 [NIST-SP800-57-Part1] 5687 Barker, E., Barker, W., Burr, W., Polk, W., and M. Smid, 5688 "Recommendation for Key Management - Part 1: General 5689 (Revised)", NIST Special Publication 800-57 - Part 5690 1 Revised March 2007. 5692 [NIST-SP800-131A] 5693 Barker, E. and A. Roginsky, "Recommendation for the 5694 Transitioning of Cryptographic Algorithms and Key 5695 Lengths", NIST Special Publication 800-131A January 2011. 
5697 [SHA-3] "Cryptographic Hash Algorithm Competition", NIST Computer 5698 Security Resource Center Cryptographic Hash Project. 5700 [Skein1] "The Skein Hash Function Family - Web site", 5701 . 5703 [XEP-0262] 5704 Saint-Andre, P., "Use of ZRTP in Jingle RTP Sessions", XSF 5705 XEP 0262, August 2010. 5707 [Ferguson] 5708 Ferguson, N. and B. Schneier, "Practical Cryptography", 5709 Wiley Publishing, 2003. 5711 [Juola1] Juola, P. and P. Zimmermann, "Whole-Word Phonetic 5712 Distances and the PGPfone Alphabet", Proceedings of the 5713 International Conference of Spoken Language Processing 5714 (ICSLP-96), 1996. 5716 [Juola2] Juola, P., "Isolated Word Confusion Metrics and the 5717 PGPfone Alphabet", Proceedings of New Methods in Language 5718 Processing, 1996. 5720 [PGPfone] Zimmermann, P., "PGPfone", July 1996, 5721 . 5723 [Zfone] Zimmermann, P., "Zfone Project", 2006, 5724 . 5726 [Byzantine] 5727 "The Two Generals' Problem", March 2011, 5728 . 5731 [TESLA] Perrig, A., Canetti, R., Tygar, J., and D. Song, "The 5732 TESLA Broadcast Authentication Protocol", October 2002, 5733 . 5736 [comsec] Blossom, E., "The VP1 Protocol for Voice Privacy Devices 5737 Version 1.2", . 5739 [Wright1] Wright, C., Ballard, L., Coull, S., Monrose, F., and G. 5740 Masson, "Spot me if you can: Uncovering spoken phrases in 5741 encrypted VoIP conversations", Proceedings of the 2008 5742 IEEE Symposium on Security and Privacy 2008, 5743 . 5745 [Sunshine] 5746 Sunshine, J., Egelman, S., Almuhimedi, H., Atri, N., and 5747 L. Cranor, "Crying Wolf: An Empirical Study of SSL Warning 5748 Effectiveness", USENIX Security Symposium 2009, 5749 . 5751 [dsa-1571] 5752 "Debian Security Advisory - OpenSSL predictable random 5753 number generator", May 2008, 5754 . 5756 [I-D.johnston-rtcweb-zrtp] 5757 Johnston, A., Zimmermann, P., Callas, J., Cross, T., and 5758 J. Yoakum, "Using ZRTP to Secure WebRTC", draft-johnston- 5759 rtcweb-zrtp-02 (work in progress), July 2015. 
5761 Authors' Addresses 5763 Philip Zimmermann 5764 Silent Circle 5765 Geneva, Switzerland 5767 EMail: prz@mit.edu 5768 URI: http://philzimmermann.com 5769 Alan Johnston (editor) 5770 Unaffiliated 5771 Bellevue, WA 5773 EMail: alan.b.johnston@gmail.com 5775 Travis Cross 5776 OfficeTone 5778 EMail: tc@traviscross.com