Network Working Group                                      P. Zimmermann
Internet-Draft                                             Zfone Project
Intended status: Informational                          A. Johnston, Ed.
Expires: December 19, 2010                                         Avaya
                                                               J. Callas
                                                             Apple, Inc.
                                                           June 17, 2010

         ZRTP: Media Path Key Agreement for Unicast Secure RTP
                      draft-zimmermann-avt-zrtp-22

Abstract

   This document defines ZRTP, a protocol for media path Diffie-Hellman
   exchange to agree on a session key and parameters for establishing
   unicast Secure Real-time Transport Protocol (SRTP) sessions for VoIP
   applications. The ZRTP protocol is media path keying because it is
   multiplexed on the same port as RTP and does not require support in
   the signaling protocol. ZRTP does not assume a Public Key
   Infrastructure (PKI) or require the complexity of certificates in end
   devices. For the media session, ZRTP provides confidentiality,
   protection against man-in-the-middle (MiTM) attacks, and, in cases
   where the signaling protocol provides end-to-end integrity
   protection, authentication.
   ZRTP can utilize a Session Description
   Protocol (SDP) attribute to provide discovery and authentication
   through the signaling channel. To provide best effort SRTP, ZRTP
   utilizes normal RTP/AVP profiles. ZRTP secures media sessions which
   include a voice media stream, and can also secure media sessions
   which do not include voice by using an optional digital signature.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 19, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Overview
     3.1.  Key Agreement Modes
       3.1.1.  Diffie-Hellman Mode Overview
       3.1.2.  Preshared Mode Overview
       3.1.3.  Multistream Mode Overview
   4.  Protocol Description
     4.1.  Discovery
       4.1.1.  Protocol Version Negotiation
       4.1.2.  Algorithm Negotiation
     4.2.  Commit Contention
     4.3.  Matching Shared Secret Determination
       4.3.1.  Calculation and comparison of hashes of shared secrets
       4.3.2.  Handling a Shared Secret Cache Mismatch
     4.4.  DH and non-DH key agreements
       4.4.1.  Diffie-Hellman Mode
         4.4.1.1.  Hash Commitment in Diffie-Hellman Mode
         4.4.1.2.  Responder Behavior in Diffie-Hellman Mode
         4.4.1.3.  Initiator Behavior in Diffie-Hellman Mode
         4.4.1.4.  Shared Secret Calculation for DH Mode
       4.4.2.  Preshared Mode
         4.4.2.1.  Commitment in Preshared Mode
         4.4.2.2.  Initiator Behavior in Preshared Mode
         4.4.2.3.  Responder Behavior in Preshared Mode
         4.4.2.4.  Shared Secret Calculation for Preshared Mode
       4.4.3.  Multistream Mode
         4.4.3.1.  Commitment in Multistream Mode
         4.4.3.2.  Shared Secret Calculation for Multistream Mode
     4.5.  Key Derivations
       4.5.1.  The ZRTP Key Derivation Function
       4.5.2.  Deriving ZRTPSess Key and SAS in DH or Preshared modes
       4.5.3.  Deriving the rest of the keys from s0
     4.6.  Confirmation
       4.6.1.  Updating the Cache of Shared Secrets
         4.6.1.1.  Cache Update Following a Cache Mismatch
     4.7.  Termination
       4.7.1.  Termination via Error message
       4.7.2.  Termination via GoClear message
         4.7.2.1.  Key Destruction for GoClear message
       4.7.3.  Key Destruction at Termination
     4.8.  Random Number Generation
     4.9.  ZID and Cache Operation
       4.9.1.  Cacheless implementations
   5.  ZRTP Messages
     5.1.  ZRTP Message Formats
       5.1.1.  Message Type Block
       5.1.2.  Hash Type Block
         5.1.2.1.  Negotiated Hash and MAC algorithm
         5.1.2.2.  Implicit Hash and MAC algorithm
       5.1.3.  Cipher Type Block
       5.1.4.  Auth Tag Type Block
       5.1.5.  Key Agreement Type Block
       5.1.6.  SAS Type Block
       5.1.7.  Signature Type Block
     5.2.  Hello message
     5.3.  HelloACK message
     5.4.  Commit message
     5.5.  DHPart1 message
     5.6.  DHPart2 message
     5.7.  Confirm1 and Confirm2 messages
     5.8.  Conf2ACK message
     5.9.  Error message
     5.10. ErrorACK message
     5.11. GoClear message
     5.12. ClearACK message
     5.13. SASrelay message
     5.14. RelayACK message
     5.15. Ping message
     5.16. PingACK message
   6.  Retransmissions
   7.  Short Authentication String
     7.1.  SAS Verified Flag
     7.2.  Signing the SAS
       7.2.1.  OpenPGP Signatures
       7.2.2.  NSA Suite B Signatures with X.509v3 Certs
       7.2.3.  Signing the SAS without a PKI
     7.3.  Relaying the SAS through a PBX
       7.3.1.  PBX Enrollment and the PBX Enrollment Flag
   8.  Signaling Interactions
     8.1.  Binding the media stream to the signaling layer via the
           Hello Hash
       8.1.1.  Integrity-protected signaling enables
               integrity-protected DH exchange
     8.2.  Deriving the SRTP secret (srtps) from the signaling layer
     8.3.  Codec Selection for Secure Media
   9.  False ZRTP Packet Rejection
   10. Intermediary ZRTP Devices
   11. The ZRTP Disclosure flag
     11.1. Guidelines on Proper Implementation of the Disclosure Flag
   12. Mapping between ZID and AOR (SIP URI)
   13. IANA Considerations
   14. Media Security Requirements
   15. Security Considerations
     15.1. Self-healing Key Continuity Feature
   16. Acknowledgments
   17. References
     17.1. Normative References
     17.2. Informative References
   Authors' Addresses

1.  Introduction

   ZRTP is a key agreement protocol which performs Diffie-Hellman key
   exchange during call setup in the media path, and is transported over
   the same port as the Real-time Transport Protocol (RTP) [RFC3550]
   media stream which has been established using a signaling protocol
   such as Session Initiation Protocol (SIP) [RFC3261]. This generates
   a shared secret which is then used to generate keys and salt for a
   Secure RTP (SRTP) [RFC3711] session. ZRTP borrows ideas from PGPfone
   [pgpfone]. A reference implementation of ZRTP is available as Zfone
   [zfone].

   The ZRTP protocol has some nice cryptographic features lacking in
   many other approaches to media session encryption. Although it uses
   a public key algorithm, it does not rely on a public key
   infrastructure (PKI). In fact, it does not use persistent public
   keys at all. It uses ephemeral Diffie-Hellman (DH) with hash
   commitment, and allows the detection of man-in-the-middle (MiTM)
   attacks by displaying a short authentication string (SAS) for the
   users to read and verbally compare over the phone. It has Perfect
   Forward Secrecy, meaning the keys are destroyed at the end of the
   call, which precludes retroactively compromising the call by future
   disclosures of key material. But even if the users are too lazy to
   bother with short authentication strings, we still get reasonable
   authentication against a MiTM attack, based on a form of key
   continuity. It does this by caching some key material to use in the
   next call, to be mixed in with the next call's DH shared secret,
   giving it key continuity properties analogous to SSH. All this is
   done without reliance on a PKI, key certification, trust models,
   certificate authorities, or key management complexity that bedevils
   the email encryption world. It also does not rely on SIP signaling
   for the key management, and in fact does not rely on any servers at
   all.
   It performs its key agreements and key management in a purely
   peer-to-peer manner over the RTP packet stream.

   ZRTP can be used and discovered without being declared or indicated
   in the signaling path. This provides a best effort SRTP capability.
   Also, this reduces the complexity of implementations and minimizes
   interdependency between the signaling and media layers. However,
   when ZRTP is indicated in the signaling via the zrtp-hash SDP
   attribute, ZRTP has additional useful properties. By sending a hash
   of the ZRTP Hello message in the signaling, ZRTP provides a useful
   binding between the signaling and media paths, which is explained in
   Section 8.1. When this is done through a signaling path that has
   end-to-end integrity protection, the DH exchange is automatically
   protected from a MiTM attack, which is explained in Section 8.1.1.

   ZRTP is designed for unicast media sessions in which there is a voice
   media stream. For multiparty secure conferencing, separate ZRTP
   sessions may be negotiated between each party and the conference
   bridge. For sessions lacking a voice media stream, MiTM protection
   may be provided by the mechanisms in Section 8.1.1 or Section 7.2.
   In terms of the RTP topologies defined in [RFC5117], ZRTP is designed
   for Point to Point topologies only.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

   In this document, a "call" is synonymous with a "session".

3.  Overview

   This section provides a description of how ZRTP works. This
   description is non-normative in nature but is included to build
   understanding of the protocol.

   ZRTP is negotiated the same way a conventional RTP session is
   negotiated in an offer/answer exchange using the standard RTP/AVP
   profile.
   The ZRTP protocol begins after two endpoints have utilized
   a signaling protocol such as SIP and are ready to exchange media. If
   ICE [RFC5245] is being used, ZRTP begins after ICE has completed its
   connectivity checks.

   ZRTP is multiplexed on the same ports as RTP. It uses a unique
   header that makes it clearly differentiable from RTP or STUN.

   ZRTP support can be discovered in the signaling path by the presence
   of a ZRTP SDP attribute. However, even in cases where this is not
   received in the signaling, an endpoint can still send ZRTP Hello
   messages to see if a response is received. If a response is not
   received, no more ZRTP messages will be sent during this session.
   This is safe because ZRTP has been designed to be clearly different
   from RTP and have a similar structure to STUN packets received
   (sometimes by non-supporting endpoints) during an ICE exchange.

   Both ZRTP endpoints begin the ZRTP exchange by sending a ZRTP Hello
   message to the other endpoint. The purpose of the Hello message is
   to confirm the endpoint supports the protocol and to see what
   algorithms the two ZRTP endpoints have in common.

   The Hello message contains the SRTP configuration options, and the
   ZID. Each instance of ZRTP has a unique 96-bit random ZRTP ID or ZID
   that is generated once at installation time. ZIDs are discovered
   during the Hello message exchange. The received ZID is used to look
   up retained shared secrets from previous ZRTP sessions with the
   endpoint.

   A response to a ZRTP Hello message is a ZRTP HelloACK message. The
   HelloACK message simply acknowledges receipt of the Hello. Since RTP
   commonly uses best effort UDP transport, ZRTP has retransmission
   timers in case of lost datagrams. There are two timers, both with
   exponential backoff mechanisms.
   One timer is used for
   retransmissions of Hello messages and the other is used for
   retransmissions of all other messages after receipt of a HelloACK.

   If an integrity protected signaling channel is available, a hash of
   the Hello message can be sent. This allows rejection of false
   injected ZRTP Hello messages by an attacker.

   Hello and other ZRTP messages also contain a hash image that is used
   to link the messages together. This allows rejection of false
   injected ZRTP messages during an exchange.

3.1.  Key Agreement Modes

   After both endpoints exchange Hello and HelloACK messages, the key
   agreement exchange can begin with the ZRTP Commit message. ZRTP
   supports a number of key agreement modes including both Diffie-
   Hellman and non-Diffie-Hellman modes as described in the following
   sections.

   The Commit message may be sent immediately after both endpoints have
   completed the Hello/HelloACK discovery handshake. Or it may be
   deferred until later in the call, after the participants engage in
   some unencrypted conversation. The Commit message may be manually
   activated by a user interface element, such as a GO SECURE button,
   which becomes enabled after the Hello/HelloACK discovery phase. This
   emulates the user experience of a number of secure phones in the PSTN
   world [comsec]. However, it is expected that most simple ZRTP user
   agents will omit such buttons and proceed directly to secure mode by
   sending a Commit message immediately after the Hello/HelloACK
   handshake.

3.1.1.  Diffie-Hellman Mode Overview

   An example ZRTP call flow is shown in Figure 1 below. Note that the
   order of the Hello/HelloACK exchanges in F1/F2 and F3/F4 may be
   reversed. That is, either Alice or Bob might send the first Hello
   message. Note that the endpoint which sends the Commit message is
   considered the initiator of the ZRTP session and drives the key
   agreement exchange.
   The Diffie-Hellman public values are exchanged
   in the DHPart1 and DHPart2 messages. SRTP keys and salts are then
   calculated.

   The initiator needs to generate its ephemeral key pair before sending
   the Commit, and the responder generates its key pair before sending
   DHPart1.

   Alice                                                Bob
   |                                                   |
   |     Alice and Bob establish a media session.      |
   |         They initiate ZRTP on media ports         |
   |                                                   |
   | F1 Hello (version, options, Alice's ZID)          |
   |-------------------------------------------------->|
   |                                       HelloACK F2 |
   |<--------------------------------------------------|
   |            Hello (version, options, Bob's ZID) F3 |
   |<--------------------------------------------------|
   | F4 HelloACK                                       |
   |-------------------------------------------------->|
   |                                                   |
   |             Bob acts as the initiator             |
   |                                                   |
   |        Commit (Bob's ZID, options, hash value) F5 |
   |<--------------------------------------------------|
   | F6 DHPart1 (pvr, shared secret hashes)            |
   |-------------------------------------------------->|
   |            DHPart2 (pvi, shared secret hashes) F7 |
   |<--------------------------------------------------|
   |                                                   |
   |     Alice and Bob generate SRTP session key.      |
   |                                                   |
   | F8 Confirm1 (MAC, D,A,V,E flags, sig)             |
   |-------------------------------------------------->|
   |             Confirm2 (MAC, D,A,V,E flags, sig) F9 |
   |<--------------------------------------------------|
   | F10 Conf2ACK                                      |
   |-------------------------------------------------->|
   |                    SRTP begins                    |
   |<=================================================>|
   |                                                   |

         Figure 1: Establishment of an SRTP session using ZRTP

   ZRTP authentication uses a Short Authentication String (SAS) which is
   ideally displayed for the human user. Alternatively, the SAS can be
   authenticated by exchanging an OPTIONAL digital signature (sig) over
   the short authentication string in the Confirm1 or Confirm2 messages
   (described in Section 7.2).
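
   The message order of Figure 1 can be written down as data, which may
   help when checking an implementation's state machine against the
   figure. The following is only a sketch: the tuple layout and the
   helper names initiator() and senders_of() are assumptions made for
   illustration and are not defined by ZRTP.

```python
# Illustrative model of the Figure 1 exchange. The (label, sender, message)
# layout is an assumption of this sketch, not a ZRTP data structure.
FLOW = [
    ("F1", "Alice", "Hello"),
    ("F2", "Bob", "HelloACK"),
    ("F3", "Bob", "Hello"),
    ("F4", "Alice", "HelloACK"),
    ("F5", "Bob", "Commit"),
    ("F6", "Alice", "DHPart1"),
    ("F7", "Bob", "DHPart2"),
    ("F8", "Alice", "Confirm1"),
    ("F9", "Bob", "Confirm2"),
    ("F10", "Alice", "Conf2ACK"),
]


def initiator(flow):
    """Return the endpoint that sent Commit; it is considered the initiator."""
    for _label, sender, message in flow:
        if message == "Commit":
            return sender
    return None


def senders_of(flow, message):
    """Return every endpoint that sent the given message type, in order."""
    return [sender for _label, sender, msg in flow if msg == message]
```

   In this flow the responder (Alice) sends DHPart1 and Confirm1, and
   the initiator (Bob) sends Commit, DHPart2, and Confirm2, matching the
   figure.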

   The ZRTP Confirm1 and Confirm2 messages are sent for a number of
   reasons, not the least of which is they confirm that all the key
   agreement calculations were successful and thus the encryption will
   work. They also carry other information such as the Disclosure flag
   (D), the Allow Clear flag (A), the SAS Verified flag (V), and the PBX
   Enrollment flag (E). All flags are encrypted to shield them from a
   passive observer.

3.1.2.  Preshared Mode Overview

   In the Preshared Mode, endpoints can skip the DH calculation if they
   have a shared secret from a previous ZRTP session. Preshared mode is
   indicated in the Commit message and results in the same call flow as
   Multistream mode. The principal difference between Multistream mode
   and Preshared mode is that Preshared mode uses a previously cached
   shared secret, rs1, instead of an active ZRTP Session key as the
   initial keying material.

   This mode could be useful for slow processor endpoints so that a DH
   calculation does not need to be performed every session. Or, this
   mode could be used to rapidly re-establish an earlier session that
   was recently torn down or interrupted without the need to perform
   another DH calculation.

   Preshared mode has forward secrecy properties. If a phone's cache is
   captured by an opponent, the cached shared secrets cannot be used to
   recover earlier encrypted calls, because the shared secrets are
   replaced with new ones in each new call, as in DH mode. However, the
   captured secrets can be used by a passive wiretapper in the media
   path to decrypt the next call, if the next call is in Preshared mode.
   This differs from DH mode, which requires an active MiTM wiretapper
   to exploit captured secrets in the next call. However, if the next
   call is missed by the wiretapper, he cannot wiretap any further
   calls.
   It thus preserves most of the self-healing properties
   (Section 15.1) of key continuity enjoyed by DH mode.

3.1.3.  Multistream Mode Overview

   Multistream mode is an alternative key agreement method when two
   endpoints have an established SRTP media stream between them with an
   active ZRTP Session key. ZRTP can derive multiple SRTP keys from a
   single DH exchange. For example, an established secure voice call
   that adds a video stream uses Multistream mode to quickly initiate
   the video stream without a second DH exchange.

   When Multistream mode is indicated in the Commit message, a call flow
   similar to Figure 1 is used, but no DH calculation is performed by
   either endpoint and the DHPart1 and DHPart2 messages are omitted.
   The Confirm1, Confirm2, and Conf2ACK messages are still sent. Since
   the cache is not affected during this mode, multiple Multistream ZRTP
   exchanges can be performed in parallel between two endpoints.

   When adding additional media streams to an existing call, only
   Multistream mode is used. Only one DH operation is performed, just
   for the first media stream.

4.  Protocol Description

   This section begins the normative description of the protocol.

   ZRTP MUST be multiplexed on the same ports as the RTP media packets.

   To support best effort encryption from the Media Security
   Requirements [RFC5479], ZRTP uses normal RTP/AVP profile (AVP) media
   lines in the initial offer/answer exchange. The ZRTP SDP attribute
   a=zrtp-hash defined in Section 8 SHOULD be used in all offers and
   answers to indicate support for the ZRTP protocol.

      ZRTP can be utilized by endpoints that do not have a common
      signaling protocol but both support SRTP and are relying on a
      gateway for conversion. As such, it is not always possible for
      the signaling protocol to relay the zrtp-hash as can be done using
      SIP.

   The Secure RTP/AVP (SAVP) profile MAY be used in subsequent offer/
   answer exchanges after a successful ZRTP exchange has resulted in an
   SRTP session, or if it is known the other endpoint supports this
   profile. Other profiles MAY also be used.

      The use of the RTP/SAVP profile has caused failures in negotiating
      best effort SRTP due to the limitations on negotiating profiles
      using SDP. This is why ZRTP supports the RTP/AVP profile and
      includes its own discovery mechanisms.

   In all key agreement modes, the initiator SHOULD NOT send RTP media
   after sending the Commit message, and MUST NOT send SRTP media before
   receiving either the Conf2ACK or the first SRTP media (with a valid
   SRTP auth tag) from the responder. The responder SHOULD NOT send RTP
   media after receiving the Commit message, and MUST NOT send SRTP
   media before receiving the Confirm2 message.

4.1.  Discovery

   During the ZRTP discovery phase, a ZRTP endpoint discovers if the
   other endpoint supports ZRTP and the supported algorithms and
   options. This information is transported in a Hello message,
   described in Section 5.2.

   ZRTP endpoints SHOULD include the SDP attribute a=zrtp-hash in offers
   and answers, as defined in Section 8.

   The Hello message includes the ZRTP version, hash type, cipher type,
   authentication method and tag length, key agreement type, and Short
   Authentication String (SAS) algorithms that are supported. The Hello
   message also includes a hash image as described in Section 9. In
   addition, each endpoint sends and discovers ZIDs. The received ZID
   is used later in the protocol as an index into a cache of shared
   secrets that were previously negotiated and retained between the two
   parties.
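
   As a rough illustration of the ZID's role as a cache index, the
   sketch below generates a 96-bit random ZID and keys an in-memory
   table of retained secrets by the peer's ZID. It is a minimal sketch
   under stated assumptions: the class SharedSecretCache, its method
   names, and the in-memory dictionary are hypothetical, and the actual
   cache contents and update rules are those given in Sections 4.6.1
   and 4.9.

```python
import secrets

ZID_LENGTH = 12  # 96 bits, the ZID size stated in the Overview


def new_zid() -> bytes:
    """Generate a random 96-bit ZID; meant to be called once at install time."""
    return secrets.token_bytes(ZID_LENGTH)


class SharedSecretCache:
    """Hypothetical in-memory cache of retained secrets, indexed by peer ZID."""

    def __init__(self) -> None:
        self._entries = {}

    def store(self, peer_zid: bytes, secret: bytes) -> None:
        """Retain a secret negotiated with the peer identified by peer_zid."""
        if len(peer_zid) != ZID_LENGTH:
            raise ValueError("a ZID is exactly 96 bits (12 octets)")
        self._entries[peer_zid] = secret

    def lookup(self, peer_zid: bytes):
        """Return the retained secret for this peer, or None if there is none."""
        return self._entries.get(peer_zid)
```

   An endpoint would call lookup() with the ZID received in the peer's
   Hello message; a result of None corresponds to having no retained
   secret for that peer.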

   A Hello message can be sent at any time, but is usually sent at the
   start of an RTP session to determine if the other endpoint supports
   ZRTP, and also if the SRTP implementations are compatible. A Hello
   message is retransmitted using timer T1 and an exponential backoff
   mechanism detailed in Section 6 until the receipt of a HelloACK
   message or a Commit message.

   The use of the a=zrtp-hash SDP attribute to authenticate the Hello
   message is described in Section 8.1.

   If a Hello message or any other ZRTP message indicates that there is
   an SSRC collision, an Error message (Section 5.9) MUST be sent with
   the Error Code indicating SSRC collision, and the ZRTP negotiation
   MUST be terminated. The procedures of RFC 3550 Section 8.2 [RFC3550]
   SHOULD be followed by both endpoints to resolve this condition, and
   if it is resolved, a new ZRTP secure session SHOULD be negotiated.

4.1.1.  Protocol Version Negotiation

   This specification defines ZRTP version 1.10. Since new versions of
   ZRTP may be developed in the future, this specification defines a
   protocol version negotiation in this section.

   Each party declares what version of the ZRTP protocol they support
   via the version field in the Hello message (Section 5.2). If both
   parties have the same version number in their Hello messages, they
   can proceed with the rest of the protocol. To facilitate both
   parties reaching this state of protocol version agreement in their
   Hello messages, ZRTP should use information provided in the signaling
   layer, if available. If a ZRTP endpoint supports more than one
   version of the protocol, it SHOULD declare them all in a list of SIP
   SDP a=zrtp-hash attributes (defined in Section 8), listing separate
   hashes, with separate ZRTP version numbers in each item in the list.
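
   When both sides advertise version lists in the signaling, picking
   the version for the Hello messages reduces to finding the highest
   version common to both lists, using this section's rule that only
   the first three octets of the version field are significant. The
   sketch below illustrates that selection. The function names are
   hypothetical, and the ordering of versions assumes the
   "digit.digit" shape of the first three octets seen in the examples
   of this section.

```python
def version_key(version: str) -> str:
    """Keep the first three octets; the final octet is not significant."""
    return version[:3]


def versions_match(a: str, b: str) -> bool:
    """Two version fields match when their first three octets are equal."""
    return version_key(a) == version_key(b)


def _rank(version: str):
    # Assumption of this sketch: the first three octets look like "1.1",
    # so versions can be ordered as (major digit, minor digit).
    key = version_key(version)
    return (int(key[0]), int(key[2]))


def highest_common_version(mine, theirs):
    """Return my highest version that the peer also lists, else None."""
    their_keys = {version_key(v) for v in theirs}
    common = [v for v in mine if version_key(v) in their_keys]
    return max(common, key=_rank) if common else None
```

   With Alice listing 1.10 and 2.00 and Bob listing 1.10 and 1.20, both
   sides arrive at 1.10, and the call returns None when the lists share
   no version.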

   Both parties should inspect the list of ZRTP version numbers supplied
   by the other party in the SIP SDP a=zrtp-hash attributes. Both
   parties SHOULD choose the highest version number that appears in both
   parties' lists of a=zrtp-hash version numbers, and use that version
   for their Hello messages. If both parties use the SIP signaling in
   this manner, their initial Hello messages will have the same ZRTP
   version number, provided they both have at least one supported
   protocol version in common. Before the ZRTP key agreement can
   proceed, an endpoint MUST have sent and received Hellos with the same
   protocol version.

   It is best if the signaling layer is used to negotiate the protocol
   version number. However, the a=zrtp-hash SDP attribute is not always
   present in the SIP packet, as explained in Section 8.1. In the
   absence of any guidance from the signaling layer, an endpoint MUST
   send the highest supported version in initial Hello messages. If the
   two parties send different protocol version numbers in their Hello
   messages, they can reach agreement to use a common version, if one
   exists. They iteratively apply the following rules until they both
   have matching version fields in their Hello messages and the key
   agreement can proceed:

   o  If an endpoint receives a Hello message with an unsupported
      version number that is higher than the endpoint's current Hello
      message version, the received Hello message MUST be ignored. The
      endpoint continues to retransmit Hello messages on the standard
      retry schedule (Section 6).
556 o If an endpoint receives a Hello message with a version number that 557 is lower than the endpoint's current Hello message, and the 558 endpoint supports a version that is less than or equal to the 559 received version number, the endpoint MUST stop retransmitting the 560 old version number and MUST start sending a Hello message with the 561 highest supported version number that is less than or equal to the 562 received version number. 563 o If an endpoint receives a Hello message with an unsupported 564 version number that is lower than the endpoint's current Hello 565 message, the endpoint MUST send an Error message (Section 5.9) 566 indicating failure to support this ZRTP version. 568 The above comparisons are iterated until the version numbers match, 569 or until it exits on a failure to match. 571 For example, assume that Alice supports protocol version 1.10 and 572 2.00, and Bob supports version 1.10 and 1.20. Alice initially 573 sends a Hello with version 2.00, and Bob initially sends a Hello 574 with version 1.20. Bob ignores Alice's 2.00 Hello and continues 575 to send his 1.20 Hello. Alice detects that Bob does not support 576 2.00 and she stops sending her 2.00 Hellos and starts sending a 577 stream of 1.10 Hellos. Bob sees the 1.10 Hello from Alice and 578 stops sending his 1.20 Hellos and switches to sending 1.10 Hellos. 579 At that point, they have converged on using version 1.10 and the 580 protocol proceeds on that basis. 582 When comparing protocol versions, a ZRTP endpoint MUST include only 583 the first three octets of the version field in the comparison. The 584 final octet is ignored, because it is not significant for 585 interoperability. For example, "1.1 ", "1.10", "1.11", or "1.1a" are 586 all regarded as a version match, because they would all be 587 interoperable versions. 589 Changes in protocol version numbers are expected to be infrequent 590 after version 1.10. 
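The three rules and the three-octet comparison above can be sketched as follows. This is a minimal illustration, not part of the specification: the names vkey and on_hello are hypothetical, version prefixes are assumed to be comparable as ASCII strings, and a received version that is higher than the current one but locally supported is assumed to be ignored (the rules above do not cover that case explicitly).

```python
def vkey(version: str) -> str:
    """Comparison key: only the first three octets of the version field count."""
    return version[:3]


def on_hello(supported, current, received):
    """Return (action, version_to_send) after receiving a Hello.

    supported: list of 4-character version strings this endpoint implements.
    current:   version carried in the Hello this endpoint is sending now.
    received:  version carried in the Hello just received.
    """
    if vkey(received) == vkey(current):
        return ("match", current)            # key agreement can proceed
    if vkey(received) > vkey(current):
        return ("ignore", current)           # keep retransmitting our Hello
    # Received version is lower: fall back to our best version <= received.
    usable = [v for v in supported if vkey(v) <= vkey(received)]
    if usable:
        return ("switch", max(usable, key=vkey))
    return ("error", current)                # send Error: unsupported version


# Replay of the Alice/Bob example from the text.
alice, bob = ["1.10", "2.00"], ["1.10", "1.20"]
a_cur, b_cur = "2.00", "1.20"
_, b_cur = on_hello(bob, b_cur, a_cur)       # Bob ignores Alice's 2.00
_, a_cur = on_hello(alice, a_cur, b_cur)     # Alice drops to 1.10
_, b_cur = on_hello(bob, b_cur, a_cur)       # Bob drops to 1.10
print(a_cur, b_cur)
```

Keeping the comparison key in one helper means the rule that the final octet is ignored is applied uniformly in every branch.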
Supporting multiple versions adds code 591 complexity and may introduce security weaknesses in the 592 implementation. The old adage about keeping it simple applies 593 especially to implementing security protocols. Endpoints SHOULD NOT 594 support protocol versions earlier than version 1.10. 596 4.1.2. Algorithm Negotiation 598 A method is provided to allow the two parties to mutually and 599 deterministically choose the same DH key size and algorithm before a 600 Commit message is sent. 602 Each Hello message lists the algorithms in the order of preference 603 for that ZRTP endpoint. Endpoints eliminate the non-intersecting 604 choices from each of their own lists, resulting in each endpoint 605 having a list of algorithms in common that might or might not be 606 ordered the same as the other endpoint's list. Each endpoint 607 compares the first item on their own list with the first item on the 608 other endpoint's list, and SHOULD choose the faster of the two 609 algorithms. For example: 611 o Alice's full list: DH2K, DH3K, EC25 612 o Bob's full list: EC38, EC25, DH3K 613 o Alice's intersecting list: DH3K, EC25 614 o Bob's intersecting list: EC25, DH3K 615 o Alice's first preference is DH3K, and Bob's first preference is 616 EC25. 618 o Thus, both parties choose EC25 (ECDH-256), because it's faster. 620 To decide which DH algorithm is faster, the following ranking is 621 defined: DH-2048, ECDH-256, DH-3072, ECDH-384, ECDH-521. These are 622 all defined in Section 5.1.5. 624 If both endpoints follow this method, they may each start their DH 625 calculations as soon as they receive the Hello message, and there 626 will be no need for either endpoint to discard their DH calculation 627 if the other endpoint becomes the initiator. 629 This method is used only to negotiate DH key size. For the rest of 630 the algorithm choices, it's simply whatever the initiator selects 631 from the algorithms in common. 
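The DH selection method above can be sketched as follows. This is an illustrative sketch only: the function name choose_key_agreement is hypothetical, and the mapping of the four-character codes onto the speed ranking (in particular "EC52" for ECDH-521) is an assumption based on the codes used in the example.

```python
# Fastest first, following the ranking DH-2048, ECDH-256, DH-3072,
# ECDH-384, ECDH-521. The code "EC52" is an assumed name for ECDH-521.
SPEED_RANK = ["DH2K", "EC25", "DH3K", "EC38", "EC52"]


def choose_key_agreement(mine, theirs):
    """Pick the DH algorithm both endpoints will arrive at independently.

    mine / theirs: algorithm codes in each endpoint's order of preference.
    Returns None when the two lists have nothing in common.
    """
    my_common = [a for a in mine if a in theirs]      # my order preserved
    their_common = [a for a in theirs if a in mine]   # their order preserved
    if not my_common:
        return None
    # Compare the two first preferences and take the faster one.
    return min(my_common[0], their_common[0], key=SPEED_RANK.index)


alice = ["DH2K", "DH3K", "EC25"]
bob = ["EC38", "EC25", "DH3K"]
print(choose_key_agreement(alice, bob))   # Alice's view
print(choose_key_agreement(bob, alice))   # Bob's view
```

Because the function is symmetric in its two arguments, both endpoints reach the same answer without exchanging any further messages.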
Note that the DH key size influences 632 the hash type and the size of the symmetric cipher key, as explained 633 in Section 5.1.5. 635 Unfavorable choices will never be made by this method, because each 636 endpoint will omit from their respective lists choices that are too 637 slow or not secure enough to meet their security policy. 639 4.2. Commit Contention 641 After both parties have received compatible Hello messages, a Commit 642 message (Section 5.4) can be sent to begin the ZRTP key exchange. 643 The endpoint that sends the Commit is known as the initiator, while 644 the receiver of the Commit is known as the responder. 646 If both sides send Commit messages initiating a secure session at the 647 same time the following rules are used to break the tie: 649 o If one Commit is for a DH mode while the other is for Preshared 650 mode, then the Preshared Commit MUST be discarded and the DH 651 Commit proceeds. 652 o If the two Commits are both Preshared mode, and one party has set 653 the MiTM (M) flag in the Hello message and the other has not, the 654 Commit message from the party who set the (M) flag MUST be 655 discarded, and the one who has not set the (M) flag becomes the 656 initiator, regardless of the nonce values. In other words, for 657 Preshared mode, the phone is the initiator and the PBX is the 658 responder. 659 o If the two Commits are either both DH modes or both non-DH modes, 660 then the Commit message with the lowest hvi (hash value of 661 initiator) value (for DH Commits), or lowest nonce value (for 662 non-DH Commits), MUST be discarded and the other side is the 663 initiator, and the protocol proceeds with the initiator's Commit. 664 The two hvi or nonce values are compared as large unsigned 665 integers in network byte order. 667 If one Commit is for Multistream mode while the other is for non- 668 Multistream (DH or Preshared) mode, a software error has occurred and 669 the ZRTP negotiation should be terminated. 
This should never occur 670 because of the constraints on Multistream mode described in 671 Section 4.4.3. 673 In the event that Commit messages are sent by both ZRTP endpoints at 674 the same time, but are received in different media streams, the same 675 resolution rules apply as if they were received on the same stream. 676 The media stream in which the Commit was received or sent will 677 proceed through the ZRTP exchange while the media stream with the 678 discarded Commit must wait for the completion of the other ZRTP 679 exchange. 681 If a commit contention forces a DH Commit message to be discarded, 682 the responder's DH public value should only be discarded if it does 683 not match the initiator's DH key size. This will not happen if both 684 endpoints choose a common key size via the method described in 685 Section 4.1.2. 687 4.3. Matching Shared Secret Determination 689 The following sections describe how ZRTP endpoints generate and/or 690 use the set of shared secrets s1, auxsecret, and pbxsecret through 691 the exchange of the DHPart1 and DHPart2 messages. This doesn't cover 692 the Diffie-Hellman calculations. It only covers the method whereby 693 the two parties determine if they already have shared secrets in 694 common in their caches. 696 Each ZRTP endpoint maintains a long-term cache of shared secrets that 697 it has previously negotiated with the other party. The ZID of the 698 other party, received in the other party's Hello message, is used as 699 an index into this cache to find the set of shared secrets, if any 700 exist. This cache entry may contain previously retained shared 701 secrets, rs1 and rs2, which give ZRTP its key continuity features. 702 If the other party is a PBX, the cache may also contain a trusted 703 MiTM PBX shared secret, called pbxsecret, defined in Section 7.3.1. 
705 The DHPart1 and DHPart2 messages contain a list of hashes of these 706 shared secrets to allow the two endpoints to compare the hashes with 707 what they have in their caches to detect whether the two sides share 708 any secrets that can be used in the calculation of the session key. 709 The use of this shared secret cache is described in Section 4.9. 711 If no secret of a given type is available, a random value is 712 generated and used for that secret to ensure a mismatch in the hash 713 comparisons in the DHPart1 and DHPart2 messages. This prevents an 714 eavesdropper from knowing which types of shared secrets are available 715 between the endpoints. 717 Section 4.3.1 refers to the auxiliary shared secret auxsecret. The 718 auxsecret shared secret may be defined by the VoIP user agent out-of- 719 band from the ZRTP protocol. In some cases it may be provided by the 720 signaling layer as srtps, which is defined in Section 8.2. If it is 721 not provided by the signaling layer, the auxsecret shared secret may 722 be manually provisioned in other application-specific ways that are 723 out-of-band, such as computed from a hashed pass phrase by prior 724 agreement between the two parties, or supplied by a hardware token. 725 Or it may be a family key used by an institution that the two parties 726 both belong to. It is a generalized mechanism for providing a shared 727 secret that is agreed to between the two parties out of scope of the 728 ZRTP protocol. It is expected that most typical ZRTP endpoints will 729 rarely use auxsecret. 731 For both the initiator and the responder, the shared secrets s1, s2, 732 and s3 will be calculated so that they can all be used later to 733 calculate s0 in Section 4.4.1.4. Here is how s1, s2, and s3 are 734 calculated by both parties: 736 The shared secret s1 will be either the initiator's rs1 or the 737 initiator's rs2, depending on which of them can be found in the 738 responder's cache. 
If the initiator's rs1 matches the responder's 739 rs1 or rs2, then s1 MUST be set to the initiator's rs1. If and only 740 if that match fails, then if the initiator's rs2 matches the 741 responder's rs1 or rs2, then s1 MUST be set to the initiator's rs2. 742 If that match also fails, then s1 MUST be set to null. The 743 complexity of the s1 calculation is to recover from any loss of cache 744 sync from an earlier aborted session, due to the Two Generals' 745 Problem [Byzantine]. 747 The shared secret s2 MUST be set to the value of auxsecret if and 748 only if both parties have matching values for auxsecret, as 749 determined by comparing the hashes of auxsecret sent in the DH 750 messages. If they don't match, s2 MUST be set to null. 752 The shared secret s3 MUST be set to the value of pbxsecret if and 753 only if both parties have matching values for pbxsecret, as 754 determined by comparing the hashes of pbxsecret sent in the DH 755 messages. If they don't match, s3 MUST be set to null. 757 If s1, s2, or s3 have null values, they are assumed to have a zero 758 length for the purposes of hashing them later during the s0 759 calculation in Section 4.4.1.4. 761 The comparison of hashes of rs1, rs2, auxsecret, and pbxsecret is 762 described below in Section 4.3.1. 764 4.3.1. Calculation and comparison of hashes of shared secrets 766 Both parties calculate a set of non-invertible hashes (implemented 767 via the MAC defined in Section 5.1.2.1) of shared secrets that may be 768 present in each of their caches. 
These hashes are truncated to the 769 leftmost 64 bits: 771 rs1IDr = MAC(rs1, "Responder") 772 rs2IDr = MAC(rs2, "Responder") 773 auxsecretIDr = MAC(auxsecret, Responder's H3) 774 pbxsecretIDr = MAC(pbxsecret, "Responder") 775 rs1IDi = MAC(rs1, "Initiator") 776 rs2IDi = MAC(rs2, "Initiator") 777 auxsecretIDi = MAC(auxsecret, Initiator's H3) 778 pbxsecretIDi = MAC(pbxsecret, "Initiator") 780 The responder sends rs1IDr, rs2IDr, auxsecretIDr, and pbxsecretIDr in 781 the DHPart1 message. The initiator sends rs1IDi, rs2IDi, 782 auxsecretIDi, and pbxsecretIDi in the DHPart2 message. 784 The responder uses the locally computed rs1IDi, rs2IDi, auxsecretIDi, 785 and pbxsecretIDi to compare against the corresponding fields in the 786 received DHPart2 message. The initiator uses the locally computed 787 rs1IDr, rs2IDr, auxsecretIDr, and pbxsecretIDr to compare against the 788 corresponding fields in the received DHPart1 message. 790 From these comparisons, s1, s2, and s3 are calculated per the methods 791 described above in Section 4.3. The secrets corresponding to 792 matching hashes are kept while the secrets corresponding to the non- 793 matching ones are replaced with a null, which is assumed to have a 794 zero length for the purposes of hashing them later. The resulting 795 s1, s2, and s3 values are used later to calculate s0 in 796 Section 4.4.1.4. 798 For example, consider two ZRTP endpoints who share secrets rs1 and 799 pbxsecret (defined in Section 7.3.1). During the comparison, rs1ID 800 and pbxsecretID will match but auxsecretID will not. As a result, s1 801 = rs1, s2 will be null, and s3 = pbxsecret. 803 4.3.2. Handling a Shared Secret Cache Mismatch 805 A shared secret cache mismatch is defined to mean that we expected a 806 cache match because rs1 exists in our local cache, but we computed a 807 null value for s1 (per the method described in Section 4.3). 
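The ID calculation of Section 4.3.1, the s1 selection rule of Section 4.3, and the mismatch definition above can be sketched from the initiator's point of view. This is a sketch under stated assumptions: the MAC is taken to be HMAC-SHA-256 (the actual MAC is the one defined in Section 5.1.2.1), the 32-octet size of the random stand-in value is arbitrary, and all function names are hypothetical.

```python
import hashlib
import hmac
import os


def mac64(key: bytes, data: bytes) -> bytes:
    """MAC truncated to the leftmost 64 bits (HMAC-SHA-256 assumed)."""
    return hmac.new(key, data, hashlib.sha256).digest()[:8]


def secret_id(secret, role: bytes) -> bytes:
    """ID sent on the wire; a missing secret is replaced by a random value
    so that the comparison is guaranteed to fail."""
    if secret is None:
        secret = os.urandom(32)   # assumed size for the random stand-in
    return mac64(secret, role)


def select_s1_initiator(rs1, rs2, recv_rs1_id_r, recv_rs2_id_r):
    """Initiator side: try own rs1 first, then own rs2, else null (None)."""
    received = (recv_rs1_id_r, recv_rs2_id_r)
    for candidate in (rs1, rs2):
        if candidate is None:
            continue
        local = mac64(candidate, b"Responder")
        if any(hmac.compare_digest(local, r) for r in received):
            return candidate
    return None


def cache_mismatch(rs1, s1) -> bool:
    """A match was expected (rs1 is cached) but s1 came out null."""
    return rs1 is not None and s1 is None


shared = b"\x11" * 32
# Case 1: responder still holds the same rs1.
ids_ok = (secret_id(shared, b"Responder"), secret_id(None, b"Responder"))
s1_ok = select_s1_initiator(shared, None, *ids_ok)
# Case 2: responder lost its cache and sends two random IDs.
ids_lost = (secret_id(None, b"Responder"), secret_id(None, b"Responder"))
s1_lost = select_s1_initiator(shared, None, *ids_lost)
print(cache_mismatch(shared, s1_ok), cache_mismatch(shared, s1_lost))
```

The responder would run the mirror image of select_s1_initiator, using the "Initiator" label and the IDs received in DHPart2.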
809 If one party has a cached shared secret and the other party does not, 810 this indicates one of two possible situations. Either there is a 811 man-in-the-middle (MiTM) attack, or one of the legitimate parties has 812 lost their cached shared secret by some mishap. Perhaps they 813 inadvertently deleted their cache, or their cache was lost or 814 disrupted due to restoring their disk from an earlier backup copy. 815 The party that has the surviving cache entry can easily detect that a 816 cache mismatch has occurred, because they expect their own cached 817 secret to match the other party's cached secret, but it does not 818 match. It is possible for both parties to detect this condition if 819 both parties have surviving cached secrets that have fallen out of 820 sync, due perhaps to one party restoring from a disk backup. 822 If either party discovers a cache mismatch, the user agent who makes 823 this discovery must treat this as a possible security event and MUST 824 alert their own user that there is a heightened risk of a MiTM 825 attack, and that the user should verbally compare the SAS with the 826 other party to ascertain that no MiTM attack has occurred. If a 827 cache mismatch is detected and it is not possible to compare the SAS, 828 either because the user interface does not support it or because one 829 or both endpoints are unmanned devices, and no other SAS comparison 830 mechanism is available, the session MAY be terminated. 832 The session need not be terminated on a cache mismatch event if: 834 o the mechanism described in Section 8.1.1 is available, which 835 allows authentication of the DH exchange without human assistance, 836 or 837 o any mechanism is available to determine if the SAS matches. This 838 would require either circumstances that allow human verbal 839 comparisons of the SAS, or by using the OPTIONAL digital signature 840 feature on the SAS hash, as described in Section 7.2. 
842 Even if the user interface does not permit an SAS comparison, the 843 human user MUST be warned, and may elect to proceed with the call at 844 their own risk. 846 If and only if a cache mismatch event occurs, the cache update 847 mechanism in Section 4.6.1 is affected, requiring the user to verify 848 the SAS before the cache is updated. The user will thus be alerted 849 of this security condition on every call until the SAS is verified. 850 This is described in Section 4.6.1.1. 852 Here is a non-normative example of a cache-mismatch alert message 853 from a ZRTP user agent (specifically, Zfone [zfone]), designed for a 854 desktop PC graphical user interface environment. It is by no means 855 required that the alert be this detailed: 857 "We expected the other party to have a shared secret cached from a 858 previous call, but they don't have it. This may mean your partner 859 simply lost his cache of shared secrets, but it could also mean 860 someone is trying to wiretap you. To resolve this question you 861 must check the authentication string with your partner. If it 862 doesn't match, it indicates the presence of a wiretapper." 863 If the alert is rendered by a robot voice instead of a GUI, 864 brevity may be more important: "Something's wrong. You must check 865 the authentication string with your partner. If it doesn't match, 866 it indicates the presence of a wiretapper." 868 A mismatch of auxsecret is handled differently than a mismatch of 869 rs1. An auxsecret mismatch is defined to mean that auxsecret exists 870 locally, but we computed a null value for s2 (per the method 871 described in Section 4.3). This mismatch should be made visible to 872 whichever user has auxsecret defined. The mismatch should be made 873 visible to both users if they both have auxsecret defined but they 874 fail to match. The severity of the user notification is 875 implementation dependent. Aborting the session is not required. 
If 876 auxsecret matches, it should not excuse a mismatch of rs1, which 877 still requires a strong warning to the user. 879 4.4. DH and non-DH key agreements 881 The next step is the generation of a secret for deriving SRTP keying 882 material. ZRTP uses Diffie-Hellman and two non-Diffie-Hellman modes, 883 described in the following sections. 885 4.4.1. Diffie-Hellman Mode 887 The purpose of the Diffie-Hellman (either Finite Field Diffie-Hellman 888 or Elliptic Curve Diffie-Hellman) exchange is for the two ZRTP 889 endpoints to generate a new shared secret, s0. In addition, the 890 endpoints discover whether they have any cached or previously stored 891 shared secrets in common, and use them as part of the calculation of 892 the session keys. 894 Because the DH exchange affects the state of the retained shared 895 secret cache, only one in-process ZRTP DH exchange may occur at a 896 time between two ZRTP endpoints. Otherwise, race conditions and 897 cache integrity problems will result. When multiple media streams 898 are established in parallel between the same pair of ZRTP endpoints 899 (determined by the ZIDs in the Hello Messages), only one can be 900 processed at a time. Once that exchange completes with Confirm2 and Conf2ACK 901 messages, another ZRTP DH exchange can begin. This constraint does 902 not apply when Multistream mode key agreement is used since the 903 cached shared secrets are not affected. 905 4.4.1.1. Hash Commitment in Diffie-Hellman Mode 907 From the intersection of the algorithms in the sent and received 908 Hello messages, the initiator chooses a hash, cipher, auth tag, key 909 agreement type, and SAS type to be used. 911 A Diffie-Hellman mode is selected by setting the Key Agreement Type 912 in the Commit to one of the DH or ECDH values from the table in 913 Section 5.1.5.
In this mode, the key agreement begins with the 914 initiator choosing a fresh random Diffie-Hellman (DH) secret value 915 (svi) based on the chosen key agreement type value, and computing the 916 public value. (Note that to speed up processing, this computation 917 can be done in advance.) For guidance on generating random numbers, 918 see Section 4.8. 920 For Finite Field Diffie-Hellman, the value for the DH generator g, 921 the DH prime p, and the length of the DH secret value, svi, are 922 defined in Section 5.1.5. 924 pvi = g^svi mod p 926 where g and p are determined by the key agreement type value. The 927 pvi value is formatted as a big-endian octet string, fixed to the 928 bit-length of the DH prime, and leading zeros MUST NOT be truncated. 930 For Elliptic Curve DH, pvi is calculated and formatted according to 931 the ECDH specification in Section 5.1.5, which refers in detail to 932 certain sections of NIST SP 800-56A. 934 The hash commitment is performed by the initiator of the ZRTP 935 exchange. The hash value of the initiator, hvi, includes a hash of 936 the entire DHPart2 message as shown in Figure 9 (which includes the 937 Diffie-Hellman public value, pvi), and the responder's Hello message 938 (where '||' means concatenation). The hvi hash is truncated to 256 939 bits: 941 hvi = hash(initiator's DHPart2 message || responder's Hello 942 message) 944 Note that the Hello message includes the fields shown in Figure 3. 946 The information from the responder's Hello message is included in the 947 hash calculation to prevent a bid-down attack by modification of the 948 responder's Hello message. 950 The initiator sends hvi in the Commit message. 952 The use of hash commitment in the DH exchange constrains the attacker 953 to only one guess to generate the correct short authentication string 954 (SAS) (Section 7) in his attack, which means the SAS can be quite 955 short. 
A 16-bit SAS, for example, provides the attacker only one 956 chance out of 65536 of not being detected. 958 4.4.1.2. Responder Behavior in Diffie-Hellman Mode 960 Upon receipt of the Commit message, the responder generates its own 961 fresh random DH secret value, svr, and computes the public value. 962 (Note that to speed up processing, this computation can be done in 963 advance, with no need to discard this computation if both endpoints 964 chose the same algorithm via Section 4.1.2.) For guidance on random 965 number generation, see Section 4.8. 967 For Finite Field Diffie-Hellman, the value for the DH generator g, 968 the DH prime p, and the length of the DH secret value, svr, are 969 defined in Section 5.1.5. 971 pvr = g^svr mod p 973 The pvr value is formatted as a big-endian octet string, fixed to the 974 bit-length of the DH prime, and leading zeros MUST NOT be truncated. 976 For Elliptic Curve DH, pvr is calculated and formatted according to 977 the ECDH specification in Section 5.1.5, which refers in detail to 978 certain sections of NIST SP 800-56A. 980 Upon receipt of the DHPart2 message, the responder checks that the 981 initiator's public DH value is not equal to 1 or p-1. An attacker 982 might inject a false DHPart2 message with a value of 1 or p-1 for 983 g^svi mod p, which would cause a disastrously weak final DH result to 984 be computed. If pvi is 1 or p-1, the user SHOULD be alerted of the 985 attack and the protocol exchange MUST be terminated. Otherwise, the 986 responder computes its own value for the hash commitment using the 987 public DH value (pvi) received in the DHPart2 message and its Hello 988 message and compares the result with the hvi received in the Commit 989 message. If they are different, a MiTM attack is taking place and 990 the user is alerted and the protocol exchange terminated. 992 The responder then calculates the Diffie-Hellman result: 994 DHResult = pvi^svr mod p 996 4.4.1.3. 
Initiator Behavior in Diffie-Hellman Mode 998 Upon receipt of the DHPart1 message, the initiator checks that the 999 responder's public DH value is not equal to 1 or p-1. An attacker 1000 might inject a false DHPart1 message with a value of 1 or p-1 for 1001 g^svr mod p, which would cause a disastrously weak final DH result to 1002 be computed. If pvr is 1 or p-1, the user should be alerted of the 1003 attack and the protocol exchange MUST be terminated. 1005 The initiator then sends a DHPart2 message containing the initiator's 1006 public DH value and the set of calculated shared secret IDs as 1007 defined in Section 4.3.1. 1009 The initiator calculates the same Diffie-Hellman result using: 1011 DHResult = pvr^svi mod p 1013 4.4.1.4. Shared Secret Calculation for DH Mode 1015 A hash of the received and sent ZRTP messages in the current ZRTP 1016 exchange in the following order is calculated by both parties: 1018 total_hash = hash(Hello of responder || Commit || DHPart1 || 1019 DHPart2) 1021 Note that only the ZRTP messages (Figure 3, Figure 5, Figure 8, and 1022 Figure 9), not the entire ZRTP packets, are included in the 1023 total_hash. 1025 For both the initiator and responder, the DHResult is formatted as a 1026 big-endian octet string, fixed to the width of the DH prime, and 1027 leading zeros MUST NOT be truncated. For example, for a 3072-bit p, 1028 DHResult would be a 384 octet value, with the first octet the most 1029 significant. DHResult may also be the result of an ECDH calculation, 1030 which is discussed in Section 5.1.5. 
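The public value check, the DHResult calculation, and the fixed-width formatting described above can be sketched for the finite field case. This is a toy illustration only: the prime P and generator G below are NOT the ZRTP groups of Section 5.1.5 and are far too small for real use, and the function names are hypothetical.

```python
import secrets

# Toy parameters for illustration only (assumption): a 127-bit Mersenne
# prime and generator 3. Real endpoints use the groups in Section 5.1.5.
P = 2**127 - 1
G = 3


def check_public_value(pv: int, p: int) -> None:
    """Reject the degenerate values 1 and p-1 before using a peer's value."""
    if pv in (1, p - 1):
        raise ValueError("degenerate DH public value, terminate exchange")


def encode_fixed_width(value: int, p: int) -> bytes:
    """Big-endian octet string as wide as the prime; leading zeros kept."""
    width = (p.bit_length() + 7) // 8
    return value.to_bytes(width, "big")


svi = secrets.randbelow(P - 3) + 2      # initiator's secret, in [2, P-2]
svr = secrets.randbelow(P - 3) + 2      # responder's secret, in [2, P-2]
pvi = pow(G, svi, P)                    # pvi = g^svi mod p
pvr = pow(G, svr, P)                    # pvr = g^svr mod p

check_public_value(pvr, P)              # initiator, on receipt of DHPart1
check_public_value(pvi, P)              # responder, on receipt of DHPart2

dh_initiator = pow(pvr, svi, P)         # DHResult = pvr^svi mod p
dh_responder = pow(pvi, svr, P)         # DHResult = pvi^svr mod p
print(encode_fixed_width(dh_initiator, P).hex())
```

Deriving the width from the prime rather than from the value is what keeps leading zero octets in place, so both sides always hash a string of the same length.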
1032 Key         | Size of
1033 Agreement   | DHResult
1034 ------------------------
1035 DH-3072     | 384 octets
1036 ------------------------
1037 DH-2048     | 256 octets
1038 ------------------------
1039 ECDH P-256  | 32 octets
1040 ------------------------
1041 ECDH P-384  | 48 octets
1042 ------------------------

1044 The authors believe the calculation of the final shared secret, s0, 1045 is in compliance with the recommendations in sections 5.8.1 and 1046 6.1.2.1 of NIST SP 800-56A [SP800-56A]. This is done by hashing a 1047 concatenation of a number of items, including the DHResult, the ZID's 1048 of the initiator (ZIDi) and the responder (ZIDr), the total_hash, and 1049 the set of non-null shared secrets as described in Section 4.3.

1051 In section 5.8.1 of NIST SP 800-56A [SP800-56A], NIST requires 1052 certain parameters to be hashed together in a particular order, which 1053 NIST refers to as: Z, AlgorithmID, PartyUInfo, PartyVInfo, 1054 SuppPubInfo, and SuppPrivInfo. In our implementation, our DHResult 1055 corresponds to Z, "ZRTP-HMAC-KDF" corresponds to AlgorithmID, our 1056 ZIDi and ZIDr correspond to PartyUInfo and PartyVInfo, our total_hash 1057 corresponds to SuppPubInfo, and the set of three shared secrets s1, 1058 s2, and s3 corresponds to SuppPrivInfo. NIST also requires a 32-bit 1059 big-endian integer counter to be included in the hash each time the 1060 hash is computed, which we have set to the fixed value of 1, because 1061 we only compute the hash once. NIST refers to the final hash output 1062 as DerivedKeyingMaterial, which corresponds to our s0 in this 1063 calculation.

1065 s0 = hash(counter || DHResult || "ZRTP-HMAC-KDF" || ZIDi || ZIDr
1066 || total_hash || len(s1) || s1 || len(s2) || s2 || len(s3) || s3)

1068 Note that temporary values s1, s2, and s3 were calculated per the 1069 methods described above in Section 4.3. DHResult, s1, s2, and s3 1070 MUST all be erased from memory immediately after they are used to 1071 calculate s0.
1073 The length of the DHResult field was implicitly agreed to by the 1074 negotiated DH prime size. The length of total_hash is implicitly 1075 determined by the negotiated hash algorithm. All of the explicit 1076 length fields, len(), in the above hash are 32-bit big-endian 1077 integers, giving the length in octets of the field that follows. 1078 Some members of the set of shared secrets (s1, s2, and s3) may have 1079 lengths of zero if they are null (not shared), and are each preceded 1080 by a 4-octet length field. For example, if s2 is null, len(s2) is 1081 0x00000000, and s2 itself would be absent from the hash calculation, 1082 which means len(s3) would immediately follow len(s2). While 1083 inclusion of ZIDi and ZIDr may be redundant, because they are 1084 implicitly included in the total_hash, we explicitly include them 1085 here to follow NIST SP 800-56A. The fixed-length string "ZRTP-HMAC- 1086 KDF" (not null-terminated) identifies what purpose the resulting s0 1087 will be used for, which is to serve as the key derivation key for the 1088 ZRTP HMAC-based key derivation function (KDF) defined in 1089 Section 4.5.1 and used in Section 4.5.3. 1091 The authors believe ZRTP DH mode is in full compliance with two 1092 relevant NIST documents that cover key derivations. First, section 1093 5.8.1 of NIST SP 800-56A [SP800-56A] computes what NIST refers to as 1094 DerivedKeyingMaterial, which ZRTP refers to as s0. This s0 then 1095 serves as the key derivation key, which NIST refers to as KI in the 1096 key derivation function described in sections 5 and 5.1 of NIST SP 1097 800-108 [SP800-108], to derive all the rest of the subkeys needed by 1098 ZRTP. For ECDH mode, the authors believe the s0 calculation is also 1099 in compliance with section 3.1 of NSA's Suite B Implementer's Guide 1100 to NIST SP 800-56A [NSA-Suite-B-Guide-56A]. 
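The s0 calculation and its length-field encoding can be sketched as follows. This is a minimal sketch, not a reference implementation: SHA-256 stands in for the negotiated hash, the function names are hypothetical, and the ZID, DHResult, and total_hash values in the usage lines are arbitrary sample bytes.

```python
import hashlib
import struct


def len_prefixed(secret) -> bytes:
    """32-bit big-endian length followed by the secret; a null secret
    contributes only the four zero octets of its length field."""
    secret = secret or b""
    return struct.pack(">I", len(secret)) + secret


def compute_s0(dh_result, zid_i, zid_r, total_hash, s1, s2, s3) -> bytes:
    h = hashlib.sha256()               # assumed negotiated hash
    h.update(struct.pack(">I", 1))     # counter, fixed at 1
    h.update(dh_result)                # fixed-width DHResult
    h.update(b"ZRTP-HMAC-KDF")         # not null-terminated
    h.update(zid_i)
    h.update(zid_r)
    h.update(total_hash)
    for secret in (s1, s2, s3):
        h.update(len_prefixed(secret))
    return h.digest()


# Sample inputs (arbitrary placeholder values): rs1 and pbxsecret are
# shared, auxsecret is not, so s2 is null.
s0 = compute_s0(
    dh_result=b"\x01" * 384,
    zid_i=b"I" * 12,
    zid_r=b"R" * 12,
    total_hash=b"\x02" * 32,
    s1=b"\xaa" * 32,
    s2=None,
    s3=b"\xbb" * 32,
)
print(s0.hex())
```

Python offers no reliable way to erase immutable bytes objects, so an implementation that must honor the erasure requirement would need mutable buffers or a lower-level language.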
1102 The ZRTP key derivation function (KDF) (Section 4.5.1) requires the 1103 use of a KDF Context field (per NIST SP 800-108 [SP800-108] 1104 guidelines) which should include the ZIDi, ZIDr, and a nonce value 1105 known to both parties. The total_hash qualifies as a nonce value, 1106 because its computation included nonce material from the initiator's 1107 Commit message and the responder's Hello message. 1109 KDF_Context = (ZIDi || ZIDr || total_hash) 1111 At this point in DH mode, the two endpoints proceed to the key 1112 derivations of ZRTPSess and the rest of the keys in Section 4.5.2, 1113 now that there is a defined s0. 1115 4.4.2. Preshared Mode 1117 The Preshared key agreement mode can be used to generate SRTP keys 1118 and salts without a DH calculation, instead relying on a shared 1119 secret from previous DH calculations between the endpoints. 1121 This key agreement mode is useful to rapidly re-establish a secure 1122 session between two parties who have recently started and ended a 1123 secure session that has already performed a DH key agreement, without 1124 performing another lengthy DH calculation, which may be desirable on 1125 slow processors in resource-limited environments. Preshared mode 1126 MUST NOT be used for adding additional media streams to an existing 1127 call. Multistream mode MUST be used for this purpose. 1129 In the most severe resource-limited environments, Preshared mode may 1130 be useful with processors that cannot perform a DH calculation in an 1131 ergonomically acceptable time limit. Shared key material may be 1132 manually provisioned between two such endpoints in advance and still 1133 allow a limited subset of functionality. Such a "better than 1134 nothing" implementation would have to be regarded as non-compliant 1135 with the ZRTP specification, but it could interoperate in Preshared 1136 (and if applicable, Multistream) mode with a compliant ZRTP endpoint. 
1138 Because Preshared mode affects the state of the retained shared 1139 secret cache, only one in-process ZRTP Preshared exchange may occur 1140 at a time between two ZRTP endpoints. This rule is explained in more 1141 detail in Section 4.4.1, and applies for the same reasons as in DH 1142 mode. 1144 Preshared mode is only included in this specification to meet the 1145 R-REUSE requirement in the Media Security Requirements [RFC5479] 1146 document. A series of preshared-keyed calls between two ZRTP 1147 endpoints should use a DH key exchange periodically. Preshared mode 1148 is only used if a cached shared secret has been established in an 1149 earlier session by a DH exchange, as discussed in Section 4.9. 1151 4.4.2.1. Commitment in Preshared Mode 1153 Preshared mode is selected by setting the Key Agreement Type to 1154 Preshared in the Commit message. This results in the same call flow 1155 as Multistream mode. The principal difference between Multistream 1156 mode and Preshared mode is that Preshared mode uses a previously 1157 cached shared secret, rs1, instead of an active ZRTP Session key, 1158 ZRTPSess, as the initial keying material. 1160 Preshared mode depends on having a reliable shared secret in its 1161 cache. Before Preshared mode is used, the initial DH exchange that 1162 gave rise to the shared secret SHOULD have used at least one of these 1163 anti-MiTM mechanisms: 1) A verbal comparison of the SAS, evidenced by 1164 the SAS Verified flag, or 2) an end-to-end integrity-protected 1165 delivery of the a=zrtp-hash in the signaling (Section 8.1.1), or 3) a 1166 digital signature on the sashash (Section 7.2). 1168 4.4.2.2. Initiator Behavior in Preshared Mode 1170 The Commit message (Figure 7) is sent by the initiator of the ZRTP 1171 exchange. From the intersection of the algorithms in the sent and 1172 received Hello messages, the initiator chooses a hash, cipher, auth 1173 tag, key agreement type, and SAS type to be used. 
1175 To assemble a Preshared commit, we must first construct a temporary 1176 preshared_key, which is constructed from one of several possible 1177 combinations of cached key material, depending on what is available 1178 in the shared secret cache. If rs1 is not available in the 1179 initiator's cache, then Preshared mode MUST NOT be used. 1181 preshared_key = hash(len(rs1) || rs1 || len(auxsecret) || 1182 auxsecret || len(pbxsecret) || pbxsecret) 1184 All of the explicit length fields, len(), in the above hash are 32- 1185 bit big-endian integers, giving the length in octets of the field 1186 that follows. Some members of the set of shared secrets (rs1, 1187 auxsecret, and pbxsecret) may have lengths of zero if they are null 1188 (not available), and are each preceded by a 4-octet length field. 1189 For example, if auxsecret is null, len(auxsecret) is 0x00000000, and 1190 auxsecret itself would be absent from the hash calculation, which 1191 means len(pbxsecret) would immediately follow len(auxsecret). 1193 In place of hvi in the Commit message, two smaller fields are 1194 inserted by the initiator: 1196 - A random nonce of length 4-words (16 octets). 1197 - A keyID = MAC(preshared_key, "Prsh") truncated to 64 bits. 1199 Note: Since the nonce is used to calculate different SRTP key and 1200 salt pairs for each session, a duplication will result in the same 1201 key and salt being generated for the two sessions, which would 1202 have disastrous security consequences. 1204 4.4.2.3. Responder Behavior in Preshared Mode 1206 The responder uses the received keyID to search for matching key 1207 material in its cache. It does this by computing a preshared_key 1208 value and keyID value using the same formula as the initiator, 1209 depending on what is available in the responder's local cache. 
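The preshared_key and keyID formulas, which both the initiator and the responder evaluate, can be sketched as follows. This is an illustrative sketch: SHA-256 and HMAC-SHA-256 stand in for the negotiated hash and MAC, the function names are hypothetical, and the secret values in the usage lines are arbitrary sample bytes.

```python
import hashlib
import hmac
import os
import struct


def _len_prefixed(secret) -> bytes:
    """32-bit big-endian length followed by the secret (empty if null)."""
    secret = secret or b""
    return struct.pack(">I", len(secret)) + secret


def preshared_key(rs1, auxsecret=None, pbxsecret=None) -> bytes:
    """hash(len(rs1) || rs1 || len(auxsecret) || auxsecret ||
    len(pbxsecret) || pbxsecret), with SHA-256 as the assumed hash."""
    if not rs1:
        raise ValueError("Preshared mode MUST NOT be used without rs1")
    data = _len_prefixed(rs1) + _len_prefixed(auxsecret) + _len_prefixed(pbxsecret)
    return hashlib.sha256(data).digest()


def key_id(psk: bytes) -> bytes:
    """keyID = MAC(preshared_key, "Prsh") truncated to 64 bits."""
    return hmac.new(psk, b"Prsh", hashlib.sha256).digest()[:8]


rs1 = b"\x33" * 32                       # sample cached secret
nonce = os.urandom(16)                   # fresh 4-word nonce per Commit
initiator_id = key_id(preshared_key(rs1))
responder_id = key_id(preshared_key(rs1))
print(initiator_id == responder_id, len(nonce))
```

Drawing the nonce from the operating system's random source on every Commit is one way to avoid the duplication that the note above warns about.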
If 1210 the locally computed keyID does not match the received keyID in the 1211 Commit, the responder recomputes a new preshared_key and keyID from a 1212 different subset of shared keys from the cache, dropping auxsecret or 1213 pbxsecret or both from the hash calculation, until a matching 1214 preshared_key is found or it runs out of possibilities. Note that 1215 rs2 is not included in the process. 1217 If it finds the appropriate matching shared key material, it is used 1218 to derive s0 and a new ZRTPSess key, as described in the next section 1219 on Shared Secret Calculation, Section 4.4.2.4. 1221 If the responder determines that it does not have a cached shared 1222 secret from a previous DH exchange, or it fails to match the keyID 1223 hash from the initiator with any combination of its shared keys, it 1224 SHOULD respond with its own DH Commit message. This would reverse 1225 the roles and the responder would become the initiator, because the 1226 DH Commit must always "trump" the Preshared Commit message as 1227 described in Section 4.2. The key exchange would then proceed using 1228 DH mode. However, if a severely resource-limited responder lacks the 1229 computing resources to respond in a reasonable time with a DH Commit, 1230 it MAY respond with a ZRTP Error message (Section 5.9) indicating 1231 that no shared secret is available. 1233 If both sides send Preshared Commit messages initiating a secure 1234 session at the same time, the contention is resolved and the 1235 initiator/responder roles are settled according to Section 4.2, and 1236 the protocol proceeds. 1238 In Preshared mode, both the DHPart1 and DHPart2 messages are skipped. 1239 After receiving the Commit message from the initiator, the responder 1240 sends the Confirm1 message after calculating this stream's SRTP keys, 1241 as described below. 1243 4.4.2.4. 
Shared Secret Calculation for Preshared Mode 1245 Preshared mode requires that the s0 and ZRTPSess keys be derived from 1246 the preshared_key, and this must be done in a way that guarantees 1247 uniqueness for each session. This is done by using nonce material 1248 from both parties: the explicit nonce in the initiator's Preshared 1249 Commit message (Figure 7) and the H3 field in the responder's Hello 1250 message (Figure 3). Thus both parties force the resulting shared 1251 secret to be unique for each session. 1253 A hash of the received and sent ZRTP messages in the current ZRTP 1254 exchange for the current media stream is calculated: 1256 total_hash = hash(Hello of responder || Commit) 1258 Note that only the ZRTP messages (Figure 3 and Figure 7), not the 1259 entire ZRTP packets, are included in the total_hash. 1261 The ZRTP key derivation function (KDF) (Section 4.5.1) requires the 1262 use of a KDF Context field (per NIST SP 800-108 [SP800-108] 1263 guidelines) which should include the ZIDi, ZIDr, and a nonce value 1264 known to both parties. The total_hash qualifies as a nonce value, 1265 because its computation included nonce material from the initiator's 1266 Commit message and the responder's Hello message. 1268 KDF_Context = (ZIDi || ZIDr || total_hash) 1270 The s0 key is derived via the ZRTP key derivation function 1271 (Section 4.5.1) from preshared_key and the nonces implicitly included 1272 in the total_hash. The nonces also ensure KDF_Context is unique for 1273 each session, which is critical for security. 1275 s0 = KDF(preshared_key, "ZRTP PSK", KDF_Context, negotiated hash 1276 length) 1278 The preshared_key MUST be erased as soon as it has been used to 1279 calculate s0. 1281 At this point in Preshared mode, the two endpoints proceed to the key 1282 derivations of ZRTPSess and the rest of the keys in Section 4.5.2, 1283 now that there is a defined s0. 1285 4.4.3. 
Multistream Mode 1287 The Multistream key agreement mode can be used to generate SRTP keys 1288 and salts for additional media streams established between a pair of 1289 endpoints. Multistream mode cannot be used unless there is an active 1290 SRTP session established between the endpoints, which means a ZRTP 1291 Session key is active. This ZRTP Session key can be used to generate 1292 keys and salts without performing another DH calculation. In this 1293 mode, the retained shared secret cache is not used or updated. As a 1294 result, multiple ZRTP Multistream mode exchanges can be processed in 1295 parallel between two endpoints. 1297 Multistream mode is also used to resume a secure call that has gone 1298 clear using a GoClear message as described in Section 4.7.2.1. 1300 When adding additional media streams to an existing call, Multistream 1301 mode MUST be used. The first media stream MUST use either DH mode or 1302 Preshared mode. Only one DH exchange or Preshared exchange is 1303 performed, just for the first media stream. The DH exchange or 1304 Preshared exchange MUST be completed for the first media stream 1305 before Multistream mode is used to add any other media streams. In a 1306 Multistream session, a ZRTP endpoint MUST use the same ZID for all 1307 media streams, matching the ZID used in the first media stream. 1309 4.4.3.1. Commitment in Multistream Mode 1311 Multistream mode is selected by the initiator setting the Key 1312 Agreement Type to "Mult" in the Commit message (Figure 6). The 1313 Cipher Type, Auth Tag Length, and Hash in Multistream mode SHOULD be 1314 set by the initiator to the same values as in the initial DH 1315 Mode Commit. The SAS Type is ignored as there is no SAS 1316 authentication in this mode. 1318 Note: This requirement is needed since some endpoints cannot 1319 support different SRTP algorithms for different media streams.
1320 However, in the case of Multistream mode being used to go secure 1321 after a GoClear, the requirement to use the same SRTP algorithms 1322 is relaxed if there are no other active SRTP sessions. 1324 In place of hvi in the Commit, a random nonce of length 4-words (16 1325 octets) is chosen. Its value MUST be unique for all nonce values 1326 chosen for active ZRTP sessions between a pair of endpoints. If a 1327 Commit is received with a reused nonce value, the ZRTP exchange MUST 1328 be immediately terminated. 1330 Note: Since the nonce is used to calculate different SRTP key and 1331 salt pairs for each media stream, a duplication will result in the 1332 same key and salt being generated for the two media streams, which 1333 would have disastrous security consequences. 1335 If a Commit is received selecting Multistream mode, but the responder 1336 does not have a ZRTP Session Key available, the exchange MUST be 1337 terminated. Otherwise, the responder proceeds to the next section on 1338 Shared Secret Calculation, Section 4.4.3.2. 1340 If both sides send Multistream Commit messages at the same time, the 1341 contention is resolved and the initiator/responder roles are settled 1342 according to Section 4.2, and the protocol proceeds. 1344 In Multistream mode, both the DHPart1 and DHPart2 messages are 1345 skipped. After receiving the Commit message from the initiator, the 1346 responder sends the Confirm1 message after calculating this stream's 1347 SRTP keys, as described below. 1349 4.4.3.2. Shared Secret Calculation for Multistream Mode 1351 In Multistream mode, each media stream requires that a set of keys be 1352 derived from the ZRTPSess key, and this must be done in a way that 1353 guarantees uniqueness for each media stream. This is done by using 1354 nonce material from both parties: the explicit nonce in the 1355 initiator's Multistream Commit message (Figure 6) and the H3 field in 1356 the responder's Hello message (Figure 3). 
Thus both parties force 1357 the resulting shared secret to be unique for each media stream. 1359 A hash of the received and sent ZRTP messages in the current ZRTP 1360 exchange for the current media stream is calculated: 1362 total_hash = hash(Hello of responder || Commit) 1364 This refers to the Hello and Commit messages for the current media 1365 stream which is using Multistream mode, not the original media stream 1366 that included a full DH key agreement. Note that only the ZRTP 1367 messages (Figure 3 and Figure 6), not the entire ZRTP packets, are 1368 included in the hash. 1370 The ZRTP key derivation function (KDF) (Section 4.5.1) requires the 1371 use of a KDF Context field (per NIST SP 800-108 [SP800-108] 1372 guidelines) which should include the ZIDi, ZIDr, and a nonce value 1373 known to both parties. The total_hash qualifies as a nonce value, 1374 because its computation included nonce material from the initiator's 1375 Commit message and the responder's Hello message. 1377 KDF_Context = (ZIDi || ZIDr || total_hash) 1379 The current stream's SRTP keys and salts for the initiator and 1380 responder are calculated using the ZRTP Session Key ZRTPSess and the 1381 nonces implicitly included in the total_hash. The nonces also ensure 1382 KDF_Context will be unique for each media stream, which is critical 1383 for security. For each additional media stream, a separate s0 is 1384 derived from ZRTPSess via the ZRTP key derivation function 1385 (Section 4.5.1): 1387 s0 = KDF(ZRTPSess, "ZRTP MSK", KDF_Context, negotiated hash 1388 length) 1390 Note that the ZRTPSess key was previously derived from material that 1391 also includes a different and more inclusive total_hash from the 1392 entire packet sequence that performed the original DH exchange for 1393 the first media stream in this ZRTP session. 1395 At this point in Multistream mode, the two endpoints begin key 1396 derivations in Section 4.5.3. 1398 4.5. Key Derivations 1400 4.5.1. 
The ZRTP Key Derivation Function 1402 To derive keys from a shared secret, ZRTP uses an HMAC-based key 1403 derivation function, or KDF. It is used throughout Section 4.5.3 and 1404 in other sections. The HMAC function for the KDF is based on the 1405 negotiated hash algorithm defined in Section 5.1.2. 1407 The authors believe the ZRTP KDF is in full compliance with the 1408 recommendations in NIST SP 800-108 [SP800-108]. Section 7.5 of the 1409 NIST document describes "key separation", which is a security 1410 requirement for the cryptographic keys derived from the same key 1411 derivation key. The keys shall be separate in the sense that the 1412 compromise of some derived keys will not degrade the security 1413 strength of any of the other derived keys, or the security strength 1414 of the key derivation key. Strong preimage resistance is provided. 1416 The ZRTP KDF runs the NIST pseudorandom function (PRF) in counter 1417 mode, with only a single iteration of the counter. The NIST PRF is 1418 based on the HMAC function. The ZRTP KDF never has to generate more 1419 than 256 bits (or 384 bits for Suite B applications) of output key 1420 material, so only a single invocation of the HMAC function is needed. 1422 The ZRTP KDF is defined in this manner, per sections 5 and 5.1 of 1423 NIST SP 800-108 [SP800-108]: 1425 KDF(KI, Label, Context, L) = HMAC(KI, i || Label || 0x00 || 1426 Context || L) 1428 The HMAC in the KDF is keyed by KI, which is a secret key derivation 1429 key that is unknown to the wiretapper (for example, s0). The HMAC is 1430 computed on a concatenated set of nonsecret fields that are defined 1431 as follows. The first field is a 32-bit big-endian integer counter 1432 (i) required by NIST to be included in the HMAC each time the HMAC is 1433 computed, which we have set to the fixed value of 0x00000001, because 1434 we only compute the HMAC once. Label is a string of nonzero octets 1435 that identifies the purpose for the derived keying material.
The 1436 octet 0x00 is a delimiter required by NIST. The NIST KDF formula has 1437 a "Context" field which includes ZIDi, ZIDr, and some optional nonce 1438 material known to both parties. L is a 32-bit big-endian positive 1439 integer, not to exceed the length in bits of the output of the HMAC. 1440 The output of the KDF is truncated to the leftmost L bits. If SHA- 1441 384 is the negotiated hash algorithm, the HMAC would be HMAC-SHA-384, 1442 thus the maximum value of L would be 384, the negotiated hash length. 1444 The ZRTP KDF is not to be confused with the SRTP KDF defined in 1445 [RFC3711]. 1447 4.5.2. Deriving ZRTPSess Key and SAS in DH or Preshared modes 1449 Both DH mode and Preshared mode (but not Multistream mode) come to 1450 this common point in the protocol to derive ZRTPSess and the SAS from 1451 s0, via the ZRTP Key Derivation Function (Section 4.5.1). At this 1452 point, s0 has been calculated, as well as KDF_Context. These 1453 calculations are done only for the first media stream, not for 1454 Multistream mode. 1456 The ZRTPSess key is used only for these two purposes: 1) to generate 1457 the additional s0 keys (Section 4.4.3.2) for adding additional media 1458 streams to this session in Multistream mode, and 2) to generate the 1459 pbxsecret (Section 7.3.1) that may be cached for use in future 1460 sessions. The ZRTPSess key is kept for the duration of the call 1461 signaling session between the two ZRTP endpoints. That is, if there 1462 are two separate calls between the endpoints (in SIP terms, separate 1463 SIP dialogs), then a ZRTP Session Key MUST NOT be used across the two 1464 call signaling sessions. ZRTPSess MUST be destroyed no later than 1465 the end of the call signaling session. 1467 ZRTPSess = KDF(s0, "ZRTP Session Key", KDF_Context, negotiated 1468 hash length) 1470 Note that KDF_Context is unique for each media stream, but only the 1471 first media stream is permitted to calculate ZRTPSess. 
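As an illustration of the KDF formula and the ZRTPSess derivation above, the following Python sketch computes a single HMAC over i || Label || 0x00 || Context || L and truncates the result to the leftmost L bits. It assumes SHA-256 as the negotiated hash and handles only values of L that are a multiple of 8. The function name zrtp_kdf and the placeholder values for s0, the ZIDs, and total_hash are hypothetical and are not defined by this specification.

```python
import hashlib
import hmac
import struct


def zrtp_kdf(ki: bytes, label: bytes, context: bytes, length_bits: int,
             hash_name: str = "sha256") -> bytes:
    """Sketch of KDF(KI, Label, Context, L) with a single HMAC invocation."""
    max_bits = hashlib.new(hash_name).digest_size * 8
    if not 0 < length_bits <= max_bits:
        raise ValueError("L must be positive and no longer than the HMAC output")
    if length_bits % 8:
        raise ValueError("this sketch only handles whole-octet lengths")
    message = (
        struct.pack(">I", 1)               # i: 32-bit big-endian counter, fixed at 1
        + label                            # Label: purpose string of nonzero octets
        + b"\x00"                          # delimiter octet
        + context                          # Context: ZIDi || ZIDr || total_hash
        + struct.pack(">I", length_bits)   # L: 32-bit big-endian length in bits
    )
    # Truncate to the leftmost L bits of the HMAC output.
    return hmac.new(ki, message, hash_name).digest()[: length_bits // 8]


# Placeholder inputs for illustration only (not real key material).
s0 = bytes(range(32))
zid_i = b"\x01" * 12                       # assumed 12-octet (96-bit) ZIDs
zid_r = b"\x02" * 12
total_hash = hashlib.sha256(b"placeholder for the hashed ZRTP messages").digest()
kdf_context = zid_i + zid_r + total_hash

zrtp_sess = zrtp_kdf(s0, b"ZRTP Session Key", kdf_context, 256)
```

Because L is itself part of the HMAC input, a key derived with a shorter L is not a prefix of a key derived with a longer L under the same label; each (Label, L) pair yields independent output.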
1473 There is only one Short Authentication String (SAS) (Section 7) 1474 computed per call, which is applicable to all media streams derived 1475 from a single DH key agreement in a ZRTP session. KDF_Context is 1476 unique for each media stream, but only the first media stream is 1477 permitted to calculate sashash. 1479 sashash = KDF(s0, "SAS", KDF_Context, 256) 1480 sasvalue = sashash [truncated to leftmost 32 bits] 1482 Despite the exposure of the SAS to the two parties, the rest of the 1483 keying material is protected by the key separation properties of the 1484 KDF (Section 4.5.1). 1486 ZRTP-enabled VoIP clients may need to support additional forms of 1487 communication, such as text chat, instant messaging, or file 1488 transfers. These other forms of communication may need to be 1489 encrypted, and would benefit from leveraging the ZRTP key exchange 1490 used for the VoIP part of the call. In that case, more key material 1491 MAY be derived and "exported" from the ZRTP protocol and provided as 1492 a shared secret to the VoIP client for these non-VoIP purposes. The 1493 application can use this exported key in application-specific ways, 1494 outside the scope of the ZRTP protocol. 1496 ExportedKey = KDF(s0, "Exported key", KDF_Context, negotiated hash 1497 length) 1499 Only one ExportedKey is computed per call. KDF_Context is unique for 1500 each media stream, but only the first media stream is permitted to 1501 calculate ExportedKey. 1503 The application may use this exported key to derive other subkeys for 1504 various non-ZRTP purposes, via a KDF using separate KDF label strings 1505 defined by the application. This key or its derived subkeys can be 1506 used for encryption, or used to authenticate other key exchanges 1507 carried out by the application, protected by ZRTP's MiTM defense 1508 umbrella. 
The exported key and its descendants may be used for as 1509 long as needed by the application, maintained in a separate crypto 1510 context that may outlast the VoIP session. 1512 At this point in DH mode or Preshared mode, the two endpoints proceed 1513 on to the key derivations in Section 4.5.3, now that there is a 1514 defined s0 and ZRTPSess key. 1516 4.5.3. Deriving the rest of the keys from s0 1518 DH mode, Multistream mode, and Preshared mode all come to this common 1519 point in the protocol to derive a set of keys from s0. It can be 1520 assumed that s0 has been calculated, as well as the ZRTPSess key and 1521 KDF_Context. A separate s0 key is associated with each media stream. 1523 Subkeys are not drawn directly from s0, as done in NIST SP 800-56A. 1524 To enhance key separation, ZRTP uses s0 to key a Key Derivation 1525 Function (Section 4.5.1) based on NIST SP 800-108 [SP800-108]. Since 1526 s0 already included total_hash in its derivation, it is redundant to 1527 use total_hash again in the KDF Context in all the invocations of the 1528 KDF keyed by s0. Nonetheless, NIST SP 800-108 always requires KDF 1529 Context to be defined for the KDF, and nonce material is required in 1530 some KDF invocations (especially for Multistream mode and Preshared 1531 mode), so total_hash is included as a nonce in the KDF Context. 1533 Separate SRTP master keys and master salts are derived for use in 1534 each direction for each media stream. Unless otherwise specified, 1535 ZRTP uses SRTP with no MKI, 32 bit authentication using HMAC-SHA1, 1536 AES-CM 128 or 256 bit key length, 112 bit session salt key length, 1537 2^48 key derivation rate, and SRTP prefix length 0. Secure RTCP 1538 (SRTCP) is also used, deriving the SRTCP keys from the same master 1539 keys and salts as SRTP, using the mechanisms specified in [RFC3711], 1540 without requiring a separate ZRTP negotiation for RTCP.
1542 The ZRTP initiator encrypts and the ZRTP responder decrypts packets 1543 by using srtpkeyi and srtpsalti, while the ZRTP responder encrypts 1544 and the ZRTP initiator decrypts packets by using srtpkeyr and 1545 srtpsaltr. The SRTP key and salt values are truncated (taking the 1546 leftmost bits) to the length determined by the chosen SRTP profile. 1547 These are generated by: 1549 srtpkeyi = KDF(s0, "Initiator SRTP master key", KDF_Context, 1550 negotiated AES key length) 1551 srtpsalti = KDF(s0, "Initiator SRTP master salt", KDF_Context, 1552 112) 1553 srtpkeyr = KDF(s0, "Responder SRTP master key", KDF_Context, 1554 negotiated AES key length) 1555 srtpsaltr = KDF(s0, "Responder SRTP master salt", KDF_Context, 1556 112) 1558 The MAC keys are the same length as the output of the underlying hash 1559 function in the KDF, and are thus generated without truncation. They 1560 are used only by ZRTP and not by SRTP. Different MAC keys are needed 1561 for the initiator and the responder to ensure that GoClear messages 1562 in each direction are unique and can not be cached by an attacker and 1563 reflected back to the endpoint. 1565 mackeyi = KDF(s0, "Initiator HMAC key", KDF_Context, negotiated 1566 hash length) 1567 mackeyr = KDF(s0, "Responder HMAC key", KDF_Context, negotiated 1568 hash length) 1570 ZRTP keys are generated for the initiator and responder to use to 1571 encrypt the Confirm1 and Confirm2 messages. They are truncated to 1572 the same size as the negotiated SRTP key size. 1574 zrtpkeyi = KDF(s0, "Initiator ZRTP key", KDF_Context, negotiated 1575 AES key length) 1576 zrtpkeyr = KDF(s0, "Responder ZRTP key", KDF_Context, negotiated 1577 AES key length) 1579 All key material is destroyed as soon as it is no longer needed, no 1580 later than the end of the call. s0 is erased in Section 4.6.1, and 1581 the rest of the session key material is erased in Section 4.7.2.1 and 1582 Section 4.7.3. 1584 4.6. 
Confirmation 1586 The Confirm1 and Confirm2 messages (Figure 10) contain the cache 1587 expiration interval (defined in Section 4.9) for the newly generated 1588 retained shared secret. The flagoctet is an 8 bit unsigned integer 1589 made up of these flags: the PBX Enrollment flag (E) defined in 1590 Section 7.3.1, SAS Verified flag (V) defined in Section 7.1, Allow 1591 Clear flag (A) defined in Section 4.7.2, and Disclosure flag (D) 1592 defined in Section 11. 1594 flagoctet = (E * 2^3) + (V * 2^2) + (A * 2^1) + (D * 2^0) 1596 Part of the Confirm1 and Confirm2 messages are encrypted using full- 1597 block Cipher Feedback Mode, and contain a 128-bit random CFB 1598 Initialization Vector (IV). The Confirm1 and Confirm2 messages also 1599 contain a MAC covering the encrypted part of the Confirm1 or Confirm2 1600 message which includes a string of zeros, the signature length, flag 1601 octet, cache expiration interval, signature type block (if present) 1602 and signature (Section 7.2) (if present). For the responder: 1604 confirm_mac = MAC(mackeyr, encrypted part of Confirm1) 1606 For the initiator: 1608 confirm_mac = MAC(mackeyi, encrypted part of Confirm2) 1610 The mackeyi and mackeyr keys are computed in Section 4.5.3. 1612 The exchange is completed when the responder sends either the 1613 Conf2ACK message or the responder's first SRTP media packet (with a 1614 valid SRTP auth tag). The initiator MUST treat the first valid SRTP 1615 media from the responder as equivalent to receiving a Conf2ACK. The 1616 responder may respond to Confirm2 with either SRTP media or Conf2ACK, 1617 or both, in whichever order the responder chooses (or whichever order 1618 the "cloud" chooses to deliver them). 1620 4.6.1. 
Updating the Cache of Shared Secrets 1622 After receiving the Confirm messages, both parties must now update 1623 their retained shared secret rs1 in their respective caches, provided 1624 the following conditions hold: 1626 1) This key exchange is either DH or Preshared mode, not 1627 Multistream mode, which does not update the cache. 1628 2) Depending on the values of the cache expiration intervals that 1629 are received in the two Confirm messages, there are some scenarios 1630 that do not update the cache, as explained in Section 4.9. 1631 3) The responder MUST receive the initiator's Confirm2 message 1632 before updating the responder's cache. 1633 4) The initiator MUST receive either the responder's Conf2ACK 1634 message or the responder's SRTP media (with a valid SRTP auth tag) 1635 before updating the initiator's cache. 1637 The cache update may also be affected by a cache mismatch, according 1638 to Section 4.6.1.1. 1640 For DH mode only, before updating the retained shared secret rs1 in 1641 the cache, each party first discards their old rs2 and copies their 1642 old rs1 to rs2. The old rs1 is saved to rs2 because of the risk of 1643 session interruption after one party has updated his own rs1 but 1644 before the other party has enough information to update her own rs1. 1645 If that happens, they may regain cache sync in the next session by 1646 using rs2 (per Section 4.3). This mitigates the well-known Two 1647 Generals' Problem [Byzantine]. The old rs1 value is not saved in 1648 Preshared mode. 1650 For DH mode and Preshared mode, both parties compute a new rs1 value 1651 from s0 via the ZRTP key derivation function (Section 4.5.1): 1653 rs1 = KDF(s0, "retained secret", KDF_Context, 256) 1655 Note that KDF_Context is unique for each media stream, but only the 1656 first media stream is permitted to update rs1. 1658 Each media stream has its own s0. At this point in the protocol for 1659 each media stream, the corresponding s0 MUST be erased. 
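The cache rotation described above can be sketched in Python as follows. This is a minimal illustration, assuming HMAC-SHA-256 as the negotiated hash (so the 256-bit rs1 needs no truncation) and assuming the caller has already checked conditions 1) through 4) and the cache mismatch rule of Section 4.6.1.1. The names CacheEntry, derive_rs1, and update_cache are hypothetical.

```python
import hashlib
import hmac
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    """Hypothetical per-peer cache record, indexed elsewhere by the peer's ZID."""
    rs1: Optional[bytes] = None
    rs2: Optional[bytes] = None


def derive_rs1(s0: bytearray, kdf_context: bytes) -> bytes:
    """rs1 = KDF(s0, "retained secret", KDF_Context, 256), assuming HMAC-SHA-256."""
    message = (struct.pack(">I", 1) + b"retained secret" + b"\x00"
               + kdf_context + struct.pack(">I", 256))
    return hmac.new(s0, message, hashlib.sha256).digest()


def update_cache(entry: CacheEntry, s0: bytearray, kdf_context: bytes,
                 dh_mode: bool) -> None:
    """Rotate rs1/rs2 for the first media stream, then erase s0."""
    new_rs1 = derive_rs1(s0, kdf_context)
    if dh_mode:
        # DH mode only: discard the old rs2 and keep the old rs1 as rs2.
        entry.rs2 = entry.rs1
    entry.rs1 = new_rs1
    # Best-effort erasure of s0; Python cannot guarantee that no copies remain.
    s0[:] = bytes(len(s0))
```

Passing s0 as a bytearray lets the sketch overwrite it in place. An implementation in a language with explicit memory control would be better placed to guarantee the erasure.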
1661 4.6.1.1. Cache Update Following a Cache Mismatch 1663 If a shared secret cache mismatch (as defined in Section 4.3.2) is 1664 detected in the current session, it indicates a possible MiTM attack. 1665 However, there may be evidence to the contrary, if either one of the 1666 following conditions is met: 1668 o Successful use of the mechanism described in Section 8.1.1, but 1669 only if fully supported by end-to-end integrity-protected delivery 1670 of the a=zrtp-hash in the signaling via SIP Identity [RFC4474] or 1671 better still, Dan Wing's SIP Identity using Media Path 1672 [I-D.wing-sip-identity-media]. This allows authentication of the 1673 DH exchange without human assistance. 1674 o A good signature is received and verified using the digital 1675 signature feature on the SAS hash, as described in Section 7.2, if 1676 this feature is supported. 1678 If there is a cache mismatch in the absence of the aforementioned 1679 mitigating evidence, the cache update MUST be delayed in the current 1680 session until the user verbally compares the SAS with his partner 1681 during the call and confirms a successful SAS verification via his user 1682 interface as described in Section 7.1. If the session ends before 1683 that happens, the cache update is not performed, leaving the rs1/rs2 1684 values unmodified in the cache. Regardless of whether a cache 1685 mismatch occurs, s0 must still be erased. 1687 If no cache entry exists, as is the case in the initial call, the 1688 cache update is handled in the normal fashion. 1690 4.7. Termination 1692 A ZRTP session is normally terminated at the end of a call, but it 1693 may be terminated early by either the Error message or the GoClear 1694 message. 1696 4.7.1. Termination via Error message 1698 The Error message (Section 5.9) is used to terminate an in-progress 1699 ZRTP exchange due to an error. The Error message contains an integer 1700 Error Code for debugging purposes.
The termination of a ZRTP key 1701 agreement exchange results in no updates to the cached shared secrets 1702 and deletion of all crypto context for that media stream. The ZRTP 1703 Session key, ZRTPSess, is only deleted if all ZRTP media streams 1704 which are using it are terminated. 1706 Because no key agreement has been reached, the Error message cannot 1707 use the same MAC protection as the GoClear message. A denial of 1708 service is possible by injecting fake Error messages. (However, even 1709 if the Error message were somehow designed with integrity protection, 1710 it would raise other questions. What would a badly formed Error 1711 message mean if it were sent to report a badly formed message? A 1712 good message?) 1714 4.7.2. Termination via GoClear message 1716 The GoClear message (Section 5.11) is used to switch from SRTP to 1717 RTP, usually because the user has chosen to do that by pressing a 1718 button. The GoClear uses a MAC of the Message Type Block sent in the 1719 GoClear Message computed with the mackey derived from the shared 1720 secret. This MAC is truncated to the leftmost 64 bits. When sent by 1721 the initiator: 1723 clear_mac = MAC(mackeyi, "GoClear ") 1725 When sent by the responder: 1727 clear_mac = MAC(mackeyr, "GoClear ") 1729 Both of these MACs are calculated across the 8-octet "GoClear " 1730 Message Type Block, including the trailing space. 1732 A GoClear message which does not receive a ClearACK response must be 1733 resent. If a GoClear message is received with a bad MAC, ClearACK 1734 MUST NOT be sent and the GoClear MUST NOT be acted on by the 1735 recipient, but MAY be processed as a security exception, perhaps by 1736 logging or alerting the user. 1738 A ZRTP endpoint MAY choose to accept GoClear messages after the 1739 session has switched to SRTP, allowing the session to revert to RTP. 1740 This is indicated in the Confirm1 or Confirm2 messages (Figure 10) by 1741 setting the Allow Clear flag (A). 
If an endpoint sets the Allow 1742 Clear (A) flag in their Confirm message, it indicates that they 1743 support receiving GoClear messages. 1745 A ZRTP endpoint that receives a GoClear MUST authenticate the message 1746 by checking the clear_mac. If the message authenticates, the 1747 endpoint stops sending SRTP packets, and generates a ClearACK in 1748 response. It MUST also delete all the crypto key material for all 1749 the SRTP media streams, as defined in Section 4.7.2.1. 1751 Until confirmation from the user is received (e.g. clicking a button, 1752 pressing a DTMF key, etc.), the ZRTP endpoint MUST NOT resume sending 1753 RTP packets. The endpoint then renders to the user an indication 1754 that the media session has switched to clear mode, and waits for 1755 confirmation from the user. This blocks the flow of sensitive 1756 discourse until the user is forced to take notice that he's no longer 1757 protected by encryption. To prevent pinholes from closing or NAT 1758 bindings from expiring, the ClearACK message MAY be resent at regular 1759 intervals (e.g. every 5 seconds) while waiting for confirmation from 1760 the user. After confirmation of the notification is received from 1761 the user, the sending of RTP packets may begin. 1763 After sending a GoClear message, the ZRTP endpoint stops sending SRTP 1764 packets. When a ClearACK is received, the ZRTP endpoint deletes the 1765 crypto context for the SRTP session, as defined in Section 4.7.2.1, 1766 and may then resume sending RTP packets. 1768 In the event a ClearACK is not received before the retransmissions of 1769 GoClear are exhausted, the key material is deleted, as defined in 1770 Section 4.7.2.1. 1772 After the users have transitioned from SRTP media back to RTP media 1773 (clear mode), they may decide later to return to secure mode by 1774 manual activation, usually by pressing a GO SECURE button. 
In that 1775 case, a new secure session is initiated by the party that presses the 1776 button, by sending a new Commit message, leading to a new session key 1777 negotiation. It is not necessary to send another Hello message, as 1778 the two parties have already done that at the start of the call and 1779 thus have already discovered each other's ZRTP capabilities. It is 1780 possible for users to toggle back and forth between clear and secure 1781 modes multiple times in the same session, just as they could in the 1782 old days of secure PSTN phones. 1784 4.7.2.1. Key Destruction for GoClear message 1786 All SRTP session key material MUST be erased by the receiver of the 1787 GoClear message upon receiving a properly authenticated GoClear. The 1788 same key destruction MUST be done by the sender of GoClear message, 1789 upon receiving the ClearACK. This must be done for the key material 1790 for all of the media streams. 1792 All key material that would have been erased at the end of the SIP 1793 session MUST be erased, as described in Section 4.7.3, with the 1794 single exception of ZRTPSess. In this case, ZRTPSess is destroyed in 1795 a manner different from the other key material. Both parties replace 1796 ZRTPSess with a KDF-derived non-invertible function of itself: 1798 ZRTPSess = KDF(ZRTPSess, "New ZRTP Session", (ZIDi || ZIDr), 1799 negotiated hash length) 1801 ZRTPSess will be replaced twice if a session generates separate 1802 GoClear messages for both audio and video streams, and the two 1803 endpoints need not carry out the replacements in the same order. 
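The ZRTPSess replacement above can be sketched in Python as shown below. The sketch assumes SHA-256 as the negotiated hash, so the full 256-bit HMAC output is used, and it uses placeholder ZIDs and key material. The function name replace_zrtpsess is hypothetical.

```python
import hashlib
import hmac
import struct


def replace_zrtpsess(zrtpsess: bytes, zid_i: bytes, zid_r: bytes,
                     hash_name: str = "sha256") -> bytes:
    """ZRTPSess = KDF(ZRTPSess, "New ZRTP Session", ZIDi || ZIDr, hash length)."""
    out_bits = hashlib.new(hash_name).digest_size * 8
    message = (struct.pack(">I", 1) + b"New ZRTP Session" + b"\x00"
               + zid_i + zid_r + struct.pack(">I", out_bits))
    return hmac.new(zrtpsess, message, hash_name).digest()


zid_i, zid_r = b"\x01" * 12, b"\x02" * 12
old_sess = bytes(range(32))            # placeholder, not real key material

# Two GoClear events (for example, audio and video) cause two replacements.
endpoint_a = replace_zrtpsess(replace_zrtpsess(old_sess, zid_i, zid_r), zid_i, zid_r)
endpoint_b = replace_zrtpsess(replace_zrtpsess(old_sess, zid_i, zid_r), zid_i, zid_r)
```

The replacement takes no per-stream input: its only inputs are the current ZRTPSess and the two ZIDs. Both endpoints therefore reach the same value after the same number of replacements, whichever stream's GoClear each one processes first.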
1805 The destruction of key material meets the requirements of Perfect 1806 Forward Secrecy (PFS), but still preserves a new version of ZRTPSess, 1807 so that the user can later re-initiate secure mode during the same 1808 session without performing another Diffie-Hellman calculation, by using 1809 Multistream mode, which requires and assumes the existence of ZRTPSess 1810 with the same value at both ZRTP endpoints. A new key negotiation 1811 after a GoClear SHOULD use a Multistream Commit message. 1813 Note: Multistream mode is preferred over a Diffie-Hellman mode 1814 since this does not require the generation of a new hash chain and 1815 a new signaling exchange to exchange new Hello Hash values. 1817 Later, at the end of the entire call, ZRTPSess is finally destroyed 1818 along with the other key material, as described in Section 4.7.3. 1820 4.7.3. Key Destruction at Termination 1822 All SRTP session key material MUST be erased by both parties at the 1823 end of the call. In particular, the destroyed key material includes 1824 the SRTP session keys and salts, SRTP master keys and salts, and all 1825 material sufficient to reconstruct the SRTP keys and salts, including 1826 ZRTPSess and s0 (although s0 should have been destroyed earlier, in 1827 Section 4.6.1). This must be done for the key material for all of 1828 the media streams. The only exceptions are the cached shared secrets 1829 needed for future sessions, including rs1, rs2, and pbxsecret. 1831 4.8. Random Number Generation 1833 The ZRTP protocol uses random numbers for cryptographic key material, 1834 notably for the DH secret exponents and nonces, which must be freshly 1835 generated with each session. Whenever a random number is needed, all 1836 of the following criteria must be satisfied: 1838 A random number MUST be freshly generated, meaning that it must not 1839 have been used in a previous calculation.
1841 When generating a random number k of L bits in length, k MUST be 1842 chosen with equal probability from the range of [1 < k < 2^L]. 1844 It MUST be derived from a physical entropy source, such as RF noise, 1845 acoustic noise, thermal noise, high resolution timings of 1846 environmental events, or other unpredictable physical sources of 1847 entropy. One possible source of entropy for a VoIP client would be 1848 microphone noise. For a detailed explanation of cryptographic grade 1849 random numbers and guidance for collecting suitable entropy, see 1850 [RFC4086] and Chapter 10 of Practical Cryptography [Ferguson]. The 1851 raw entropy must be distilled and processed through a deterministic 1852 random bit generator (DRBG). Examples of DRBGs may be found in NIST 1853 SP 800-90 [SP800-90], in [Ferguson], and in [RFC5869]. Failure to 1854 use true entropy from the physical environment as a basis for 1855 generating random cryptographic key material would lead to a 1856 disastrous loss of security. 1858 4.9. ZID and Cache Operation 1860 Each instance of ZRTP has a unique 96-bit random ZRTP ID or ZID that 1861 is generated once at installation time. It is used to look up 1862 retained shared secrets in a local cache. A single global ZID for a 1863 single installation is the simplest way to implement ZIDs. However, 1864 it is specifically not precluded for an implementation to use 1865 multiple ZIDs, up to the limit of a separate one per callee. This 1866 then turns it into a long-lived "association ID" that does not apply 1867 to any other associations between a different pair of parties. It is 1868 a goal of this protocol to permit both options to interoperate 1869 freely. A PBX acting as a trusted man in the middle will also 1870 generate a single ZID and use that ZID for all endpoints behind it, 1871 as described in Section 10. 1873 There is no protocol mechanism to invalidate a previously used ZID. 
1874 An endpoint wishing to change ZID would simply generate a new one and 1875 begin using it. 1877 The ZID should not be hard coded or hard-defined in the firmware of a 1878 product. It should be randomly generated by the software and stored 1879 at installation or initialization time. It should be randomly 1880 generated rather than allocated from a preassigned range of ZID 1881 values, because 96 bits should be enough to avoid birthday collisions 1882 in realistic scenarios. 1884 Each time a new s0 is calculated, a new retained shared secret rs1 is 1885 generated and stored in the cache, indexed by the ZID of the other 1886 endpoint. This cache updating is described in Section 4.6.1. For 1887 the new retained shared secret, each endpoint chooses a cache 1888 expiration value which is an unsigned 32 bit integer of the number of 1889 seconds that this secret should be retained in the cache. The time 1890 interval is relative to when the Confirm1 message is sent or 1891 received. 1893 The cache intervals are exchanged in the Confirm1 and Confirm2 1894 messages (Figure 10). The actual cache interval used by both 1895 endpoints is the minimum of the values from the Confirm1 and Confirm2 1896 messages. A value of 0 seconds means the newly-computed shared 1897 secret SHOULD NOT be stored in the cache, and if a cache entry 1898 already exists from an earlier call, the stored cache interval should 1899 be set to 0. This means if either Confirm message contains a null 1900 cache expiration interval, and there is no cache entry already 1901 defined, no new cache entry is created. A value of 0xffffffff means 1902 the secret should be cached indefinitely and is the recommended 1903 value. If the ZRTP exchange is Multistream Mode, the field in the 1904 Confirm1 and Confirm2 is set to 0xffffffff and ignored, and the cache 1905 is not updated. 
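
The interval rules above can be sketched as follows.  This is an
illustrative, non-normative example; the names CACHE_FOREVER,
agreed_cache_interval, and cache_action are hypothetical, and the
returned action strings are invented for the example only.

```python
CACHE_FOREVER = 0xFFFFFFFF  # "cache indefinitely", the recommended value


def agreed_cache_interval(confirm1_interval: int, confirm2_interval: int) -> int:
    """Both endpoints use the minimum of the two advertised intervals."""
    for value in (confirm1_interval, confirm2_interval):
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError("interval must fit in an unsigned 32-bit integer")
    return min(confirm1_interval, confirm2_interval)


def cache_action(confirm1_interval: int, confirm2_interval: int,
                 multistream: bool = False):
    """Return (action, seconds); seconds is None when there is no expiry."""
    if multistream:
        # In Multistream mode the field is ignored and the cache is untouched.
        return ("skip", None)
    interval = agreed_cache_interval(confirm1_interval, confirm2_interval)
    if interval == 0:
        # Do not store the new secret; an existing entry gets interval 0.
        return ("do_not_store", 0)
    if interval == CACHE_FOREVER:
        return ("store", None)
    return ("store", interval)
```

For instance, if one side advertises 3600 seconds and the other
advertises 0xffffffff, both sides retain the secret for 3600 seconds,
measured from when the Confirm1 message is sent or received.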
1907 The expiration interval need not be used to force the deletion of a 1908 shared secret from the cache when the interval has expired. It just 1909 means the shared secret MAY be deleted from that cache at any point 1910 after the interval has expired without causing the other party to 1911 note it as an unexpected security event when the next key negotiation 1912 occurs between the same two parties. This means there need not be 1913 perfectly synchronized deletion of expired secrets from the two 1914 caches, and makes it easy to avoid a race condition that might 1915 otherwise be caused by clock skew. 1917 If the expiration interval is not properly agreed to by both 1918 endpoints, it may later result in false alarms of MiTM attacks, due 1919 to apparent cache mismatches (Section 4.3.2). 1921 The relationship between a ZID and a SIP AOR is explained in 1922 Section 12. 1924 4.9.1. Cacheless implementations 1926 It is possible to implement a simplified but nonetheless useful (and 1927 still compliant) profile of the ZRTP protocol that does not support 1928 any caching of shared secrets. In this case the users would have to 1929 rely exclusively on the verbal SAS comparison for every call. That 1930 is, unless MiTM protection is provided by the mechanisms in 1931 Section 8.1.1 or Section 7.2, which introduce their own forms of 1932 complexity. 1934 If a ZRTP endpoint does not support caching of shared secrets, it 1935 MUST set the cache expiration interval to zero, and MUST set the SAS 1936 Verified (V) flag (Section 7.1) to false. In addition, because the 1937 ZID serves mainly as a cache index, the ZID would not be required to 1938 maintain the same value across separate SIP sessions, although there 1939 is no reason why it should not. 1941 Cacheless operation would sacrifice the key continuity (Section 15.1) 1942 features, as well as Preshared mode (Section 4.4.2). 
Further, if the pbxsecret is also not cached, there would be no PBX
trusted MiTM (Section 7.3) features, including the PBX security
enrollment (Section 7.3.1) mechanism.

5.  ZRTP Messages

All ZRTP messages use the message format defined in Figure 2.  All
word lengths referenced in this specification are 32 bits or 4
octets.  All integer fields are carried in network byte order, that
is, most significant byte (octet) first, commonly known as big-endian.

    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 0 0 1|Not Used (set to zero) |        Sequence Number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               Magic Cookie 'ZRTP' (0x5a525450)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Identifier                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |         ZRTP Message (length depends on Message Type)         |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         CRC (1 word)                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 2: ZRTP Packet Format

The Sequence Number is a count that is incremented for each ZRTP
packet sent.  The count is initialized to a random value.  This is
useful in estimating ZRTP packet loss and also detecting when ZRTP
packets arrive out of sequence.

The ZRTP Magic Cookie is a 32 bit string that uniquely identifies a
ZRTP packet, and has the value 0x5a525450.

Source Identifier is the SSRC number of the RTP stream that this ZRTP
packet relates to.  For cases of forking or forwarding, RTP and hence
ZRTP may arrive at the same port from several different sources -
each of these sources will have a different SSRC and may initiate an
independent ZRTP protocol session.
SSRC collisions would be disruptive to ZRTP.  SSRC collision handling
procedures are described in Section 4.1.

This format is clearly identifiable as non-RTP because the first two
bits are zero, which looks like RTP version 0, which is not a valid
RTP version number.  It is clearly distinguishable from STUN since
the magic cookies are different.  The 12 unused bits are set to zero
and MUST be ignored when received.

The ZRTP Messages are defined in Figure 3 to Figure 17 and are of
variable length.

The ZRTP protocol uses a 32 bit CRC as defined in RFC 4960, Appendix
B [RFC4960] in each ZRTP packet to detect transmission errors.  ZRTP
packets are typically transported by UDP, which carries its own
built-in 16-bit checksum for integrity, but ZRTP does not rely on it.
This is because of the effect of an undetected transmission error in
a ZRTP message.  For example, an undetected error in the DH exchange
could appear to be an active man-in-the-middle attack.  A false
announcement of this by ZRTP clients can be psychologically
distressing.  The probability of such a false alarm hinges on a mere
16-bit checksum that usually protects UDP packets, so more error
detection is needed.  For these reasons, this belt-and-suspenders
approach is used to minimize the chance of a transmission error
affecting the ZRTP key agreement.

The CRC is calculated across the entire ZRTP packet shown in
Figure 2, including the ZRTP Header and the ZRTP Message, but not
including the CRC field.  If a ZRTP message fails the CRC check, it
is silently discarded.

5.1.  ZRTP Message Formats

ZRTP messages are designed to simplify endpoint parsing requirements
and to reduce the opportunities for buffer overflow attacks (a good
goal of any security extension should be to not introduce new attack
vectors).
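
As an illustration of how little parsing the packet format of
Figure 2 requires, the following non-normative sketch separates a
received datagram into its header fields.  The function name
parse_zrtp_header and the dictionary keys are hypothetical.  The
sketch only locates the CRC field and does not verify it.

```python
import struct

ZRTP_MAGIC_COOKIE = 0x5A525450  # ASCII "ZRTP"
HEADER_LEN = 12                 # three 32-bit words before the ZRTP message
CRC_LEN = 4                     # one trailing 32-bit word


def parse_zrtp_header(packet: bytes):
    """Return the header fields of a ZRTP packet, or None if it is not ZRTP."""
    if len(packet) < HEADER_LEN + CRC_LEN:
        return None
    first16, seq, cookie, ssrc = struct.unpack_from(">HHII", packet, 0)
    # Only the leading 0001 bits are checked; the 12 unused bits that
    # follow are ignored on receipt.
    if first16 >> 12 != 0b0001:
        return None
    if cookie != ZRTP_MAGIC_COOKIE:
        return None
    return {
        "sequence_number": seq,
        "ssrc": ssrc,
        "message": packet[HEADER_LEN:-CRC_LEN],
        "crc_field": packet[-CRC_LEN:],  # located only, not verified here
    }
```

An RTP packet begins with version bits "10", so it fails the first
check and is never mistaken for ZRTP by this function.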
ZRTP uses a block of 8 octets (2 words) to encode the Message Type.
Blocks of 4 octets (1 word) are used to encode the Hash Type, Cipher
Type, Key Agreement Type, and Authentication Tag Type.  The values in
the blocks are ASCII strings which are extended with spaces (0x20) to
make them the desired length.  Currently defined block values are
listed in Tables 1-6 below.

Additional block values may be defined and used.

ZRTP uses this ASCII encoding to simplify debugging and make it
"Wireshark (Ethereal) friendly".

5.1.1.  Message Type Block

Currently 16 Message Type Blocks are defined - they represent the set
of ZRTP message primitives.  ZRTP endpoints MUST support the Hello,
HelloACK, Commit, DHPart1, DHPart2, Confirm1, Confirm2, Conf2ACK,
SASrelay, RelayACK, Error, ErrorACK, and PingACK message types.  ZRTP
endpoints MAY support the GoClear, ClearACK, and Ping messages.  In
order to generate a PingACK message, it is necessary to parse a Ping
message.  Additional messages may be defined in extensions to ZRTP.
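
The space-padded ASCII block encoding described in Section 5.1 can be
sketched as follows.  This example is illustrative only; the helper
names encode_block and decode_block are hypothetical.

```python
def encode_block(name: str, size: int) -> bytes:
    """Pad an ASCII block value with spaces (0x20) to size octets."""
    raw = name.encode("ascii")
    if len(raw) > size:
        raise ValueError("block value longer than the block")
    return raw.ljust(size, b" ")


def decode_block(block: bytes) -> str:
    """Strip the trailing space padding from a received block."""
    return block.decode("ascii").rstrip(" ")
```

With this sketch, encode_block("Hello", 8) yields the 8-octet Message
Type Block for the Hello message, and encode_block("B32", 4) yields a
4-octet block with one trailing space.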
   Message Type Block    |  Meaning
   ---------------------------------------------------
   "Hello   "            |  Hello Message
   ---------------------------------------------------
   "HelloACK"            |  HelloACK Message
   ---------------------------------------------------
   "Commit  "            |  Commit Message
   ---------------------------------------------------
   "DHPart1 "            |  DHPart1 Message
   ---------------------------------------------------
   "DHPart2 "            |  DHPart2 Message
   ---------------------------------------------------
   "Confirm1"            |  Confirm1 Message
   ---------------------------------------------------
   "Confirm2"            |  Confirm2 Message
   ---------------------------------------------------
   "Conf2ACK"            |  Conf2ACK Message
   ---------------------------------------------------
   "Error   "            |  Error Message
   ---------------------------------------------------
   "ErrorACK"            |  ErrorACK Message
   ---------------------------------------------------
   "GoClear "            |  GoClear Message
   ---------------------------------------------------
   "ClearACK"            |  ClearACK Message
   ---------------------------------------------------
   "SASrelay"            |  SASrelay Message
   ---------------------------------------------------
   "RelayACK"            |  RelayACK Message
   ---------------------------------------------------
   "Ping    "            |  Ping Message
   ---------------------------------------------------
   "PingACK "            |  PingACK Message
   ---------------------------------------------------

            Table 1.  Message Type Block Values

5.1.2.  Hash Type Block

The hash algorithm and its related MAC algorithm are negotiated via
the Hash Type Block found in the Hello message (Section 5.2) and the
Commit message (Section 5.4).

All ZRTP endpoints MUST support a Hash Type of SHA-256 [FIPS-180-3].
SHA-384 SHOULD be supported, and MUST be supported if ECDH-384 is
used.
Additional Hash Types MAY be used, such as the NIST SHA-3 hash
[SHA-3] when it becomes available.  Note that the Hash Type refers to
the hash algorithm that will be used throughout the ZRTP key
exchange, not the hash algorithm to be used in the SRTP
Authentication Tag.

The choice of the negotiated Hash Type is coupled to the Key
Agreement type, as explained in Section 5.1.5.

   Hash Type Block |  Meaning
   ----------------------------------------------------------
   "S256"          |  SHA-256 Hash defined in FIPS 180-3
   ----------------------------------------------------------
   "S384"          |  SHA-384 Hash defined in FIPS 180-3
   ----------------------------------------------------------
   "N256"          |  NIST SHA-3 256-bit hash (when published)
   ----------------------------------------------------------
   "N384"          |  NIST SHA-3 384-bit hash (when published)
   ----------------------------------------------------------

              Table 2.  Hash Type Block Values

At the time of this writing, the NIST SHA-3 hashes [SHA-3] are not
yet available.  NIST is expected to publish SHA-3 in 2012, as a
successor to the SHA-2 hashes in [FIPS-180-3].

5.1.2.1.  Negotiated Hash and MAC algorithm

ZRTP makes use of message authentication codes (MACs) which are keyed
hashes based on the negotiated Hash Type.  For the SHA-2 and SHA-3
hashes, the negotiated MAC is the HMAC based on the negotiated hash.
This MAC function is also used in the ZRTP key derivation function
(Section 4.5.1).

The HMAC function is defined in [FIPS-198-1].  A discussion of the
general security of the HMAC construction may be found in [RFC2104].
Test vectors for HMAC-SHA-256 and HMAC-SHA-384 may be found in
[RFC4231].

The negotiated Hash Type does not apply to the hash used in the
digital signature defined in Section 7.2.
For example, even if the 2132 negotiated Hash Type is SHA-256, the digital signature may use SHA- 2133 384 if an ECDSA P-384 signature key is used. Digital signatures are 2134 optional in ZRTP. 2136 Except for the aforementioned digital signatures, and the special 2137 cases noted in Section 5.1.2.2, all the other hashes and MACs used 2138 throughout the ZRTP protocol will use the negotiated Hash Type. 2140 A future hash may include its own built-in MAC, not based on the HMAC 2141 construct, for example, the Skein hash function [Skein]. If NIST 2142 chooses such a hash as the SHA-3 winner, Hash Types "N256" and "N384" 2143 will still use the related HMAC as the negotiated MAC. If an 2144 implementor wishes to use Skein and its built-in MAC as the 2145 negotiated MAC, new Hash Types must be used. 2147 5.1.2.2. Implicit Hash and MAC algorithm 2149 While most of the hash and MAC usage in ZRTP is defined by the 2150 negotiated Hash Type (Section 5.1.2), some hashes and MACs must be 2151 precomputed prior to negotiations, and thus cannot have their 2152 algorithms negotiated during the ZRTP exchange. They are implicitly 2153 predetermined to use SHA-256 [FIPS-180-3] and HMAC-SHA-256. 2155 These are the hashes and MACs that MUST use the Implicit hash and MAC 2156 algorithm: 2158 The hash chain H0-H3 defined in Section 9. 2159 The MACs that are keyed by this hash chain, as defined in 2160 Section 8.1.1. 2161 The Hello Hash in the a=zrtp-hash attribute defined in 2162 Section 8.1. 2164 ZRTP defines a method for negotiating different ZRTP protocol 2165 versions (Section 4.1.1). SHA-256 is the Implicit Hash and HMAC-SHA- 2166 256 is the Implicit MAC for ZRTP protocol version 1.10. Future ZRTP 2167 protocol versions may, if appropriate, use another hash algorithm as 2168 the Implicit Hash, such as the NIST SHA-3 hash [SHA-3] when it 2169 becomes available. 
For example, a future SIP packet may list two a=zrtp-hash SDP
attributes, one based on SHA-256 for ZRTP version 1.10, and another
based on SHA-3 for ZRTP version 2.00.

5.1.3.  Cipher Type Block

All ZRTP endpoints MUST support AES-128 (AES1) and MAY support
AES-192 (AES2), AES-256 (AES3), or other Cipher Types.  The Advanced
Encryption Standard is defined in [FIPS-197].

The use of AES-128 in SRTP is defined by [RFC3711].  The use of
AES-192 and AES-256 in SRTP is defined by [I-D.ietf-avt-srtp-big-aes].
The choice of the AES key length is coupled to the Key Agreement
type, as explained in Section 5.1.5.

Other block ciphers may be supported that have the same block size
and key sizes as AES.  If implemented, they may be used anywhere in
ZRTP or SRTP in place of the AES, in the same modes of operation and
key size.  Notably, they may be used in counter mode to replace
AES-CM in [RFC3711] and [I-D.ietf-avt-srtp-big-aes], as well as in
CFB mode to encrypt a portion of the Confirm message (Figure 10).
ZRTP endpoints MAY support the TwoFish [TwoFish] block cipher.

   Cipher Type Block |  Meaning
   -------------------------------------------------
   "AES1"            |  AES with 128 bit keys
   -------------------------------------------------
   "AES2"            |  AES with 192 bit keys
   -------------------------------------------------
   "AES3"            |  AES with 256 bit keys
   -------------------------------------------------
   "2FS1"            |  TwoFish with 128 bit keys
   -------------------------------------------------
   "2FS2"            |  TwoFish with 192 bit keys
   -------------------------------------------------
   "2FS3"            |  TwoFish with 256 bit keys
   -------------------------------------------------

            Table 3.  Cipher Type Block Values

5.1.4.  Auth Tag Type Block

All ZRTP endpoints MUST support HMAC-SHA1 authentication tags for
SRTP, with both 32 bit and 80 bit length tags as defined in
[RFC3711].

ZRTP endpoints MAY support 32 bit and 64 bit SRTP authentication tags
based on the Skein hash function [Skein].  The Skein-512-MAC key
length is fixed at 256 bits for this application, and the output
length is adjustable.  The Skein MAC is defined in sections 2.6 and
4.3 of [Skein], and is not based on the HMAC construct.  Reference
implementations for Skein may be found at [Skein1].  A Skein-based
MAC is significantly more efficient than HMAC-SHA1, especially for
short SRTP payloads.

The Skein MAC key is computed by the SRTP key derivation function,
which is also referred to as the AES-CM PRF, or pseudorandom
function.  This is defined either in [RFC3711] or in
[I-D.ietf-avt-srtp-big-aes], depending on the selected SRTP AES key
length.  To compute a Skein MAC key, the SRTP PRF output for the
authentication key is left untruncated at 256 bits, instead of the
usual truncated length of 160 bits (the key length used by
HMAC-SHA1).

   Auth Tag Type Block |  Meaning
   ----------------------------------------------------------
   "HS32"              |  32 bit authentication tag based on
                       |  HMAC-SHA1 as defined in RFC 3711
   ----------------------------------------------------------
   "HS80"              |  80 bit authentication tag based on
                       |  HMAC-SHA1 as defined in RFC 3711
   ----------------------------------------------------------
   "SK32"              |  32 bit authentication tag based on
                       |  Skein-512-MAC as defined in [Skein],
                       |  with 256 bit key, 32 bit MAC length.
   ----------------------------------------------------------
   "SK64"              |  64 bit authentication tag based on
                       |  Skein-512-MAC as defined in [Skein],
                       |  with 256 bit key, 64 bit MAC length.
   ----------------------------------------------------------

                Table 4.  Auth Tag Type Values

5.1.5.  Key Agreement Type Block

All ZRTP endpoints MUST support DH3k, SHOULD support Preshared, and
MAY support EC25, EC38, and DH2k.

If a ZRTP endpoint supports multiple concurrent media streams, such
as audio and video, it MUST support Multistream (Section 4.4.3) mode.
Also, if a ZRTP endpoint supports the GoClear message
(Section 4.7.2), it SHOULD support Multistream, to be used if the two
parties choose to return to the secure state after going Clear (as
explained in Section 4.7.2.1).

For Finite Field Diffie-Hellman, ZRTP endpoints MUST use the DH
parameters defined in [RFC3526], as follows.  DH3k uses the 3072-bit
MODP group.  DH2k uses the 2048-bit MODP group.  The DH generator g
is 2.  The random Diffie-Hellman secret exponent SHOULD be twice as
long as the AES key length.  If AES-128 is used, the DH secret value
SHOULD be 256 bits long.  If AES-256 is used, the secret value SHOULD
be 512 bits long.

If Elliptic Curve DH is used, the ECDH algorithm and key generation
are from NIST SP 800-56A [SP800-56A].  The curves used are from NSA
Suite B [NSA-Suite-B], which uses the same curves as ECDSA defined by
FIPS 186-3 [FIPS-186-3], and can also be found in RFC 5114 [RFC5114],
sections 2.6 through 2.8.  ECDH test vectors may be found in RFC 5114
[RFC5114], sections A.6 through A.8.  The validation procedures are
from NIST SP 800-56A [SP800-56A] section 5.6.2.6, method 3, ECC
Partial Validation.  Both the X and Y coordinates of the point on the
curve are sent, in the first and second half of the ECDH public
value, respectively.  The ECDH result returns only the X coordinate,
as specified in SP 800-56A.  Useful strategies for implementing ECC
may be found in [I-D.mcgrew-fundamental-ecc].
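
The sizing rule for the finite field DH secret exponent can be
sketched as follows.  This example is not normative.  It assumes that
the operating system generator exposed by Python's secrets module is
an acceptable DRBG seeded from physical entropy, which Section 4.8
requires but which this sketch cannot itself guarantee.  The function
names are hypothetical.

```python
import secrets


def dh_secret_bits(aes_key_bits: int) -> int:
    """The DH secret exponent SHOULD be twice as long as the AES key."""
    if aes_key_bits not in (128, 192, 256):
        raise ValueError("unsupported AES key length")
    return 2 * aes_key_bits


def generate_dh_secret(aes_key_bits: int) -> int:
    """Draw a fresh secret exponent k of L bits with 1 < k < 2^L."""
    bits = dh_secret_bits(aes_key_bits)
    while True:
        k = secrets.randbits(bits)  # uniform over 0 .. 2^L - 1
        if k > 1:                   # reject 0 and 1 to satisfy 1 < k
            return k
```

For AES-128 this draws a 256-bit exponent, and for AES-256 a 512-bit
exponent, matching the lengths given above.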
The choice of the negotiated hash algorithm (Section 5.1.2) is
coupled to the choice of key agreement type.  If ECDH-384 (EC38) is
chosen as the key agreement, the negotiated hash algorithm MUST be
either SHA-384, or the corresponding SHA-3 successor.

The choice of AES key length is coupled to the choice of key
agreement type.  If EC38 is chosen as the key agreement, AES-256
(AES3) SHOULD be used but AES-192 MAY be used.  If DH3k or EC25 is
chosen, any AES key size MAY be used.  Note that SRTP as defined in
[RFC3711] only supports AES-128.

DH2k is intended for low power applications, or for applications that
require fast key negotiations, and SHOULD use AES-128.  DH2k is not
recommended for high security applications.  Its security can be
augmented by implementing ZRTP's key continuity features
(Section 15.1).

ECDH-521 SHOULD NOT be used, due to disruptive computational delays.
These delays may lead to exhaustion of the retransmission schedule,
unless both endpoints have very fast hardware.  Note that ECDH-521 is
not part of NSA Suite B.

ZRTP also defines two non-DH modes, Multistream and Preshared, in
which the SRTP key is derived from a shared secret and some nonce
material.

The table below lists the pv length in words and DHPart1 and DHPart2
message length in words for each Key Agreement Type Block.

   Key Agreement |  pv   | message |  Meaning
   Type Block    | words |  words  |
   -----------------------------------------------------------
   "DH3k"        |  96   |  117    |  DH mode with p=3072 bit prime
                 |       |         |  per RFC 3526, section 4.
   -----------------------------------------------------------
   "DH2k"        |  64   |   85    |  DH mode with p=2048 bit prime
                 |       |         |  per RFC 3526, section 3.
   -----------------------------------------------------------
   "EC25"        |  16   |   37    |  Elliptic Curve DH, P-256
                 |       |         |  per RFC 5114, section 2.6
   -----------------------------------------------------------
   "EC38"        |  24   |   45    |  Elliptic Curve DH, P-384
                 |       |         |  per RFC 5114, section 2.7
   -----------------------------------------------------------
   "EC52"        |  33   |   54    |  Elliptic Curve DH, P-521
                 |       |         |  per RFC 5114, section 2.8
                 |       |         |  (deprecated - do not use)
   -----------------------------------------------------------
   "Prsh"        |   -   |    -    |  Preshared Non-DH mode
                 |       |         |
   -----------------------------------------------------------
   "Mult"        |   -   |    -    |  Multistream Non-DH mode
                 |       |         |
   -----------------------------------------------------------

           Table 5.  Key Agreement Type Block Values

5.1.6.  SAS Type Block

The SAS Type determines how the SAS is rendered to the user so that
the user may verbally compare it with his partner over the voice
channel.  This allows detection of a man-in-the-middle (MiTM) attack.

All ZRTP endpoints MUST support the base32 and MAY support the
base256 rendering schemes for the Short Authentication String, and
other SAS rendering schemes.  See Section 4.5.2 for how the sasvalue
is computed and Section 7 for how the SAS is used.

   SAS Type Block |  Meaning
   ---------------------------------------------------
   "B32 "         |  Short Authentication String using
                  |  base32 encoding
   ---------------------------------------------------
   "B256"         |  Short Authentication String using
                  |  base256 encoding (PGP Word List)
   ---------------------------------------------------

             Table 6.  SAS Type Block Values

For the SAS Type of "B256", the leftmost 16 bits of the 32-bit
sasvalue are rendered using the PGP Word List [pgpwordlist]
[Juola1][Juola2].
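
Both SAS Types in Table 6 take their input from the leftmost bits of
the 32-bit sasvalue.  The following non-normative sketch shows the
bit selection for each type.  For "B256" it returns only the two
octet values, since the PGP Word List itself is not reproduced here.
For "B32 " it takes the leftmost 20 bits and maps each 5-bit group to
one character of the z-base-32 alphabet [z-base-32].  The function
names are hypothetical.

```python
Z_BASE_32_ALPHABET = "ybndrfg8ejkmcpqxot1uwisza345h769"


def sas_b256_octets(sasvalue: int):
    """Return the two leftmost octets of the 32-bit sasvalue."""
    if not 0 <= sasvalue <= 0xFFFFFFFF:
        raise ValueError("sasvalue must be a 32-bit unsigned integer")
    return (sasvalue >> 24) & 0xFF, (sasvalue >> 16) & 0xFF


def sas_b32(sasvalue: int) -> str:
    """Render the leftmost 20 bits as four z-base-32 characters."""
    if not 0 <= sasvalue <= 0xFFFFFFFF:
        raise ValueError("sasvalue must be a 32-bit unsigned integer")
    chars = []
    for shift in (27, 22, 17, 12):      # four 5-bit groups, leftmost first
        index = (sasvalue >> shift) & 31
        chars.append(Z_BASE_32_ALPHABET[index])
    return "".join(chars)
```

The low-order 12 bits of the sasvalue do not affect the "B32 "
rendering, and the low-order 16 bits do not affect the "B256"
rendering.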
For the SAS Type of "B32 ", the leftmost 20 bits of the 32-bit
sasvalue are rendered as a form of base32 encoding, designed to
represent bit sequences in a form that is convenient for human users
to manipulate.  The choice of characters and unusually permuted
ordering are explained in the source document for the encoding scheme
[z-base-32], which differs from RFC 4648.  The leftmost 20 bits of
the sasvalue result in four base32 characters which are rendered to
both ZRTP endpoints.  Here is a normative pseudocode implementation
of the base32 function:

   char[4] base32(uint32 bits)
   {  int i, n, shift;
      char result[4];
      for (i=0,shift=27; i!=4; ++i,shift-=5)
      {  n = (bits>>shift) & 31;
         result[i] = "ybndrfg8ejkmcpqxot1uwisza345h769"[n];
      }
      return result;
   }

5.1.7.  Signature Type Block

The Signature Type Block specifies what signature algorithm is used
to sign the SAS as discussed in Section 7.2.  The 4-octet Signature
Type Block, along with the accompanying signature block, are OPTIONAL
and may be present in the Confirm message (Figure 10) or the SASrelay
message (Figure 16).  The signature types are given in the table
below.

   Signature  |  Meaning
   Type Block |
   ------------------------------------------------
   "PGP "     |  OpenPGP Signature, per RFC 4880
              |  or I-D.jivsov-openpgp-ecc
   ------------------------------------------------
   "X509"     |  Suite B ECDSA, with X.509v3 cert
              |  per FIPS 186-3
   ------------------------------------------------

          Table 7.  Signature Type Block Values

Additional details on the signature and signing key format may be
found in Section 7.2.  OpenPGP signatures (Signature Type "PGP ") are
discussed in Section 7.2.1.  X.509v3 Suite B Signatures (Signature
Type "X509") are discussed in Section 7.2.2.

Other signature types may be defined in a future document.

5.2.  Hello message

The Hello message has the format shown in Figure 3.

All ZRTP messages begin with the preamble value 0x505a, then a 16 bit
length in 32 bit words.  This length includes only the ZRTP message
(including the preamble and the length) but not the ZRTP packet
header or CRC.  The 8-octet Message Type follows the length field.

Next is a 4 character string containing the version (ver) of the ZRTP
protocol which is "1.10" for this specification.  Next is the Client
Identifier string (cid) which is 4 words long and identifies the
vendor and release of the ZRTP software.  The 256-bit hash image H3
is defined in Section 9.  The next parameter is the ZID, the 96 bit
long unique identifier for the ZRTP endpoint, defined in Section 4.9.

The next four bits are flag bits.  The Signature-capable flag (S)
indicates this Hello message is sent from a ZRTP endpoint which is
able to parse and verify digital signatures, as described in
Section 7.2.  If signatures are not supported, the (S) flag MUST be
set to zero.  The MiTM flag (M) is a Boolean that is set to true if
and only if this Hello message is sent from a device, usually a PBX,
that has the capability to send an SASrelay message (Section 5.13).
The Passive flag (P) is a Boolean normally set to False.  A ZRTP
endpoint which is configured to never initiate secure sessions is
regarded as passive, and would set the P bit to True.  The next 8
bits are unused and SHOULD be set to zero when sent and MUST be
ignored on receipt.

Next is a list of supported Hash algorithms, Cipher algorithms, SRTP
Auth Tag types, Key Agreement types, and SAS types.  The number of
algorithms listed is given for each type: hc=hash count, cc=cipher
count, ac=auth tag count, kc=key agreement count, and sc=sas count.
The values for these algorithms are defined in Tables 2, 3, 4, 5, and
6.
A count of zero means that only the mandatory to implement
algorithms are supported.  Mandatory algorithms MAY be included in
the list.  The order of the list indicates the preferences of the
endpoint.  If a mandatory algorithm is not included in the list, it
is added to the end of the list for preference.

The 64-bit MAC at the end of the message is computed across the whole
message, not including the MAC, using the MAC algorithm defined in
Section 5.1.2.2.  The MAC key is the sender's H2 (defined in
Section 9), and thus the MAC cannot be checked by the receiving party
until the sender's H2 value is known to the receiving party later in
the protocol.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|            length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Hello   " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    version="1.10" (1 word)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                  Client Identifier (4 words)                  |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Hash image H3 (8 words)                    |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         ZID (3 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|S|M|P| unused (zeros)|  hc   |  cc   |  ac   |  kc   |  sc   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                hash algorithms (0 to 7 values)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               cipher algorithms (0 to 7 values)               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                auth tag types (0 to 7 values)                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              key agreement types (0 to 7 values)              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   SAS types (0 to 7 values)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         MAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 3: Hello message format

5.3.  HelloACK message

The HelloACK message is used to stop retransmissions of a Hello
message.  A HelloACK is sent regardless of whether the version number
in the Hello is supported or the algorithm list is supported.  The
receipt of a HelloACK stops retransmission of the Hello message.  The
format is shown in the Figure below.  A Commit message may be sent in
place of a HelloACK by an Initiator, if a Commit message is ready to
be sent promptly.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=3 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="HelloACK" (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 4: HelloACK message format

5.4.  Commit message

The Commit message is sent to initiate the key agreement process
after both sides have received a Hello message, which means it can
only be sent after receiving both a Hello message and a HelloACK
message.  There are three subtypes of Commit messages, whose formats
are shown in Figure 5, Figure 6, and Figure 7.

The Commit message contains the Message Type Block, then the 256-bit
hash image H2 which is defined in Section 9.  The next parameter is
the initiator's ZID, the 96 bit long unique identifier for the ZRTP
endpoint, which MUST have the same value as was used in the Hello
message.

Next is a list of algorithms selected by the initiator (hash, cipher,
auth tag type, key agreement, sas type).  For a DH Commit, the hash
value hvi is a hash of the DHPart2 of the Initiator and the
Responder's Hello message, as explained in Section 4.4.1.1.

The 64-bit MAC at the end of the message is computed across the whole
message, not including the MAC, using the MAC algorithm defined in
Section 5.1.2.2.  The MAC key is the sender's H1 (defined in
Section 9), and thus the MAC cannot be checked by the receiving party
until the sender's H1 value is known to the receiving party later in
the protocol.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=29 words        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Commit  " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Hash image H2 (8 words)                    |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         ZID (3 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        hash algorithm                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       cipher algorithm                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         auth tag type                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      key agreement type                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           SAS type                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         hvi (8 words)                         |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         MAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 5: DH Commit message format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=25 words        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Commit  " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Hash image H2 (8 words)                    |
   |                             . . .                             |
| 2600 | | 2601 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2602 | | 2603 | ZID (3 words) | 2604 | | 2605 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2606 | hash algorithm | 2607 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2608 | cipher algorithm | 2609 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2610 | auth tag type | 2611 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2612 | key agreement type = "Mult" | 2613 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2614 | SAS type | 2615 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2616 | | 2617 | nonce (4 words) | 2618 | . . . | 2619 | | 2620 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2621 | MAC (2 words) | 2622 | | 2623 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2625 Figure 6: Multistream Commit message format 2627 0 1 2 3 2628 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2629 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2630 |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=27 words | 2631 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2632 | Message Type Block="Commit " (2 words) | 2633 | | 2634 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2635 | | 2636 | Hash image H2 (8 words) | 2637 | . . . 
| 2638 | | 2639 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2640 | | 2641 | ZID (3 words) | 2642 | | 2643 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2644 | hash algorithm | 2645 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2646 | cipher algorithm | 2647 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2648 | auth tag type | 2649 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2650 | key agreement type = "Prsh" | 2651 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2652 | SAS type | 2653 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2654 | | 2655 | nonce (4 words) | 2656 | . . . | 2657 | | 2658 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2659 | keyID (2 words) | 2660 | | 2661 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2662 | MAC (2 words) | 2663 | | 2664 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2666 Figure 7: Preshared Commit message format 2668 5.5. DHPart1 message 2670 The DHPart1 message shown in Figure 8 begins the DH exchange. It is 2671 sent by the Responder if a valid Commit message is received from the 2672 Initiator. The length of the pvr value and the length of the DHPart1 2673 message depends on the Key Agreement Type chosen. This information 2674 is contained in the table in Section 5.1.5. Note that for both 2675 Multistream and Preshared modes, no DHPart1 or DHPart2 message will 2676 be sent. 2678 The 256-bit hash image H1 is defined in Section 9. 2680 The next four parameters are non-invertible hashes (computed in 2681 Section 4.3.1) of potential shared secrets used in generating the 2682 ZRTP secret s0. The first two, rs1IDr and rs2IDr, are the hashes of 2683 the responder's two retained shared secrets, truncated to 64 bits. 2684 Next is auxsecretIDr, a hash of the responder's auxsecret (defined in 2685 Section 4.3), truncated to 64 bits. 
   The last parameter is a hash of the trusted MiTM PBX shared secret
   pbxsecret, defined in Section 7.3.1.

   The 64-bit MAC at the end of the message is computed across the whole
   message, not including the MAC, using the MAC algorithm defined in
   Section 5.1.2.2. The MAC key is the sender's H0 (defined in
   Section 9), and thus the MAC cannot be checked by the receiving party
   until the sender's H0 value is known to the receiving party later in
   the protocol.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|   length=depends on KA Type   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="DHPart1 " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Hash image H1 (8 words)                    |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       rs1IDr (2 words)                        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       rs2IDr (2 words)                        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    auxsecretIDr (2 words)                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    pbxsecretIDr (2 words)                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                pvr (length depends on KA Type)                |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         MAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 8: DHPart1 message format

5.6.  DHPart2 message

   The DHPart2 message shown in Figure 9 completes the DH exchange. It
   is sent by the Initiator if a valid DHPart1 message is received from
   the Responder. The length of the pvi value and the length of the
   DHPart2 message depend on the Key Agreement Type chosen. This
   information is contained in the table in Section 5.1.5. Note that
   for both Multistream and Preshared modes, no DHPart1 or DHPart2
   message will be sent.

   The 256-bit hash image H1 is defined in Section 9.

   The next four parameters are non-invertible hashes (computed in
   Section 4.3.1) of potential shared secrets used in generating the
   ZRTP secret s0. The first two, rs1IDi and rs2IDi, are the hashes of
   the initiator's two retained shared secrets, truncated to 64 bits.
   Next is auxsecretIDi, a hash of the initiator's auxsecret (defined in
   Section 4.3), truncated to 64 bits. The last parameter is a hash of
   the trusted MiTM PBX shared secret pbxsecret, defined in
   Section 7.3.1.

   The 64-bit MAC at the end of the message is computed across the whole
   message, not including the MAC, using the MAC algorithm defined in
   Section 5.1.2.2. The MAC key is the sender's H0 (defined in
   Section 9), and thus the MAC cannot be checked by the receiving party
   until the sender's H0 value is known to the receiving party later in
   the protocol.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|   length=depends on KA Type   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="DHPart2 " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                    Hash image H1 (8 words)                    |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       rs1IDi (2 words)                        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       rs2IDi (2 words)                        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    auxsecretIDi (2 words)                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    pbxsecretIDi (2 words)                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                pvi (length depends on KA Type)                |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         MAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 9: DHPart2 message format

5.7.  Confirm1 and Confirm2 messages

   The Confirm1 message is sent by the Responder in response to a valid
   DHPart2 message after the SRTP session key and parameters have been
   negotiated. The Confirm2 message is sent by the Initiator in
   response to a Confirm1 message. The format is shown in Figure 10
   below. The message contains the Message Type Block "Confirm1" or
   "Confirm2". Next is the confirm_mac, a MAC computed over the
   encrypted part of the message (shown enclosed by "====" in
   Figure 10). This confirm_mac is keyed and computed according to
   Section 4.6. The next 16 octets contain the CFB Initialization
   Vector. The rest of the message is encrypted using CFB and protected
   by the confirm_mac.

   The first field inside the encrypted region is the hash preimage H0,
   which is defined in detail in Section 9.

   The next 15 bits are not used and SHOULD be set to zero when sent and
   MUST be ignored when received in Confirm1 or Confirm2 messages.

   The next 9 bits contain the signature length. If no SAS signature
   (described in Section 7.2) is present, all bits are set to zero.
   The signature length is in words and includes the signature type
   block. If the calculated signature octet count is not a multiple of
   4, zeros are added to pad it out to a word boundary. If no signature
   is present, the overall length of the Confirm1 or Confirm2 message
   will be set to 19 words.

   The next 8 bits are used for flags. Undefined flags are set to zero
   and ignored. Four flags are currently defined. The PBX Enrollment
   flag (E) is a Boolean bit defined in Section 7.3.1. The SAS Verified
   flag (V) is a Boolean bit defined in Section 7.1. The Allow Clear
   flag (A) is a Boolean bit defined in Section 4.7.2. The Disclosure
   Flag (D) is a Boolean bit defined in Section 11.

   The next 32-bit word contains the cache expiration interval, which is
   defined in Section 4.9.

   If the signature length (in words) is non-zero, a signature type
   block will be present along with a signature block. Next is the
   signature block. The signature block includes the signature and the
   key (or a link to the key) used to generate the signature
   (Section 7.2).

   CFB mode [SP800-38A] is applied with a feedback length of 128 bits, a
   full cipher block, and the final block is truncated to match the
   exact length of the encrypted data. The CFB Initialization Vector is
   a 128-bit random nonce. The block cipher algorithm and the key size
   are the same as those negotiated for the media encryption. CFB is
   used to encrypt the part of the Confirm1 or Confirm2 message
   beginning after the CFB IV to the end of the message (the encrypted
   region is enclosed by "====" in Figure 10).

   The responder uses the zrtpkeyr to encrypt the Confirm1 message. The
   initiator uses the zrtpkeyi to encrypt the Confirm2 message.
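
   As an illustration, the 32-bit word that follows H0 inside the
   encrypted region of Figure 10 could be unpacked as sketched below.
   This is a non-normative sketch under stated assumptions: the word
   has already been decrypted, bit positions are read from Figure 10 in
   network byte order, and the names ConfirmFlags and
   parse_confirm_flags_word are hypothetical helpers, not part of ZRTP.

```python
import struct
from typing import NamedTuple


class ConfirmFlags(NamedTuple):
    """Decoded view of the word after H0 (hypothetical helper type)."""
    sig_len_words: int    # 9-bit signature length, in 32-bit words
    pbx_enrollment: bool  # E flag
    sas_verified: bool    # V flag
    allow_clear: bool     # A flag
    disclosure: bool      # D flag


def parse_confirm_flags_word(word: bytes) -> ConfirmFlags:
    """Unpack 15 unused bits, the 9-bit sig len, and the 8 flag bits."""
    if len(word) != 4:
        raise ValueError("expected exactly one 32-bit word")
    (value,) = struct.unpack("!I", word)  # network byte order
    # The top 15 bits are unused and are ignored on receipt.
    sig_len = (value >> 8) & 0x1FF
    flags = value & 0xFF
    # Assumed bit layout of the low nibble, per Figure 10: E V A D.
    return ConfirmFlags(
        sig_len_words=sig_len,
        pbx_enrollment=bool(flags & 0x08),
        sas_verified=bool(flags & 0x04),
        allow_clear=bool(flags & 0x02),
        disclosure=bool(flags & 0x01),
    )
```

   For example, parse_confirm_flags_word(bytes([0, 0, 0, 0x05]))
   reports a signature length of zero with the V and D flags set.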

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=variable        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Message Type Block="Confirm1" or "Confirm2" (2 words)     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     confirm_mac (2 words)                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |              CFB Initialization Vector (4 words)              |
   |                                                               |
   |                                                               |
   +===============================================================+
   |                                                               |
   |                  Hash preimage H0 (8 words)                   |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Unused (15 bits of zeros)   | sig len (9 bits)|0 0 0 0|E|V|A|D|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              cache expiration interval (1 word)               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       optional signature type block (1 word if present)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |          optional signature block (variable length)           |
   |                             . . .                             |
   |                                                               |
   |                                                               |
   +===============================================================+

              Figure 10: Confirm1 and Confirm2 message format

5.8.  Conf2ACK message

   The Conf2ACK message is sent by the Responder in response to a valid
   Confirm2 message. The message format for the Conf2ACK is shown in
   the Figure below. The receipt of a Conf2ACK stops retransmission of
   the Confirm2 message. Note that the first SRTP media (with a valid
   SRTP auth tag) from the responder also stops retransmission of the
   Confirm2 message.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=3 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Conf2ACK" (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 11: Conf2ACK message format

5.9.  Error message

   The Error message is sent to terminate an in-process ZRTP key
   agreement exchange due to an error. The format is shown in the
   Figure below. The use of the Error message is described in
   Section 4.7.1.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=4 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Error   " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Integer Error Code (1 word)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 12: Error message format

   Defined hexadecimal values for the Error Code are listed in the table
   below.

   Error Code |  Meaning
   -----------------------------------------------------------
   0x10       |  Malformed packet (CRC OK, but wrong structure)
   -----------------------------------------------------------
   0x20       |  Critical software error
   -----------------------------------------------------------
   0x30       |  Unsupported ZRTP version
   -----------------------------------------------------------
   0x40       |  Hello components mismatch
   -----------------------------------------------------------
   0x51       |  Hash type not supported
   -----------------------------------------------------------
   0x52       |  Cipher type not supported
   -----------------------------------------------------------
   0x53       |  Public key exchange not supported
   -----------------------------------------------------------
   0x54       |  SRTP auth. tag not supported
   -----------------------------------------------------------
   0x55       |  SAS rendering scheme not supported
   -----------------------------------------------------------
   0x56       |  No shared secret available, DH mode required
   -----------------------------------------------------------
   0x61       |  DH Error: bad pvi or pvr ( == 1, 0, or p-1)
   -----------------------------------------------------------
   0x62       |  DH Error: hvi != hashed data
   -----------------------------------------------------------
   0x63       |  Received relayed SAS from untrusted MiTM
   -----------------------------------------------------------
   0x70       |  Auth. Error: Bad Confirm pkt MAC
   -----------------------------------------------------------
   0x80       |  Nonce reuse
   -----------------------------------------------------------
   0x90       |  Equal ZIDs in Hello
   -----------------------------------------------------------
   0x91       |  SSRC collision
   -----------------------------------------------------------
   0xA0       |  Service unavailable
   -----------------------------------------------------------
   0xB0       |  Protocol timeout error
   -----------------------------------------------------------
   0x100      |  GoClear message received, but not allowed
   -----------------------------------------------------------

                       Table 8. ZRTP Error Codes

5.10.  ErrorACK message

   The ErrorACK message is sent in response to an Error message. The
   receipt of an ErrorACK stops retransmission of the Error message.
   The format is shown in the Figure below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=3 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="ErrorACK" (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 13: ErrorACK message format

5.11.  GoClear message

   Support for the GoClear message is OPTIONAL in the protocol, and it
   is sent to switch from SRTP to RTP. The format is shown in the
   Figure below. The clear_mac is used to authenticate the GoClear
   message so that bogus GoClear messages introduced by an attacker can
   be detected and discarded. The use of GoClear is described in
   Section 4.7.2.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=5 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="GoClear " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      clear_mac (2 words)                      |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 14: GoClear message format

5.12.  ClearACK message

   Support for the ClearACK message is OPTIONAL in the protocol, and it
   is sent to acknowledge receipt of a GoClear. A ClearACK is only sent
   if the clear_mac from the GoClear message is authenticated.
   Otherwise, no response is returned. The format is shown in the
   Figure below.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=3 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="ClearACK" (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 15: ClearACK message format

5.13.  SASrelay message

   The SASrelay message is sent by a trusted Man in The Middle (MiTM),
   most often a PBX. It is not sent as a response to a packet, but is
   sent as a self-initiated packet by the trusted MiTM. It can only be
   sent after the rest of the ZRTP key negotiations have completed,
   after the Confirm messages and their ACKs. It can only be sent after
   the trusted MiTM has finished key negotiations with the other party,
   because it is the other party's SAS that is being relayed. It is
   sent with retry logic until a RelayACK message (Section 5.14) is
   received or the retry schedule has been exhausted.

   If a device, usually a PBX, sends an SASrelay message, it MUST have
   previously declared itself as a MiTM device by setting the MiTM (M)
   flag in the Hello message (Section 5.2). If the receiver of the
   SASrelay message did not previously receive a Hello message with the
   MiTM (M) flag set, the Relayed SAS SHOULD NOT be rendered. A
   RelayACK is still sent, but no Error message is sent.

   The SASrelay message format is shown in Figure 16 below. The message
   contains the Message Type Block "SASrelay". Next is a MAC computed
   over the encrypted part of the message (shown enclosed by "====" in
   Figure 16). This MAC is keyed the same way as the confirm_mac in the
   Confirm messages (see Section 4.6). The next 16 octets contain the
   CFB Initialization Vector. The rest of the message is encrypted
   using CFB and protected by the MAC.

   The next 15 bits are not used and SHOULD be set to zero when sent and
   MUST be ignored when received in SASrelay messages.

   The next 9 bits contain the signature length. The trusted MiTM MAY
   compute a digital signature on the SAS hash, as described in
   Section 7.2, using a persistent signing key owned by the trusted
   MiTM. If no SAS signature is present, all bits are set to zero. The
   signature length is in words and includes the signature type block.
   If the calculated signature octet count is not a multiple of 4, zeros
   are added to pad it out to a word boundary. If no signature block is
   present, the overall length of the SASrelay message will be set to 19
   words.

   The next 8 bits are used for flags. Undefined flags are set to zero
   and ignored. Three flags are currently defined. The Disclosure Flag
   (D) is a Boolean bit defined in Section 11. The Allow Clear flag (A)
   is a Boolean bit defined in Section 4.7.2. The SAS Verified flag (V)
   is a Boolean bit defined in Section 7.1. These flags carry updated
   values for the same flags provided earlier in the Confirm message;
   they reflect the new flag information relayed by the PBX from the
   other party.

   The next 32-bit word contains the SAS rendering scheme for the
   relayed sashash, which will be the same rendering scheme used by the
   other party on the other side of the trusted MiTM. Section 7.3
   describes how the PBX determines whether the ZRTP client regards the
   PBX as a trusted MiTM. If the PBX determines that the ZRTP client
   trusts the PBX, the next 8 words contain the sashash relayed from the
   other party. The first 32-bit word of the sashash contains the
   sasvalue, which may be rendered to the user using the specified SAS
   rendering scheme. If this SASrelay message is being sent to a ZRTP
   client that does not trust this MiTM, the sashash will be ignored by
   the recipient and should be set to zeros by the PBX.

   If the signature length (in words) is non-zero, a signature type
   block will be present along with a signature block. Next is the
   signature block. The signature block includes the signature and the
   key (or a link to the key) used to generate the signature
   (Section 7.2).

   CFB mode [SP800-38A] is applied with a feedback length of 128 bits, a
   full cipher block, and the final block is truncated to match the
   exact length of the encrypted data. The CFB Initialization Vector is
   a 128-bit random nonce. The block cipher algorithm and the key size
   are the same as those negotiated for the media encryption. CFB is
   used to encrypt the part of the SASrelay message beginning after the
   CFB IV to the end of the message (the encrypted region is enclosed by
   "====" in Figure 16).

   Depending on whether the trusted MiTM had taken the role of the
   initiator or the responder during the ZRTP key negotiation, the
   SASrelay message is encrypted with zrtpkeyi or zrtpkeyr.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=variable        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="SASrelay" (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         MAC (2 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |              CFB Initialization Vector (4 words)              |
   |                                                               |
   |                                                               |
   +===============================================================+
   | Unused (15 bits of zeros)   | sig len (9 bits)|0 0 0 0|0|V|A|D|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           rendering scheme of relayed SAS (1 word)            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |            Trusted MiTM relayed sashash (8 words)             |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       optional signature type block (1 word if present)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |          optional signature block (variable length)           |
   |                             . . .                             |
   |                                                               |
   |                                                               |
   +===============================================================+

                    Figure 16: SASrelay message format

5.14.  RelayACK message

   The RelayACK message is sent in response to a valid SASrelay message.
   The message format for the RelayACK is shown in the Figure below.
   The receipt of a RelayACK stops retransmission of the SASrelay
   message.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=3 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="RelayACK" (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 17: RelayACK message format

5.15.  Ping message

   The Ping and PingACK messages are unrelated to the rest of the ZRTP
   protocol. No ZRTP endpoint is required to generate a Ping message,
   but every ZRTP endpoint MUST respond to a Ping message with a PingACK
   message.

   Although Ping and PingACK messages have no effect on the rest of the
   ZRTP protocol, their inclusion in this specification simplifies the
   design of "bump-in-the-wire" ZRTP proxies (Section 10) (notably,
   Zfone [zfone]). It enables proxies to be designed that do not rely
   on assistance from the signaling layer to map out the associations
   between media streams and ZRTP endpoints.

   Before sending a ZRTP Hello message, a ZRTP proxy MAY send a Ping
   message as a means to sort out which RTP media streams are connected
   to particular ZRTP endpoints. Ping messages are generated only by
   ZRTP proxies. If neither party is a ZRTP proxy, no Ping messages
   will be encountered. Ping retransmission behavior is discussed in
   Section 6.

   The Ping message (Figure 18) contains an "EndpointHash", defined in
   Section 5.16.

   The Ping message contains a version number that defines what version
   of PingACK is requested. If that version number is supported by the
   Ping responder, a PingACK with a format that matches that version
   will be received. Otherwise, a PingACK with a lower version number
   may be received.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=6 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="Ping    " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    version="1.10" (1 word)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    EndpointHash (2 words)                     |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 18: Ping message format

5.16.  PingACK message

   A PingACK message is sent only in response to a Ping. A ZRTP
   endpoint MUST respond to a Ping with a PingACK message. The version
   of PingACK requested is contained in the Ping message. If that
   version number is supported, a PingACK with a format that matches
   that version MUST be sent. Otherwise, if the version number of the
   Ping is not supported, a PingACK SHOULD be sent in the format of the
   highest supported version known to the Ping responder. Only version
   "1.10" is supported in this specification.
   It is recommended that the EndpointHash be a
   truncated hash of the ZID of the ZRTP endpoint concatenated with
   something unique about the actual endpoint or phone behind the PBX.
   This may be the SIP URI of the phone, the PBX extension number, or
   the local IP address of the phone, whichever is more readily
   available in the application environment:

   o  EndpointHash = hash(ZID || SIP URI of the endpoint)
   o  EndpointHash = hash(ZID || PBX extension number of the endpoint)
   o  EndpointHash = hash(ZID || local IP address of the endpoint)

   Any of these formulae confers uniqueness for the simple case of
   terminating the ZRTP connection at the VoIP client, or the more
   complex case of a PBX terminating the ZRTP connection for multiple
   VoIP phones in a conference call, all sharing the PBX's ZID, but with
   separate IP addresses behind the PBX. There is no requirement for
   the same hash function to be used by both parties.

   The PingACK message contains the EndpointHash of the sender of the
   PingACK as well as the EndpointHash of the sender of the Ping. The
   Source Identifier (SSRC) received in the ZRTP header from the Ping
   packet (Figure 2) is copied into the PingACK message body
   (Figure 19). This SSRC is not the SSRC of the sender of the PingACK.
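
   The EndpointHash formulae listed above could be sketched as follows.
   This is a non-normative illustration: the text does not mandate a
   particular hash function or truncation rule, so SHA-256, the choice
   to keep the leftmost 8 octets, UTF-8 encoding of the identifier, and
   the function name endpoint_hash are all assumptions made for the
   example.

```python
import hashlib


def endpoint_hash(zid: bytes, endpoint_id: str) -> bytes:
    """Return a 64-bit EndpointHash = truncate(hash(ZID || endpoint_id)).

    endpoint_id is whatever distinguishes the endpoint behind the PBX:
    a SIP URI, a PBX extension number, or a local IP address.
    """
    if len(zid) != 12:
        raise ValueError("a ZID is 96 bits (12 octets)")
    # Assumption: SHA-256 as the hash, leftmost 64 bits retained.
    digest = hashlib.sha256(zid + endpoint_id.encode("utf-8")).digest()
    return digest[:8]
```

   Because the value depends only on the ZID and a stable identifier,
   it stays the same between calls, and two phones behind one PBX that
   share a ZID but differ in identifier are very unlikely to collide.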
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|         length=9 words        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Message Type Block="PingACK " (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    version="1.10" (1 word)                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           EndpointHash of PingACK Sender (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            EndpointHash of Received Ping (2 words)            |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Source Identifier (SSRC) of Received Ping (1 word)       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 19: PingACK message format

6.  Retransmissions

   ZRTP uses two retransmission timers, T1 and T2. T1 is used for
   retransmission of Hello messages, when the support of ZRTP by the
   other endpoint may not be known. T2 is used in retransmissions of
   all the other ZRTP messages.

   All message retransmissions MUST be identical to the initial message
   including nonces, public values, etc.; otherwise, hashes of the
   message sequences may not agree.

   Practical experience has shown that RTP packet loss at the start of
   an RTP session can be extremely high. Since the entire ZRTP message
   exchange occurs during this period, the retransmission scheme
   is defined to be aggressive. Since ZRTP packets with the exception
   of the DHPart1 and DHPart2 messages are small, this should have
   minimal effect on overall bandwidth utilization of the media session.

   ZRTP endpoints MUST NOT exceed the bandwidth of the resulting media
   session as determined by the offer/answer exchange in the signaling
   layer.
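
   Section 6 defines two capped-doubling retransmission schedules: T1
   starts at 50 ms and caps at 200 ms for 20 Hello retransmissions,
   and T2 starts at 150 ms and caps at 1200 ms for 10 retransmissions
   of other messages. The sketch below reproduces the arithmetic
   behind the 3.75 second and 9.45 second totals. The function names
   are hypothetical, and retries_to_span is only one possible reading
   of "span at least 12 seconds" for the extended Hello schedule.

```python
def retry_intervals(initial_ms: int, cap_ms: int, count: int) -> list[int]:
    """Return the wait before each retransmission, in milliseconds.

    The interval starts at initial_ms and doubles after every
    retransmission until it reaches cap_ms.
    """
    intervals = []
    delay = initial_ms
    for _ in range(count):
        intervals.append(delay)
        delay = min(delay * 2, cap_ms)
    return intervals


def retries_to_span(initial_ms: int, cap_ms: int, span_ms: int) -> int:
    """Return how many retransmissions are needed to cover span_ms.

    Assumption of this sketch: the schedule "spans" span_ms once the
    sum of its intervals is at least span_ms.
    """
    total = 0
    delay = initial_ms
    count = 0
    while total < span_ms:
        total += delay
        delay = min(delay * 2, cap_ms)
        count += 1
    return count


hello_schedule = retry_intervals(50, 200, 20)    # T1 schedule
other_schedule = retry_intervals(150, 1200, 10)  # T2 schedule
print(sum(hello_schedule) / 1000)  # total Hello retry time in seconds
print(sum(other_schedule) / 1000)  # total non-Hello retry time in seconds
```

   Under the stated reading, extending the Hello schedule to cover
   12 seconds takes retries_to_span(50, 200, 12000) retransmissions
   instead of the default 20.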
   The Ping message (Section 5.15) may follow the same retransmission
   schedule as the Hello message, but this is not required in this
   specification. Ping message retransmission is subject to
   application-specific ZRTP proxy heuristics.

   Hello ZRTP messages are retransmitted at an interval that starts at
   T1 and doubles after every retransmission, capping at 200 ms.
   T1 has a recommended initial value of 50 ms. A Hello message is
   retransmitted 20 times before giving up, which means the entire retry
   schedule for Hello messages is exhausted after 3.75 seconds (50 + 100
   + 18*200 ms). Retransmission of a Hello ends upon receipt of a
   HelloACK or Commit message.

   The post-Hello ZRTP messages are retransmitted only by the session
   initiator; that is, only Commit, DHPart2, and Confirm2 are
   retransmitted if the corresponding messages from the responder
   (DHPart1, Confirm1, and Conf2ACK) are not received. Note that the
   Confirm2 message retransmission can also be stopped by receiving the
   first SRTP media (with a valid SRTP auth tag) from the responder.

   The GoClear, Error, and SASrelay messages may be initiated and
   retransmitted by either party, and responded to by the other party,
   regardless of which party is the overall session initiator. They are
   retransmitted if the corresponding response messages (ClearACK,
   ErrorACK, and RelayACK) are not received.

   Non-Hello (and non-Ping) ZRTP messages are retransmitted at an
   interval that starts at T2 and doubles after every
   retransmission, capping at 1200 ms. T2 has a recommended initial
   value of 150 ms. Each non-Hello message is retransmitted 10 times
   before giving up, which means the entire retry schedule is exhausted
   after 9.45 seconds (150 + 300 + 600 + 7*1200 ms). Only the initiator
   performs retransmissions.
   Each message has a response message that
   stops retransmissions, as shown in the table below. The higher
   value of T2 means that retransmissions will likely occur only in the
   event of packet loss.

      Message      Acknowledgement Message
      -------      -----------------------
      Hello        HelloACK or Commit
      Commit       DHPart1 or Confirm1
      DHPart2      Confirm1
      Confirm2     Conf2ACK or SRTP media
      GoClear      ClearACK
      Error        ErrorACK
      SASrelay     RelayACK
      Ping         PingACK

          Table 9. Retransmitted ZRTP Messages and Responses

   The retry schedule must handle not only packet loss, but also slow or
   heavily loaded peers that need additional time to perform their DH
   calculations. The following mitigations are recommended:

   o  Slow or heavily loaded ZRTP endpoints that are at risk of taking
      too long to perform their DH calculation SHOULD use a HelloACK
      message instead of a Commit message to reply to a Hello from the
      other party.
   o  If a ZRTP endpoint has evidence that the other party is a ZRTP
      endpoint, by receiving a Hello message or Ping message, it SHOULD
      extend its own Hello retry schedule to span at least 12 seconds of
      retries. If this extended Hello retry schedule is exhausted
      without receiving a HelloACK or Commit message, a late Commit
      message from the peer SHOULD still be accepted.

   These recommended retransmission intervals are designed for a typical
   broadband Internet connection. In some high latency communication
   channels, such as those provided by some mobile phone environments or
   geostationary satellites, a different retransmission schedule may be
   used. The initial value for the T1 or T2 retransmission timer should
   be increased to be no less than the round trip time provided by the
   communications channel.
   It should take into account the time
   required to transmit the entire message and the entire reply, as well
   as a reasonable time estimate to perform the DH calculation.

   ZRTP has its own retransmission schedule because it is carried along
   with RTP, usually over UDP. In unusual cases, RTP can run over a
   non-UDP transport, such as TCP or DCCP, which provides its own
   built-in retransmission mechanism. It may be hard for the ZRTP
   endpoint to detect that TCP is being used if media relays are
   involved. The ZRTP endpoint may be sending only UDP, but there may
   be a media relay along the media path that converts from UDP to TCP
   for part of the journey. Or, if the ZRTP endpoint is sending TCP,
   the media relay might be converting from TCP to UDP. There have been
   empirical observations of this in the wild. In cases where TCP is
   used, ZRTP and TCP might together generate some extra
   retransmissions. It is tempting to avoid this effect by eliminating
   the ZRTP retransmission schedule when connected to a TCP channel, but
   that would risk failure of the protocol, because it may not be TCP
   all the way to the remote ZRTP endpoint. It only takes a few packets
   to complete a ZRTP exchange, so trying to optimize out the extra
   retransmissions in that scenario is not worth the risk.

   After receiving a Commit message, but before receiving a Confirm2
   message, if a ZRTP responder receives no ZRTP messages for more than
   10 seconds, the responder MAY send a protocol timeout Error message
   and terminate the ZRTP protocol.

7.  Short Authentication String

   This section discusses the implementation of the Short
   Authentication String (SAS) in ZRTP. The SAS can be verbally
   compared by the human users reading the string aloud, or by
   validating an OPTIONAL digital signature (described in Section 7.2)
   exchanged in the Confirm1 or Confirm2 messages.
   The use of hash commitment in the DH exchange (Section 4.4.1.1)
   constrains the attacker to only one guess to generate the correct SAS
   in his attack, which means the SAS can be quite short. A 16-bit SAS,
   for example, provides the attacker only one chance out of 65536 of
   not being detected.

   There is only one SAS value computed per call. That is the SAS value
   for the first media stream established, which is calculated in
   Section 4.5.2. This SAS applies to all media streams for the same
   session.

   The SAS SHOULD be rendered to the user for authentication. The
   rendering of the SAS value through the user interface at both
   endpoints depends on the SAS Type agreed upon in the Commit message.
   See Section 5.1.6 for a description of how the SAS is rendered to the
   user.

   The SAS is not treated as a secret value, but it must be compared to
   see if it matches at both ends of the communications channel. The
   two users verbally compare it using their human voices, human ears,
   and human judgement. If it doesn't match, it indicates the presence
   of a man-in-the-middle (MiTM) attack.

7.1.  SAS Verified Flag

   The SAS Verified flag (V) is set based on the user indicating that
   SAS comparison has been successfully performed. The SAS Verified
   flag is exchanged securely in the Confirm1 and Confirm2 messages
   (Figure 10) of the next session. In other words, each party sends
   the SAS Verified flag from the previous session in the Confirm
   message of the current session. It is perfectly reasonable to have a
   ZRTP endpoint that never sets the SAS Verified flag, because it would
   require adding complexity to the user interface to allow the user to
   set it.
   The SAS Verified flag is not required to be set, but if it
   is available to the client software, it allows for the possibility
   that the client software could render to the user that the SAS verify
   procedure was carried out in a previous session.

   Regardless of whether there is a user interface element to allow the
   user to set the SAS Verified flag, it is worth caching a shared
   secret, because doing so reduces opportunities for an attacker in the
   next call.

   If at any time the users carry out the SAS comparison procedure, and
   it actually fails to match, then this means there is a very
   resourceful man-in-the-middle. If this is the first call, the MiTM
   was there on the first call, which is impressive enough. If it
   happens in a later call, it means the MiTM must also know the
   cached shared secret, because you could not have carried out any
   voice traffic at all unless the session key was correctly computed
   and is also known to the attacker. This implies the MiTM must have
   been present in all the previous sessions, since the initial
   establishment of the first shared secret. This is indeed a
   resourceful attacker. It also means that if at any time he ceases
   his participation as a MiTM on one of your calls, the protocol will
   detect that the cached shared secret is no longer valid -- because it
   was really two different shared secrets all along, one of them
   between Alice and the attacker, and the other between the attacker
   and Bob. The continuity of the cached shared secrets makes it possible
   for us to detect the MiTM when he inserts himself into the ongoing
   relationship, as well as when he leaves. Also, if the attacker tries
   to stay with a long lineage of calls, but fails to execute a DH MiTM
   attack for even one missed call, he is permanently excluded. He can
   no longer resynchronize with the chain of cached shared secrets.
   A user interface element (e.g., a checkbox or button) is needed to
   allow the user to tell the software the SAS verify was successful,
   causing the software to set the SAS Verified flag (V), which
   (together with our cached shared secret) obviates the need to perform
   the SAS procedure in the next call. An additional user interface
   element can be provided to let the user tell the software he detected
   an actual SAS mismatch, which indicates a MiTM attack. The software
   can then take appropriate action, clearing the SAS Verified flag and
   erasing the cached shared secret from this session. It is up to the
   implementer to decide if this added user interface complexity is
   warranted.

   If the SAS matches, it means there is no MiTM, which also implies it
   is now safe to trust a cached shared secret for later calls. If
   inattentive users don't bother to check the SAS, it means we don't
   know whether there is or is not a MiTM, so even if we do establish a
   new cached shared secret, there is a risk that our potential attacker
   may have a subsequent opportunity to continue inserting himself in
   the call, until we finally get around to checking the SAS. If the
   SAS matches, it means no attacker was present for any previous
   session since we started propagating cached shared secrets, because
   this session and all the previous sessions were also authenticated
   with a continuous lineage of shared secrets.

7.2.  Signing the SAS

   In most applications it is desirable to avoid the added complexity of
   a PKI-backed digital signature, which is why ZRTP is designed not to
   require it. Nonetheless, in some applications, it may be hard to
   arrange for two human users to verbally compare the SAS. Or an
   application may already be using an existing PKI and wants to use it
   to augment ZRTP.
   To handle these cases, ZRTP allows for an OPTIONAL signature feature,
   which allows the SAS to be checked without human participation. The
   SAS MAY be signed and the signature sent inside the Confirm1,
   Confirm2 (Figure 10), or SASrelay (Figure 16) messages. The
   signature type (Section 5.1.7), length of the signature and the key
   used to create the signature (or a link to it) are all sent along
   with the signature. The signature is calculated across the entire
   SAS hash result (sashash), from which the sasvalue was derived. The
   signatures exchanged in the encrypted Confirm1, Confirm2, or SASrelay
   messages MAY be used to authenticate the ZRTP exchange. A signature
   may be sent only in the initial media stream in a DH or ECDH ZRTP
   exchange, not in multistream mode.

   Although the signature is sent, the material that is signed, the
   sashash, is not sent with it in the Confirm message, since both
   parties have already independently calculated the sashash. That is
   not the case for the SASrelay message, which must relay the sashash.

   To avoid unnecessary signature calculations, a signature SHOULD NOT
   be sent if the other ZRTP endpoint did not set the (S) flag in the
   Hello message (Section 5.2).

   Note that the choice of hash algorithm used in the digital signature
   is independent of the hash used in the sashash. The sashash is
   determined by the negotiated Hash Type (Section 5.1.2), while the
   hash used by the digital signature is separately defined by the
   digital signature algorithm. For example, the sashash may be based
   on SHA-256, while the digital signature might use SHA-384, if an
   ECDSA P-384 key is used.

   If the ZRTP key exchange is ECDH, and the SAS is signed, then the
   signature SHOULD be ECDSA, using the same size curve as the ECDH
   exchange.
   NSA Suite B ECDSA algorithms may be used with either
   OpenPGP-formatted keys, or X.509v3 certificates.

   If a ZRTP endpoint supports incoming signatures (evidenced by setting
   the (S) flag in the Hello message), it MUST be able to parse
   signatures from the other endpoint in both formats (OpenPGP and
   X.509v3). If the incoming signature is in an unsupported format, the
   ZRTP user agent SHOULD inform the user that the connection is not
   known to be authenticated by a signature. The informed user may
   elect to proceed with the call at his discretion.

   ECDSA has a feature that allows most of the signature calculation to
   be done in advance of the session, reducing latency during call
   setup. This is useful for low power mobile handsets.

   ECDSA is also preferred because it has compact keys and signatures.
   If the signature along with its public key certificate is
   insufficiently compact, the Confirm message may become too long for
   the maximum transmission unit (MTU) size, and UDP fragmentation may
   result. Some firewalls and NATs may discard fragmented UDP packets,
   which would cause the ZRTP exchange to fail. It is RECOMMENDED that
   a ZRTP endpoint avoid sending signatures if they would cause UDP
   fragmentation. For a discussion on MTU size and PMTU discovery, see
   [RFC1191] and [RFC1981].

7.2.1.  OpenPGP Signatures

   If the SAS Signature Type (Section 5.1.7) specifies an OpenPGP
   signature ("PGP "), the signature-related fields are arranged as
   follows.

   The first field after the 4-octet Signature Type Block is the OpenPGP
   signature. The format of this signature and the algorithms that
   create it are specified by [RFC4880] or [I-D.jivsov-openpgp-ecc].
   The signature consists of a complete OpenPGP version 4 signature
   in binary form (not Radix-64), as specified in RFC 4880, section
   5.2.3, enclosed in the full OpenPGP packet syntax.
   The length of the
   OpenPGP signature is parseable from the signature, and depends on the
   type and length of the signing key.

   If OpenPGP signatures are supported, an implementation SHOULD NOT
   generate signatures using any signature algorithm other than DSA or
   ECDSA, but MAY accept other signature types from the other party.
   DSA signatures with keys shorter than 2048 bits or longer than 3072
   bits MUST NOT be generated. An implementation MUST use only
   NIST-approved hash algorithms in signatures, and MUST NOT use SHA1
   in the signature. NIST-approved hash algorithms are found in
   [FIPS-180-3] or its SHA-3 successor. ECDSA OpenPGP signatures are
   specified in [I-D.jivsov-openpgp-ecc]. Signatures with ECDSA keys
   larger than P-384 or smaller than P-224 SHOULD NOT be generated.

   RFC 4880 section 5.2.3.18 specifies a way to embed, in an OpenPGP
   signature, a URI of the preferred key server. The URI should be
   fully specified to obtain the public key of the signing key that
   created the signature. This URI MUST be present.

   It is up to the recipient of the signature to obtain the public key
   of the signing key and determine its validity status using the
   OpenPGP trust model discussed in [RFC4880].

   The contents of Figure 20 lie inside the encrypted region of the
   Confirm message (Figure 10) or the SASrelay message (Figure 16).

   The total length of all the material in Figure 20, including the key
   server URI, must not exceed 511 32-bit words (2044 octets). This
   length, in words, is stored in the signature length field in the
   Confirm or SASrelay message containing the signature. It is
   desirable to avoid UDP fragmentation, so the URI should be kept
   short.
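
   The 511-word limit above can be checked before a signature is
   placed in a Confirm or SASrelay message. The sketch below is
   illustrative: the function name is hypothetical, and rounding a
   partial final word up (that is, assuming the block is padded to a
   32-bit boundary) is an assumption of this example rather than
   something the text above states.

```python
MAX_SIGNATURE_WORDS = 511  # limit stated above (2044 octets)


def signature_length_words(sig_type_block: bytes, sig_material: bytes) -> int:
    """Return the value for the signature length field, in 32-bit words.

    sig_type_block -- the 4-octet Signature Type Block, e.g. b"PGP "
    sig_material   -- everything that follows it in the signature block
    """
    if len(sig_type_block) != 4:
        raise ValueError("the Signature Type Block is exactly 4 octets")
    total_octets = len(sig_type_block) + len(sig_material)
    # Assumption: a partial trailing word is padded, so round up.
    words = (total_octets + 3) // 4
    if words > MAX_SIGNATURE_WORDS:
        raise ValueError("signature block exceeds 511 words; do not send it")
    return words
```

   An endpoint that also wants to avoid UDP fragmentation would need a
   much tighter, path-dependent bound than this protocol maximum.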
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Signature Type Block = "PGP " (1 word)             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                       OpenPGP signature                       |
   |                       (variable length)                       |
   |                             . . .                             |
   |                                                               |
   +===============================================================+

                   Figure 20: OpenPGP Signature format

7.2.2.  NSA Suite B Signatures with X.509v3 Certs

   If the SAS Signature Type (Section 5.1.7) is "X509", the NSA Suite B
   signature-related fields are arranged as follows.

   The first field after the 4-octet Signature Type Block is the DER
   encoded X.509v3 certificate (the signed public key) of the ECDSA
   signing key that created the signature. The format of this
   certificate is specified by the NSA's Suite B Certificate and CRL
   Profile [RFC5759].

   Following the X.509v3 certificate at the next word boundary is the
   ECDSA signature itself. The size of this field depends on the size
   and type of the public key in the aforementioned certificate. The
   format of this signature and the algorithms that create it are
   specified by [FIPS-186-3]. The signature consists of the ECDSA
   signature output parameters (r, s) in binary form, concatenated, in
   network byte order, with no truncation of leading zeros. The first
   half of the signature is r and the second half is s. If ECDSA P-256
   is specified, the signature fills 16 words (64 octets), 32 octets
   each for r and s. If ECDSA P-384 is specified, the signature fills
   24 words (96 octets), 48 octets each for r and s.

   It is up to the recipient of the signature to use information in the
   certificate and path discovery mechanisms to trace the chain back to
   the root CA.
   It is recommended that end user certificates issued for
   secure telephony should contain appropriate path discovery links to
   facilitate this.

   Figure 21 shows a certificate and an NSA Suite B ECDSA signature.
   All this material lies inside the encrypted region of the Confirm
   message (Figure 10) or the SASrelay message (Figure 16).

   The total length of all the material in Figure 21, including the
   X.509v3 certificate, must not exceed 511 32-bit words (2044 octets).
   This length, in words, is stored in the signature length field in the
   Confirm or SASrelay message containing the signature. It is
   desirable to avoid UDP fragmentation, so the certificate material
   should be kept to a much smaller size than this. End user certs
   issued for this purpose should minimize the size of extraneous
   material such as legal notices.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Signature Type Block = "X509" (1 word)             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |               Signing key's X.509v3 certificate               |
   |                       (variable length)                       |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                ECDSA P-256 or P-384 signature                 |
   |                    (16 words or 24 words)                     |
   |                             . . .                             |
   |                                                               |
   +===============================================================+

             Figure 21: X.509v3 NSA Suite B Signature format

7.2.3.  Signing the SAS without a PKI

   It's not strictly necessary to use a PKI to back the public key that
   signs the SAS. For example, it is possible to use a self-signed
   X.509v3 certificate, or an OpenPGP key that is not signed by any
   other key. In this scenario, the same key continuity technique used
   by SSH [RFC4251] may be used.
   The public key is cached locally the
   first time it is encountered, and when the same public key is
   encountered again in subsequent sessions, it's deemed not to be a
   MiTM attack. If there is no MiTM attack in the first session, there
   cannot be a MiTM attack in any subsequent session. This is exactly
   how SSH does it.

   Of course, the security rests on the assumption that the MiTM did not
   attack in the first session. That assumption seems to work most of
   the time in the SSH world. The user would have to be warned the
   first time a public key is encountered, just as in SSH. If possible,
   the SAS should be checked before the user consents to caching the new
   public key. If the SAS matches in the first session, there is no
   MiTM, and it's safe to cache the public key. If no SAS comparison is
   possible, it's up to the user, or up to the application, to decide
   whether to take a leap of faith and proceed. That's how SSH works
   most of the time, because SSH users don't have the chance to verbally
   compare an SAS with anyone.

7.3.  Relaying the SAS through a PBX

   ZRTP is designed to use end-to-end encryption. The two parties'
   verbal comparison of the short authentication string (SAS) depends on
   this assumption. But in some PBX environments, such as Asterisk,
   there are usage scenarios that have the PBX acting as a trusted
   man-in-the-middle (MiTM), which means there are two back-to-back ZRTP
   connections with separate session keys and separate SAS's.

   For example, imagine that Bob has a ZRTP-enabled VoIP phone that has
   been registered with his company's PBX, so that it is regarded as an
   extension of the PBX. Alice, whose phone is not associated with the
   PBX, might dial the PBX from the outside, and a ZRTP connection is
   negotiated between her phone and the PBX. She then selects Bob's
   extension from the company directory in the PBX.
   The PBX makes a
   call to Bob's phone (which might be offsite, many miles away from the
   PBX through the Internet) and a separate ZRTP connection is
   negotiated between the PBX and Bob's phone. The two ZRTP sessions
   have different session keys and different SAS's, which would render
   the SAS useless for verbal comparison between Alice and Bob. They
   might even mistakenly believe that a wiretapper is present because of
   the SAS mismatch, causing undue alarm.

   ZRTP has a mechanism for solving this problem by having the PBX relay
   the Alice/PBX SAS to Bob, sending it through to Bob in a special
   SASrelay message as defined in Section 5.13, which is sent after the
   PBX/Bob ZRTP negotiation is complete, after the Confirm messages.
   Only the PBX, acting as a special trusted MiTM (trusted by the
   recipient of the SASrelay message), will relay the SAS. The SASrelay
   message protects the relayed SAS from tampering via an included MAC,
   similar to how the Confirm message is protected. Bob's ZRTP-enabled
   phone accepts the relayed SAS for rendering only because Bob's phone
   had previously been configured to trust the PBX. This special
   trusted relationship with the PBX can be established through a
   special security enrollment procedure. After that enrollment
   procedure, the PBX is treated by Bob as a special trusted MiTM. This
   results in Alice's SAS being rendered to Bob, so that Alice and Bob
   may verbally compare them and thus prevent a MiTM attack by any other
   untrusted MiTM.

   A real bad-guy MiTM cannot exploit this protocol feature to mount a
   MiTM attack and relay Alice's SAS to Bob, because Bob has not
   previously carried out a special registration ritual with the bad
   guy. The relayed SAS would not be rendered by Bob's phone, because
   it did not come from a trusted PBX.
   The recognition of the special
   trust relationship is achieved with the prior establishment of a
   special shared secret between Bob and his PBX, which is called
   pbxsecret (defined in Section 7.3.1), also known as the trusted MiTM
   key.

   The trusted MiTM key can be stored in a special cache at the time of
   the initial enrollment (which is carried out only once for Bob's
   phone), and Bob's phone associates this key with the ZID of the PBX,
   while the PBX associates it with the ZID of Bob's phone. After the
   enrollment has established and stored this trusted MiTM key, it can
   be detected during subsequent ZRTP session negotiations between the
   PBX and Bob's phone, because the PBX and the phone MUST pass the hash
   of the trusted MiTM key in the DH message. It is then used as part
   of the key agreement to calculate s0.

   The PBX can determine whether it is trusted by the ZRTP user agent of
   a phone. The presence of a shared trusted MiTM key in the key
   negotiation sequence indicates that the phone has been enrolled with
   this PBX and therefore trusts it to act as a trusted MiTM. During a
   key agreement with two other ZRTP endpoints, the PBX may have a
   shared trusted MiTM key with both endpoints, only one endpoint, or
   neither endpoint. If the PBX has a shared trusted MiTM key with
   neither endpoint, the PBX MUST NOT relay the SAS. If the PBX has a
   shared trusted MiTM key with only one endpoint, the PBX MUST relay
   the SAS from one party to the other by sending an SASrelay message to
   the endpoint with which it shares a trusted MiTM key. If the PBX has
   a shared trusted MiTM key with both endpoints, the PBX MUST relay the
   SAS to only one endpoint, not both endpoints.

      Note: In the case of sharing a trusted MiTM key with both
      endpoints, it does not matter which endpoint receives the relayed
      SAS as long as only one endpoint receives it.
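
   The three relay cases above reduce to a small decision function.
   The following is a sketch under stated assumptions: the names
   RelayTarget and sasrelay_target are hypothetical, the two endpoints
   are simply labelled A and B, and preferring endpoint A when both
   share a trusted MiTM key is an arbitrary tie-break chosen for the
   example (the note above says either choice is acceptable).

```python
from enum import Enum


class RelayTarget(Enum):
    """Which endpoint, if any, receives the relayed SAS."""
    NONE = "none"
    A = "a"
    B = "b"


def sasrelay_target(shares_key_with_a: bool, shares_key_with_b: bool) -> RelayTarget:
    """Pick the endpoint that receives the relayed SAS.

    Each argument says whether the PBX shares a trusted MiTM key
    (pbxsecret) with that endpoint.
    """
    if shares_key_with_a and shares_key_with_b:
        # MUST relay to only one endpoint; choosing A is an
        # assumption of this sketch, not a protocol requirement.
        return RelayTarget.A
    if shares_key_with_a:
        return RelayTarget.A
    if shares_key_with_b:
        return RelayTarget.B
    # No trusted MiTM key with either side: MUST NOT relay the SAS.
    return RelayTarget.NONE
```

   The function only decides where the relayed SAS goes; building and
   protecting the SASrelay message itself is outside this sketch.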
   The relayed SAS fields contain the SAS rendering type and the
   complete sashash. The receiver absolutely MUST NOT render the
   relayed SAS if it does not come from a specially trusted ZRTP
   endpoint. The security of the ZRTP protocol depends on not rendering
   a relayed SAS from an untrusted MiTM, because it may be relayed by a
   MiTM attacker. See the SASrelay message definition (Figure 16) for
   further details.

   To ensure that both Alice and Bob will use the same SAS rendering
   scheme after the keys are negotiated, the PBX also sends the SASrelay
   message to the unenrolled party (which does not regard this PBX as a
   trusted MiTM), conveying the SAS rendering scheme, but not the
   sashash, which it sets to zero. The unenrolled party will ignore the
   relayed SAS field, but will use the specified SAS rendering scheme.

   It is possible to route a call through two ZRTP-enabled PBXs using
   this scheme. Assume Alice is a ZRTP endpoint who trusts her local
   PBX in Atlanta, and Bob is a ZRTP endpoint who trusts his local PBX
   in Biloxi. The call is routed from Alice to the Atlanta PBX to the
   Biloxi PBX to Bob. Atlanta would relay the Atlanta-Biloxi SAS to
   Alice because Alice is enrolled with Atlanta, and Biloxi would relay
   the Atlanta-Biloxi SAS to Bob because Bob is enrolled with Biloxi.
   The two PBXs are not assumed to be enrolled with each other in this
   example. Both Alice and Bob would view and verbally compare the same
   relayed SAS, the Atlanta-Biloxi SAS. No more than two trusted MiTM
   nodes can be traversed with this relaying scheme.

   A ZRTP endpoint phone which trusts a PBX to act as a trusted MiTM is
   effectively delegating its own policy decisions of algorithm
   negotiation to the PBX.
   When a PBX is between two ZRTP endpoints and is terminating their
   media streams at the PBX, the PBX presents its own ZID to the two
   parties, eclipsing the ZIDs of the two parties from each other. For
   example, if several different calls are routed through such a PBX to
   several different ZRTP-enabled phones behind the PBX, only a single
   ZID is presented to the calling party in every case -- the ZID of the
   PBX itself.

   The next section describes the initial enrollment procedure that
   establishes a special shared secret, a trusted MiTM key, between a
   PBX and a phone, so that the phone will learn to recognize the PBX as
   a trusted MiTM.

7.3.1.  PBX Enrollment and the PBX Enrollment Flag

   Both the PBX and the endpoint need to know when enrollment is taking
   place. One way of doing this is to set up an enrollment extension on
   the PBX, which a newly configured endpoint would call to establish a
   ZRTP session. The PBX would then play audio media that offers the
   user an opportunity to configure his phone to trust this PBX as a
   trusted MiTM. The PBX calculates and stores the trusted MiTM shared
   secret in its cache and associates it with this phone, indexed by the
   phone's ZID. The trusted MiTM PBX shared secret is derived from
   ZRTPSess via the ZRTP key derivation function (Section 4.5.1) in this
   manner:

      pbxsecret = KDF(ZRTPSess, "Trusted MiTM key", (ZIDi || ZIDr), 256)

   The pbxsecret is calculated for the whole ZRTP session, not for each
   stream within a session, thus the KDF Context field in this case does
   not include any stream-specific nonce material.

   The PBX signals the enrollment process by setting the PBX Enrollment
   flag (E) in the Confirm message (Figure 10). This flag is used to
   trigger the ZRTP endpoint's user interface to ask the user whether
   they want to trust this PBX and calculate and store the pbxsecret in
   the cache.
If the user decides to respond by activating the appropriate 3852 user interface element (a menu item, checkbox, or button), his ZRTP 3853 user agent calculates pbxsecret using the same formula and saves it 3854 in a special cache entry associated with this PBX. 3856 During a PBX enrollment, the GoClear features are disabled. If the 3857 (E) flag is set by the PBX, the PBX MUST NOT set the Allow Clear (A) 3858 flag. Thus, (E) implies not (A). If a received Confirm message has 3859 the (E) flag set, the (A) flag MUST be disregarded and treated as 3860 false. 3862 If the user elects not to enroll, perhaps because he dialed a wrong 3863 number or does not yet feel comfortable with this PBX, he can simply 3864 hang up and not save the pbxsecret in his cache. The PBX will have 3865 it saved in the PBX cache, but that will do no harm. The SASrelay 3866 scheme does not depend on the PBX trusting the phone. It only 3867 depends on the phone trusting the PBX. It is the phone (the user) 3868 who is at risk if the PBX abuses its MiTM privileges. 3870 An endpoint MUST NOT store the pbxsecret in the cache without 3871 explicit user authorization. 3873 After this enrollment process, the PBX and the ZRTP-enabled phone 3874 both share a secret that enables the phone to recognize the PBX as a 3875 trusted MiTM in future calls. This means that when a future call 3876 from an outside ZRTP-enabled caller is relayed through the PBX to 3877 this phone, the phone will render a relayed SAS from the PBX. If the 3878 SASrelay message comes from a MiTM which does not know the pbxsecret, 3879 the phone treats it as a "bad guy" MiTM, and refuses to render the 3880 relayed SAS. Regardless of which party initiates any future phone 3881 calls through the PBX, the enrolled phone or the outside phone, the 3882 PBX will relay the SAS to the enrolled phone. 3884 There are other ways that ZRTP user agents can be configured to trust 3885 a PBX. 
Perhaps the pbxsecret can be configured into the phone by 3886 some automated provisioning process in large IT environments. This 3887 specification does not require that products be configured solely by 3888 this enrollment process. Any process that results in a pbxsecret 3889 being computed and shared between the PBX and the phone will suffice. 3890 The enrollment procedure described here is one such method that has been shown to work. 3892 8. Signaling Interactions 3894 This section discusses how ZRTP, SIP, and SDP work together. 3896 Note that ZRTP may be implemented without coupling with the SIP 3897 signaling. For example, ZRTP can be implemented as a "bump in the 3898 wire" or as a "bump in the stack" in which RTP sent by the SIP UA is 3899 converted to ZRTP. In these cases, the SIP UA will have no knowledge 3900 of ZRTP. As a result, the signaling path discovery mechanisms 3901 introduced in this section should not be treated as definitive; they are only a 3902 hint. Despite the absence of an indication of ZRTP support in an 3903 offer or answer, a ZRTP endpoint SHOULD still send Hello messages. 3905 ZRTP endpoints which have control over the signaling path include a 3906 ZRTP SDP attribute in their SDP offers and answers. The ZRTP 3907 attribute, a=zrtp-hash, is used to indicate support for ZRTP and to 3908 convey a hash of the Hello message. The hash is computed according 3909 to Section 8.1. 3911 Aside from the advantages described in Section 8.1, there are a 3912 number of potential uses for this attribute. It is useful when 3913 signaling elements would like to know when ZRTP may be utilized by 3914 endpoints. It is also useful if endpoints support multiple methods 3915 of SRTP key management. The ZRTP attribute can be used to ensure 3916 that these key management approaches work together instead of against 3917 each other. For example, if only one endpoint supports ZRTP but both 3918 support another method to key SRTP, then the other method will be 3919 used instead.
When used in parallel, an SRTP secret carried in an 3920 a=key-mgmt [RFC4567] or a=crypto [RFC4568] attribute can be used as a 3921 shared secret for the srtps computation defined in Section 8.2. The 3922 ZRTP attribute is also used to signal to an intermediary ZRTP device 3923 not to act as a ZRTP endpoint, as discussed in Section 10. 3925 The a=zrtp-hash attribute can only be included in the SDP at the 3926 media level since Hello messages sent in different media streams will 3927 have unique hashes. 3929 The ABNF for the ZRTP attribute is as follows: 3931 zrtp-attribute = "a=zrtp-hash:" zrtp-version zrtp-hash-value 3933 zrtp-version = token 3935 zrtp-hash-value = 1*(HEXDIG) 3937 Here's an example of the ZRTP attribute in an initial SDP offer or 3938 answer used at the media level, using the <allOneLine> convention 3939 defined in RFC 4475, Section 2.1 [RFC4475]: 3941 v=0 3942 o=bob 2890844527 2890844527 IN IP4 client.biloxi.example.com 3943 s= 3944 c=IN IP4 client.biloxi.example.com 3945 t=0 0 3946 m=audio 3456 RTP/AVP 97 33 3947 a=rtpmap:97 iLBC/8000 3948 a=rtpmap:33 no-op/8000 3949 <allOneLine> 3950 a=zrtp-hash:1.10 fe30efd02423cb054e50efd0248742ac7a52c8f91bc2 3951 df881ae642c371ba46df 3952 </allOneLine> 3954 A mechanism for carrying this same zrtp-hash information in the 3955 Jingle signaling protocol is defined in [XEP-0262]. 3957 It should be safe to send ZRTP messages even when there is no 3958 evidence in the signaling that the other party supports it, because 3959 ZRTP has been designed to be clearly different from RTP, having a 3960 similar structure to STUN packets sent during an ICE exchange. 3962 8.1. Binding the media stream to the signaling layer via the Hello Hash 3964 Tying the media stream to the signaling channel can help prevent a 3965 third party from inserting false media packets. If the signaling 3966 layer contains information that ties it to the media stream, false 3967 media streams can be rejected.
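As an illustration of how a receiver might extract that information from the signaling, the following minimal sketch parses one a=zrtp-hash line according to the ABNF in Section 8. The function name parse_zrtp_hash is hypothetical, the single space between version and hash value is taken from the SDP example rather than from the ABNF, and the token rule is simplified; none of this is part of the specification.

```python
import re

# Mirrors the ABNF: "a=zrtp-hash:" zrtp-version zrtp-hash-value.
# Assumption: one space separates version and hash, as in the SDP example.
# Assumption: "token" is simplified to "no whitespace and no colon".
_ZRTP_HASH_RE = re.compile(
    r"^a=zrtp-hash:(?P<version>[^\s:]+) (?P<hash>[0-9A-Fa-f]+)$"
)


def parse_zrtp_hash(sdp_line):
    """Return (version, hash_bytes) for an a=zrtp-hash line, else None."""
    match = _ZRTP_HASH_RE.match(sdp_line.strip())
    if match is None:
        return None
    hex_value = match.group("hash")
    if len(hex_value) % 2:
        # An odd number of hex digits cannot encode whole octets.
        return None
    return match.group("version"), bytes.fromhex(hex_value)
```

The line passed in is expected to be already joined, that is, with the line folding used in the printed example removed.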
3969 To accomplish this, the entire Hello message (Figure 3) is hashed, 3970 using the hash algorithm defined in Section 5.1.2.2. The ZRTP packet 3971 framing from Figure 2 is not included in the hash. The resulting 3972 hash image is made available without truncation to the signaling 3973 layer, where it is transmitted as a hexadecimal value in the SIP 3974 channel using the SDP attribute a=zrtp-hash, defined in this 3975 specification. Assuming Section 5.1.2.2 defines a 256-bit hash 3976 length, the a=zrtp-hash field in the SDP attribute carries 64 3977 hexadecimal digits. Each media stream (audio or video) will have a 3978 separate Hello message, and thus will require a separate a=zrtp-hash 3979 in an SDP attribute. The recipient of the SIP/SDP message can then 3980 use this hash image to detect and reject false Hello messages in the 3981 media channel, as well as identify which media stream is associated 3982 with this SIP call. Each Hello message hashes uniquely, because it 3983 contains the H3 field derived from a random nonce, defined in 3984 Section 9. 3986 The Hello Hash as an SDP attribute is not a REQUIRED feature, because 3987 some ZRTP endpoints do not have the ability to add SDP attributes to 3988 the signaling. For example, if ZRTP is implemented in a hardware 3989 bump-in-the-wire device, it might only have the ability to modify the 3990 media packets, not the SIP packets, especially if the SIP packets are 3991 integrity protected and thus cannot be modified on the wire. If the 3992 SDP has no hash image of the ZRTP Hello message, the recipient's ZRTP 3993 user agent cannot check it, and thus will not be able to reject Hello 3994 messages based on this hash.
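The Hello Hash procedure described above can be sketched as follows. This is only an illustration under stated assumptions: SHA-256 stands in for the hash algorithm of Section 5.1.2.2 (the text above only says it is assumed to be 256 bits long), the function names are hypothetical, and the caller is assumed to pass the Hello message bytes with the ZRTP packet framing already removed.

```python
import hashlib
import hmac


def hello_hash_attribute(hello_message, version="1.10"):
    """Build the a=zrtp-hash SDP line for one media stream's Hello message.

    hello_message: Hello message bytes without the ZRTP packet framing.
    SHA-256 is an assumption standing in for the Section 5.1.2.2 hash.
    """
    digest = hashlib.sha256(hello_message).hexdigest()  # 64 hex digits, untruncated
    return "a=zrtp-hash:%s %s" % (version, digest)


def hello_matches_signaling(hello_message, signaled_hex):
    """True if a Hello received in the media path matches the signaled hash."""
    computed = hashlib.sha256(hello_message).digest()
    try:
        signaled = bytes.fromhex(signaled_hex)
    except ValueError:
        # Not valid hexadecimal, so it cannot match any Hello message.
        return False
    return hmac.compare_digest(computed, signaled)
```

An endpoint that finds no a=zrtp-hash attribute in the SDP simply has nothing to pass to the second function and skips the check, as the preceding paragraph explains.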
3996 After the Hello Hash is used to properly identify the ZRTP Hello 3997 message as belonging to this particular SIP call, the rest of the 3998 ZRTP message sequence is protected from false packet injection by 3999 other protection mechanisms, such as the hash chaining mechanism 4000 defined in Section 9. 4002 An attacker who controls only the signaling layer, such as an 4003 uncooperative VoIP service provider, may be able to deny service by 4004 corrupting the hash of the Hello message in the SDP attribute, which 4005 would force ZRTP to reject perfectly good Hello messages. If there 4006 is reason to believe this is happening, the ZRTP endpoint MAY allow 4007 Hello messages to be accepted that do not match the hash image in the 4008 SDP attribute. 4010 Even in the absence of SIP integrity protection, the inclusion of the 4011 a=zrtp-hash SDP attribute, when coupled with the hash chaining 4012 mechanism defined in Section 9, meets the R-ASSOC requirement in the 4013 Media Security Requirements [RFC5479], which requires: 4015 "...a mechanism for associating key management messages with both 4016 the signaling traffic that initiated the session and with 4017 protected media traffic. Allowing such an association also allows 4018 the SDP offerer to avoid performing CPU-consuming operations 4019 (e.g., Diffie-Hellman or public key operations) with attackers 4020 that have not seen the signaling messages." 4022 The a=zrtp-hash SDP attribute becomes especially useful if the SDP is 4023 integrity-protected end-to-end by SIP Identity [RFC4474] or better 4024 still, Dan Wing's SIP Identity using Media Path 4025 [I-D.wing-sip-identity-media]. This leads to an ability to stop MiTM 4026 attacks independent of ZRTP's SAS mechanism, as explained in 4027 Section 8.1.1 below. 4029 8.1.1. 
Integrity-protected signaling enables integrity-protected DH 4030 exchange 4032 If and only if the signaling path and the SDP are protected by some 4033 form of end-to-end integrity protection, such as one of the 4034 above-mentioned mechanisms, so that the 4035 a=zrtp-hash attribute is delivered without any tampering by a third party, and if 4036 there is good reason to trust the signaling layer to protect the 4037 interests of the end user, it is possible to authenticate the key 4038 exchange and prevent a MiTM attack. This can be done without 4039 requiring the users to verbally compare the SAS, by using the hash 4040 chaining mechanism defined in Section 9 to provide a series of MAC 4041 keys that protect the entire ZRTP key exchange. Thus, an end-to-end 4042 integrity-protected signaling layer automatically enables an 4043 integrity-protected Diffie-Hellman exchange in ZRTP, which in turn 4044 means immunity from a MiTM attack. Here's how it works. 4046 The integrity-protected SIP SDP contains a hash commitment to the 4047 entire Hello message. The Hello message contains H3, which provides 4048 a hash commitment for the rest of the hash chain H0-H2 (Section 9). 4049 The Hello message is protected by a 64-bit MAC, keyed by H2. The 4050 Commit message is protected by a 64-bit MAC keyed by H1. The DHPart1 4051 or DHPart2 messages are protected by a 64-bit MAC keyed by H0. The 4052 MACs protecting the Confirm messages are computed with a different MAC 4053 key derived from the resulting key agreement. Each message's MAC is 4054 checked when the MAC key is received in the next message. If a bad 4055 MAC is discovered, it MUST be treated as a security exception 4056 indicating a MiTM attack, perhaps by logging or alerting the user, 4057 and MUST NOT be treated as a random error. Random errors are already 4058 discovered and quietly rejected by bad CRCs (Figure 2).
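The deferred MAC check described above (a message's MAC can only be verified once the next message reveals the key) might be sketched as follows. The sketch assumes HMAC-SHA-256 as the MAC algorithm of Section 5.1.2.2 and assumes truncation keeps the leftmost 64 bits; the class and function names are hypothetical, and exactly which message bytes the MAC covers is left to the caller.

```python
import hashlib
import hmac


def mac64(key, message_body):
    """HMAC-SHA-256 (assumed Section 5.1.2.2 MAC), leftmost 64 bits."""
    return hmac.new(key, message_body, hashlib.sha256).digest()[:8]


class DeferredMacCheck:
    """Holds a message whose MAC key has not been revealed yet."""

    def __init__(self, message_body, received_mac):
        self.message_body = message_body
        self.received_mac = received_mac

    def verify(self, revealed_key):
        """Check the stored MAC once the next message discloses the key."""
        expected = mac64(revealed_key, self.message_body)
        return hmac.compare_digest(expected, self.received_mac)
```

For example, a receiver would create a DeferredMacCheck when the Hello message arrives and call verify() with H2 once the Commit message delivers it. A False result corresponds to the security exception described above, not to a random transmission error.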
4060 The Hello message must be assembled before any hash algorithms are 4061 negotiated, so an implicit predetermined hash algorithm and MAC 4062 algorithm (both defined in Section 5.1.2.2) must be used. All of the 4063 aforementioned MACs keyed by the hashes in the aforementioned hash 4064 chain MUST be computed with the MAC algorithm defined in 4065 Section 5.1.2.2, with the MAC truncated to 64 bits. 4067 The Media Security Requirements [RFC5479] R-EXISTING requirement can 4068 be fully met by leveraging a certificate-backed PKI in the signaling 4069 layer to integrity-protect the delivery of the a=zrtp-hash SDP 4070 attribute. This would thereby protect ZRTP against a MiTM attack, 4071 without requiring the user to check the SAS, without adding any 4072 explicit signatures or signature keys to the ZRTP key exchange, and 4073 without any extra public key operations or extra packets. 4075 Without an end-to-end integrity protection mechanism in the signaling 4076 layer to guarantee delivery of the a=zrtp-hash SDP attribute without 4077 modification by a third party, these MACs alone will not prevent a 4078 MiTM attack. In that case, ZRTP's built-in SAS mechanism will still 4079 have to be used to authenticate the key exchange. At the time of 4080 this writing, very few deployed VoIP clients offer a fully 4081 implemented SIP stack that provides end-to-end integrity protection 4082 for the delivery of SDP attributes. Also, end-to-end signaling 4083 integrity becomes more problematic if E.164 numbers [RFC3824] are 4084 used in SIP. Thus, real-world implementations of ZRTP endpoints will 4085 continue to depend on SAS authentication for quite some time. Even 4086 after there is widespread availability of SIP user agents that offer 4087 integrity protected delivery of SDP attributes, many users will still 4088 be faced with the fact that the signaling path may be controlled by 4089 institutions that do not have the best interests of the end user in 4090 mind. 
In those cases, SAS authentication will remain the gold 4091 standard for the prudent user. 4093 Even without SIP integrity protection, the Media Security 4094 Requirements [RFC5479] R-ACT-ACT requirement can be met by ZRTP's SAS 4095 mechanism. Although ZRTP may benefit from an integrity-protected SIP 4096 layer, it is fortunate that ZRTP's self-contained MiTM defenses do 4097 not actually require an integrity-protected SIP layer. ZRTP can 4098 bypass the delays and problems that SIP integrity faces, such as 4099 E.164 number usage, and the complexity of building and maintaining a 4100 PKI. 4102 In contrast, DTLS-SRTP [RFC5764] appears to depend heavily on end-to- 4103 end integrity protection in the SIP layer. Further, DTLS-SRTP must 4104 bear the additional cost of a signature calculation of its own, in 4105 addition to the signature calculation the SIP layer uses to achieve 4106 its integrity protection. ZRTP needs no signature calculation of its 4107 own to leverage the signature calculation carried out in the SIP 4108 layer. 4110 8.2. Deriving the SRTP secret (srtps) from the signaling layer 4112 The shared secret calculations defined in Section 4.3 make use of the 4113 SRTP secret (srtps), if it is provided by the signaling layer. 4115 It is desirable for only one SRTP key negotiation protocol to be 4116 used, and that protocol should be ZRTP. But in the event the 4117 signaling layer negotiates its own SRTP master key and salt, using 4118 the SDP Security Descriptions (SDES [RFC4568]) or [RFC4567], it can 4119 be passed from the signaling to the ZRTP layer and mixed into ZRTP's 4120 own shared secret calculations, without compromising security by 4121 creating a dependency on the signaling for media encryption. 
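The mixing of a signaling-supplied SRTP key into ZRTP relies on the key derivation function of Section 4.5.1, which lies outside this section. The sketch below assumes that the KDF has the form HMAC(KI, i || Label || 0x00 || Context || L) with i fixed at 1, that i and L are 32-bit big-endian integers, and that the HMAC is HMAC-SHA-256; these details should be checked against Section 4.5.1 before being relied upon. The function names are hypothetical.

```python
import hashlib
import hmac
import struct


def zrtp_kdf(ki, label, context, length_bits):
    """Assumed form of the Section 4.5.1 KDF.

    HMAC(KI, i || Label || 0x00 || Context || L), with i = 1 and L encoded
    as 32-bit big-endian integers, truncated to the leftmost length_bits.
    """
    if length_bits % 8 or not 0 < length_bits <= 256:
        raise ValueError("unsupported output length")
    data = (struct.pack(">I", 1) + label + b"\x00" + context
            + struct.pack(">I", length_bits))
    full = hmac.new(ki, data, hashlib.sha256).digest()
    return full[: length_bits // 8]


def derive_srtps(master_key, master_salt, zid_i, zid_r):
    """srtps from the SRTP master key and salt, with ZIDi || ZIDr || salt as context."""
    context = zid_i + zid_r + master_salt
    return zrtp_kdf(master_key, b"SRTP Secret", context, 256)
```

Under the same assumptions, the pbxsecret of Section 7.3.1 would use the same zrtp_kdf function with ZRTPSess as the key, the label "Trusted MiTM key", and ZIDi || ZIDr as the context.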
4123 ZRTP computes srtps from the SRTP master key and salt parameters 4124 provided by the signaling layer in this manner, truncating the result 4125 to 256 bits: 4127 srtps = KDF(SRTP master key, "SRTP Secret", (ZIDi || ZIDr || SRTP 4128 master salt), 256) 4130 It is expected that the srtps parameter will rarely be computed or 4131 used in typical ZRTP endpoints, because it is likely and desirable 4132 that ZRTP will be the sole means of negotiating SRTP keys, needing no 4133 help from [RFC4568] or [RFC4567]. If srtps is computed, it will be 4134 stored in the auxiliary shared secret auxsecret, defined in 4135 Section 4.3, and used in Section 4.3.1. 4137 8.3. Codec Selection for Secure Media 4139 Codec selection is negotiated in the signaling layer. If the 4140 signaling layer determines that ZRTP is supported by both endpoints, 4141 this should provide guidance in codec selection to avoid variable 4142 bit-rate (VBR) codecs that leak information. 4144 When voice is compressed with a VBR codec, the packet lengths vary 4145 depending on the types of sounds being compressed. This leaks a lot 4146 of information about the content even if the packets are encrypted, 4147 regardless of what encryption protocol is used [Wright1]. It is 4148 RECOMMENDED that VBR codecs be avoided in encrypted calls. It is not 4149 a problem if the codec adapts the bit rate to the available channel 4150 bandwidth. The vulnerable codecs are the ones that change their bit 4151 rate depending on the type of sound being compressed. 4153 It also appears that voice activity detection (VAD) leaks information 4154 about the content of the conversation, but to a lesser extent than 4155 VBR. This effect can be mitigated by lengthening the VAD hangover 4156 time by a random amount between 1 and 2 seconds, if this is feasible 4157 in your application. Only short bursts of speech would benefit from 4158 lengthening the VAD hangover time.
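The VAD mitigation above amounts to adding a random delay to the hangover timer. A minimal sketch follows, assuming the application exposes its hangover time in seconds and that the delay is drawn uniformly; the function name is hypothetical, and the uniform distribution is an assumption, since the text only gives the range.

```python
import random

# OS-backed randomness, so the added delay is hard for an observer to predict.
_rng = random.SystemRandom()


def padded_vad_hangover(base_hangover_s):
    """Return the VAD hangover time lengthened by a random 1 to 2 seconds."""
    return base_hangover_s + _rng.uniform(1.0, 2.0)
```

A fresh value would be drawn for each speech burst, so that the padding does not become a constant offset that an observer can subtract.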
4160 The security problems of VBR and VAD are addressed in detail by the 4161 guidelines in [I-D.perkins-avt-srtp-vbr-audio]. It is RECOMMENDED 4162 that ZRTP endpoints follow these guidelines. 4164 9. False ZRTP Packet Rejection 4166 An attacker who is not in the media path may attempt to inject false 4167 ZRTP protocol packets, possibly to effect a denial of service attack, 4168 or to inject his own media stream into the call. VoIP by its nature 4169 invites various forms of denial of service attacks and requires 4170 protocol features to reject such attacks. While bogus SRTP packets 4171 may be easily rejected via the SRTP auth tag field, that can only be 4172 applied after a key agreement is completed. During the ZRTP key 4173 negotiation phase, other false packet rejection mechanisms are 4174 needed. One such mechanism is the use of the total_hash in the final 4175 shared secret calculation, but that can only detect false packets 4176 after performing the computationally expensive Diffie-Hellman 4177 calculation. 4179 A lot of work has been done on the analysis of denial of service 4180 attacks, especially from attackers who are not in the media path. 4181 Such an attacker might inject false ZRTP packets to force a ZRTP 4182 endpoint to engage in an endless series of pointless and expensive DH 4183 calculations. To detect and reject false packets cheaply and rapidly 4184 as soon as they are received, ZRTP uses a hash chain, which is a 4185 series of successive hash images. Before each session, the following 4186 values are computed: 4188 H0 = 256-bit random nonce (different for each party) 4189 H1 = hash (H0) 4190 H2 = hash (H1) 4191 H3 = hash (H2) 4193 The hash chain MUST use the hash algorithm defined in 4194 Section 5.1.2.2, truncated to 256 bits. Each 256-bit hash image is 4195 the preimage of the next, and the sequence of images is sent in 4196 reverse order in the ZRTP packet sequence. 
The hash image H3 is sent 4197 in the Hello message, H2 is sent in the Commit message, H1 is sent in 4198 the DHPart1 or DHPart2 messages, and H0 is sent in the Confirm1 or 4199 Confirm2 messages. The initial random H0 nonces that each party 4200 generates MUST be unpredictable to an attacker and unique within a 4201 ZRTP session, which thereby forces the derived hash images H1-H3 to 4202 also be unique and unpredictable. 4204 The recipient checks if the packet has the correct hash preimage, by 4205 hashing it and comparing the result with the hash image for the 4206 preceding packet. Packets which contain an incorrect hash preimage 4207 MUST NOT be used by the recipient, but MAY be processed as security 4208 exceptions, perhaps by logging or alerting the user. As long as 4209 these bogus packets are not used, and correct packets are still being 4210 received, the protocol SHOULD be allowed to run to completion, 4211 thereby rendering ineffective this denial of service attack. 4213 Note that since H2 is sent in the Commit message, and the initiator 4214 does not receive a Commit message, the initiator computes the 4215 responder's missing H2 by hashing the responder's H1. An analogous 4216 interpolation is performed by both parties to handle the skipped 4217 DHPart1 and DHPart2 messages in Preshared (Section 3.1.2) or 4218 Multistream (Section 3.1.3) modes. 4220 Because these hash images alone do not protect the rest of the 4221 contents of the packet they reside in, this scheme assumes the 4222 attacker cannot modify the packet contents from a legitimate party, 4223 which is a reasonable assumption for an attacker who is not in the 4224 media path. This covers an important range of denial-of-service 4225 attacks. For dealing with the remaining set of attacks that involve 4226 packet modification, other mechanisms are used, such as the 4227 total_hash in the final shared secret calculation, and the hash 4228 commitment in the Commit message. 
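The hash chain construction and the preimage check described in this section can be sketched as follows. SHA-256 is assumed for the hash algorithm of Section 5.1.2.2 (its output is already 256 bits, so no truncation is needed), and the function names are hypothetical.

```python
import hashlib
import hmac
import os


def _h(data):
    # SHA-256 is an assumption standing in for the Section 5.1.2.2 hash.
    return hashlib.sha256(data).digest()


def make_hash_chain():
    """Return (H0, H1, H2, H3); H0 is a fresh 256-bit random nonce."""
    h0 = os.urandom(32)
    h1 = _h(h0)
    h2 = _h(h1)
    h3 = _h(h2)
    return h0, h1, h2, h3


def preimage_is_valid(revealed, previous_image, steps=1):
    """Hash `revealed` `steps` times and compare with the image seen earlier.

    steps=2 covers the case where an intermediate image was never received,
    such as the initiator checking the responder's H1 against the
    responder's H3 because no Commit (and thus no H2) came from the responder.
    """
    value = revealed
    for _ in range(steps):
        value = _h(value)
    return hmac.compare_digest(value, previous_image)
```

A packet for which preimage_is_valid() returns False is simply not used; the exchange continues with whatever correct packets arrive, which is what defeats the injection attack.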
4230 Hello messages injected by an attacker may be detected and rejected 4231 by the inclusion of a hash of the Hello message in the signaling, as 4232 described in Section 8. This mechanism requires that each Hello 4233 message be unique, and the inclusion of the H3 hash image meets that 4234 requirement. 4236 If and only if an integrity-protected signaling channel is available, 4237 the MACs that are keyed by this hash chaining scheme can be used to 4238 authenticate the entire ZRTP key exchange, and thereby prevent a MiTM 4239 attack, without relying on the users verbally comparing the SAS. See 4240 Section 8.1.1 for details. 4242 Some ZRTP user agents allow the user to manually switch to clear mode 4243 (via the GoClear message) in the middle of a secure call, and then 4244 later initiate secure mode again. Many consumer client products will 4245 omit this feature, but those that allow it may return to secure mode 4246 again in the same media stream. Although the same chain of hash 4247 images will be re-used and thus rendered ineffective the second time, 4248 no real harm is done because the new SRTP session keys will be 4249 derived in part from a cached shared secret, which was safely 4250 protected from the MiTM in the previous DH exchange earlier in the 4251 same session. 4253 10. Intermediary ZRTP Devices 4255 This section discusses the operation of a ZRTP endpoint which is 4256 actually an intermediary. For example, consider a device which 4257 proxies both signaling and media between endpoints. There are three 4258 possible ways in which such a device could support ZRTP. 4260 An intermediary device can act transparently to the ZRTP protocol. 4261 To do this, a device MUST pass non-RTP protocols multiplexed on the 4262 same port as RTP (to allow ZRTP and STUN). This is the RECOMMENDED 4263 behavior for intermediaries as ZRTP and SRTP are best when done end- 4264 to-end. 
4266 An intermediary device could implement the ZRTP protocol and act as a 4267 ZRTP endpoint on behalf of non-ZRTP endpoints behind the intermediary 4268 device. The intermediary could determine on a call-by-call basis 4269 whether the endpoint behind it supports ZRTP based on the presence or 4270 absence of the ZRTP SDP attribute flag (a=zrtp-hash). For non-ZRTP 4271 endpoints, the intermediary device could act as the ZRTP endpoint 4272 using its own ZID and cache. This approach SHOULD only be used when 4273 there is some other security method protecting the confidentiality of 4274 the media between the intermediary and the inside endpoint, such as 4275 IPsec or physical security. 4277 The third mode, which is NOT RECOMMENDED, is for the intermediary 4278 device to attempt to back-to-back the ZRTP protocol. The only 4279 exception to this case is where the intermediary device is a trusted 4280 element providing services to one of the endpoints, e.g., a Private 4281 Branch Exchange (PBX). In this mode, the intermediary would attempt 4282 to act as a ZRTP endpoint towards both endpoints of the media 4283 session. This approach MUST NOT be used except as described in 4284 Section 7.3 as it will always result in a detected man-in-the-middle 4285 attack and will generate alarms on both endpoints and likely result 4286 in the immediate termination of the session. The PBX MUST use a 4287 single ZID for all endpoints behind it. 4289 In cases where centralized media mixing is taking place, the SAS will 4290 not match when compared by the humans. This situation can sometimes 4291 be known in the SIP signaling by the presence of the isfocus feature 4292 tag [RFC4579]. As a result, when the isfocus feature tag is present, 4293 the DH exchange can be authenticated by the mechanism defined in 4294 Section 8.1.1 or by validating signatures (Section 7.2) in the 4295 Confirm or SASrelay messages.
For example, consider an audio 4296 conference call with three participants Alice, Bob, and Carol hosted 4297 on a conference bridge in Dallas. There will be three ZRTP encrypted 4298 media streams, one encrypted stream between each participant and 4299 Dallas. Each will have a different SAS. Each participant will be 4300 able to validate their SAS with the conference bridge by using 4301 signatures optionally present in the Confirm messages (described in 4302 Section 7.2). Or, if the signaling path has end-to-end integrity 4303 protection, each DH exchange will have automatic MiTM protection by 4304 using the mechanism in Section 8.1.1. 4306 SIP feature tags can also be used to detect if a session is 4307 established with an automaton such as an IVR, voicemail system, or 4308 speech recognition system. The display of SAS strings to users 4309 should be disabled in these cases. 4311 It is possible that an intermediary device acting as a ZRTP endpoint 4312 might still receive ZRTP Hello and other messages from the inside 4313 endpoint. This could occur if there is another inline ZRTP device 4314 which does not include the ZRTP SDP attribute flag. An intermediary 4315 acting as a ZRTP endpoint receiving ZRTP Hello and other messages 4316 from the inside endpoint MUST NOT pass these ZRTP messages. 4318 11. The ZRTP Disclosure flag 4320 There are no back doors defined in the ZRTP protocol specification. 4321 The designers of ZRTP would like to discourage back doors in ZRTP- 4322 enabled products. However, despite the lack of back doors in the 4323 actual ZRTP protocol, it must be recognized that a ZRTP implementer 4324 might still deliberately create a rogue ZRTP-enabled product that 4325 implements a back door outside the scope of the ZRTP protocol. For 4326 example, they could create a product that discloses the SRTP session 4327 key generated using ZRTP out-of-band to a third party. They may even 4328 have a legitimate business reason to do this for some customers. 
4330 For example, some environments have a need to monitor or record 4331 calls, such as stock brokerage houses who want to discourage insider 4332 trading, or special high security environments with special needs to 4333 monitor their own phone calls. We've all experienced automated 4334 messages telling us that "This call may be monitored for quality 4335 assurance". A ZRTP endpoint in such an environment might 4336 unilaterally disclose the session key to someone monitoring the call. 4337 ZRTP-enabled products that perform such out-of-band disclosures of 4338 the session key can undermine public confidence in the ZRTP protocol, 4339 unless we do everything we can in the protocol to alert the other 4340 user that this is happening. 4342 If one of the parties is using a product that is designed to disclose 4343 their session key, ZRTP requires them to confess this fact to the 4344 other party through a protocol message to the other party's ZRTP 4345 client, which can properly alert that user, perhaps by rendering it 4346 in a graphical user interface. The disclosing party does this by 4347 sending a Disclosure flag (D) in Confirm1 and Confirm2 messages as 4348 described in Section 5.7. 4350 Note that the intention here is to have the Disclosure flag identify 4351 products that are designed to disclose their session keys, not to 4352 identify which particular calls are compromised on a call-by-call 4353 basis. This is an important legal distinction, because most 4354 government sanctioned wiretap regulations require a VoIP service 4355 provider to not reveal which particular calls are wiretapped. But 4356 there is nothing illegal about revealing that a product is designed 4357 to be wiretap-friendly. The ZRTP protocol mandates that such a 4358 product "out" itself. 
4360 You might be using a ZRTP-enabled product with no back doors, but if 4361 your own graphical user interface tells you the call is (mostly) 4362 secure, except that the other party is using a product that is 4363 designed in such a way that it may have disclosed the session key for 4364 monitoring purposes, you might ask him what brand of secure telephone 4365 he is using, and make a mental note not to purchase that brand 4366 yourself. If we create a protocol environment that requires such 4367 back-doored phones to confess their nature, word will spread quickly, 4368 and the "invisible hand" of the free market will act. The free 4369 market has effectively dealt with this in the past. 4371 Of course, a ZRTP implementer can lie about his product having a back 4372 door, but the ZRTP standard mandates that ZRTP-compliant products 4373 MUST adhere to the requirement that a back door be confessed by 4374 sending the Disclosure flag to the other party. 4376 There will be inevitable comparisons to Steve Bellovin's 2003 April 4377 Fools' Day joke, when he submitted RFC 3514 [RFC3514] which defined the 4378 "Evil bit" in the IPv4 header, for packets with "evil intent". But 4379 we submit that a similar idea can actually have some merit for 4380 securing VoIP. Sure, one can always imagine that some implementer 4381 will not be fazed by the rules and will lie, but they would have lied 4382 anyway even without the Disclosure flag. There are good reasons to 4383 believe that it will improve the overall percentage of 4384 implementations that at least tell us if they put a back door in 4385 their products, and may even get some of them to decide not to put in 4386 a back door at all. From a civic hygiene perspective, we are better 4387 off with having the Disclosure flag in the protocol. 4389 If an endpoint stores or logs SRTP keys or information that can be 4390 used to reconstruct or recover SRTP keys after they are no longer in 4391 use (i.e.
the session is no longer active), or otherwise discloses or passes 4392 SRTP keys or information that can be used to reconstruct or recover 4393 SRTP keys to another application or device, the Disclosure flag D 4394 MUST be set in the Confirm1 or Confirm2 message. 4396 11.1. Guidelines on Proper Implementation of the Disclosure Flag 4398 Some implementers have asked for guidance on implementing the 4399 Disclosure Flag. Some people have incorrectly thought that a 4400 connection secured with ZRTP cannot be used in a call center, with 4401 voluntary voice recording, or even with a voicemail system. 4402 Similarly, some potential users of ZRTP have overestimated the 4403 protection that ZRTP can give them. These guidelines clarify both 4404 concerns. 4406 The ZRTP Disclosure Flag only governs the ZRTP/SRTP stream itself. 4407 It does not govern the underlying RTP media stream, nor the actual 4408 media itself. Consequently, a PBX that uses ZRTP may provide 4409 conference calls, call monitoring, call recording, voicemail, or 4410 other PBX features and still say that it does not disclose the ZRTP 4411 key material. A video system may provide DVR features and still say 4412 that it does not disclose the ZRTP key material. The ZRTP Disclosure 4413 Flag, when not set, means only that the ZRTP cryptographic key 4414 material stays within the bounds of the ZRTP subsystem. 4416 If an application has a need to disclose the ZRTP cryptographic key 4417 material, the easiest way to comply with the protocol is to set the 4418 flag to the proper value. The next easiest way is to overestimate 4419 disclosure. For example, a call center that commonly records calls 4420 might choose to set the disclosure flag even though all recording is 4421 an analog recording of a call (and thus outside the ZRTP scope) 4422 because it sets an expectation with clients that their calls might be 4423 recorded.
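The rule for when the Disclosure flag is mandatory can be expressed as a small predicate. The sketch below is only an illustration: the KeyHandlingPolicy type and its field names are hypothetical, and it captures only the distinction drawn in this section between key material and the media itself.

```python
from dataclasses import dataclass


@dataclass
class KeyHandlingPolicy:
    """Hypothetical description of how a product treats SRTP key material."""
    retains_keys_after_use: bool = False   # stores or logs keys or key-recovery data
    exports_keys_elsewhere: bool = False   # passes keys to another application or device
    records_media: bool = False            # call recording, voicemail, DVR features


def disclosure_flag_required(policy):
    """True when the Disclosure flag (D) MUST be set in Confirm1 or Confirm2."""
    # Media recording is deliberately ignored: the flag governs key material only.
    return policy.retains_keys_after_use or policy.exports_keys_elsewhere
```

A product for which this predicate is False remains free to set the flag anyway, as in the call center example above, where overestimating disclosure sets the right expectation with callers.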
4425 Note also that the ZRTP Disclosure Flag does not require an 4426 implementation to preclude hacking or malware. Malware that leaks 4427 ZRTP cryptographic key material does not create a liability for the 4428 implementor from non-compliance with the ZRTP specification. 4430 A user of ZRTP should note that ZRTP is not a panacea against 4431 unauthorized recording. ZRTP does not and cannot protect against an 4432 untrustworthy partner who holds a microphone up to the speaker. It 4433 does not protect against someone else being in the room. It does not 4434 protect against analog wiretaps in the phone or in the room. It does 4435 not mean your partner has not been hacked with spyware. It does not 4436 mean that the software has no flaws. It means that the ZRTP 4437 subsystem is not knowingly leaking ZRTP cryptographic key material. 4439 12. Mapping between ZID and AOR (SIP URI) 4441 The role of the ZID in the management of the local cache of shared 4442 secrets is explained in Section 4.9. A particular ZID is associated 4443 with a particular ZRTP endpoint, typically a VoIP client. A single 4444 SIP URI (also known as an Address-of-Record, or AOR) may be hosted on 4445 several different soft VoIP clients, desktop phones, and mobile 4446 handsets, and each of them will have a different ZID. Further, a 4447 single VoIP client may have several SIP URIs configured into its 4448 profiles, but only one ZID. There is not a one-to-one mapping 4449 between a ZID and a SIP URI. A single SIP URI may be associated with 4450 several ZIDs, and a single ZID may be associated with several SIP 4451 URIs on the same client. 4453 Not only that, but ZRTP is independent of which signaling protocol is 4454 used. It works equally well with SIP, Jingle, H.323, or P2P-SIP. 4455 Thus, a ZRTP ZID has little to do with SIP, per se, which means it 4456 has little to do with a SIP URI. 
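The many-to-many relationship between ZIDs and SIP URIs can be sketched as a pair of indexes. This is an illustrative sketch only: the ZidDirectory class, its method names, and the sample ZID byte strings and URIs are hypothetical and are not part of the protocol.

```python
from collections import defaultdict


class ZidDirectory:
    """Hypothetical record of which SIP URIs have been seen with which ZIDs."""

    def __init__(self):
        self._uris_by_zid = defaultdict(set)
        self._zids_by_uri = defaultdict(set)

    def observe(self, zid: bytes, sip_uri: str) -> None:
        """Record that an endpoint with this ZID was reached under this URI."""
        self._uris_by_zid[zid].add(sip_uri)
        self._zids_by_uri[sip_uri].add(zid)

    def uris_for(self, zid: bytes) -> set:
        """All SIP URIs seen for one endpoint (one ZID, several URIs)."""
        return set(self._uris_by_zid.get(zid, ()))

    def zids_for(self, sip_uri: str) -> set:
        """All endpoints seen for one SIP URI (one URI, several ZIDs)."""
        return set(self._zids_by_uri.get(sip_uri, ()))
```

Keeping the two indexes separate reflects the point made above: neither a ZID nor a SIP URI can serve as a unique key for the other.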
4458 Even though a ZID is associated with a device, not a human, it is 4459 often the case that a ZRTP endpoint is controlled mainly by a 4460 particular human. For example, it may be a mobile phone. To get the 4461 full benefit of the key continuity features, a local cache entry (and 4462 thus a ZID) should be associated with some sort of name of the remote 4463 party. That name could be a human name, or it could be made more 4464 precise by specifying which ZRTP endpoint he's using. For example 4465 "Jon Callas", or "Jon Callas on his iPhone", or "Jon on his iPad", or 4466 "Alice on her office phone". These name strings can be stored in the 4467 local cache, indexed by ZID, and may have been initially provided by 4468 the local user by hand. Or the local cache entry may contain a 4469 pointer to an entry in the local address book. When a secure session 4470 is established, if a prior session has established a cache entry, and 4471 the new session has a matching cache entry indexed by the same ZID, 4472 and the SAS has been previously verified, the person's name stored in 4473 that cache entry should be displayed. 4475 If the remote ZID originates from a PBX, the displayed name would be 4476 the name of that PBX, which might be the name of the company who owns 4477 that PBX. 4479 If it is desirable to associate some key material with a particular 4480 AOR, digital signatures (Section 7.2) may be used, with public key 4481 certificates that associate the signature key with an AOR. 4483 13. IANA Considerations 4485 This specification defines a new SDP [RFC4566] attribute in 4486 Section 8. 4488 Contact name: Philip Zimmermann 4490 Attribute name: "zrtp-hash". 4492 Type of attribute: Media level. 4494 Subject to charset: Not. 4496 Purpose of attribute: The 'zrtp-hash' indicates that a UA supports 4497 the ZRTP protocol and provides a hash of the 4498 ZRTP Hello message. The ZRTP protocol 4499 version number is also specified. 4501 Allowed attribute values: Hex. 
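As an illustration of the attribute registered above, the following sketch builds and checks a zrtp-hash line. It rests on assumptions that should be verified against Section 8: that the attribute value is the protocol version, a space, and the hex hash, and that the hash is SHA-256 over the Hello message bytes. The function names and the sample Hello bytes are hypothetical.

```python
import hashlib
import hmac
import re

# Assumed syntax: "a=zrtp-hash:<version> <hex digits>"
_ZRTP_HASH_RE = re.compile(r"^a=zrtp-hash:(\S+) ([0-9A-Fa-f]+)$")


def format_zrtp_hash(version: str, hello_message: bytes) -> str:
    """Build the SDP attribute line for a given Hello message."""
    digest = hashlib.sha256(hello_message).hexdigest()  # SHA-256 is an assumption
    return f"a=zrtp-hash:{version} {digest}"


def parse_zrtp_hash(line: str):
    """Return (version, hash_bytes) or raise ValueError on malformed input."""
    match = _ZRTP_HASH_RE.match(line.strip())
    if match is None:
        raise ValueError("not a zrtp-hash attribute line")
    # bytes.fromhex raises ValueError for an odd number of hex digits.
    return match.group(1), bytes.fromhex(match.group(2))


def hello_matches(line: str, hello_message: bytes) -> bool:
    """Check a received Hello message against the hash carried in signaling."""
    _version, expected = parse_zrtp_hash(line)
    actual = hashlib.sha256(hello_message).digest()
    return hmac.compare_digest(actual, expected)
```

A receiver would call hello_matches with the attribute line taken from the SDP and the Hello message received on the media path; a False result indicates that the Hello message does not correspond to the one announced in signaling.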
4503 14. Media Security Requirements 4505 This section discusses how ZRTP meets all RTP security requirements 4506 discussed in the Media Security Requirements [RFC5479] document 4507 without any dependencies on other protocols or extensions, unlike 4508 DTLS-SRTP [RFC5764] which requires additional protocols and 4509 mechanisms. 4511 R-FORK-RETARGET is met since ZRTP is a media path key agreement 4512 protocol. 4514 R-DISTINCT is met since ZRTP uses ZIDs and allows multiple 4515 independent ZRTP exchanges to proceed. 4517 R-HERFP is met since ZRTP is a media path key agreement protocol. 4519 R-REUSE is met using the Multistream and Preshared modes. 4521 R-AVOID-CLIPPING is met since ZRTP is a media path key agreement 4522 protocol. 4524 R-RTP-CHECK is met since the ZRTP packet format does not pass the 4525 RTP validity check. 4527 R-ASSOC is met using the a=zrtp-hash SDP attribute in INVITEs and 4528 responses (Section 8.1). 4530 R-NEGOTIATE is met using the Commit message. 4532 R-PSTN is met since ZRTP can be implemented in Gateways. 4534 R-PFS is met using ZRTP Diffie-Hellman key agreement methods. 4536 R-COMPUTE is met using the Hello/Commit ZRTP exchange. 4538 R-CERTS is met using the verbal comparison of the SAS. 4540 R-FIPS is met since ZRTP uses only FIPS-approved algorithms in all 4541 relevant categories. The authors believe ZRTP is compliant with 4542 NIST SP 800-56A [SP800-56A], NIST SP 800-108 [SP800-108], NIST 4543 FIPS PUB 198-1 [FIPS-198-1], NIST FIPS PUB 180-3 [FIPS-180-3], 4544 NIST SP 800-38A [SP800-38A], NIST FIPS PUB 197 [FIPS-197], and NSA 4545 Suite B [NSA-Suite-B], which should meet the FIPS-140 validation 4546 requirements set by NIST FIPS PUB 140-2 Annex A 4547 [FIPS-140-2-Annex-A] and NIST FIPS PUB 140-2 Annex D 4548 [FIPS-140-2-Annex-D]. 4550 R-DOS is met since ZRTP does not introduce any new denial of 4551 service attacks. 4553 R-EXISTING is met since ZRTP can support the use of certificates 4554 or keys.
4556 R-AGILITY is met since the set of hash, cipher, authentication tag 4557 length, key agreement method, SAS type, and signature type can all 4558 be extended and negotiated. 4560 R-DOWNGRADE is met since ZRTP has protection against downgrade 4561 attacks. 4563 R-PASS-MEDIA is met since ZRTP prevents a passive adversary with 4564 access to the media path from gaining access to keying material 4565 used to protect SRTP media packets. 4567 R-PASS-SIG is met since ZRTP prevents a passive adversary with 4568 access to the signaling path from gaining access to keying 4569 material used to protect SRTP media packets. 4571 R-SIG-MEDIA is met using the a=zrtp-hash SDP attribute in INVITEs 4572 and responses. 4574 R-ID-BINDING is met using the a=zrtp-hash SDP attribute 4575 (Section 8.1). 4577 R-ACT-ACT is met using the a=zrtp-hash SDP attribute in INVITEs 4578 and responses. 4580 R-BEST-SECURE is met since ZRTP utilizes the RTP/AVP profile and 4581 hence best effort SRTP in every case. 4583 R-OTHER-SIGNALING is met since ZRTP can utilize modes in which 4584 there is no dependency on the signaling path. 4586 R-RECORDING is met using the ZRTP Disclosure flag. 4588 R-TRANSCODER is met if the transcoder operates as a trusted MitM 4589 (i.e. a PBX). 4591 R-ALLOW-RTP is met due to ZRTP's best effort encryption. 4593 15. Security Considerations 4595 This document is all about securely keying SRTP sessions. As such, 4596 security is discussed in every section. 4598 Most secure phones rely on a Diffie-Hellman exchange to agree on a 4599 common session key. But since DH is susceptible to a man-in-the- 4600 middle (MiTM) attack, it is common practice to provide a way to 4601 authenticate the DH exchange. In some military systems, this is done 4602 by depending on digital signatures backed by a centrally-managed PKI. 4603 A decade of industry experience has shown that deploying centrally 4604 managed PKIs can be a painful and often futile experience. 
PKIs are 4605 just too messy, and require too much activation energy to get them 4606 started. Setting up a PKI requires somebody to run it, which is not 4607 practical for an equipment provider. A service provider like a 4608 carrier might venture down this path, but even then you have to deal 4609 with cross-carrier authentication, certificate revocation lists, and 4610 other complexities. It is much simpler to avoid PKIs altogether, 4611 especially when developing secure commercial products. It is 4612 therefore more common for commercial secure phones in the PSTN world 4613 to augment the DH exchange with a Short Authentication String (SAS) 4614 combined with a hash commitment at the start of the key exchange, to 4615 shorten the length of SAS material that must be read aloud. No PKI 4616 is required for this approach to authenticating the DH exchange. The 4617 AT&T TSD 3600, Eric Blossom's COMSEC secure phones [comsec], PGPfone 4618 [pgpfone], and CryptoPhone [cryptophone] are all examples of products 4619 that took this simpler lightweight approach. The main problem with 4620 this approach is inattentive users who may not execute the voice 4621 authentication procedure. 4623 Some questions have been raised about voice spoofing during the SAS 4624 comparison. But it is a mistake to think this is simply an exercise 4625 in voice impersonation (perhaps this could be called the "Rich 4626 Little" attack). Although there are digital signal processing 4627 techniques for changing a person's voice, that does not mean a man- 4628 in-the-middle attacker can safely break into a phone conversation and 4629 inject his own short authentication string (SAS) at just the right 4630 moment. He doesn't know exactly when or in what manner the users 4631 will choose to read aloud the SAS, or in what context they will bring 4632 it up or say it, or even which of the two speakers will say it, or if 4633 indeed they both will say it. 
In addition, some methods of rendering 4634 the SAS involve using a list of words such as the PGP word 4635 list [Juola2], in a manner analogous to how pilots use the NATO 4636 phonetic alphabet to convey information. This can make it even more 4637 complicated for the attacker, because these words can be worked into 4638 the conversation in unpredictable ways. Remember that the attacker 4639 places a very high value on not being detected, and if he makes a 4640 mistake, he doesn't get to do it over. Some people have raised the 4641 question that even if the attacker lacks voice impersonation 4642 capabilities, it may be unsafe for people who don't know each other's 4643 voices to depend on the SAS procedure. This is not as much of a 4644 problem as it seems, because it isn't necessary that they recognize 4645 each other by their voice; it is only necessary that they detect that 4646 the voice used for the SAS procedure matches the voice in the rest of 4647 the phone conversation. 4649 Special consideration must be given to secure phone calls with 4650 automated systems that cannot perform a verbal SAS comparison between 4651 two humans. If a well-functioning PKI is available to all parties, 4652 it is recommended that credentials be provisioned at the automated 4653 system sufficient to use one of the automatic MiTM detection 4654 mechanisms from Section 8.1.1 or Section 7.2. However, those 4655 optional PKI-dependent mechanisms may be avoided if the automated 4656 system (e.g. a voice mail system) is hosted in a PBX that has 4657 previously established a cached shared secret with the caller 4658 (pbxsecret or rs1 or both), backed by a human-executed SAS comparison 4659 during an initial call. In other words, the ZRTP endpoint is 4660 manned during an initial session for the SAS comparison, and unmanned in 4661 a subsequent voice mail session.
Note that it is worse than useless 4662 and absolutely unsafe to rely on a robot voice from the remote 4663 endpoint to compare the SAS, because a robot voice can be easily 4664 forged by a MiTM. However, a robot voice may be safe to use strictly 4665 locally for a different purpose. A ZRTP user agent may render its 4666 locally-computed SAS to the local user via a robot voice if no visual 4667 display is available, provided the user can readily determine that 4668 the robot voice is generated locally, not from the remote endpoint. 4670 A popular and field-proven approach to MiTM protection is used by SSH 4671 (Secure Shell) [RFC4251], which Peter Gutmann likes to call the "baby 4672 duck" security model. SSH establishes a relationship by exchanging 4673 public keys in the initial session, when we assume no attacker is 4674 present, and this makes it possible to authenticate all subsequent 4675 sessions. A successful MiTM attacker has to have been present in all 4676 sessions all the way back to the first one, which is assumed to be 4677 difficult for the attacker. ZRTP's key continuity features are 4678 actually better than SSH, at least for VoIP, for reasons described in 4679 Section 15.1. All this is accomplished without resorting to a 4680 centrally-managed PKI. 4682 We use an analogous baby duck security model to authenticate the DH 4683 exchange in ZRTP. We don't need to exchange persistent public keys, 4684 we can simply cache a shared secret and re-use it to authenticate a 4685 long series of DH exchanges for secure phone calls over a long period 4686 of time. If we read aloud just one SAS, and then cache a shared 4687 secret for later calls to use for authentication, no new voice 4688 authentication rituals need to be executed. We just have to remember 4689 we did one already. 4691 If one party ever loses this cached shared secret, it is no longer 4692 available for authentication of DH exchanges. 
This cache mismatch 4693 situation is easy to detect by the party that still has a surviving 4694 shared secret cache entry. If it fails to match, either there is a 4695 MiTM attack or one side has lost their shared secret cache entry. 4696 The user agent that discovers the cache mismatch must alert the user 4697 that a cache mismatch has been detected, and that he must do a verbal 4698 comparison of the SAS to distinguish if the mismatch is because of a 4699 MiTM attack or because of the other party losing her cache (normative 4700 language is in Section 4.3.2). Voice confirmation is absolutely 4701 essential in this situation. From that point on, the two parties 4702 start over with a new cached shared secret. Then they can go back to 4703 omitting the voice authentication on later calls. 4705 Precautions must be observed when using a trusted MiTM device such as 4706 a trusted PBX, as described in Section 7.3. Make sure you really 4707 trust that this PBX will never be compromised before establishing it 4708 as a trusted MiTM, because it is in a position to wiretap calls for 4709 any phone that trusts it. It is "licensed" to be in a position to 4710 wiretap. You are safer to try to arrange the connection topology to 4711 route the media directly between the two ZRTP peers, not through a 4712 trusted PBX. Real end-to-end encryption is preferred. 4714 The security of the SAS mechanism depends on the user verifying it 4715 verbally with his peer at the other endpoint. There is some risk the 4716 user will not be so diligent, and may ignore the SAS. For a 4717 discussion on how users become habituated to security warnings in the 4718 PKI certificate world, see [Sunshine]. Part of the problems 4719 discussed in that paper are from the habituation syndrome common to 4720 most warning messages, and part of them are from the fact that users 4721 simply don't understand trust models. 
Fortunately, ZRTP doesn't need 4722 a trust model to use the SAS mechanism, so it's easier for the user 4723 to grasp the idea of comparing the SAS verbally with the other party. 4724 Easier than understanding a trust model, at least. Also, the verbal 4725 comparison of the SAS gets both users involved, and they will notice 4726 a mismatch of the SAS. Also, the ZRTP user agent will know when the 4727 SAS has been previously verified because of the SAS verified flag (V) 4728 (Section 7.1), and only ask the user to verify it when needed. After 4729 it has been verified once, the key continuity features make it 4730 unnecessary to verify it again. 4732 15.1. Self-healing Key Continuity Feature 4734 The key continuity features of ZRTP are analogous to those provided 4735 by SSH (Secure Shell) [RFC4251], but they differ in one respect. SSH 4736 caches public signature keys that never change, and uses a permanent 4737 private signature key that must be guarded from disclosure. If 4738 someone steals your SSH private signature key, they can impersonate 4739 you in all future sessions and can mount a successful MiTM attack any 4740 time they want. 4742 ZRTP caches symmetric key material used to compute secret session 4743 keys, and these values change with each session. If someone steals 4744 your ZRTP shared secret cache, they only get one chance to mount a 4745 MiTM attack, in the very next session. If they miss that chance, the 4746 retained shared secret is refreshed with a new value, and the window 4747 of vulnerability heals itself, which means they are locked out of any 4748 future opportunities to mount a MiTM attack. This gives ZRTP a 4749 "self-healing" feature if any cached key material is compromised. 4751 A MiTM attacker must always be in the media path. This presents a 4752 significant operational burden for the attacker in many VoIP usage 4753 scenarios, because being in the media path for every call is often 4754 harder than being in the signaling path. 
This will likely create 4755 coverage gaps in the attacker's opportunities to mount a MiTM attack. 4756 ZRTP's self-healing key continuity features are better than SSH at 4757 exploiting any temporary gaps in MiTM attack opportunities. Thus, 4758 ZRTP quickly recovers from any disclosure of cached key material. 4760 In systems that use a persistent private signature key, such as SSH, 4761 the stored signature key is usually protected from disclosure by 4762 encryption that requires a user-supplied high-entropy passphrase. 4763 This arrangement may be acceptable for a diligent user with a desktop 4764 computer sitting in an office with a full ASCII keyboard. But it 4765 would be prohibitively inconvenient and unsafe to type a high-entropy 4766 passphrase on a mobile phone's numeric keypad while driving a car. 4767 Users will reject any scheme that requires the use of a passphrase on 4768 such a platform. This means mobile phones carry an elevated risk of 4769 compromise of stored key material, and thus would especially benefit 4770 from the self-healing aspects of ZRTP's key continuity features. 4772 The infamous Debian OpenSSL weak key vulnerability [dsa-1571] 4773 (discovered and patched in May 2008) offers a real-world example of 4774 why ZRTP's self-healing scheme is a good way to do key continuity. 4775 The Debian bug resulted in the production of a lot of weak SSH (and 4776 TLS/SSL) keys, which continued to compromise security even after the 4777 bug had been patched. In contrast, ZRTP's key continuity scheme adds 4778 new entropy to the cached key material with every call, so old 4779 deficiencies in entropy are washed away with each new session.
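The self-healing property can be illustrated with a toy model. The update rule below is an assumption made for illustration and is not the key derivation defined by this document: it simply mixes a fresh per-session secret (standing in for the Diffie-Hellman result) with the previously cached secret. The function names are hypothetical.

```python
import hashlib
import hmac
import os


def next_retained_secret(old_secret: bytes, fresh_dh_result: bytes) -> bytes:
    """Toy update rule (an assumption, not this document's KDF).

    The new cached secret depends on both the old cached secret and the
    fresh per-session secret, so new entropy enters on every call.
    """
    s0 = hashlib.sha256(fresh_dh_result + old_secret).digest()
    return hmac.new(s0, b"retained secret", hashlib.sha256).digest()


def simulate_missed_session() -> bool:
    """Return True if an attacker holding only a stolen cache is locked out."""
    old_secret = os.urandom(32)
    stolen_copy = old_secret        # attacker steals the cache entry
    dh_result = os.urandom(32)      # stand-in for the next session's DH result
    refreshed = next_retained_secret(old_secret, dh_result)
    # The attacker was not in the media path, so it never learns dh_result
    # and can only guess at the refreshed value.
    attacker_guess = next_retained_secret(stolen_copy, os.urandom(32))
    return not hmac.compare_digest(refreshed, attacker_guess)
```

In this model, an attacker who misses a single session can no longer derive the cached secret, and a weak starting secret is replaced by one that depends on the fresh per-session value.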
4781 It should be noted that the addition of shared secret entropy from 4782 previous sessions can extend the strength of the new session key to 4783 AES-256 levels, even if the new session uses Diffie-Hellman keys no 4784 larger than DH-3072 or ECDH-256, provided the cached shared secrets 4785 were initially established when the wiretapper was not present. This 4786 is why AES-256 MAY be used with the smaller DH key sizes in 4787 Section 5.1.5, despite the key strength comparisons in Table 2 of 4788 [SP800-57-Part1]. 4790 Caching shared symmetric key material is also less CPU intensive 4791 compared with using digital signatures, which may be important for 4792 low-power mobile platforms. 4794 Unlike the long-lived non-updated key material used by SSH, the 4795 dynamically updated shared secrets of ZRTP may lose sync if 4796 traditional backup/restore mechanisms are used. This limitation is a 4797 consequence of the otherwise beneficial aspects of this approach to 4798 key continuity, and it is partially mitigated by ZRTP's built-in 4799 cache backup logic (Section 4.6.1). 4801 16. Acknowledgments 4803 The authors would like to thank Bryce "Zooko" Wilcox-O'Hearn and 4804 Colin Plumb for their contributions to the design of this protocol, 4805 and to thank Hal Finney, Viktor Krikun, Werner Dittmann, Dan Wing, 4806 Sagar Pai, David McGrew, Colin Perkins, Dan Harkins, David Black, Tim 4807 Polk, Richard Harris, Roni Even, Jon Peterson, and Robert Sparks for 4808 their helpful comments and suggestions. And thanks to Lily Chen at 4809 NIST for her assistance in ensuring compliance with NIST SP800-56A 4810 and SP800-108. 4812 The use of hash chains to key HMACs in ZRTP is similar to Adrian 4813 Perrig's TESLA protocol [TESLA]. 4815 17. References 4817 17.1. Normative References 4819 [RFC2104] Krawczyk, H., Bellare, M., and R. Canetti, "HMAC: Keyed- 4820 Hashing for Message Authentication", RFC 2104, 4821 February 1997. 
4823 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 4824 Requirement Levels", BCP 14, RFC 2119, March 1997. 4826 [RFC3526] Kivinen, T. and M. Kojo, "More Modular Exponential (MODP) 4827 Diffie-Hellman groups for Internet Key Exchange (IKE)", 4828 RFC 3526, May 2003. 4830 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 4831 Jacobson, "RTP: A Transport Protocol for Real-Time 4832 Applications", STD 64, RFC 3550, July 2003. 4834 [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. 4835 Norrman, "The Secure Real-time Transport Protocol (SRTP)", 4836 RFC 3711, March 2004. 4838 [RFC4231] Nystrom, M., "Identifiers and Test Vectors for HMAC-SHA- 4839 224, HMAC-SHA-256, HMAC-SHA-384, and HMAC-SHA-512", 4840 RFC 4231, December 2005. 4842 [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session 4843 Description Protocol", RFC 4566, July 2006. 4845 [RFC4880] Callas, J., Donnerhacke, L., Finney, H., Shaw, D., and R. 4846 Thayer, "OpenPGP Message Format", RFC 4880, November 2007. 4848 [RFC4960] Stewart, R., "Stream Control Transmission Protocol", 4849 RFC 4960, September 2007. 4851 [RFC5114] Lepinski, M. and S. Kent, "Additional Diffie-Hellman 4852 Groups for Use with IETF Standards", RFC 5114, 4853 January 2008. 4855 [RFC5479] Wing, D., Fries, S., Tschofenig, H., and F. Audet, 4856 "Requirements and Analysis of Media Security Management 4857 Protocols", RFC 5479, April 2009. 4859 [RFC5759] Solinas, J. and L. Zieglar, "Suite B Certificate and 4860 Certificate Revocation List (CRL) Profile", RFC 5759, 4861 January 2010. 4863 [I-D.ietf-avt-srtp-big-aes] 4864 McGrew, D., "The use of AES-192 and AES-256 in Secure 4865 RTP", 4866 http://tools.ietf.org/html/draft-ietf-avt-srtp-big-aes . 4868 [I-D.jivsov-openpgp-ecc] 4869 Jivsov, A., "ECC in OpenPGP", 4870 http://tools.ietf.org/html/draft-jivsov-openpgp-ecc . 
4872 [FIPS-140-2-Annex-A] 4873 "Annex A: Approved Security Functions for FIPS PUB 140-2", 4874 NIST FIPS PUB 140-2 Annex A October 2008. 4876 [FIPS-140-2-Annex-D] 4877 "Annex D: Approved Key Establishment Techniques for FIPS 4878 PUB 140-2", NIST FIPS PUB 140-2 Annex D January 2008. 4880 [FIPS-180-3] 4881 "Secure Hash Standard (SHS)", NIST FIPS PUB 180-3 October 4882 2008. 4884 [FIPS-186-3] 4885 "Digital Signature Standard (DSS)", NIST FIPS PUB 186- 4886 3 June 2009. 4888 [FIPS-197] 4889 "Advanced Encryption Standard (AES)", NIST FIPS PUB 4890 197 November 2001. 4892 [FIPS-198-1] 4893 "The Keyed-Hash Message Authentication Code (HMAC)", NIST 4894 FIPS PUB 198-1 July 2008. 4896 [SP800-38A] 4897 Dworkin, M., "Recommendation for Block Cipher Modes of 4898 Operation", NIST Special Publication 800-38A 2001 Edition. 4900 [SP800-56A] 4901 Barker, E., Johnson, D., and M. Smid, "Recommendation for 4902 Pair-Wise Key Establishment Schemes Using Discrete 4903 Logarithm Cryptography", NIST Special Publication 800- 4904 56A Revision 1, March 2007. 4906 [SP800-90] 4907 Barker, E. and J. Kelsey, "Recommendation for Random 4908 Number Generation Using Deterministic Random Bit 4909 Generators", NIST Special Publication 800-90 (Revised) 4910 March 2007. 4912 [SP800-108] 4913 Chen, L., "Recommendation for Key Derivation Using 4914 Pseudorandom Functions", NIST Special Publication 800- 4915 108 October 2009. 4917 [NSA-Suite-B] 4918 "NSA Suite B Cryptography", NSA Information Assurance 4919 Directorate NSA Suite B Cryptography. 4921 [NSA-Suite-B-Guide-56A] 4922 "Suite B Implementer's Guide to NIST SP 800-56A", Suite B 4923 Implementer's Guide to NIST SP 800-56A 28 July 2009. 4925 [TwoFish] Schneier, B., Kelsey, J., Whiting, D., Hall, C., and N. 4926 Ferguson, "Twofish: A 128-Bit Block Cipher", 4927 http://www.schneier.com/paper-twofish-paper.html . 4929 [Skein] Ferguson, N., Lucks, S., Schneier, B., Whiting, D., 4930 Bellare, M., Kohno, T., Callas, J., and J. 
Walker, "The 4931 Skein Hash Function Family, Version 1.2 - 15 Sep 2009", 4932 http://www.skein-hash.info/sites/default/files/skein1.2.pdf . 4935 [pgpwordlist] 4936 "PGP Word List", 4937 http://philzimmermann.com/docs/PGP_word_list.pdf . 4939 17.2. Informative References 4941 [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, 4942 November 1990. 4944 [RFC1981] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery 4945 for IP version 6", RFC 1981, August 1996. 4947 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 4948 A., Peterson, J., Sparks, R., Handley, M., and E. 4949 Schooler, "SIP: Session Initiation Protocol", RFC 3261, 4950 June 2002. 4952 [RFC3514] Bellovin, S., "The Security Flag in the IPv4 Header", 4953 RFC 3514, April 1 2003. 4955 [RFC3824] Peterson, J., Liu, H., Yu, J., and B. Campbell, "Using 4956 E.164 numbers with the Session Initiation Protocol (SIP)", 4957 RFC 3824, June 2004. 4959 [RFC4086] Eastlake, D., Schiller, J., and S. Crocker, "Randomness 4960 Requirements for Security", BCP 106, RFC 4086, June 2005. 4962 [RFC4251] Ylonen, T. and C. Lonvick, "The Secure Shell (SSH) 4963 Protocol Architecture", RFC 4251, January 2006. 4965 [RFC4474] Peterson, J. and C. Jennings, "Enhancements for 4966 Authenticated Identity Management in the Session 4967 Initiation Protocol (SIP)", RFC 4474, August 2006. 4969 [RFC4475] Sparks, R., Hawrylyshen, A., Johnston, A., Rosenberg, J., 4970 and H. Schulzrinne, "Session Initiation Protocol (SIP) 4971 Torture Test Messages", RFC 4475, May 2006. 4973 [RFC4567] Arkko, J., Lindholm, F., Naslund, M., Norrman, K., and E. 4974 Carrara, "Key Management Extensions for Session 4975 Description Protocol (SDP) and Real Time Streaming 4976 Protocol (RTSP)", RFC 4567, July 2006. 4978 [RFC4568] Andreasen, F., Baugher, M., and D. Wing, "Session 4979 Description Protocol (SDP) Security Descriptions for Media 4980 Streams", RFC 4568, July 2006. 4982 [RFC4579] Johnston, A. and O.
Levin, "Session Initiation Protocol 4983 (SIP) Call Control - Conferencing for User Agents", 4984 BCP 119, RFC 4579, August 2006. 4986 [RFC5117] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117, 4987 January 2008. 4989 [RFC5245] Rosenberg, J., "Interactive Connectivity Establishment 4990 (ICE): A Protocol for Network Address Translator (NAT) 4991 Traversal for Offer/Answer Protocols", RFC 5245, 4992 April 2010. 4994 [RFC5764] McGrew, D. and E. Rescorla, "Datagram Transport Layer 4995 Security (DTLS) Extension to Establish Keys for the Secure 4996 Real-time Transport Protocol (SRTP)", RFC 5764, May 2010. 4998 [RFC5869] Krawczyk, H. and P. Eronen, "HMAC-based Extract-and-Expand 4999 Key Derivation Function (HKDF)", RFC 5869, May 2010. 5001 [I-D.perkins-avt-srtp-vbr-audio] 5002 Perkins, C. and J. Valin, "Guidelines for the use of 5003 Variable Bit Rate Audio with Secure RTP", 5004 http://tools.ietf.org/html/draft-perkins-avt-srtp-vbr-audio . 5006 [I-D.wing-sip-identity-media] 5007 Wing, D. and H. Kaplan, "SIP Identity using Media Path", 5008 http://tools.ietf.org/html/draft-wing-sip-identity-media . 5010 [I-D.mcgrew-fundamental-ecc] 5011 McGrew, D., "Fundamental Elliptic Curve Cryptography 5012 Algorithms", 5013 http://tools.ietf.org/html/draft-mcgrew-fundamental-ecc . 5015 [SP800-57-Part1] 5016 Barker, E., Barker, W., Burr, W., Polk, W., and M. Smid, 5017 "Recommendation for Key Management - Part 1: General 5018 (Revised)", NIST Special Publication 800-57 - Part 5019 1 Revised March 2007. 5021 [SHA-3] "Cryptographic Hash Algorithm Competition", NIST Computer 5022 Security Resource Center Cryptographic Hash Project. 5024 [Skein1] "The Skein Hash Function Family - Web site", 5025 http://www.skein-hash.info/ . 5027 [XEP-0262] 5028 Saint-Andre, P., "Use of ZRTP in Jingle RTP Sessions", XSF 5029 XEP 0262, February 2009. 5031 [Ferguson] 5032 Ferguson, N. and B. Schneier, "Practical Cryptography", 5033 Wiley Publishing 2003. 5035 [Juola1] Juola, P. and P.
Zimmermann, "Whole-Word Phonetic 5036 Distances and the PGPfone Alphabet", Proceedings of the 5037 International Conference of Spoken Language Processing 5038 (ICSLP-96) 1996. 5040 [Juola2] Juola, P., "Isolated Word Confusion Metrics and the 5041 PGPfone Alphabet", Proceedings of New Methods in Language 5042 Processing 1996. 5044 [pgpfone] Zimmermann, P., "PGPfone", 5045 http://philzimmermann.com/docs/pgpfone10b7.pdf . 5047 [zfone] Zimmermann, P., "Zfone", 5048 http://www.philzimmermann.com/zfone . 5050 [Byzantine] 5051 "The Two Generals' Problem", 5052 http://en.wikipedia.org/wiki/Two_Generals%27_Problem . 5054 [TESLA] Perrig, A., Canetti, R., Tygar, J., and D. Song, "The 5055 TESLA Broadcast Authentication Protocol", 5056 http://www.ece.cmu.edu/~adrian/projects/tesla-cryptobytes/tesla-cryptobytes.pdf . 5059 [z-base-32] 5060 Wilcox-O'Hearn, B., "Human-oriented base-32 encoding", 5061 http://philzimmermann.com/docs/human-oriented-base-32-encoding.txt , 5062 November 2009. 5064 [comsec] Blossom, E., "The VP1 Protocol for Voice Privacy Devices 5065 Version 1.2", http://www.comsec.com/vp1-protocol.pdf . 5067 [cryptophone] 5068 "CryptoPhone", http://www.cryptophone.de/ . 5070 [Wright1] Wright, C., Ballard, L., Coull, S., Monrose, F., and G. 5071 Masson, "Spot me if you can: Uncovering spoken phrases in 5072 encrypted VoIP conversations", Proceedings of the 2008 5073 IEEE Symposium on Security and Privacy 2008. 5075 [Sunshine] 5076 Sunshine, J., Egelman, S., Almuhimedi, H., Atri, N., and 5077 L. Cranor, "Crying Wolf: An Empirical Study of SSL Warning 5078 Effectiveness", USENIX Security Symposium 2009. 5080 [dsa-1571] 5081 "Debian Security Advisory - OpenSSL predictable random 5082 number generator", 5083 http://www.debian.org/security/2008/dsa-1571 . 5085 Authors' Addresses 5087 Philip Zimmermann 5088 Zfone Project 5089 Santa Cruz, California 5091 Email: prz@mit.edu 5093 Alan Johnston (editor) 5094 Avaya 5095 St.
Louis, MO 63124 5097 Email: alan.b.johnston@gmail.com 5099 Jon Callas 5100 Apple, Inc. 5102 Email: jon@callas.org