AVT WG                                                     P. Zimmermann
Internet-Draft                        Phil Zimmermann and Associates LLC
Expires: September 6, 2006                              A. Johnston, Ed.
                                                              SIPStation
                                                               J. Callas
                                                         PGP Corporation
                                                           March 5, 2006

   ZRTP: Extensions to RTP for Diffie-Hellman Key Agreement for SRTP
                     draft-zimmermann-avt-zrtp-01

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on September 6, 2006.

Copyright Notice

Copyright (C) The Internet Society (2006).

Abstract

This document defines ZRTP, RTP (Real-time Transport Protocol) header extensions for a Diffie-Hellman exchange to agree on a session key and parameters for establishing Secure RTP (SRTP) sessions. The ZRTP protocol is completely self-contained in RTP and does not require support in the signaling protocol or assume a Public Key Infrastructure (PKI). For the media session, ZRTP provides confidentiality, protection against Man in the Middle (MitM) attacks, and, in cases where a secret is available from the signaling protocol, authentication.

Table of Contents

1. Introduction
2. Terminology
3. Protocol Description
   3.1. Overview
   3.2. Key Agreement Algorithm
      3.2.1. Discovery
      3.2.2. Hash Commitment
      3.2.3. Diffie-Hellman Exchange
      3.2.4. Confirmation and Switch to SRTP
   3.3. Random Number Generation
4. RTP Header Extensions
   4.1. ZRTP Message Formats
      4.1.1. Message Type Block
      4.1.2. Message Type Block
      4.1.3. Cipher Type Block
      4.1.4. Public Key Type Block
      4.1.5. SAS Type Block
   4.2. Hello message
   4.3. HelloACK message
   4.4. Commit message
   4.5. DHPart1 message
   4.6. DHPart2 message
   4.7. Confirm1 message
   4.8. Confirm2 message
   4.9. Conf2ACK message
   4.10. Error message
   4.11. GoClear message
   4.12. ClearACK message
5. Retransmissions
6. Short Authentication String
7. IANA Considerations
8. Security Considerations
9. Acknowledgments
10. Appendix - ZRTP, SIP, and SDP
11. References
   11.1. Normative References
   11.2. Informative References
Authors' Addresses
Intellectual Property and Copyright Statements

1. Introduction

ZRTP is a key agreement protocol that performs a Diffie-Hellman key exchange during call setup, in-band in the Real-time Transport Protocol (RTP) [1] media stream, which has been established using some other signaling protocol such as Session Initiation Protocol (SIP) [11]. This generates a shared secret which is then used to generate keys and salt for a Secure RTP (SRTP) [2] session. ZRTP borrows ideas from PGPfone [7]. A reference implementation of ZRTP is available as Zfone [8].

The ZRTP protocol has some nice cryptographic features lacking in many other approaches to media session encryption. Although it uses a public key algorithm, it does not rely on a public key infrastructure (PKI). In fact, it does not use persistent public keys at all. It uses ephemeral Diffie-Hellman (DH) with hash commitment, and allows the detection of Man in the Middle (MitM) attacks by displaying a short authentication string for the users to read and compare over the phone. It has perfect forward secrecy, meaning the keys are destroyed at the end of the call, which precludes retroactively compromising the call by future disclosures of key material.

But even if the users are too lazy to bother with short authentication strings, we still get fairly decent authentication against a MitM attack, based on a form of key continuity. ZRTP does this by caching some key material to use in the next call, to be mixed in with the next call's DH shared secret, giving it key continuity properties analogous to SSH. All this is done without reliance on a PKI, key certification, trust models, certificate authorities, or the key management complexity that bedevils the email encryption world. It also does not rely on SIP signaling for the key management, and in fact does not rely on any servers at all. It performs its key agreements and key management in a purely peer-to-peer manner over the RTP packet stream.

Most secure phones rely on a Diffie-Hellman exchange to agree on a common session key. But since DH is susceptible to a man-in-the-middle (MitM) attack, it is common practice to provide a way to authenticate the DH exchange. In some military systems, this is done by depending on digital signatures backed by a centrally-managed PKI. A decade of industry experience has shown that deploying centrally managed PKIs can be a painful and often futile experience. PKIs are just too messy, and require too much activation energy to get them started. Setting up a PKI requires somebody to run it, which is not practical for an equipment provider. A service provider like a carrier might venture down this path, but even then you have to deal with cross-carrier authentication, certificate revocation lists, and other complexities. It is much simpler to avoid PKIs altogether, especially when developing secure commercial products. It is therefore more common for commercial secure phones to augment the DH exchange with a Short Authentication String (SAS) combined with a hash commitment at the start of the key exchange, to shorten the length of SAS material that must be read aloud. No PKI is required for this approach to authenticating the DH exchange. The AT&T 3600, Eric Blossom's COMSEC secure phones [9], PGPfone [7], and CryptoPhone [10] are all examples of products that took this simpler lightweight approach.

The main problem with this approach is inattentive users who may not execute the voice authentication procedure, or unattended secure phone calls to answering machines that cannot execute it. Additionally, some people worry about voice spoofing (the "Rich Little" attack), and some worry about trying to use it between people who don't know each other's voices. This is not as much of a problem as it seems, because it isn't necessary that they recognize each other by their voice; it's only necessary that they detect that the voice used for the SAS procedure matches the voice in the rest of the phone call. These concerns are not enough reason to embrace PKIs as an alternative, in my opinion.

A popular and field-proven approach is used by SSH (Secure Shell) [12], which Peter Gutmann likes to call the "baby duck" security model. SSH establishes a relationship by exchanging public keys in the initial session, when we assume no attacker is present, and this makes it possible to authenticate all subsequent sessions. A successful MitM attacker has to have been present in all sessions all the way back to the first one, which is assumed to be difficult for the attacker. All this is accomplished without resorting to a centrally-managed PKI. We use an analogous baby duck security model to authenticate the DH exchange in ZRTP.
We don't need to exchange persistent public keys; we can simply cache a shared secret and re-use it to authenticate a long series of DH exchanges for secure phone calls over a long period of time. If we read aloud just one SAS, and then cache a shared secret for later calls to use for authentication, no new voice authentication rituals need to be executed. We just have to remember we did one already. If we ever lose this cached shared secret, it is no longer available for authentication of DH exchanges, so we would have to do a new SAS procedure and start over with a new cached shared secret. Then we could go back to omitting the voice authentication on later calls.

A particularly compelling reason why this approach is attractive is that SAS is easiest to implement when a GUI or some sort of display is available, which raises the question of what to do when no display is available. We envision some products that implement secure VoIP via a local network proxy, which lacks a display in many cases. If we take an approach that greatly reduces the need for a SAS in each and every call, we can operate in GUI-less products with greater ease.

It's a good idea to force your opponent to have to solve multiple problems in order to mount a successful attack. Some examples of widely differing problems we might like to present him with are: stealing a shared secret from one of the parties, being present on the very first session and every subsequent session to carry out an active MitM attack, and solving the discrete log problem. We want to force the opponent to solve more than one of these problems to succeed.

The protocol can make use of different kinds of shared secrets. Each type of shared secret is determined by a different method. All of the shared secrets are hashed together to form a session key to encrypt the call.
An attacker must defeat all of the methods in order to determine the session key.

First, there is the shared secret determined entirely by a Diffie-Hellman key agreement. It changes with every call, based on random numbers. An attacker may attempt a classic DH MitM attack on this secret, but we can protect against this by displaying and reading aloud a SAS, combined with adding a hash commitment at the beginning of the DH exchange.

Second, there is an evolving shared secret, or ongoing shared secret, that is automatically changed and refreshed and cached with every new session. We will call this the cached shared secret, or sometimes the retained shared secret. Each new image of this ongoing secret is a non-invertible function of its previous value and the new secret derived by the new DH agreement. It's possible that no cached shared secret is available, because there were no previous sessions to inherit this value from, or because one side loses its cache.

There are other approaches for key agreement for SRTP that compute a shared secret using information in the signaling. For example, [14] describes how to carry a MIKEY (Multimedia Internet KEYing) [15] payload in SDP [16]. Alternatively, [13] describes directly carrying SRTP keying and configuration information in SDP. ZRTP does not rely on the signaling to compute a shared secret, but if a client does produce a shared secret via the signaling, and makes it available to the ZRTP protocol, ZRTP can make use of this shared secret to augment the list of shared secrets that will be hashed together to form a session key. This way, any security weaknesses that might compromise the shared secret contributed by the signaling will not harm the final resulting session key.

There may also be a static shared secret that the two parties agree on out-of-band in advance. A hashed passphrase would suffice.

The shared secret provided by the signaling (if available), the shared secret computed by DH, and the cached shared secret are all hashed together to compute the session key for a call. If the cached shared secret is not available, it is omitted from the hash computation. If the signaling provides no shared secret, it is also omitted from the hash computation.

No DH MitM attack can succeed if the ongoing shared secret is available to the two parties, but not to the attacker. This is because the attacker cannot compute a common session key with either party without knowing the cached secret component, even if he correctly executes a classic DH MitM attack. Mixing in the cached shared secret for the session key calculation allows it to act as an implicit authenticator to protect the DH exchange, without requiring additional explicit HMACs to be computed on the DH parameters. If the cached shared secret is available, a MitM attack would be instantly detected by the failure to achieve a shared session key, resulting in undecryptable packets. The protocol can easily detect this. It would be more accurate to say that the MitM attack is not merely detected, but thwarted.

When adding the complexity of additional shared secrets beyond the familiar DH key agreement, we must make sure the lack of availability of the cached shared secret cannot prevent a call from going through, and we must also prevent false alarms that claim an attack was detected.

An added benefit of using these cached shared secrets to mix in with the session keys is that it augments the entropy of the session key. Even if limits on the size of the DH exchange produce a session key with less than 256 bits of real work factor, the added entropy from the cached shared secret can bring up all the subsequent session keys to the full 256-bit AES key strength, assuming no attacker was present in the first call.

We could have authenticated the DH exchange the same way SSH does it, with digital signatures, caching public keys instead of shared secrets. But this approach with caching shared secrets seemed a bit simpler, and has the added benefit of adding more entropy to the session keys.

The following sections provide an overview of the ZRTP protocol and describe the key agreement algorithm and the RTP header extensions.

2. Terminology

In this document, the key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as described in RFC 2119 and indicate requirement levels for compliant implementations.

3. Protocol Description

3.1. Overview

This section provides a description of how ZRTP works. This description is non-normative in nature but is included to build understanding of the protocol.

ZRTP is negotiated the same way a conventional RTP session is negotiated. Using SIP, the AVP/RTP profile is used in SDP.

The ZRTP protocol begins after two endpoints have utilized a signaling protocol such as SIP and are ready to send or have already begun sending RTP packets. This specification defines a new RTP extension header which is used to carry the ZRTP messages between the endpoints. Since RTP endpoints ignore unknown extension headers, the protocol is fully backwards compatible: a ZRTP endpoint attempting to perform key agreement with a non-ZRTP endpoint will simply receive normal RTP responses. It can then inform the user that a secure session is not possible and either continue with the insecure session or terminate the session, depending on the user's security policy.

The ZRTP exchange begins at the same time that the first RTP packets are exchanged between the endpoints. ZRTP messages can be embedded in RTP messages containing actual media samples, or they may be sent in separate RTP messages.
For example, if the RTP payload or codec supports silence or no-op messages, then these can be used for RTP transport. If none of these are supported, an RTP packet containing comfort noise can be generated to carry a ZRTP message.

A ZRTP endpoint initiates the exchange by sending a ZRTP Hello message to the other endpoint. The purpose of the Hello message is to discover if the other endpoint supports the protocol and to see what algorithms the two ZRTP endpoints have in common.

The Hello message contains the SRTP configuration options, and the ZID. Each instance of ZRTP has a unique 96-bit random ZRTP ID or ZID that is generated once at installation time. It is used to look up retained shared secrets in a local cache. A single global ZID for a single installation is the simplest way to implement ZIDs, and may be required in applications where the encryption is being done by a "bump in the cord" proxy that does not know who is being called. However, it is specifically not precluded for an implementation to use multiple ZIDs, up to the limit of a separate one per callee. This then turns it into a long-lived "association ID" that does not apply to any other associations between a different pair of parties. It is a goal of this protocol to permit both options to interoperate freely.

A response to a ZRTP Hello message is a ZRTP HelloACK message. The HelloACK message simply acknowledges receipt of the Hello message and indicates support for the ZRTP protocol. Since RTP uses best effort UDP transport, ZRTP has retransmission timers in case of lost datagrams. There are two timers, both with exponential backoff mechanisms. One timer is used for retransmissions of Hello messages and the other is used for retransmissions of all other messages after receipt of a HelloACK which indicates support of ZRTP by the other endpoint.

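The two-timer retransmission behavior described above can be sketched as follows. This is an illustration only: the doubling rule, the initial intervals, the caps, and the retry counts are hypothetical placeholders chosen for the example, not values taken from this document (Section 5 gives the normative details), and the names `backoff_schedule`, `HELLO_TIMER`, and `OTHER_TIMER` are invented here.

```python
from typing import Iterator


def backoff_schedule(initial_ms: int, cap_ms: int, max_sends: int) -> Iterator[int]:
    """Yield the wait (in ms) before each retransmission, doubling up to a cap."""
    interval = initial_ms
    for _ in range(max_sends):
        yield interval
        interval = min(interval * 2, cap_ms)


# Hypothetical parameters, chosen only for illustration.
HELLO_TIMER = dict(initial_ms=50, cap_ms=200, max_sends=5)    # Hello retransmissions
OTHER_TIMER = dict(initial_ms=150, cap_ms=600, max_sends=5)   # all other messages

hello_waits = list(backoff_schedule(**HELLO_TIMER))
other_waits = list(backoff_schedule(**OTHER_TIMER))
```

In a real endpoint the Hello schedule would stop as soon as a HelloACK or Commit arrives, and the second schedule would only start once the peer is known to support ZRTP.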
After both endpoints exchange Hello and HelloACK messages, the key agreement exchange can begin with the ZRTP Commit message. An example call flow is shown in Figure 1 below. Note that the order of the Hello/HelloACK exchanges in F1/F2 and F3/F4 may be reversed. Also, an endpoint that receives a Hello message and wishes to immediately begin the ZRTP key agreement can omit the HelloACK and send the Commit instead. In Figure 1, this would result in messages F2, F3, and F4 being omitted. Note that the endpoint which sends the Commit message is considered the initiator of the ZRTP session and drives the key agreement exchange.

   Alice                                      Bob
     |                                         |
     | Alice and Bob establish a media session.|
     |                                         |
     |                   RTP                   |
     |<=======================================>|
     |                                         |
     | Hello (ver,cid,hash,cipher,pkt,sas,Alice's ZID) F1
     |---------------------------------------->|
     |                             HelloACK F2 |
     |<----------------------------------------|
     | Hello (ver,cid,hash,cipher,pkt,sas,Bob's ZID) F3
     |<----------------------------------------|
     | HelloACK F4                             |
     |---------------------------------------->|
     |                                         |
     |        Bob acts as the initiator        |
     |                                         |
     | Commit (Bob's ZID,hash,cipher,pkt,hvi) F5
     |<----------------------------------------|
     | DHPart1 (pvr,rs1IDr,rs2IDr,sigsIDr,srtpsIDr,other_secretIDr) F6
     |---------------------------------------->|
     | DHPart2 (pvi,rs1IDi,rs2IDi,sigsIDi,srtpsIDi,other_secretIDi) F7
     |<----------------------------------------|
     |                                         |
     | Alice and Bob generate SRTP session key.|
     |                                         |
     |               SRTP begins               |
     |<=======================================>|
     |                                         |
     | Confirm1 (plaintext,sasflag,hmac) F8    |
     |---------------------------------------->|
     | Confirm2 (plaintext,sasflag,hmac) F9    |
     |<----------------------------------------|
     | Conf2ACK F10                            |
     |---------------------------------------->|

   Figure 1. Establishment of a SRTP session using ZRTP

3.2. Key Agreement Algorithm

The key agreement algorithm has four phases that are described normatively in the following sections.

3.2.1. Discovery

During the discovery phase, a ZRTP endpoint discovers if the other endpoint supports ZRTP and which ZRTP version, hash, cipher, public key type, and sas algorithms are supported. In addition, each endpoint sends and discovers ZIDs. The received ZID is used to retrieve previous retained shared secrets, rs1 and rs2. If the endpoint has other secrets, then they are also collected.

The signaling secret (sigs) is passed from the signaling protocol used to establish the RTP session. For SIP, it is the dialog identifier of a Secure SIP (SIPS) session: a string composed of Call-ID, to tag, and from tag. From the definitions in RFC 3261 [11]:

   sigs = hash(call-id | to-tag | from-tag)

Note: the dialog identifier of a non-secure SIP session should not be considered a signaling secret as it has no confidentiality protection.

The SRTP secret (srtps) is the SRTP master key and salt. This information may have been passed in the signaling using MIKEY or SDP Security Descriptions, for example:

   srtps = hash(SRTP master key | SRTP master salt)

Additional shared secrets can be defined and used as other_secret.

If no secret of a given type is available, a random value is generated and used for that secret to ensure a mismatch in the hash comparisons in the DHPart1 and DHPart2 messages. This prevents an eavesdropper from knowing how many shared secrets are available between the endpoints.

A Hello message can be sent at any time, but is usually sent at the start of an RTP session to determine if the other endpoint supports ZRTP, and also if the SRTP implementations are compatible. A Hello message is retransmitted using timer T1 and an exponential backoff mechanism detailed in Section 5 until the receipt of a HelloACK message or a Commit message.

3.2.2. Hash Commitment

The hash commitment is performed by the initiator of the ZRTP exchange. From the intersection of the algorithms in the sent and received Hello messages, the initiator chooses a hash, cipher, public key type, and sas algorithm to be used.

The key agreement begins with the initiator choosing a fresh random Diffie-Hellman (DH) secret value (svi) based on the chosen public key type value, and computing the public value. (Note that to speed up processing, this computation can be done in advance.) For guidance on generating random numbers, see the section on Random Number Generation. The Diffie-Hellman secret value, svi, SHOULD be twice as long as the AES key length. This means, if AES 128 is used, the DH secret value SHOULD be 256 bits long. If AES 256 is used, the secret value SHOULD be 512 bits long.

   pvi = g^svi mod p

where g and p are determined by the public key type value. The initiator also computes a hash, hvi, of the public value using the chosen hash algorithm. The hvi includes the set of hash, cipher, pkt, and sas types from the responder's Hello message in the following order:

   hvi = hash(pvi | hashr1-5 | cipherr1-5 | pktr1-5 | sasr1-5)

The information from the responder's Hello message is included in the hash calculation to prevent a bid-down attack by modification of the responder's Hello message.

Note: If both sides send Commit messages initiating a secure session at the same time, the Commit message with the lowest hvi value is discarded and the other side is the initiator. This breaks the tie, allowing the protocol to proceed from this point with a clear definition of who is the initiator and who is the responder.

3.2.3. Diffie-Hellman Exchange

The purpose of the Diffie-Hellman exchange is for the two ZRTP endpoints to generate a new shared secret, s0. In addition, the endpoints discover if they have any shared secrets in common.
If they do, this exchange allows them to discover how many and agree on an ordering for them: s1, s2, etc.

3.2.3.1. Responder Behavior

Upon receipt of the Commit message, the responder generates its own fresh random DH secret value, svr, and computes the public value. (Note that to speed up processing, this computation can be done in advance.) For guidance on random number generation, see the section on Random Number Generation. The Diffie-Hellman secret value, svr, SHOULD be twice as long as the AES key length. This means, if AES 128 is used, the DH secret value SHOULD be 256 bits long. If AES 256 is used, the secret value SHOULD be 512 bits long.

   pvr = g^svr mod p

The final shared secret, s0, is calculated by hashing the concatenation of the Diffie-Hellman shared secret (DHSS) followed by the (possibly empty) set of shared secrets that are actually shared between the initiator and responder. For computing the hash, the shared secrets are sorted by ascending order of the initiator's corresponding shared secret IDs. The remainder of this section describes an algorithm to accomplish this.

First, an HMAC keyed hash is calculated using the first retained shared secret, rs1, as the key on the string "Responder", which generates a retained secret ID, rs1IDr, which is truncated to 64 bits. HMACs are calculated in a similar way for additional shared secrets:

   rs1IDr = HMAC(rs1, "Responder")
   rs2IDr = HMAC(rs2, "Responder")
   sigsIDr = HMAC(sigs, "Responder")
   srtpsIDr = HMAC(srtps, "Responder")
   other_secretIDr = HMAC(other_secret, "Responder")

A ZRTP DHPart1 message is generated containing pvr and the set of keyed hashes (HMACs) derived from the possibly shared secrets.

Upon receipt of the DHPart2 message, the responder checks that the initiator's public DH value is not equal to 1 or p-1.
An attacker might inject a false DHPart2 packet with a value of 1 or p-1 for g^svi mod p, which would cause a disastrously weak final DH result to be computed. If pvi is 1 or p-1, the user should be alerted to the attack and the protocol must be aborted. Otherwise, the responder computes the hash of the public DH value in the DHPart2 and compares it with the hash from the Commit. If they are different (hash(pvi) != hvi), a MitM attack is taking place and the user is alerted.

The responder then calculates the Diffie-Hellman result:

   DHResult = pvi^svr mod p

The responder then calculates the Diffie-Hellman shared secret:

   DHSS = hash(DHResult)

The set of five shared secret IDs received from the DHPart2 message is stored as set A.

The responder then calculates the set of secret IDs that are expected to be received from the initiator in the DHPart2 message:

   rs1IDi = HMAC(rs1, "Initiator")
   rs2IDi = HMAC(rs2, "Initiator")
   sigsIDi = HMAC(sigs, "Initiator")
   srtpsIDi = HMAC(srtps, "Initiator")
   other_secretIDi = HMAC(other_secret, "Initiator")

The set (rs1IDi, rs2IDi, sigsIDi, srtpsIDi, other_secretIDi) is set B. Set C is the intersection of set A and set B. Set C is then sorted in ascending numerical order. Set C will contain between zero and five secret IDs. Set D is then created as the actual secrets corresponding to the secret IDs in set C in the same order.

The set D is expanded to 5 values by adding in null secrets: s1, s2, s3, s4, and s5. The final shared secret, s0, is calculated by hashing the concatenation of the DHSS and the set of non-null shared secrets. As a result, the null secrets have no effect on the concatenation operation:

   s0 = hash(DHSS | s1 | s2 | s3 | s4 | s5)

3.2.3.2. Initiator Behavior

Upon receipt of the DHPart1 message, the initiator checks that the responder's public DH value is not equal to 1 or p-1.
An attacker might inject a false DHPart1 packet with a value of 1 or p-1 for g^svr mod p, which would cause a disastrously weak final DH result to be computed. If pvr is 1 or p-1, the user should be alerted to the attack and the protocol must be aborted. If pvr is not 1 or p-1, the initiator looks up any retained shared secrets associated with the responder's ZID.

The final shared secret, s0, is calculated by hashing the concatenation of the DHSS followed by the (possibly empty) set of shared secrets that are actually shared between the initiator and responder. For computing the hash, the shared secrets are sorted by ascending order of the initiator's corresponding shared secret IDs. The remainder of this section describes an algorithm to accomplish this.

First, an HMAC keyed hash is calculated using the first retained shared secret, rs1, as the key on the string "Initiator", which generates a retained secret ID, rs1IDi, which is truncated to 64 bits. HMACs are calculated in a similar way for additional shared secrets:

   rs1IDi = HMAC(rs1, "Initiator")
   rs2IDi = HMAC(rs2, "Initiator")
   sigsIDi = HMAC(sigs, "Initiator")
   srtpsIDi = HMAC(srtps, "Initiator")
   other_secretIDi = HMAC(other_secret, "Initiator")

The initiator then sends a DHPart2 message containing the initiator's public DH value and the set of calculated retained secret IDs.

The initiator calculates the same Diffie-Hellman result using:

   DHResult = pvr^svi mod p

The initiator then calculates the DH shared secret using:

   DHSS = hash(DHResult)

The set of five shared secret IDs received in the DHPart1 message is stored as set A.

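The public-value check and the Diffie-Hellman computation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the modulus `P` and generator `G` are toy values (the real g and p are determined by the negotiated public key type), SHA-256 stands in for the chosen hash, the big-endian byte encoding of DHResult is an assumption, and Python's `secrets` module stands in for the entropy source and DRBG that Section 3.3 requires. All function names are invented for the example.

```python
import hashlib
import secrets
from typing import Tuple

# Toy parameters for illustration only; real g and p come from the
# negotiated public key type and are far larger.
P = 2**127 - 1   # a Mersenne prime, NOT a secure DH modulus
G = 5


def new_keypair(secret_bits: int = 256) -> Tuple[int, int]:
    """Pick a fresh secret value in [2, 2^L - 1] and compute g^sv mod p."""
    sv = 2 + secrets.randbelow(2**secret_bits - 2)
    return sv, pow(G, sv, P)


def check_public_value(pv: int) -> None:
    """Reject the degenerate values 1 and p-1 before using a peer's public value."""
    if pv in (1, P - 1):
        raise ValueError("degenerate DH public value; abort the exchange")


def dh_shared_secret(peer_pv: int, my_sv: int) -> bytes:
    """Return DHSS = hash(DHResult), where DHResult = peer_pv^my_sv mod p."""
    check_public_value(peer_pv)
    dh_result = pow(peer_pv, my_sv, P)
    # Assumed encoding: fixed-width big-endian bytes of the modulus size.
    encoded = dh_result.to_bytes((P.bit_length() + 7) // 8, "big")
    return hashlib.sha256(encoded).digest()


svi, pvi = new_keypair()            # initiator
svr, pvr = new_keypair()            # responder
dhss_i = dh_shared_secret(pvr, svi)
dhss_r = dh_shared_secret(pvi, svr)
```

Both sides arrive at the same DHSS because pvr^svi and pvi^svr are the same value modulo p.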
The initiator then calculates the set of secret IDs that are expected to be received from the responder in the DHPart1 message:

   rs1IDr = HMAC(rs1, "Responder")
   rs2IDr = HMAC(rs2, "Responder")
   sigsIDr = HMAC(sigs, "Responder")
   srtpsIDr = HMAC(srtps, "Responder")
   other_secretIDr = HMAC(other_secret, "Responder")

The set (rs1IDr, rs2IDr, sigsIDr, srtpsIDr, other_secretIDr) is set B. Set C is the intersection of set A and set B. Set C will contain between zero and five secret IDs. Set D is then created as the actual secrets corresponding to the secret IDs in set C. Set E is the set of secret IDs that corresponds to the secrets in set D sent in the DHPart2 message. Set E is then sorted in ascending numerical order. Set D is then sorted to the same order as the corresponding secrets in set E.

The set D is expanded to 5 values by adding in null secrets: s1, s2, s3, s4, and s5. The final shared secret, s0, is calculated by hashing the concatenation of the DHSS and the set of non-null shared secrets. As a result, the null secrets have no effect on the concatenation operation:

   s0 = hash(DHSS | s1 | s2 | s3 | s4 | s5)

3.2.4. Confirmation and Switch to SRTP

The SRTP master key and master salt are then generated using the shared secret. Separate SRTP keys and salts are used in each direction for each media stream. Unless otherwise specified, ZRTP uses SRTP with no MKI, 32 bit authentication using HMAC-SHA1, AES-CM 128 or 256 bit key length, 112 bit session salt key length, 2^48 key derivation rate, and SRTP prefix length 0.

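The shared-secret matching of Section 3.2.3, which produces the s0 used in this section, can be sketched for both roles as follows. This is a sketch under assumptions, not the reference implementation: SHA-256 stands in for the negotiated hash, secrets are modeled as byte strings, and the function names are invented for the example. Sorting the 64-bit IDs as byte strings is equivalent to ascending numerical order because all IDs have the same length.

```python
import hashlib
import hmac
import os
from typing import List

ID_LEN = 8  # secret IDs are truncated to 64 bits


def secret_id(secret: bytes, role: bytes) -> bytes:
    """HMAC(secret, role) truncated to 64 bits; SHA-256 is an assumed hash."""
    return hmac.new(secret, role, hashlib.sha256).digest()[:ID_LEN]


def initiator_s0(dhss: bytes, my_secrets: List[bytes], dhpart1_ids: List[bytes]) -> bytes:
    set_a = set(dhpart1_ids)
    # Sets B, C, D: keep the secrets whose expected responder ID was received.
    set_d = [s for s in my_secrets if secret_id(s, b"Responder") in set_a]
    # Set E: order the matched secrets by the initiator's own IDs.
    set_d.sort(key=lambda s: secret_id(s, b"Initiator"))
    # Null secrets add nothing to the concatenation, so they are simply omitted.
    return hashlib.sha256(dhss + b"".join(set_d)).digest()


def responder_s0(dhss: bytes, my_secrets: List[bytes], dhpart2_ids: List[bytes]) -> bytes:
    set_a = set(dhpart2_ids)
    # Sets B, C, D: match on the expected initiator IDs and sort by them.
    set_d = [s for s in my_secrets if secret_id(s, b"Initiator") in set_a]
    set_d.sort(key=lambda s: secret_id(s, b"Initiator"))
    return hashlib.sha256(dhss + b"".join(set_d)).digest()


# Example: rs1 and sigs are shared; each side fills one slot with a random value.
rs1, sigs = os.urandom(32), os.urandom(32)
dhss = hashlib.sha256(b"example DH result").digest()
initiator_secrets = [rs1, os.urandom(32), sigs]
responder_secrets = [os.urandom(32), rs1, sigs]
dhpart1_ids = [secret_id(s, b"Responder") for s in responder_secrets]
dhpart2_ids = [secret_id(s, b"Initiator") for s in initiator_secrets]
s0_i = initiator_s0(dhss, initiator_secrets, dhpart1_ids)
s0_r = responder_s0(dhss, responder_secrets, dhpart2_ids)
```

Because both roles order the matched secrets by the initiator's IDs, the two sides hash the same byte string even though they list their secrets in different orders.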
   The ZRTP initiator encrypts and the ZRTP responder decrypts packets
   by using srtpkeyi and srtpsalti, which are generated by:

      srtpkeyi  = HMAC(s0, "Initiator SRTP master key")
      srtpsalti = HMAC(s0, "Initiator SRTP master salt")

   The ZRTP responder encrypts and the ZRTP initiator decrypts packets
   by using srtpkeyr and srtpsaltr, which are generated by:

      srtpkeyr  = HMAC(s0, "Responder SRTP master key")
      srtpsaltr = HMAC(s0, "Responder SRTP master salt")

   The HMAC key is generated by:

      hmackey = HMAC(s0, "HMAC key")

   Both sides now discard the rs2 value and store rs1 as rs2. A new
   rs1 is calculated from s0:

      rs1 = HMAC(s0, "retained secret")

   The endpoints can now switch to SRTP and begin packet encryption.
   The ZRTP Initiator and Responder use their own keying material for
   the SRTP session. No MKI is used and a 32 bit authentication tag is
   used.

   The ZRTP Confirm1 and Confirm2 messages are sent for two reasons.
   First, they confirm that all the key agreement calculations were
   successful and the encryption is working, and they enable us to
   automatically detect a DH MitM attack from a reckless attacker who
   does not know the retained shared secret. Second, they enable us to
   transmit the SASflag under cover of SRTP encryption, shielding it
   from a passive observer who would like to know if the human users
   are in the habit of diligently verifying the SAS.

   In the Confirm1 and Confirm2 messages, the sasflag Boolean is
   converted to an octet called sasflagoctet (resulting in either 0x00
   or 0x01). Confirm1 and Confirm2 messages contain an HMAC of some
   known plaintext and the sasflagoctet. The HMAC is explicitly
   included in the payload because we may not always be able to rely on
   the built-in authentication tag in SRTP, which might be configured
   to different sizes, including none.
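   The derivations in this section might be sketched in Python as
   follows. The sketch assumes HMAC-SHA256 and assumes that the SRTP
   key and salt are obtained by keeping the leftmost octets of the HMAC
   output (16 octets for a 128 bit key, 14 octets for the 112 bit
   salt). This draft does not state how the HMAC output is shortened,
   so that step and all function names are hypothetical.

```python
import hashlib
import hmac


def kdf(s0: bytes, label: str) -> bytes:
    """HMAC(s0, label) with s0 as the key, as in the formulas above."""
    return hmac.new(s0, label.encode("ascii"), hashlib.sha256).digest()


def derive_keys(s0: bytes, key_len: int = 16, salt_len: int = 14) -> dict:
    # Assumption: keys and salts are the leftmost octets of the HMAC output.
    return {
        "srtpkeyi": kdf(s0, "Initiator SRTP master key")[:key_len],
        "srtpsalti": kdf(s0, "Initiator SRTP master salt")[:salt_len],
        "srtpkeyr": kdf(s0, "Responder SRTP master key")[:key_len],
        "srtpsaltr": kdf(s0, "Responder SRTP master salt")[:salt_len],
        "hmackey": kdf(s0, "HMAC key"),
        "rs1": kdf(s0, "retained secret"),  # the new retained secret
    }


def confirm_hmac(hmackey: bytes, sasflag: bool) -> bytes:
    """HMAC over the known plaintext followed by the sasflagoctet."""
    sasflagoctet = b"\x01" if sasflag else b"\x00"
    message = b"known plaintext" + sasflagoctet
    return hmac.new(hmackey, message, hashlib.sha256).digest()
```

   The 32 octet output of confirm_hmac matches the 8 word hmac field
   shown in the Confirm1 and Confirm2 figures.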
      hmac = HMAC(hmackey, "known plaintext" | sasflagoctet)

   This information is not carried in the extension header but is
   inserted at the start of the SRTP payload. The Conf2ACK message
   completes the exchange.

   The optional GoClear message is used to switch from SRTP back to
   RTP. To avoid relying on the optional SRTP authentication tag, the
   GoClear contains an HMAC of the string "GoClear" computed with the
   hmackey derived from the shared secret:

      clear_hmac = HMAC(hmackey, "GoClear")

   The response to a GoClear message is either a ClearACK message or an
   Error message. An Error message indicates that the ZRTP endpoint
   does not support the GoClear mechanism or that the GoClear has
   failed authentication (the clear_hmac does not validate).

3.3. Random Number Generation

   The ZRTP protocol uses random numbers for cryptographic key
   material, notably for the DH secret exponents, which must be freshly
   generated with each session. Whenever a random number is needed,
   all of the following criteria must be satisfied:

   o  It MUST be derived from a physical entropy source, such as RF
      noise, acoustic noise, thermal noise, high resolution timings of
      environmental events, or other unpredictable physical sources of
      entropy. Chapter 10 of [4] gives a detailed explanation of
      cryptographic grade random numbers and provides guidance for
      collecting suitable entropy. The raw entropy must be distilled
      and processed through a deterministic random bit generator
      (DRBG). Examples of DRBGs may be found in NIST SP 800-90 [5],
      and in [4].

   o  It MUST be freshly generated, meaning that it must not have been
      used in a previous calculation.

   o  It MUST be greater than or equal to two, and less than or equal
      to 2^L - 1, where L is the number of random bits required.

   o  It MUST be chosen with equal probability from the entire
      available number space, e.g., [2, 2^L - 1].

4. RTP Header Extensions

   This specification defines a new RTP header extension used for all
   ZRTP messages. When used, the X bit is set in the RTP header to
   indicate the presence of the RTP header extension.

   Section 5.3.1 in RFC 3550 defines the format of an RTP header
   extension. The header extension is appended to the RTP header. The
   first 16 bits are an identifier for the header extension, and the
   following 16 bits are the length of the extension header in 32 bit
   words.

   All word lengths referenced in this specification follow RFC 3550
   and are 32 bits or 4 octets. All integer fields are carried in
   network byte order, that is, most significant byte (octet) first,
   commonly known as big-endian.

   Each ZRTP message is carried in a single RTP header extension whose
   identifier has the value 0x505A.

4.1. ZRTP Message Formats

   ZRTP messages are designed to simplify endpoint parsing requirements
   and to reduce the opportunities for buffer overflow attacks (a good
   goal of any security extension is to avoid introducing new attack
   vectors).

   ZRTP uses 8 octet blocks (2 words) to encode many ZRTP parameters.
   These fixed-length blocks are used for Message Type, Hash Type,
   Cipher Type, Public Key Type, and SAS Type. The values in the
   blocks are ASCII strings which are extended with spaces (0x20) to
   make them 8 characters long. Currently defined block values are
   listed in Tables 1-5 below. Additional block values may be defined
   and used.

   ZRTP uses this ASCII encoding to simplify debugging and make it
   "ethereal friendly".

4.1.1. Message Type Block

   Currently, eleven Message Type Blocks are defined; they represent
   the set of ZRTP message primitives. ZRTP endpoints MUST support the
   Hello, HelloACK, Commit, DHPart1, DHPart2, Confirm1, Confirm2,
   Conf2ACK, and Error block types. They MAY support GoClear and
   ClearACK.

   Message Type Block    |  Meaning
   ---------------------------------------------------
   Hello                 |  Hello Message
                         |  defined in Section 4.2
   ---------------------------------------------------
   HelloACK              |  HelloACK Message
                         |  defined in Section 4.3
   ---------------------------------------------------
   Commit                |  Commit Message
                         |  defined in Section 4.4
   ---------------------------------------------------
   DHPart1               |  DHPart1 Message
                         |  defined in Section 4.5
   ---------------------------------------------------
   DHPart2               |  DHPart2 Message
                         |  defined in Section 4.6
   ---------------------------------------------------
   Confirm1              |  Confirm1 Message
                         |  defined in Section 4.7
   ---------------------------------------------------
   Confirm2              |  Confirm2 Message
                         |  defined in Section 4.8
   ---------------------------------------------------
   Conf2ACK              |  Conf2ACK Message
                         |  defined in Section 4.9
   ---------------------------------------------------
   Error                 |  Error Message
                         |  defined in Section 4.10
   ---------------------------------------------------
   GoClear               |  GoClear Message
                         |  defined in Section 4.11
   ---------------------------------------------------
   ClearACK              |  ClearACK Message
                         |  defined in Section 4.12
   ---------------------------------------------------

   Table 1. Message Block Type Values

4.1.2. Hash Type Block

   Only one Hash Type is currently defined, SHA256, and all ZRTP
   endpoints MUST support this hash. Additional Hash Types can be
   registered and used.

   Hash Type Block       |  Meaning
   ---------------------------------------------------
   SHA256                |  SHA-256 Hash defined in [SHA-256]
   ---------------------------------------------------

   Table 2. Hash Block Type Values

4.1.3. Cipher Type Block

   All ZRTP endpoints MUST support AES128 and MAY support AES256 or
   other Cipher Types. Also, if AES128 is used, DH3072 should be used.
   If AES256 is used, DH4096 should be used.
   Cipher Type Block     |  Meaning
   ---------------------------------------------------
   AES128                |  AES-CM with 128 bit keys
                         |  as defined in RFC 3711
   ---------------------------------------------------
   AES256                |  AES-CM with 256 bit keys
                         |  as defined in RFC 3711
   ---------------------------------------------------

   Table 3. Cipher Block Type Values

4.1.4. Public Key Type Block

   All ZRTP endpoints MUST support DH3072 and MAY support DH4096. ZRTP
   endpoints MUST use the DH generator g=2. The choice of AES key
   length is coupled to the choice of public key type. If AES128 is
   chosen, DH3072 SHOULD be used. If AES256 is chosen, DH4096 SHOULD
   be used.

   Public Key Type Block |  Meaning
   ---------------------------------------------------
   DH3072                |  DH with p=3072 bit prime
                         |  as defined in RFC 3526
   ---------------------------------------------------
   DH4096                |  DH with p=4096 bit prime
                         |  as defined in RFC 3526
   ---------------------------------------------------

   Table 4. Public Key Block Type Values

4.1.5. SAS Type Block

   All ZRTP endpoints MAY support the libase32 Short Authentication
   String scheme or other SAS schemes. The optional ZRTP SAS is
   described in Section 6.

   SAS Type Block        |  Meaning
   ---------------------------------------------------
   libase32              |  Short Authentication String using
                         |  libbase32 encoding defined in Section 6.
   ---------------------------------------------------

   Table 5. SAS Block Type Values

4.2. Hello message

   The Hello message has the format shown in Figure 2 below. The
   header extension payload contains the ZRTP version number and the
   list of algorithms supported by SRTP.

   The Hello ZRTP message begins with the ZRTP header extension field
   followed by the length of the header extension in 32 bit words.
   Next is a word containing the version (ver) of ZRTP. For this
   specification, the version is the string "0.01".
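   The fixed-length type blocks listed in the tables above can be
   produced and parsed with very little code, which is the stated
   motivation for the encoding. The following Python fragment is a
   minimal sketch; the function names are hypothetical, and rejecting
   names longer than 8 characters is an assumption about how an
   implementation would treat invalid input.

```python
def encode_block(name: str) -> bytes:
    """Encode a block value as 8 ASCII octets, padded with spaces (0x20)."""
    raw = name.encode("ascii")
    if not 1 <= len(raw) <= 8:
        raise ValueError("block values must be 1 to 8 ASCII characters")
    return raw.ljust(8, b" ")


def decode_block(block: bytes) -> str:
    """Decode an 8 octet block, dropping the trailing space padding."""
    if len(block) != 8:
        raise ValueError("a block is exactly 8 octets (2 words)")
    return block.decode("ascii").rstrip(" ")
```

   For example, encode_block("AES128") yields the six characters of
   the name followed by two space octets, and the SAS Type value
   "libase32" fills all 8 octets with no padding.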
   Next is the Client Identifier string (cid) which is 15 octets long
   and identifies the vendor and release of the ZRTP software. The
   Passive bit (P) is a Boolean normally set to False. A ZRTP endpoint
   which is configured to never initiate secure sessions is regarded as
   passive, and would set the P bit to True. Next is a list of
   supported Hash Types, Cipher Types, Public Key Types, and SAS Types.
   Five possible algorithms are listed for each using the Blocks
   defined in Tables 2, 3, 4, and 5. If fewer than five algorithms are
   supported, spaces (0x20) are used to pad out the 10 words for each
   type. The last parameter is the ZID, the 96 bit long unique
   identifier for the ZRTP endpoint.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=50 words        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              Message Type Block=Hello (2 words)               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       version (1 word)                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                 Client Identifier (15 octets)                 |
   |                                                               |
   |                                               +-+-+-+-+-+-+-+-+
   |                                               |0 0 0 0 0 0 0|P|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                Hash Type Blocks 1-5 (10 words)                |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |               Cipher Type Blocks 1-5 (10 words)               |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |             Public Key Type Blocks 1-5 (10 words)             |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                SAS Type Blocks 1-5 (10 words)                 |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         ZID (3 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 2. Extension header format for Hello message

4.3. HelloACK message

   The HelloACK message is used to stop retransmissions of a Hello
   message: the receipt of a HelloACK stops retransmission of the
   Hello. A HelloACK is sent regardless of whether the version number
   or the algorithm list in the Hello is supported. The format is
   shown in Figure 3 below. Note that a Commit message can be sent in
   place of a HelloACK by an initiator.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=2 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Message Type Block=HelloACK (2 words)             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 3. Extension header format for HelloACK message

4.4. Commit message

   The Commit message is sent to initiate the key agreement process
   after receiving a Hello message. The Commit message contains the
   initiator's ZID and a list of selected algorithms (hash, cipher,
   pkt, sas) and hvi, a hash of the public DH value of the initiator
   and the algorithm list from the responder's Hello message. A Commit
   cannot be sent until a Hello message has been received.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=16 words        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              Message Type Block=Commit (2 words)              |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         ZID (3 words)                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Hash Type Block (2 words)                   |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Cipher Type Block (2 words)                  |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Public Key Type Block (2 words)                |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   SAS Type Block (2 words)                    |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                         hvi (8 words)                         |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 4. Extension header format for Commit message

4.5. DHPart1 message

   The DHPart1 message begins the DH exchange. The format is shown in
   Figure 5 below. The DHPart1 message is sent if a valid Commit
   message is received. The length of the pvr value depends on the
   Public Key Type chosen. If DH4096 is used, the pvr will be 128
   words (512 octets). If DH3072 is used, it is 96 words (384 octets).

   The next five parameters are HMACs of potential shared secrets used
   in generating the ZRTP secret. The first two, rs1IDr and rs2IDr,
   are the HMACs of the responder's two retained shared secrets,
   truncated to 64 bits. Next is sigsIDr, the HMAC of the responder's
   signaling secret, truncated to 64 bits. Next is srtpsIDr, the HMAC
   of the responder's SRTP secret, truncated to 64 bits. The last
   parameter is the HMAC of an additional shared secret.
   For example, if multiple SRTP secrets are available or some other
   secret is used, it can be used as the other_secret.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|   length=depends on PK Type   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Message Type Block=DHPart1 (2 words)              |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                pvr (length depends on PK Type)                |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       rs1IDr (2 words)                        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       rs2IDr (2 words)                        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       sigsIDr (2 words)                       |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      srtpsIDr (2 words)                       |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   other_secretIDr (2 words)                   |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 5. Extension header format for DHPart1 message

4.6. DHPart2 message

   The DHPart2 message completes the DH exchange. A DHPart2 message is
   sent if a valid DHPart1 message is received. The length of the pvi
   value depends on the Public Key Type chosen. If DH4096 is used, the
   pvi will be 128 words (512 octets). If DH3072 is used, it is 96
   words (384 octets).

   The next five parameters are HMACs of potential shared secrets used
   in generating the ZRTP secret. The first two, rs1IDi and rs2IDi,
   are the HMACs of the initiator's two retained shared secrets,
   truncated to 64 bits. Next is sigsIDi, the HMAC of the initiator's
   signaling secret, truncated to 64 bits. Next is srtpsIDi, the HMAC
   of the initiator's SRTP secret, truncated to 64 bits. The last
   parameter is the HMAC of an additional shared secret.
   For example, if multiple SRTP secrets are available or some other
   secret is used, it can be included.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|   length=depends on PK Type   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Message Type Block=DHPart2 (2 words)              |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                pvi (length depends on PK Type)                |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       rs1IDi (2 words)                        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       rs2IDi (2 words)                        |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       sigsIDi (2 words)                       |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      srtpsIDi (2 words)                       |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   other_secretIDi (2 words)                   |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 6. Extension header format for DHPart2 message

4.7. Confirm1 message

   The Confirm1 message is sent in response to a valid DHPart2 message
   after the SRTP session key and parameters have been negotiated. As
   a result, it is always sent in an SRTP packet. The format is shown
   in Figure 7 below. The header extension itself has no parameters
   besides the Message Type Block. However, three parameters are
   carried in the SRTP payload. The plaintext parameter contains the
   known plaintext "known plaintext". The sasflag (S) is a Boolean
   bit. The hmac is an HMAC over the known plaintext "known plaintext"
   and the sasflag Boolean converted to the octet 0x00 or 0x01. The
   parameters included in the SRTP payload MUST NOT be allowed to pass
   to the RTP stack or errors may occur with the media stream.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=2 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Message Type Block=Confirm1 (2 words)             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   At the start of the SRTP payload:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                     plaintext (15 octets)                     |
   |                                                               |
   |                                               +-+-+-+-+-+-+-+-+
   |                                               |0 0 0 0 0 0 0|S|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                        hmac (8 words)                         |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 7. Extension header format for Confirm1 message

4.8. Confirm2 message

   The Confirm2 message is sent in response to a Confirm1 message after
   the SRTP session key and parameters have been negotiated. As a
   result, it is always sent in an SRTP packet. The format is shown in
   Figure 8 below. The header extension itself has no parameters
   besides the Message Type Block. However, three parameters are
   carried in the SRTP payload. The plaintext parameter contains the
   known plaintext "known plaintext". The sasflag (S) is a Boolean
   bit. The hmac is an HMAC over the known plaintext "known plaintext"
   and the sasflag Boolean converted to the octet 0x00 or 0x01. The
   parameters included in the SRTP payload MUST NOT be allowed to pass
   to the RTP stack or errors may occur with the media stream.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=2 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Message Type Block=Confirm2 (2 words)             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   At the start of the SRTP payload:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                     plaintext (15 octets)                     |
   |                                                               |
   |                                               +-+-+-+-+-+-+-+-+
   |                                               |0 0 0 0 0 0 0|S|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                        hmac (8 words)                         |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 8. Extension header format for Confirm2 message

4.9. Conf2ACK message

   The Conf2ACK message is sent in response to a valid Confirm2
   message. The format is shown in Figure 9 below. The receipt of a
   Conf2ACK stops retransmission of the Confirm2 message.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=2 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Message Type Block=Conf2ACK (2 words)             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 9. Extension header format for Conf2ACK message

4.10. Error message

   An Error message is sent in response to another ZRTP message which
   is not valid or not supported. The format is shown in Figure 10
   below. Possible reasons include a missing block or parameter, a
   chosen parameter not in the offered list, a checksum failure, or a
   message type block that is not understood. The ZRTP message type
   that generated the error is included in the Message Type Block.
   This message can be sent in response to any ZRTP message except
   Hello and HelloACK and is never acknowledged or retransmitted.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=4 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              Message Type Block=Error (2 words)               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Message Type Block (2 words)                  |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 10. Extension header format for Error message

4.11. GoClear message

   The optional GoClear message is sent to switch from SRTP back to
   RTP. The format is shown in Figure 11 below. The clear_hmac is
   used to authenticate the GoClear message so that bogus GoClear
   messages introduced by an attacker can be detected and discarded.
   This message is retransmitted at 500ms intervals until the receipt
   of a ClearACK message or an Error message.

   After sending a GoClear message, the ZRTP endpoint stops sending
   SRTP packets. When a ClearACK is received, the ZRTP endpoint
   deletes the crypto context for the SRTP session and may then resume
   sending RTP packets. However, if instead an Error message is
   received, the SRTP session resumes as if the GoClear had never been
   sent.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=10 words        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Message Type Block=GoClear (2 words)              |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                     clear_hmac (8 words)                      |
   |                             . . .                             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 11. Extension header format for GoClear message

4.12. ClearACK message

   The optional ClearACK message is sent to acknowledge receipt of a
   GoClear. A ClearACK is only sent if the clear_hmac from the GoClear
   message is authenticated. Otherwise, an Error message is returned.
   The format is shown in Figure 12 below.

   A ZRTP endpoint that receives a GoClear message stops sending SRTP
   packets, generates a ClearACK in response, and deletes the crypto
   context for the SRTP session. The endpoint then renders to the user
   the information that the media session has switched to clear mode
   and waits for confirmation from the user (e.g., clicking a button,
   pressing a DTMF key, etc.). Until that confirmation is received,
   the ZRTP endpoint MUST NOT resume sending RTP packets. To prevent
   pinholes from closing or NAT bindings from expiring, the ClearACK
   message should be resent every 5 seconds while waiting for
   confirmation from the user. After confirmation of the notification
   is received from the user, the sending of RTP packets may begin.

   Note that if the GoClear/ClearACK mechanism is not supported by a
   ZRTP endpoint, an Error message MUST be sent in response to a
   GoClear message.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0|        length=2 words         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Message Type Block=ClearACK (2 words)             |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 12. Extension header format for ClearACK message

5. Retransmissions

   ZRTP uses two retransmission timers, T1 and T2. T1 is used for
   retransmission of Hello messages, when the support of ZRTP by the
   other endpoint may not be known. T2 is used in retransmissions of
   all the other ZRTP messages with the exception of GoClear. The
   retransmission of GoClear messages is discussed in Section 4.11.

   Practical experience has shown that RTP packet loss at the start of
   an RTP session can be extremely high. Since the entire ZRTP message
   exchange occurs during this period, the retransmission scheme is
   defined to be aggressive. Since ZRTP packets with the exception of
   the DHPart1 and DHPart2 messages are small, this should have minimal
   effect on overall bandwidth utilization of the media session.

   Hello ZRTP requests are retransmitted at an interval that starts at
   T1 and doubles after every retransmission, capping at 200ms. A
   Hello message is retransmitted 20 times before giving up. T1 has a
   recommended value of 50 ms. Retransmission of a Hello ends upon
   receipt of a HelloACK or Commit message.

   Non-Hello ZRTP requests are retransmitted only by the initiator;
   that is, only Commit, DHPart2, and Confirm2 are retransmitted if the
   corresponding messages from the responder (DHPart1, Confirm1, and
   Conf2ACK) are not received.

   Non-Hello ZRTP messages are retransmitted at an interval that starts
   at T2 and doubles after every retransmission, capping at 600ms.
   Each message is retransmitted 10 times before giving up and resuming
   a normal RTP session. T2 has a default value of 150ms. Each
   message has a response message that stops retransmissions, as shown
   in Table 6. The high value of T2 means that retransmissions will
   likely only occur with packet loss. The receipt of an Error message
   ends retransmission of the message identified in the Error message.

      Message        Acknowledgement Message
      -------        -----------------------
      Hello          HelloACK or Commit
      Commit         DHPart1
      DHPart2        Confirm1
      Confirm2       Conf2ACK
      GoClear        ClearACK

   Table 6. Retransmitted ZRTP Messages and Responses

6. Short Authentication String

   This section discusses the implementation of the optional Short
   Authentication String (SAS) in ZRTP.
   The Short Authentication String (SAS) value is calculated as the
   hash of both DH public values and the string "Short Authentication
   String".

      sasvalue = hash(pvi | pvr | "Short Authentication String")

   The rendering of the SAS value depends on the SAS Type agreed upon
   in the Commit message. For the SAS Type of libase32, the last 20
   bits of the sasvalue are rendered as a form of base32 encoding known
   as libbase32 [6]. The purpose of libbase32 is to represent
   arbitrary sequences of octets in a form that is as convenient as
   possible for human users to manipulate. As a result, the choice of
   characters is slightly different from base32 as defined in RFC 3548.
   The last 20 bits of the sasvalue yield four libbase32 characters,
   which are rendered at both ZRTP endpoints. Other SAS Types may be
   defined to render the SAS value in other ways.

   The sasflag is set based on the user indicating that SAS has been
   successfully performed. The sasflag is exchanged securely in the
   Confirm1 and Confirm2 messages of the next session. In other words,
   each party sends the sasflag from the previous session in the
   Confirm message of the current session. It is perfectly reasonable
   to have a ZRTP endpoint that never sets the sasflag, because it
   would require adding complexity to the user interface to allow the
   user to set it. The sasflag is not required to be set, but if it is
   available to the client software, it allows for the possibility that
   the client software could render to the user that the SAS verify
   procedure was carried out in a previous session.

   Regardless of whether there is a user interface element to allow the
   user to set the sasflag, it is worth caching a shared secret,
   because doing so reduces opportunities for an attacker in the next
   call.

   If at any time the users carry out the SAS procedure, and it
   actually fails to match, then this means there is a very resourceful
   man in the middle.
   If this is the first call, the MitM was there on the first call,
   which is impressive enough. If it happens in a later call, it means
   the MitM must also know your cached shared secret, because you could
   not have carried out any voice traffic at all unless the session key
   was correctly computed and is also known to the attacker. This
   implies the MitM must have been present in all the previous
   sessions, since the initial establishment of the first shared
   secret. This is indeed a resourceful attacker. It also means that
   if at any time he ceases his participation as a MitM on one of your
   calls, the protocol will detect that the cached shared secret is no
   longer valid, because it was really two different shared secrets all
   along, one of them between Alice and the attacker, and the other
   between the attacker and Bob. The continuity of the cached shared
   secrets makes it possible for us to detect the MitM when he inserts
   himself into the ongoing relationship, as well as when he leaves.
   Also, if the attacker tries to stay with a long lineage of calls,
   but fails to execute a DH MitM attack for even one missed call, he
   is permanently excluded. He can no longer resynchronize with the
   chain of cached shared secrets.

   Some sort of user interface element (maybe a checkbox) is needed to
   allow the user to tell the software the SAS verify was successful,
   causing the software to set the "SAS verified" flag, which (together
   with our cached shared secret) obviates the need to perform the SAS
   procedure in the next call. An additional user interface element
   can be provided to let the user tell the software he detected an
   actual SAS mismatch, which indicates a MitM attack. The software
   can then take appropriate action, clearing the "SAS verified" flag
   and erasing the cached shared secret from this session. It is up to
   the implementer to decide if this added user interface complexity is
   warranted.
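   The SAS calculation and libase32 rendering described at the start of
   this section might be sketched in Python as follows. Several
   details are assumptions rather than statements of this draft: the
   hash is taken to be SHA-256, pvi and pvr are serialized as
   fixed-length big-endian integers (384 octets for DH3072), the
   alphabet is the z-base-32 alphabet that libbase32 is understood to
   use, and the 20 bits are mapped to characters most significant
   group first. The name sas_string is hypothetical.

```python
import hashlib

# Assumption: the z-base-32 alphabet, taken here to be the one used by libbase32.
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"


def sas_string(pvi: int, pvr: int, pv_octets: int = 384) -> str:
    """Render the last 20 bits of the sasvalue as four base32 characters."""
    data = (
        pvi.to_bytes(pv_octets, "big")
        + pvr.to_bytes(pv_octets, "big")
        + b"Short Authentication String"
    )
    sasvalue = hashlib.sha256(data).digest()
    # The last 20 bits are the low 20 bits of the final three octets.
    low20 = int.from_bytes(sasvalue[-3:], "big") & 0xFFFFF
    # Four groups of 5 bits; emitting the most significant group first is an assumption.
    return "".join(ZBASE32[(low20 >> shift) & 0x1F] for shift in (15, 10, 5, 0))
```

   Both endpoints hash the same two public values in the same order,
   so they render the same four characters unless a MitM has
   substituted a public value.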
If the SAS matches, it means there is no MitM, which also implies it is now safe to trust a cached shared secret for later calls. If inattentive users don't bother to check the SAS, it means we don't know whether there is or is not a MitM, so even if we do establish a new cached shared secret, there is a risk that our potential attacker may have a subsequent opportunity to continue inserting himself in the call, until we finally get around to checking the SAS. If the SAS matches, it means no attacker was present for any previous session since we started propagating cached shared secrets, because this session and all the previous sessions were also authenticated with a continuous lineage of shared secrets.

7. IANA Considerations

If an IANA registry for RTP extension headers were defined, then the value 0x505A would be reserved for ZRTP.

8. Security Considerations

This document is all about securely keying SRTP sessions. As such, security is discussed in every section. The next version of this draft will have a summary of those security properties discussed throughout the document.

9. Acknowledgments

The authors would like to thank Bryce Wilcox for his contributions to the design of this protocol, and to thank Jon Peterson, Colin Plumb, and Hal Finney for their helpful comments and suggestions.

10. Appendix - ZRTP, SIP, and SDP

This section discusses how ZRTP, SIP, and SDP work together.

SIP UAs which support this specification would include the to-be-defined SDP attribute a=zrtp in their SDP offers and answers. The presence of this attribute is a hint to another UA that ZRTP is supported. If a UA supports both ZRTP and another approach to negotiate an SRTP secret such as [14] or [13], then the presence of the a=zrtp attribute is critical. If both UAs support ZRTP, they will first try ZRTP before attempting SRTP.
If only one endpoint supports ZRTP but both support SRTP, then the other method will be used instead.

Note that ZRTP may be implemented without coupling with the SIP signaling. For example, ZRTP can be implemented as a "bump in the wire" or as a "bump in the stack" in which RTP sent by the SIP UA is converted to ZRTP. In these cases, the SIP UA will have no knowledge of ZRTP and will not include the a=zrtp attribute. As a result, even if the other UA does not indicate support for ZRTP, a ZRTP endpoint SHOULD still send Hello messages.

11. References

11.1. Normative References

[1] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[2] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, March 2004.

[3] Kivinen, T. and M. Kojo, "More Modular Exponential (MODP) Diffie-Hellman groups for Internet Key Exchange (IKE)", RFC 3526, May 2003.

[4] Ferguson, N. and B. Schneier, "Practical Cryptography", Wiley Publishing, 2003.

[5] Barker, E. and J. Kelsey, "Recommendation for Random Number Generation Using Deterministic Random Bit Generators", NIST Special Publication 800-90 DRAFT (December 2005).

[6] O'Whielacronx, Z., "human-oriented base-32 encoding", http://cvs.sourceforge.net/viewcvs.py/libbase32/libbase32/DESIGN?rev=HEAD .

11.2. Informative References

[7] Zimmermann, P., "PGPfone", http://www.pgpi.org/products/pgpfone/ .

[8] Zimmermann, P., "Zfone", http://www.philzimmermann.com/zfone .

[9] Blossom, E., "The VP1 Protocol for Voice Privacy Devices Version 1.2", http://www.comsec.com/vp1-protocol.pdf .

[10] "CryptoPhone", http://www.cryptophone.de/ .

[11] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler, "SIP: Session Initiation Protocol", RFC 3261, June 2002.

[12] Ylonen, T. and C. Lonvick, "The Secure Shell (SSH) Protocol Architecture", RFC 4251, January 2006.

[13] Andreasen, F., "Session Description Protocol Security Descriptions for Media Streams", draft-ietf-mmusic-sdescriptions-12 (work in progress), September 2005.

[14] Arkko, J., "Key Management Extensions for Session Description Protocol (SDP) and Real Time Streaming Protocol (RTSP)", draft-ietf-mmusic-kmgmt-ext-15 (work in progress), June 2005.

[15] Arkko, J., Carrara, E., Lindholm, F., Naslund, M., and K. Norrman, "MIKEY: Multimedia Internet KEYing", RFC 3830, August 2004.

[16] Handley, M. and V. Jacobson, "SDP: Session Description Protocol", RFC 2327, April 1998.

Authors' Addresses

Philip Zimmermann
Phil Zimmermann and Associates LLC
Email: prz@mit.edu

Alan Johnston (editor)
SIPStation
St. Louis, MO 63124
Email: alan@sipstation.com

Jon Callas
PGP Corporation
Email: jon@pgp.com

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2006). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.