Internet Engineering Task Force                      Rolf Blom, Ericsson
AVT Working Group                           Elisabetta Carrara, Ericsson
INTERNET-DRAFT                                    David A. McGrew, Cisco
Expires: December 2001                            Mats Naslund, Ericsson
                                                  Karl Norrman, Ericsson
                                                       David Oran, Cisco

                                                               July 2001

                The Secure Real Time Transport Protocol
                     <draft-ietf-avt-srtp-01.txt>

Status of this memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Abstract

   This document describes the Secure Real Time Transport Protocol
   (SRTP), a profile of the Real Time Transport Protocol (RTP) which can
   provide confidentiality, message authentication (in groups, also
   source origin authentication), replay protection, and implicit header
   authentication.

   SRTP can achieve high throughput and low packet expansion by using an
   additive stream cipher for encryption, a universal hashing based
   function for message authentication, and an 'implicit' index for
   sequencing based on the RTP sequence number.

   Robust and flexible re-keying/access control to media can be achieved
   through an optional security parameter index (SPI).

   In addition, SRTP proves to be a suitable protection for
   heterogeneous environments, i.e. environments including both wired
   and wireless links.

TABLE OF CONTENTS

   1. Notational Conventions
   2. Goals
   3. SRTP Overview
   3.1 SRTP Cryptographic Contexts
   3.2 Mapping SRTP Packets to Cryptographic Contexts
   3.3 SRTP Packet Processing
   3.4 Cryptographic Algorithms
   4. Synchronization
   4.1 Packet Index Determination
   4.2. IV Formation for Implicit Header Authentication
   5. Replay Protection
   6. Encryption
   6.1 Defined Ciphers
   6.1.1. Counter Mode AES
   6.1.2. AES in f8-Mode
   6.1.3. NULL Cipher
   7. Message Authentication
   7.1. Non-delayed Message Authentication
   7.1.1 Default MAC: UMAC
   7.2 Delayed Message Authentication
   7.2.1. TESLA
   7.3. Compound Authentication Tag
   8. SRTP Parameters
   9. Secure RTCP
   10. Rationale
   10.1 Synchronization
   10.2 Replay Protection
   10.3 Source Origin Authentication considerations
   10.4. Choice of Encryption Transform
   11. Security Considerations
   11.1. SSRC collision
   11.2. Confidentiality of the RTP Payload
   11.3. Confidentiality of the RTP Header
   11.4. Integrity of the RTP header
   12. Multicast and many-to-many
   13. Key management
   13.1. Security Parameters
   13.2. SDP attribute support
   14. Acknowledgements
   15. Author's Addresses
   16. References
   APPENDIX A: Test Vectors

1. Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [B97].

   By convention, the leftmost bit (byte) is the most significant one.
   By XOR we mean bitwise addition modulo 2 of binary strings, and ||
   denotes concatenation. E.g. if C = A || B, then the most significant
   bits of C are the same as those of A, and the least significant bits
   of C equal those of B.

2. Goals

   The security goals for SRTP are to ensure:

   * the confidentiality of the RTP payload,

   * the integrity protection of the entire RTP packet, including
     protection against replayed RTP packets, and

   * implicit authentication of the header.

   Each of the security services described above is optional. Any
   combination of these services can be provided, except that implicit
   header authentication cannot be the only service provided.

   In group scenarios, source origin authentication does not follow
   automatically from integrity protection, and therefore an interface
   is provided to obtain this from an external mechanism, e.g. [TESLA].

   To this end, we need to use a wide definition of the term
   'authentication', meaning both immediate authentication and delayed
   authentication. Which authentication scheme (if any) is to be used
   MUST be signaled in the set-up phase along with other security
   parameters.

   Other goals for the protocol are:

   * a low computational cost,

   * a low footprint (i.e., small code size and data memory for key
     schedules and replay lists),

   * limited packet expansion,

   * no error propagation (e.g., changing individual bits in the payload
     of an SRTP packet must change only the corresponding bits in the
     RTP packet),

   * the preservation of RTP header compression efficiency,

   * the ability to use cryptographic keys in multiple RTP sessions
     simultaneously, and

   * independence from the underlying transport used by RTP.

   These properties ensure that SRTP is a suitable protection scheme
   for RTP in both wired and wireless scenarios.

3. SRTP Overview

   RTP is the Real Time Transport Protocol [SCFJ96]. We define SRTP as a
   profile of RTP, in a way analogous to RFC1890 which defines the
   audio/video profile for RTP. Conceptually, we consider a 'bump in the
   stack' implementation which resides between the RTP application and
   the transport layer. On the sending side, it intercepts RTP packets
   and forwards equivalent SRTP packets; on the receiving side, it
   intercepts SRTP packets and passes equivalent RTP packets up the
   stack.

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-->+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   |   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |                           timestamp                           |
   |   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |           synchronization source (SSRC) identifier            |
   |   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |   |            contributing source (CSRC) identifiers             |
   |   |                               ....                            |
   |   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |                   RTP extension (optional)                    |
   | +>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | | |                                                               |
   | | |                          payload                              |
   | | |                               ....                            |
   | +>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | | |                         SPI (optional)                        |
   +-->+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | | |                 authentication tag (optional)                 |
   | | |                                                               |
   | | |                               ....                            |
   | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | |
   | +- Encrypted Portion
   +---- Authenticated Portion

             Figure 1. The format of an SRTP packet.

   The format of an SRTP packet is illustrated in Figure 1. The optional
   authentication tag and SPI are the only fields defined by SRTP that
   are not in RTP. The authentication tag provides data origin
   authentication of the RTP header and payload, and of the optional
   SPI, and it indirectly provides replay protection by authenticating
   the sequence number.

   The added fields are:

   Authentication tag: variable length, optional
      The authentication tag shall be used to carry message
      integrity/source authentication data. The Authenticated
      Portion of an SRTP packet consists of the entire equivalent
      RTP packet, and the SPI field when present. There is nothing to
      prevent the authentication tag from being composed of several
      sub-tags.

   Security Parameter Index (SPI): 32 bits, optional
      The SPI is used to determine the cryptographic context for the
      current packet.

   Use of authentication and the SPI field is determined during session
   establishment.

   The Encrypted Portion of an SRTP packet consists of the RTP payload
   of the equivalent RTP packet.

3.1 SRTP Cryptographic Contexts

   Each SRTP session requires the sender and receiver to maintain
   cryptographic state information. This information is called the
   cryptographic context, and it consists of:

   * an encryption key k_e, and optionally a "salting key" k_s. These
     keys MUST be randomly and independently chosen.

   * a 32-bit rollover counter r (which records how many times the
     16-bit RTP sequence number has been reset to zero after passing
     through 65,535),

   * for the receiver only, a sequence number s_l, which is the last
     received (possibly authenticated, if authentication is provided)
     sequence number,

   * the mode of operation for the encryption scheme, and

   * the cipher.

   SRTP also uses an 8-bit FLAG carrying additional information. The
   current specification leaves it static or pre-determined throughout
   the session, though future extensions MAY require it to be included
   in the context.

   In addition, when authentication and replay protection are provided,
   the context contains

   * the actual authentication protection algorithm(s) and parameters to
     be used, and

   * a replay list L (maintained by the receiver only),

   and, depending on the scheme in use, one or more of the following:

   * message authentication key(s) {k_a},

   * an already source authenticated (i.e. digitally signed) commitment
     to a chain of keys made by the sender,

   * a buffer of the most recent packets, maintained by the receiver,
     and/or by the transmitter.
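
   As an illustration only, the context fields listed above could be
   grouped into a single record. The following Python sketch is not
   part of the protocol definition: the class name, the field names
   (roc, last_seq, auth_keys, and so on), and the default values are
   assumptions chosen for readability.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class CryptoContext:
    """Hypothetical container for the per-session state of Section 3.1."""

    k_e: bytes                      # encryption key
    k_s: Optional[bytes] = None     # optional salting key
    roc: int = 0                    # 32-bit rollover counter r
    last_seq: Optional[int] = None  # s_l, kept by the receiver only
    cipher: str = "AES"             # placeholder identifier for the cipher
    mode: str = "SICM"              # placeholder identifier for the mode
    flag: int = 0                   # 8-bit FLAG, static for the session
    # Present only when authentication and replay protection are provided:
    auth_algorithm: Optional[str] = None
    auth_keys: List[bytes] = field(default_factory=list)  # {k_a}
    replay_list: Set[int] = field(default_factory=set)    # L, receiver only

    def __post_init__(self) -> None:
        # Enforce the field widths stated in the text.
        if not 0 <= self.roc < 2**32:
            raise ValueError("rollover counter must fit in 32 bits")
        if not 0 <= self.flag < 2**8:
            raise ValueError("FLAG must fit in 8 bits")
```

   A sender and a receiver would each hold one such record per context;
   the receiver-only fields simply stay unused on the sending side.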

3.2 Mapping SRTP Packets to Cryptographic Contexts

   In this section we define the mapping of RTP and SRTP packets to the
   cryptographic contexts used to protect them.

   It is assumed that, when presented with necessary information (see
   below), the key management returns a context with updated
   information.

   An SSRC identifier is unique inside an RTP session, and all packets
   with the same SSRC form part of the same timing and sequence number
   space. Thus, the SSRC field and/or transport address information MAY
   be used by an SRTP receiver (or by a bump in the stack implementation
   on the sender's side) to identify the proper cryptographic context
   within that session. Note though that, for instance in a multicast
   scenario, the RTP anti-collision mechanism for SSRCs may force these
   identifiers to change over time; see the discussion in Section 12.

   If information in the context (e.g. keys) is to change dynamically,
   context signaling MAY be implicit, in which case additional fields
   carried in the RTP packets, e.g. sequence number and timestamp, are
   used to determine the context. This approach may however suffer from
   synchronization problems due to packet loss or drift of internal
   clocks. Therefore, the optional 32-bit SPI field provides a means to
   explicitly signal which context to use when processing the packet on
   the receiving end. The SPI approach is the only robust method, and
   SHOULD be used if frequent re-keying is desired.

   We leave it to the key management to implement such features.

   Recall that an RTP session for each participant is defined [SCFJ96]
   by a pair of destination Transport Addresses (one network address
   plus a port pair for RTP and RTCP), and that a multimedia session is
   defined as a collection of RTP sessions. For example, a particular
   multimedia session could include an audio RTP session, a video RTP
   session, and a text RTP session.
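
   To make the inputs to the context lookup concrete, the sketch below
   extracts the two packet fields that this section uses to select a
   context: the SSRC from the fixed RTP header, and the optional SPI,
   which Figure 1 places directly before the authentication tag. It is
   a Python illustration under stated assumptions: the function name is
   hypothetical, and the receiver is assumed to know from session
   establishment whether an SPI is present and how long the
   authentication tag is.

```python
import struct
from typing import Optional, Tuple

RTP_FIXED_HEADER_LEN = 12  # V/P/X/CC, M/PT, sequence number, timestamp, SSRC


def context_selector(packet: bytes, spi_present: bool,
                     tag_len: int) -> Tuple[int, Optional[int]]:
    """Return (SSRC, SPI or None) for a received SRTP packet.

    spi_present and tag_len are assumed to be known from session set-up.
    """
    trailer_len = tag_len + (4 if spi_present else 0)
    if len(packet) < RTP_FIXED_HEADER_LEN + trailer_len:
        raise ValueError("packet too short")
    # The SSRC occupies octets 8..11 of the RTP header, network byte order.
    (ssrc,) = struct.unpack_from("!I", packet, 8)
    spi = None
    if spi_present:
        # The 32-bit SPI sits immediately before the authentication tag.
        (spi,) = struct.unpack_from("!I", packet, len(packet) - tag_len - 4)
    return ssrc, spi
```

   The returned pair, possibly combined with the transport address,
   could serve as the key of a table of cryptographic contexts; how that
   table is populated is left to the key management.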

   SRTP MAY allow the different RTP sessions to use identical
   cryptographic keys. This is possible if the design of the
   synchronization mechanism (i.e., the IV in the case of the f8 and
   Segmented Integer Counter Modes) avoids keystream re-use (the
   two-time pad, Section 11), and if SSRC uniqueness requirements beyond
   those dictated by the RTP standard are met (see Section 12). However,
   different multimedia sessions SHOULD use different keys.

3.3 SRTP Packet Processing

   When Generic Forward Error Correction is performed as specified in
   RFC 2733, the security processing takes place after FEC on the
   sender's side, and before FEC on the receiver's side.

   To construct a proper SRTP packet, given an RTP packet, the sender
   does the following:

   1. Determine which cryptographic context to use as described in
   Section 3.2.

   2. Determine the index of the SRTP packet as described in Section
   4.1, using the rollover counter in the cryptographic context and the
   sequence number in the RTP packet. Form the current initialization
   vector (IV) if Implicit Header Authentication is provided, as
   described in Section 4.2.

   3. Encrypt the Encrypted Portion of the packet, as described in
   Section 6, using the IV determined in Step 2 and the encryption key
   and salting key in the context found in Step 1.

   4. If authentication is provided, compute the authentication tag for
   the Authenticated Portion of the packet, as described in Section 7,
   using the index determined in Step 2 and the authentication key in
   the context found in Step 1. Note that the Encrypted Portion is
   encrypted before the authentication tag is computed.

   On the receiving end, packet processing depends on the presence of
   different types of authentication. The processing below is for
   so-called immediate authentication.
   A policy for handling what we call
   delayed authentication is left to the application (see also Section
   7). To authenticate and decrypt an SRTP packet, the receiver does the
   following:

   1. Determine which cryptographic context to use as described in
   Section 3.2.

   2. Estimate the index of the SRTP packet from the rollover counter
   in the cryptographic context and the sequence number in the RTP
   packet, as described in Section 4.1. If Implicit Header
   Authentication is provided, form the current IV in the same way as in
   Step 2 of the sender's processing.

   3. Check if the packet has been replayed, by checking the Replay List
   to ensure that no packet with that index has been received and
   authenticated before. If that index is in the list, then the packet
   has been replayed and is invalid. It MUST be discarded, and the event
   SHOULD be logged.

   Next, perform verification of the authentication tag. If the result
   is 'AUTHENTICATION FAILURE', the packet MUST be discarded from
   further processing and the event SHOULD be logged.

   4. Decrypt the Encrypted Portion of the packet, as described in
   Section 6, using the IV determined in Step 2 and the encryption key
   and salting key in the context found in Step 1.

   5. Update the rollover counter and last sequence number in the local
   context to the values used in the packet index estimate in Step 2.

   The processing order used when replay protection is activated has
   been chosen to maximize resistance to denial of service attacks
   (i.e., to minimize the receiver's effort in processing spurious
   packets).

3.4 Cryptographic Algorithms

   Default encryption and authentication algorithms are specified in
   Sections 6.1 and 7.1.
   While there are numerous encryption and message
   authentication algorithms that can be used in SRTP, we define default
   algorithms in order to avoid the complexity of specifying the
   encodings for the signaling of algorithm and parameter identifiers.

4. Synchronization

4.1 Packet Index Determination

   SRTP implementations use an 'implicit' packet index for sequencing.
   Receiver-side implementations use the RTP sequence number to
   reconstruct the correct index (that is, location in the sequence of
   all RTP packets). The index is defined as s + r * 65,536, where the
   sequence number is s and the rollover counter is r.

   A robust approach for the proper use of a rollover counter requires
   its handling and use to be well defined. In particular, out-of-order
   RTP packets with sequence numbers close to 65,536 or zero must be
   properly dealt with.

   A receiver reconstructs the index i of a packet with sequence number
   s using the estimate

      i = 65,536 * t + s,

   where t is chosen from the set { r-1, r, r+1 } such that i is closest
   to the value 65,536 * r + s_l. If the value r+1 is used, then the
   rollover counter r in the cryptographic context is incremented by
   one.

   The pseudocode for the algorithm to process a packet with sequence
   number s follows:

      if (s_l < 32,768)
         if (s - s_l > 32,768)
            set i to s + 65,536 * (r-1)
         else
            set i to s + 65,536 * r
         endif
      else
         if (s_l - 32,768 > s)
            set r to r + 1
         endif
         set i to s + r * 65,536
      endif
      set s_l to s

   The index i is used in replay protection (Section 5) when
   authentication is provided, in encryption (Section 6), and in message
   authentication (Section 7).

   This algorithm should be extended by using the information in the
   authenticated RTCP reports.

   When RTP authentication is not present, robust synchronization is not
   possible.
   In this case, transmission errors or an active attacker may
   force the receiver to erroneously update its rollover counter and
   thus to become completely out of sync. It is not possible to protect
   against active attackers in such a case, but it is possible to have
   an updating policy for the rollover counter which, except in rare
   cases, is robust with respect to random bit errors. If 'delayed'
   authentication, e.g. [TESLA], is present, the same holds. There are
   many updating policies that could be developed, e.g. by call-back
   from the application layer, but the present work leaves this open as
   an implementation issue.

   As the rollover counter is 32 bits long, the maximum number of
   packets in any given SRTP session is 2^48 = 281,474,976,710,656.
   After that number of SRTP packets have been sent, the sender MUST NOT
   send any more packets with that cryptographic context. This
   limitation provides a security benefit by placing an upper bound on
   the amount of traffic that can pass before cryptographic keys are
   changed. Of course, re-keying mechanisms MUST be triggered before
   this maximum key lifetime, and key refresh mechanisms MAY be
   triggered during the key lifetime.

   Other approaches to sequencing were considered and rejected; please
   see Section 10.1 for our rationale.

4.2. IV Formation for Implicit Header Authentication

   The encryption uses a block cipher in feedback or segmented integer
   counter mode, and these both require an initialization vector (IV).
   There may be several alternatives for the IV formation. To guarantee
   synchronization and avoid keystream re-use, we only require the SSRC,
   rollover counter and sequence number, or some function thereof, to be
   part of the IV. Below, we give a concrete proposal which also
   provides 'implicit' header authentication, and works with every
   cipher having at least 128-bit block size.
   This particular solution
   also gives a high degree of agreement between bit ordering in the RTP
   packet header and the IV, simplifying data copying.

   When implicit header authentication is provided, data from each RTP
   packet to be encrypted and transmitted must be included in the IV.
   The IV shall be computed from fields of that RTP packet, the FLAG,
   and the rollover counter value, and supplied as input to the
   ciphering algorithm. The 128-bit IV is:

      IV = ROC || FLAG || M || PT || SEQ || TS || SSRC

   where TS (Timestamp, 32 bits), SEQ (Sequence Number, 16 bits), M
   (Marker Bit, 1 bit), PT (Payload Type, 7 bits), and SSRC
   (Synchronization Source, 32 bits) are taken from the current RTP
   header. ROC is the 32-bit rollover counter from the identified
   context. FLAG is an 8-bit value which is used to signal additional
   information. Currently, the only value defined (for RTP) is FLAG =
   00..0. The value 00..01 is reserved for RTCP and MUST NOT be used
   with RTP.

   With this IV formation, the number of SRTP packets encrypted with any
   fixed encryption key MUST therefore be no more than 2^48. Otherwise,
   the ROC and SEQ fields together (48 bits) will not be large enough to
   avoid keystream reuse.

5. Replay Protection

   Robust replay protection is possible when direct (non-delayed)
   authentication of RTP packets is present.

   A packet is 'replayed' when it is stored by an adversary, and then
   re-injected into the network. SRTP provides protection against such
   attacks whenever authentication is provided, through the storage of
   the indices of the most recently received and authenticated packets.

   Each SRTP receiver maintains a Replay List, which conceptually
   contains the indices of all of the packets which have been received
   and authenticated.
   In practice, the list can use a 'sliding window'
   approach, so that a fixed amount of storage suffices for replay
   protection. SRTP packet indices which are less than
   s_l + r * 65,536 - SRTP-WINDOW-SIZE MAY be assumed to have been
   received, where SRTP-WINDOW-SIZE is a parameter that MUST be at
   least 64, and which MAY be set to a higher value.

   The Replay List can be efficiently implemented by using a bitmap to
   represent which packets have been received, as described in the
   Security Architecture for IP [KA98a].

   If the authentication is of the delayed type, so is the replay
   protection (see Section 7).

6. Encryption

   Encryption uses a 'seekable' additive stream cipher, following the
   Stream Cipher ESP [sc-esp]. The stream ciphers that can be used must
   be able to efficiently seek to arbitrary locations in their
   keystream. Ciphers that can do this include SEAL [RC94, RC98],
   LEVIATHAN [MF00b], and any block cipher run in a suitable mode. In
   particular, AES in counter mode will provide good security and
   reasonable performance, and will conform to emerging U.S. Federal
   standards. Another mode which fulfils the requirements is f8 mode
   [ES3D], used together with AES.

   SRTP encryption consists of generating a keystream segment
   corresponding to the index of the packet, and then bitwise
   exclusive-oring that keystream segment into the RTP packet, starting
   at the first bit of the RTP payload. Decryption is then done the same
   way, but swapping the roles of the plaintext and ciphertext. The
   definition of how the keystream is generated, given the index,
   depends on the cipher and its mode of operation.

   Such a cipher shows features which are desired in a general scenario,
   e.g. low computational cost, and speed. It also shows properties
   which fulfil additional requirements posed by the cellular
   environment [BCNN00], i.e.
preservation of RTP header compression efficiency, and absence of
error propagation and message expansion.

Hence, we conclude that the proposed profile can be applied to the
most general heterogeneous environment.

6.1 Defined Ciphers

The default cipher is the Advanced Encryption Standard (AES), and we
define two modes of running AES, Segmented Integer Counter Mode AES
and AES in f8-Mode. Both of these modes provide a simple way of
obtaining implicit header authentication through the use of the IV
formation described in Section 4.2. The NULL cipher is also defined,
to be used when encryption is not required.

6.1.1. Counter Mode AES

The default cipher SHALL be AES used in the Segmented Integer Counter
Mode (SICM) [M00], with a 128-bit key size and a 128-bit block size.

Conceptually, counter mode consists of encrypting successive
integers. The actual definition is somewhat more complicated, in
order to avoid 128 bit integer arithmetic and to randomize the
starting point of the integer sequence. Each packet is encrypted with
a distinct keystream segment, which is computed as follows.

The 128-bit block is divided into three parts: a 64-bit segment
prefix, a 32-bit block index, which is incremented to generate a
keystream segment, and a 32-bit segment suffix. The segment
prefix/suffix pair is unique for each keystream segment.

A keystream segment is the concatenation of the output blocks of the
cipher in encrypt mode, in which the block indices are in increasing
order. Symbolically, each keystream segment looks like

   E(A || B || C) || E(A || B + 1 mod 2^32 || C) ||
   E(A || B + 2 mod 2^32 || C) ..

where A, B, and C are segment prefix, block index, and segment
suffix, respectively, determined as given below.

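As an illustration only, the following Python sketch shows how the
cipher inputs of one keystream segment could be laid out, taking A, B,
and C as already-determined integers. The names counter_blocks,
keystream_segment, and stand_in_block are hypothetical and do not
appear in this specification. In particular, stand_in_block is NOT
AES; it is a keyless placeholder so that the sketch runs without a
cryptographic library. Truncating the last output block to the needed
length is likewise an assumption of the sketch.

```python
import hashlib
from typing import Callable, List

BLOCK_INDEX_MOD = 2 ** 32  # the block index B is a 32-bit field


def counter_blocks(a: int, b: int, c: int, count: int) -> List[bytes]:
    """Return the cipher inputs A || (B + i mod 2^32) || C for i = 0..count-1."""
    blocks = []
    for i in range(count):
        index = (b + i) % BLOCK_INDEX_MOD  # only B changes within a segment
        blocks.append(
            a.to_bytes(8, "big")        # 64-bit segment prefix
            + index.to_bytes(4, "big")  # 32-bit block index
            + c.to_bytes(4, "big")      # 32-bit segment suffix
        )
    return blocks


def keystream_segment(encrypt_block: Callable[[bytes], bytes],
                      a: int, b: int, c: int, n_bytes: int) -> bytes:
    """Concatenate E(A || B+i || C) outputs; truncation is an assumption."""
    count = -(-n_bytes // 16)  # ceiling division: number of 16-byte blocks
    stream = b"".join(encrypt_block(x) for x in counter_blocks(a, b, c, count))
    return stream[:n_bytes]


def stand_in_block(block: bytes) -> bytes:
    """Placeholder for E. NOT AES and not secure; for illustration only."""
    return hashlib.sha256(block).digest()[:16]


if __name__ == "__main__":
    # The block index wraps modulo 2^32 while A and C stay fixed.
    for blk in counter_blocks(a=1, b=2 ** 32 - 1, c=7, count=2):
        print(blk.hex())
```

A real implementation would pass AES encryption under the session
encryption key as encrypt_block; the layout of the 128-bit input block
is the only point the sketch is meant to show.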
The offsets are computed from the salting key k_s and the IV (from
Section 4.2) by exclusive-oring k_s and the IV, and setting A to the
first 64 bits of the result, B to the following 32 bits, and C to the
remaining 32 bits of the result. Symbolically,

   A || B || C = IV XOR k_s.

If k_s is less than 128 bits long, then k_s is concatenated with
itself as many times as needed in order to form the salt which is
added to the IV. If no salting key is used, this is interpreted as
k_s = 0.

Note that the segment prefix/suffix pair is distinct for each packet
which is encrypted, thus ensuring that keystream segments are
distinct and non-overlapping.

The restriction on the maximum number of RTP packets above ensures
the security of the encryption method by limiting the effectiveness
of probabilistic attacks [BR98].

The AES has a block size of 128 bits, so 2^32 output blocks are
sufficient to generate the 2^7 * 2^32 = 549755813888 bits of
keystream needed to encrypt the largest possible RTP packet.

6.1.2. AES in f8-Mode

To encrypt UMTS (Universal Mobile Telecommunications System, as 3G
networks) data, a solution (see [ES3D]) known as the f8-algorithm has
been developed. On a high level, the proposed scheme is a variant of
Output Feedback Mode (OFB) [HAC], with a more elaborate
initialization and feedback function. As in normal OFB, the core
consists of a block cipher. We define the use of AES as the default
block cipher to be used in f8-Mode for RTP encryption, with 128-bit
key and block size.

Figure 2 shows the structure of an arbitrary b-bit block size cipher,
E, running in what we shall call "f8-mode of operation".

               |
               |
              \|/
            +------+
            |      |
        --->|  E   |
        |   |      |
        |   +------+
        |      |
  m --> *      |--------------------------- ... -----------|
        |  IV' |           |             |                 |
        |      |   j=1 --> *     j=2 --> *  ...  j=L-1 --> *
        |      |           |             |                 |
        |      |      -->  *        -->  *  ...       -->  *
        |     \|/     |   \|/       |   \|/           |   \|/
        |   +------+  | +------+    | +------+        | +------+
        |   |      |  | |      |    | |      |        | |      |
 k -------->|  E   |  | |  E   |    | |  E   |        | |  E   |
            |      |  | |      |    | |      |        | |      |
            +------+  | +------+    | +------+        | +------+
               |      |    |        |    |            |    |
               |------     |--------     |  ...   ----     |
               |           |             |                 |
              \|/         \|/           \|/               \|/
             S(0)        S(1)          S(2)   . . .     S(L-1)

Figure 2. f8-mode of operation (asterisk, *, denotes bitwise XOR).

Let E(k,B) be the 128-bit output of E in encrypt mode when applied to
the 128-bit key k and 128-bit plaintext block B. Let IV, IV', S(j),
and m denote 128-bit blocks, determined below.

The S() keystream for an n-bit message is defined by setting IV' =
E(k XOR m, IV), and S(-1) = 00..0. For j = 0,1,.., L-1 where L =
n/128 (rounded up to nearest integer) compute

   S(j) = E(k, IV' XOR j XOR S(j-1)),                          (Eq. 1)

Notice that the IV (as defined in Section 4.2) is not used directly.
Instead it is fed through E under another key to produce an internal,
"salted" value (denoted IV') to prevent an attacker from gaining
known input/output pairs, and the role of the internal counter is to
prevent short keystream cycles. The value of the key mask m is
defined to be

   m = k_s || 0x555..5,

i.e. the salting key, padded with the binary pattern 0101.. to fill
the 128-bit key size. (If no salting key is used, m = 0x55..5.)

The maximum allowable packet size can be determined as follows. The
AES has a block size of 128 bits. Assuming that AES behaves like a
random function, it is (heuristically) secure to generate about 2^64
output blocks, which is sufficient to generate 2^71 bits of
keystream. In practice though, the counter j above will often be
sufficient if implemented as a 16- or 32-bit counter. In fact, for
some security margin, other methods SHOULD be used if packets of size
exceeding 2^32 * 128 = 549755813888 bits are to be encrypted.

6.1.3. NULL Cipher

The NULL cipher is used when no confidentiality is requested. The
keystream can be thought of as "000..0", i.e. the encryption simply
copies the plaintext input into the ciphertext output.

7. Message Authentication

Message integrity and source origin authentication are optional
functions provided by SRTP.

We need to distinguish the case of instant (non-delayed)
authentication from the delayed one. The reason is that some source
origin authentication schemes for groups do not allow packets to be
authenticated until a certain later time.

Note that bit-errors during transmission can in general not be
distinguished from active attacks, nor does the processing below make
attempts to do so.

Authentication SHOULD be provided by SRTP. The fact that
authentication is optional is motivated by the fact that, while the
function is typically highly desired, there are certain cases
(notably in cellular environments) where it has an impact in terms of
cost, as motivated in [BCNN00]. In those cases, it is up to the user
security profile to request authentication.

7.1 Non-delayed Message Authentication

Integrity can be provided by any message authentication code, though
the default value is UMAC [KBHHKR00].

The authentication tag is computed by applying the UMAC function to
the Authenticated Portion of the SRTP packet.

The authentication tag is appended to the RTP packet. This expansion
of the RTP packet may cause the packet size to exceed the Maximum
Transmission Unit (MTU) of a network interface on its path,
especially in circumstances when the application is attempting to
'optimize' the size of packets. MTU path discovery SHOULD be used to
avoid this problem.

7.1.1 Default MAC: UMAC

The default message authentication code is UMAC [KBHHKR00], which has
proven security properties and is quite fast.
Furthermore, it can be used with short (e.g., two or four byte)
authentication tags, as well as larger tags.

UMAC is a parameterized algorithm (see Section 2.1 of [KBHHKR00]).
The default selection of UMAC is UMAC-2/4/128/16/BIG/SIGNED, whose
parameters are:

   WORD-LEN                 2
   UMAC-OUTPUT-LEN          4
   L1-KEY-LEN             128
   UMAC-KEY-LEN            16
   ENDIAN-FAVORITE        BIG
   L1-OPERATIONS-SIGN  SIGNED

This choice of parameters is intended to work well on low-power
processors, to minimize packet expansion (e.g. needed in
voice-over-IP type of applications), and to minimize the size of the
cryptographic context. The WORD-LEN of two will work well on 16-bit
and higher processors. The packet expansion is determined by the
UMAC-OUTPUT-LEN to be only four bytes. The storage requirement, per
cryptographic context, is 144 bytes. These parameters ensure a
forgery probability of no greater than 1/2^30 for each individual
packet. Please see the security considerations section in [KBHHKR00]
and the references therein for a more detailed discussion.

7.2 Delayed Message Authentication

Some authentication schemes, in particular ones providing source
origin authentication in groups, may not allow verification of the
authenticity of a received packet until a few moments later; the
authentication is delayed.

This leaves open possibilities for handling detected authentication
failures according to several policies. Clearly, detected failures
MUST be signaled to the application, but how to handle them is left
to the application.

Currently the only authentication of this type specified for use with
SRTP is TESLA.

7.2.1 TESLA

This primitive enables receivers in a group scenario to efficiently
(with symmetric cryptography) verify the identity of a claimed
sender, though the entire group shares the key(s).
Source Origin Authentication (SOA) is provided by an interface
towards TESLA. We refer to [PCBTS,TESLA] for details.

The main issues with TESLA are that it requires 'loose' time
synchronization between sender/receiver and buffering both at sender
and receiver end. (The sender-buffering version actually allows
non-delayed authentication at the receiver.)

The buffering means that it is not possible to have true real-time
communication. Moreover, the buffer size depends on the network
round-trip time (RTT). The buffers required will in general need to
store the packets received during a time interval roughly
proportional to the RTT.

7.3 Compound Authentication Tag

There may be cases when several authentication schemes are used
together, for instance both a non-delayed message authentication code
and a delayed SOA in a group scenario. If so, the sender simply
concatenates the authentication tag information of each scheme into
the (compound) authentication tag according to previous agreement
between the parties. Similarly, the receiver parses the compound tag,
and the overall authentication MUST be signaled as 'AUTHENTICATION
FAILURE' if, and only if, at least one of the individual
verifications fails.

8. SRTP Parameters

The SRTP-WINDOW-SIZE is defined to be at least 64 (Section 5).

The currently defined modes are Segmented Integer Counter Mode
(default), f8 Mode (Section 6), and the NULL Cipher. The default
cipher is AES (Section 6), used with a block- and encryption key size
of 128 bits.

The currently defined authentication function is
UMAC-2/4/128/16/BIG/SIGNED.

The SPI field is not used by default.

9. Secure RTCP

Secure RTCP follows the definition of Secure RTP, but defines the
index and IV differently. In order to differentiate these quantities,
we refer to them as the SRTCP index and IV.

SRTCP is defined as a profile of RTCP, and it adds two mandatory new
fields to the RTCP packet definition, the SRTCP index and the
authentication tag, and one optional new field, the SPI. Those fields
are appended to an RTCP packet in order to form an equivalent SRTCP
packet, so that they follow any other profile specific extensions. An
SRTCP packet is illustrated in Figure 3.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-->+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |V=2|P|    RC   |   PT=SR=200   |             length            |
|   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   |                         SSRC of sender                        |
| +>+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| | |                              ...                              |
| | |                          sender info                          |
| | |                              ...                              |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                              ...                              |
| | |                         report block 1                        |
| | |                              ...                              |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                              ...                              |
| | |                         report block 2                        |
| | |                              ...                              |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                                                               |
| | |                              ...                              |
| | |                                                               |
| | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| | |                              ...                              |
| | |                  profile-specific extensions                  |
| | |                              ...                              |
| +>+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
| | |                          SRTCP index                          |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                         SPI (optional)                        |
+-|>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | |                              ...                              |
| | |                       authentication tag                      |
| | |                              ...                              |
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| +-- Encrypted Portion
+---- Authenticated Portion

Figure 3. The format of a Secure RTCP packet, after Section 6.3.1 of
[SCFJ96].
In this case, the underlying RTCP packet is a sender report packet;
the SRTCP format is identical for other RTCP packet types.

The added fields are:

SRTCP index: 32 bits, mandatory
   As we allow both encrypted and non-encrypted packets belonging
   to the same flow (see discussion below), indices with their
   most significant bit set to '1' are reserved for encrypted
   packets, and indices with most significant bit set to '0' are
   used for non-encrypted packets. With this restriction, the
   remaining bits are set to zero before the first SRTCP packet
   is sent, and the index is incremented by one after each SRTCP
   packet is sent. Except for differences in the most significant
   bit, SRTCP indices form a strictly increasing sequence. The
   index is explicitly included in each packet, in contrast to the
   'implicit' index approach used for SRTP.

Security Parameter Index (SPI): 32 bits, optional
   The SPI is used to determine the cryptographic context for the
   current packet. Use of authentication and the SPI field is
   determined during session establishment.

Authentication Tag: variable length, mandatory
   The authentication tag shall be used to carry message
   integrity/source authentication data. The Authenticated
   Portion of an SRTCP packet consists of the entire equivalent
   RTCP packet, SRTCP index, and SPI when present.

The Encrypted Portion of an SRTCP packet consists of the RTCP payload
of the equivalent RTCP packet.

SRTCP packet processing is identical to that of SRTP packet
processing, with the following changes:

* SRTCP replay protection is as defined in Section 5, but using the
SRTCP index as the index i.

* SRTCP encryption is as defined in Section 6, but using the
definition of the SRTCP Encrypted Portion as defined in this section,
using the SRTCP index as the index i, and the IV as defined in this
section.

* The SRTCP authentication tag is defined as in Section 7, but with
the Authenticated Portion of the SRTCP packet defined in this
section, and using the SRTCP index as the index i. SRTCP
authentication is mandatory.

* SRTCP decryption is performed as in Section 6, but only if the
SRTCP index has its most significant bit equal to 1. If so, the
encrypted portion is decrypted, using the SRTCP index as the index i,
and the IV as defined in this section. In case the most significant
bit of the index is 0, the payload is simply copied.

The IV for ciphers using 128-bit block size is formed in the
following way:

   IV = SRTCP index || FLAG || PT || 0..0 || SSRC

where PT (Payload Type, 8 bits), and SSRC (Synchronization Source, 32
bits) are taken from the first header in the RTCP compound packet.
SRTCP index is the added 32-bit index to the packet. A pad of 48
zeros is inserted between the PT and the SSRC.

FLAG is an 8-bit value which is used to signal additional
information. Currently, the only value defined (for RTCP) is FLAG =
00..01. The value 0..0 is reserved for RTP and MUST NOT be used for
RTCP. This allows the same key to be used for related RTP and RTCP
flows, since the IV remains unique.

Then this IV is treated in the same way as defined in Section 6,
according to the chosen encryption mode.

The encryption prefix (Section 6.1 of [SCFJ96]), which is a random
32-bit quantity intended to improve privacy, SHOULD NOT be used. This
is because SRTP encryption uses an additive stream cipher, and thus
the prefix offers no benefit.

The maximum number of SRTCP packets with a fixed key is limited to
2^31 = 2,147,483,648. The last RTCP packet MUST contain an RTCP BYE.
SRTCP senders MUST send an RTCP BYE in the final packet, if the
maximum number of SRTCP packets is reached.
Similarly, SRTCP receivers MUST act as though the last RTCP packet
included a BYE, even if no BYE was included in the packet, if the
maximum number of SRTCP packets is reached for a fixed key.

Authentication MUST be required for RTCP, since it is the control
protocol (e.g., it has a BYE packet). Moreover, the cost of RTCP
authentication is not of the same order as that of RTP
authentication, since the share of session bandwidth allocated to
RTCP is recommended to be 5%. However, when adding authentication to
RTCP, the overhead in bandwidth SHOULD be considered (it will be more
than 5%).

It is allowed to split a compound RTCP packet into two lower-layer
packets, one to be encrypted and one to be sent in the clear, as
described in Section 9.1 of [SCFJ96]. Encryption/non-encryption is
signaled by the most significant bit of the SRTCP index as described
above.

10. Rationale

SRTP achieves high throughput and low packet expansion by using fast
stream ciphers for encryption, an implicit index for synchronization,
and universal hash functions for message authentication. SRTP proves
to be a suitable choice for the most general scenario, and also fits
the most demanding one, conversational multimedia over wireless, as
it has the necessary robustness properties.

Only a single header extension may be appended to the RTP data
header, so the use of a header extension for SRTP was avoided. SRTP
and SRTCP are defined as profiles of RTP and RTCP, respectively.

10.1 Synchronization

RTP typically runs over unreliable transport. Thus, maintaining
synchronization of the cryptographic context between the sender and
the receiver is a conspicuous challenge. Because of the requirement
to minimize packet expansion, no explicit sequencing information
should be added. RTP packets contain two fields for synchronization
purposes, the timestamp and the sequence number.
The timestamp field could be used for cryptographic synchronization
in some circumstances. However, this field is not appropriate for
such use. From [SCFJ96]:

   Several consecutive RTP packets may have equal timestamps if they
   are (logically) generated at once, e.g., belong to the same video
   frame. Consecutive RTP packets may contain timestamps that are not
   monotonic if the data is not transmitted in the order it was
   sampled, as in the case of MPEG interpolated video frames.

The RTP sequence number might be directly used as a unique identifier
for SRTP packets. However, it has only 16 bits, which would limit the
duration of an SRTP security association to only 65,536 packets,
therefore requiring relatively frequent re-keying.

The 'implicit index' approach works as long as the reorder and loss
of the packets is not too great. In particular, 32,768 packets would
need to be lost, or a packet would need to be 32,768 packets out of
sequence in order for synchronization to be lost. Such drastic loss
or reorder is likely to disrupt the RTP application itself.

When a participant joins an SRTP session while that session is in
progress, the entire cryptographic context except for the replay list
is sent to that participant. See also Section 12.

10.2 Replay Protection

Replay protection is undoubtedly important for multimedia data, and
SHOULD be provided. Otherwise, it would be possible for an adversary
to perform simple manipulations on data that subverted security. For
example, in a voice application, the phrase "yes" could be
substituted for "no" if replay protection were not present. However,
there are certain scenarios, e.g. conversational multimedia, where it
may be difficult to perform this kind of attack.
Moreover, to be useful, replay protection needs to be based on an
authentication mechanism (i.e., authentication of the sequence number
of the RTP header), and this has a cost when cellular links are
involved on the path.

10.3 Source Origin Authentication considerations

Normally, SOA can be done using signatures. However, this has a high
impact in terms of bandwidth and processing time, therefore we do not
consider signatures in the discussion.

The presence of mixers and translators does not allow source
authentication in case the RTP payload and/or the RTP header are
manipulated. Note that this type of intermediate entity also disrupts
end-to-end confidentiality (since the IV formation depends, e.g., on
the preservation of the RTP header).

Examples of the mixer and translator scenarios include a translator
re-encoding data at a lower rate or in a different encoding, and a
mixer combining the audio streams of multiple speakers in a
teleconference. In these cases, it is not clear that meaningful
source origin authentication is possible, as the data that is
received is not the same as the data that is signed/authenticated. If
the translator is trusted by the receivers, then it could sign or
re-sign the data streams, but this scenario may not be prevalent. It
may be possible to devise a signing scheme that authenticates the
source but not the content (enabling the receivers to know that "John
is one of the people talking", but not providing authentication on
who said what) by signing the concatenation of the Contributing
source (CSRC) field and some sequencing information (e.g., a
timestamp or sequence number), but such schemes require
synchronization between the senders. This synchronization is not
required by the RTP protocol itself, and may be difficult or
impossible to arrange.

A scheme, namely TESLA [PCBTS], has been recently developed to
provide a secure sender authentication mechanism for multicast or
broadcast data streams based mainly on symmetric techniques, see
Section 7.

10.4 Choice of Encryption Transform

When adopting a block cipher mode to produce keystreams, the central
ingredient is the block cipher which is its core. As far as modern
cryptology knows, the security basically stands (and falls) with the
security of the block cipher. This means that if a weakness is found,
replacing the block cipher with a new one will most likely remedy the
security problems. We define AES (Rijndael) [AES] as default block
cipher, as it is widely believed to be secure.

11. Security Considerations

The security of UMAC is well understood, and is described in
[KBHHKR00].

Additive ciphers do not provide any security service other than
privacy. In particular, they do not provide message authentication
(see [RK99] or [S96] for a discussion of this security service).
However, SRTP uses a message authentication code to provide that
security service.

By using 'seekable' stream ciphers, SRTP avoids the denial of service
attacks that are possible on stream ciphers that lack this property
(these attacks are described in Section 3.4 of [B96]).

No bit of keystream in an additive stream cipher should ever be used
to encrypt multiple distinct plaintext bits. Such keystream reuse
(jokingly called a 'two-time pad' system by cryptographers), can
seriously compromise security. The NSA's VENONA project [C99]
provides a historical example of such a compromise. In SRTP, a
'two-time pad' is avoided by requiring the key or the IV to be
unique.

An SSRC and transport addresses are mapped to a unique crypto context
(with additional information in case re-keying/key refresh are in
place).
Multiple crypto contexts may contain identical keys; in this case,
each context together with data from the RTP header MUST produce a
unique IV (which is typically assured by plugging the unique SSRC in
the IV).

If manual keying is used, two different cryptographic contexts might
accidentally use the same encryption key with non-negligible
probability, through manual error or procedural inadequacies. Thus,
manual keying SHOULD NOT be used for SRTP (or SRTCP).

An additive stream cipher is vulnerable to attacks that use
statistical knowledge about the plaintext source to enable key
collision and time-memory tradeoff attacks [MF00,H80,Bi96]. These
attacks take advantage of commonalities among plaintexts, and provide
a way for a cryptanalyst to amortize the computational effort of
decryption over many keys, thus reducing the effective key size of
the cipher. A detailed analysis of these attacks and their
applicability to the encryption of Internet traffic is provided in
[MF00]. In summary, the effective key size of SRTP when used in a
security system in which m distinct keys are used, is equal to the
key size of the cipher less the logarithm (base two) of m. Protection
against such attacks can be provided simply by increasing the size of
the keys used, which here can be accomplished by the use of the
"salting key".

In order to provide an effective key size of n bits in a deployment
in which 2^m SRTP/SRTCP cryptographic contexts will be created, the
true key size will need to be n+m bits. The value of m SHOULD be 32
bits for networks with 50,000 connections (fully meshed networks with
up to 200 devices), and SHOULD be 64 bits for networks with 49e+12
connections (fully meshed networks with up to 7,000,000 devices).
These choices of m ensure that key collision attacks amortized over
a ten year period offer no advantage over exhaustive search, when new
SRTP keys are established for every connection every hour (note that
such an attack requires the storage of all network traffic over the
ten year period). These choices will suffice for many networks,
though SRTP deployments with more stringent security requirements
will need to make a detailed assessment of those requirements with
respect to the attacks described in [MF00].

Implementations SHOULD use keys that are as large as possible. Please
note that in many cases increasing the key size of a cipher does not
affect the throughput of that cipher.

It is an important point that the m bits of 'extra' key provided to
thwart these attacks need not be private. In jurisdictions with
mandated limits on the length of a secret key, the additional key
bits could be made public. This is because those bits are
functionally equivalent to the 'salt' that is used to protect
passwords from dictionary attacks. The fact that the 'extra' key bits
are distinct for many different keys defeats the key collision and
time-memory tradeoff attacks by reducing the number of keys over
which cryptanalytic computation can be amortized.

Note that other security protocols which use additive ciphers for the
encryption of Internet traffic (e.g., SSL, TLS, SSH, IPsec) are also
vulnerable to the attacks described in [MF00]. Those attacks are
generic to additive encryption of redundant plaintext, and are not
particular to SRTP.

11.1 SSRC collision

Assume that two or more communication parties use the same key.
Though RTP implements an SSRC collision detection mechanism, it is
impossible to guarantee that two parties do not accidentally choose
the same SSRC and send a few packets before the collision is
detected.
In a very unfortunate case, the IV formation in Section 4.2 could in
fact make the keystreams collide, resulting in a 'two-time pad'. This
is probably a bigger problem in the case of group communication when
a single group key is desired. See also some administrative issues
with SSRC collisions in Section 12.

11.2. Confidentiality of the RTP Payload

It is important to be aware that, as with any stream cipher, the
exact length of the payload is revealed by the encryption. This means
that it may be possible to deduce certain "formatting bits" of the
payload, as the length of the codec output might vary due to certain
parameter settings etc. This, in turn, implies that the corresponding
bit of the keystream can be deduced. However, if the stream cipher is
secure, knowledge of a few bits of the keystream will not aid an
attacker in predicting the following keystream bits. Thus, the
payload length (and information deducible from this) will leak, but
nothing else.

11.3. Confidentiality of the RTP Header

With the described proposal, RTP headers are sent in the clear to
allow for header compression. This means that data such as payload
type, synchronization source identifier, and timestamp are available
to an eavesdropper. Moreover, since RTP allows for future extensions
of headers, we cannot foresee what kind of possibly sensitive
information might also be "leaked".

The described proposal is a low-cost method, which allows header
compression to reduce bandwidth. It is up to the endpoints' policies
to decide about the security scheme to employ. If the header
compression is omitted, other solutions might be applicable, e.g.
[sc-esp]. In other words, we provide a solution that works in the
most general scenario, even in the most demanding one (like
conversational multimedia over low-bandwidth, unreliable media).
Of course the solution will then also work in less restricted
environments, but we suggest that if one really needs to protect
headers, and is allowed to do so by the surrounding environment, then
one should also look at alternatives. In addition, we strongly
recommend the use of profiles to select the right trade-off for the
required level of security.

11.4 Integrity of the RTP header

The IV formation in Section 4.2, which depends on the RTP header,
provides an 'implicit' authentication of that header, which is useful
when the authentication option is not present. This is because any
attack which modifies the header of such a packet will cause the SRTP
receiver to use an incorrect IV in the decryption step, with the
result that the decrypted RTP payload will be essentially random.

12. Multicast and Many-to-many

The scheme described here can also be used in case a single, unique
set of keys is shared by all the media sessions belonging to the same
multimedia session, for low-complexity key management. However, in
this case there must be a way to assure that each SSRC is unique also
among all the RTP sessions inside that multimedia session, to avoid
unlucky IV combinations that end up in a two-time pad. This is a
light and feasible solution in several scenarios, e.g. one sender
only, streaming, and unicast.

Some special considerations arise when the SSRC is part of the
identifier for the correct cryptographic context. In multicast and in
many-to-many scenarios, to use the same group key for the multimedia
session and the IV formation suggested in Section 4.2, there MUST be
a way to guarantee uniqueness of the SSRC before starting sending.
   Otherwise, the triggering of the anti-collision mechanism will ask
   for a change in the SSRCs of the parties that happened to have the
   same SSRC, e.g., causing trouble in pointing to the right context.

   The problem remains of how to address the context after the
   anti-collision algorithm has changed the SSRCs. Section 3.3 defines
   the use of the SSRC and Transport Addresses of a packet as selectors
   to the database. In case of UDP, the unchanged transport addresses can
   be a good indicator that a collision, followed by anti-collision
   triggering, has happened. So, simply try decryption until an RTCP
   message confirms the change in the SSRC on those transport addresses
   and then update the database selectors.

   If the requirement of unique SSRC inside that multimedia session
   cannot be guaranteed (e.g., for large groups), then a unique key per
   sender might be used. The requirement then becomes to have the SSRC
   unique per sender, which appears to be feasible enough. However, the
   same consideration on the anti-collision algorithm triggering
   applies.

13. Key Management Considerations

13.1. Security Parameters

   SRTP is a security protocol, and it is decoupled from key management.
   There is work done in the IETF to define key management schemes, e.g.,
   IPSEC WG, MSEC WG, TLS, etc.

   The key management scheme has to provide SRTP with the initial
   security parameters for the cryptographic context: correct encryption
   and salting keys, mode of operation and cipher, authentication
   algorithm(s) and key(s), (source origin) authentication algorithms
   and parameters, and to maintain the mapping of context identifiers
   (SSRC, addresses, SPI, etc.) to the actual context.

   The initial value for the ROC must also be agreed upon (0 is
   default). s_l is initially 0, and the replay list is initially empty.
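
   The initial context state and the selector-based lookup described
   above could be sketched roughly as follows. This is an illustration
   only: the class name CryptoContext, the dictionary used as the
   database, and the helper functions install, lookup and rebind_ssrc
   are hypothetical and are not defined by this memo. Only the initial
   values (ROC 0 by default, s_l 0, empty replay list) and the selectors
   (SSRC and transport address) are taken from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

# Hypothetical selector: (SSRC, (IP address, UDP port)).
Address = Tuple[str, int]
Selector = Tuple[int, Address]


@dataclass
class CryptoContext:
    """Illustrative container for parameters supplied by key management."""
    encryption_key: bytes
    salting_key: bytes
    cipher_mode: str          # e.g. "CM_AES" or "f8_AES"
    auth_algorithm: str
    auth_key: bytes
    roc: int = 0              # rollover counter, 0 unless agreed otherwise
    s_l: int = 0              # initially 0
    replay_list: Set[int] = field(default_factory=set)  # initially empty


# Assumed in-memory database mapping selectors to contexts.
database: Dict[Selector, CryptoContext] = {}


def install(ssrc: int, addr: Address, ctx: CryptoContext) -> None:
    database[(ssrc, addr)] = ctx


def lookup(ssrc: int, addr: Address) -> Optional[CryptoContext]:
    return database.get((ssrc, addr))


def rebind_ssrc(old_ssrc: int, new_ssrc: int, addr: Address) -> None:
    """Update the selectors once RTCP has confirmed the SSRC change."""
    database[(new_ssrc, addr)] = database.pop((old_ssrc, addr))
```

   In this sketch the transport address stays fixed while the SSRC part
   of the selector is replaced, which mirrors the observation that
   unchanged transport addresses indicate an anti-collision event.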

   When a newcomer joins an already existing group, all the
   cryptographic context except the replay list MUST be passed to the
   newcomer, unless backward security (preventing disclosure of previous
   communication to the newcomer) is wanted, in which case (and if the
   key management supports backward security) re-keying is triggered in
   a way to ensure it.

   Key refresh SHOULD be supported, using RTP sequence number and ROC as
   a basis for refresh.

   A re-keying mechanism SHOULD be supported, both to allow flexible
   access control to media, and also to enable long sessions that would
   otherwise force the cryptographic core into degeneration by
   'exhausting' the key(s).

13.2. SDP Attribute Support

   SRTP is defined as an RTP profile, and, as such, its use has to be
   signaled inside the Session Description Protocol (SDP) [SDP], when
   SDP is used to carry the description of the media sessions.

   An example of the profile's announcement is the following:

   m=audio 5004 RTP/SAVP 9

   SAVP indicates the use of the SRTP and AVP profiles.

   If the SRTP profile is to be applied (as announced in the "m="
   line), then the necessary security parameters follow in a
   corresponding attribute:

   a=x-kxg-sec:SRTP

   where

   = "null" | "CM_AES" | "f8_AES" | ..
   = "null" | "SRTP UMAC" | "TESLA"..

   is an identifier used to select an encryption scheme. A set
   of standard encryption schemes must be defined and assigned a number
   each. Defined values are "null", "CM_AES", and "f8_AES". "CM_AES" is
   the default value.

   is an identifier used to select an authentication scheme.
   Defined values are "null" and "SRTP UMAC". SRTP_UMAC is defined as
   UMAC-2/4/128/16/BIG/SIGNED, see also [SRTP]. The default value is
   "SRTP_UMAC".

   The is the base64-encoded salting key. This key may be
   in clear text.
   If it needs to be protected, it is recommended that
   the master key is extended so that the salting key can be derived
   from the extra bits.

   Moreover, in case of dynamic groups, where members may join/leave, it
   is necessary to pass the rollover counter. The SPI has to be agreed
   on.

   Using the IV formation suggested in Section 4.2, the same encryption
   key is used for securing RTP and related RTCP streams. The same
   authentication key MAY be used for RTP and related RTCP streams.

14. Acknowledgements

   The authors would like to thank Magnus Westerlund, Mark Baugher,
   Brian Weis, and Adrian Perrig for their reviews and comments.

15. Authors' Addresses

   Questions and comments about this memo can be directed to:

   David A. McGrew
   David Oran
   Cisco Systems, Inc.
   San Jose, CA 95134-1706 USA
   mcgrew@cisco.com, oran@cisco.com

   Rolf Blom
   Elisabetta Carrara
   Mats Naslund
   Karl Norrman
   Ericsson Research
   {rolf.blom, elisabetta.carrara, mats.naslund,
   karl.norrman}@era.ericsson.se

16. References

   [AES] NIST, "Advanced Encryption Standard (AES)",
   http://csrc.nist.gov/encryption/aes/

   [B97] Bradner, S., "Key words for use in RFCs to Indicate
   Requirement Levels", RFC 2119, March 1997.

   [BCNN00] Blom, R., Carrara, E., Naslund, M., and Norrman, K.,
   "Conversational Multimedia Security in 3G Networks", Internet Draft,
   November 2000, .

   [BF00] Boneh, D., and Franklin, M., "Message Authentication in a
   Multicast Environment", the Proceedings of the Seventh Annual
   Workshop on Selected Areas in Cryptography (SAC 2000),
   Springer-Verlag.

   [C99] Crowell, W. P., "Introduction to the VENONA Project",
   http://www.nsa.gov:8080/docs/venona/index.html.

   [ES3D] ETSI SAGE 3GPP Standard Algorithms Task Force, "Security
   Algorithms Group of Experts (SAGE); General Report on the Design,
   Specification and Evaluation of 3GPP Standard Confidentiality and
   Integrity Algorithms", Public report, Draft Version 1.0, Dec 1999.

   [ES3E] ETSI SAGE 3GPP Standard Algorithms Task Force, "Security
   Algorithms Group of Experts (SAGE) Report on the Evaluation of 3GPP
   Standard Confidentiality and Integrity Algorithms", Public report,
   Draft Version 1.0, Dec 1999.

   [HAC] Menezes, A., Van Oorschot, P., and Vanstone, S., "Handbook of
   Applied Cryptography", CRC Press, 1997, ISBN 0-8493-8523-7.

   [H80] Hellman, M. E., "A cryptanalytic time-memory trade-off", IEEE
   Transactions on Information Theory, July 1980, pp. 401-406.

   [KA98a] Kent, S., and Atkinson, R., "Security Architecture for IP",
   RFC 2401, November 1998.

   [KBHHKR00] Krovetz, T., Black, J., Halevi, S., Hevia, A., Krawczyk,
   H., Rogaway, P., "UMAC: Message Authentication Code using Universal
   Hashing", Internet Draft, October 2000, .

   [LRW00] Lipmaa, H., Rogaway, P., and Wagner, D., "Comments to NIST
   Concerning AES Modes of Operation: CTR-Mode Encryption", NIST
   Workshop on AES Modes of Operation,
   http://csrc.nist.gov/encryption/aes/modes/lipmaa-ctr.pdf

   [M00] McGrew, D., "Segmented Integer Counter Mode: Specification
   and Rationale", NIST Workshop on AES Modes of Operation,
   http://www.mindspring.com/~dmcgrew/sic-mode.pdf

   [MF00] McGrew, D., and Fluhrer, S., "Attacks on Encryption of
   Redundant Plaintext and Implications on Internet Security", the
   Proceedings of the Seventh Annual Workshop on Selected Areas in
   Cryptography (SAC 2000), Springer-Verlag.

   [MF00b] McGrew, D., and Fluhrer, S., "The Stream Cipher LEVIATHAN:
   Specification and Supporting Documentation", Submission to the New
   European Schemes for Signatures, Integrity, and Encryption (NESSIE)
   Process, October 2000, http://www.cryptonessie.org/.

   [R92] Rueppel, R., "Stream Ciphers", Chapter 2 of Simmons, G.,
   "Contemporary Cryptology: the Science of Information Integrity,"
   1992, IEEE Press.

   [RC94] Rogaway, P. and Coppersmith, D., "A Software-Optimized
   Encryption Algorithm", Proceedings of the 1994 Fast Software
   Encryption Workshop, Lecture Notes In Computer Science, Volume 809,
   Springer-Verlag, 1994, pp. 56-63.

   [RC98] Rogaway, P. and Coppersmith, D., "A Software-Optimized
   Encryption Algorithm", Journal of Cryptology, Volume 11, Number 4,
   Springer-Verlag, 1998, Pages 273-287. Also available on the Internet
   at http://www.cs.ucdavis.edu/~rogaway/papers/seal-abstract.html.

   [RK99] Rescorla, E., and Korver, B., "Guidelines for Writing RFC
   Text on Security Considerations," draft-rescorla-sec-cons-00.txt

   [S96] Schneier, B., "Applied Cryptography: Protocols, Algorithms,
   and Source Code in C", Wiley, 1996.

   [sc-esp] McGrew, D., Fluhrer, S., Peyravian, M., "The Stream Cipher
   Encapsulating Security Payload", Internet Draft, July 2000

   [SCFJ96] Schulzrinne, H., Casner, S., Frederick, R., Jacobson, V.,
   "RTP: A Transport Protocol for Real-Time Applications", IETF Request
   For Comments RFC 1889.

   [TESLA] Perrig, A., Canetti, R., Briscoe, B., Tygar, D., Song, D.,
   "TESLA: Multicast Source Origin Transform",
   draft-irtf-smug.tesla-00.txt

Appendix A

Test vectors

   We include in the following some test vectors for f8-AES.

   key:
   234829008467be186c3de14aae72d62c

   salting key || 0x555...
   :
   32f2870d555555555555555555555555

   AES-internal expanded key:
   23482900 8467be18 6c3de14a ae72d62c
   62be58e4 e6d9e6fc 8ae407b6 2496d19a
   f080e0d2 1659062e 9cbd0198 b82bd002
   05f097be 13a99190 8f149008 373f400a
   78f9f024 6b5061b4 e444f1bc d37bb1b6
   4931be42 2261dff6 c6252e4a 155e9ffc
   31ea0e1b 138bd1ed d5aeffa7 c0f0605b
   fd3a37a1 eeb1e64c 3b1f19eb fbef79b0
   a28cd0ae 4c3d36e2 77222f09 8ccd56b9
   043d86ca 4800b028 3f229f21 b3efc998
   ede0c0a7 a5e0708f 9ac2efae 292d2636

   AES-internal expanded salting key || 555...:
   32f2870d 55555555 55555555 55555555
   cf0e7bf1 9a5b2ea4 cf0e7bf1 9a5b2ea4
   f43f3249 6e641ced a16a671c 3b3149b8
   37045eab 59604246 f80a255a c33b6ce2
   dd54c685 843484c3 7c3ea199 bf05cd7b
   a6e9e78d 22dd634e 5ee3c2d7 e1e60fac
   089f7675 2a42153b 74a1d7ec 9547d840
   e8fe7f5f c2bc6a64 b61dbd88 235a65c8
   d6b39779 140ffd1d a2124095 8148255d
   9f8cdb75 8b832668 299166fd a8d943a0
   9c963bb7 17151ddf 3e847b22 965d3882

   RTP-packet header fields:
   version = 2
   padding = 0
   extension = 0
   CSRC count = 0
   marker bit = 0
   payload type = 6e
   sequence no.
   = 5cba
   timestamp = 50681de5
   SSRC = 5c621599

   Data from Cryptographic context:
   FLAG = 0
   Rollover counter = d462564a

   IV:
   d462564a006e5cba50681de55c621599

   IV':
   4fee844eedb458a3e2b0c7ed43888cc1

   Encryption of bits 0 to 127:

   j: 0
   S(-1)                   : 00000000000000000000000000000000
   S(-1) XOR IV'           : 4fee844eedb458a3e2b0c7ed43888cc1
   S(-1) XOR IV' XOR ct    : 4fee844eedb458a3e2b0c7ed43888cc1
   plain text P[0..127]    : 6e915f07cd6f1c0d44afaab4961c7d31
   final keystream S(0)    : b2d3b3d7e16092de379e33b350582e63
   cipher text C[0..127]   : dc42ecd02c0f8ed373319907c6445352

   Encryption of bits 128 to 255:

   j: 1
   S(0)                    : b2d3b3d7e16092de379e33b350582e63
   S(0) XOR IV'            : fd3d37990cd4ca7dd52ef45e13d0a2a2
   S(0) XOR IV' XOR ct     : fd3d37990cd4ca7dd52ef45e13d0a2a3
   plain text P[128..255]  : 7b9daad84352a6d4bcdf501a560832a0
   final keystream S(1)    : b1ce287dc53c1975de3d7d0500f780ba
   cipher text C[128..255] : ca5382a5866ebfa162e22d1f56ffb21a

   ------------------------------------------------------------

   This Internet-Draft expires in December 2001.
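
   The internal consistency of the Appendix A test vectors can be
   checked with a short script such as the following. It is a sketch
   under stated assumptions: the field order used to build the IV
   (ROC, FLAG, M and PT, SEQ, TS, SSRC) is inferred from the IV value
   printed in the appendix, "ct" is assumed to be the block counter j,
   and the helper names xor and feedback_input are hypothetical. No AES
   operation is performed, so IV' and the keystream blocks are taken
   from the appendix as given rather than recomputed.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# RTP header fields and context data from Appendix A.
roc, flag = 0xD462564A, 0x00
marker, payload_type = 0, 0x6E
seq, timestamp, ssrc = 0x5CBA, 0x50681DE5, 0x5C621599

# Assumed layout: ROC || FLAG || M,PT || SEQ || TS || SSRC (128 bits).
iv = (
    roc.to_bytes(4, "big")
    + bytes([flag, (marker << 7) | payload_type])
    + seq.to_bytes(2, "big")
    + timestamp.to_bytes(4, "big")
    + ssrc.to_bytes(4, "big")
)

# Values copied from the appendix (not recomputed here).
iv_prime = bytes.fromhex("4fee844eedb458a3e2b0c7ed43888cc1")
keystream = [
    bytes.fromhex("b2d3b3d7e16092de379e33b350582e63"),  # S(0)
    bytes.fromhex("b1ce287dc53c1975de3d7d0500f780ba"),  # S(1)
]
plaintext = [
    bytes.fromhex("6e915f07cd6f1c0d44afaab4961c7d31"),
    bytes.fromhex("7b9daad84352a6d4bcdf501a560832a0"),
]
ciphertext = [
    bytes.fromhex("dc42ecd02c0f8ed373319907c6445352"),
    bytes.fromhex("ca5382a5866ebfa162e22d1f56ffb21a"),
]


def feedback_input(prev_s: bytes, j: int) -> bytes:
    """Value listed as 'S(j-1) XOR IV' XOR ct' for block j."""
    return xor(xor(prev_s, iv_prime), j.to_bytes(16, "big"))


print("IV:", iv.hex())
for j in range(2):
    print("block", j, "ciphertext matches:",
          xor(plaintext[j], keystream[j]) == ciphertext[j])
```

   The script only confirms that each cipher text block equals the
   plain text XORed with the listed keystream block, and that the
   listed feedback inputs follow from the previous keystream block.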