idnits 2.17.1 

draft-ietf-sip-media-security-requirements-09.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.i or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

  -- It seems you're using the 'non-IETF stream' Licence Notice instead


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 976 has weird spacing: '...ication  along...'

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer. 
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (January 9, 2009) is 5585 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-07) exists of
     draft-ietf-avt-dtls-srtp-06

  == Outdated reference: A later version (-07) exists of
     draft-ietf-mmusic-media-path-middleboxes-01

  == Outdated reference: A later version (-13) exists of
     draft-ietf-mmusic-sdp-capability-negotiation-09

  == Outdated reference: A later version (-15) exists of
     draft-ietf-sip-certs-07

  == Outdated reference: A later version (-06) exists of
     draft-mcgrew-srtp-ekt-03

  == Outdated reference: A later version (-22) exists of
     draft-zimmermann-avt-zrtp-11

  -- Obsolete informational reference (is this intentional?): RFC 4474
     (Obsoleted by RFC 8224)

  -- Obsolete informational reference (is this intentional?): RFC 4492
     (Obsoleted by RFC 8422)


     Summary: 1 error (**), 0 flaws (~~), 8 warnings (==), 5 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	SIP Working Group                                           D. Wing, Ed.
3	Internet-Draft                                                     Cisco
4	Intended status: Informational                                  S. Fries
5	Expires: July 13, 2009                                        Siemens AG
6	                                                           H. Tschofenig
7	                                                  Nokia Siemens Networks
8	                                                                F. Audet
9	                                                                  Nortel
10	                                                         January 9, 2009

12	    Requirements and Analysis of Media Security Management Protocols
13	             draft-ietf-sip-media-security-requirements-09

15	Status of this Memo

17	   This Internet-Draft is submitted to IETF in full conformance with the
18	   provisions of BCP 78 and BCP 79.

20	   Internet-Drafts are working documents of the Internet Engineering
21	   Task Force (IETF), its areas, and its working groups.  Note that
22	   other groups may also distribute working documents as Internet-
23	   Drafts.

25	   Internet-Drafts are draft documents valid for a maximum of six months
26	   and may be updated, replaced, or obsoleted by other documents at any
27	   time.  It is inappropriate to use Internet-Drafts as reference
28	   material or to cite them other than as "work in progress."

30	   The list of current Internet-Drafts can be accessed at
31	   http://www.ietf.org/ietf/1id-abstracts.txt.

33	   The list of Internet-Draft Shadow Directories can be accessed at
34	   http://www.ietf.org/shadow.html.

36	   This Internet-Draft will expire on July 13, 2009.

38	Copyright Notice

40	   Copyright (c) 2009 IETF Trust and the persons identified as the
41	   document authors.  All rights reserved.

43	   This document is subject to BCP 78 and the IETF Trust's Legal
44	   Provisions Relating to IETF Documents
45	   (http://trustee.ietf.org/license-info) in effect on the date of
46	   publication of this document.  Please review these documents
47	   carefully, as they describe your rights and restrictions with respect
48	   to this document.

50	Abstract

52	   This document describes requirements for a protocol to negotiate a
53	   security context for SIP-signaled SRTP media.  In addition to the
54	   natural security requirements, this negotiation protocol must
55	   interoperate well with SIP in certain ways.  A number of proposals
56	   have been published and a summary of these proposals is in the
57	   appendix of this document.

59	Table of Contents

61	   1.  Introduction
62	   2.  Terminology
63	   3.  Attack Scenarios
64	   4.  Call Scenarios and Requirements Considerations
65	     4.1.  Clipping Media Before Signaling Answer
66	     4.2.  Retargeting and Forking
67	     4.3.  Recording
68	     4.4.  PSTN gateway
69	     4.5.  Call Setup Performance
70	     4.6.  Transcoding
71	     4.7.  Upgrading to SRTP
72	     4.8.  Interworking with Other Signaling Protocols
73	     4.9.  Certificates
74	   5.  Requirements
75	     5.1.  Key Management Protocol Requirements
76	     5.2.  Security Requirements
77	     5.3.  Requirements Outside of the Key Management Protocol
78	   6.  Security Considerations
79	   7.  IANA Considerations
80	   8.  Acknowledgements
81	   9.  References
82	     9.1.  Normative References
83	     9.2.  Informative References
84	   Appendix A.  Overview and Evaluation of Existing Keying
85	                Mechanisms
86	     A.1.  Signaling Path Keying Techniques
87	       A.1.1.  MIKEY-NULL
88	       A.1.2.  MIKEY-PSK
89	       A.1.3.  MIKEY-RSA
90	       A.1.4.  MIKEY-RSA-R
91	       A.1.5.  MIKEY-DHSIGN
92	       A.1.6.  MIKEY-DHHMAC
93	       A.1.7.  MIKEY-ECIES and MIKEY-ECMQV (MIKEY-ECC)
94	       A.1.8.  Security Descriptions with SIPS
95	       A.1.9.  Security Descriptions with S/MIME
96	       A.1.10. SDP-DH (expired)
97	       A.1.11. MIKEYv2 in SDP (expired)
98	     A.2.  Media Path Keying Technique
99	       A.2.1.  ZRTP
100	     A.3.  Signaling and Media Path Keying Techniques
101	       A.3.1.  EKT
102	       A.3.2.  DTLS-SRTP
103	       A.3.3.  MIKEYv2 Inband (expired)
104	     A.4.  Evaluation Criteria - SIP
105	       A.4.1.  Secure Retargeting and Secure Forking
106	       A.4.2.  Clipping Media Before SDP Answer
107	       A.4.3.  SSRC and ROC
108	     A.5.  Evaluation Criteria - Security
109	       A.5.1.  Distribution and Validation of Persistent Public
110	               Keys and Certificates
111	       A.5.2.  Perfect Forward Secrecy
112	       A.5.3.  Best Effort Encryption
113	       A.5.4.  Upgrading Algorithms
114	   Appendix B.  Out-of-Scope
115	     B.1.  Shared Key Conferencing
116	   Appendix C.  Requirement renumbering in -02
117	   Authors' Addresses

119	1.  Introduction

121	   The work on media security started when the Session Initiation
122	   Protocol (SIP) was still in its infancy.  With the increased SIP
123	   deployment and the availability of new SIP extensions and related
124	   protocols, the need for end-to-end security was re-evaluated.  The
125	   procedure of re-evaluating prior protocol work and design decisions
126	   is not an uncommon strategy and, to some extent, considered necessary
127	   to ensure that the developed protocols indeed meet the previously
128	   envisioned needs for the users on the Internet.

130	   This document summarizes media security requirements, i.e.,
131	   requirements for mechanisms that negotiate security context such as
132	   cryptographic keys and parameters for SRTP.

134	   This document is organized as follows: Section 2 introduces
135	   terminology, Section 3 describes attack scenarios against the
136	   signaling path and media path, Section 4 gives an overview of
137	   possible call scenarios, and Section 5 lists requirements for media
138	   security.  The main part of the document concludes with security
139	   considerations in Section 6, IANA considerations in Section 7, and
140	   acknowledgements in Section 8.  Appendix A lists and compares
141	   available solution proposals.  Within it, Appendix A.4 compares
142	   the different approaches regarding their suitability for the SIP
143	   signaling scenarios described in Appendix A, while Appendix A.5
144	   provides a comparison regarding security aspects.  Appendix B lists
145	   non-goals for this document.

147	2.  Terminology

149	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
150	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
151	   document are to be interpreted as described in [RFC2119], with the
152	   important qualification that, unless otherwise stated, these terms
153	   apply to the design of the media security key management protocol,
154	   not its implementation or application.

156	   Furthermore, the terminology described in SIP ([RFC3261]) regarding
157	   functions and components is used throughout the document.

159	   Additionally, the following items are used in this document:

161	   AOR (Address-of-Record):   A SIP or SIPS URI that points to a domain
162	      with a location service that can map the URI to another URI where
163	      the user might be available.  Typically, the location service is
164	      populated through registrations.  An AOR is frequently thought of
165	      as the "public address" of the user.

167	   SSRC:  The 32-bit value that defines the synchronization source, used
168	      in RTP.  These are generally unique, but collisions can occur.

170	   two-time pad:  The use of the same key and the same keystream to
171	      encrypt different data.  For SRTP, a two-time pad occurs if two
172	      senders are using the same key and the same RTP SSRC value.

174	   Perfect Forward Secrecy (PFS):  The property that disclosure of the
175	      long-term secret keying material that is used to derive an agreed
176	      ephemeral key does not compromise the secrecy of agreed keys from
177	      earlier runs.

179	   active adversary:  An active adversary is able to alter data
180	      communication to affect its operation (see also [RFC4949]).

182	   passive adversary:  A passive adversary is able to learn information
183	      from data communication, but not alter that data communication
184	      (see also [RFC4949]).

186	   signaling path:  The signaling path is the route taken by SIP
187	      signaling messages transmitted between the calling and called user
188	      agents.  This can be either direct signaling between the calling
189	      and called user agents or, more commonly, signaling that traverses
190	      the SIP proxy servers involved in the call setup.

192	   media path:  The media path is the route taken by media packets
193	      exchanged by the endpoints.  In the simplest case, the endpoints
194	      exchange media directly, and the "media path" is defined by a
195	      quartet of IP addresses and TCP/UDP ports, along with an IP route.
196	      In other cases, this path may include RTP relays, mixers,
197	      transcoders, session border controllers, NATs, or media gateways.

199	   Moreover, as this document discusses requirements for media security,
200	   the nomenclature R-XXX is used to mark requirements, where XXX is the
201	   name of the requirement that needs to be met.
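
   The two-time pad defined in the terminology above can be illustrated
   with a minimal sketch.  It assumes a toy keystream generator built
   from SHA-256 in counter mode, which merely stands in for the real
   SRTP keystream; the function names, key, and SSRC values are
   hypothetical.  When two senders share the key and the SSRC, the
   keystream cancels out and the XOR of the two plaintexts leaks.

```python
import hashlib


def toy_keystream(key: bytes, ssrc: int, length: int) -> bytes:
    """Derive a deterministic keystream from (key, SSRC).

    Assumption: SHA-256 in counter mode is a stand-in for the real
    SRTP keystream generator, used here purely for illustration.
    """
    out = b""
    counter = 0
    while len(out) < length:
        block = key + ssrc.to_bytes(4, "big") + counter.to_bytes(4, "big")
        out += hashlib.sha256(block).digest()
        counter += 1
    return out[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, ssrc: int, plaintext: bytes) -> bytes:
    """Stream-cipher style encryption: plaintext XOR keystream."""
    return xor_bytes(plaintext, toy_keystream(key, ssrc, len(plaintext)))


key = b"shared-master-key"      # hypothetical key shared by both senders
ssrc = 0x1234ABCD               # both senders collide on this SSRC
p1 = b"sender one media"
p2 = b"sender two media"

c1 = encrypt(key, ssrc, p1)
c2 = encrypt(key, ssrc, p2)

# Same key and same SSRC: the keystream cancels, leaking p1 XOR p2.
leak = xor_bytes(c1, c2)

# With a distinct SSRC the keystreams differ and nothing cancels.
c2_distinct = encrypt(key, 0x0BADF00D, p2)
```

   An eavesdropper who computes `leak` never needs the key; this is
   why the SSRC uniqueness noted in the SSRC definition matters.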

203	3.  Attack Scenarios

205	   The discussion in this section relates to requirements R-PASS-MEDIA,
206	   R-PASS-SIG, R-ASSOC, R-SIG-MEDIA, R-ACT-ACT, and R-ID-BINDING.

208	   This document classifies adversaries according to their access and
209	   their capabilities.  An adversary might have access:

211	   1.  only to the media path,
212	   2.  only to the signaling path,
214	   3.  to the media path and to the signaling path.

216	   An attacker that is located solely along the signaling path, and
217	   does not have access to media (item 2), is not considered in this
218	   document.

220	   There are two types of adversaries, active and passive.  An active
221	   adversary may need to act on (i.e., alter) the key exchange
222	   information traveling along the media path, along the signaling
223	   path, or both.

225	   Based on their robustness against the adversary capabilities
226	   described above, we can group security mechanisms using the following
227	   labels.  This list is generally ordered from easiest to compromise
228	   (at the top) to more difficult to compromise:

230	    +---------------+---------+--------------------------------------+
231	    | SIP signaling |  media  |             abbreviation             |
232	    +---------------+---------+--------------------------------------+
233	    |      none     | passive |      no-signaling-passive-media      |
234	    |      none     |  active |       no-signaling-active-media      |
235	    |    passive    | passive |    passive-signaling-passive-media   |
236	    |    passive    |  active |    passive-signaling-active-media    |
237	    |     active    | passive |    active-signaling-passive-media    |
238	    |     active    |  active |     active-signaling-active-media    |
239	    |     active    |  active | active-signaling-active-media-detect |
240	    +---------------+---------+--------------------------------------+

242	   no-signaling-passive-media:
243	      Access to only the media path is sufficient to reveal the content
244	      of the media traffic.

246	   passive-signaling-passive-media:
247	      Passive attack on the signaling and passive attack on the media
248	      path is necessary to reveal the content of the media traffic.

250	   passive-signaling-active-media:
251	      Passive attack on the signaling and active attack on the media
252	      path is necessary to reveal the content of the media traffic.

254	   active-signaling-passive-media:
255	      Active attack on the signaling path and passive attack on the
256	      media path is necessary to reveal the content of the media
257	      traffic.

259	   no-signaling-active-media:
260	      Active attack on the media path is sufficient to reveal the
261	      content of the media traffic.

263	   active-signaling-active-media:
264	      Active attack on both the signaling path and the media path is
265	      necessary to reveal the content of the media traffic.

267	   active-signaling-active-media-detect:
268	      Active attack on both signaling and media path is necessary to
269	      reveal the content of the media traffic (as with active-signaling-
270	      active-media), and the attack is detectable by protocol messages
271	      exchanged between the end points.
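
   The classification above can be sketched as a small data structure.
   This is only an illustration: the names `Access`, `AttackClass`,
   `CLASSES`, and `rank` are hypothetical, and the ordering simply
   mirrors the rows of the table (easiest to compromise first).

```python
from enum import IntEnum
from typing import NamedTuple


class Access(IntEnum):
    """Adversary capability needed on one path."""
    NONE = 0
    PASSIVE = 1
    ACTIVE = 2


class AttackClass(NamedTuple):
    signaling: Access          # capability needed on the signaling path
    media: Access              # capability needed on the media path
    detectable: bool = False   # endpoints can detect the attack

    @property
    def label(self) -> str:
        """Build the abbreviation used in the table."""
        sig = "no" if self.signaling is Access.NONE else self.signaling.name.lower()
        name = f"{sig}-signaling-{self.media.name.lower()}-media"
        return name + "-detect" if self.detectable else name


# Same order as the table: easiest to compromise at the top.
CLASSES = [
    AttackClass(Access.NONE, Access.PASSIVE),
    AttackClass(Access.NONE, Access.ACTIVE),
    AttackClass(Access.PASSIVE, Access.PASSIVE),
    AttackClass(Access.PASSIVE, Access.ACTIVE),
    AttackClass(Access.ACTIVE, Access.PASSIVE),
    AttackClass(Access.ACTIVE, Access.ACTIVE),
    AttackClass(Access.ACTIVE, Access.ACTIVE, detectable=True),
]


def rank(label: str) -> int:
    """Position of a label in the table; higher means harder to attack."""
    return [c.label for c in CLASSES].index(label)


labels = [c.label for c in CLASSES]
```

   A mechanism can then be compared against a target class, for
   example by checking that its `rank` is at least that of the class
   a deployment requires.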

273	   For example, unencrypted RTP is vulnerable to no-signaling-passive-
274	   media.

276	   As another example, Security Descriptions [RFC4568], when protected
277	   by TLS (as it is commonly implemented and deployed), belongs in the
278	   passive-signaling-passive-media category since the adversary needs to
279	   learn the Security Descriptions key by seeing the SIP signaling
280	   message at a SIP proxy (assuming that the adversary is in control of
281	   the SIP proxy).  The media traffic can be decrypted using that
282	   learned key.

284	   As another example, DTLS-SRTP falls into the active-signaling-active-
285	   media category when DTLS-SRTP is used with a public-key-based
286	   ciphersuite with self-signed certificates and without SIP Identity
287	   [RFC4474].  An adversary would have to modify the fingerprint that is
288	   sent along the signaling path and subsequently modify the
289	   certificates carried in the DTLS handshake that travel along the
290	   media path.  If DTLS-SRTP is used with both SIP Identity [RFC4474]
291	   and SIP Connected Identity [RFC4916], the RFC4474 signature protects
292	   both the offer and the answer, and such a system would then belong to
293	   the active-signaling-active-media-detect category (provided, of
294	   course, the signaling path to the RFC4474 authenticator and verifier
295	   is secured as per RFC4474 and the RFC4474 authenticator and verifier
296	   are behaving as per RFC4474).
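
   The DTLS-SRTP example above depends on comparing a fingerprint
   carried in signaling with the certificate presented on the media
   path.  The following minimal sketch shows why an attacker must be
   active on both paths.  It assumes a SHA-256 fingerprint rendered as
   colon-separated hex; the byte strings are placeholders for real
   certificates, and the helper names are hypothetical.

```python
import hashlib
import hmac


def fingerprint(cert_bytes: bytes) -> str:
    """Colon-separated uppercase SHA-256 digest of a certificate."""
    digest = hashlib.sha256(cert_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def handshake_accepted(signaled_fp: str, presented_cert: bytes) -> bool:
    """Accept only if the presented certificate matches the signaled
    fingerprint."""
    return hmac.compare_digest(signaled_fp, fingerprint(presented_cert))


# Placeholder byte strings standing in for real certificates.
bob_cert = b"placeholder bytes standing in for Bob's certificate"
mallory_cert = b"placeholder bytes standing in for an attacker certificate"

# Honest case: signaling carries Bob's fingerprint, media carries his cert.
honest = handshake_accepted(fingerprint(bob_cert), bob_cert)

# Active attack on the media path only: the substituted certificate
# does not match the fingerprint from signaling, so it is rejected.
media_only = handshake_accepted(fingerprint(bob_cert), mallory_cert)

# Active attack on both paths: fingerprint and certificate are both
# replaced, so the check passes unless signaling is integrity protected.
both_paths = handshake_accepted(fingerprint(mallory_cert), mallory_cert)
```

   In this sketch `media_only` is rejected while `both_paths` is
   accepted, which is the gap that an integrity-protected offer and
   answer are meant to close.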

298	   The above discussion of DTLS-SRTP demonstrates how a single security
299	   protocol can be in different classes depending on the mode in which
300	   it is operated.  Other protocols can achieve a similar effect by
301	   adding functions outside of the on-the-wire key management protocol
302	   itself.  Although it may be appropriate to deploy lower-classed
303	   mechanisms in some cases, the ultimate security requirement for a
304	   media security negotiation protocol is that it have a mode of
305	   operation in the active-signaling-active-media-detect class, which
306	   protects against passive and active attacks and detects such
307	   attacks.  That is, there must be a way to use the protocol so that an
308	   active attack is required against both the signaling and media
309	   paths, and so that such attacks are detectable by the endpoints.

311	4.  Call Scenarios and Requirements Considerations

313	   The following subsections describe call scenarios that pose the most
314	   challenge to the key management system for media data in cooperation
315	   with SIP signaling.

317	   Throughout the subsections, requirements are stated by using the
318	   nomenclature R- to state an explicit requirement.  All of the stated
319	   requirements are explained in detail in Section 5.  The requirements
320	   in Section 5 are grouped according to their association with the key
321	   management protocol, with attack scenarios, and with requirements
322	   that can be met either inside or outside of the key management
323	   protocol.

325	4.1.  Clipping Media Before Signaling Answer

327	   The discussion in this section relates to requirements R-AVOID-
328	   CLIPPING and R-ALLOW-RTP.

330	   Per the SDP Offer/Answer Model [RFC3264],

332	      "Once the offerer has sent the offer, it MUST be prepared to
333	      receive media for any recvonly streams described by that offer.
334	      It MUST be prepared to send and receive media for any sendrecv
335	      streams in the offer, and send media for any sendonly streams in
336	      the offer (of course, it cannot actually send until the peer
337	      provides an answer with the needed address and port information)."

339	   To meet this requirement with SRTP, the offerer needs to know the
340	   SRTP key for arriving media.  If either endpoint receives encrypted
341	   media before it has access to the associated SRTP key, it cannot play
342	   the media -- causing clipping.
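
   The clipping effect described above can be sketched as a small
   timeline simulation.  The arrival times, the 20 ms packet spacing,
   and the names `Event` and `count_clipped` are assumptions made for
   illustration only; the sketch just counts packets that arrive
   before the SRTP key is known.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Event:
    time_ms: int   # arrival time at the offerer (hypothetical timeline)
    kind: str      # "media" for an SRTP packet, "key" for the SRTP key


def count_clipped(events: Iterable[Event]) -> Tuple[int, int]:
    """Return (clipped, played) packet counts for a timeline.

    Packets that arrive before the key cannot be decrypted and are
    counted as clipped.
    """
    key_known = False
    clipped = played = 0
    # Process in time order; at equal times the key is handled first.
    ordered = sorted(events, key=lambda e: (e.time_ms, e.kind != "key"))
    for ev in ordered:
        if ev.kind == "key":
            key_known = True
        elif key_known:
            played += 1
        else:
            clipped += 1
    return clipped, played


# Ten packets, one every 20 ms starting at t=100 ms; key arrives at 200 ms.
timeline = [Event(t, "media") for t in range(100, 300, 20)]
timeline.append(Event(200, "key"))

clipped, played = count_clipped(timeline)
```

   With this timeline, the five packets that arrive before 200 ms are
   clipped and the remaining five are played.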

344	   For key exchange mechanisms that send the answerer's key in SDP, a
345	   SIP provisional response [RFC3261], such as 183 (session progress),
346	   is useful.  However, the 183 messages are not reliable unless both
347	   the calling and called end point support PRACK [RFC3262], use TCP
348	   across all SIP proxies, implement Security Preconditions [RFC5027],
349	   or both ends implement ICE [I-D.ietf-mmusic-ice] and the answerer
350	   implements the reliable provisional response mechanism described in
351	   ICE.  Unfortunately, there is not wide deployment of any of these
352	   techniques and there is industry reluctance to require these
353	   techniques to avoid the problems described in this section.

355	   Note that the receipt of an SDP answer is not always sufficient to
356	   allow media to be played to the offerer.  Sometimes, the offerer must
357	   send media in order to open up firewall holes or NAT bindings before
358	   media can be received (for details see
359	   [I-D.ietf-mmusic-media-path-middleboxes]).  In this case, even a
360	   solution that makes the key available before the SDP answer arrives
361	   will not help.

363	   Preventing the arrival of early media (i.e., media that arrives at
364	   the SDP offerer before the SDP answer arrives) might obsolete the
365	   R-AVOID-CLIPPING requirement, but at the time of writing such early
366	   media exists in many normal call scenarios.

368	4.2.  Retargeting and Forking

370	   The discussion in this section relates to requirements R-FORK-
371	   RETARGET, R-DISTINCT, R-HERFP, and R-BEST-SECURE.

373	   In SIP, a request sent to a specific AOR but delivered to a different
374	   AOR is called a "retarget".  A typical scenario is a "call
375	   forwarding" feature.  In Figure 1, Alice sends an INVITE in step 1
376	   that is sent to Bob in step 2.  Bob responds with a redirect (SIP
377	   response code 3xx) pointing to Carol in step 3.  This redirect
378	   typically does not propagate back to Alice but only goes to a proxy
379	   (i.e., the retargeting proxy) that sends the original INVITE to Carol
380	   in step 4.

382	                                    +-----+
383	                                    |Alice|
384	                                    +--+--+
385	                                       |
386	                                       | INVITE (1)
387	                                       V
388	                                  +----+----+
389	                                  |  proxy  |
390	                                  ++-+-----++
391	                                   | ^     |
392	                        INVITE (2) | |     | INVITE (4)
393	                    & redirect (3) | |     |
394	                                   V |     V
395	                                  ++-++   ++----+
396	                                  |Bob|   |Carol|
397	                                  +---+   +-----+

399	                           Figure 1: Retargeting

401	   Using retargeting might lead to situations where the User Agent
402	   Client (UAC) does not know where its request will be going.  This
403	   might not immediately seem like a serious problem; after all, when
404	   one places a telephone call on the PSTN, one never really knows if it
405	   will be forwarded to a different number, who will pick up the line
406	   when it rings, and so on.  However, when considering SIP mechanisms
407	   for authenticating the called party, this function can also make it
408	   difficult to differentiate an intermediary that is behaving
409	   legitimately from an attacker.  From this perspective, the main
410	   problems with retargeting are:

412	   Not detectable by the caller:   The originating user agent has no
413	      means of anticipating that the condition will arise, nor any means
414	      of determining that it has occurred until the call has already
415	      been set up.

417	   Not preventable by the caller:  There is no existing mechanism that
418	      might be employed by the originating user agent in order to
419	      guarantee that the call will not be re-targeted.

421	   The mechanism used by SIP for identifying the calling party is SIP
422	   Identity [RFC4474].  However, due to the nature of retargeting, SIP
423	   Identity can only identify the calling party (that is, the party that
424	   initiated the SIP request).  Some key exchange mechanisms predate SIP
425	   Identity and include their own identity mechanism (e.g., MIKEY).
426	   However, those built-in identity mechanisms also suffer from the SIP
427	   retargeting problem.  While Connected Identity [RFC4916] allows
428	   positive identification of the called party, the primary difficulty
429	   still remains that the calling party does not know if a mismatched
430	   called party is legitimate (i.e., due to authorized retargeting) or
431	   illegitimate (i.e., due to unauthorized retargeting by an attacker
432	   able to modify SIP signaling).

434	   In SIP, 'forking' is the delivery of a request to multiple locations.
435	   This happens when a single AOR is registered more than once.  An
436	   example of forking is when a user has a desk phone, PC client, and
437	   mobile handset all registered with the same AOR.

439	                                  +-----+
440	                                  |Alice|
441	                                  +--+--+
442	                                     |
443	                                     | INVITE
444	                                     V
445	                               +-----+-----+
446	                               |   proxy   |
447	                               ++---------++
448	                                |         |
449	                         INVITE |         | INVITE
450	                                V         V
451	                             +--+--+   +--+--+
452	                             |Bob-1|   |Bob-2|
453	                             +-----+   +-----+

455	                             Figure 2: Forking

457	   With forking, both Bob-1 and Bob-2 might send back SDP answers in SIP
458	   responses.  Alice will see those intermediate (18x) and final (200)
459	   responses.  It is useful for Alice to be able to associate the SIP
460	   response with the incoming media stream.  Although this association
461	   can be done with ICE [I-D.ietf-mmusic-ice], and ICE is useful to make
462	   this association with RTP, it is not desirable to require ICE to
463	   accomplish this association.

465	   Forking and retargeting are often used together.  For example, a boss
466	   and secretary might have both phones ring (forking) and roll over to
467	   voice mail if neither phone is answered (retargeting).

469	   To maintain security of the media traffic, only the end point that
470	   answers the call should know the SRTP keys for the session.  Forked
471	   and re-targeted calls only reveal sensitive information to non-
472	   responders when the signaling messages contain sensitive information
473	   (e.g., SRTP keys) that is accessible by parties that receive the
474	   offer, but may not respond (i.e., the original recipients in a
475	   retargeted call, or non-answering endpoints in a forked call).  For
476	   key exchange mechanisms that do not provide secure forking or secure
477	   retargeting, one workaround is to re-key immediately after forking or
478	   retargeting.  However, because the originator may not be aware that
479	   the call forked, this mechanism requires rekeying immediately after
480	   every session is established.  This doubles the number of messages
481	   processed by the network.
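
   The exposure described above can be made concrete with a minimal
   sketch.  It assumes that a key carried in the offer is readable by
   every endpoint that receives the offer, and that a re-key exchange
   costs as many messages as the original setup; the function names
   and message counts are hypothetical.

```python
from typing import Iterable, Set


def exposed_non_responders(offer_recipients: Iterable[str],
                           answerer: str,
                           key_in_offer: bool) -> Set[str]:
    """Endpoints that learned the SRTP key without answering the call."""
    if not key_in_offer:
        return set()
    return set(offer_recipients) - {answerer}


def signaling_messages(setup_messages: int, rekey_every_session: bool) -> int:
    """Total messages per session.

    Assumption: a re-key exchange costs as many messages as the setup.
    """
    return setup_messages * (2 if rekey_every_session else 1)


# Forked call from Figure 2: both of Bob's devices receive the offer,
# but only Bob-2 answers.
recipients = ["Bob-1", "Bob-2"]
exposed = exposed_non_responders(recipients, "Bob-2", key_in_offer=True)
safe = exposed_non_responders(recipients, "Bob-2", key_in_offer=False)

# Workaround cost: re-keying after every session doubles the load.
baseline = signaling_messages(3, rekey_every_session=False)
with_rekey = signaling_messages(3, rekey_every_session=True)
```

   Here Bob-1 ends up in `exposed` even though it never answered,
   which is the situation that secure forking is meant to prevent.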

483	   Further compounding this problem is a unique feature of SIP: when
484	   forking is used, there is always only one final error response
485	   delivered to the sender of the request.  The forking proxy is
486	   responsible for choosing which final response to forward in the
487	   event that forking results in multiple final error responses being
488	   received by the forking proxy.  This means that if a request is
489	   rejected, say with information that the keying information was
490	   rejected and providing the far end's credentials, it is very possible
491	   that the rejection will never reach the sender.  This problem, called
492	   the Heterogeneous Error Response Forking Problem (HERFP) [RFC3326],
493	   is difficult to solve in SIP.  Because we expect the HERFP to
494	   continue to be a problem in SIP for the foreseeable future, a media
495	   security system should function even in the presence of HERFP
496	   behavior.
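
   The HERFP behavior described above can be sketched as follows.  The
   selection rule here is a deliberately simplified assumption (prefer
   a 6xx response, otherwise the first response in the lowest response
   class) and is not a complete statement of proxy behavior; the
   status codes and reason strings are chosen only for illustration.

```python
from typing import List, Tuple

Response = Tuple[int, str]   # (status code, description)


def choose_final_response(responses: List[Response]) -> Response:
    """Pick the single final error response a forking proxy forwards.

    Assumption: simplified rule for illustration. Any 6xx wins;
    otherwise the first response in the lowest response class is used.
    """
    global_failures = [r for r in responses if 600 <= r[0] < 700]
    if global_failures:
        return global_failures[0]
    lowest_class = min(code // 100 for code, _ in responses)
    return next(r for r in responses if r[0] // 100 == lowest_class)


# Two forked branches both fail, for different reasons.
branch_responses = [
    (486, "Bob-1: busy"),
    (488, "Bob-2: keying rejected, far end credentials attached"),
]

forwarded = choose_final_response(branch_responses)
lost = [r for r in branch_responses if r != forwarded]
```

   In this run only the 486 reaches Alice; the rejection that carried
   the keying information ends up in `lost` and is never delivered.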

498	4.3.  Recording

500	   The discussion in this section relates to requirement R-RECORDING.

502	   Some business environments, such as stock brokers, banks, and catalog
503	   call centers, require recording calls with customers.  This is the
504	   familiar "this call is being recorded for quality purposes" heard
505	   during calls to these sorts of businesses.  In these environments,
506	   media recording is typically performed by an intermediate device
507	   (with RTP, this is typically implemented in a 'sniffer').

509	   When performing such call recording with SRTP, the end-to-end
510	   security is compromised.  This compromise is unavoidable, because
511	   the operation of the business requires such recording.  It is
512	   desirable that the media security is not unduly compromised by the
513	   media recording.  The endpoint within the organization needs to be
514	   informed that there is an intermediate device and needs to cooperate
515	   with that intermediate device.

517	   This scenario does not place a requirement directly on the key
518	   management protocol.  The requirement could be met directly by the
519	   key management protocol (e.g., MIKEY-NULL or [RFC4568]) or through an
520	   external out-of-band-mechanism (e.g., [I-D.wing-sipping-srtp-key]).

522	4.4.  PSTN gateway

524	   The discussion in this section relates to requirement R-PSTN.

526	   It is desirable, even when one leg of a call is on the PSTN, that the
527	   IP leg of the call be protected with SRTP.

529	   A typical case of using media security is where two entities are
530	   having a VoIP conversation over IP-capable networks.  However, there
531	   are cases where the other end of the communication is not connected
532	   to an IP-capable network.  In this kind of setting, there needs to be
533	   some kind of gateway at the edge of the IP network which converts the
534	   VoIP conversation to a format understood by the other network.  An
535	   example of such a gateway is a PSTN gateway sitting at the edge of IP
536	   and PSTN networks (such as the architecture described in [RFC3372]).

538	   If media security (e.g., SRTP protection) is employed in this kind of
539	   gateway-setting, then media security and the related key management
540	   is terminated at the PSTN gateway.  The other network (e.g., PSTN)
541	   may have its own measures to protect the communication, but this
542	   means that, from a media security point of view, the media security
543	   is not employed truly end-to-end between the communicating entities.

545	4.5.  Call Setup Performance

547	   The discussion in this section relates to requirement R-REUSE.

549	   Some devices lack sufficient processing power to perform public key
550	   operations or Diffie-Hellman operations for each call, or prefer to
551	   avoid performing those operations on every call.  The ability to re-
552	   use previous public key or Diffie-Hellman operations can vastly
553	   decrease the call setup delay and processing requirements for such
554	   devices.

556	   In certain devices, it can take a second or two to perform a Diffie-
557	   Hellman operation.  Examples of these devices include handsets, IP
558	   Multimedia Services Identity Module (ISIMs), and PSTN gateways.  PSTN
559	   gateways typically utilize a Digital Signal Processor (DSP) which is
560	   not yet involved with typical DSP operations at the beginning of a
561	   call, thus the DSP could be used to perform the calculation, so as to
562	   avoid having the central host processor perform the calculation.
563	   However, not all PSTN gateways use DSPs (some have only central
564	   processors or their DSPs are incapable of performing the necessary
565	   public key or Diffie-Hellman operation), and handsets lack a
566	   separate, unused processor to perform these operations.

568	   Two scenarios where R-REUSE is useful are calls between an endpoint
569	   and its voicemail server or its PSTN gateway.  In those scenarios
570	   calls are made relatively often and it can be useful for the
571	   voicemail server or PSTN gateway to avoid public key operations for
572	   subsequent calls.

574	   Storing keys across sessions often interferes with perfect forward
575	   secrecy (R-PFS).
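
	   The re-use described above can be sketched as follows.  This is a
	   minimal illustration, not a mechanism defined by this document: the
	   ReuseCache class, the fake_exchange placeholder, and the use of
	   HMAC-SHA-256 to derive a per-call key from a cached secret are all
	   assumptions made for the example.

```python
import hashlib
import hmac
import secrets


class ReuseCache:
    """Hypothetical cache of long-lived secrets, keyed by peer identity."""

    def __init__(self, expensive_exchange):
        # expensive_exchange(peer) stands in for a public key or
        # Diffie-Hellman operation; it is an assumption of this sketch.
        self._exchange = expensive_exchange
        self._secrets = {}
        self.exchanges_performed = 0

    def session_key(self, peer, call_nonce):
        # Run the expensive operation only on the first call to this peer.
        if peer not in self._secrets:
            self._secrets[peer] = self._exchange(peer)
            self.exchanges_performed += 1
        # Derive a distinct key per call from the cached secret.
        return hmac.new(self._secrets[peer], call_nonce, hashlib.sha256).digest()


def fake_exchange(peer):
    # Placeholder for the costly operation; returns random bytes.
    return secrets.token_bytes(32)


cache = ReuseCache(fake_exchange)
k1 = cache.session_key("voicemail.example.com", secrets.token_bytes(16))
k2 = cache.session_key("voicemail.example.com", secrets.token_bytes(16))
```

	   In this sketch the second call skips the expensive operation, but
	   anyone who later obtains the cached secret and the nonces can
	   recompute both keys, which is the tension with R-PFS noted above.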

577	4.6.  Transcoding

579	   The discussion in this section relates to requirement R-TRANSCODER.

581	   In some environments it is necessary for network equipment to
582	   transcode from one codec (e.g., a highly compressed codec which makes
583	   efficient use of wireless bandwidth) to another codec (e.g., a
584	   standardized codec to a SIP peering interface).  With RTP, a
585	   transcoding function can be performed with the combination of a SIP
586	   B2BUA (to modify the SDP) and a processor to perform the transcoding
587	   between the codecs.  However, with end-to-end secured SRTP, a
588	   transcoding function implemented the same way is a man-in-the-middle
589	   attack, and the key management system prevents its use.

591	   However, such a network-based transcoder can still be realized with
592	   the cooperation and approval of the endpoint, and can provide end-to-
593	   transcoder and transcoder-to-end security.

595	4.7.  Upgrading to SRTP

597	   The discussion in this section relates to the requirement R-ALLOW-
598	   RTP.

600	   Legitimate RTP media can be sent to an endpoint for announcements,
601	   colorful ringback tones (e.g., music), advertising, or normal call
602	   progress tones.  The RTP may be received before an associated SDP
603	   answer.  For details on various scenarios, see
604	   [I-D.stucker-sipping-early-media-coping].

606	   While receiving such RTP exposes the calling party to a risk of
607	   receiving malicious RTP from an attacker, SRTP endpoints will need to
608	   receive and play out RTP media in order to be compatible with
609	   deployed systems that send RTP to calling parties.

611	4.8.  Interworking with Other Signaling Protocols

613	   The discussion in this section relates to the requirement R-OTHER-
614	   SIGNALING.

616	   In many environments, some devices are signaled with protocols other
617	   than SIP which do not share SIP's offer/answer model (e.g.,
618	   [H.248.1]) or do not utilize SDP (e.g., H.323).  In other
619	   environments, both endpoints may be SIP, but may use different key
620	   management systems (e.g., one uses MIKEY-RSA, the other MIKEY-RSA-R).

622	   In these environments, it is desirable to have SRTP -- rather than
623	   RTP -- between the two endpoints.  It is always possible, although
624	   undesirable, to interwork those disparate signaling systems or
625	   disparate key management systems by decrypting and re-encrypting each
626	   SRTP packet in a device in the middle of the network (often the same
627	   device performing the signaling interworking).  This is undesirable
628	   due to the cost and increased attack area, as such an SRTP/SRTP
629	   interworking device is a valuable attack target.

631	   At the time of this writing, interworking is considered important.
632	   Interworking without decryption/encryption of the SRTP, while useful,
633	   is not yet deemed critical because the scale of such SRTP deployments
634	   is, to date, relatively small.

636	4.9.  Certificates

638	   The discussion in this section relates to R-CERTS.

640	   On the Internet and on some private networks, validating another
641	   peer's certificate is often done through a trust anchor -- a list of
642	   Certificate Authorities that are trusted.  It can be difficult or
643	   expensive for a peer to obtain these certificates.  In all cases,
644	   both parties to the call would need to trust the same trust anchor
645	   (i.e., "certificate authority").  For these reasons, it is important
646	   that the media plane key management protocol offer a mechanism that
647	   allows end-users who have no prior association to authenticate to
648	   each other without acquiring credentials from a third party trust
649	   point.  Note that this does not rule out mechanisms in which servers
650	   have certificates and attest to the identities of end-users.

652	5.  Requirements

654	   This section is divided into several parts: requirements specific to
655	   the key management protocol (Section 5.1), attack scenarios
656	   (Section 5.2), and requirements which can be met inside the key
657	   management protocol or outside of the key management protocol
658	   (Section 5.3).

660	5.1.  Key Management Protocol Requirements

662	   SIP Forking and Retargeting, from Section 4.2:

664	   R-FORK-RETARGET:
665	         The media security key management protocol MUST securely
666	         support forking and retargeting when all endpoints are willing
667	         to use SRTP without causing the call setup to fail.  This
668	         requirement means the endpoints that did not answer the call
669	         MUST NOT learn the SRTP keys (in either direction) used by the
670	         answering endpoint.

672	   R-DISTINCT:
673	         The media security key management protocol MUST be capable of
674	         creating distinct, independent cryptographic contexts for each
675	         endpoint in a forked session.

677	   R-HERFP:
678	         The media security key management protocol MUST function
679	         securely even in the presence of HERFP behavior, i.e., the
680	         rejection of key information does not reach the sender.

682	   Performance considerations:

684	   R-REUSE:
685	         The media security key management protocol MAY support the re-
686	         use of a previously established security context.

688	               Note: re-use of the security context does not imply re-
689	               use of RTP parameters (e.g., payload type or SSRC).

691	   Media considerations:

693	   R-AVOID-CLIPPING:
694	         The media security key management protocol SHOULD avoid
695	         clipping media before SDP answer without requiring Security
696	         Preconditions [RFC5027].  This requirement comes from
697	         Section 4.1.

699	   R-RTP-CHECK:
700	         If SRTP key negotiation is performed over the media path (i.e.,
701	         using the same UDP/TCP ports as media packets), the key
702	         negotiation packets MUST NOT pass the RTP validity check
703	         defined in Appendix A.1 of [RFC3550], so that SRTP negotiation
704	         packets can be differentiated from RTP packets.

706	   R-ASSOC:
707	         The media security key management protocol SHOULD include a
708	         mechanism for associating key management messages with both the
709	         signaling traffic that initiated the session and with protected
710	         media traffic.  It is useful to associate key management
711	         messages with call signaling messages, as this allows the SDP
712	         offerer to avoid performing CPU-consuming operations (e.g.,
713	         Diffie-Hellman or public key operations) with attackers that
714	         have not seen the signaling messages.

716	         For example, if using a Diffie-Hellman keying technique with
717	         security preconditions that forks to 20 end points, the call
718	         initiator would get 20 provisional responses containing 20
719	         signed Diffie-Hellman key pairs.  Calculating 20 Diffie-Hellman
720	         secrets and validating signatures can be a difficult task for
721	         some devices.  Hence, in the case of forking, it is not
722	         desirable to perform a Diffie-Hellman operation with every
723	         party, but rather only with the party that answers the call
724	         (and incur some media clipping).  To do this, the signaling and
725	         media need to be associated so the calling party knows which
726	         key management exchange needs to be completed.  This might be
727	         done by using the transport address indicated in the SDP,
728	         although NATs can complicate this association.

730	               Note: due to RTP's design requirements, it is expected
731	               that SRTP receivers will have to perform authentication
732	               of any received SRTP packets.

734	   R-NEGOTIATE:
735	         The media security key management protocol MUST allow a SIP
736	         User Agent to negotiate media security parameters for each
737	         individual session.  Such negotiation MUST NOT cause a two-time
738	         pad (Section 9.1 of [RFC3711]).

740	   R-PSTN:
741	         The media security key management protocol MUST support
742	         termination of media security in a PSTN gateway.  This
743	         requirement is from Section 4.4.
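
	   To illustrate R-RTP-CHECK from this section, the sketch below shows
	   how a receiver might separate key negotiation packets from RTP on a
	   shared port.  It implements only part of the validity check in
	   Appendix A.1 of [RFC3550] (the version field and a length test
	   against the CSRC count); the function names and the sample packets
	   are assumptions made for the example.

```python
def looks_like_rtp(packet: bytes) -> bool:
    """Partial RTP validity check: version field and minimum length only."""
    if len(packet) < 12:          # the fixed RTP header is 12 octets
        return False
    version = packet[0] >> 6      # top two bits of the first octet
    if version != 2:
        return False
    csrc_count = packet[0] & 0x0F
    # The packet must have room for the CSRC list it announces.
    return len(packet) >= 12 + 4 * csrc_count


def classify(packet: bytes) -> str:
    # Anything that fails the check is handed to key management.
    return "rtp" if looks_like_rtp(packet) else "key-management"


rtp_packet = bytes([0x80, 0x00]) + bytes(10)
# Hypothetical keying packet whose first octet has version bits 00.
keying_packet = bytes([0x16]) + bytes(20)
```

	   A key negotiation protocol that guarantees its first octet never
	   carries the RTP version value is, under this sketch, always routed
	   to the key management handler.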

745	5.2.  Security Requirements

747	   This section describes overall security requirements and specific
748	   requirements from the attack scenarios (Section 3).

750	   Overall security requirements:

752	   R-PFS:
753	         The media security key management protocol MUST be able to
754	         support perfect forward secrecy.

756	   R-COMPUTE:
757	         The media security key management protocol MUST support
758	         offering additional SRTP cipher suites without incurring
759	         significant computational expense.

761	   R-CERTS:
762	         The key management protocol MUST NOT require that end-users
763	         obtain credentials (certificates or private keys) from a third-
764	         party trust anchor.

766	   R-FIPS:
767	         The media security key management protocol SHOULD use
768	         algorithms that allow FIPS 140-2 [FIPS-140-2] certification or
769	         similar country-specific certification (e.g., [AISITSEC]).

771	         The United States Government can only purchase and use crypto
772	         implementations that have been validated by the FIPS-140
774	         [FIPS-140-2] process:

776	               "The FIPS-140 standard is applicable to all Federal
777	               agencies that use cryptographic-based security systems to
778	               protect sensitive information in computer and
779	               telecommunication systems, including voice systems.  The
780	               adoption and use of this standard is available to private
781	               and commercial organizations."

783	         Some commercial organizations, such as banks and defense
784	         contractors, require or prefer equipment which has received the
785	         same validation.

787	   R-DOS:
788	         The media security key management protocol MUST NOT introduce
789	         any new significant denial of service vulnerabilities (e.g.,
790	         the protocol should not request the endpoint to perform CPU-
791	         intensive operations without the client being able to validate
792	         or authorize the request).

794	   R-EXISTING:
795	         The media security key management protocol SHOULD allow
796	         endpoints to authenticate using pre-existing cryptographic
797	         credentials, e.g., certificates or pre-shared keys.

799	   R-AGILITY:
800	         The media security key management protocol MUST provide crypto-
801	         agility, i.e., the ability to adapt to evolving cryptography
802	         and security requirements (update of cryptographic algorithms
803	         without substantial disruption to deployed implementations).

805	   R-DOWNGRADE:
806	         The media security key management protocol MUST protect cipher
807	         suite negotiation against downgrading attacks.

809	   R-PASS-MEDIA:
810	         The media security key management protocol MUST have a mode
811	         which prevents a passive adversary with access to the media
812	         path from gaining access to keying material used to protect
813	         SRTP media packets.

815	   R-PASS-SIG:
816	         The media security key management protocol MUST have a mode in
817	         which it prevents a passive adversary with access to the
818	         signaling path from gaining access to keying material used to
819	         protect SRTP media packets.

821	   R-SIG-MEDIA:
822	         The media security key management protocol MUST have a mode in
823	         which it defends itself from an attacker that is solely on the
824	         media path and from an attacker that is solely on the signaling
825	         path.  A successful attack refers to the ability for the
826	         adversary to obtain keying material to decrypt the SRTP
827	         encrypted media traffic.

829	   R-ID-BINDING:
830	         The media security key management protocol MUST enable the
831	         media security keys to be cryptographically bound to an
832	         identity of the endpoint.

834	               This allows domains to deploy SIP Identity [RFC4474].

836	   R-ACT-ACT:
837	         The media security key management protocol MUST support a mode
838	         of operation that provides active-signaling-active-media-detect
839	         robustness, and MAY support modes of operation that provide
840	         lower levels of robustness (as described in Section 3).

842	               Failing to meet R-ACT-ACT indicates the protocol cannot
843	               provide secure end-to-end media.
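
	   One common way to meet a requirement like R-DOWNGRADE is to
	   authenticate the negotiation itself.  The sketch below is an
	   assumption-laden illustration, not a mechanism specified here: it
	   presumes a shared key is already in place, and the helper names
	   transcript_mac and verify are hypothetical.  The cipher suite names
	   are those defined in [RFC4568].

```python
import hashlib
import hmac


def transcript_mac(key: bytes, offered: list, chosen: str) -> bytes:
    # Bind the full offer and the selection into one authenticated value.
    transcript = ",".join(offered) + "|" + chosen
    return hmac.new(key, transcript.encode(), hashlib.sha256).digest()


def verify(key: bytes, offered_as_sent: list, chosen: str, mac_from_peer: bytes) -> bool:
    expected = transcript_mac(key, offered_as_sent, chosen)
    return hmac.compare_digest(expected, mac_from_peer)


key = b"k" * 32  # placeholder for an already-established shared key
sent = ["AES_CM_128_HMAC_SHA1_80", "AES_CM_128_HMAC_SHA1_32"]
# An attacker removes the first suite before the offer reaches the peer.
seen_by_peer = ["AES_CM_128_HMAC_SHA1_32"]
peer_mac = transcript_mac(key, seen_by_peer, "AES_CM_128_HMAC_SHA1_32")
downgrade_detected = not verify(key, sent, "AES_CM_128_HMAC_SHA1_32", peer_mac)
```

	   Because the offerer checks the MAC against the list it actually
	   sent, the altered offer no longer verifies and the downgrade is
	   detected.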

845	5.3.  Requirements Outside of the Key Management Protocol

847	   The requirements in this section are for an overall VoIP security
848	   system.  These requirements can be met within the key management
849	   protocol itself, or can be solved outside of the key management
850	   protocol itself (e.g., solved in SIP or in SDP).

852	   R-BEST-SECURE:
853	         Even when some end points of a forked or retargeted call are
854	         incapable of using SRTP, a solution MUST be described which
855	         allows the establishment of SRTP associations with SRTP-capable
856	         endpoints and / or RTP associations with non-SRTP-capable
857	         endpoints.

859	   R-OTHER-SIGNALING:
860	         A solution SHOULD be able to negotiate keys for SRTP sessions
861	         created via different call signaling protocols (e.g., between
862	         Jabber, SIP, H.323, MGCP).

864	   R-RECORDING:
865	         A solution SHOULD be described which supports recording of
866	         decrypted media.  This requirement comes from Section 4.3.

868	   R-TRANSCODER:
869	         A solution SHOULD be described which supports intermediate
870	         nodes (e.g., transcoders), terminating or processing media,
871	         between the end points.

873	   R-ALLOW-RTP:  A solution SHOULD be described which allows RTP media
874	         to be received by the calling party until SRTP has been
875	         negotiated with the answerer, after which SRTP is preferred
876	         over RTP.

878	6.  Security Considerations

880	   This document lists requirements for securing media traffic.  As
881	   such, it addresses security throughout the document.

883	7.  IANA Considerations

885	   This document does not require actions by IANA.

887	8.  Acknowledgements

889	   For contributions to the requirements portion of this document, the
890	   authors would like to thank the active participants of the RTPSEC BoF
891	   and of the RTPSEC mailing list, with special thanks to Steffen Fries
892	   and Dragan Ignjatic for their excellent MIKEY comparison [RFC5197]
893	   document.

895	   The authors would furthermore like to thank the following people for
896	   their review, suggestions, and comments: Flemming Andreasen, Richard
897	   Barnes, Mark Baugher, Wolfgang Buecker, Werner Dittmann, Lakshminath
898	   Dondeti, John Elwell, Martin Euchner, Hans-Heinrich Grusdt, Christer
899	   Holmberg, Guenther Horn, Peter Howard, Leo Huang, Dragan Ignjatic,
900	   Cullen Jennings, Alan Johnston, Vesa Lehtovirta, Matt Lepinski, David
901	   McGrew, David Oran, Colin Perkins, Eric Raymond, Eric Rescorla, Peter
902	   Schneider, Srinath Thiruvengadam, Dave Ward, Dan York, and Phil
903	   Zimmermann.

905	9.  References

907	9.1.  Normative References

909	   [FIPS-140-2]
910	              NIST, "Security Requirements for Cryptographic Modules",
911	              June 2005, <http://csrc.nist.gov/publications/fips/
912	              fips140-2/fips1402.pdf>.

914	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
915	              Requirement Levels", BCP 14, RFC 2119, March 1997.

917	   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
918	              A., Peterson, J., Sparks, R., Handley, M., and E.
919	              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
920	              June 2002.

922	   [RFC3262]  Rosenberg, J. and H. Schulzrinne, "Reliability of
923	              Provisional Responses in Session Initiation Protocol
924	              (SIP)", RFC 3262, June 2002.

926	   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
927	              with Session Description Protocol (SDP)", RFC 3264,
928	              June 2002.

930	   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
931	              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
932	              RFC 3711, March 2004.

934	9.2.  Informative References

936	   [AISITSEC]
937	              "Anwendungshinweise und Interpretationen (AIS) zu ITSEC",
938	              January 2002,
939	              <http://www.bsi.de/zertifiz/zert/interpr/aisitsec.htm>.

941	   [H.248.1]  ITU, "Gateway control protocol", June 2000,
942	              <http://www.itu.int/rec/T-REC-H.248/e>.

944	   [I-D.baugher-mmusic-sdp-dh]
945	              Baugher, M. and D. McGrew, "Diffie-Hellman Exchanges for
946	              Multimedia Sessions", draft-baugher-mmusic-sdp-dh-00 (work
947	              in progress), February 2006.

949	   [I-D.dondeti-msec-rtpsec-mikeyv2]
950	              Dondeti, L., "MIKEYv2: SRTP Key Management using MIKEY,
951	              revisited", draft-dondeti-msec-rtpsec-mikeyv2-01 (work in
952	              progress), March 2007.

954	   [I-D.fischl-sipping-media-dtls]
955	              Fischl, J., "Datagram Transport Layer Security (DTLS)
956	              Protocol for Protection of Media Traffic Established with
957	              the Session Initiation Protocol",
958	              draft-fischl-sipping-media-dtls-03 (work in progress),
959	              July 2007.

961	   [I-D.ietf-avt-dtls-srtp]
962	              McGrew, D. and E. Rescorla, "Datagram Transport Layer
963	              Security (DTLS) Extension to Establish Keys for Secure
964	              Real-time Transport Protocol (SRTP)",
965	              draft-ietf-avt-dtls-srtp-06 (work in progress),
966	              October 2008.

968	   [I-D.ietf-mmusic-ice]
969	              Rosenberg, J., "Interactive Connectivity Establishment
970	              (ICE): A Protocol for Network Address Translator (NAT)
971	              Traversal for Offer/Answer Protocols",
972	              draft-ietf-mmusic-ice-19 (work in progress), October 2007.

974	   [I-D.ietf-mmusic-media-path-middleboxes]
975	              Stucker, B. and H. Tschofenig, "Analysis of Middlebox
976	              Interactions for Signaling Protocol Communication  along
977	              the Media Path",
978	              draft-ietf-mmusic-media-path-middleboxes-01 (work in
979	              progress), July 2008.

981	   [I-D.ietf-mmusic-sdp-capability-negotiation]
982	              Andreasen, F., "SDP Capability Negotiation",
983	              draft-ietf-mmusic-sdp-capability-negotiation-09 (work in
984	              progress), July 2008.

986	   [I-D.ietf-msec-mikey-ecc]
987	              Milne, A., "ECC Algorithms for MIKEY",
988	              draft-ietf-msec-mikey-ecc-03 (work in progress),
989	              June 2007.

991	   [I-D.ietf-sip-certs]
992	              Jennings, C. and J. Fischl, "Certificate Management
993	              Service for The Session Initiation Protocol (SIP)",
994	              draft-ietf-sip-certs-07 (work in progress), November 2008.

996	   [I-D.ietf-tls-rfc4346-bis]
997	              Dierks, T. and E. Rescorla, "The Transport Layer Security
998	              (TLS) Protocol Version 1.2", draft-ietf-tls-rfc4346-bis-10
999	              (work in progress), March 2008.

1001	   [I-D.jennings-sipping-multipart]
1002	              Wing, D. and C. Jennings, "Session Initiation Protocol
1003	              (SIP) Offer/Answer with Multipart Alternative",
1004	              draft-jennings-sipping-multipart-02 (work in progress),
1005	              March 2006.

1007	   [I-D.mcgrew-srtp-ekt]
1008	              McGrew, D., "Encrypted Key Transport for Secure RTP",
1009	              draft-mcgrew-srtp-ekt-03 (work in progress), July 2007.

1011	   [I-D.stucker-sipping-early-media-coping]
1012	              Stucker, B., "Coping with Early Media in the Session
1013	              Initiation Protocol (SIP)",
1014	              draft-stucker-sipping-early-media-coping-03 (work in
1015	              progress), October 2006.

1017	   [I-D.wing-sipping-srtp-key]
1018	              Wing, D., Audet, F., Fries, S., Tschofenig, H., and A.
1019	              Johnston, "Secure Media Recording and Transcoding with the
1020	              Session Initiation Protocol",
1021	              draft-wing-sipping-srtp-key-04 (work in progress),
1022	              October 2008.

1024	   [I-D.zimmermann-avt-zrtp]
1025	              Zimmermann, P., Johnston, A., and J. Callas, "ZRTP: Media
1026	              Path Key Agreement for Secure RTP",
1027	              draft-zimmermann-avt-zrtp-11 (work in progress),
1028	              November 2008.

1030	   [RFC3326]  Schulzrinne, H., Oran, D., and G. Camarillo, "The Reason
1031	              Header Field for the Session Initiation Protocol (SIP)",
1032	              RFC 3326, December 2002.

1034	   [RFC3372]  Vemuri, A. and J. Peterson, "Session Initiation Protocol
1035	              for Telephones (SIP-T): Context and Architectures",
1036	              BCP 63, RFC 3372, September 2002.

1038	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
1039	              Jacobson, "RTP: A Transport Protocol for Real-Time
1040	              Applications", STD 64, RFC 3550, July 2003.

1042	   [RFC3830]  Arkko, J., Carrara, E., Lindholm, F., Naslund, M., and K.
1043	              Norrman, "MIKEY: Multimedia Internet KEYing", RFC 3830,
1044	              August 2004.

1046	   [RFC4474]  Peterson, J. and C. Jennings, "Enhancements for
1047	              Authenticated Identity Management in the Session
1048	              Initiation Protocol (SIP)", RFC 4474, August 2006.

1050	   [RFC4492]  Blake-Wilson, S., Bolyard, N., Gupta, V., Hawk, C., and B.
1051	              Moeller, "Elliptic Curve Cryptography (ECC) Cipher Suites
1052	              for Transport Layer Security (TLS)", RFC 4492, May 2006.

1054	   [RFC4568]  Andreasen, F., Baugher, M., and D. Wing, "Session
1055	              Description Protocol (SDP) Security Descriptions for Media
1056	              Streams", RFC 4568, July 2006.

1058	   [RFC4650]  Euchner, M., "HMAC-Authenticated Diffie-Hellman for
1059	              Multimedia Internet KEYing (MIKEY)", RFC 4650,
1060	              September 2006.

1062	   [RFC4738]  Ignjatic, D., Dondeti, L., Audet, F., and P. Lin, "MIKEY-
1063	              RSA-R: An Additional Mode of Key Distribution in
1064	              Multimedia Internet KEYing (MIKEY)", RFC 4738,
1065	              November 2006.

1067	   [RFC4771]  Lehtovirta, V., Naslund, M., and K. Norrman, "Integrity
1068	              Transform Carrying Roll-Over Counter for the Secure Real-
1069	              time Transport Protocol (SRTP)", RFC 4771, January 2007.

1071	   [RFC4916]  Elwell, J., "Connected Identity in the Session Initiation
1072	              Protocol (SIP)", RFC 4916, June 2007.

1074	   [RFC4949]  Shirey, R., "Internet Security Glossary, Version 2",
1075	              RFC 4949, August 2007.

1077	   [RFC5027]  Andreasen, F. and D. Wing, "Security Preconditions for
1078	              Session Description Protocol (SDP) Media Streams",
1079	              RFC 5027, October 2007.

1081	   [RFC5197]  Fries, S. and D. Ignjatic, "On the Applicability of
1082	              Various Multimedia Internet KEYing (MIKEY) Modes and
1083	              Extensions", RFC 5197, June 2008.

1085	Appendix A.  Overview and Evaluation of Existing Keying Mechanisms

1087	   Based on how the SRTP keys are exchanged, each SRTP key exchange
1088	   mechanism belongs to one general category:

1090	   signaling path:
1091	        All the keying is carried in the call signaling (SIP or SDP)
1092	        path.

1094	   media path:
1095	        All the keying is carried in the SRTP/SRTCP media path, and no
1096	        signaling whatsoever is carried in the call signaling path.

1098	   signaling and media path:
1099	        Parts of the keying are carried in the SRTP/SRTCP media path,
1100	        and parts are carried in the call signaling (SIP or SDP) path.

1102	   One of the significant benefits of SRTP over other end-to-end
1103	   encryption mechanisms, such as IPsec, is that SRTP is bandwidth
1104	   efficient and retains the header of RTP packets.

1106	   Bandwidth efficiency is vital for VoIP in many scenarios where access
1107	   bandwidth is limited or expensive, and retaining the RTP header is
1108	   important for troubleshooting packet loss, delay, and jitter.

1110	   Related to SRTP's characteristics is the goal that any SRTP keying
1111	   mechanism also be efficient and not cause additional call setup
1112	   delay.  Contributors to additional call setup delay include network
1113	   or database operations: retrieval of certificates and additional SIP
1114	   or media path messages, and computational overhead of establishing
1115	   keys or validating certificates.

1117	   When examining the choice between keying in the signaling path,
1118	   keying in the media path, or keying in both paths, it is important to
1119	   realize the media path is generally 'faster' than the SIP signaling
1120	   path.  The SIP signaling path has computational elements involved
1121	   which parse and route SIP messages.  The media path, on the other
1122	   hand, does not normally have computational elements involved, and
1123	   even when computational elements such as firewalls are involved, they
1124	   cause very little additional delay.  Thus, the media path can be
1125	   useful for exchanging several messages to establish SRTP keys.  A
1126	   disadvantage of keying over the media path is that interworking
1127	   different key exchange mechanisms requires the interworking function
1128	   to be in the media path, rather than just in the signaling path; in
1129	   practice this involvement is probably unavoidable anyway.

1131	A.1.  Signaling Path Keying Techniques

1133	A.1.1.  MIKEY-NULL

1135	   MIKEY-NULL [RFC3830] has the offerer indicate the SRTP keys for both
1136	   directions.  The key is sent unencrypted in SDP, which means the SDP
1137	   must be encrypted hop-by-hop (e.g., by using TLS (SIPS)) or end-to-
1138	   end (e.g., by using S/MIME).

1140	   MIKEY-NULL requires one message from offerer to answerer (half a
1141	   round trip), and does not add additional media path messages.

1143	A.1.2.  MIKEY-PSK

1145	   MIKEY-PSK (pre-shared key) [RFC3830] requires that all endpoints
1146	   share one common key.  MIKEY-PSK has the offerer encrypt the SRTP
1147	   keys for both directions using this pre-shared key.

1149	   MIKEY-PSK requires one message from offerer to answerer (half a round
1150	   trip), and does not add additional media path messages.

1152	A.1.3.  MIKEY-RSA

1154	   MIKEY-RSA [RFC3830] has the offerer encrypt the keys for both
1155	   directions using the intended answerer's public key, which is
1156	   obtained from a mechanism outside of MIKEY.

1158	   MIKEY-RSA requires one message from offerer to answerer (half a round
1159	   trip), and does not add additional media path messages.  MIKEY-RSA
1160	   requires the offerer to obtain the intended answerer's certificate.

1162	A.1.4.  MIKEY-RSA-R

1164	   MIKEY-RSA-R [RFC4738] is essentially the same as MIKEY-RSA but
1165	   reverses the role of the offerer and the answerer with regard to
1166	   providing the keys.  That is, the answerer encrypts the keys for both
1167	   directions using the offerer's public key.  Both the offerer and
1168	   answerer validate each other's public keys using standard X.509
1169	   validation techniques.  MIKEY-RSA-R also enables sending certificates
1170	   in the MIKEY message.

1172	   MIKEY-RSA-R requires one message from offerer to answerer, and one
1173	   message from answerer to offerer (full round trip), and does not add
1174	   additional media path messages.  MIKEY-RSA-R requires the offerer
1175	   to validate the answerer's certificate.

1177	A.1.5.  MIKEY-DHSIGN

1179	   In MIKEY-DHSIGN [RFC3830] the offerer and answerer derive the key
1180	   from a Diffie-Hellman exchange.  In order to prevent an active man-
1181	   in-the-middle attack, the DH exchange itself is signed using each
1182	   endpoint's private key, and the associated public keys are validated
1183	   using standard X.509 validation techniques.

1185	   MIKEY-DHSIGN requires one message from offerer to answerer, and one
1186	   message from answerer to offerer (full round trip), and does not add
1187	   additional media path messages.  MIKEY-DHSIGN requires the offerer
1188	   and answerer to validate each other's certificates.  MIKEY-DHSIGN
1189	   also enables sending the answerer's certificate in the MIKEY message.

1191	A.1.6.  MIKEY-DHHMAC

1193	   MIKEY-DHHMAC [RFC4650] uses a pre-shared secret to HMAC the Diffie-
1194	   Hellman exchange, essentially combining aspects of MIKEY-PSK with
1195	   MIKEY-DHSIGN, but without MIKEY-DHSIGN's need for certificate
1196	   authentication.

1198	   MIKEY-DHHMAC requires one message from offerer to answerer, and one
1199	   message from answerer to offerer (full round trip), and does not add
1200	   additional media path messages.

1202	A.1.7.  MIKEY-ECIES and MIKEY-ECMQV (MIKEY-ECC)

1204	   ECC Algorithms For MIKEY [I-D.ietf-msec-mikey-ecc] describes how ECC
1205	   can be used with MIKEY-RSA (using ECDSA signature) and with MIKEY-
1206	   DHSIGN (using a new DH-Group code), and also defines two new ECC-
1207	   based algorithms, Elliptic Curve Integrated Encryption Scheme (ECIES)
1208	   and Elliptic Curve Menezes-Qu-Vanstone (ECMQV).

1210	   With this proposal, the ECDSA signature, MIKEY-ECIES, and MIKEY-ECMQV
1211	   function exactly like MIKEY-RSA, and the new DH-Group code functions
1212	   exactly like MIKEY-DHSIGN.  Therefore these ECC mechanisms are not
1213	   discussed separately in this document.

1215	A.1.8.  Security Descriptions with SIPS

1217	   Security Descriptions [RFC4568] has each side indicate the key it
1218	   will use for transmitting SRTP media, and the keys are sent in the
1219	   clear in SDP.  Security Descriptions relies on hop-by-hop (TLS via
1220	   "SIPS:") encryption to protect the keys exchanged in signaling.

1222	   Security Descriptions requires one message from offerer to answerer,
1223	   and one message from answerer to offerer (full round trip), and does
1224	   not add additional media path messages.
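
	   To make concrete what "sent in the clear in SDP" means, the
	   following Python sketch builds a crypto attribute of the general
	   form used by Security Descriptions [RFC4568].  The function name
	   make_crypto_attribute, the tag value, and the choice of the
	   AES_CM_128_HMAC_SHA1_80 suite (16-byte key, 14-byte salt) are
	   assumptions made for illustration and are not taken from this
	   document.

```python
import base64
import secrets

KEY_LEN = 16   # assumed AES-128 master key length, in bytes
SALT_LEN = 14  # assumed master salt length, in bytes


def make_crypto_attribute(tag: int = 1,
                          suite: str = "AES_CM_128_HMAC_SHA1_80") -> str:
    """Build an illustrative SDP crypto attribute (hypothetical helper).

    The master key and salt are concatenated, base64-encoded, and placed
    directly in the SDP line, which is why the signaling itself must be
    protected (for example with SIPS or S/MIME).
    """
    key_and_salt = secrets.token_bytes(KEY_LEN + SALT_LEN)
    inline = base64.b64encode(key_and_salt).decode("ascii")
    return f"a=crypto:{tag} {suite} inline:{inline}"
```

	   Anyone who can read the resulting line can recover the key
	   material with a single base64 decode.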

1226	A.1.9.  Security Descriptions with S/MIME

1228	   This keying mechanism is identical to Appendix A.1.8, except that
1229	   rather than protecting the signaling with TLS, the entire SDP is
1230	   encrypted with S/MIME.

1232	A.1.10.  SDP-DH (expired)

1234	   SDP Diffie-Hellman [I-D.baugher-mmusic-sdp-dh] exchanges Diffie-
1235	   Hellman messages in the signaling path to establish session keys.  To
1236	   protect against active man-in-the-middle attacks, the Diffie-Hellman
1237	   exchange needs to be protected with S/MIME, SIPS, or SIP Identity
1238	   [RFC4474] and SIP Connected Identity [RFC4916].

1240	   SDP-DH requires one message from offerer to answerer, and one message
1241	   from answerer to offerer (full round trip), and does not add
1242	   additional media path messages.

1244	A.1.11.  MIKEYv2 in SDP (expired)

1246	   MIKEYv2 [I-D.dondeti-msec-rtpsec-mikeyv2] adds mode negotiation to
1247	   MIKEYv1 and removes the time synchronization requirement.  It
1248	   therefore now takes 2 round-trips to complete.  In the first round
1249	   trip, the communicating parties learn each other's identities, agree
1250	   on a MIKEY mode, crypto algorithm, SRTP policy, and exchange nonces
1251	   for replay protection.  In the second round trip, they negotiate
1252	   unicast and/or group SRTP context for SRTP and/or SRTCP.

1254	   Furthermore, MIKEYv2 also defines an in-band negotiation mode as an
1255	   alternative to SDP (see Appendix A.3.3).

1257	A.2.  Media Path Keying Technique

1259	A.2.1.  ZRTP

1261	   ZRTP [I-D.zimmermann-avt-zrtp] does not exchange information in the
1262	   signaling path, although endpoints can exchange a hash of the ZRTP
1263	   Hello message with "a=zrtp-hash" in the initial Offer if it is sent
1264	   over an integrity-protected signaling channel, which provides some
1265	   useful correlation between the signaling and media layers.  In ZRTP
1266	   the keys are exchanged entirely in the media path
1267	   using a Diffie-Hellman exchange.  The advantage to this mechanism is
1268	   that the signaling channel is used only for call setup and the media
1269	   channel is used to establish an encrypted channel -- much like
1270	   encryption devices on the PSTN.  ZRTP uses voice authentication of
1271	   its Diffie-Hellman exchange by having each person read digits or
1272	   words to the other person.  Subsequent sessions with the same ZRTP
1273	   endpoint can be authenticated using the stored hash of the previously
1274	   negotiated key rather than voice authentication.  ZRTP uses 4 media
1275	   path messages (Hello, Commit, DHPart1, and DHPart2) to establish the
1276	   SRTP key, and 3 media path confirmation messages.  These initial
1277	   messages are all sent as non-RTP packets.

1279	      Note that when ZRTP probing is used, unencrypted RTP can be
1280	      exchanged until the SRTP keys are established.

1282	A.3.  Signaling and Media Path Keying Techniques

1284	A.3.1.  EKT

1286	   EKT [I-D.mcgrew-srtp-ekt] relies on another SRTP key exchange
1287	   protocol, such as Security Descriptions or MIKEY, for bootstrapping.
1288	   In the initial phase, each member of a conference uses an SRTP key
1289	   exchange protocol to establish a common key encryption key (KEK).
1290	   Each member may use the KEK to securely transport its SRTP master key
1291	   and current SRTP rollover counter (ROC), via RTCP, to the other
1292	   participants in the session.

1294	   EKT requires the offerer to send some parameters (EKT_Cipher, KEK,
1295	   and security parameter index (SPI)) via the bootstrapping protocol
1296	   such as Security Descriptions or MIKEY.  Each answerer sends an SRTCP
1297	   message which contains the answerer's SRTP Master Key, rollover
1298	   counter, and the SRTP sequence number.  Rekeying is done by sending a
1299	   new SRTCP message.  For reliable transport, multiple RTCP messages
1300	   need to be sent.

1302	A.3.2.  DTLS-SRTP

1304	   DTLS-SRTP [I-D.ietf-avt-dtls-srtp] exchanges public key fingerprints
1305	   in SDP [I-D.fischl-sipping-media-dtls] and then establishes a DTLS
1306	   session over the media channel.  The endpoints use the DTLS handshake
1307	   to agree on crypto suites and establish SRTP session keys.  SRTP
1308	   packets are then exchanged between the endpoints.
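
	   The fingerprint comparison described above can be sketched in
	   Python as follows.  This is a minimal illustration, assuming the
	   "a=fingerprint" attribute syntax of RFC 4572 and a DER-encoded
	   certificate; the helper names sdp_fingerprint and
	   fingerprint_matches are hypothetical and do not come from the
	   DTLS-SRTP drafts.

```python
import hashlib

# Map hashlib names to the hash tokens assumed in the SDP attribute.
_HASH_LABELS = {"sha1": "sha-1", "sha256": "sha-256"}


def sdp_fingerprint(cert_der: bytes, hash_name: str = "sha256") -> str:
    """Return an SDP-style fingerprint line for a DER-encoded certificate."""
    digest = hashlib.new(hash_name, cert_der).digest()
    hex_pairs = ":".join(f"{b:02X}" for b in digest)
    return f"a=fingerprint:{_HASH_LABELS[hash_name]} {hex_pairs}"


def fingerprint_matches(signaled_attr: str, presented_cert_der: bytes) -> bool:
    """Check the certificate seen in the media path against signaling."""
    value = signaled_attr[len("a=fingerprint:"):]
    label, _, _ = value.partition(" ")
    hash_name = label.lower().replace("-", "")
    expected = sdp_fingerprint(presented_cert_der, hash_name)
    # Compare case-insensitively, since hex digits may differ in case.
    return expected.upper() == signaled_attr.upper()
```

	   An endpoint would call fingerprint_matches with the attribute
	   received in SDP and the certificate presented in the DTLS
	   handshake, and proceed only if it returns True.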

1310	   DTLS-SRTP requires one message from offerer to answerer (half round
1311	   trip), and one message from the answerer to offerer (full round trip)
1312	   so the offerer can correlate the SDP answer with the answering
1313	   endpoint.  DTLS-SRTP uses 4 media path messages to establish the SRTP
1314	   key.

1316	   This document assumes DTLS will use TLS_RSA_WITH_AES_128_CBC_SHA as
1317	   its cipher suite, which is the mandatory-to-implement cipher suite in
1318	   TLS [I-D.ietf-tls-rfc4346-bis].

1320	A.3.3.  MIKEYv2 Inband (expired)

1322	   As described in Appendix A.1.11, MIKEYv2 also defines an in-band
1323	   negotiation mode as an alternative to SDP.  The draft does not yet
1324	   specify what in-band actually means (e.g., UDP, RTP, or
1325	   RTCP).

1327	A.4.  Evaluation Criteria - SIP

1329	   This section considers how each keying mechanism interacts with SIP
1330	   features.

1332	A.4.1.  Secure Retargeting and Secure Forking

1334	   Retargeting and forking of signaling requests is described within
1335	   Section 4.2.  The following builds upon this description.

1337	   The following list compares the behavior of secure forking, answering
1338	   association, two-time pads, and secure retargeting for each keying
1339	   mechanism.

1341	      MIKEY-NULL  Secure Forking: No, all AORs see offerer's and
1342	         answerer's keys.  Answer is associated with media by the SSRC
1343	         in MIKEY.  Additionally, a two-time pad occurs if two branches
1344	         choose the same 32-bit SSRC and transmit SRTP packets.

1346	         Secure Retargeting: No, all targets see offerer's and
1347	         answerer's keys.  Suffers from retargeting identity problem.

1349	      MIKEY-PSK
1350	         Secure Forking: No, all AORs see offerer's and answerer's keys.
1351	         Answer is associated with media by the SSRC in MIKEY.  Note
1352	         that all AORs must share the same pre-shared key in order for
1353	         forking to work at all with MIKEY-PSK.  Additionally, a two-
1354	         time pad occurs if two branches choose the same 32-bit SSRC and
1355	         transmit SRTP packets.

1357	         Secure Retargeting: Not secure.  For retargeting to work, the
1358	         final target must possess the correct PSK.  While this is likely
1359	         in scenarios where the call is targeted to another device
1360	         belonging to the same user (forking), it is very unlikely that
1361	         other users will possess that PSK and be able to successfully
1362	         answer that call.

1364	      MIKEY-RSA
1365	         Secure Forking: No, all AORs see offerer's and answerer's keys.
1366	         Answer is associated with media by the SSRC in MIKEY.  Note
1367	         that all AORs must share the same private key in order for
1368	         forking to work at all with MIKEY-RSA.  Additionally, a two-
1369	         time pad occurs if two branches choose the same 32-bit SSRC and
1370	         transmit SRTP packets.

1372	         Secure Retargeting: No.

1374	      MIKEY-RSA-R
1375	         Secure Forking: Yes. Answer is associated with media by the
1376	         SSRC in MIKEY.

1378	         Secure Retargeting: Yes.

1380	      MIKEY-DHSIGN
1381	         Secure Forking: Yes, each forked endpoint negotiates unique
1382	         keys with the offerer for both directions.  Answer is
1383	         associated with media by the SSRC in MIKEY.

1385	         Secure Retargeting: Yes, each target negotiates unique keys
1386	         with the offerer for both directions.

1388	      MIKEYv2 in SDP
1389	         The behavior will depend on which mode is picked.

1391	      MIKEY-DHHMAC
1392	         Secure Forking: Yes, each forked endpoint negotiates unique
1393	         keys with the offerer for both directions.  Answer is
1394	         associated with media by the SSRC in MIKEY.

1396	         Secure Retargeting: Yes, each target negotiates unique keys
1397	         with the offerer for both directions.  Note that for the keys
1398	         to be meaningful, it would require the PSK to be the same for
1399	         all the potential intermediaries, which would only happen
1400	         within a single domain.

1402	      Security Descriptions with SIPS
1403	         Secure Forking: No.  Each forked endpoint sees the offerer's
1404	         key.  Answer is not associated with media.

1406	         Secure Retargeting: No.  Each target sees the offerer's key.

1408	      Security Descriptions with S/MIME
1409	         Secure Forking: No.  Each forked endpoint sees the offerer's
1410	         key.  Answer is not associated with media.

1412	         Secure Retargeting: No.  Each target sees the offerer's key.
1413	         Suffers from retargeting identity problem.

1415	      SDP-DH
1416	         Secure Forking: Yes. Each forked endpoint calculates a unique
1417	         SRTP key.  Answer is not associated with media.

1419	         Secure Retargeting: Yes. The final target calculates a unique
1420	         SRTP key.

1422	      ZRTP
1423	         Secure Forking: Yes. Each forked endpoint calculates a unique
1424	         SRTP key.  With the "a=zrtp-hash" attribute, the media can be
1425	         associated with an answer.

1427	         Secure Retargeting: Yes. The final target calculates a unique
1428	         SRTP key.

1430	      EKT
1431	         Secure Forking: Inherited from the bootstrapping mechanism (the
1432	         specific MIKEY mode or Security Descriptions).  Answer is
1433	         associated with media by the SPI in the EKT protocol.

1436	         Secure Retargeting: Inherited from the bootstrapping mechanism
1437	         (the specific MIKEY mode or Security Descriptions).

1439	      DTLS-SRTP
1440	         Secure Forking: Yes. Each forked endpoint calculates a unique
1441	         SRTP key.  Answer is associated with media by the certificate
1442	         fingerprint in signaling and certificate in the media path.

1444	         Secure Retargeting: Yes. The final target calculates a unique
1445	         SRTP key.

1447	      MIKEYv2 Inband
1448	         The behavior will depend on which mode is picked.

1450	A.4.2.  Clipping Media Before SDP Answer

1452	   Clipping media before receiving the signaling answer is described
1453	   within Section 4.1.  The following builds upon this description.

1455	   Furthermore, the problem of clipping gets compounded when forking is
1456	   used.  For example, if using a Diffie-Hellman keying technique with
1457	   security preconditions that forks to 20 endpoints, the call initiator
1458	   would get 20 provisional responses containing 20 signed Diffie-
1459	   Hellman half keys.  Calculating 20 DH secrets and validating
1460	   signatures can be a difficult task depending on the device
1461	   capabilities.

1463	   The following list compares the behavior of clipping before SDP
1464	   answer for each keying mechanism.

1466	      MIKEY-NULL
1467	         Not clipped.  The offerer provides the answerer's keys.

1469	      MIKEY-PSK
1470	         Not clipped.  The offerer provides the answerer's keys.

1472	      MIKEY-RSA
1473	         Not clipped.  The offerer provides the answerer's keys.

1475	      MIKEY-RSA-R
1476	         Clipped.  The answer contains the answerer's encryption key.

1478	      MIKEY-DHSIGN
1479	         Clipped.  The answer contains the answerer's Diffie-Hellman
1480	         response.

1482	      MIKEY-DHHMAC
1483	         Clipped.  The answer contains the answerer's Diffie-Hellman
1484	         response.

1486	      MIKEYv2 in SDP
1487	         The behavior will depend on which mode is picked.

1489	      Security Descriptions with SIPS
1490	         Clipped.  The answer contains the answerer's encryption key.

1492	      Security Descriptions with S/MIME
1493	         Clipped.  The answer contains the answerer's encryption key.

1495	      SDP-DH
1496	         Clipped.  The answer contains the answerer's Diffie-Hellman
1497	         response.

1499	      ZRTP
1500	         Not clipped because the session initially uses RTP.  While RTP
1501	         is flowing, both ends negotiate SRTP keys in the media path and
1502	         then switch to using SRTP.

1504	      EKT
1505	         Not clipped, as long as the first RTCP packet (containing the
1506	         answerer's key) is not lost in transit.  The answerer sends its
1507	         encryption key in RTCP, which arrives at the same time (or
1508	         before) the first SRTP packet encrypted with that key.

1510	            Note: RTCP needs to work, in the answerer-to-offerer
1511	            direction, before the offerer can decrypt SRTP media.

1513	      DTLS-SRTP
1514	         No clipping after the DTLS-SRTP handshake has completed.  SRTP
1515	         keys are exchanged in the media path.  Need to wait for SDP
1516	         answer to ensure DTLS-SRTP handshake was done with an
1517	         authorized party.

1519	            If a middlebox interferes with the media path, there can be
1520	            clipping [I-D.ietf-mmusic-media-path-middleboxes].

1522	      MIKEYv2 Inband
1523	         Not clipped.  Keys are exchanged in the media path without
1524	         relying on the signaling path.

1526	A.4.3.  SSRC and ROC

1528	   In SRTP, a cryptographic context is defined as the SSRC, destination
1529	   network address, and destination transport port number.  In RTP, by
1530	   contrast, a flow is defined as the destination network address and
1531	   destination transport port number.  This results in a problem -- how
1532	   to communicate the SSRC so that the SSRC can be used for the
1533	   cryptographic context.

1535	   Three approaches have emerged for this communication.  One, used by
1536	   all MIKEY modes, is to communicate the SSRCs to the peer in the MIKEY
1537	   exchange.  Another, used by Security Descriptions, is to apply "late
1538	   binding" -- that is, any new packet containing a previously-unseen
1539	   SSRC (which arrives at the same destination network address and
1540	   destination transport port number) will create a new cryptographic
1541	   context.  A third approach, common amongst techniques with media-path
1542	   SRTP key establishment, is to require a handshake over that media
1543	   path before SRTP packets are sent.  MIKEY's approach changes RTP's
1544	   SSRC collision detection behavior by requiring RTP to pre-establish
1545	   the SSRC values for each session.
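
	   The "late binding" behavior can be sketched in Python as a table
	   of cryptographic contexts keyed by SSRC, destination address, and
	   destination port.  The class name LateBindingReceiver and the
	   fields stored in each context are hypothetical; a real SRTP stack
	   keeps considerably more state per context.

```python
from typing import Dict, Tuple

# (SSRC, destination network address, destination transport port)
ContextId = Tuple[int, str, int]


class LateBindingReceiver:
    """Toy receiver that creates a context on first sight of an SSRC."""

    def __init__(self) -> None:
        self.contexts: Dict[ContextId, dict] = {}

    def on_packet(self, ssrc: int, dst_addr: str, dst_port: int) -> bool:
        """Return True if this packet caused a new context to be created."""
        context_id = (ssrc, dst_addr, dst_port)
        if context_id in self.contexts:
            return False
        # A previously-unseen SSRC on a known flow starts a new context.
        self.contexts[context_id] = {"roc": 0, "highest_seq": None}
        return True
```

	   With MIKEY, by contrast, the set of valid SSRCs is communicated
	   in the key exchange, so such a table would be populated before
	   any media arrives.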

1547	   Another related issue is that SRTP introduces a rollover counter
1548	   (ROC), which records how many times the SRTP sequence number has
1549	   rolled over.  As the sequence number is used for SRTP's default
1550	   ciphers, it is important that all endpoints know the value of the
1551	   ROC.  The ROC starts at 0 at the beginning of a session.
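
	   As a minimal sketch of why the ROC matters, the following Python
	   code follows the SRTP packet index definition in [RFC3711], where
	   the index combines the ROC with the 16-bit sequence number.  The
	   class name SenderCounter is hypothetical, and only the sender side
	   is modeled; a receiver must additionally estimate the ROC from the
	   packets it sees.

```python
SEQ_MODULUS = 1 << 16  # RTP sequence numbers are 16 bits wide


def packet_index(roc: int, seq: int) -> int:
    """Combine the rollover counter and sequence number into one index."""
    return roc * SEQ_MODULUS + seq


class SenderCounter:
    """Toy sender-side tracking of the sequence number and ROC."""

    def __init__(self, first_seq: int) -> None:
        self.roc = 0  # the ROC starts at 0 at the beginning of a session
        self.seq = first_seq % SEQ_MODULUS

    def next_packet(self) -> int:
        """Return the index for the next packet and advance the counters."""
        index = packet_index(self.roc, self.seq)
        self.seq = (self.seq + 1) % SEQ_MODULUS
        if self.seq == 0:
            # The sequence number wrapped, so the ROC increases by one.
            self.roc += 1
        return index
```

	   An endpoint that joins after a wrap and assumes a ROC of 0 would
	   compute a different index than the sender and fail to decrypt.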

1553	   Some keying mechanisms cause a two-time pad to occur if two endpoints
1554	   of a forked call have an SSRC collision.
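
	   The two-time pad can be illustrated with a toy stream cipher in
	   Python.  The keystream below is derived with SHA-256 purely as a
	   stand-in and is not the SRTP cipher; the function names are
	   hypothetical.  The point is only that two branches sharing a
	   master key, SSRC, and packet index produce the same keystream.

```python
import hashlib


def toy_keystream(master_key: bytes, ssrc: int, index: int,
                  length: int) -> bytes:
    """Derive a toy keystream from (key, SSRC, packet index)."""
    stream = b""
    block = 0
    while len(stream) < length:
        stream += hashlib.sha256(
            master_key
            + ssrc.to_bytes(4, "big")
            + index.to_bytes(6, "big")
            + block.to_bytes(4, "big")
        ).digest()
        block += 1
    return stream[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(master_key: bytes, ssrc: int, index: int,
                payload: bytes) -> bytes:
    """Encrypt by XORing the payload with the toy keystream."""
    keystream = toy_keystream(master_key, ssrc, index, len(payload))
    return xor_bytes(payload, keystream)
```

	   If two forked branches encrypt different payloads with the same
	   key, SSRC, and index, XORing the two ciphertexts cancels the
	   keystream and yields the XOR of the two plaintexts.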

1556	   Note: A proposal has been made to send the ROC value on every Nth
1557	   SRTP packet [RFC4771].  This proposal has not yet been incorporated
1558	   into this document.

1560	   The following list examines handling of SSRC and ROC:

1562	      MIKEY-NULL
1563	         Each endpoint indicates a set of SSRCs and the ROC for SRTP
1564	         packets it transmits.

1566	      MIKEY-PSK
1567	         Each endpoint indicates a set of SSRCs and the ROC for SRTP
1568	         packets it transmits.

1570	      MIKEY-RSA
1571	         Each endpoint indicates a set of SSRCs and the ROC for SRTP
1572	         packets it transmits.

1574	      MIKEY-RSA-R
1575	         Each endpoint indicates a set of SSRCs and the ROC for SRTP
1576	         packets it transmits.

1578	      MIKEY-DHSIGN
1579	         Each endpoint indicates a set of SSRCs and the ROC for SRTP
1580	         packets it transmits.

1582	      MIKEY-DHHMAC
1583	         Each endpoint indicates a set of SSRCs and the ROC for SRTP
1584	         packets it transmits.

1586	      MIKEYv2 in SDP
1587	         Each endpoint indicates a set of SSRCs and the ROC for SRTP
1588	         packets it transmits.

1590	      Security Descriptions with SIPS
1591	         Neither SSRC nor ROC are signaled.  SSRC 'late binding' is
1592	         used.

1594	      Security Descriptions with S/MIME
1595	         Neither SSRC nor ROC are signaled.  SSRC 'late binding' is
1596	         used.

1598	      SDP-DH
1599	         Neither SSRC nor ROC are signaled.  SSRC 'late binding' is
1600	         used.

1602	      ZRTP
1603	         Neither SSRC nor ROC are signaled.  SSRC 'late binding' is
1604	         used.

1606	      EKT
1607	         The SSRC of the SRTCP packet containing an EKT update
1608	         corresponds to the SRTP master key and other parameters within
1609	         that packet.

1611	      DTLS-SRTP
1612	         Neither SSRC nor ROC are signaled.  SSRC 'late binding' is
1613	         used.

1615	      MIKEYv2 Inband
1616	         Each endpoint indicates a set of SSRCs and the ROC for SRTP
1617	         packets it transmits.

1619	A.5.  Evaluation Criteria - Security

1621	   This section evaluates each keying mechanism on the basis of its
1622	   security properties.

1624	A.5.1.  Distribution and Validation of Persistent Public Keys and
1625	        Certificates

1627	   Using persistent public keys for confidentiality and authentication
1628	   can introduce requirements for two types of systems, often
1629	   implemented using certificates: (1) a system to distribute those
1630	   persistent public keys and certificates, and (2) a system for validating
1631	   those persistent public keys.  We refer to the former as a key
1632	   distribution system and the latter as an authentication
1633	   infrastructure.  In many cases, a monolithic public key
1634	   infrastructure (PKI) is used to fulfill both of these roles.
1635	   However, these functions can be provided by many other systems.  For
1636	   instance, key distribution may be accomplished by any public
1637	   repository of keys.  Any system in which the two endpoints have
1638	   access to trust anchors and intermediate CA certificates that can be
1639	   used to validate other endpoints' certificates (including a system of
1640	   self-signed certificates) can be used to support certificate
1641	   validation in the below schemes.

1643	   With real-time communications it is desirable to avoid fetching or
1644	   validating certificates that delay call setup.  Rather, it is
1645	   preferable to fetch or validate certificates in such a way that call
1646	   setup is not delayed.  For example, a certificate can be validated
1647	   while the phone is ringing or can be validated while ring-back tones
1648	   are being played or even while the called party is answering the
1649	   phone and saying "hello".  Even better is to avoid fetching or
1650	   validating persistent public keys at all.

1652	   SRTP key exchange mechanisms that require a particular authentication
1653	   infrastructure to operate (whether for distribution or validation)
1654	   are gated on the deployment of such an infrastructure available to
1655	   both endpoints.  This means that no media security is achievable
1656	   until such an infrastructure exists.  For SIP, something like sip-
1657	   certs [I-D.ietf-sip-certs] might be used to obtain the certificate of
1658	   a peer.

1660	      Note: Even if sip-certs [I-D.ietf-sip-certs] were deployed, the
1661	      retargeting problem (Appendix A.4.1) would still prevent
1662	      successful deployment of keying techniques which require the
1663	      offerer to obtain the actual target's public key.

1665	   The following list compares the requirements introduced by the use of
1666	   public-key cryptography in each keying mechanism, both for public key
1667	   distribution and for certificate validation.

1669	      MIKEY-NULL
1670	         Public-key cryptography is not used.

1672	      MIKEY-PSK
1673	         Public-key cryptography is not used.  Rather, all endpoints
1674	         must have some way to exchange per-endpoint or per-system pre-
1675	         shared keys.

1677	      MIKEY-RSA
1678	         The offerer obtains the intended answerer's public key before
1679	         initiating the call.  This public key is used to encrypt the
1680	         SRTP keys.  There is no defined mechanism for the offerer to
1681	         obtain the answerer's public key, although [I-D.ietf-sip-certs]
1682	         might be viable in the future.

1684	         The offer may also contain a certificate for the offerer, which
1685	         would require an authentication infrastructure in order to be
1686	         validated by the receiver.

1688	      MIKEY-RSA-R
1689	         The offer contains the offerer's certificate, and the answer
1690	         contains the answerer's certificate.  The answerer uses the
1691	         public key in the certificate to encrypt the SRTP keys that
1692	         will be used by the offerer and the answerer.  An
1693	         authentication infrastructure is necessary to validate the
1694	         certificates.

1696	      MIKEY-DHSIGN
1697	         An authentication infrastructure is used to authenticate the
1698	         public key that is included in the MIKEY message.

1700	      MIKEY-DHHMAC
1701	         Public-key cryptography is not used.  Rather, all endpoints
1702	         must have some way to exchange per-endpoint or per-system pre-
1703	         shared keys.

1705	      MIKEYv2 in SDP
1706	         The behavior will depend on which mode is picked.

1708	      Security Descriptions with SIPS
1709	         Public-key cryptography is not used.

1711	      Security Descriptions with S/MIME
1712	         Use of S/MIME requires that the endpoints be able to fetch and
1713	         validate certificates for each other.  The offerer must obtain
1714	         the intended target's certificate and encrypt the SDP offer
1715	         with the public key contained in the target's certificate.  The
1716	         answerer must obtain the offerer's certificate and encrypt the
1717	         SDP answer with the public key contained in the offerer's
1718	         certificate.

1720	      SDP-DH
1721	         Public-key cryptography is not used.

1723	      ZRTP
1724	         Public-key cryptography is used (Diffie-Hellman), but without
1725	         dependence on persistent public keys.  Thus, certificates are
1726	         not fetched or validated.

1728	      EKT
1729	         Public-key cryptography is not used by itself, but might be
1730	         used by the EKT bootstrapping keying mechanism (such as certain
1731	         MIKEY modes).

1733	      DTLS-SRTP
1734	         Remote party's certificate is sent in media path, and a
1735	         fingerprint of the same certificate is sent in the signaling
1736	         path.

1738	      MIKEYv2 Inband
1739	         The behavior will depend on which mode is picked.

1741	A.5.2.  Perfect Forward Secrecy

1743	   In the context of SRTP, Perfect Forward Secrecy is the property that
1744	   SRTP session keys that protected a previous session are not
1745	   compromised if the static keys belonging to the endpoints are
1746	   compromised.  That is, if someone were to record your encrypted
1747	   session content and later acquire either party's private key, that
1748	   encrypted session content would be safe from decryption if your key
1749	   exchange mechanism had perfect forward secrecy.
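
	   The role of ephemeral values in providing this property can be
	   sketched with a toy Diffie-Hellman exchange in Python.  The prime
	   P, the generator G, and the function names are assumptions chosen
	   for illustration; the parameters are far too small for real use,
	   and the authentication of the exchange (signatures, HMAC, or
	   voice comparison) is omitted.

```python
import secrets

P = 2**127 - 1  # toy prime modulus (a Mersenne prime), illustration only
G = 3           # toy generator, illustration only


def ephemeral_keypair():
    """Generate a fresh private exponent and the matching public value."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    return private, pow(G, private, P)


def session_secret(my_private: int, peer_public: int) -> int:
    """Compute the shared secret from our exponent and the peer's value."""
    return pow(peer_public, my_private, P)


def run_session() -> int:
    """Run one exchange; the private exponents are discarded on return."""
    a_private, a_public = ephemeral_keypair()
    b_private, b_public = ephemeral_keypair()
    secret_a = session_secret(a_private, b_public)
    secret_b = session_secret(b_private, a_public)
    assert secret_a == secret_b
    return secret_a
```

	   Because the exponents exist only for the duration of one session,
	   a later compromise of a long-term key used to authenticate the
	   exchange does not reveal the secret derived here.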

1751	   The following list describes how each key exchange mechanism provides
1752	   PFS.

1754	      MIKEY-NULL
1755	         Not applicable; MIKEY-NULL does not have a long-term secret.

1757	      MIKEY-PSK
1758	         No PFS.

1760	      MIKEY-RSA
1761	         No PFS.

1763	      MIKEY-RSA-R
1764	         No PFS.

1766	      MIKEY-DHSIGN
1767	         PFS is provided with the Diffie-Hellman exchange.

1769	      MIKEY-DHHMAC
1770	         PFS is provided with the Diffie-Hellman exchange.

1772	      MIKEYv2 in SDP
1773	         The behavior will depend on which mode is picked.

1775	      Security Descriptions with SIPS
1776	         Not applicable; Security Descriptions does not have a long-term
1777	         secret.

1779	      Security Descriptions with S/MIME
1780	         Not applicable; Security Descriptions does not have a long-term
1781	         secret.

1783	      SDP-DH
1784	         PFS is provided with the Diffie-Hellman exchange.

1786	      ZRTP
1787	         PFS is provided with the Diffie-Hellman exchange.

1789	      EKT
1790	         No PFS.

1792	      DTLS-SRTP
1793	         PFS is provided if the negotiated cipher suite uses ephemeral
1794	         keys (e.g., Diffie-Hellman (DHE_RSA [I-D.ietf-tls-rfc4346-bis])
1795	         or Elliptic Curve Diffie-Hellman [RFC4492]).

1797	      MIKEYv2 Inband
1798	         The behavior will depend on which mode is picked.

1800	A.5.3.  Best Effort Encryption

1802	   With best effort encryption, SRTP is used with endpoints that support
1803	   SRTP; otherwise, RTP is used.

1805	   SIP needs a backwards-compatible best effort encryption in order for
1806	   SRTP to work successfully with SIP retargeting and forking when there
1807	   is a mix of forked or retargeted devices, some of which support SRTP
1808	   and some of which do not.

1810	      Consider the case of Bob, with a phone that only does RTP and a
1811	      voice mail system that supports SRTP and RTP.  If Alice calls Bob
1812	      with an SRTP offer, Bob's RTP-only phone will reject the media
1813	      stream (with an empty "m=" line) because Bob's phone doesn't
1814	      understand SRTP (RTP/SAVP).  Alice's phone will see this rejected
1815	      media stream and may terminate the entire call (BYE) and re-
1816	      initiate the call as RTP-only, or Alice's phone may decide to
1817	      continue with call setup with the SRTP-capable leg (the voice mail
1818	      system).  If Alice's phone decided to re-initiate the call as RTP-
1819	      only, and Bob doesn't answer his phone, Alice will then leave
1820	      voice mail using only RTP, rather than SRTP as expected.

1822	   Currently, several techniques are commonly considered as candidates
1823	   to provide opportunistic encryption:

1825	   multipart/alternative
1826	      [I-D.jennings-sipping-multipart] describes how to form a
1827	      multipart/alternative body part in SIP.  The significant issues
1828	      with this technique are (1) that multipart MIME is incompatible
1829	      with existing SIP proxies, firewalls, Session Border Controllers,
1830	      and endpoints and (2) when forking, the Heterogeneous Error
1831	      Response Forking Problem (HERFP) [RFC3326] causes problems if such
1832	      non-multipart-capable endpoints were involved in the forking.

1834	   session attribute
1835	      With this technique, the endpoints signal their desire to do SRTP
1836	      by signaling RTP (RTP/AVP), and using an attribute ("a=") in the
1837	      SDP.  This technique is entirely backwards compatible with non-
1838	      SRTP-aware endpoints, but doesn't use the RTP/SAVP protocol
1839	      registered by SRTP [RFC3711].

1841	   SDP Capability Negotiation
1842	      SDP Capability Negotiation
1843	      [I-D.ietf-mmusic-sdp-capability-negotiation] provides a backwards-
1844	      compatible mechanism to allow offering both SRTP and RTP in a
1845	      single offer.  This is the preferred technique.

1847	   Probing
1848	      With this technique, the endpoints first establish an RTP session
1849	      using RTP (RTP/AVP).  The endpoints send probe messages, over the
1850	      media path, to determine if the remote endpoint supports their
1851	      keying technique.  A disadvantage of probing is that an active
1852	      attacker can interfere with probes, and until probing completes
1853	      (and SRTP is established) the media is in the clear.

1855	   The preferred technique, SDP Capability Negotiation
1856	   [I-D.ietf-mmusic-sdp-capability-negotiation], can be used with all
1857	   key exchange mechanisms.  What remains unique is ZRTP, which can also
1858	   accomplish its best effort encryption by probing (sending ZRTP
1859	   messages over the media path) or by session attribute (see "a=zrtp-
1860	   hash" in [I-D.zimmermann-avt-zrtp]).  Current implementations of ZRTP
1861	   use probing.

1863	A.5.4.  Upgrading Algorithms

1865	   It is necessary to allow upgrading SRTP encryption and hash
1866	   algorithms, as well as upgrading the cryptographic functions used for
1867	   the key exchange mechanism.  With SIP's offer/answer model, this can
1868	   be computationally expensive because the offer needs to contain all
1869	   combinations of the key exchange mechanisms (all MIKEY modes,
1870	   Security Descriptions) and all SRTP cryptographic suites (AES-128,
1871	   AES-256) and all SRTP cryptographic hash functions (SHA-1, SHA-256)
1872	   that the offerer supports.  In order to do this, the offerer has to
1873	   expend CPU resources to build an offer containing all of this
1874	   information, which becomes computationally prohibitive.

1876	   Thus, it is important to keep the offerer's CPU impact fixed so that
1877	   offering multiple new SRTP encryption and hash functions incurs no
1878	   additional expense.
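
	   The growth in offer size can be made concrete with a short Python
	   sketch that enumerates every combination an offerer would have to
	   include.  The function name enumerate_offer_entries is
	   hypothetical, and the example lists in the usage note are taken
	   from the examples named in the text above rather than from any
	   registry.

```python
from itertools import product


def enumerate_offer_entries(mechanisms, ciphers, hashes):
    """List every (mechanism, cipher, hash) combination for one offer."""
    return list(product(mechanisms, ciphers, hashes))
```

	   For example, three key exchange mechanisms (such as MIKEY-RSA,
	   MIKEY-DHSIGN, and Security Descriptions), two ciphers (AES-128
	   and AES-256), and two hash functions (SHA-1 and SHA-256) give 12
	   entries, and adding one more hash function raises that to 18.  If
	   each entry requires its own public-key operation, the offerer's
	   cost grows in the same proportion.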

1880	   The following list describes the CPU effort involved in using each
1881	   key exchange technique.

1883	      MIKEY-NULL
1884	         No significant computational expense.

1886	      MIKEY-PSK
1887	         No significant computational expense.

1889	      MIKEY-RSA
1890	         For each offered SRTP crypto suite, the offerer has to perform
1891	         an RSA operation to encrypt the TGK.

1893	      MIKEY-RSA-R
1894	         For each offered SRTP crypto suite, the offerer has to perform
1895	         a public key operation to sign the MIKEY message.

1897	      MIKEY-DHSIGN
1898	         For each offered SRTP crypto suite, the offerer has to perform
1899	         a Diffie-Hellman operation, and a public key operation to sign
1900	         the Diffie-Hellman output.

1902	      MIKEY-DHHMAC
1903	         For each offered SRTP crypto suite, the offerer has to perform
1904	         a Diffie-Hellman operation.

1906	      MIKEYv2 in SDP
1907	         The behavior will depend on which mode is picked.

1909	      Security Descriptions with SIPS
1910	         No significant computational expense.

1912	      Security Descriptions with S/MIME
1913	         S/MIME requires the offerer and the answerer to encrypt the SDP
1914	         with the other's public key, and to decrypt the received SDP
1915	         with their own private key.

1917	      SDP-DH
1918	         For each offered SRTP crypto suite, the offerer has to perform
1919	         a Diffie-Hellman operation.

1921	      ZRTP
1922	         The offerer has no additional computational expense at all, as
1923	         the offer either contains no information about ZRTP or
1924	         contains only "a=zrtp-hash".

1926	      EKT
1927	         The offerer's computational expense depends entirely on the EKT
1928	         bootstrapping mechanism selected (one or more MIKEY modes or
1929	         Security Descriptions).

1931	      DTLS-SRTP
1932	         The offerer has no additional computational expense at all, as
1933	         the offer contains only a fingerprint of the certificate that
1934	         will be presented in the DTLS exchange.

1936	      MIKEYv2 Inband
1937	         The behavior will depend on which mode is picked.
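The list above can be read as a simple cost model. The Python sketch below is one possible encoding, not a definitive one. It counts public key and Diffie-Hellman operations the offerer performs while building an offer. The table name OFFERER_COST and the function offerer_operations are hypothetical. "No significant computational expense" is treated as zero operations, and the S/MIME entry assumes a single encryption of the SDP per offer, regardless of how many suites are offered. Techniques whose cost the text describes as mode-dependent are recorded as None.

```python
# (operations per offered SRTP crypto suite, fixed operations per offer).
# None means the text says the cost depends on the mode or bootstrap chosen.
OFFERER_COST = {
    "MIKEY-NULL": (0, 0),
    "MIKEY-PSK": (0, 0),
    "MIKEY-RSA": (1, 0),      # one RSA encryption of the TGK per suite
    "MIKEY-RSA-R": (1, 0),    # one signature per suite
    "MIKEY-DHSIGN": (2, 0),   # one Diffie-Hellman plus one signature per suite
    "MIKEY-DHHMAC": (1, 0),   # one Diffie-Hellman per suite
    "MIKEYv2 in SDP": None,
    "Security Descriptions with SIPS": (0, 0),
    # Assumption: one public key encryption of the SDP per offer.
    "Security Descriptions with S/MIME": (0, 1),
    "SDP-DH": (1, 0),         # one Diffie-Hellman per suite
    "ZRTP": (0, 0),
    "EKT": None,
    "DTLS-SRTP": (0, 0),
    "MIKEYv2 Inband": None,
}


def offerer_operations(technique, offered_suites):
    """Estimate expensive operations needed to build one offer.

    Returns None when the text gives no fixed answer for the technique.
    """
    cost = OFFERER_COST[technique]
    if cost is None:
        return None
    per_suite, fixed = cost
    return per_suite * offered_suites + fixed


# Compare how cost scales when four crypto suites are offered.
for name in ("MIKEY-DHSIGN", "SDP-DH", "DTLS-SRTP"):
    print(name, offerer_operations(name, 4))
```

In this model, only the techniques with a zero per-suite cost keep the offerer's CPU impact fixed as new suites are added.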

1939	Appendix B.  Out-of-Scope

1941	   The compromise of an endpoint that has access to decrypted media
1942	   (e.g., SIP user agent, transcoder, recorder) is out of scope of this
1943	   document.  Such a compromise might be via privilege escalation,
1944	   installation of a virus or trojan horse, or similar attacks.

1946	B.1.  Shared Key Conferencing

1948	   The consensus on the RTPSEC mailing list was to concentrate on
1949	   unicast, point-to-point sessions.  Thus, there are no requirements
1950	   related to shared key conferencing.  This section is retained for
1951	   informational purposes.

1953	   To scale efficiently, large audio and video conference bridges
1954	   encrypt the current speaker once and distribute that stream to the
1955	   conference attendees.  Typically,
1956	   inactive participants receive the same streams -- they hear (or see)
1957	   the active speaker(s), and the active speakers receive distinct
1958	   streams that don't include themselves.  In order to maintain
1959	   confidentiality of such conferences where listeners share a common
1960	   key, all listeners must be rekeyed when a listener joins or leaves a
1961	   conference.

1963	   An important use case for mixers/translators is a conference bridge:

1965	                                         +----+
1966	                             A --- 1 --->|    |
1967	                               <-- 2 ----| M  |
1968	                                         | I  |
1969	                             B --- 3 --->| X  |
1970	                               <-- 4 ----| E  |
1971	                                         | R  |
1972	                             C --- 5 --->|    |
1973	                               <-- 6 ----|    |
1974	                                         +----+

1976	                       Figure 3: Centralized Keying

1978	   In the figure above, 1, 3, and 5 are RTP media contributions from
1979	   Alice, Bob, and Carol, and 2, 4, and 6 are the RTP flows to those
1980	   devices carrying the 'mixed' media.

1982	   Several scenarios are possible:

1984	   a.  Multiple inbound sessions: 1, 3, and 5 are distinct RTP sessions,

1986	   b.  Multiple outbound sessions: 2, 4, and 6 are distinct RTP
1987	       sessions,

1989	   c.  Single inbound session: 1, 3, and 5 are just different sources
1990	       within the same RTP session,

1992	   d.  Single outbound session: 2, 4, and 6 are different flows of the
1993	       same (multi-unicast) RTP session.

1995	   If there are multiple inbound sessions and multiple outbound sessions
1996	   (scenarios a and b), then every keying mechanism behaves as if the
1997	   mixer were an end point and can set up a point-to-point secure
1998	   session between the participant and the mixer.  This is the simplest
1999	   situation, but is computationally wasteful, since SRTP processing has
2000	   to be done independently for each participant.  The use of multiple
2001	   inbound sessions (scenario a) doesn't waste computational resources,
2002	   though it does consume additional cryptographic context on the mixer
2003	   for each participant and has the advantage of data origin
2004	   authentication.

2006	   To support a single outbound session (scenario d), the mixer has to
2007	   dictate its encryption key to the participants.  Some keying
2008	   mechanisms allow the transmitter to determine its own key, and others
2009	   allow the offerer to determine the key for the offerer and answerer.
2010	   Depending on how the call is established, the offerer might be a
2011	   participant (such as a participant dialing into a conference bridge)
2012	   or the offerer might be the mixer (such as a conference bridge
2013	   calling a participant).  The use of offerless INVITEs may help some
2014	   keying mechanisms reverse the role of offerer/answerer.  A
2015	   difficulty, however, is knowing a priori if the role should be
2016	   reversed for a particular call.  The significant advantage of a
2017	   single outbound session is that the number of SRTP encryption
2018	   operations remains constant even as the number of participants
2019	   increases.  However, a disadvantage is that data origin
2020	   authentication is lost, allowing any participant to spoof the sender
2021	   (because all participants know the sender's SRTP key).
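The scaling trade-off between scenarios b and d can be sketched in a few lines of Python. This is a deliberately simplified model: the function name srtp_encryptions_per_mixed_packet is hypothetical, and the model ignores the distinct streams that active speakers receive, counting only listeners who all receive the same mixed media.

```python
def srtp_encryptions_per_mixed_packet(listeners, single_outbound_session):
    """SRTP encryption operations the mixer performs for one mixed packet.

    With multiple outbound sessions (scenario b), each listener has its own
    key, so the packet is encrypted once per listener.  With a single
    outbound session (scenario d), the shared key allows one encryption.
    """
    if listeners < 0:
        raise ValueError("listeners must be non-negative")
    if listeners == 0:
        return 0
    return 1 if single_outbound_session else listeners


# Compare both scenarios as the conference grows.
for n in (3, 30, 300):
    per_listener = srtp_encryptions_per_mixed_packet(n, False)
    shared = srtp_encryptions_per_mixed_packet(n, True)
    print(n, per_listener, shared)
```

The constant cost in the shared-key case is exactly what gives up data origin authentication, since every listener holds the key used to protect the stream.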

2023	Appendix C.  Requirement renumbering in -02

2025	   [[RFC Editor: Please delete this section prior to publication.]]

2027	   Previous versions of this document used requirement numbers, which
2028	   were changed to mnemonics as follows:

2030	   R1    R-FORK-RETARGET

2032	   R2    R-BEST-SECURE

2034	   R3    R-DISTINCT

2036	   R4    R-REUSE; changed from 'MAY' to 'protocol MUST support, and
2037	         SHOULD implement'

2039	   R5    R-AVOID-CLIPPING

2041	   R6    R-PASS-MEDIA

2043	   R7    R-PASS-SIG

2045	   R8    R-PFS

2047	   R9    R-COMPUTE

2049	   R10   R-RTP-CHECK

2051	   R11   (folded into R4; was reuse previous session)

2053	   R12   R-CERTS

2055	   R13   R-FIPS

2057	   R14   R-ASSOC

2059	   R15   R-ALLOW-RTP

2061	   R16   R-DOS

2063	   R17   R-SIG-MEDIA

2065	   R18   R-EXISTING

2067	   R19   R-AGILITY

2069	   R20   R-DOWNGRADE

2071	   R21   R-NEGOTIATE

2073	   R23   R-OTHER-SIGNALING

2074	   R23   R-RECORDING (R23 was duplicated in previous versions of the
2075	         document)

2077	   R24   (deleted; was lawful intercept)

2079	   R25   R-TRANSCODER

2081	   R26   R-PSTN

2083	   R27   R-ID-BINDING

2085	   R28   R-ACT-ACT

2087	Authors' Addresses

2089	   Dan Wing (editor)
2090	   Cisco Systems, Inc.
2091	   170 West Tasman Drive
2092	   San Jose, CA  95134
2093	   USA

2095	   Email: dwing@cisco.com

2097	   Steffen Fries
2098	   Siemens AG
2099	   Otto-Hahn-Ring 6
2100	   Munich, Bavaria  81739
2101	   Germany

2103	   Email: steffen.fries@siemens.com

2105	   Hannes Tschofenig
2106	   Nokia Siemens Networks
2107	   Otto-Hahn-Ring 6
2108	   Munich, Bavaria  81739
2109	   Germany

2111	   Email: Hannes.Tschofenig@nsn.com
2112	   URI:   http://www.tschofenig.priv.at

2113	   Francois Audet
2114	   Nortel
2115	   4655 Great America Parkway
2116	   Santa Clara, CA  95054
2117	   USA

2119	   Email: audet@nortel.com