idnits 2.17.1 

draft-jones-perc-private-media-reqts-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).
     
     Found 'MUST not' in this paragraph:
     
     The final element is the switching MDD, which is responsible for
     forwarding encrypted media packets and conference control information to
     endpoints in the conference.  It is also responsible for conveying
     secured signaling between the endpoints and the key management function,
     acquiring per-hop authentication keys from the KMF, and performing
     per-hop authentication operations for media packets.  This function might
     also aggregate conference control information and initiate various
     conference control requests.  Forwarding of media packets requires that
     the switching MDD have access to RTP headers or header extensions and
     potentially modify those message elements, but the actual media content
     MUST not be decipherable by the switching MDD.

  -- The document date (July 6, 2015) is 3210 days in the past.  Is this
     intentional?


  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Missing Reference: 'TBD' is mentioned on line 746, but not defined

  == Unused Reference: 'RFC3261' is defined on line 782, but no explicit
     reference was found in the text

  -- Obsolete informational reference (is this intentional?): RFC 4474
     (Obsoleted by RFC 8224)


     Summary: 0 errors (**), 0 flaws (~~), 4 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Network Working Group                                     P. Jones (Ed.)
3	Internet Draft                                                 N. Ismail
4	Intended status: Informational                                 D. Benham
5	Expires: January 6, 2016                                      N. Buckles
6	                                                           Cisco Systems
7	                                                             J. Mattsson
8	                                                                Ericsson
9	                                                               R. Barnes
10	                                                                 Mozilla
11	                                                            July 6, 2015

13	      Private Media Requirements in Privacy Enhanced RTP Conferencing
14	                  draft-jones-perc-private-media-reqts-00

16	Abstract

18	   This document specifies the requirements for ensuring the privacy and
19	   integrity of real-time transport protocol (RTP) media flows between
20	   two or more endpoints communicating through one or more centrally
21	   located media distribution devices (MDDs).

23	Status of this Memo

25	   This Internet-Draft is submitted to IETF in full conformance with the
26	   provisions of BCP 78 and BCP 79.

28	   Internet-Drafts are working documents of the Internet Engineering
29	   Task Force (IETF).  Note that other groups may also distribute
30	   working documents as Internet-Drafts.  The list of current Internet-
31	   Drafts is at http://datatracker.ietf.org/drafts/current/.

33	   Internet-Drafts are draft documents valid for a maximum of six months
34	   and may be updated, replaced, or obsoleted by other documents at any
35	   time.  It is inappropriate to use Internet-Drafts as reference
36	   material or to cite them other than as "work in progress."

38	   This Internet-Draft will expire on January 6, 2016.

40	Copyright Notice

42	   Copyright (c) 2015 IETF Trust and the persons identified as the
43	   document authors. All rights reserved.

45	   This document is subject to BCP 78 and the IETF Trust's Legal
46	   Provisions Relating to IETF Documents
47	   (http://trustee.ietf.org/license-info) in effect on the date of
48	   publication of this document.  Please review these documents
49	   carefully, as they describe your rights and restrictions with respect
50	   to this document.  Code Components extracted from this document must
51	   include Simplified BSD License text as described in Section 4.e of
52	   the Trust Legal Provisions and are provided without warranty as
53	   described in the Simplified BSD License.

55	Table of Contents

57	   1. Introduction...................................................2
58	   2. Requirements Language..........................................3
59	   3. Terminology....................................................3
60	   4. Background.....................................................4
61	   5. Motivation for Private Media using switching MDDs..............5
62	      5.1. Switching Media in Cloud Services.........................5
63	      5.2. Private Media Security through Switching..................7
64	   6. Private Media Trust Model......................................8
65	      6.1. Trusted Elements..........................................9
66	      6.2. Untrusted Elements.......................................10
67	   7. Goals and Non-Goals...........................................11
68	      7.1. Goals....................................................11
69	         7.1.1. Ensure End-To-End Confidentiality...................11
70	         7.1.2. Ensure End-To-End Source Authentication of Media....11
71	         7.1.3. Provide a More Efficient Service than "Full-Mesh"...12
72	         7.1.4. Support Cloud-Based Conferencing....................12
73	         7.1.5. Limiting an Endpoint's Access to Content............12
74	         7.1.6. Compatibility with the WebRTC Security Architecture.12
75	      7.2. Non-Goals................................................13
76	         7.2.1. Securing the Endpoints..............................13
77	         7.2.2. Concealing that Communication Occurs................13
78	         7.2.3. Individual Media Source Authentication..............13
79	         7.2.4. Multicast-based Conferencing........................14
80	   8. Requirements..................................................14
81	   9. IANA Considerations...........................................15
82	   10. Security Considerations......................................15
83	   11. References...................................................15
84	      11.1. Normative References....................................15
85	      11.2. Informative References..................................16
86	   12. Acknowledgments..............................................16
87	   13. Contributors.................................................17
88	   Authors' Addresses...............................................18

90	1. Introduction

92	   Users of multimedia communication products and services have privacy
93	   expectations that are largely satisfied with the use of SRTP
94	   [RFC3711] and related technologies when communicating point-to-point
95	   over the Internet.  When two or more endpoints communicate through a
96	   traditional media server, it is necessary for those endpoints to
97	   share the SRTP master key and salt information with the traditional
98	   media server so that it can authenticate and decrypt received RTP and
99	   RTCP packets.  The key material is needed so that a traditional media
100	   server can perform various operations on the media, such as mixing,
101	   transcoding, and transrating.  The traditional media server also
102	   needs the master key and salt in order to transmit media packets to
103	   other endpoints in the conference.  The need for a traditional media
104	   server to have the master key represents a security risk.

106	   Within a corporate or other isolated environment where all
107	   conferencing resources, including both call control and media
108	   processing functions, are tightly controlled, this security risk can
109	   be effectively managed.  However, managing this risk is becoming
110	   increasingly difficult as conferencing resources are deployed in
111	   networks that are not so strictly managed or controlled, including
112	   resources on virtualized servers deployed in third-party cloud
113	   environments.

115	   There are also existing public voice and video conferencing service
116	   providers in which users must place full trust by sharing media
117	   encryption keys in order to use those services.  This exposes
118	   corporations, for example, to a higher risk of being subjected to
119	   corporate espionage.  While it is not the intent of this draft to
120	   suggest that any existing service provider would permit or condone
121	   any illicit use of its service, the fact is that security threats can
122	   come from either internal or external sources and remain undiscovered
123	   for long periods of time.

125	   It is possible to ensure real-time transport protocol (RTP) media
126	   privacy in deployments using one or more centrally located media
127	   distribution devices (MDDs) with limited changes in the security
128	   mechanisms used today.  This document discusses this possibility in
129	   more detail and presents a set of requirements that are neutral with
130	   respect to session signaling protocols.

132	   This document is focused on ensuring the privacy of RTP media in
133	   centralized MDD models only.  Other types of media are out of scope.
134	   Other, non-centralized media distribution models are also out of
135	   scope.

137	2. Requirements Language

139	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
140	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
141	   document are to be interpreted as described in RFC 2119 [RFC2119]
142	   when they appear in ALL CAPS.  These words may also appear in this
143	   document in lower case as plain English words, absent their normative
144	   meanings.

146	3. Terminology

148	   Adversary - An unauthorized entity that may attempt to compromise the
149	   performance of a media distribution device through various means,
150	   including, but not limited to, the transmission of bogus media
151	   packets, or attempt to gain access to the plaintext of the media.

153	   Media content - The portion of the RTP (i.e., the encrypted RTP
154	   payload) or other packet containing the actual audio, video, or other
155	   multimedia information that is considered confidential and is subject
156	   to end-to-end encryption.  This does not include, for example, RTP
157	   headers, RTP header extensions, or RTCP packets.

159	   Switching media distribution device - A media distribution device
160	   that does not decrypt RTP media flows or perform processing on the
161	   media payload, but instead simply forwards the received media from a
162	   sender to the other endpoints in a multimedia conference.  A
163	   switching media distribution device may modify some portion of the
164	   RTP header and may often consume and create RTCP messages for
165	   efficient media handling.

167	4. Background

169	   Traditional media servers used for multimedia conferencing would mix,
170	   transcode, transrate, and/or recompose media flows from one or more
171	   conference participants' endpoints, sending out a different audio and
172	   video flow to each endpoint.  For audio, this might entail mixing
173	   some number of input flows that appear to contain audio intended to
174	   be heard by the other participants, with each endpoint receiving a
175	   flow that does not contain that participant's own audio.  For video,
176	   the traditional media server may elect to send only video showing the
177	   current active speaker, a tiled composition of all participants or
178	   the most recent active speakers, a video flow with the active speaker
179	   presented prominently with other participants presented as thumbnail
180	   images, or some other composite arrangement.  It is also common for
181	   audio or video to be transcoded.  A typical traditional media server
182	   is depicted in Figure 1.

184	                           +-------------------+
185	            +---+ --{A}--> |                   | <--{C}-- +---+
186	            | A |          | Media Composition |          | C |
187	            +---+ <-{BCD}- |                   | -{ABD}-> +---+
188	                           |    Transcoders    |
189	            +---+ --{B}--> |    Transraters    | <--{D}-- +---+
190	            | B |          |                   |          | D |
191	            +---+ <-{ACD}- |   Decrypt/Encrypt | -{ABC}-> +---+
192	                           +-------------------+

194	                     Figure 1 - Traditional Media Server
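
   The audio behavior described above, where each endpoint receives a
   mix that omits its own audio, is often called "mix-minus".  The
   following Python sketch illustrates the idea under simplifying
   assumptions: every flow is mixed (a real server would select only
   flows that appear to contain speech), samples are plain integers,
   and no clipping or gain control is applied.  The function name
   mix_minus and the sample values are illustrative only.

```python
def mix_minus(frames):
    """Return, per endpoint, the sum of every OTHER endpoint's samples.

    frames maps an endpoint name to a list of decoded audio samples.
    All lists are assumed to have the same length.
    """
    length = len(next(iter(frames.values())))
    # Sum all flows once, then subtract each endpoint's own contribution.
    total = [sum(f[i] for f in frames.values()) for i in range(length)]
    return {
        name: [total[i] - own[i] for i in range(length)]
        for name, own in frames.items()
    }


frames = {"A": [1, 2], "B": [10, 20], "C": [100, 200], "D": [1000, 2000]}
mixes = mix_minus(frames)
print(mixes["A"])  # A hears B + C + D only
```

   The function operates on decoded samples, which is why a server
   performing this operation needs the keys to decrypt each flow.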

196	   Traditional media servers require a significant amount of processing
197	   power, which in turn translates into a high cost for conferencing
198	   hardware manufacturers.  Significantly, too, it is very difficult to
199	   deploy these servers in a cloud environment due to the high
200	   processing demands, as the specialized hardware found in the
201	   traditional media server does not exist in a cloud environment.

203	   To enable the traditional media server to perform its job, the server
204	   establishes one or more SRTP sessions with each of the conference
205	   endpoints wherein it is given access to the keys required to decrypt
206	   and encrypt media flows from and to each endpoint.  This means that
207	   the traditional media server is necessarily a fully trusted entity in
208	   the communication path.  Any time these servers are deployed in a
209	   network that is not secured, it increases the risk that an adversary
210	   might gain access to cryptographic key material, allowing the
211	   adversary to be able to see and listen to ongoing conferences.  In
212	   some instances, depending on how the hardware is designed and how
213	   keys and certificates are managed, it might be possible for an
214	   adversary to see and listen to previously recorded conferences or
215	   future conferences.

217	   The Secure Real-time Transport Protocol (SRTP) [RFC3711] is a profile
218	   of RTP, which can provide confidentiality, message authentication,
219	   and replay protection to the RTP traffic and to the RTP Control
220	   Protocol (RTCP).  Encryption of header extensions in SRTP [RFC6904]
221	   extends the mechanisms of [RFC3711] to selectively encrypt RTP
222	   header extensions.  [RFC3711] and [RFC6904] solve end-to-end use
223	   cases between two endpoints, and do not consider use cases where a
224	   sender delivers media to a receiver via a cloud-based conferencing
225	   service.

227	5. Motivation for Private Media using switching MDDs

229	5.1. Switching Media in Cloud Services

231	   There is a trend in the industry for enterprises to use cloud
232	   services to host multi-party conferences and meet-me services, either
233	   exclusively or to meet peak loads on-demand.  At the same time, there
234	   is a shift toward using lightweight, cost-effective switching MDDs in
235	   cloud services that do not necessarily need to mix audio or
236	   composite/transcode video.  Also fueling the use of such lightweight
237	   MDDs is the desire to fully exploit virtualized computing resources
238	   and dynamic scalability potential available in cloud computing
239	   environments.

241	   The increased use of cloud services has exposed a problem.  There are
242	   two different trust domains from a media perspective: endpoints and
243	   other devices in a trusted domain, and MDDs controlled by the cloud
244	   service in an untrusted domain.  Other examples of conference devices
245	   spread across trusted and untrusted domains are likely, but the cloud
246	   service trend is triggering the urgency to address the need to allow
247	   for lightweight media conferencing while enabling media privacy at
248	   the same time.

250	   With a switching MDD, each endpoint transmits media as it would with
251	   a traditional media server.  However, the switching MDD merely
252	   forwards all or a subset of the media to the other endpoints in the
253	   conference (where at least one other endpoint may be associated with
254	   a cascaded media distribution device), leaving composition to the
255	   receiving endpoint.  It is also worth noting that, for a switching
256	   MDD model to work successfully, each endpoint in the conference must
257	   support the media formats transmitted by all other endpoints in the
258	   conference.  More modern endpoints support multiple codecs and
259	   formats, making this commercially practical.

261	   Figure 2 depicts an example of a switching MDD wherein each endpoint
262	   is receiving the media flows transmitted by each of the other
263	   endpoints in the conference.

265	                           +--------------------+
266	            +---+ --{A}--> |                    | <-{C}--- +---+
267	            | A | <-{B}--- |   Switching MDD    | --{A}--> | C |
268	            |   | <-{C}--- |                    | --{B}--> |   |
269	            +---+ <-{D}--- |                    | --{D}--> +---+
270	                           |       Packet       |
271	            +---+ --{B}--> |   Authentication   | <-{D}--- +---+
272	            | B | <-{A}--- |                    | --{A}--> | D |
273	            |   | <-{C}--- |                    | --{B}--> |   |
274	            +---+ <-{D}--- |   Media Privacy    | --{C}--> +---+
275	                           +--------------------+

277	               Figure 2 - Switching Media Distribution Device

279	   Note - The use of multiple arrows directed toward each endpoint is
280	   not intended to suggest the use of separate RTP sessions.

282	   By using methods such as those described in [RFC6464], it is possible
283	   for the switching MDD to transmit the appropriate audio and video
284	   flows to endpoints without having knowledge of the content of the
285	   encrypted media.  The following "Active Speaker Switching" examples
286	   help illustrate this point.

288	   In Figure 3, endpoints A, B and D receive the video streams from
289	   endpoint C, the currently active speaker, which is receiving video
290	   from endpoint A, the previous active speaker.  Later when endpoint B
291	   becomes the active speaker (Figure 4), endpoints A, C and D will
292	   start to receive video from B, while endpoint B continues to receive
293	   video from endpoint C.  Finally in Figure 5, endpoint A becomes the
294	   active speaker.

296	                           +--------------------+
297	            +---+ --{A}--> |                    | <--{C}-- +---+
298	            | A |          |   Switching MDD    |          | C |*
299	            +---+ <-{C}--- |                    | ---{A}-> +---+
300	                           |                    |
301	            +---+ --{B}--> |                    | <--{D}-- +---+
302	            | B |          |                    |          | D |
303	            +---+ <-{C}--- |                    | ---{C}-> +---+
304	                           +--------------------+

306	                Figure 3 - Endpoint "C" is the Active Speaker

308	                           +--------------------+
309	            +---+ --{A}--> |                    | <--{C}-- +---+
310	            | A |          |   Switching MDD    |          | C |
311	            +---+ <-{B}--- |                    | ---{B}-> +---+
312	                           |                    |
313	            +---+ --{B}--> |                    | <--{D}-- +---+
314	           *| B |          |                    |          | D |
315	            +---+ <-{C}--- |                    | ---{B}-> +---+
316	                           +--------------------+

318	                Figure 4 - Endpoint "B" is the Active Speaker

320	                           +--------------------+
321	            +---+ --{A}--> |                    | <--{C}-- +---+
322	           *| A |          |   Switching MDD    |          | C |
323	            +---+ <-{B}--- |                    | ---{A}-> +---+
324	                           |                    |
325	            +---+ --{B}--> |                    | <--{D}-- +---+
326	            | B |          |                    |          | D |
327	            +---+ <-{A}--- |                    | ---{A}-> +---+
328	                           +--------------------+

330	                Figure 5 - Endpoint "A" is the Active Speaker
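
   The forwarding rule shown in Figures 3 through 5 can be sketched in
   Python as follows.  This is an illustrative sketch, not a specified
   algorithm: the function names are hypothetical, and choosing the
   active speaker from a single set of audio levels is an assumption
   (a real MDD would likely smooth levels over time).  RFC 6464 carries
   the audio level as a value from 0 to 127 in -dBov, so a smaller
   value means louder audio.

```python
def loudest(levels):
    """Pick the endpoint with the loudest RFC 6464 audio level.

    levels maps endpoint name to a level in the range 0..127 (-dBov),
    where 0 is the loudest and 127 is silence.
    """
    return min(levels, key=levels.get)


def select_video(endpoints, active, previous):
    """Map each receiving endpoint to the endpoint whose video it gets.

    Everyone receives the active speaker, except the active speaker,
    who receives the previous active speaker.
    """
    return {ep: (previous if ep == active else active) for ep in endpoints}


endpoints = ["A", "B", "C", "D"]
fig3 = select_video(endpoints, active="C", previous="A")
fig4 = select_video(endpoints, active="B", previous="C")
fig5 = select_video(endpoints, active="A", previous="B")
print(fig3)
```

   Neither function reads the media payload; the decision uses only
   header-level information, which is what allows the payload to stay
   encrypted.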

332	   Switched media can also enable conferences to scale to include many
333	   more endpoints simultaneously than would be possible with a
334	   traditional media server.  Like traditional media servers, switching
335	   MDDs can also be cascaded or interconnected in a meshed topology to
336	   increase the size of the conference without putting undue burden on
337	   any particular server.

339	5.2. Private Media Security through Switching

341	   A traditional media server, or MCU, establishes an SRTP session with
342	   each endpoint separately, and needs to decrypt packets containing
343	   media for presentation to other endpoints.  By using a switching MDD,
344	   it is possible to keep the media encryption keys private to the
345	   endpoints such that the MDD does not have access to the keys used for
346	   media encryption.  The switching MDD just forwards media received to
347	   each of the other endpoints in the conference.

349	   This provides for a significantly improved security model, as one
350	   can, for example, utilize conferencing resources in the cloud that do
351	   not have to be trusted.  That said, there may be situations where the
352	   switching MDD needs to modify the RTP packet received from an
353	   endpoint, such as by adding or removing an RTP header extension,
354	   modifying the payload type value, etc.  It would be the
355	   responsibility of the switching MDD to ensure that media of the
356	   expected type and containing the correct information is received by a
357	   recipient.

359	   Thus, there is a need to utilize an end-to-end encryption and
360	   authentication key (or pair of keys) and a hop-by-hop encryption and
361	   authentication key (or pair of keys).  The end-to-end encryption and
362	   authentication key(s) is to ensure that media remains private to the
363	   trusted endpoints.  The hop-by-hop authentication key allows the
364	   switching MDD to authenticate RTP and RTCP packets and to optionally
365	   modify certain elements of those packets.  The hop-by-hop encryption
366	   key is to optionally encrypt RTP header extensions and optionally
367	   encrypt RTCP packets.  The current SRTP and related specifications do
368	   not define use of a dual-key (hop-by-hop and end-to-end) approach.
369	   However, such an approach is possible and would result in ensuring
370	   the privacy of media while also enabling the more scalable switched
371	   conferencing model.
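
   To make the dual-key idea concrete, the following Python sketch
   shows one possible layering.  It is an illustration under stated
   assumptions, not the mechanism this document or SRTP defines: the
   keystream function is a toy stand-in for a real cipher and is not
   secure, a single end-to-end key is used for both encryption and
   authentication (a real design would derive separate keys), and all
   names, the 10-byte tag length, and the packet layout are
   hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 10  # truncated tag length in bytes; arbitrary for this sketch


def keystream(key, nonce, length):
    """Toy keystream (SHA-256 in counter mode). NOT a secure cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def tag(key, data):
    """Truncated HMAC-SHA256 authentication tag."""
    return hmac.new(key, data, hashlib.sha256).digest()[:TAG_LEN]


def endpoint_send(e2e_key, hbh_key, header, seq, media):
    """Encrypt and tag end-to-end, then add the hop-by-hop tag."""
    nonce = seq.to_bytes(4, "big")
    stream = keystream(e2e_key, nonce, len(media))
    cipher = bytes(a ^ b for a, b in zip(media, stream))
    # The end-to-end tag covers only the payload, not the header,
    # so an MDD can rewrite the header without breaking it.
    inner = nonce + cipher + tag(e2e_key, nonce + cipher)
    return header, inner, tag(hbh_key, header + inner)


def mdd_forward(hbh_in, hbh_out, packet, new_header):
    """Verify the incoming hop, rewrite the header, re-tag the next hop."""
    header, inner, hop_tag = packet
    if not hmac.compare_digest(hop_tag, tag(hbh_in, header + inner)):
        raise ValueError("hop-by-hop authentication failed")
    return new_header, inner, tag(hbh_out, new_header + inner)


def endpoint_receive(e2e_key, hbh_key, packet):
    """Check the hop-by-hop tag, then the end-to-end tag, then decrypt."""
    header, inner, hop_tag = packet
    if not hmac.compare_digest(hop_tag, tag(hbh_key, header + inner)):
        raise ValueError("hop-by-hop authentication failed")
    body, e2e_tag = inner[:-TAG_LEN], inner[-TAG_LEN:]
    if not hmac.compare_digest(e2e_tag, tag(e2e_key, body)):
        raise ValueError("end-to-end authentication failed")
    nonce, cipher = body[:4], body[4:]
    stream = keystream(e2e_key, nonce, len(cipher))
    return bytes(a ^ b for a, b in zip(cipher, stream))


e2e = b"known-to-endpoints-only"
hop_a, hop_c = b"hop-key-A-to-MDD", b"hop-key-MDD-to-C"
sent = endpoint_send(e2e, hop_a, b"PT=96", 1, b"hello")
forwarded = mdd_forward(hop_a, hop_c, sent, b"PT=100")
print(endpoint_receive(e2e, hop_c, forwarded))
```

   In this sketch mdd_forward never receives the end-to-end key, yet
   it can authenticate the packet and change the header before
   forwarding it.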

373	   This dual-key model does necessitate a change in the way that keys
374	   are managed.  However, the topic of key management is outside the
375	   scope of this requirements document.  High-level assumptions, such as
376	   whether the end-to-end context uses a group key as the SRTP master
377	   key or uses individual SRTP master keys (that may be derived or
378	   negotiated from another group key), are likely to influence the
379	   solution derived from this document.

381	6. Private Media Trust Model

383	   The architectural model suggested in this document enables switching
384	   MDDs to be hosted in domains in which the network elements may have
385	   low trust, or where the trustworthiness is uncertain.  This does not
386	   mean that the service provider is completely untrusted; it simply
387	   means that the provider need not be trusted with media decryption.
388	   This has the benefit of protecting the endpoint's media in the case
389	   of external attacks against the MDD.

391	   In this model, certain elements are considered trusted and others are
392	   considered untrusted.  Trust in the context of this document means
393	   that the element can be in possession of the media encryption key(s)
394	   for a past, current, or potentially future conference (or portion
395	   thereof) used to protect media content.

397	   In the general case, only the endpoint and an associated key
398	   management function, which may be integrated with the endpoint or in
399	   a separate stand-alone entity, need to be trusted.  However, it is
400	   recognized that in certain deployments, some elements that are
401	   classified as untrusted in this document might be placed into the
402	   trusted domain and thus be considered trusted.  One example might be
403	   a gateway, traditional media server or other MDD in a trusted
404	   environment connecting endpoints to the same private media
405	   conference.  This document does not preclude such deployment
406	   combinations, but does not rely on them in order to keep the examples
407	   and model definitions focused on the simple, most general case.

409	   Each of the elements discussed below has a direct or indirect
410	   relationship with each other.  The following diagram depicts the
411	   trust relationships described in the following sub-sections and the
412	   media or signaling interfaces that exist between them, showing the
413	   trusted elements on the left and untrusted elements on the right.
414	   Note that this is a functional diagram and elements may be co-located
415	   or further divided into multiple separate physical entities.
416	   Further, it is not necessary that every interface exist between all
417	   elements, such as both an interface from the endpoint and call
418	   processing function to a key management function, though both are
419	   possible options.

421	                                     |
422	                                     |
423	                 +----------+        |       +-----------------+
424	                 | Endpoint |        |       | Call Processing |
425	                 +----------+        |       +-----------------+
426	                                     |
427	                                     |
428	              +----------------+     |       +-----------------+
429	              | Key Management |     |       | Switching Media |
430	              |    Function    |     |       |     Server      |
431	              +----------------+     |       +-----------------+
432	                                     |
433	                   Trusted           |            Untrusted
434	                   Elements          |            Elements
435	                                     |
436	                                     |

438	          Figure 6 - Relationship of Trusted and Untrusted Elements

440	6.1. Trusted Elements

442	   The endpoint is considered a trusted element, as it will be sourcing
443	   media flows transmitted to other endpoints and will be receiving
444	   media for rendering.  While it is possible for an endpoint to be
445	   compromised and perform in unexpected ways, such as transmitting a
446	   decrypted copy of media content to an adversary, such security issues
447	   and defenses are outside the scope of this document.

449	   The other trusted element is a key management function (KMF), which
450	   may be integrated with the endpoints or exist standalone.  This
451	   function is responsible for providing cryptographic keys to the
452	   endpoints for encrypting and authenticating media content.  The KMF
453	   is also responsible for providing cryptographic keys to the
454	   conferencing resources, such as the MDD, to enable authentication of
455	   media packets received by an endpoint.  Interaction between the KMF
456	   and untrusted call processing functions may be necessary to ensure
457	   endpoints are delivered the appropriate keys.  The KMF needs to be
458	   tightly controlled and managed to prevent exploitation by an
459	   adversary, as any kind of security compromise of the KMF puts the
460	   security of the conference at risk.

462	6.2. Untrusted Elements

464	   The call processing function is responsible for such things as
465	   authenticating the user or endpoint for the purpose of joining a
466	   conference, signing messages, and processing call signaling messages.
467	   This element is responsible for ensuring the integrity, and
468	   optionally the confidentiality, of call signaling messages between
469	   itself, the endpoint, and other network elements.  However, it is
470	   considered an untrusted element for the purposes of this document, as
471	   it cannot be trusted to have access to or be able to gain access to
472	   cryptographic key material that provides privacy and integrity of
473	   media packets.

475	   There might be several independent call processing functions within
476	   an enterprise, service provider network, or the Internet that are
477	   classified as untrusted.  Any signaling information that passes
478	   through these untrusted entities is subject to inspection by that
479	   element and might be altered by an adversary.

481	   Likewise, there may be certain deployment models where the call
482	   processing function is considered trusted.  In such cases, trusted
483	   call processing functions MUST take responsibility for ensuring the
484	   integrity of received messages before delivering those to the
485	   endpoint.  How signaling message integrity is ensured is outside the
486	   scope of this document, but might use such methods as defined in
487	   [RFC4474].

489	   The final element is the switching MDD, which is responsible for
490	   forwarding encrypted media packets and conference control information
491	   to endpoints in the conference.  It is also responsible for conveying
492	   secured signaling between the endpoints and the key management
493	   function, acquiring per-hop authentication keys from the KMF, and
494	   performing per-hop authentication operations for media packets.  This
495	   function might also aggregate conference control information and
496	   initiate various conference control requests.  Forwarding of media
497	   packets requires that the switching MDD have access to RTP headers or
498	   header extensions and potentially modify those message elements, but
499	   the actual media content MUST NOT be decipherable by the switching
500	   MDD.

502	   Further, the switching MDD does not have the ability to determine
503	   whether an endpoint is authorized to have access to media encryption
504	   keys.  Merely joining a conference MUST NOT be interpreted as having
505	   authority.  Media encryption keys are conveyed to the endpoint by the
506	   KMF in such a way as to prevent the switching MDD from having access
507	   to those keys.
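
   The separation between joining a conference and being authorized
   for media keys can be sketched as follows.  The class and method
   names are hypothetical, the authorization list is assumed to be
   provisioned out of band, and the secured channel that keeps keys
   away from the switching MDD is not modeled here.

```python
import secrets


class KeyManagementFunction:
    """Hypothetical KMF that separates membership from key access."""

    def __init__(self, authorized_endpoints):
        self._authorized = set(authorized_endpoints)
        self._e2e_key = secrets.token_bytes(16)
        self._hop_keys = {}

    def media_key_for(self, endpoint):
        """Release the end-to-end media key to authorized endpoints only."""
        if endpoint not in self._authorized:
            raise PermissionError(f"{endpoint} is not authorized for media keys")
        return self._e2e_key

    def hop_key_for(self, party):
        """Per-hop authentication keys may go to any element, including the MDD."""
        return self._hop_keys.setdefault(party, secrets.token_bytes(16))


kmf = KeyManagementFunction(authorized_endpoints={"alice", "bob"})
roster = ["alice", "bob", "mallory"]  # mallory joined but is not authorized

for member in roster:
    try:
        kmf.media_key_for(member)
        print(member, "receives the media key")
    except PermissionError as err:
        print(err)
```

   The conference roster is never consulted when a media key is
   requested; only the KMF's own authorization list is.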

509	   It is assumed that an adversary might have access to the switching
510	   MDD and have the ability to read any of the contents that pass
511	   through.  For this reason, it is not trusted with access to the
512	   media encryption keys.

514	   As with the call processing functions, it is appreciated that there
515	   may be some deployments wherein the switching MDD is trusted.
516	   However, for the purposes of this document, the switching MDD is
517	   considered untrusted so that we can be sure to develop a solution
518	   that will work even in the most hostile environments.

520	   It is expected that a switching MDD performs its role in properly
521	   forwarding media packets, taking measures to safeguard against replay
522	   attacks, etc.  If an MDD is exploited, an adversary may do such things
523	   as discard packets, replay packets, or introduce unacceptable delay
524	   in packet delivery.

526	7. Goals and Non-Goals

528	7.1. Goals

530	7.1.1. Ensure End-To-End Confidentiality

532	   The content of the communication and all media needs to be
533	   confidential within the group of entities explicitly invited into the
534	   conference.  An external monitoring adversary should not be able to
535	   deduce the human-to-human communication that actually occurred from
536	   capturing the media packets.

538	   At the same time, it is necessary to allow switching MDDs to
539	   manipulate certain RTP header fields like the payload type value.
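
   As a rough illustration of this goal, the following Python sketch
   rewrites the 7-bit payload type in the fixed RTP header while
   leaving the (encrypted) payload bytes untouched.  The function name
   rewrite_payload_type is hypothetical and not defined by this
   document.  The sketch assumes the standard 12-byte fixed RTP header
   and ignores any authentication tag, which a real solution would
   need to recompute or otherwise account for on each hop.

```python
RTP_FIXED_HEADER_LEN = 12  # fixed RTP header size in bytes, without CSRCs


def rewrite_payload_type(packet: bytes, new_pt: int) -> bytes:
    """Return a copy of `packet` with only the payload type replaced.

    Hypothetical helper: everything after the second header byte,
    including the encrypted payload, is copied through unchanged.
    """
    if len(packet) < RTP_FIXED_HEADER_LEN:
        raise ValueError("packet shorter than the fixed RTP header")
    if not 0 <= new_pt <= 127:
        raise ValueError("payload type must fit in 7 bits")
    marker_bit = packet[1] & 0x80  # preserve the marker bit
    second_byte = marker_bit | new_pt
    return packet[:1] + bytes([second_byte]) + packet[2:]
```

   The switching MDD in this sketch never needs the media encryption
   key: it touches a single header byte and forwards the rest as is.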

541	7.1.2. Ensure End-To-End Source Authentication of Media

543	   In a conference system with multiple endpoints it is vital that the
544	   media content presented to any of the human participants is from the
545	   stated endpoint, and not an adversary that attempts to inject
546	   misleading content.  Nor should an adversary be able to fool the
547	   system into becoming a trusted party in the conference.  Only
548	   explicitly invited parties shall be able to contribute content.

550	7.1.3. Provide a More Efficient Service than "Full-Mesh"

552	   A multi-party conference that has the goals of confidentiality and
553	   source authentication can be established as a "full mesh" (i.e., each
554	   participating endpoint directly addresses each of the other
555	   endpoints).  However, this has a significant issue with the amount of
556	   consumed resources in both the uplink and the downlink from each
557	   endpoint.

559	   A switched conferencing model would yield the efficiencies desired.
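
   The difference in resource use can be made concrete with a small
   calculation.  The Python sketch below counts media streams per
   endpoint under two simplifying assumptions that are not stated in
   this document: every endpoint sends exactly one stream, and a
   switching MDD forwards at most max_forwarded streams to each
   receiver (a hypothetical parameter).

```python
def full_mesh_streams(n: int) -> tuple[int, int]:
    """(uplink, downlink) streams per endpoint in a full mesh of n endpoints."""
    if n <= 1:
        return (0, 0)
    # Each endpoint sends to, and receives from, every other endpoint.
    return (n - 1, n - 1)


def switched_streams(n: int, max_forwarded: int) -> tuple[int, int]:
    """(uplink, downlink) streams per endpoint behind a switching MDD."""
    if n <= 1:
        return (0, 0)
    # One uplink stream to the MDD; the MDD forwards a bounded subset back.
    return (1, min(max_forwarded, n - 1))
```

   Under these assumptions, a ten-party full mesh needs nine uplink
   streams per endpoint, while the switched model needs one.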

561	7.1.4. Support Cloud-Based Conferencing

563	   To achieve cost-effective and scalable conferencing, it must be
564	   possible to run the MDD instances in a cloud-based virtualized
565	   environment.

567	   From a security standpoint, this is a significant issue since the
568	   virtualized server instance and the underlying hardware and software
569	   upon which it runs might not be secure from an adversary.

571	7.1.5. Limiting an Endpoint's Access to Content

573	   Since an invited endpoint will be provided with the content
574	   protection keys, the endpoint can decrypt content from time periods
575	   before and after the endpoint joined the conference.  However, this
576	   is not always desirable.  It should be possible to re-key the content
577	   protection keys every time a participant joins or leaves the
578	   conference so each particular set of endpoints uses a unique key.

580	   This also changes the trust level required on the conference roster
581	   handling at any point and how to keep that accurate and secured.

583	   It should be noted that timely completion of the re-keying operations
584	   becomes an obstacle in system design and operation.  Thus, it is a
585	   goal to allow for this possibility when it is deemed essential, but
586	   it should not be a requirement on a system to re-key each time the
587	   participant list changes.
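
   One way to picture re-keying as an option rather than a requirement
   is a policy flag held by a trusted key management function.  The
   Python sketch below is illustrative only: the class
   ConferenceKeyState, its flag rekey_on_roster_change, and the
   32-byte key length are assumptions, and distribution of the new key
   to endpoints is not shown.

```python
import secrets


class ConferenceKeyState:
    """Hypothetical key state kept by a trusted key management function."""

    def __init__(self, rekey_on_roster_change: bool = False) -> None:
        self.rekey_on_roster_change = rekey_on_roster_change
        self.roster: set[str] = set()
        self.key_id = 0
        self.key = secrets.token_bytes(32)

    def _rekey(self) -> None:
        # Generate a fresh content protection key and bump its identifier.
        self.key_id += 1
        self.key = secrets.token_bytes(32)

    def join(self, endpoint: str) -> None:
        self.roster.add(endpoint)
        if self.rekey_on_roster_change:
            self._rekey()

    def leave(self, endpoint: str) -> None:
        self.roster.discard(endpoint)
        if self.rekey_on_roster_change:
            self._rekey()
```

   With the flag set, each distinct roster gets its own key; with the
   flag cleared, the conference keeps one key for its lifetime.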

589	7.1.6. Compatibility with the WebRTC Security Architecture

591	   It is a goal of this work to ensure compatibility with the WebRTC
592	   security architecture as described in [I.D-rtcweb-security-arch].  As
593	   an example, local resources that are considered a part of the trusted
594	   computing base (TCB), such as keying material derived using DTLS-
595	   SRTP, will remain within the TCB and not be exposed to untrusted
596	   entities.

598	   The browser is reliant on an external calling service to convey
599	   signaling information that may open the door for a man-in-the-middle
600	   attack, such as the conveyance of certificate fingerprints over the
601	   interface between the browser and the calling service.  However, as
602	   described in [I.D-rtcweb-security-arch], the browser may utilize
603	   additional services, such as a trusted identity provider, to mitigate
604	   such risks.

606	   Having said the foregoing, this document does not aim to define
607	   requirements for end-to-end security for the WebRTC data channel.

609	7.2. Non-Goals

611	7.2.1. Securing the Endpoints

613	   The security of a communication session requires that the endpoints
614	   are not compromised and that the users are trustworthy.  If not,
615	   credentials and decrypted content may be shared with third parties.
616	   However, this is hard to prevent through system design.  Thus, it
617	   should be assumed that the endpoint is secure and the user is
618	   trustworthy; how to achieve this is out of scope for this document.

620	7.2.2. Concealing that Communication Occurs

622	   A non-goal is to attempt to prevent a pervasive monitoring adversary
623	   from knowing that the communication session has occurred.  The reason
624	   for excluding this as a goal is that it is extremely difficult to
625	   achieve, as a pervasive monitoring adversary can be expected to be
626	   able to have knowledge of all IP flows that enter or exit local ISPs,
627	   across links that straddle national borders or internet exchange
628	   points.  To hide the fact communication occurred, the flows required
629	   to achieve the communication session need to be highly difficult to
630	   correlate between different legs of the communication.

632	   At this stage this is deemed too difficult to attempt and will need
633	   to be a subject for further study.  Existing attempts include The
634	   Onion Router (TOR), which, it has been claimed, can be monitored,
635	   at least partially, by an adversary with sufficient
636	   reach.

638	   Also of consideration is that trying to conceal the fact that
639	   communication occurred actually makes it more difficult for network
640	   administrators to effectively manage and troubleshoot issues with
641	   conference calls.

643	7.2.3. Individual Media Source Authentication

645	   Although the endpoints in the conference are authenticated, it is not
646	   a goal to provide source authentication of the media at the
647	   individual user level, instead being satisfied with being able to
648	   authenticate media as coming from an invited endpoint or not.

650	   There exist solutions that can provide individual media source
651	   authentication (e.g., TESLA).  However, they come at a cost to
652	   performance or to the security properties provided.  Thus, further
653	   study is required to determine the impact and resulting security
654	   properties if individual source authentication is desired.

656	7.2.4. Multicast-based Conferencing

658	   Using multicast to construct a non-centralized media distribution
659	   model is out of scope.  This document is focused only on models where
660	   endpoints, or other devices, participating in a conference unicast
661	   media to a centrally located media distribution device.

663	8. Requirements

665	   The following are the security solution requirements for switched
666	   conferencing that enable end-to-end media privacy between all
667	   endpoints.

669	   Note that while some switching MDDs might be fully trusted entities,
670	   the intent of this solution and purpose for these requirements is to
671	   address those servers that are not trusted.

673	   PM-01:  Switching media distribution device MUST be able to switch
674	           the media between endpoints in a conference without having
675	           access to unencrypted media content.

677	   PM-02:  Solution MUST maintain all current SRTP security goals,
678	           namely the ability to provide for end-to-end confidentiality,
679	           provide for hop-by-hop replay protection, and ensure hop-by-
680	           hop and end-to-end message integrity.

682	   PM-03:  Solution MUST extend replay protection to cover each hop in
683	           the media path, ensuring both that any received packet is
684	           destined for the recipient and that it is not a duplicate.

686	   PM-04:  Keys used for end-to-end encryption and authentication of RTP
687	           payloads and other information deemed unsuitable for access
688	           by the switching media distribution device MUST NOT be
689	           generated by or accessible to any component that is not
690	           trusted.

692	   PM-05:  The switching media distribution device MUST be allowed to
693	           make changes to the RTP header and the RTP header extensions.

695	   PM-06:  A cryptographic context suitable for enabling end-to-end
696	           authenticated encryption MUST be defined.

698	   PM-07:  The switching media distribution device, or any entity that
699	           is not fully trusted, MUST NOT be involved in the user or
700	           endpoint authentication for the purpose of media key
701	           distribution.

703	   PM-08:  The switching media distribution device MUST be able to
704	           switch an already active RTP stream to a new receiver, while
705	           guaranteeing the timely synchronization between the RTP
706	           security context of the transmitter and its current and new
707	           receivers.

709	   PM-09:  It MUST be possible for the switching media distribution
710	           device to determine if a received media packet was
711	           transmitted by an endpoint in possession of a valid hop-by-
712	           hop key for that conference.

714	   PM-10:  It MUST be possible for a conference to be optionally re-
715	           keyed as desired, such as each time a participant joins or
716	           leaves the conference.

718	   PM-11:  Any solution satisfying this requirements document MUST
719	           provide for a means through which WebRTC-compliant endpoints
720	           can participate in a switched conference using private media
721	           as outlined herein.

723	   PM-12:  All RTP senders, including the switching media distribution
724	           device, MUST adhere to all congestion control requirements
725	           that are required by the RTP profile and topology in use,
726	           including RTP circuit breakers [I.D-ietf-avtcore-rtp-circuit-
727	           breakers].  Since the switching media distribution device is
728	           unable to perform transcoding or transrating that requires
729	           access to the unencrypted media, its reaction to congestion
730	           signals is often limited to dropping packets that would
731	           otherwise be forwarded in the absence of congestion, and
732	           signaling congestion to the RTP source.  This is similar to
733	           the congestion control behavior of the Media Switching Mixer
734	           and Selective Forwarding Middlebox/Unit in [I.D-ietf-avtcore-
735	           rtp-topologies-update].

737	   PM-13:  It MUST be possible for a media distribution device or an
738	           endpoint to authenticate a received RTCP packet.
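
   To illustrate how PM-03 and PM-09 might fit together on a single
   hop, the Python sketch below appends a truncated HMAC tag computed
   with a hop-by-hop key and has the receiver reject packets that fail
   verification or repeat an already seen (SSRC, sequence number)
   pair.  This is not a proposed mechanism: the names protect and
   HopReceiver, the use of HMAC-SHA-256, and the 10-byte tag length
   are assumptions, and the unbounded set ignores sequence number
   wrap-around, which a real design would handle with a rollover
   counter and a sliding window.

```python
import hashlib
import hmac

RTP_FIXED_HEADER_LEN = 12  # fixed RTP header size in bytes
TAG_LEN = 10               # truncated tag length in bytes (assumed)


def protect(hop_key: bytes, packet: bytes) -> bytes:
    """Append a hop-by-hop authentication tag to an RTP packet."""
    tag = hmac.new(hop_key, packet, hashlib.sha256).digest()[:TAG_LEN]
    return packet + tag


class HopReceiver:
    """Hypothetical per-hop receiver: authenticates, then checks for replay."""

    def __init__(self, hop_key: bytes) -> None:
        self.hop_key = hop_key
        self.seen: set[tuple[bytes, bytes]] = set()

    def accept(self, protected: bytes) -> bool:
        if len(protected) < RTP_FIXED_HEADER_LEN + TAG_LEN:
            return False
        packet, tag = protected[:-TAG_LEN], protected[-TAG_LEN:]
        expected = hmac.new(self.hop_key, packet, hashlib.sha256).digest()[:TAG_LEN]
        if not hmac.compare_digest(tag, expected):
            return False  # sender did not hold a valid hop-by-hop key
        # Bytes 2-3 carry the sequence number, bytes 8-11 the SSRC.
        index = (packet[8:12], packet[2:4])
        if index in self.seen:
            return False  # duplicate on this hop
        self.seen.add(index)
        return True
```

   The check uses only the hop-by-hop key and header fields, so the
   receiver in this sketch needs no access to the end-to-end media
   encryption key or to the unencrypted payload.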

740	9. IANA Considerations

742	   There are no IANA considerations for this document.

744	10. Security Considerations

746	   [TBD]

748	11. References

750	11.1. Normative References

752	   [RFC2119]   Bradner, S., "Key words for use in RFCs to Indicate
753	               Requirement Levels", BCP 14, RFC 2119, March 1997.

755	   [RFC3711]   Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
756	               Norrman, "The Secure Real-time Transport Protocol
757	               (SRTP)", RFC 3711, March 2004.

759	   [RFC6464]   Lennox, J., Ivov, E., and E. Marocco, "A Real-time
760	               Transport Protocol (RTP) Header Extension for Client-to-
761	               Mixer Audio Level Indication", RFC 6464, December 2011.

763	   [I.D-rtcweb-security-arch]
764	               E. Rescorla, "WebRTC Security Architecture", Work in
765	               Progress, March 2015.

767	   [RFC6904]   J. Lennox, "Encryption of Header Extensions in the Secure
768	               Real-time Transport Protocol (SRTP)", RFC 6904, December
769	               2013.

771	   [I.D-ietf-avtcore-rtp-topologies-update]
772	               Westerlund, M., and S. Wenger, "RTP Topologies", Work in
773	               Progress, March 2015.

775	   [I.D-ietf-avtcore-rtp-circuit-breakers]
776	               Perkins, C. S., and V. Singh, "Multimedia Congestion
777	               Control: Circuit Breakers for Unicast RTP Sessions", Work
778	               in Progress, March 2015.

780	11.2. Informative References

782	   [RFC3261]   Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
783	               A., Peterson, J., Sparks, R., Handley, M., and E.
784	               Schooler, "SIP: Session Initiation Protocol", RFC 3261,
785	               June 2002.

787	   [RFC4474]   Peterson, J. and C. Jennings, "Enhancements for
788	               Authenticated Identity Management in the Session
789	               Initiation Protocol (SIP)", RFC 4474, August 2006.

791	12. Acknowledgments

793	   The authors would like to thank Marcello Caramma, Matthew Miller,
794	   Christian Oien, Magnus Westerlund, Cullen Jennings, Christer
795	   Holmberg, Bo Burman, Jonathan Lennox, Suhas Nandakumar, Dan Wing,
796	   Roni Even, and Mo Zanaty for their invaluable input.

798	13. Contributors

800	   Yi Cheng
801	   Ericsson
802	   SE-164 80 Stockholm
803	   Sweden

805	   Phone: +46 10 71 17 589
806	   Email: yi.cheng@ericsson.com

808	Authors' Addresses

810	   Paul E. Jones
811	   Cisco Systems, Inc.
812	   7025 Kit Creek Rd.
813	   Research Triangle Park, NC 27709
814	   USA

816	   Phone: +1 919 476 2048
817	   Email: paulej@packetizer.com

819	   Nermeen Ismail
820	   Cisco Systems, Inc.
821	   170 W Tasman Dr.
822	   San Jose
823	   USA

825	   Email: nermeen@cisco.com

827	   David Benham
828	   Cisco Systems, Inc.
829	   170 W Tasman Dr.
830	   San Jose
831	   USA

833	   Email: dbenham@cisco.com

835	   Nathan Buckles
836	   Cisco Systems, Inc.
837	   170 W Tasman Dr.
838	   San Jose
839	   USA

841	   Email: nbuckles@cisco.com

843	   John Mattsson
844	   Ericsson AB
845	   SE-164 80 Stockholm
846	   Sweden

848	   Phone: +46 10 71 43 501
849	   Email: john.mattsson@ericsson.com

851	   Richard Barnes
852	   Mozilla
853	   331 E Evelyn Ave.

855	   Mountain View
856	   USA

858	   Email: rlb@ipv.sx