idnits 2.17.1 

draft-westerlund-avtcore-transport-multiplexing-07.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (October 21, 2013) is 3839 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: 'RFC5234' is defined on line 1013, but no explicit
     reference was found in the text

  == Outdated reference: A later version (-54) exists of
     draft-ietf-mmusic-sdp-bundle-negotiation-05

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  == Outdated reference: A later version (-13) exists of
     draft-ietf-avtcore-multi-media-rtp-session-03

  == Outdated reference: A later version (-12) exists of
     draft-ietf-avtcore-multiplex-guidelines-01

  -- Obsolete informational reference (is this intentional?): RFC  793
     (Obsoleted by RFC 9293)

  -- Obsolete informational reference (is this intentional?): RFC 4960
     (Obsoleted by RFC 9260)

  -- Obsolete informational reference (is this intentional?): RFC 5245
     (Obsoleted by RFC 8445, RFC 8839)

  -- Obsolete informational reference (is this intentional?): RFC 5285
     (Obsoleted by RFC 8285)

  -- Obsolete informational reference (is this intentional?): RFC 5389
     (Obsoleted by RFC 8489)


     Summary: 1 error (**), 0 flaws (~~), 5 warnings (==), 6 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                      M. Westerlund
3	Internet-Draft                                                  Ericsson
4	Intended status: Standards Track                           C. S. Perkins
5	Expires: April 24, 2014                            University of Glasgow
6	                                                        October 21, 2013

8	 Multiplexing Multiple RTP Sessions onto a Single Lower-Layer Transport
9	           draft-westerlund-avtcore-transport-multiplexing-07

11	Abstract

13	   This memo defines a mechanism to allow multiple RTP sessions to be
14	   multiplexed onto a single lower-layer transport flow (e.g., onto a
15	   single UDP 5-tuple).  Requirements for multiplexing RTP sessions are
16	   discussed, along with the trade-off between the different options.  A
17	   shim-based multiplexing layer is proposed, along with associated
18	   signalling.

20	Status of This Memo

22	   This Internet-Draft is submitted in full conformance with the
23	   provisions of BCP 78 and BCP 79.

25	   Internet-Drafts are working documents of the Internet Engineering
26	   Task Force (IETF).  Note that other groups may also distribute
27	   working documents as Internet-Drafts.  The list of current Internet-
28	   Drafts is at http://datatracker.ietf.org/drafts/current/.

30	   Internet-Drafts are draft documents valid for a maximum of six months
31	   and may be updated, replaced, or obsoleted by other documents at any
32	   time.  It is inappropriate to use Internet-Drafts as reference
33	   material or to cite them other than as "work in progress."

35	   This Internet-Draft will expire on April 24, 2014.

37	Copyright Notice

39	   Copyright (c) 2013 IETF Trust and the persons identified as the
40	   document authors.  All rights reserved.

42	   This document is subject to BCP 78 and the IETF Trust's Legal
43	   Provisions Relating to IETF Documents
44	   (http://trustee.ietf.org/license-info) in effect on the date of
45	   publication of this document.  Please review these documents
46	   carefully, as they describe your rights and restrictions with respect
47	   to this document.  Code Components extracted from this document must
48	   include Simplified BSD License text as described in Section 4.e of
49	   the Trust Legal Provisions and are provided without warranty as
50	   described in the Simplified BSD License.

52	Table of Contents

54	   1.  Introduction
55	   2.  Terminology
56	   3.  Motivation
57	   4.  Requirements
58	   5.  Design Considerations
59	     5.1.  Location of Multiplexing Shim Header
60	     5.2.  ICE and DTLS-SRTP Integration
61	     5.3.  Signalling Fall Back
62	   6.  Specification
63	     6.1.  Shim Layer
64	     6.2.  Signalling
65	     6.3.  SRTP Key Management
66	       6.3.1.  Security Description
67	       6.3.2.  DTLS-SRTP
68	       6.3.3.  MIKEY
69	     6.4.  Examples
70	       6.4.1.  Secure RTP Packet with Multiplexing Shim
71	       6.4.2.  Basic RTP Multiplex Negotiation in SDP
72	       6.4.3.  Advanced RTP Multiplex Negotiation in SDP
73	   7.  Open Issues
74	   8.  IANA Considerations
75	   9.  Security Considerations
76	   10. Acknowledgements
77	   11. References
78	     11.1.  Normative References
79	     11.2.  Informational References
80	   Appendix A.  Possible Solutions
81	     A.1.  Header Extension
82	     A.2.  Multiplexing Shim
83	     A.3.  Single Session
84	     A.4.  Use the SRTP MKI field
85	     A.5.  Use an Octet in the Padding
86	     A.6.  Redefine the SSRC field
87	   Appendix B.  Comparison
88	     B.1.  Support of Multiple RTP Sessions Over Single Transport
89	     B.2.  Enable Same SSRC Value in Multiple RTP Sessions
90	       B.2.1.  Avoid SSRC Translation in Gateways/Translation
91	       B.2.2.  Support Existing Extensions
92	     B.3.  Ensure SRTP Functions
93	     B.4.  Don't Redefine Used Bits
94	     B.5.  Firewall Friendly
95	     B.6.  Monitoring and Reporting
96	     B.7.  Usable over Multicast
97	     B.8.  Incremental Deployment
98	     B.9.  Summary and Conclusion
99	   Authors' Addresses

101	1.  Introduction

103	   With the ongoing development of the WebRTC conferencing and CLUE
104	   telepresence standards, there is renewed interest in defining a
105	   mechanism that allows multiple RTP sessions [RFC3550] to share a
106	   single lower layer transport, such as a bi-directional UDP flow.  The
107	   main problem driving this is the cost of doing NAT/firewall traversal
108	   for each individual RTP flow.  ICE and other NAT/firewall traversal
109	   solutions are clearly capable of attempting to open multiple flows.
110	   However, there is both increased risk for failure, and an increased
111	   cost in the creation of multiple flows.  The increased cost comes as
112	   slightly higher delay in establishing the traversal, and the amount
113	   of consumed NAT/firewall resources.  The latter might be an
114	   increasing problem in the IPv4 to IPv6 transition period.

116	   There is ongoing work on specifying how and when one RTP session can
117	   contain multiple media types
118	   [I-D.ietf-avtcore-multi-media-rtp-session].  That addresses certain
119	   use cases, while this proposal addresses a different set of use cases
120	   and motivations (discussed further in Section 3).  The classical
121	   method of having each RTP session run over a specific transport flow
122	   is still motivated for a number of use cases, especially when flow
123	   based QoS is to be used for some media streams.

125	   This memo draws up some requirements for consideration on how to
126	   transport multiple RTP sessions over a single lower-layer transport.
127	   These requirements have to be weighed carefully, as no known
128	   solution exists that can fulfil the combined set of requirements
129	   completely.  A number of possible solutions were considered and
130	   discussed with respect to their properties.  Based on that, this memo
131	   defines a multiplexing shim, along with SDP signalling and examples.
132	   The other proposals considered and the comparison are available as
133	   appendices.

135	2.  Terminology

137	   Unless specifically noted, all mentions of multiplexing in this
138	   memo refer to the multiplexing of multiple RTP Sessions onto the same
139	   lower layer transport.  It is important to make this distinction as
140	   RTP contains a number of multiplexing points for various purposes,
141	   such as media formats (Payload Type), media sources (SSRC), and RTP
142	   sessions.

144	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
145	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
146	   document are to be interpreted as described in RFC 2119 [RFC2119].

148	3.  Motivation

150	   RTP has always allowed applications to use multiple RTP sessions,
151	   by using different transport-layer flows for each session [RFC3550].
152	   The primary motivation was to support differential quality of service
153	   per session, using flow-level differentiated services mechanisms, but
154	   it also lets applications separate flows into several RTP sessions to
155	   better reflect application-level semantics where appropriate.

157	   More recently, there has been a desire to send multiple types of
158	   media in a single RTP session.  This uses one RTP session instead of
159	   several RTP sessions, giving up flow-level quality of service, and
160	   semantic separation of traffic, but reducing the number of transport
161	   level flows to ease NAT and firewall traversal.  Clarifications to
162	   the RTP specification to support this can be found in
163	   [I-D.ietf-avtcore-multi-media-rtp-session].

165	   There is also a third option that can be useful in some cases.  This
166	   is to somehow multiplex several RTP sessions onto a single transport
167	   layer flow.  The motivations for why this alternative is needed are
168	   as follows.

170	   To Ease NAT and Firewall Traversal:  The existence of network address
171	      translation (NAT/NAPT) and firewalls on almost all Internet access
172	      has implications for protocols, such as RTP, that were designed to
173	      use multiple transport-layer flows.  Any NAT or firewall traversal
174	      solution has to ensure that all the necessary transport-layer
175	      flows are established.  This has three impacts:

177	      1.  Increased delay to perform the transport flow establishment

179	      2.  The more transport flows, the more state and the more resource
180	          consumption in the NATs and firewalls.  When the resource
181	          consumption in NATs/firewalls reaches its limit, unexpected
182	          behaviours usually occur, commonly resulting in service
183	          disruptions.

185	      3.  More transport flows means a higher risk that some transport
186	          flow fails to be established, thus preventing the application
187	          from communicating.

189	      Using fewer transport-layer flows, by multiplexing several RTP
190	      sessions onto a single transport-layer flow, reduces the risk of
191	      communication failure, improves establishment behaviour, and
192	      reduces the load on NATs and firewalls.

194	   To Support Application-level Session-layer Semantics:  Applications
195	      can use multiple RTP sessions to separate media streams that have
196	      different uses or purposes.  For example, a group conferencing
197	      application might use one RTP session to distribute high-quality
198	      video of the active speaker, switching the source of that video as
199	      the conversation progresses, coupled with a second RTP session to
200	      send always-on low-quality views of the inactive speakers, making
201	      it easier for the MCU to manage the traffic.  Separation of flows
202	      into different RTP sessions also allows different processing based
203	      on the media type, such as audio and video, in end-points and
204	      middleboxes.  This can give middleboxes the knowledge that any
205	      SSRC within the session is supposed to be processed in a similar
206	      way, saving them the need to perform differential processing on a
207	      per-SSRC basis.

209	      Not all applications need to separate their traffic into different
210	      semantic classes.  And, for those that do, it is clearly possible
211	      to find other multiplexing solutions for many simpler cases, for
212	      example based on signalled semantics for SSRC, or looking at the
213	      payload type and differences in encoding.  This lack of semantic
214	      separation for some flows becomes more critical as the application
215	      semantics get more complex.  For example, an application that has
216	      one set of video streams showing session participants, and another
217	      set that shares an application or presentation slides, would
218	      likely want to separate those streams for reasons such as control,
219	      prioritization, QoS, methods for robustness, etc.  In those cases,
220	      using the RTP session for separation of flows with different
221	      semantics is a powerful tool that can ease the application design,
222	      and something that we would like to preserve when providing a
223	      solution for how to use only a single lower-layer transport.

225	      Multiplexing and the use of different RTP sessions are discussed
226	      further in [I-D.ietf-avtcore-multiplex-guidelines].

228	   To Allow Use of Certain RTP Extensions:  Different applications use
229	      different sets of RTP extensions.  Several of these extensions are
230	      known to have limitations that prevent them from being used in RTP
231	      sessions that carry different types of media.  This is discussed
232	      more in [I-D.ietf-avtcore-multi-media-rtp-session].  The
233	      extensions that are known to be problematic include parity FEC
234	      [RFC5109], RTP Retransmission in session mode [RFC4588], and some
235	      forms of layered coding.  This prevents some applications from
236	      sending multiple types of media in a single RTP session, forcing
237	      them to use multiple RTP sessions.  To prevent those applications
238	      from having to use several transport-layer flows for the different
239	      RTP sessions, it is desirable to have a way of multiplexing
240	      several RTP sessions on a single transport-layer flow.

242	   The centre of the motivation is to ensure that the use of multiple
243	   RTP sessions is available, and usable, for applications that have no
244	   need for transport-layer separation of their media streams and want
245	   to reduce their exposure to any NAT or Firewall inconsistencies and
246	   minimize the resource consumption.  As a benefit, a well designed
247	   solution will remove the limitations on which existing RTP mechanisms
248	   or extensions can be used by the application, when compared to
249	   sending multiple media types in a single RTP session.

251	4.  Requirements

253	   This section lists and discusses a number of potential requirements.
254	   However, it is not difficult to realize that it is in fact possible
255	   to set requirements that make the set of feasible solutions empty.
256	   It is thus necessary to consider which requirements are
257	   essential to fulfil and which can be compromised on to arrive at a
258	   solution.

260	   Support Use of Multiple RTP Sessions:  As stated in the RTP
261	      specification [RFC3550], "The distinguishing feature of an RTP
262	      session is that each maintains a full, separate space of SSRC
263	      identifiers [...].  The set of participants included in one RTP
264	      session consists of those that can receive an SSRC identifier
265	      transmitted by any one of the participants either in RTP as the
266	      SSRC or a CSRC [...] or in RTCP".  Accordingly, any mechanism to
267	      multiplex several RTP sessions onto a single transport-layer flow
268	      needs to allow each RTP session to use the complete SSRC space,
269	      independent of any other RTP sessions multiplexed onto that
270	      transport-layer flow.

272	      As a corollary of the above, two different RTP sessions that are
273	      being multiplexed onto the same transport-layer flow need to be
274	      able to use the same SSRC value.  This is an absolute requirement,
275	      for two reasons.  First, to avoid mandating SSRC assignment
276	      rules that are coordinated between the sessions.  If the RTP
277	      sessions multiplexed together need to have unique SSRC values,
278	      then additional code that works between RTP Sessions is needed in
279	      the implementations, thus raising the bar for implementing this
280	      solution.  In addition, if a gateway connects parts of a system
281	      using this multiplexing and parts that aren't multiplexing, the
282	      part that isn't multiplexing also needs to fulfil the requirements
283	      on how SSRC is assigned or force the gateway to translate SSRCs.
284	      Translating SSRC is actually hard as it requires one to understand
285	      the semantics of all current and future RTP and RTCP extensions.
286	      Otherwise a barrier for deploying new extensions is created.
287	      Second, there are a few RTP extensions that currently rely on
288	      being able to use the same SSRC in different RTP sessions,
289	      including parity FEC [RFC5109], RTP Retransmission in session mode
290	      [RFC4588], and some forms of layered coding.
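
The requirement above can be pictured with a small receiver-side sketch. It is only an illustration, not part of the specification: the class name, the `session_id` argument, and the state fields are all assumptions made for this example. The point it shows is that per-source state is keyed on the session as well as the SSRC, so two multiplexed sessions can carry the same SSRC value without colliding.

```python
from typing import Dict, Tuple

# (session_id, ssrc) -> per-source receiver state. Both names are illustrative.
SourceKey = Tuple[int, int]


class MultiplexedReceiver:
    """Tracks sources for several RTP sessions that share one transport flow."""

    def __init__(self) -> None:
        self.sources: Dict[SourceKey, Dict[str, int]] = {}

    def on_packet(self, session_id: int, ssrc: int, seq: int) -> Dict[str, int]:
        # Key on the session as well as the SSRC, so that each session
        # keeps its own complete 32-bit SSRC space.
        key = (session_id, ssrc)
        state = self.sources.setdefault(key, {"packets": 0, "last_seq": -1})
        state["packets"] += 1
        state["last_seq"] = seq
        return state


rx = MultiplexedReceiver()
rx.on_packet(session_id=1, ssrc=0x1234ABCD, seq=100)   # e.g. the media stream
rx.on_packet(session_id=2, ssrc=0x1234ABCD, seq=7000)  # e.g. FEC or RTX, same SSRC
print(len(rx.sources))  # two independent sources despite the shared SSRC
```

A table keyed on the SSRC alone would merge these two sources, which is the failure the requirement is meant to rule out.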

292	   Support the Secure RTP (SRTP) Profile:  SRTP [RFC3711] is one of the
293	      most commonly used security solutions for RTP.  In addition, it is
294	      the only one defined by IETF that is integrated into RTP.  This
295	      integration has several aspects that need to be considered when
296	      designing a solution for multiplexing RTP sessions on the same
297	      lower layer transport.

299	      Determining Crypto Context:  SRTP first of all needs to know which
300	            session context a received or to-be-sent packet relates to.
301	            It also normally relies on the lower layer transport to
302	            identify the session.  It uses the Master Key Indicator
303	            (MKI), if present, to determine which key set is to be used.
304	            Then the SSRC and sequence number are used by most crypto
305	            suites, including the most common use of AES Counter Mode,
306	            to actually generate the correct cipher stream.

308	      Unencrypted Headers:  SRTP has chosen to leave the RTP headers and
309	            the first two 32-bit words of the first RTCP header
310	            unencrypted, to allow for both header compression and
311	            monitoring to work also in the presence of encryption.  As
312	            these fields are in clear text they are used in most crypto
313	            suites for SRTP to determine how to protect or recover the
314	            plain text.

316	      It is here important to contrast SRTP against a set of other
317	      possible protection mechanisms.  DTLS, TLS, and IPsec are all
318	      protecting and encapsulating the entire RTP and RTCP packets.
319	      They don't perform any partial operations on the RTP and RTCP
320	      packets.  Any change that is considered to be part of the RTP and
321	      RTCP packet is transparent to them, but possibly not to SRTP.
322	      Thus the impact on SRTP operations has to be considered when
323	      defining a mechanism.
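
To make the dependency on the SSRC and sequence number concrete, the sketch below computes the AES Counter Mode IV as defined in RFC 3711, Section 4.1.1. The function name, the example salt, and the `contexts` table keyed by a session identifier are assumptions made for illustration. It shows that two multiplexed sessions using the same SSRC and packet index derive the same IV, so they need separate crypto contexts (and thus separate keys) selected by session.

```python
from typing import Dict, Tuple

MASK_128 = (1 << 128) - 1


def aes_cm_iv(session_salt: int, ssrc: int, roc: int, seq: int) -> int:
    """IV = (k_s * 2^16) XOR (SSRC * 2^64) XOR (i * 2^16), per RFC 3711."""
    index = (roc << 16) | seq  # 48-bit packet index i
    return ((session_salt << 16) ^ (ssrc << 64) ^ (index << 16)) & MASK_128


# 112-bit example salt; any value would do for this illustration.
salt = int.from_bytes(bytes(range(0xF0, 0xFE)), "big")
ssrc, seq = 0x1234ABCD, 100

# Same salt, SSRC, and packet index in two multiplexed sessions.
iv_session_1 = aes_cm_iv(salt, ssrc, roc=0, seq=seq)
iv_session_2 = aes_cm_iv(salt, ssrc, roc=0, seq=seq)
print(iv_session_1 == iv_session_2)  # identical IVs

# Hypothetical context table: the session identifier is part of the lookup
# key, so each session gets its own keys. The values are placeholders.
contexts: Dict[Tuple[int, int], bytes] = {
    (1, ssrc): b"key material for session 1",
    (2, ssrc): b"key material for session 2",
}
```

In counter mode an identical key and IV yield an identical key stream, which is why the session has to take part in selecting the crypto context.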

325	   Support Legacy Implementations of RTP and RTCP:  The core of RTP is
326	      in use in many systems, and has an extremely large deployed base
327	      with numerous implementations.  Changing any of the RTP or RTCP
328	      packet definitions, outside of defined extension points, is highly
329	      problematic.  First of all, the implementations need to change to
330	      support the new semantics.  Secondly, you get a long transition
331	      period when you have some session participants that support the
332	      new semantics and some that don't.  Combining the two behaviours
333	      in the same session can force the deployment of costly and less
334	      than perfect translation devices.

336	   Support NAT and Firewall Traversal:  It is desirable that current NAT
337	      devices, firewalls, and application level gateways will accept
338	      multiplexed packets from several RTP sessions as they accept
339	      normal RTP packets.  However, in the authors' opinion we can't let
340	      the firewall stifle invention and evolution of the protocol.  It
341	      is also necessary to be aware that a change that will make most
342	      deep-inspecting firewalls treat the packet as not valid RTP/RTCP
343	      will have a more difficult deployment story.

345	   Support Monitors and Reporting Tools:  It is desirable that a third
346	      party monitor can still operate on the multiplexed RTP Sessions.
347	      It is however likely that they will require an update to correctly
348	      monitor and report on multiplexed RTP Sessions.

350	      Another type of function to consider is packet sniffers and their
351	      selector filters.  These can be impacted by a change of the
352	      fields.  An observation is that many such systems are usually
353	      quite rapidly updated to consider new types of standardized or
354	      simply common packet formats.

356	   Support Use of IP Multicast:  It is desirable that a solution can be
357	      used if RTP and RTCP packets are sent over multicast, both Any
358	      Source Multicast (ASM) and Source-Specific Multicast (SSM).  The
359	      reason for this requirement is to allow a system using RTP to use
360	      the same configuration regardless of the transport being done over
361	      unicast or multicast.  In addition, multicast can't be claimed to
362	      have an issue with using multiple ports, as each multicast group
363	      has a complete port space scoped by address.

365	   Support Incremental Deployment:  A good solution has the property
366	      that in topologies that contain RTP mixers or Translators, a
367	      single session participant can enable multiplexing without having
368	      any impact on any other session participants.  Thus a node ought
369	      to be able to take a multiplexed packet and then easily send it
370	      out with minimal or no modification on another leg of the session,
371	      where each RTP session is transported over its own lower-layer
372	      transport.  It also needs to be as easy to do the reverse
373	      forwarding operation.
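
The forwarding operation described above can be sketched as follows. This is a rough illustration under a stated assumption: the shim is taken to be a single prefixed byte carrying the session identifier, which is a placeholder layout for this example. The function names and the `routes` table are likewise hypothetical.

```python
from typing import Dict, Tuple

Address = Tuple[str, int]  # (host, port) of a per-session transport flow


def add_shim(session_id: int, rtp_packet: bytes) -> bytes:
    """Prefix the (assumed) one-byte session identifier to an RTP/RTCP packet."""
    if not 0 <= session_id <= 255:
        raise ValueError("session_id must fit in one byte")
    return bytes([session_id]) + rtp_packet


def strip_shim(payload: bytes) -> Tuple[int, bytes]:
    """Split a multiplexed payload into session identifier and RTP/RTCP packet."""
    if len(payload) < 2:
        raise ValueError("payload too short to carry a shim and a packet")
    return payload[0], payload[1:]


def forward_to_legacy_leg(payload: bytes,
                          routes: Dict[int, Address]) -> Tuple[Address, bytes]:
    """Pick the per-session flow and return the packet without the shim."""
    session_id, rtp_packet = strip_shim(payload)
    return routes[session_id], rtp_packet


# Illustrative routing table using documentation addresses.
routes = {0x80: ("192.0.2.10", 49170), 0x81: ("192.0.2.10", 51372)}
rtp = bytes([0x80, 0x60]) + bytes(10)  # minimal 12-byte RTP header, no payload

dest, out = forward_to_legacy_leg(add_shim(0x81, rtp), routes)
print(dest, out == rtp)  # the RTP packet itself is forwarded unmodified
```

The reverse direction is the same table read the other way: the gateway looks up the session by the flow a packet arrived on and calls `add_shim`.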

375	5.  Design Considerations

377	   We propose a solution based around a shim layer, inserted between the
378	   transport layer headers and the RTP layer headers, to demultiplex
379	   separate RTP sessions.  The design rationale for using a shim layer
380	   header, as opposed to other demultiplexing points, is discussed in
381	   Appendix A.  In the following we discuss design considerations
382	   regarding placement and use of the shim layer header.

384	5.1.  Location of Multiplexing Shim Header

386	   A major question affecting the SHIM is the location of the SHIM
387	   header providing the Identifier of the session the packet relates to.
388	   This section discusses in detail the impact of making the
389	   different choices.

391	   Identified aspects to consider are:

393	   Possibility to Process:  A prefixed shim header, i.e., between the
394	      transport protocol and the RTP/RTCP packet header, has the
395	      advantage that any node on the network that wants to include the
396	      header in any per-packet processing can reach it.  Reasons for
397	      per-packet processing are:

399	      a.  Quality of Service classification

401	      b.  SHIM ingress or egress

403	      c.  Monitoring

405	      Many routers or similar devices can only read and process the
406	      first N bytes of the whole packet, where N is commonly on the
407	      order of 64-128 bytes.  Any other type of processing means putting
408	      the packet on the slow path.  Thus a prefixed solution enables
409	      this processing while a postfixed solution will most likely
410	      forever prevent this type of device from processing it.

412	   Legacy Processing:  RTP packets contain very few fixed bits and are
413	      difficult to distinguish using deep packet inspection without
414	      access to the signalling channel, or without keeping per-flow
415	      state to correlate changes in the (presumed) RTP headers across
416	      packets to gain confidence that the flow is of the expected type.
417	      Firewalls, application-level gateways, and other network entities
418	      that concern themselves with trying to track RTP flows will need
419	      to be updated.  This can create a barrier to deployment.  Using a
420	      postfix shim likely gives the least resistance for initial
421	      deployment.  However, even with a postfix shim, deployment can be
422	      hindered when multiple RTP sessions use the same SSRC values,
423	      since this will appear to give irregular behaviour of the fields
424	      for what the third party believes is one media stream, when it is
425	      actually several streams.  The use of a prefixed shim
426	      will however maintain the long-term capabilities of such devices
427	      assuming they can be updated to include the SHIM header as part of
428	      the classification.

430	   Header Compression:  The different header compression techniques that
431	      have been developed compress IP/UDP/RTP as a complete combination.
432	      If one instead has IP/UDP/SHIM/RTP, then compression of the full
433	      set might not work, or might work poorly.  Instead only IP/UDP
434	      header compression is likely to be applied.  Thus a prefix will
435	      lose some compression efficiency until compression profiles for
436	      IP/UDP/SHIM/RTP have been developed, implemented and deployed.  A
437	      postfix doesn't have that issue, but nor can it ever gain anything
438	      from header compression, which a prefixed solution could once an
439	      updated profile is deployed.  A postfix also will have reduced
440	      efficiency compressing sessions when the same SSRC is used in two
441	      different RTP sessions as the RTP header fields like sequence
442	      number, etc., will not behave as expected and need frequent
443	      explicit updates.

445	   The question of a prefixed or a postfixed shim header comes down to a
446	   trade-off between long term usability and deployment issues.  A
447	   prefixed shim offers a good long term possibility to adapt any
448	   network function that needs to take the shim header into account, but
449	   at the same time any function that tries to analyse packets might
450	   block the packets and hinder deployment.  A postfixed shim will
451	   likely have the best short-term deployment possibilities, but long
452	   term this choice will likely prevent many network nodes that like to
453	   be capable of separating the RTP sessions being multiplexed together
454	   from successfully doing that.  After discussion in the working group
455	   it has been determined that a prefixed shim is the preferred
456	   solution.
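
The "first N bytes" argument can be checked with simple arithmetic. The sketch below is an illustration under assumptions: a one-byte shim, an IPv4 header without options, and an inspection window of 64 bytes (the lower end of the 64-128 range mentioned above). The function names are made up for this example.

```python
IPV4_UDP_HEADER_LEN = 20 + 8  # IPv4 header without options plus UDP header
INSPECTION_WINDOW = 64        # assumed fast-path read limit, in bytes


def shim_offset(rtp_len: int, prefixed: bool) -> int:
    """Offset of a one-byte shim, counted from the start of the IP packet."""
    return IPV4_UDP_HEADER_LEN + (0 if prefixed else rtp_len)


def visible_on_fast_path(rtp_len: int, prefixed: bool) -> bool:
    """True if the shim byte lies inside the window a fast-path device reads."""
    return shim_offset(rtp_len, prefixed) < INSPECTION_WINDOW


rtp_len = 12 + 160  # fixed RTP header plus 20 ms of G.711 audio

print(visible_on_fast_path(rtp_len, prefixed=True))   # shim at offset 28
print(visible_on_fast_path(rtp_len, prefixed=False))  # shim at offset 200
```

A prefixed shim sits at a fixed, small offset regardless of packet size, while the offset of a postfixed shim grows with the RTP packet and falls outside the window for all but the smallest packets.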

458	5.2.  ICE and DTLS-SRTP Integration

460	   When using ICE [RFC5245] or DTLS-SRTP [RFC5764] or both with RTP
461	   there exists the issue that RTP, STUN [RFC5389] and DTLS-SRTP are
462	   simultaneously in use over the same lower layer transport flow, like
463	   UDP.  This multiplexing is based on the value of the first byte of
464	   the lower layer transport payload as discussed in Section 5.1.2 of
465	   DTLS-SRTP [RFC5764].

467	   The replacement of a single RTP session with the multiple RTP
468	   sessions identified by a SHIM ought not to be misidentified as either
469	   STUN or DTLS-SRTP or any other protocol intending to take the
470	   available free code-points in the range 193-255 (Decimal).  Thus a
471	   prefixed SHIM needs to have its first byte have the two first bits
472	   set to 10 (Binary).  Having the SHIM share the identity of RTP is not
473	   an issue as there has to be mutual agreement that the SHIM is used
474	   instead of RTP.
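
The first-byte demultiplexing referred to above can be sketched as below. The ranges follow Section 5.1.2 of RFC 5764 (values below 2 are STUN, 20 to 63 are DTLS, 128 to 191 are RTP); the function names and the returned labels are assumptions made for this illustration.

```python
def classify_first_byte(b: int) -> str:
    """Classify a transport payload by its first byte, as in RFC 5764, 5.1.2."""
    if not 0 <= b <= 255:
        raise ValueError("not a byte value")
    if b < 2:
        return "stun"
    if 20 <= b <= 63:
        return "dtls"
    if 128 <= b <= 191:
        return "rtp-or-shim"  # RTP, or the SHIM when its use has been agreed
    return "unassigned"


def is_valid_shim_first_byte(b: int) -> bool:
    """A prefixed SHIM needs the two most significant bits set to 10 (binary)."""
    return (b >> 6) == 0b10


# Every byte with top bits 10 lands in the range RFC 5764 forwards to RTP.
shim_bytes = [b for b in range(256) if is_valid_shim_first_byte(b)]
print(shim_bytes[0], shim_bytes[-1], len(shim_bytes))
```

Because the SHIM occupies exactly the byte range already assigned to RTP, an existing STUN/DTLS/RTP demultiplexer needs no new branch; only the handler behind the RTP branch changes once both sides have agreed to use the SHIM.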

476	5.3.  Signalling Fall Back

477	   Both SIP and WebRTC applications use SDP signalling to describe the
478	   RTP sessions and transport layer connections used in a call.  It is
479	   therefore necessary to consider how to signal multiple RTP sessions
480	   multiplexed onto a single lower layer transport within SDP.  It is
481	   also important to consider backwards compatibility with any legacy
482	   applications that do not understand any proposed SDP extension.

484	   An SDP session description is built up using media ("m=") lines
485	   describing media flows, with associated connection ("c=") lines
486	   describing the transport layer flows.  In the usual offer/answer use
487	   of SDP the communicating parties use a single c= line to represent
488	   the IP-layer path, with one m= line per type of media, running each
489	   type of media on a separate transport layer port, and hence a
490	   separate RTP session.  This gives a clean separation of RTP sessions,
491	   but requires multiple transport layer flows to be used, complicating
492	   NAT/firewall traversal.
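
As an illustration of the usual case described above, the sketch below parses a small, made-up SDP offer and lists the transport port used by each m= line. The offer text, the addresses (taken from the documentation range), and the helper name are assumptions for this example only.

```python
from typing import List, Tuple

# Illustrative offer: one c= line, one m= line per media type.
OFFER = """\
v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.1
s=-
c=IN IP4 192.0.2.1
t=0 0
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/AVP 31
"""


def media_flows(sdp: str) -> List[Tuple[str, int]]:
    """Return (media type, port) for every m= line in the description."""
    flows = []
    for line in sdp.splitlines():
        if line.startswith("m="):
            media, port, _proto, *_formats = line[2:].split()
            # The port field may carry a "/<count>" suffix; keep the base port.
            flows.append((media, int(port.split("/")[0])))
    return flows


flows = media_flows(OFFER)
print(flows)
print(len({port for _, port in flows}))  # distinct transport-layer flows
```

Each distinct port here is a separate transport-layer flow, and hence a separate RTP session, that NAT/firewall traversal has to establish.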

494	   The SDP bundle extension [I-D.ietf-mmusic-sdp-bundle-negotiation]
495	   provides a way to signal that several m= lines are to be bundled
496	   together into a single RTP session running on a single transport
497	   layer port.  This is essentially the opposite semantic to the one we
498	   want: it combines seemingly disparate RTP sessions into one using a
499	   single transport layer flow, while we seek to use a single transport
500	   layer flow, but keep the sessions separate.  Accordingly, we do not
501	   re-use the bundle mechanism.

503	   We do, however, want to allow the case where an application would
504	   prefer to use separate RTP sessions multiplexed over a single lower
505	   layer transport, because that simplifies processing, but fall back to
506	   using the bundle mechanism if necessary.  Similarly, fall back to
507	   using separate RTP sessions on separate transport layer flows needs
508	   to be supported.

510	6.  Specification

512	   This section contains the specification of the RTP session
513	   multiplexing SHIM, using an explicit session identifier of the
514	   encapsulated payload.

516	6.1.  Shim Layer

518	   This solution is based on a shim layer that is inserted in the stack
519	   between the RTP and RTCP packets and the transport layer being used
520	   by the RTP sessions.  Thus the layering is as shown in Figure 1.

522	                         +-------------------------+
523	                         |    RTP / RTCP Packet    |
524	                         +-------------------------+
525	                         |     Session ID Layer    |
526	                         +-------------------------+
527	                         | Transport Layer Header  |
528	                         +-------------------------+
529	                         |  Network Layer Header   |
530	                         +-------------------------+

532	              Figure 1: Stack view with session ID layer shim

534	   The above stack is in fact a layered one as it does allow multiple
535	   RTP Sessions to be multiplexed on top of the Session ID shim layer.
536	   This enables the example presented in Figure 2 where four sessions,
537	   S1-S4, are sent over the same Transport layer, and where the Session
538	   ID layer will combine and encapsulate them with the session ID on
539	   transmission and separate and decapsulate them on reception.

541	                         +-------------------------+
542	                         | S1  |  S2  |  S3  |  S4 |
543	                         +-------------------------+
544	                         |     Session ID Layer    |
545	                         +-------------------------+
546	                         | Transport Layer Header  |
547	                         +-------------------------+
548	                         |  Network Layer Header   |
549	                         +-------------------------+

551	    Figure 2: Example with four RTP sessions on top of session ID layer

553	   The Session ID layer encapsulates one RTP or RTCP packet from a given
554	   RTP session and prefixes a 4-octet Session ID layer shim header to
555	   the packet.  The Session ID layer shim header is depicted in Figure 3
556	   and comprises a 2-bit fixed header (10b), 14 reserved bits, and a
557	   16-bit unsigned integer field with the Session ID (SID) value.

559	       0                   1                   2                   3
560	       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
561	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
562	      |1 0|         reserved          |       Session ID (SID)        |
563	      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

565	                  Figure 3: Session ID layer shim header
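
	   A minimal sketch of building and parsing the shim header of
	   Figure 3 is given below.  It assumes that the 14 reserved bits are
	   sent as zero and ignored on receipt, which this memo does not
	   state; the function names are hypothetical.

```python
import struct
from typing import Tuple

SHIM_FIXED_BITS = 0b10  # the two fixed leading bits shown in Figure 3
SHIM_LENGTH = 4         # octets


def pack_shim(sid: int) -> bytes:
    """Return the 4-octet shim header for the given Session ID."""
    if not 0 <= sid <= 0xFFFF:
        raise ValueError("SID must fit in 16 bits")
    # Assumption: reserved bits are transmitted as zero.
    first16 = SHIM_FIXED_BITS << 14
    return struct.pack("!HH", first16, sid)


def unpack_shim(datagram: bytes) -> Tuple[int, bytes]:
    """Split a datagram into (SID, encapsulated RTP/RTCP packet)."""
    if len(datagram) < SHIM_LENGTH:
        raise ValueError("datagram shorter than shim header")
    first16, sid = struct.unpack("!HH", datagram[:SHIM_LENGTH])
    if first16 >> 14 != SHIM_FIXED_BITS:
        raise ValueError("first two bits are not 10")
    # Assumption: reserved bits are ignored on receipt.
    return sid, datagram[SHIM_LENGTH:]
```

	   For example, pack_shim(10000) yields the octets 80 00 27 10
	   (hexadecimal), and unpack_shim() applied to that header followed by
	   a packet returns 10000 and the packet.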

567	   Each RTP session being multiplexed on top of a given transport layer
568	   is assigned either a single unique SID or a pair of unique SIDs in the
569	   range 0-65535.  The reason for assigning a pair of SIDs to a given RTP
570	   session is to allow RTP sessions that don't support "Multiplexing RTP
571	   Data and Control Packets on a Single Port" [RFC5761] to still be able
572	   to use a single 5-tuple.  The reason for supporting this extra
573	   functionality is that RTP and RTCP multiplexing based on the payload
574	   type/packet type fields enforces certain restrictions on the RTP
575	   sessions.  These restrictions might not be acceptable.  As this
576	   solution does not have these restrictions, performing RTP and RTCP
577	   multiplexing in this way has benefits.

579	   Each Session ID value space is scoped by the underlying transport
580	   protocol.  Common transport protocols like UDP [RFC0768], DCCP
581	   [RFC4340], TCP [RFC0793], and SCTP [RFC4960] can all be scoped by one
582	   or more 5-tuples (Transport protocol, source address and port,
583	   destination address and port).  The case of multiple 5-tuples occurs
584	   in multi-unicast topologies, also called meshed multiparty RTP
585	   sessions, or in case any application would need more than 32768 RTP
586	   sessions.

588	      0                   1                   2                   3
589	      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
590	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
591	     |1 0|         reserved          |       Session ID (SID)        |
592	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<+
593	     |V=2|P|X|  CC   |M|     PT      |       sequence number         | |
594	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
595	     |                           timestamp                           | |
596	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
597	     |           synchronization source (SSRC) identifier            | |
598	     +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ |
599	     |            contributing source (CSRC) identifiers             | |
600	     |                               ....                            | |
601	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
602	     |                   RTP extension (OPTIONAL)                    | |
603	   +>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
604	   | |                          payload  ...                         | |
605	   | |                               +-------------------------------+ |
606	   | |                               | RTP padding   | RTP pad count | |
607	   +>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<+
608	   | ~                     SRTP MKI (OPTIONAL)                       ~ |
609	   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
610	   | :                 authentication tag (RECOMMENDED)              : |
611	   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
612	   +- Encrypted Portion*                      Authenticated Portion ---+

614	          Figure 4: SRTP Packet encapsulated by Session ID Layer

616	      0                   1                   2                   3
617	      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
618	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
619	     |1 0|         reserved          |       Session ID (SID)        |
620	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<+
621	     |V=2|P|    RC   |   PT=SR or RR |               length          | |
622	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
623	     |                         SSRC of sender                        | |
624	   +>+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ |
625	   | ~                          sender info                          ~ |
626	   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
627	   | ~                         report block 1                        ~ |
628	   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
629	   | ~                         report block 2                        ~ |
630	   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
631	   | ~                              ...                              ~ |
632	   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
633	   | |V=2|P|    SC   |  PT=SDES=202  |             length            | |
634	   | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ |
635	   | |                          SSRC/CSRC_1                          | |
636	   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
637	   | ~                           SDES items                          ~ |
638	   | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ |
639	   | ~                              ...                              ~ |
640	   +>+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ |
641	   | |E|                         SRTCP index                         | |
642	   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<+
643	   | ~                     SRTCP MKI (OPTIONAL)                      ~ |
644	   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
645	   | :                     authentication tag                        : |
646	   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
647	   +-- Encrypted Portion                    Authenticated Portion -----+

649	          Figure 5: SRTCP packet encapsulated by Session ID layer

651	   The processing in a receiver when the Session ID layer is present
652	   will be to

654	   1.  Pick up the packet from the lower layer transport

656	   2.  Inspect the SID field value

658	   3.  Strip the SID field from the packet

660	   4.  Forward it to the (S)RTP Session context identified by the SID
661	       value
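
	   The receiver steps above can be sketched as a small demultiplexer.
	   The class and method names are hypothetical, and the choice to
	   silently drop packets with an unknown SID or a malformed shim is an
	   assumption of this sketch; this memo does not specify that
	   behaviour.

```python
import struct
from typing import Callable, Dict


class ShimDemux:
    """Dispatch shim-prefixed datagrams to per-session handlers."""

    def __init__(self) -> None:
        self._sessions: Dict[int, Callable[[bytes], None]] = {}

    def register(self, sid: int, handler: Callable[[bytes], None]) -> None:
        """Associate a SID with an (S)RTP session context handler."""
        self._sessions[sid] = handler

    def receive(self, datagram: bytes) -> bool:
        """Process one datagram; return True if it was delivered."""
        # Step 1: the caller has picked up the datagram from the
        # lower layer transport and passes it in here.
        if len(datagram) < 4:
            return False
        # Step 2: inspect the SID field value.
        first16, sid = struct.unpack("!HH", datagram[:4])
        if first16 >> 14 != 0b10:
            return False
        handler = self._sessions.get(sid)
        if handler is None:
            return False  # assumption: unknown SIDs are dropped
        # Steps 3 and 4: strip the shim and forward the packet to
        # the session context identified by the SID.
        handler(datagram[4:])
        return True
```

	   A handler would typically be the entry point of the SRTP or RTP
	   stack for the session in question.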

663	6.2.  Signalling

665	   There are several aspects to negotiating the use of multiple RTP
666	   sessions multiplexed onto a single transport layer flow within SDP.
667	   Firstly, the SDP offer needs to indicate the desire to use the shim-
668	   based multiplexing scheme and suggest a transport layer port for the
669	   multiplex.  Then, if the answering party agrees to use the shim, they
670	   need to agree on the transport layer port to use, and assign session
671	   ID values for the individual RTP sessions.  This all needs to be done
672	   in a manner that allows graceful fall back to separate RTP sessions,
673	   or a single bundled RTP session.

675	   This section defines how to negotiate the use of the Session ID shim
676	   layer, using the SDP [RFC4566] offer/answer model [RFC3264].  A new
677	   SDP grouping semantics is defined, "SHIM", along with a new media
678	   type to represent the shim layer.  The grouping semantics allow each
679	   media description ("m=" line) associated with a 'SHIM' group to be
680	   identified, and associated with the multiplexed transport flow.

682	   When it is desired to use multiple RTP sessions multiplexed over a
683	   single lower layer transport flow, the SDP offer will contain one
684	   "m=" line for each RTP session, plus one additional "m=" line
685	   representing the transport layer flow to be used for the multiplex.
686	   The "m=" lines that represent the media flows will be created as if
687	   the multiplex was not present, with transport layer ports assigned in
688	   the usual manner.  The "m=" line representing the multiplex will also
689	   have a transport layer port assigned, and will use the "application/
690	   rtp-shim" media type running over UDP (i.e., it will be signalled as
691	   "m=application <port> udp rtp-shim" in the SDP).  All the "m=" lines
692	   representing the media flows and the multiplexing shim will be part
693	   of an SDP group, with "SHIM" semantics.

695	   There MUST be exactly one "m=" line representing an RTP multiplex in
696	   each "SHIM" group in the SDP offer.  If the offer contains more than
697	   one "m=" line representing an RTP multiplex in a single "SHIM" group,
698	   then the answering party MUST reject all the RTP multiplexes in that
699	   "SHIM" group.  A "SHIM" group that does not include any "m=" line
700	   representing an RTP multiplex is malformed; the answering party MUST
701	   reject all "m=" lines in that "SHIM" group.

703	   If the answering party does not understand, or does not want to use,
704	   the RTP multiplexing shim, it will reject the "m=" line for the flow
705	   representing the multiplex.  This is done by setting the port for
706	   that "m=" line to zero in the answer.  The endpoints will then fall
707	   back to using separate RTP sessions for each "m=" line, with separate
708	   transport layer flows for each on the assigned ports.

710	   If the answering party chooses to use the multiplexing shim, it will
711	   return an answer that includes a valid port for the multiplex.  The
712	   ports for the other media lines in the SHIM group that the answering
713	   party wants to accept MUST be set to port 9 (the discard port) to
714	   indicate that the media for those ports is to be sent as part of the
715	   multiplex (the intuition is that the separate port is discarded, and
716	   only the multiplex remains).  Ports for some "m=" lines in the SHIM
717	   group MAY be set to zero to reject some or all of the flows in the
718	   group.

720	      (tbd: it is an open issue whether the answering party is allowed
721	      to accept some "m=" lines from the SHIM group into the multiplex
722	      while sending others as separate flows on their own ports)

724	   If the multiplex was accepted, multiplexed media corresponding to the
725	   "m=" lines whose port was set to 9 in the answer will start to flow.
726	   This multiplexed media MUST use the shim on the transport layer ports
727	   corresponding to the "m=" line of the multiplexing shim.  The session
728	   identifiers used in the shim MUST match the ports that were included
729	   in the "m=" lines in the offer.  The transport layer ports included
730	   in those "m=" lines MUST NOT be used for media, and the offering
731	   party SHOULD issue a follow-up offer closing down the "m=" lines used
732	   for those ports (i.e., setting the ports in their "m=" line to 9) and
733	   keeping just the multiplex.
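
	   The port conventions described in this section can be illustrated
	   with the sketch below.  It takes the ports of the "m=" lines in a
	   "SHIM" group, keyed by their "a=mid" value, and derives the outcome
	   of the negotiation.  The function name, argument layout and return
	   value are hypothetical.  Answer ports other than 0 and 9 for media
	   lines are left uninterpreted when the multiplex is accepted, since
	   that case is an open issue in this memo.

```python
from typing import Dict, List, Optional, Tuple

DISCARD_PORT = 9  # marks an "m=" line as carried in the multiplex


def interpret_answer(
    offer_ports: Dict[str, int],
    answer_ports: Dict[str, int],
    shim_mid: str,
) -> Tuple[Optional[int], Dict[str, int], List[str]]:
    """Return (multiplex port or None, {mid: SID}, rejected mids)."""
    rejected = [
        mid for mid, port in answer_ports.items()
        if mid != shim_mid and port == 0
    ]
    mux_port = answer_ports[shim_mid]
    if mux_port == 0:
        # Multiplex rejected: fall back to separate transport flows.
        return None, {}, rejected
    # Multiplex accepted: the SID of each accepted media line is the
    # port that the offer assigned to that line.
    sids = {
        mid: offer_ports[mid]
        for mid, port in answer_ports.items()
        if mid != shim_mid and port == DISCARD_PORT
    }
    return mux_port, sids, rejected
```

	   With an offer using ports 10000 (audio), 10002 (video) and 10004
	   (shim), an answer that sets the media ports to 9 and keeps 10004
	   for the shim results in SIDs 10000 and 10002 on port 10004.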

735	      (tbd: an alternative would be for the answer to reject all except
736	      the multiplex stream by setting their ports to zero, but include
737	      an attribute for each rejected "m=" line to indicate whether it is
738	      to form part of the multiplex.  This can perhaps be expected to
739	      work better with middleboxes, but is a more significant change to
740	      offer/answer processing at the endpoints.)

742	6.3.  SRTP Key Management

744	   Key management for SRTP needs discussion, as we cause multiple SRTP
745	   sessions to exist on the same underlying transport flow.  Thus we
746	   need to ensure that each key management mechanism is still properly
747	   associated with the SRTP session context it intends to key.  To
748	   ensure that, we look at the three SRTP key management mechanisms that
749	   the IETF has specified, one after another.

751	6.3.1.  Security Description

753	   Session Description Protocol (SDP) Security Descriptions for Media
754	   Streams [RFC4568], being based on SDP, has no issue with the RTP
755	   session multiplexing on a lower layer specified here.  The reason is
756	   that the actual keying is done using a media level SDP attribute.
757	   Thus the attribute is already associated with a particular media
758	   description.  That media description will also have an instance of
759	   the "a=session-mux-id" attribute carrying the SID value/pair used
760	   with these particular crypto parameters.

762	6.3.2.  DTLS-SRTP

764	   Datagram Transport Layer Security (DTLS) Extension to Establish Keys
765	   for the Secure Real-time Transport Protocol (SRTP) [RFC5764] is a
766	   keying mechanism that works on the media plane on the same lower
767	   layer transport that SRTP/SRTCP will be transported over.

769	   The most direct solution would be to apply the SHIM and the SID
770	   context identifier also to DTLS packets, thus using the same SID
771	   that is used with RTP and/or RTCP also for the DTLS messages
772	   intended to key that particular SRTP and/or SRTCP flow(s).  This of
773	   course requires independent usage of DTLS-SRTP for each RTP session.
774	   In addition it requires changing the layering for DTLS-SRTP as well
775	   as RTP.  Thus this behaviour gains nothing in regards to key
776	   management when using the SHIM, and has some costs.

778	   Instead we propose that a DTLS-SRTP key-derivation change is
779	   introduced.  By including the Session ID value in the derivation of
780	   the keying material a single DTLS-SRTP key-management operation could
781	   apply keys and parameters for all the RTP sessions in the same
782	   transport flow.  Thus the keying cost is significantly reduced,
783	   especially in regards to network communication and delay impact and
784	   vulnerability to packet loss.

786	   Details to be written up.

788	6.3.3.  MIKEY

790	   MIKEY: Multimedia Internet KEYing [RFC3830] is a key management
791	   protocol that has several transports.  In some cases it is used
792	   directly on a transport protocol such as UDP, but there is also a
793	   specification for how MIKEY is used with SDP "Key Management
794	   Extensions for Session Description Protocol (SDP) and Real Time
795	   Streaming Protocol (RTSP)" [RFC4567].

797	   Let's start with the latter, i.e., the SDP transport, which shares
798	   the property with Security Descriptions that it can be associated
799	   with a particular media description in an SDP.  As long as one avoids
800	   using the session level attribute one can be certain to correctly
801	   associate the key exchange with a given SRTP/SRTCP context.

803	   It does appear that MIKEY directly over a lower layer transport
804	   protocol will have similar issues as DTLS.

806	6.4.  Examples

808	6.4.1.  Secure RTP Packet with Multiplexing Shim

810	   The figure below contains an example Secure RTP packet with the RTP
811	   multiplexing shim header, encapsulated by a UDP packet.  The RTP
812	   multiplexing shim immediately follows the UDP header, and is followed
813	   by the encapsulated secure RTP packet.  The Secure RTP authentication
814	   tag protects the RTP packet only; it does not authenticate the RTP
815	   multiplexing shim or the UDP headers.

817	      0                   1                   2                   3
818	      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
819	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
820	     | Source Port                   | Destination Port              | U
821	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ D
822	     | Length                        | Checksum                      | P
823	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
824	     |1 0|         reserved          |       Session ID (SID)        |
825	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<+
826	     |V=2|P|X|  CC   |M|     PT      |       sequence number         | |
827	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
828	     |                           timestamp                           | |
829	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
830	     |           synchronization source (SSRC) identifier            | |
831	     +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ |
832	     |            contributing source (CSRC) identifiers             | |
833	     |                               ....                            | |
834	     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
835	     |                   RTP extension (OPTIONAL)                    | |
836	   +>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
837	   | |                          payload  ...                         | |
838	   | |                               +-------------------------------+ |
839	   | |                               | RTP padding   | RTP pad count | |
840	   +>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<+
841	   | ~                     SRTP MKI (OPTIONAL)                       ~ |
842	   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
843	   | :                 authentication tag (RECOMMENDED)              : |
844	   | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
845	   +- Encrypted Portion*                      Authenticated Portion ---+

847	               SRTP Packet Encapsulated by Session ID Layer

849	6.4.2.  Basic RTP Multiplex Negotiation in SDP

851	   This section contains SDP offer/answer examples.  In the SDP offer
852	   below, one audio and one video stream are offered.  The audio is using
853	   session identifier 10000, and the video is using session identifier
854	   10002.  If the answer were to reject the "m=application...rtp-shim"
855	   line, then separate RTP sessions would be set up for the audio and
856	   video on ports 10000 and 10002 respectively.

858	   v=0
859	   o=alice 2890844526 2890844526 IN IP4 atlanta.example.com
860	   s=
861	   c=IN IP4 atlanta.example.com
862	   t=0 0
863	   a=group:SHIM foo bar baz
864	   m=audio 10000 RTP/AVP 0 8 97
865	   b=AS:200
866	   a=mid:foo
867	   a=rtpmap:0 PCMU/8000
868	   a=rtpmap:8 PCMA/8000
869	   a=rtpmap:97 iLBC/8000
870	   m=video 10002 RTP/AVP 31 32
871	   b=AS:1000
872	   a=mid:bar
873	   a=rtpmap:31 H261/90000
874	   a=rtpmap:32 MPV/90000
875	   m=application 10004 udp rtp-shim
876	   a=mid:baz

878	   The SDP answer from an end-point that supports the RTP multiplexing
879	   shim follows.  Note that the ports on the audio and video lines are
880	   set to 9, to indicate that these flows are included in the multiplex.
881	   The port of the m= line corresponding to the multiplex is set to the
882	   transport port used for the multiplex.

884	   v=0
885	   o=bob 2808844564 2808844564 IN IP4 biloxi.example.com
886	   s=
887	   c=IN IP4 biloxi.example.com
888	   t=0 0
889	   a=group:SHIM foo bar baz
890	   m=audio 9 RTP/AVP 0
891	   b=AS:200
892	   a=mid:foo
893	   a=rtpmap:0 PCMU/8000
894	   m=video 9 RTP/AVP 32
895	   b=AS:1000
896	   a=mid:bar
897	   a=rtpmap:32 MPV/90000
898	   m=application 10004 udp rtp-shim
899	   a=mid:baz

901	   The SDP answer from an end-point that does not support this SHIM
902	   follows.  The ports for the audio and video lines are kept, and the
903	   port is set to 0 in the "m=" line corresponding to the multiplex.

905	   v=0
906	   o=bob 2808844564 2808844564 IN IP4 biloxi.example.com
907	   s=
908	   c=IN IP4 biloxi.example.com
909	   t=0 0
910	   a=group:SHIM foo bar baz
911	   m=audio 10000 RTP/AVP 0
912	   b=AS:200
913	   a=mid:foo
914	   a=rtpmap:0 PCMU/8000
915	   m=video 10002 RTP/AVP 32
916	   b=AS:1000
917	   a=mid:bar
918	   a=rtpmap:32 MPV/90000
919	   m=application 0 udp rtp-shim
920	   a=mid:baz

922	6.4.3.  Advanced RTP Multiplex Negotiation in SDP

924	   (tbd: add more examples)

926	7.  Open Issues

928	   This work is still at a relatively early phase.  This section
929	   contains a list of open issues where the author desires some input.

931	   1.  In Section 6.2 there is a discussion of which parameters need
932	       to be configured.  The scope of these rules, and whether they
933	       make sense, needs additional discussion.

935	   2.  Can we provide better control so that applications that don't
936	       desire fall back to a single RTP session, when the multiplexing
937	       shim is not supported but Bundle is, end up with a better
938	       alternative?

940	   3.  The details for how to do key-derivation, preferably in such a
941	       way that it can be reused by multiple key-management solutions
942	       like MIKEY and DTLS-SRTP.

944	   4.  The signalling solution will be revisited when the BUNDLE
945	       solution discussion has yielded some result.

947	8.  IANA Considerations

949	   (tbd: register the application/rtp-shim media type)

951	   (tbd: register the "SHIM" semantics for the RTP grouping framework)

953	9.  Security Considerations

955	   The security properties of the Session ID layer depend on what
956	   mechanism is used to protect the RTP and RTCP packets of a given RTP
957	   session.  If IPsec or transport layer security solutions such as DTLS
958	   or TLS are being used then both the encapsulated RTP/RTCP packets and
959	   the session ID layer will be protected by that security mechanism,
960	   thus potentially providing confidentiality, integrity and source
961	   authentication.  If SRTP is used, the session ID layer will not be
962	   directly protected by SRTP.  However, it will be implicitly integrity
963	   protected (assuming the RTP/RTCP packet is integrity protected) as
964	   the only function of the field is to identify the session context.
965	   Thus any modification of the SID field will cause the receiver to
966	   attempt to retrieve the wrong SRTP crypto context.  If that retrieval
967	   fails, the packet will be discarded anyway.  If it is successful, the
968	   context will not lead to successful verification of the packet.
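
	   The implicit protection argument can be illustrated with the
	   simplified sketch below.  It is not SRTP: it stands in for SRTP
	   authentication with a plain HMAC-SHA1 tag truncated to 10 octets
	   over the encapsulated packet only, with one key per SID.  All
	   names are hypothetical.  The point shown is that the tag does not
	   cover the shim, yet a modified SID still leads to a failed lookup
	   or a failed verification.

```python
import hashlib
import hmac
import struct
from typing import Dict

TAG_LEN = 10  # octets; stand-in for an SRTP authentication tag


def protect(sid: int, key: bytes, packet: bytes) -> bytes:
    """Prefix the shim and append a tag over the packet only."""
    tag = hmac.new(key, packet, hashlib.sha1).digest()[:TAG_LEN]
    return struct.pack("!HH", 0b10 << 14, sid) + packet + tag


def verify(contexts: Dict[int, bytes], datagram: bytes) -> bool:
    """Look up the context by SID and check the tag."""
    if len(datagram) < 4 + TAG_LEN:
        return False
    _, sid = struct.unpack("!HH", datagram[:4])
    key = contexts.get(sid)
    if key is None:
        return False  # no such context: packet is discarded
    packet, tag = datagram[4:-TAG_LEN], datagram[-TAG_LEN:]
    expected = hmac.new(key, packet, hashlib.sha1).digest()[:TAG_LEN]
    return hmac.compare_digest(tag, expected)
```

	   Rewriting the SID of a protected datagram to that of another
	   session makes verify() use the other session's key, so the tag
	   check fails; rewriting it to an unassigned SID fails at lookup.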

970	10.  Acknowledgements

972	   This memo is based on the input from various people, especially in
973	   the context of the RTCWEB discussion of how to use only a single
974	   lower layer transport.  The RTP and RTCP packet figures are borrowed
975	   from RFC3711.  The SDP example is extended from the one present in
976	   [I-D.ietf-mmusic-sdp-bundle-negotiation].  Eric Rescorla contributed
977	   the basic idea of optimizing the DTLS-SRTP key-management by
978	   modifying the key derivation process.

980	   The proposal in Appendix A.5 was originally suggested by Colin
981	   Perkins.  The idea in Appendix A.6 is from an Internet Draft
982	   [I-D.rosenberg-rtcweb-rtpmux] written by Jonathan Rosenberg et al.
983	   The proposal in Appendix A.3 is a result of discussion by a group of
984	   people at IETF meeting #81 in Quebec.

986	11.  References
987	11.1.  Normative References

989	   [I-D.ietf-mmusic-sdp-bundle-negotiation]
990	              Holmberg, C., Alvestrand, H., and C. Jennings,
991	              "Multiplexing Negotiation Using Session Description
992	              Protocol (SDP) Port Numbers", draft-ietf-mmusic-sdp-
993	              bundle-negotiation-05 (work in progress), October 2013.

995	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
996	              Requirement Levels", BCP 14, RFC 2119, March 1997.

998	   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
999	              with Session Description Protocol (SDP)", RFC 3264, June
1000	              2002.

1002	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
1003	              Jacobson, "RTP: A Transport Protocol for Real-Time
1004	              Applications", STD 64, RFC 3550, July 2003.

1006	   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
1007	              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
1008	              RFC 3711, March 2004.

1010	   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
1011	              Description Protocol", RFC 4566, July 2006.

1013	   [RFC5234]  Crocker, D. and P. Overell, "Augmented BNF for Syntax
1014	              Specifications: ABNF", STD 68, RFC 5234, January 2008.

1016	11.2.  Informational References

1018	   [I-D.ietf-avtcore-multi-media-rtp-session]
1019	              Westerlund, M., Perkins, C., and J. Lennox, "Sending
1020	              Multiple Types of Media in a Single RTP Session", draft-
1021	              ietf-avtcore-multi-media-rtp-session-03 (work in
1022	              progress), July 2013.

1024	   [I-D.ietf-avtcore-multiplex-guidelines]
1025	              Westerlund, M., Perkins, C., and H. Alvestrand,
1026	              "Guidelines for using the Multiplexing Features of RTP to
1027	              Support Multiple Media Streams", draft-ietf-avtcore-
1028	              multiplex-guidelines-01 (work in progress), July 2013.

1030	   [I-D.lennox-rtcweb-rtp-media-type-mux]
1031	              Rosenberg, J. and J. Lennox, "Multiplexing Multiple Media
1032	              Types In a Single Real-Time Transport Protocol (RTP)
1033	              Session", draft-lennox-rtcweb-rtp-media-type-mux-00 (work
1034	              in progress), October 2011.

1036	   [I-D.rosenberg-rtcweb-rtpmux]
1037	              Rosenberg, J., Jennings, C., Peterson, J., Kaufman, M.,
1038	              Rescorla, E., and T. Terriberry, "Multiplexing of Real-
1039	              Time Transport Protocol (RTP) Traffic for Browser based
1040	              Real-Time Communications (RTC)", draft-rosenberg-rtcweb-
1041	              rtpmux-00 (work in progress), July 2011.

1043	   [RFC0768]  Postel, J., "User Datagram Protocol", STD 6, RFC 768,
1044	              August 1980.

1046	   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7, RFC
1047	              793, September 1981.

1049	   [RFC3830]  Arkko, J., Carrara, E., Lindholm, F., Naslund, M., and K.
1050	              Norrman, "MIKEY: Multimedia Internet KEYing", RFC 3830,
1051	              August 2004.

1053	   [RFC4340]  Kohler, E., Handley, M., and S. Floyd, "Datagram
1054	              Congestion Control Protocol (DCCP)", RFC 4340, March 2006.

1056	   [RFC4567]  Arkko, J., Lindholm, F., Naslund, M., Norrman, K., and E.
1057	              Carrara, "Key Management Extensions for Session
1058	              Description Protocol (SDP) and Real Time Streaming
1059	              Protocol (RTSP)", RFC 4567, July 2006.

1061	   [RFC4568]  Andreasen, F., Baugher, M., and D. Wing, "Session
1062	              Description Protocol (SDP) Security Descriptions for Media
1063	              Streams", RFC 4568, July 2006.

1065	   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
1066	              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
1067	              July 2006.

1069	   [RFC4960]  Stewart, R., "Stream Control Transmission Protocol", RFC
1070	              4960, September 2007.

1072	   [RFC5109]  Li, A., "RTP Payload Format for Generic Forward Error
1073	              Correction", RFC 5109, December 2007.

1075	   [RFC5245]  Rosenberg, J., "Interactive Connectivity Establishment
1076	              (ICE): A Protocol for Network Address Translator (NAT)
1077	              Traversal for Offer/Answer Protocols", RFC 5245, April
1078	              2010.

1080	   [RFC5285]  Singer, D. and H. Desineni, "A General Mechanism for RTP
1081	              Header Extensions", RFC 5285, July 2008.

1083	   [RFC5389]  Rosenberg, J., Mahy, R., Matthews, P., and D. Wing,
1084	              "Session Traversal Utilities for NAT (STUN)", RFC 5389,
1085	              October 2008.

1087	   [RFC5506]  Johansson, I. and M. Westerlund, "Support for Reduced-Size
1088	              Real-Time Transport Control Protocol (RTCP): Opportunities
1089	              and Consequences", RFC 5506, April 2009.

1091	   [RFC5761]  Perkins, C. and M. Westerlund, "Multiplexing RTP Data and
1092	              Control Packets on a Single Port", RFC 5761, April 2010.

1094	   [RFC5764]  McGrew, D. and E. Rescorla, "Datagram Transport Layer
1095	              Security (DTLS) Extension to Establish Keys for the Secure
1096	              Real-time Transport Protocol (SRTP)", RFC 5764, May 2010.

1098	Appendix A.  Possible Solutions

1100	   This section documents the solutions explored when selecting a SHIM
1101	   based one and discusses their feasibility.

1103	A.1.  Header Extension

1105	   One proposal is to define an RTP header extension [RFC5285] that
1106	   explicitly enumerates the session identifier in each packet.  This
1107	   proposal has some merits regarding RTP, since it uses an existing
1108	   extension mechanism; it explicitly enumerates the session allowing
1109	   for third parties to associate the packet to a given RTP session; and
1110	   it works with SRTP as currently defined since a header extension is
1111	   by default not encrypted, and is thus readable by the receiving stack
1112	   without needing to guess which session it belongs to and attempt to
1113	   decrypt it.  This approach does, however, conflict with the
1114	   requirement from [RFC5285] that "header extensions using this
1115	   specification MUST only be used for data that can be safely ignored
1116	   by the recipient", since correct processing of the received packet
1117	   depends on using the header extension to demultiplex it to the
1118	   correct RTP session.

1120	   Using a header extension also results in the session ID being in the
1121	   integrity-protected part of the packet.  Thus a translator between
1122	   multiplexed and non-multiplexed domains has the following options:

1124	   1.  to be part of the security context to verify the field

1126	   2.  to be part of the security context to verify the field and remove
1127	       it before forwarding the packet

1129	   3.  to be outside of the security context and leave the header
1130	       extension in the packet.  However, that requires successful
1131	       negotiation of the header extension, but not of the
1132	       functionality, with the receiving end-points.

1134	   The biggest existing hurdle for this solution is that there exists no
1135	   header extension field in RTCP packets.  This requires defining a
1136	   solution for RTCP that allows carrying the explicit indicator,
1137	   preferably in a position that isn't encrypted by SRTCP.  However, the
1138	   current SRTCP definition does not offer such a position in the
1139	   packet.

1141	   Modifying the RR or SR packets is possible using profile specific
1142	   extensions.  However, that has issues when it comes to deployment and
1143	   in addition any information placed there would end up in the
1144	   encrypted part.

1146	   Another alternative could be to define another RTCP packet type that
1147	   only contains the common header, using the 5 bits in the first byte
1148	   of the common header to carry a session id.  That would allow SRTCP
1149	   to work correctly as long as it accepts this new packet type being the
1150	   first in the packet.  Allowing a non-SR/RR packet as the first packet
1151	   in a compound RTCP packet is also needed if an implementation is to
1152	   support Reduced Size RTCP packets [RFC5506].  The remaining downside
1153	   with this is that all stack implementations supporting multiplexing
1154	   would need to modify their RTCP compound packet rules to include this
1155	   packet type first.  Thus a translator box between supporting nodes
1156	   and non-supporting nodes needs to be in the crypto context.

1158	   This solution's per-packet overhead is expected to be 64 bits for
1159	   RTCP.  For RTP it is 64 bits if no header extension was otherwise
1160	   used, and an additional 16 bits (short header), or 24 bits plus (if
1161	   needed) padding to the next 32-bit boundary if other header
1162	   extensions are used.
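
	   As an illustration only, the RTP part of this arithmetic can be
	   sketched in Python.  The sketch assumes a one-octet session ID and
	   the RFC 5285 layout (a 4-byte extension block header, elements with
	   a one-byte or two-byte element header, and padding to a 32-bit
	   boundary).  The function and constant names are hypothetical.

```python
EXT_BLOCK_HEADER = 4   # 16-bit profile marker + 16-bit length field (RFC 5285)
SESSION_ID_LEN = 1     # assumption of this sketch: a one-octet session ID


def pad_to_word(n_bytes: int) -> int:
    """Round a byte count up to the next 32-bit (4-byte) boundary."""
    return (n_bytes + 3) // 4 * 4


def rtp_overhead_bytes(existing_ext_bytes: int, two_byte_form: bool = False) -> int:
    """Extra bytes per RTP packet caused by adding the session ID element.

    existing_ext_bytes is the unpadded size of the extension elements that
    are already carried (0 means no header extension was in use before).
    """
    element = (2 if two_byte_form else 1) + SESSION_ID_LEN
    if existing_ext_bytes == 0:
        before = 0
    else:
        before = EXT_BLOCK_HEADER + pad_to_word(existing_ext_bytes)
    after = EXT_BLOCK_HEADER + pad_to_word(existing_ext_bytes + element)
    return after - before


if __name__ == "__main__":
    print(rtp_overhead_bytes(0))   # no header extension used before
    print(rtp_overhead_bytes(4))   # 4 bytes of elements already present
```

	   With no prior header extension the sketch yields 8 bytes, i.e. the
	   64 bits quoted above.  When other elements are present, the cost
	   depends on how much padding the existing block already contains.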

1164	A.2.  Multiplexing Shim

1166	   This proposal is to prefix or postfix all RTP and RTCP packets with a
1167	   session ID field.  This field would be outside of the normal RTP and
1168	   RTCP packets, thus having no impact on the RTP and RTCP packets and
1169	   their processing.  An additional step of demultiplexing processing
1170	   would be added prior to RTP stack processing to determine in which
1171	   RTP session context the packet is to be included.  This also has no
1172	   impact on SRTP/SRTCP as the shim layer would be outside of its
1173	   protection context.  The shim layer's session ID is however
1174	   implicitly integrity protected as any error in the field will result
1175	   in the packet being placed in the wrong or non-existing context, thus
1176	   resulting in an integrity failure if processed by SRTP/SRTCP.

1178	   This proposal is quite simple to implement in any gateway or
1179	   translating device that goes from a multiplexed to a non-multiplexed
1180	   domain or vice versa, as only an additional field needs to be added
1181	   to or removed from the packet.

1183	   The main downside of this proposal is that it is very likely to
1184	   trigger a firewall response from any deep packet inspection device.
1185	   If the field is prefixed, the RTP fields do not match the
1186	   heuristics (unless the shim is designed to look like an RTP
1187	   header, in which case the payload length is unlikely to match the
1188	   expected value), which likely prevents classification of the
1189	   packet as an RTP packet.  If it is postfixed, it is likely classified
1190	   as an RTP packet but might not correctly validate if the content
1191	   validation is such that the payload length is expected to match
1192	   certain values.  It is expected that a postfixed shim will be less
1193	   problematic than a prefixed shim in this regard, but we are lacking
1194	   hard data on this.

1196	   This solution's per packet overhead is 1 byte.
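
	   A minimal sketch of the shim handling, in Python, is given below
	   for illustration.  It assumes a one-byte session ID (matching the
	   1 byte overhead) and leaves the RTP or RTCP packet itself untouched.
	   The function names and the prefix flag are hypothetical and do not
	   describe a wire format defined by this document.

```python
from typing import Tuple


def add_shim(session_id: int, packet: bytes, prefix: bool = False) -> bytes:
    """Attach a one-byte session ID before or after an RTP/RTCP packet."""
    if not 0 <= session_id <= 255:
        raise ValueError("session ID must fit in one byte")
    shim = bytes([session_id])
    return shim + packet if prefix else packet + shim


def remove_shim(datagram: bytes, prefix: bool = False) -> Tuple[int, bytes]:
    """Split a received datagram into (session ID, original packet)."""
    if len(datagram) < 2:
        raise ValueError("datagram too short to carry a shim")
    if prefix:
        return datagram[0], datagram[1:]
    return datagram[-1], datagram[:-1]


if __name__ == "__main__":
    rtp = bytes([0x80, 96]) + bytes(10) + b"media"
    wire = add_shim(3, rtp)             # postfixed shim
    session, inner = remove_shim(wire)  # demultiplex before RTP processing
    print(session, inner == rtp)
```

	   A gateway between a multiplexed and a non-multiplexed domain would
	   only need the equivalent of these two operations, since the inner
	   packet, including any SRTP protection, is forwarded unchanged.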

1198	A.3.  Single Session

1200	   Given the difficulty of multiplexing several RTP sessions onto a
1201	   single lower-layer transport, it's tempting to send multiple media
1202	   streams in a single RTP session.  Doing this avoids the need to de-
1203	   multiplex several sessions on a single transport, but at the cost of
1204	   losing the RTP session as a separator for different types of streams.
1205	   Lacking different RTP sessions to demultiplex incoming packets, a
1206	   receiver will have to dig deeper into the packet before determining
1207	   what to do with it.  Care has to be taken in that inspection.  For
1208	   example, it is important to be careful to ensure that each real media
1209	   source uses its own SSRC in the session and that this SSRC doesn't
1210	   change media type.

1212	   The loss of the RTP session as a separator for different usages or
1213	   purposes would be a minor issue if the only difference between the
1214	   RTP sessions is the media type.  In this case, the application could
1215	   use the Payload Type field to identify the media type.  The loss of
1216	   the RTP Session functionality is, however, severe if the application
1217	   uses the RTP Session for separating different treatments, contexts,
1218	   etc.  Additional signalling would then be needed to bind the different
1219	   sources to groups which can help make the necessary distinctions.
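
	   The Payload Type based separation, together with the check that an
	   SSRC does not change media type, could look roughly like the
	   following Python sketch.  The payload type numbers in PT_TO_MEDIA
	   and all function names are hypothetical; real assignments would
	   come from signalling.

```python
import struct

# Hypothetical payload type assignments for this sketch only.
PT_TO_MEDIA = {96: "video", 111: "audio"}


def rtp_header(pt: int, seq: int, timestamp: int, ssrc: int) -> bytes:
    """Build a minimal 12-byte RTP fixed header (version 2, no CSRCs)."""
    return struct.pack("!BBHII", 0x80, pt & 0x7F, seq, timestamp, ssrc)


def classify(packet: bytes, ssrc_media: dict) -> str:
    """Return the media type of a packet in a single RTP session.

    ssrc_media remembers the media type first seen for each SSRC, so a
    source that changes media type is detected and rejected.
    """
    if len(packet) < 12:
        raise ValueError("shorter than the fixed RTP header")
    pt = packet[1] & 0x7F                      # low 7 bits; top bit is the marker
    ssrc = struct.unpack("!I", packet[8:12])[0]
    media = PT_TO_MEDIA.get(pt)
    if media is None:
        raise ValueError("unknown payload type")
    if ssrc_media.setdefault(ssrc, media) != media:
        raise ValueError("SSRC changed media type")
    return media


if __name__ == "__main__":
    seen = {}
    print(classify(rtp_header(111, 1, 0, 0xAAAA), seen))
    print(classify(rtp_header(96, 1, 0, 0xBBBB), seen))
```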

1221	   However, the loss of the RTP session as separator is not the only
1222	   issue with this approach.  The RTP Multiplexing Architecture
1223	   [I-D.ietf-avtcore-multiplex-guidelines] discusses a number of issues
1224	   in Section 6.7.  These include RTCP bandwidth differences,
1225	   limitations in the number of payload types, media aware RTP mixers
1226	   and interactions with Legacy end-points.

1228	   The last of these, interworking with legacy end-points, deserves
1229	   additional attention.  In multi-party situations using central nodes
1230	   it is difficult to have a legacy implementation using multiple RTP
1231	   sessions interwork with an end-point having only a single RTP
1232	   session across the central node.  The main reason is that the
1233	   end-point using a single session with multiple media types has only
1234	   one SSRC space, while the other end-points have multiple spaces.
1235	   Thus translation might have to occur because there are several RTP
1236	   sessions using the same SSRC value.  This brings limitations,
1237	   processing overhead, and the possibility of becoming a deployment
1238	   obstacle for new RTP/RTCP extensions.

1240	   This approach has been proposed in the RTCWeb context in
1241	   [I-D.lennox-rtcweb-rtp-media-type-mux] and
1242	   [I-D.ietf-mmusic-sdp-bundle-negotiation].  These drafts describe how
1243	   to signal multiple media streams multiplexed into a single RTP
1244	   session, and address some of the issues raised here and in
1245	   Section 6.7 of the RTP Multiplexing Architecture
1246	   [I-D.ietf-avtcore-multiplex-guidelines] draft.

1248	   This method has several limitations that limit its use as a solution
1249	   for providing multiple RTP sessions on the same lower layer transport.
1250	   However, we acknowledge that there are some uses for which this
1251	   method can be sufficient and which can accept the method's limitations
1252	   and downsides.  The RTCWEB WG has a working assumption to support
1253	   this method.  For more details of this method, see the relevant
1254	   drafts under development.  We do include this method in the
1255	   comparison to provide a more complete picture of the pros and cons of
1256	   this method.

1258	   This solution has no per packet overhead.  The signalling overhead
1259	   will be a different question.

1261	A.4.  Use the SRTP MKI field

1263	   This proposal is to overload the MKI SRTP/SRTCP identifier to not
1264	   only identify a particular crypto context, but also identify the
1265	   actual RTP Session.  This clearly is a misuse of the MKI field;
1266	   however, it appears to have few negative implications.  SRTP
1267	   already supports handling of multiple crypto contexts.

1269	   The two major downsides of this proposal are, first, the fact that it
1270	   requires using SRTP/SRTCP to multiplex multiple sessions on a single
1271	   lower layer transport.  The second issue is that the session ID
1272	   parameter needs to be put into the various key-management schemes,
1273	   which must understand that the reason to establish multiple crypto
1274	   contexts is that they are connected to various RTP Sessions.
1275	   Considering that SRTP has at least three keying mechanisms in use,
1276	   DTLS-SRTP [RFC5764], Security Descriptions [RFC4568], and MIKEY
1277	   [RFC3830], this is not an insignificant amount of work.

1279	   This solution has a 32-bit per-packet overhead, but only if the MKI
1280	   was not already used.
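
	   The double role of the MKI can be sketched as a single table
	   lookup, shown below in Python for illustration.  The sketch assumes
	   a 4-byte MKI (matching the 32-bit overhead above) followed by a
	   10-byte authentication tag at the end of the SRTP packet.  The
	   binding table, its contents, and the function names are
	   hypothetical; in practice the bindings would have to be provided
	   by the key-management scheme.

```python
from typing import Optional, Tuple

MKI_LEN = 4    # assumption: 4-byte MKI
TAG_LEN = 10   # assumption: 80-bit authentication tag

# Hypothetical bindings: MKI -> (crypto context name, RTP session name).
MKI_BINDINGS = {
    1: ("ctx-audio", "session-A"),
    2: ("ctx-video", "session-B"),
}


def mki_of(srtp_packet: bytes) -> int:
    """Extract the MKI, which sits just before the authentication tag."""
    if len(srtp_packet) < 12 + MKI_LEN + TAG_LEN:
        raise ValueError("packet too short for header, MKI and tag")
    raw = srtp_packet[-(TAG_LEN + MKI_LEN):-TAG_LEN]
    return int.from_bytes(raw, "big")


def lookup(srtp_packet: bytes) -> Optional[Tuple[str, str]]:
    """Return (crypto context, RTP session) for a packet, or None."""
    return MKI_BINDINGS.get(mki_of(srtp_packet))


if __name__ == "__main__":
    packet = bytes(12) + b"ciphertext" + (2).to_bytes(MKI_LEN, "big") + bytes(TAG_LEN)
    print(lookup(packet))
```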

1282	A.5.  Use an Octet in the Padding

1284	   The basis of this proposal is to have the RTP packet, and the last
1285	   RTCP packet in a compound (as mandated by RFC3550), include padding
1286	   of at least 2 bytes: one byte for the padding count (last byte) and one
1287	   byte just before the padding count containing the session ID.
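
	   For illustration, the cleartext layout for an RTP packet can be
	   sketched in Python as follows.  The sketch assumes the packet
	   carries no padding beforehand and ignores SRTP, under which these
	   bytes would be encrypted.  The function names are hypothetical.

```python
P_BIT = 0x20  # padding flag in the first octet of the RTP header


def add_session_padding(rtp_packet: bytes, session_id: int) -> bytes:
    """Append two padding bytes: the session ID, then the padding count."""
    if len(rtp_packet) < 12:
        raise ValueError("shorter than the fixed RTP header")
    if rtp_packet[0] & P_BIT:
        raise ValueError("sketch assumes a packet without existing padding")
    if not 0 <= session_id <= 255:
        raise ValueError("session ID must fit in one byte")
    first = rtp_packet[0] | P_BIT
    # The last byte is the padding count, which includes itself (here 2).
    return bytes([first]) + rtp_packet[1:] + bytes([session_id, 2])


def read_session_padding(rtp_packet: bytes) -> int:
    """Return the session ID stored just before the padding count."""
    if not rtp_packet[0] & P_BIT:
        raise ValueError("padding flag not set")
    if rtp_packet[-1] < 2:
        raise ValueError("padding too short to carry a session ID")
    return rtp_packet[-2]


if __name__ == "__main__":
    rtp = bytes([0x80, 96]) + bytes(10) + b"data"
    padded = add_session_padding(rtp, 5)
    print(len(padded) - len(rtp), read_session_padding(padded))
```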

1289	   This proposal carries the session ID in bytes that have no defined
1290	   value and are intended to be ignored by the receiver.  From that
1291	   perspective it only causes packet expansion that is supported and
1292	   handled by all existing equipment.  If an implementation fails to
1293	   understand that it needs to interpret this padding byte to learn
1294	   the session ID, it will see a mostly coherent RTP session except
1295	   where SSRCs overlap or where the payload types overlap.  However,
1296	   reporting on the individual sources or forwarding the RTCP RR is not
1297	   completely without merit.

1299	   There is one downside of this proposal and that has to do with SRTP.
1300	   To be able to determine the crypto context, it is necessary to access
1301	   the encrypted payload of the packet.  Thus, the only mechanism
1302	   available for a receiver to solve this issue is to try the existing
1303	   crypto contexts for any session on the same lower layer transport and
1304	   then use the one where the packet decrypts and verifies correctly.
1305	   Thus for transport flows with many crypto contexts, an attacker could
1306	   simply generate packets that don't validate to force the receiver to
1307	   try all its crypto contexts rather than immediately discard the packet
1308	   as not matching a context.  A receiver can mitigate this somewhat by
1309	   using heuristics based on the RTP header fields to determine which
1310	   context applies for a received packet, but this is not a complete
1311	   solution.
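
	   The cost of this trial processing can be illustrated with the
	   Python sketch below.  It is not SRTP: a truncated HMAC-SHA1 tag
	   stands in for SRTP authentication, and the keys, names, and the
	   attempt counter are assumptions made only to show that a packet
	   matching no context forces one verification per context.

```python
import hashlib
import hmac
from typing import Dict, Optional, Tuple

TAG_LEN = 10  # stand-in for an 80-bit authentication tag


def protect(key: bytes, packet: bytes) -> bytes:
    """Append a truncated HMAC-SHA1 tag (a stand-in for SRTP protection)."""
    return packet + hmac.new(key, packet, hashlib.sha1).digest()[:TAG_LEN]


def find_context(contexts: Dict[int, bytes], protected: bytes) -> Tuple[Optional[int], int]:
    """Try each context in turn; return (session ID or None, attempts made)."""
    body, tag = protected[:-TAG_LEN], protected[-TAG_LEN:]
    attempts = 0
    for session_id, key in contexts.items():
        attempts += 1
        expected = hmac.new(key, body, hashlib.sha1).digest()[:TAG_LEN]
        if hmac.compare_digest(expected, tag):
            return session_id, attempts
    return None, attempts


if __name__ == "__main__":
    contexts = {i: bytes([i]) * 20 for i in range(1, 5)}   # four sessions
    genuine = protect(contexts[3], b"x" * 20)
    forged = b"x" * 20 + bytes(TAG_LEN)                    # attacker's packet
    print(find_context(contexts, genuine))
    print(find_context(contexts, forged))
```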

1313	   This solution has a 16-bit per packet overhead.

1315	A.6.  Redefine the SSRC field
1316	   The Internet-Draft by Rosenberg et al., "Multiplexing of Real-Time
1317	   Transport Protocol (RTP) Traffic for Browser based Real-Time
1318	   Communications (RTC)" [I-D.rosenberg-rtcweb-rtpmux] proposed to
1319	   redefine the SSRC field.  This has the advantage of no packet
1320	   expansion.  It also looks like regular RTP.  However, it has a number
1321	   of implications.  First of all it prevents any RTP functionality that
1322	   requires the same SSRC in multiple RTP sessions.

1324	   Secondly, its interoperability with end-points using multiple RTP
1325	   sessions is problematic.  Such interoperability requires an SSRC
1326	   translator function in the gateway node to ensure that the SSRCs
1327	   fulfil the semantic rules of the different domains.  That translator
1328	   is far from easy to build, as it needs to understand the semantics
1329	   of all RTP and RTCP extensions that include SSRC/CSRC.  This is
1330	   necessary to know when a particular matching 32-bit pattern is an
1331	   SSRC field and when the field is just a combination of other fields
1332	   that create the same matching 32-bit pattern.  Thus there is a
1333	   possibility that such a translator becomes an obstacle in deploying
1334	   future RTP/RTCP extensions.  In addition, the translator has
1335	   significant overhead when SRTP is in use, because verification
1336	   that the packet is authentic, decryption, SSRC translation,
1337	   encryption and finally generation of authentication tags are needed.
1338	   The translator also has to be part of the security context.

1340	   This solution has no per packet overhead.
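
	   To illustrate the idea of carrying a session identifier inside the
	   SSRC, the Python sketch below splits the 32-bit field into a
	   session part and a source part.  The 8-bit width and its position
	   in the most significant bits are assumptions of this sketch only;
	   they are not taken from [I-D.rosenberg-rtcweb-rtpmux].

```python
from typing import Tuple

SESSION_BITS = 8                  # assumption of this sketch
SOURCE_BITS = 32 - SESSION_BITS
SOURCE_MASK = (1 << SOURCE_BITS) - 1


def make_ssrc(session_id: int, source_id: int) -> int:
    """Combine a session identifier and a source identifier into an SSRC."""
    if not 0 <= session_id < (1 << SESSION_BITS):
        raise ValueError("session ID out of range")
    if not 0 <= source_id <= SOURCE_MASK:
        raise ValueError("source ID out of range")
    return (session_id << SOURCE_BITS) | source_id


def split_ssrc(ssrc: int) -> Tuple[int, int]:
    """Return (session ID, source ID) as a modified end-point would read them."""
    return ssrc >> SOURCE_BITS, ssrc & SOURCE_MASK


if __name__ == "__main__":
    print(hex(make_ssrc(2, 0x123456)))
    # Two randomly chosen SSRCs from a non-modified end-point are read as
    # belonging to two unrelated sessions.
    print(split_ssrc(0x11111111)[0], split_ssrc(0x9ABCDEF0)[0])
```

	   The last line of the usage example shows why a gateway towards
	   non-modified end-points would have to translate SSRCs: their
	   randomly assigned values carry no session meaning.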

1342	Appendix B.  Comparison

1344	   This section compares the above potential solutions with the
1345	   requirements.  Motivations are provided in addition to a high-level
1346	   rating of whether each solution successfully, partially, or fails to
1347	   meet each requirement.  At the end, a summary table (Figure 6) of the
1348	   high-level ratings is provided.

1350	B.1.  Support of Multiple RTP Sessions Over Single Transport

1352	   This one is easy to determine.  Only the single session proposal
1353	   fails this requirement as it is not at all designed to meet it.  The
1354	   rest fully support this requirement.

1356	B.2.  Enable Same SSRC Value in Multiple RTP Sessions

1358	   Based on the discussion in Section 4 two sub-requirements have been
1359	   derived.

1361	B.2.1.  Avoid SSRC Translation in Gateways/Translators
1362	   This sub-requirement is derived from the desire to avoid having
1363	   gateways or translators perform full SSRC translation, which adds
1364	   complexity, requires gateways to be in the security context, and
1365	   hinders long-term evolution.  Two of the proposals have issues
1366	   with this, because they lack support for multiple 32-bit SSRC
1367	   spaces and cannot have the same SSRC value in multiple RTP
1368	   sessions.  The proposals that have these properties and thus are
1369	   marked as failing are the Single Session and Redefine the SSRC
1370	   field.  The other proposals are all successful in meeting this
1371	   requirement.

1373	B.2.2.  Support Existing Extensions

1375	   The second sub-requirement is how well the proposals support using
1376	   the existing RTP mechanisms.  Here both Single Session and Redefine
1377	   the SSRC field will have clear issues as they cannot support the same
1378	   full 32-bit SSRC value in two different RTP sessions.  This is
1379	   clearly an issue for the XOR based FEC.  RTP retransmission and
1380	   scalable encoding are minor issues as there exist alternatives to
1381	   those mechanisms that work with the structure of these two
1382	   proposals.  Thus we give them a fail.  The Header Extension gets a
1383	   partial due to unclear interaction between putting in a header
1384	   extension and these mechanisms.

1386	B.3.  Ensure SRTP Functions

1388	   This requirement is about ensuring both secure and efficient usage of
1389	   SRTP.  The Octet in Padding field proposal gets a fail as the
1390	   receiving end-point cannot determine the intended RTP session prior
1391	   to decryption of the padding field.  Thus a catch-22 arises which
1392	   can only be resolved by trying all session contexts and seeing which
1393	   decrypts.  This causes a security vulnerability as an attacker can
1394	   inject a packet which does not match any of the session contexts.  The
1395	   receiver will then attempt decryption and authentication of it using
1396	   all its session contexts, increasing the amount of wasted resources
1397	   by a factor equal to the number of multiplexed sessions.  Thus this
1398	   proposal gets a fail.

1400	   The proposal of Overloading the SRTP MKI field as session identifier
1401	   gets a partial due to the fact that it cannot use SRTP's key-
1402	   management mechanism out of the box.  It forces the key-management
1403	   mechanism and the SRTP implementations to maintain the MKI-to-RTP
1404	   session bindings to maintain secure and correct function.

1406	   The Redefine the SSRC field gets a partial due to its need to modify
1407	   the key-management mechanisms to correctly identify the partial SSRC
1408	   space the parameters apply to.  Similarly, the SRTP implementation
1409	   also needs to be updated to correctly support this security context
1410	   differentiation.

1412	   The header extension based solution gets a less severe partial than
1413	   Redefine the SSRC and the MKI.  It will however have an issue when
1414	   using a gateway to a domain that does not multiplex multiple RTP
1415	   sessions over the same transport.  The gateway will then need to
1416	   be in the security context to be able to add or remove the header
1417	   extension as it is in the part of the packet that is integrity
1418	   protected by SRTP.

1420	   The remaining two proposals do not affect SRTP mechanisms and thus
1421	   successfully meet this requirement.

1423	B.4.  Don't Redefine Used Bits

1425	   This requirement states that RTP and RTCP header fields having a
1426	   given definition ought not to be changed, as that can cause
1427	   interoperability problems between modified and non-modified
1428	   implementations.  This becomes especially problematic in RTP sessions
1429	   used for multi-party sessions.

1431	   Redefine the SSRC field gets a big fail on this as it redefines the
1432	   SSRC field, a core field in RTP.  It has been identified that such a
1433	   change will have issues: if it gets connected to a non-modified
1434	   end-point that randomly assigns the SSRC, as specified by RFC 3550,
1435	   those SSRCs will be distributed over different RTP sessions at the
1436	   modified end-point.  Also other functions using the SSRC field, not
1437	   understanding the additional semantics of the SSRC field, are likely
1438	   to have issues.

1440	   Using the SRTP MKI field to identify a session is overloading that
1441	   field with double semantics.  This likely has minimal negative impact
1442	   in RTP since it ought to be possible to have the SRTP stack use the
1443	   MKI field to both look up the security context and which output RTP
1444	   session the processed packet belongs to.  However, this redefinition
1445	   clearly creates issues with the key-management scheme, which will
1446	   have to be modified both to handle this change and to deal with the
1447	   interoperability issues when negotiating its usage.  This gets a full
1448	   fail because it makes the problem someone else's, namely the RTP
1449	   implementers'.

1451	   Defining an Octet in the Padding field redefines a field whose
1452	   definition is to have zero value and to be ignored by the receiver
1453	   according to the original semantics.  Thus this is one of the more
1454	   benign modifications one can make; however, it can still cause
1455	   issues in implementations that unnecessarily check the field
1456	   values, or in firewalls.  This is judged to be partially meeting the
1457	   requirement.

1459	   The Header Extension proposal does in fact not redefine any currently
1460	   used bits in RTP.  The header extension would be a correctly
1461	   identified extension with its own definition.  However, it does
1462	   redefine a rule on what header extensions are for.  The RTCP solution
1463	   would, however, have a more severe impact as it would need to redefine
1464	   the standard meaning of an RTCP packet header in addition to the
1465	   default compound packet rules.  Due to these issues the proposal
1466	   fails to meet this requirement.

1468	   The multiplexing shim and the single session both successfully meet
1469	   this requirement.

1471	B.5.  Firewall Friendly

1473	   This requirement is clearly difficult to judge as firewalls differ
1474	   widely in implementation, in the scope of what they investigate in
1475	   packets, and in the policies set.  A reasonable goal is to maximize
1476	   the likelihood that rules and policies intended to let RTP media
1477	   streams pass will also let these streams through when multiplexing
1478	   RTP sessions over a single transport.  The below analysis shows
1479	   that no solution is truly firewall friendly and all are judged as
1480	   partially meeting this goal.  However, the reasons why it is
1481	   believed that a firewall might react to the streams are quite
1482	   different.

1484	   The Single Session and Redefine the SSRC field are likely the least
1485	   suspect solutions from a firewall perspective.  However, as their
1486	   transport flows contain multiple SSRCs with payloads that indicate
1487	   likely multiple different media types they are still likely to make a
1488	   picky firewall block the transport.  This is especially true for
1489	   firewalls that take signalling messages into account and thus
1490	   expect a particular media type in a given context.  A non-upgraded
1491	   firewall might in fact produce two different contexts with
1492	   overlapping transport parameters where both rules will receive media
1493	   streams of the other media type that are outside of the allowed rule.
1494	   However, to be clear, if these proposals don't get through, none of
1495	   the others will either as they all will have this behaviour.

1497	   The header extension proposal is potentially problematic for two
1498	   reasons.  The first reason, which other proposals share, is
1499	   that the same SSRC value can exist in two RTP sessions
1500	   over the same underlying flow.  Anyone tracking the sequence number
1501	   and timestamp will react badly as the second media stream with the
1502	   same SSRC causes constant jumps back and forth in these fields
1503	   compared to the first stream, if packets are transmitted
1504	   simultaneously for both SSRCs.  This issue can likely only be solved
1505	   by having the firewalls that like to track flows also use the
1506	   session identifier to create context.  This is possible as the header
1507	   extension will be in the clear and at the front.  The second issue is
1508	   that the header extension itself can get the firewall to react,
1509	   especially very picky ones that expect packets with certain media
1510	   types to have certain packet lengths.  They are not compatible with a
1511	   header extension.

1513	   The Multiplexing Shim shares the issue with multiple flows for the
1514	   same SSRC.  Firewalls and deep packet inspection put the shim
1515	   placement in question.  If it is a prefixed shim, it prevents
1516	   the packet from looking like a regular IP/UDP/RTP packet and being
1517	   correctly classified in firewalls and DPI engines.  However, if one
1518	   puts it last, it is unlikely that any firewall or DPI ever will be
1519	   able to take the session context into account as it is at the end of
1520	   the packet.  This is because many line-rate processing devices only
1521	   take a certain amount of the headers into account.

1523	   The SRTP MKI field is likely the solution that has the fewest
1524	   firewall and DPI issues, after the single RTP session.  There is no
1525	   additional suspect field.  The only difference from a single RTP
1526	   session in the transport flow is the fact that multiple MKIs are
1527	   guaranteed to be used.  However, that can occur also in a single RTP
1528	   session usage.  Thus the only issues are the one shared with single
1529	   session and that several RTP media streams can use the same SSRC.

1531	   The octet in the padding field has, in addition to the issues the
1532	   SRTP MKI field has, the single issue that it redefines something that
1533	   is supposed to be zero into a value.  This potentially causes a
1534	   deeply inspecting firewall to clamp the flow in fear of a covert
1535	   channel or non-compliance.

1537	B.6.  Monitoring and Reporting

1539	   The monitoring and reporting requirement considers several aspects:
1540	   first, how useful the monitoring from an existing legacy monitor is;
1541	   second, any issues in upgrading monitors to handle the selected
1542	   solution; and third, concerns for packet selector filters and packet
1543	   sniffers.

1545	   In general one can expect the proposals that have only a single SSRC
1546	   space to work better with legacy monitors.  Thus with both Single
1547	   Session and Redefine SSRC, a legacy monitor can most likely gather
1548	   and report data on media flows.  The only potential issue is that,
1549	   due to the different media types and clock rates, some failures can
1550	   occur.  In particular a third party monitor can be targeted to a
1551	   specific media type, like monitoring VoIP.  That monitor will have
1552	   problems processing any video packets correctly and generate the
1553	   VoIP-specific metrics for any video-sending SSRC.  In general,
1554	   no legacy solution for monitoring will be able to correctly create
1555	   the sub-contexts that each RTP session has in the solutions without
1556	   an update to handle the new semantics.  Also when it comes to packet
1557	   filtering and selector filters, fine-grained control can only be
1558	   accomplished by implementing the new semantics.  Therefore only the
1559	   Single Session meets this requirement fully.

1561	   Redefine the SSRC field is close to fully meeting the requirement;
1562	   however, because there exists a session structure that is hidden from
1563	   anyone that is not upgraded to understand the semantics, this only
1564	   gets a partial.

1566	   The other proposals all can have multiple RTP sessions using the same
1567	   SSRC.  This will create significant issues for any legacy third party
1568	   monitor.  Only an updated monitor, or for that matter packet
1569	   selector, can pick out the individual media streams and their
1570	   associated RTCP traffic.  Thus all these proposals get a failure to
1571	   meet the requirement.
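
	   The effect on a legacy monitor can be illustrated with the Python
	   sketch below, which counts sequence number discontinuities.  The
	   packet tuples, the SSRC value, and the function name are
	   hypothetical.  A monitor that keys its state on the SSRC alone sees
	   constant jumps, while one that also uses the session identifier
	   sees two clean streams.

```python
from typing import Callable, Hashable, Iterable, Tuple

Packet = Tuple[int, int, int]  # (session ID, SSRC, RTP sequence number)


def count_seq_jumps(packets: Iterable[Packet], key: Callable[[Packet], Hashable]) -> int:
    """Count packets whose sequence number is not previous + 1 (mod 2^16)
    within the tracking context selected by key."""
    last = {}
    jumps = 0
    for pkt in packets:
        context = key(pkt)
        seq = pkt[2]
        if context in last and (seq - last[context]) % 65536 != 1:
            jumps += 1
        last[context] = seq
    return jumps


if __name__ == "__main__":
    ssrc = 0x1234
    # Two RTP sessions using the same SSRC, interleaved on one transport.
    flow = [(1, ssrc, 100), (2, ssrc, 5000), (1, ssrc, 101),
            (2, ssrc, 5001), (1, ssrc, 102), (2, ssrc, 5002)]
    legacy = count_seq_jumps(flow, key=lambda p: p[1])
    updated = count_seq_jumps(flow, key=lambda p: (p[0], p[1]))
    print(legacy, updated)
```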

1573	B.7.  Usable over Multicast

1575	   As discussed earlier the goal with having the option usable also over
1576	   multicast is to remove the need to produce different media streams
1577	   for transport over unicast and multicast.  All of the proposals
1578	   successfully meet the requirement.

1580	B.8.  Incremental Deployment

1582	   The possibility to incrementally deploy the multiplexing of multiple
1583	   RTP sessions over a single transport, especially in the context of
1584	   multi-party sessions, is a great benefit for any of the proposals.
1585	   Thus not all end-point implementations need to be upgraded before
1586	   one starts enabling it in the central node and any signalling.

1588	   Considering a centralized multi-party application where some
1589	   participants are using multiple transport flows and you want to
1590	   enable one particular participant to use the single transport to the
1591	   central node, one criterion stands out: the possibility to have one
1592	   RTP session per transport in one leg, and in the next multiplex them
1593	   together with minimal complexity and packet changes.  Here there are
1594	   significant differences.

1596	   The Multiplexing Shim has the least overhead for this, as the
1597	   central node or gateway between deployments only needs to either add
1598	   or remove the shim identifier and then forward the packet over the
1599	   corresponding transport, either a joint one on the single transport
1600	   side, or over the individual one on the multiple transport side.

1602	   The SRTP MKI field proposal is almost as good, as the only main
1603	   difference is the need to coordinate the used MKIs on the non-
1604	   multiplexed legs so that there is no overlap between the RTP
1605	   sessions.  If there is, the MKI can be translated in the gateway as
1606	   SRTP has no integrity protection over the MKI.  Thus both
1607	   multiplexing shim and SRTP MKI field successfully meet this
1608	   requirement.

1610	   The Header Extension supports multiple full 32-bit SSRC spaces and
1611	   can thus handle all the RTP sessions without need for any SSRC
1612	   translation; however, this proposal runs into the problem that the
1613	   gateway needs to be in the security context to be able to add or
1614	   remove the header extension when SRTP is used.  In addition to the
1615	   security implications of that, there is a complexity overhead due to
1616	   the need to redo the authentication tags on all RTP/RTCP packets.
1617	   Thus it gets a partial.

1619	   The Octet in the Padding field shares issues with the header extension
1620	   but has even higher complexity for this.  The reason is that the
1621	   padding field is also encrypted.  Thus adding or removing it (although
1622	   removing it might be unnecessary) forces the end-point to encrypt at
1623	   least that byte also, and for ciphers that are not stream-ciphers,
1624	   the whole packet needs to be re-encrypted.  Thus this proposal gets a
1625	   very weak partial for meeting the requirement.

1627	   The Single Session and Redefine the SSRC field do not allow several
1628	   vanilla RTP sessions to be connected to these proposals.  The reason
1629	   is the single 32-bit SSRC space they have.  Single Session only has
1630	   one session and Redefine the SSRC field uses some of the bits as
1631	   session identifier.  This forces the gateway to translate the SSRC
1632	   whenever it does not fulfil the rules or semantics of the multiplexed
1633	   side.  For Redefine SSRC field this becomes almost constant as the
1634	   session identifier part of the SSRC has to be the same over all SSRCs
1635	   from the same session.  For Single Session it might only be needed
1636	   when there otherwise would be an SSRC collision between the sessions.
1637	   This further assumes that the non-multiplexed side would never use
1638	   any of the RTP mechanisms that require the same SSRC in multiple RTP
1639	   sessions, as they cannot be gatewayed at all.  Translating an SSRC
1640	   first of all adds overhead, which with SRTP includes a complete
1641	   cycle of authenticating, decrypting, encrypting, and creating a new
1642	   authentication tag.  In addition, the SSRC translation could
1643	   potentially be a deployment obstacle for new RTP/RTCP extensions that
1644	   have to be understood by the translator to be correctly translated.
1645	   Therefore these two proposals fail to meet the requirement.

1647	B.9.  Summary and Conclusion

1649	   This section summarizes, in a table, the high-level outcome of the
1650	   evaluation against the different requirements.

1652	   The ID numbers used in the table map to the requirements as
1653	   follows:

1655	   1: Support multiple RTP sessions over one transport flow

1657	   2: Enable same SSRC value in multiple RTP sessions

1659	      2.1:  Avoid SSRC translation in gateways/translators

1661	      2.2:  Support existing extensions

1663	   3: Ensure SRTP functions

1665	   4: Don't Redefine used bits

1667	   5: Firewall Friendly

1669	   6: Monitoring and Reporting still needs to function

1671	   7: Usable over Multicast

1673	   8: Incremental deployment

1675	   OH:  Overhead in Bytes.  + means variable

1677	         ---------------+---+---+---+---+---+---+---+---+---+----
1678	         Solution       | 1 |2.1|2.2| 3 | 4 | 5 | 6 | 7 | 8 | OH
1679	         ---------------+---+---+---+---+---+---+---+---+---+----
1680	         Header Ext.    | S | S | P | P | F | P | F | S | P | 8+
1681	         Multiplex Shim | S | S | S | S | S | P | F | S | S | 1
1682	         Single Session | F | F | F | S | S | P | S | S | F | 0
1683	         SRTP MKI Field | S | S | S | P | F | P | F | S | S | 4
1684	         Padding Field  | S | S | S | F | P | P | F | S | P | 2
1685	         Redefine SSRC  | S | F | F | P | F | P | P | S | S | 0
1686	         ---------------+---+---+---+---+---+---+---+---+---+----

1688	    Figure 6: Summary Table of Evaluation (Successfully (S), Partially
1689	                   (P) or Fails (F) to meet requirement)
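
   As a reading aid, the ratings in Figure 6 can be tallied per
   solution.  The sketch below simply counts S, P and F entries; the
   unweighted count and the names RATINGS and tally are assumptions
   made for illustration and are not the method the authors used to
   reach their recommendation, since the requirements are not
   necessarily of equal importance.

```python
from collections import Counter

# Ratings transcribed from Figure 6, in column order
# 1, 2.1, 2.2, 3, 4, 5, 6, 7, 8 (the OH column is omitted).
RATINGS = {
    "Header Ext.":    "S S P P F P F S P",
    "Multiplex Shim": "S S S S S P F S S",
    "Single Session": "F F F S S P S S F",
    "SRTP MKI Field": "S S S P F P F S S",
    "Padding Field":  "S S S F P P F S P",
    "Redefine SSRC":  "S F F P F P P S S",
}


def tally(ratings):
    """Count S/P/F entries per solution (unweighted, illustrative only)."""
    return {name: Counter(row.split()) for name, row in ratings.items()}


if __name__ == "__main__":
    for name, counts in tally(RATINGS).items():
        print(f"{name:15s} S={counts['S']} P={counts['P']} F={counts['F']}")
```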

1691	   Considering these options, the authors recommend that AVTCORE
1692	   standardize a solution based on a postfixed or prefixed multiplexing
1693	   field, i.e., a shim approach combined with the appropriate
1694	   signalling, as described in Appendix A.2.

1696	Authors' Addresses

1698	   Magnus Westerlund
1699	   Ericsson
1700	   Farogatan 6
1701	   SE-164 80 Kista
1702	   Sweden

1704	   Phone: +46 10 714 82 87
1705	   Email: magnus.westerlund@ericsson.com

1707	   Colin Perkins
1708	   University of Glasgow
1709	   School of Computing Science
1710	   Glasgow  G12 8QQ
1711	   United Kingdom

1713	   Email: csp@csperkins.org
1714	   URI:   http://csperkins.org/