idnits 2.17.1 

draft-ietf-rtcweb-rtp-usage-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (February 25, 2013) is 4077 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-06) exists of
     draft-ietf-avtcore-6222bis-00

  == Outdated reference: A later version (-03) exists of
     draft-ietf-avtcore-avp-codecs-00

  == Outdated reference: A later version (-13) exists of
     draft-ietf-avtcore-multi-media-rtp-session-01

  == Outdated reference: A later version (-18) exists of
     draft-ietf-avtcore-rtp-circuit-breakers-02

  == Outdated reference: A later version (-11) exists of
     draft-ietf-avtext-multiple-clock-rates-08

  == Outdated reference: A later version (-54) exists of
     draft-ietf-mmusic-sdp-bundle-negotiation-03

  == Outdated reference: A later version (-11) exists of
     draft-ietf-rtcweb-audio-01

  == Outdated reference: A later version (-19) exists of
     draft-ietf-rtcweb-overview-06

  == Outdated reference: A later version (-12) exists of
     draft-ietf-rtcweb-security-04

  == Outdated reference: A later version (-20) exists of
     draft-ietf-rtcweb-security-arch-06

  == Outdated reference: A later version (-07) exists of
     draft-westerlund-avtcore-transport-multiplexing-04

  ** Obsolete normative reference: RFC 5285 (Obsoleted by RFC 8285)

  == Outdated reference: A later version (-16) exists of
     draft-ietf-rtcweb-use-cases-and-requirements-10

  == Outdated reference: A later version (-03) exists of
     draft-westerlund-avtcore-multiplex-architecture-02


     Summary: 1 error (**), 0 flaws (~~), 14 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                         C. Perkins
3	Internet-Draft                                     University of Glasgow
4	Intended status: Standards Track                           M. Westerlund
5	Expires: August 29, 2013                                        Ericsson
6	                                                                  J. Ott
7	                                                        Aalto University
8	                                                       February 25, 2013

10	  Web Real-Time Communication (WebRTC): Media Transport and Use of RTP
11	                     draft-ietf-rtcweb-rtp-usage-06

13	Abstract

15	   The Web Real-Time Communication (WebRTC) framework provides support
16	   for direct interactive rich communication using audio, video, text,
17	   collaboration, games, etc. between two peers' web-browsers.  This
18	   memo describes the media transport aspects of the WebRTC framework.
19	   It specifies how the Real-time Transport Protocol (RTP) is used in
20	   the WebRTC context, and gives requirements for which RTP features,
21	   profiles, and extensions need to be supported.

23	Status of this Memo

25	   This Internet-Draft is submitted in full conformance with the
26	   provisions of BCP 78 and BCP 79.

28	   Internet-Drafts are working documents of the Internet Engineering
29	   Task Force (IETF).  Note that other groups may also distribute
30	   working documents as Internet-Drafts.  The list of current Internet-
31	   Drafts is at http://datatracker.ietf.org/drafts/current/.

33	   Internet-Drafts are draft documents valid for a maximum of six months
34	   and may be updated, replaced, or obsoleted by other documents at any
35	   time.  It is inappropriate to use Internet-Drafts as reference
36	   material or to cite them other than as "work in progress."

38	   This Internet-Draft will expire on August 29, 2013.

40	Copyright Notice

42	   Copyright (c) 2013 IETF Trust and the persons identified as the
43	   document authors.  All rights reserved.

45	   This document is subject to BCP 78 and the IETF Trust's Legal
46	   Provisions Relating to IETF Documents
47	   (http://trustee.ietf.org/license-info) in effect on the date of
48	   publication of this document.  Please review these documents
49	   carefully, as they describe your rights and restrictions with respect
50	   to this document.  Code Components extracted from this document must
51	   include Simplified BSD License text as described in Section 4.e of
52	   the Trust Legal Provisions and are provided without warranty as
53	   described in the Simplified BSD License.

55	Table of Contents

57	   1.  Introduction
58	   2.  Rationale
59	   3.  Terminology
60	   4.  WebRTC Use of RTP: Core Protocols
61	     4.1.  RTP and RTCP
62	     4.2.  Choice of the RTP Profile
63	     4.3.  Choice of RTP Payload Formats
64	     4.4.  RTP Session Multiplexing
65	     4.5.  RTP and RTCP Multiplexing
66	     4.6.  Reduced Size RTCP
67	     4.7.  Symmetric RTP/RTCP
68	     4.8.  Choice of RTP Synchronisation Source (SSRC)
69	     4.9.  Generation of the RTCP Canonical Name (CNAME)
70	   5.  WebRTC Use of RTP: Extensions
71	     5.1.  Conferencing Extensions
72	       5.1.1.  Full Intra Request (FIR)
73	       5.1.2.  Picture Loss Indication (PLI)
74	       5.1.3.  Slice Loss Indication (SLI)
75	       5.1.4.  Reference Picture Selection Indication (RPSI)
76	       5.1.5.  Temporal-Spatial Trade-off Request (TSTR)
77	       5.1.6.  Temporary Maximum Media Stream Bit Rate Request
78	               (TMMBR)
79	     5.2.  Header Extensions
80	       5.2.1.  Rapid Synchronisation
81	       5.2.2.  Client-to-Mixer Audio Level
82	       5.2.3.  Mixer-to-Client Audio Level
83	   6.  WebRTC Use of RTP: Improving Transport Robustness
84	     6.1.  Negative Acknowledgements and RTP Retransmission
85	     6.2.  Forward Error Correction (FEC)
86	   7.  WebRTC Use of RTP: Rate Control and Media Adaptation
87	     7.1.  Boundary Conditions and Circuit Breakers
88	     7.2.  RTCP Extensions for Congestion Control
89	     7.3.  RTCP Limitations for Congestion Control
90	     7.4.  Congestion Control Interoperability With Legacy Systems
91	   8.  WebRTC Use of RTP: Performance Monitoring
92	   9.  WebRTC Use of RTP: Future Extensions
93	   10. Signalling Considerations
94	   11. WebRTC API Considerations
95	   12. RTP Implementation Considerations
96	     12.1. RTP Sessions and PeerConnections
97	     12.2. Multiple Sources
98	     12.3. Multiparty
99	     12.4. SSRC Collision Detection
100	     12.5. Contributing Sources and the CSRC List
101	     12.6. Media Synchronization
102	     12.7. Multiple RTP End-points
103	     12.8. Simulcast
104	     12.9. Differentiated Treatment of Flows
105	   13. Open Issues
106	   14. IANA Considerations
107	   15. Security Considerations
108	   16. Acknowledgements
109	   17. References
110	     17.1. Normative References
111	     17.2. Informative References
112	   Appendix A.  Supported RTP Topologies
113	     A.1.  Point to Point
114	     A.2.  Multi-Unicast (Mesh)
115	     A.3.  Mixer Based
116	       A.3.1.  Media Mixing
117	       A.3.2.  Media Switching
118	       A.3.3.  Media Projecting
119	     A.4.  Translator Based
120	       A.4.1.  Transcoder
121	       A.4.2.  Gateway / Protocol Translator
122	       A.4.3.  Relay
123	     A.5.  End-point Forwarding
124	     A.6.  Simulcast
125	   Authors' Addresses

127	1.  Introduction

129	   The Real-time Transport Protocol (RTP) [RFC3550] provides a framework
130	   for delivery of audio and video teleconferencing data and other real-
131	   time media applications.  Previous work has defined the RTP protocol,
132	   along with numerous profiles, payload formats, and other extensions.
133	   When combined with appropriate signalling, these form the basis for
134	   many teleconferencing systems.

136	   The Web Real-Time Communication (WebRTC) framework provides the
137	   protocol building blocks to support direct, interactive, real-time
138	   communication using audio, video, collaboration, games, etc., between
139	   two peers' web-browsers.  This memo describes how the RTP framework
140	   is to be used in the WebRTC context.  It proposes a baseline set of
141	   RTP features that are to be implemented by all WebRTC-aware end-
142	   points, along with suggested extensions for enhanced functionality.

144	   The WebRTC overview [I-D.ietf-rtcweb-overview] outlines the complete
145	   WebRTC framework, of which this memo is a part.

147	   The structure of this memo is as follows.  Section 2 outlines our
148	   rationale in preparing this memo and choosing these RTP features.
149	   Section 3 defines requirement terminology.  Requirements for core RTP
150	   protocols are described in Section 4 and suggested RTP extensions are
151	   described in Section 5.  Section 6 outlines mechanisms that can
152	   increase robustness to network problems, while Section 7 describes
153	   congestion control and rate adaptation mechanisms.  The discussion of
154	   mandated RTP mechanisms concludes in Section 8 with a review of
155	   performance monitoring and network management tools that can be used
156	   in the WebRTC context.  Section 9 gives some guidelines for future
157	   incorporation of other RTP and RTP Control Protocol (RTCP) extensions
158	   into this framework.  Section 10 describes requirements placed on the
159	   signalling channel.  Section 11 discusses the relationship between
160	   features of the RTP framework and the WebRTC application programming
161	   interface (API), and Section 12 discusses RTP implementation
162	   considerations.  This memo concludes with an appendix discussing
163	   several different RTP Topologies, and how they affect the RTP
164	   session(s) and various implementation details of possible
165	   realizations of central nodes.

167	2.  Rationale

169	   The RTP framework comprises the RTP data transfer protocol, the RTP
170	   control protocol, and numerous RTP payload formats, profiles, and
171	   extensions.  This range of add-ons has allowed RTP to meet various
172	   needs that were not envisaged by the original protocol designers, and
173	   to support many new media encodings, but raises the question of what
174	   extensions are to be supported by new implementations.  The
175	   development of the WebRTC framework provides an opportunity for us to
176	   review the available RTP features and extensions, and to define a
177	   common baseline feature set for all WebRTC implementations of RTP.
178	   This builds on the past 15 years of RTP development to mandate the
179	   use of extensions that have shown widespread utility, while still
180	   remaining compatible with the wide installed base of RTP
181	   implementations where possible.

183	   Other RTP and RTCP extensions not discussed in this document can be
184	   implemented by WebRTC end-points if they are beneficial for new use
185	   cases.  However, they are not necessary to address the WebRTC use
186	   cases and requirements identified to date
187	   [I-D.ietf-rtcweb-use-cases-and-requirements].

189	   While the baseline set of RTP features and extensions defined in this
190	   memo is targeted at the requirements of the WebRTC framework, it is
191	   expected to be broadly useful for other conferencing-related uses of
192	   RTP.  In particular, it is likely that this set of RTP features and
193	   extensions will be appropriate for other desktop or mobile video
194	   conferencing systems, or for room-based high-quality telepresence
195	   applications.

197	3.  Terminology

199	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
200	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
201	   document are to be interpreted as described in [RFC2119].  The RFC
202	   2119 interpretation of these key words applies only when written in
203	   ALL CAPS.  Lower- or mixed-case uses of these key words are not to be
204	   interpreted as carrying special significance in this memo.

206	   We define the following terms:

208	   RTP Media Stream:  A sequence of RTP packets, and associated RTCP
209	      packets, using a single synchronisation source (SSRC) that
210	      together carries part or all of the content of a specific Media
211	      Type from a specific sender source within a given RTP session.

213	   RTP Session:  As defined by [RFC3550], the endpoints belonging to the
214	      same RTP Session are those that share a single SSRC space.  That
215	      is, those endpoints can see an SSRC identifier transmitted by any
216	      one of the other endpoints.  An endpoint can see an SSRC either
217	      directly in RTP and RTCP packets, or as a contributing source
218	      (CSRC) in RTP packets from a mixer.  The RTP Session scope is
219	      hence decided by the endpoints' network interconnection topology,
220	      in combination with RTP and RTCP forwarding strategies deployed by
221	      endpoints and any interconnecting middle nodes.

223	   WebRTC MediaStream:  The MediaStream concept defined by the W3C in
224	      the WebRTC API.

226	   Other terms are used according to their definitions from the RTP
227	   Specification [RFC3550] and WebRTC overview
228	   [I-D.ietf-rtcweb-overview] documents.
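
   To make the SSRC and CSRC identifiers used in these definitions
   concrete, the following Python sketch extracts them from the fixed
   RTP header defined in [RFC3550].  It is an illustration only: the
   function name and the returned dictionary keys are assumptions made
   for this example, and header extensions and padding are not handled.

```python
import struct


def parse_rtp_header(packet):
    """Parse the fixed RTP header and CSRC list (RFC 3550, Section 5.1).

    Returns a dict; the key names are hypothetical, chosen for this sketch.
    """
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the fixed header")
    first, second, seq, timestamp, ssrc = struct.unpack_from("!BBHII", packet, 0)
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")
    csrc_count = first & 0x0F  # CC field: number of CSRC identifiers
    if len(packet) < 12 + 4 * csrc_count:
        raise ValueError("truncated CSRC list")
    # The CSRC list, if any, immediately follows the 12-octet fixed header.
    csrcs = list(struct.unpack_from("!%dI" % csrc_count, packet, 12))
    return {
        "payload_type": second & 0x7F,
        "marker": bool(second >> 7),
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,    # the source (e.g., a mixer) that sent this packet
        "csrcs": csrcs,  # sources that contributed to a mixed packet
    }
```

   A receiver would treat every value in "ssrc" and "csrcs" as
   belonging to the SSRC space of the RTP session.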

230	4.  WebRTC Use of RTP: Core Protocols

232	   The following sections describe the core features of RTP and RTCP
233	   that need to be implemented, along with the mandated RTP profiles and
234	   payload formats.  Also described are the core extensions providing
235	   essential features that all WebRTC implementations need to implement
236	   to function effectively on today's networks.

238	4.1.  RTP and RTCP

240	   The Real-time Transport Protocol (RTP) [RFC3550] is REQUIRED to be
241	   implemented as the media transport protocol for WebRTC.  RTP itself
242	   comprises two parts: the RTP data transfer protocol, and the RTP
243	   control protocol (RTCP).  RTCP is a fundamental and integral part of
244	   RTP, and MUST be implemented in all WebRTC applications.

246	   The following RTP and RTCP features are sometimes omitted in limited
247	   functionality implementations of RTP, but are REQUIRED in all WebRTC
248	   implementations:

250	   o  Support for use of multiple simultaneous SSRC values in a single
251	      RTP session, including support for RTP end-points that send many
252	      SSRC values simultaneously.

254	   o  Random choice of SSRC on joining a session; collision detection
255	      and resolution for SSRC values (but see also Section 4.8).

257	   o  Support for reception of RTP data packets containing CSRC lists,
258	      as generated by RTP mixers, and RTCP packets relating to CSRCs.

260	   o  Support for sending correct synchronization information in the
261	      RTCP Sender Reports, to allow a receiver to implement lip-sync,
262	      with RECOMMENDED support for the rapid RTP synchronisation
263	      extensions (see Section 5.2.1).

265	   o  Support for sending and receiving RTCP SR, RR, SDES, and BYE
266	      packet types, with OPTIONAL support for other RTCP packet types;
267	      implementations MUST ignore unknown RTCP packet types.

269	   o  Support for multiple end-points in a single RTP session, and for
270	      scaling the RTCP transmission interval according to the number of
271	      participants in the session; support for randomised RTCP
272	      transmission intervals to avoid synchronisation of RTCP reports;
273	      support for RTCP timer reconsideration.

275	   o  Support for configuring the RTCP bandwidth as a fraction of the
276	      media bandwidth, and for configuring the fraction of the RTCP
277	      bandwidth allocated to senders, e.g., using the SDP "b=" line.
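
   As an illustration of the interval scaling and randomisation
   requirements in the list above, the following Python sketch follows
   the computation given in Section 6.3.1 and Appendix A.7 of
   [RFC3550].  It is a simplified sketch, not a complete
   implementation: timer reconsideration itself and the RTP/AVPF timing
   rules are omitted, and all function and parameter names are
   assumptions made for this example.

```python
import math
import random

# Compensation for the effect of timer reconsideration: e - 3/2.
COMPENSATION = math.e - 1.5


def rtcp_interval(members, senders, rtcp_bw, we_sent, avg_rtcp_size,
                  initial=False, rng=random.random):
    """Return a randomised RTCP transmission interval in seconds.

    members, senders: current participant and sender counts
    rtcp_bw:          RTCP bandwidth in octets per second (must be > 0)
    we_sent:          True if this end-point recently sent RTP data
    avg_rtcp_size:    average RTCP packet size in octets
    rng:              returns a float in [0, 1); injectable for testing
    """
    min_time = 2.5 if initial else 5.0
    sender_fraction = 0.25  # share of the RTCP bandwidth kept for senders
    n = members
    if senders <= members * sender_fraction:
        if we_sent:
            rtcp_bw *= sender_fraction
            n = senders
        else:
            rtcp_bw *= 1.0 - sender_fraction
            n = members - senders
    # The deterministic interval grows linearly with the group size.
    t = max(avg_rtcp_size * n / rtcp_bw, min_time)
    # Randomise over [0.5, 1.5) * t so that reports do not synchronise.
    t *= rng() + 0.5
    return t / COMPENSATION
```

   With a fixed RTCP bandwidth, the returned interval stays at the
   minimum for small sessions and grows as participants join.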

279	   It is known that a significant number of legacy RTP implementations,
280	   especially those targeted at VoIP-only systems, do not support all of
281	   the above features, and in some cases do not support RTCP at all.
282	   Implementers are advised to consider the requirements for graceful
283	   degradation when interoperating with legacy implementations.

285	   Other implementation considerations are discussed in Section 12.

287	4.2.  Choice of the RTP Profile

289	   The complete specification of RTP for a particular application domain
290	   requires the choice of an RTP Profile.  For WebRTC use, the "Extended
291	   Secure RTP Profile for Real-time Transport Control Protocol (RTCP)-
292	   Based Feedback (RTP/SAVPF)" [RFC5124] as extended by
293	   [I-D.ietf-avtcore-avp-codecs] MUST be implemented.  This builds on
294	   the basic RTP/AVP profile [RFC3551], the RTP profile for RTCP-based
295	   feedback (RTP/AVPF) [RFC4585], and the secure RTP profile (RTP/SAVP)
296	   [RFC3711].

298	   The RTCP-based feedback extensions [RFC4585] are needed for the
299	   improved RTCP timer model, which allows more flexible transmission of
300	   RTCP packets in response to events, rather than strictly according to
301	   bandwidth.  This is vital for being able to report congestion events.
302	   These extensions also save RTCP bandwidth, and will commonly only use
303	   the full RTCP bandwidth allocation if there are many events that
304	   require feedback.  They are also needed to make use of the RTP
305	   conferencing extensions discussed in Section 5.1.

307	      Note: The enhanced RTCP timer model defined in the RTP/AVPF
308	      profile is backwards compatible with legacy systems that implement
309	      only the base RTP/AVP profile, given some constraints on parameter
310	      configuration such as the RTCP bandwidth value and "trr-int" (the
311	      most important factor for interworking with RTP/AVP end-points via
312	      a gateway is to set the trr-int parameter to a value representing
313	      4 seconds).

315	   The secure RTP profile [RFC3711] is needed to provide media
316	   encryption, integrity protection, replay protection and a limited
317	   form of source authentication.  WebRTC implementations MUST NOT send
318	   packets using the basic RTP/AVP profile or the RTP/AVPF profile; they
319	   MUST employ the full RTP/SAVPF profile to protect all RTP and RTCP
320	   packets that are generated.  The default and mandatory-to-implement
321	   transforms listed in Section 5 of [RFC3711] SHALL apply.

323	   Implementations MUST support DTLS-SRTP [RFC5764] for key-management.
324	   Other key management schemes MAY be supported.
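
   When SDP is used for signalling, the profile appears in the
   transport protocol field of each "m=" line.  The following Python
   sketch shows one way an implementation might reject media
   descriptions that do not use RTP/SAVPF.  The function names are
   hypothetical, and the sketch assumes that the profile is signalled
   either as "RTP/SAVPF" or with a lower-layer prefix such as the
   "UDP/TLS/RTP/SAVPF" token registered by [RFC5764].

```python
def media_profile(m_line):
    """Return the <proto> field of an SDP media description line.

    The line has the form: m=<media> <port> <proto> <fmt> ...
    """
    if not m_line.startswith("m="):
        raise ValueError("not an SDP media description line")
    fields = m_line[2:].split()
    if len(fields) < 3:
        raise ValueError("media description line is too short")
    return fields[2]


def uses_savpf(m_line):
    """True if the media description selects the RTP/SAVPF profile."""
    proto = media_profile(m_line)
    # Accept the bare profile name, or the profile carried over a
    # lower layer such as DTLS over UDP (assumption, see lead-in).
    return proto == "RTP/SAVPF" or proto.endswith("/RTP/SAVPF")
```

   For example, uses_savpf("m=audio 49170 RTP/AVP 0") is false, so such
   a media description would not be acceptable under the rule above.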

326	4.3.  Choice of RTP Payload Formats

328	   Implementations MUST follow the WebRTC Audio Codec and Processing
329	   Requirements [I-D.ietf-rtcweb-audio] and SHOULD follow the updated
330	   recommendations for audio codecs in the RTP/AVP Profile
331	   [I-D.ietf-avtcore-avp-codecs].  Support for other audio codecs is
332	   OPTIONAL.

334	   (tbd: the mandatory-to-implement video codec is not yet decided)

336	   Endpoints MAY signal support for multiple RTP payload formats, or
337	   multiple configurations of a single RTP payload format, provided each
338	   payload format uses a different RTP payload type number.  An endpoint
339	   that has signalled support for multiple RTP payload formats SHOULD
340	   accept data in any of those payload formats at any time, unless it
341	   has previously signalled limitations on its decoding capability.
342	   This requirement is constrained if several media types are sent in
343	   the same RTP session.  In such a case, a source (SSRC) is restricted
344	   to switching only between the RTP payload formats signalled for the
345	   media type that is being sent by that source; see Section 4.4.  To
346	   support rapid rate adaptation by changing codec, RTP does not require
347	   advance signalling for changes between RTP payload formats that were
348	   signalled during session set-up.
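
   The constraint on payload format switching can be sketched as
   follows.  This Python example is illustrative only: the payload type
   table, the codec names in it, and the class and method names are all
   assumptions made for this example, not assignments made by this
   memo.

```python
from collections import namedtuple

PayloadFormat = namedtuple("PayloadFormat", "media encoding clock_rate")

# Hypothetical payload type table, as might be signalled at set-up.
SIGNALLED = {
    0: PayloadFormat("audio", "PCMU", 8000),
    111: PayloadFormat("audio", "opus", 48000),
    96: PayloadFormat("video", "VP8", 90000),
}


class Source:
    """Tracks which media type a single SSRC is sending."""

    def __init__(self, ssrc):
        self.ssrc = ssrc
        self.media = None  # fixed by the first accepted packet

    def accept(self, payload_type):
        """Return True if this SSRC may use the given payload type."""
        fmt = SIGNALLED.get(payload_type)
        if fmt is None:
            return False  # payload type was never signalled
        if self.media is None:
            self.media = fmt.media
        # A source may switch between signalled formats without further
        # signalling, but only within its own media type.
        return fmt.media == self.media
```

   In this sketch an audio source can move between payload types 0 and
   111 at any time, while a packet with payload type 96 from the same
   SSRC is rejected.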

350	   An RTP sender that changes between two RTP payload types that use
351	   different RTP clock rates MUST follow the recommendations in Section
352	   4.1 of [I-D.ietf-avtext-multiple-clock-rates].  RTP receivers MUST
353	   follow the recommendations in Section 4.3 of
354	   [I-D.ietf-avtext-multiple-clock-rates], in order to support sources
355	   that switch between clock rates in an RTP session (these
356	   recommendations for receivers are backwards compatible with the case
357	   where senders use only a single clock rate).

359	4.4.  RTP Session Multiplexing

361	   An association amongst a set of participants communicating with RTP
362	   is known as an RTP session.  A participant can be involved in
363	   multiple RTP sessions at the same time.  In a multimedia session,
364	   each medium has typically been carried in a separate RTP session with
365	   its own RTCP packets (i.e., one RTP session for the audio, with a
366	   separate RTP session using a different transport address for the
367	   video; if SDP is used, this corresponds to one RTP session for each
368	   "m=" line in the SDP).  WebRTC implementations of RTP are REQUIRED to
369	   implement support for multimedia sessions in this way, for
370	   compatibility with legacy systems.

372	   In today's networks, however, with the widespread use of Network
373	   Address/Port Translators (NAT/NAPT) and Firewalls (FW), it is
374	   desirable to reduce the number of transport addresses used by real-
375	   time media applications using RTP by combining all RTP media streams
376	   in a single RTP session.  Using a single RTP session also affects the
377	   possibility for differentiated treatment of media flows.  This is
378	   further discussed in Section 12.9.  WebRTC implementations of RTP are
379	   REQUIRED to support transport of all RTP media streams, independent
380	   of media type, in a single RTP session according to
381	   [I-D.ietf-avtcore-multi-media-rtp-session].  If such RTP session
382	   set-up is to be used, this MUST be negotiated during the signalling
383	   phase [I-D.ietf-mmusic-sdp-bundle-negotiation].

385	   Support for multiple RTP sessions over a single UDP flow as defined
386	   by [I-D.westerlund-avtcore-transport-multiplexing] is RECOMMENDED/
387	   OPTIONAL.  If multiple RTP sessions are to be multiplexed onto a
388	   single UDP flow, this MUST be negotiated during the signalling phase.

390	      (tbd: No consensus on the level of support of Multiple RTP
391	      sessions over a single UDP flow.)

393	   Further discussion about when different RTP session structures and
394	   multiplexing methods are suitable can be found in the memo on
395	   Guidelines for using the Multiplexing Features of RTP
396	   [I-D.westerlund-avtcore-multiplex-architecture].

398	4.5.  RTP and RTCP Multiplexing

400	   Historically, RTP and RTCP have been run on separate transport layer
401	   addresses (e.g., two UDP ports for each RTP session, one port for RTP
402	   and one port for RTCP).  With the increased use of Network Address/
403	   Port Translation (NAPT) this has become problematic, since
404	   maintaining multiple NAT bindings can be costly.  It also complicates
405	   firewall administration, since multiple ports need to be opened to
406	   allow RTP traffic.  To reduce these costs and session set-up times,
407	   support for multiplexing RTP data packets and RTCP control packets on
408	   a single port for each RTP session is REQUIRED, as specified in
409	   [RFC5761].  For backwards compatibility, implementations are also
410	   REQUIRED to support sending of RTP and RTCP to separate destination
411	   ports.

413	   Note that the use of RTP and RTCP multiplexed onto a single transport
414	   port ensures that there is occasional traffic sent on that port, even
415	   if there is no active media traffic.  This can be useful to keep NAT
416	   bindings alive, and is the recommended method for application level
417	   keep-alives of RTP sessions [RFC6263].
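
   When RTP and RTCP share a port, the receiver has to tell the two
   apart from the packet contents.  The following Python sketch shows
   one common approach, which relies on the restriction in [RFC5761]
   that RTP payload types 64-95 are not used, so that a second octet in
   the range 192-223 can only be an RTCP packet type.  The function
   name and return values are assumptions made for this example.

```python
def classify(packet):
    """Classify a datagram received on a multiplexed RTP/RTCP port.

    Returns "rtp", "rtcp", or "unknown".
    """
    if len(packet) < 2 or packet[0] >> 6 != 2:
        return "unknown"  # too short, or not RTP/RTCP version 2
    # In RTP the second octet is the marker bit plus a 7-bit payload
    # type; in RTCP it is the 8-bit packet type.  Values 192-223 would
    # correspond to RTP payload types 64-95 with the marker bit set,
    # which are avoided when multiplexing.
    if 192 <= packet[1] <= 223:
        return "rtcp"
    return "rtp"
```

   A receiver would pass "rtp" packets to the media path and "rtcp"
   packets to the control path, and handle anything else separately.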

419	4.6.  Reduced Size RTCP

421	   RTCP packets are usually sent as compound RTCP packets, and [RFC3550]
422	   requires that those compound packets start with a Sender Report (SR)
423	   or Receiver Report (RR) packet.  When using frequent RTCP feedback
424	   messages under the RTP/AVPF Profile [RFC4585] these statistics are
425	   not needed in every packet, and unnecessarily increase the mean RTCP
426	   packet size.  This can limit the frequency at which RTCP packets can
427	   be sent within the RTCP bandwidth share.

429	   To avoid this problem, [RFC5506] specifies how to reduce the mean
430	   RTCP message size and allow for more frequent feedback.  Frequent
431	   feedback, in turn, is essential to make real-time applications
432	   quickly aware of changing network conditions, and to allow them to
433	   adapt their transmission and encoding behaviour.  Support for sending
434	   RTCP feedback packets as [RFC5506] non-compound packets is REQUIRED,
435	   but MUST be negotiated using the signalling channel before use.  For
436	   backwards compatibility, implementations are also REQUIRED to support
437	   the use of compound RTCP feedback packets if the remote endpoint does
438	   not agree to the use of non-compound RTCP in the signalling exchange.
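
   The difference between the two forms is visible in the first RTCP
   packet of a datagram.  The following Python sketch splits a datagram
   into its RTCP packets and checks whether it starts with an SR or RR,
   as [RFC3550] requires for compound packets.  It is a simplified
   illustration: the function names are assumptions, and the other
   validity checks for compound packets are omitted.

```python
import struct

SR, RR = 200, 201  # RTCP packet types from RFC 3550


def split_rtcp(datagram):
    """Split an RTCP datagram into (packet_type, packet_bytes) tuples."""
    packets = []
    offset = 0
    while offset + 4 <= len(datagram):
        _, packet_type, length = struct.unpack_from("!BBH", datagram, offset)
        # The length field counts 32-bit words, minus one.
        size = 4 * (length + 1)
        packets.append((packet_type, datagram[offset:offset + size]))
        offset += size
    return packets


def starts_with_report(datagram):
    """True if the datagram begins with an SR or RR packet."""
    packets = split_rtcp(datagram)
    return bool(packets) and packets[0][0] in (SR, RR)
```

   A datagram for which starts_with_report() is false, such as a lone
   feedback packet, is only acceptable once the use of [RFC5506] has
   been negotiated.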

440	4.7.  Symmetric RTP/RTCP

442	   To ease traversal of NAT and firewall devices, implementations are
443	   REQUIRED to implement and use Symmetric RTP [RFC4961].  This requires
444	   that the IP address and port used for sending and receiving RTP and
445	   RTCP packets are identical.  The reason for using symmetric RTP is
446	   primarily to avoid issues with NATs and firewalls by ensuring that
447	   the flow is actually bi-directional, and thus kept alive and
448	   registered as a flow the intended recipient actually wants.  In
449	   addition, it saves resources: ports at the end-points, and state in
450	   the network, since NAT mappings and firewall state are not
451	   unnecessarily bloated.  The amount of QoS state is also reduced.
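
   At the socket level, symmetric operation amounts to using a single
   bound UDP socket for both directions.  The following Python sketch
   demonstrates this on the loopback interface.  The function names are
   assumptions made for this example, and a real implementation would
   bind to the address selected during connectivity establishment.

```python
import socket


def open_symmetric_socket(local_addr=("127.0.0.1", 0)):
    """Create one UDP socket used for both sending and receiving."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(local_addr)  # port 0 lets the OS pick a free port
    return sock


def send_and_receive_demo():
    """Check that the peer sees traffic from the port we receive on."""
    a = open_symmetric_socket()
    b = open_symmetric_socket()
    try:
        b.settimeout(2.0)
        # Send from the same socket that "a" would also receive on.
        a.sendto(b"rtp-or-rtcp", b.getsockname())
        _, source = b.recvfrom(2048)
        # The source address seen by "b" is the receive address of "a".
        return source == a.getsockname()
    finally:
        a.close()
        b.close()
```

   Because the source address of outgoing packets equals the address
   on which the end-point receives, a NAT or firewall on the path sees
   a single bi-directional flow.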

453	4.8.  Choice of RTP Synchronisation Source (SSRC)

455	   Implementations are REQUIRED to support signalled RTP SSRC values,
456	   using the "a=ssrc:" SDP attribute defined in Sections 4.1 and 5 of
457	   [RFC5576], and MUST also support the "previous-ssrc" source attribute
458	   defined in Section 6.2 of [RFC5576].  Other attributes defined in
459	   [RFC5576] MAY be supported.

461	   Use of the "a=ssrc:" attribute is OPTIONAL.  Implementations MUST
462	   support random SSRC assignment, and MUST support SSRC collision
463	   detection and resolution, both according to [RFC3550].
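
   Random SSRC assignment with collision resolution can be sketched in
   Python as follows.  This is a simplified illustration of the
   behaviour described in Section 8 of [RFC3550]: loop detection based
   on transport addresses and the RTCP BYE sent for the old SSRC are
   omitted, and the class and function names are assumptions made for
   this example.

```python
import secrets


def choose_ssrc(in_use):
    """Pick a random 32-bit SSRC that is not already known to be in use."""
    while True:
        ssrc = secrets.randbits(32)
        if ssrc not in in_use:
            return ssrc


class LocalSource:
    """A local RTP source that resolves SSRC collisions."""

    def __init__(self, known_ssrcs=()):
        self.known = set(known_ssrcs)
        self.ssrc = choose_ssrc(self.known)

    def on_remote_ssrc(self, remote_ssrc):
        """Record a remote SSRC; return True if it collided with ours."""
        self.known.add(remote_ssrc)
        if remote_ssrc != self.ssrc:
            return False
        # Collision: a full implementation would send an RTCP BYE for
        # the old SSRC before switching (not shown in this sketch).
        self.ssrc = choose_ssrc(self.known)
        return True
```

   An SSRC signalled with "a=ssrc:" could be added to the known set in
   the same way as one learned from received packets.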

465	4.9.  Generation of the RTCP Canonical Name (CNAME)

467	   The RTCP Canonical Name (CNAME) provides a persistent transport-level
468	   identifier for an RTP endpoint.  While the Synchronisation Source
469	   (SSRC) identifier for an RTP endpoint can change if a collision is
470	   detected, or when the RTP application is restarted, its RTCP CNAME is
471	   meant to stay unchanged, so that RTP endpoints can be uniquely
472	   identified and associated with their RTP media streams within a set
473	   of related RTP sessions.  For proper functionality, each RTP endpoint
474	   needs to have a unique RTCP CNAME value.

476	   The RTP specification [RFC3550] includes guidelines for choosing a
477	   unique RTP CNAME, but these are not sufficient in the presence of NAT
478	   devices.  In addition, long-term persistent identifiers can be
479	   problematic from a privacy viewpoint.  Accordingly, support for
480	   generating short-term persistent RTCP CNAMEs following
481	   [I-D.ietf-avtcore-6222bis] is RECOMMENDED.

483	   An end-point in WebRTC MUST support reception of any CNAME that matches
484	   the syntax limitations specified by the RTP specification [RFC3550]
485	   and cannot assume that any CNAME will be chosen according to the form
486	   suggested above.
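
   One way to generate a CNAME that is unique but not linkable across
   sessions is to encode a fresh random value for each session.  The
   following Python sketch assumes that a Base64 encoding of 96 random
   bits is acceptable; that choice, and the function name, are
   assumptions made for this example, and implementers need to consult
   [I-D.ietf-avtcore-6222bis] for the actual procedure.

```python
import base64
import secrets


def short_term_cname(bits=96):
    """Return a random, per-session RTCP CNAME as an ASCII string.

    bits: amount of randomness to use; 96 is an assumption of this sketch.
    """
    if bits % 8 != 0 or bits <= 0:
        raise ValueError("bits must be a positive multiple of 8")
    raw = secrets.token_bytes(bits // 8)
    # Base64 keeps the value within printable ASCII, which fits in the
    # text field of an SDES CNAME item.
    return base64.b64encode(raw).decode("ascii")
```

   A new value would be generated for each set of related RTP sessions,
   and reused for all SSRCs that are to be synchronised within it.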

488	5.  WebRTC Use of RTP: Extensions

490	   There are a number of RTP extensions that are either needed to obtain
491	   full functionality, or extremely useful to improve on the baseline
492	   performance, in the WebRTC application context.  One set of these
493	   extensions is related to conferencing, while others are more generic
494	   in nature.  The following subsections describe the various RTP
495	   extensions mandated or suggested for use within the WebRTC context.

497	5.1.  Conferencing Extensions

499	   RTP is inherently a group communication protocol.  Groups can be
500	   implemented using a centralised server, multi-unicast, or using IP
501	   multicast.  While IP multicast was popular in early deployments, in
502	   today's practice, overlay-based conferencing dominates, typically
503	   using one or more central servers to connect endpoints in a star or
504	   flat tree topology.  These central servers can be implemented in a
505	   number of ways as discussed in Appendix A, and in the memo on RTP
506	   Topologies [I-D.westerlund-avtcore-rtp-topologies-update].

508	   As discussed in Section 3.7 of
509	   [I-D.westerlund-avtcore-rtp-topologies-update], the use of a video
510	   switching MCU makes the use of RTCP for congestion control, or any
511	   type of quality reports, very problematic.  Also, as discussed in
512	   Section 3.8 of [I-D.westerlund-avtcore-rtp-topologies-update], the
513	   use of a content modifying MCU with RTCP termination breaks RTP loop
514	   detection and removes the ability for receivers to identify active
515	   senders.  RTP Transport Translators (Topo-Translator) are not of
516	   immediate interest to WebRTC, although the main difference compared
517	   to point to point is the possibility of seeing multiple different
518	   transport paths in any RTCP feedback.  Accordingly, only Point to
519	   Point (Topo-Point-to-Point), Multiple concurrent Point to Point
520	   (Mesh) and RTP Mixers (Topo-Mixer) topologies are needed to achieve
521	   the use-cases to be supported in WebRTC initially.  These RECOMMENDED
522	   topologies are expected to be supported by all WebRTC end-points
523	   (these topologies require no special RTP-layer support in the end-
524	   point if the RTP features mandated in this memo are implemented).

526	   The RTP extensions described below to be used with centralised
527	   conferencing -- where one RTP Mixer (e.g., a conference bridge)
528	   receives a participant's RTP media streams and distributes them to
529	   the other participants -- are not necessary for interoperability; an
530	   RTP endpoint that does not implement these extensions will work
531	   correctly, but might offer poor performance.  Support for the listed
532	   extensions will greatly improve the quality of experience and, to
533	   provide a reasonable baseline quality, some of these extensions are
534	   mandatory to be supported by WebRTC end-points.

536	   The RTCP conferencing extensions are defined in Extended RTP Profile
537	   for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/
538	   AVPF) [RFC4585] and the "Codec Control Messages in the RTP Audio-
539	   Visual Profile with Feedback (AVPF)" (CCM) [RFC5104] and are fully
540	   usable by the Secure variant of this profile (RTP/SAVPF) [RFC5124].

542	5.1.1.  Full Intra Request (FIR)

544	   The Full Intra Request is defined in Sections 3.5.1 and 4.3.1 of the
545	   Codec Control Messages [RFC5104].  This message is used to make the
546	   mixer request a new Intra picture from a participant in the session.
547	   This is used when switching between sources to ensure that the
548	   receivers can decode the video or other predictive media encoding
549	   with long prediction chains.  It is REQUIRED that WebRTC senders
550	   understand and react to this feedback message since it greatly
551	   improves the user experience when using centralised mixer-based
552	   conferencing; support for sending the FIR message is OPTIONAL.

554	5.1.2.  Picture Loss Indication (PLI)

556	   The Picture Loss Indication is defined in Section 6.3.1 of the RTP/
557	   AVPF profile [RFC4585].  It is used by a receiver to tell the sending
558	   encoder that it lost the decoder context and would like to have it
559	   repaired somehow.  This is semantically different from the Full Intra
560	   Request above as there could be multiple ways to fulfil the request.
561	   It is REQUIRED that WebRTC senders understand and react to this
562	   feedback message as a loss tolerance mechanism; receivers MAY send
563	   PLI messages.
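
To make the packet layout concrete, the following sketch builds a PLI message and classifies incoming payload-specific feedback by its FMT value. The packet type and FMT numbers are those assigned in [RFC4585] and [RFC5104]; the function names are hypothetical, and a real implementation would carry the message inside a compound RTCP packet.

```python
import struct

RTCP_PT_PSFB = 206  # payload-specific feedback packet type [RFC4585]
FMT_PLI = 1         # Picture Loss Indication [RFC4585]
FMT_FIR = 4         # Full Intra Request [RFC5104]


def build_pli(sender_ssrc: int, media_ssrc: int) -> bytes:
    """Build a bare PLI packet; PLI carries no FCI payload."""
    first_byte = (2 << 6) | FMT_PLI  # version 2, no padding, FMT
    length_words = 2                 # length in 32-bit words minus one
    return struct.pack("!BBHII", first_byte, RTCP_PT_PSFB,
                       length_words, sender_ssrc, media_ssrc)


def classify_feedback(packet: bytes) -> str:
    """Name the feedback message so the sender can react to it."""
    first_byte, packet_type, _length = struct.unpack("!BBH", packet[:4])
    if first_byte >> 6 != 2:
        raise ValueError("not an RTCP version 2 packet")
    fmt = first_byte & 0x1F
    if packet_type == RTCP_PT_PSFB and fmt == FMT_PLI:
        return "PLI"
    if packet_type == RTCP_PT_PSFB and fmt == FMT_FIR:
        return "FIR"
    return "unknown"
```

A sender that sees "PLI" is free to choose how to repair the decoder state, whereas "FIR" specifically asks for a new intra picture.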

565	5.1.3.  Slice Loss Indication (SLI)

567	   The Slice Loss Indication is defined in Section 6.3.2 of the RTP/AVPF
568	   profile [RFC4585].  It is used by a receiver to tell the encoder that
569	   it has detected the loss or corruption of one or more consecutive
570	   macro blocks, and would like to have these repaired somehow.  The use
571	   of this feedback message is OPTIONAL as a loss tolerance mechanism.

573	5.1.4.  Reference Picture Selection Indication (RPSI)

575	   Reference Picture Selection Indication (RPSI) is defined in Section
576	   6.3.3 of the RTP/AVPF profile [RFC4585].  Some video coding standards
577	   allow the use of older reference pictures than the most recent one
578	   for predictive coding.  If such a codec is in use, and if the
579	   encoder has learned about a loss of encoder-decoder synchronisation,
580	   a known-as-correct reference picture can be used for future coding.
581	   The RPSI message allows this to be signalled.  Support for RPSI
582	   messages is OPTIONAL.

584	5.1.5.  Temporal-Spatial Trade-off Request (TSTR)

586	   The temporal-spatial trade-off request and notification are defined
587	   in Sections 3.5.2 and 4.3.2 of [RFC5104].  This request can be used
588	   to ask the video encoder to change the trade-off it makes between
589	   temporal and spatial resolution, for example to prefer high spatial
590	   image quality but low frame rate.  Support for TSTR requests and
591	   notifications is OPTIONAL.

593	5.1.6.  Temporary Maximum Media Stream Bit Rate Request (TMMBR)

595	   This feedback message is defined in Sections 3.5.4 and 4.2.1 of the
596	   Codec Control Messages [RFC5104].  This message and its notification
597	   message are used by a media receiver to inform the sending party that
598	   there is a current limitation on the amount of bandwidth available to
599	   this receiver.  There can be various reasons for this: for example, an
600	   RTP mixer can use this message to limit the media rate of the sender
601	   being forwarded by the mixer (without doing media transcoding) to fit
602	   the bottlenecks existing towards the other session participants.  It
603	   is REQUIRED that this feedback message is supported.  WebRTC senders
604	   are REQUIRED to implement support for TMMBR messages, and MUST follow
605	   bandwidth limitations set by a TMMBR message received for their SSRC.
606	   The sending of TMMBR requests is OPTIONAL.
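
The following sketch illustrates how a TMMBR limit could be carried and honoured. It assumes the FCI entry layout of [RFC5104] (SSRC, 6-bit exponent, 17-bit mantissa, 9-bit measured overhead); the function names are hypothetical, and the TMMBN notification and bounding-set handling are omitted.

```python
import struct


def encode_tmmbr_fci(ssrc: int, max_bitrate_bps: int, overhead_bytes: int) -> bytes:
    """Pack one TMMBR FCI entry; the bit rate is mantissa * 2**exp."""
    if not 0 <= overhead_bytes <= 0x1FF:
        raise ValueError("measured overhead must fit in 9 bits")
    exp = 0
    while (max_bitrate_bps >> exp) >= (1 << 17):
        exp += 1
    if exp > 63:
        raise ValueError("bit rate too large to encode")
    mantissa = max_bitrate_bps >> exp  # rounds down, never above the limit
    word = (exp << 26) | (mantissa << 9) | overhead_bytes
    return struct.pack("!II", ssrc, word)


def decode_tmmbr_fci(fci: bytes):
    """Return (ssrc, max_bitrate_bps, overhead_bytes) from one FCI entry."""
    ssrc, word = struct.unpack("!II", fci)
    exp = word >> 26
    mantissa = (word >> 9) & 0x1FFFF
    return ssrc, mantissa << exp, word & 0x1FF


def apply_tmmbr_limit(target_bps: int, tmmbr_limit_bps: int) -> int:
    """A sender stays at or below the limit received for its SSRC."""
    return min(target_bps, tmmbr_limit_bps)
```

Rounding the mantissa down is a deliberate choice in this sketch, so that the encoded value never exceeds the limit the receiver intended.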

608	5.2.  Header Extensions

610	   The RTP specification [RFC3550] provides the capability to include
611	   RTP header extensions containing in-band data, but the format and
612	   semantics of the extensions are poorly specified.  The use of header
613	   extensions is OPTIONAL in the WebRTC context, but if they are used,
614	   they MUST be formatted and signalled following the general mechanism
615	   for RTP header extensions defined in [RFC5285], since this gives
616	   well-defined semantics to RTP header extensions.

618	   As noted in [RFC5285], the requirement from the RTP specification
619	   that header extensions are "designed so that the header extension may
620	   be ignored" [RFC3550] stands.  To be specific, header extensions MUST
621	   only be used for data that can safely be ignored by the recipient
622	   without affecting interoperability, and MUST NOT be used when the
623	   presence of the extension has changed the form or nature of the rest
624	   of the packet in a way that is not compatible with the way the stream
625	   is signalled (e.g., as defined by the payload type).  Valid examples
626	   might include metadata that is additional to the usual RTP
627	   information.
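
As an illustration of why such extensions can be safely ignored, the sketch below parses the one-byte-header form of [RFC5285] and simply returns the elements it finds, so that a receiver can pick out the IDs it has negotiated and skip the rest. The function name is hypothetical, and the two-byte-header form is not handled.

```python
import struct

ONE_BYTE_PROFILE = 0xBEDE  # marks the one-byte-header form [RFC5285]


def parse_one_byte_extensions(ext_block: bytes) -> dict:
    """Map extension ID -> data for an RTP header extension block."""
    profile, length_words = struct.unpack("!HH", ext_block[:4])
    if profile != ONE_BYTE_PROFILE:
        return {}  # unknown form: ignore the whole extension
    data = ext_block[4:4 + 4 * length_words]
    elements = {}
    i = 0
    while i < len(data):
        header = data[i]
        if header == 0:  # padding byte between or after elements
            i += 1
            continue
        ext_id = header >> 4
        if ext_id == 15:  # reserved ID: stop processing
            break
        length = (header & 0x0F) + 1  # length field stores size minus one
        elements[ext_id] = data[i + 1:i + 1 + length]
        i += 1 + length
    return elements
```

The mapping from ID to meaning is established by signalling, so an ID the receiver did not negotiate is just dropped from the result.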

629	5.2.1.  Rapid Synchronisation

631	   Many RTP sessions require synchronisation between audio, video, and
632	   other content.  This synchronisation is performed by receivers, using
633	   information contained in RTCP SR packets, as described in the RTP
634	   specification [RFC3550].  This basic mechanism can be slow, however,
635	   so it is RECOMMENDED that the rapid RTP synchronisation extensions
636	   described in [RFC6051] be implemented.  The rapid synchronisation
637	   extensions use the general RTP header extension mechanism [RFC5285],
638	   which requires signalling, but are otherwise backwards compatible.

640	5.2.2.  Client-to-Mixer Audio Level

642	   The Client to Mixer Audio Level extension [RFC6464] is an RTP header
643	   extension used by a client to inform a mixer about the level of audio
644	   activity in the packet to which the header is attached.  This enables
645	   a central node to make mixing or selection decisions without decoding
646	   or detailed inspection of the payload, reducing the complexity in
647	   some types of central RTP nodes.  It can also save decoding resources
648	   in receivers, which can choose to decode only the most relevant RTP
649	   media streams based on audio activity levels.

651	   The Client-to-Mixer Audio Level [RFC6464] extension is RECOMMENDED to
652	   be implemented.  If it is implemented, it is REQUIRED that the header
653	   extensions are encrypted according to
654	   [I-D.ietf-avtcore-srtp-encrypted-header-ext] since the information
655	   contained in these header extensions can be considered sensitive.
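
A minimal sketch of the extension payload, assuming the single-byte format of [RFC6464] (a voice activity flag followed by a 7-bit level expressed as a magnitude in -dBov): the helper names are hypothetical, and encryption of the header extension is outside the scope of the example.

```python
def encode_audio_level(level_dbov: int, voice_activity: bool) -> bytes:
    """Encode a level between -127 and 0 dBov into the one-byte payload."""
    magnitude = -level_dbov
    if not 0 <= magnitude <= 127:
        raise ValueError("level must be between -127 and 0 dBov")
    return bytes([(0x80 if voice_activity else 0x00) | magnitude])


def decode_audio_level(payload: bytes):
    """Return (level_dbov, voice_activity) from the one-byte payload."""
    value = payload[0]
    return -(value & 0x7F), bool(value & 0x80)


def loudest_sources(levels_by_ssrc: dict, count: int) -> list:
    """Pick the SSRCs a mixer might forward, loudest first."""
    ranked = sorted(levels_by_ssrc, key=lambda ssrc: levels_by_ssrc[ssrc],
                    reverse=True)
    return ranked[:count]
```

The selection helper shows the point made above: a central node can rank streams from the header alone, without decoding any audio payload.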

657	5.2.3.  Mixer-to-Client Audio Level

659	   The Mixer to Client Audio Level header extension [RFC6465] provides
660	   the client with the audio level of the different sources mixed into a
661	   common mix by an RTP mixer.  This enables a user interface to indicate
662	   the relative activity level of each session participant, rather than
663	   just being included or not based on the CSRC field.  This is a pure
664	   optimisation of non-critical functions, and is hence OPTIONAL to
665	   implement.  If it is implemented, it is REQUIRED that the header
666	   extensions are encrypted according to
667	   [I-D.ietf-avtcore-srtp-encrypted-header-ext] since the information
668	   contained in these header extensions can be considered sensitive.

670	6.  WebRTC Use of RTP: Improving Transport Robustness

672	   There are some tools that can make RTP flows robust against packet
673	   loss and reduce the impact on media quality.  However, they all add
674	   extra bits compared to a non-robust stream.  These extra bits need to
675	   be considered, and the aggregate bit-rate MUST be rate-controlled.
676	   Thus, improving robustness might require a lower base encoding
677	   quality, but has the potential to deliver that quality with fewer
678	   errors.  The mechanisms described in the following sub-sections can
679	   be used to improve tolerance to packet loss.

681	6.1.  Negative Acknowledgements and RTP Retransmission

683	   As a consequence of supporting the RTP/SAVPF profile, implementations
684	   will support negative acknowledgements (NACKs) for RTP data packets
685	   [RFC4585].  This feedback can be used to inform a sender of the loss
686	   of particular RTP packets, subject to the capacity limitations of the
687	   RTCP feedback channel.  A sender can use this information to optimise
688	   the user experience by adapting the media encoding to compensate for
689	   known lost packets, for example.

691	   Senders are REQUIRED to understand the Generic NACK message defined
692	   in Section 6.2.1 of [RFC4585], but MAY choose to ignore this feedback
693	   (following Section 4.2 of [RFC4585]).  Receivers MAY send NACKs for
694	   missing RTP packets; [RFC4585] provides some guidelines on when to
695	   send NACKs.  It is not expected that a receiver will send a NACK for
696	   every lost RTP packet, rather it needs to consider the cost of
697	   sending NACK feedback, and the importance of the lost packet, to make
698	   an informed decision on whether it is worth telling the sender about
699	   a packet loss event.
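
The Generic NACK format lets one entry report up to 17 packets, which helps keep the feedback cost low. The sketch below packs a set of lost sequence numbers into FCI entries, assuming the PID/BLP layout of [RFC4585]; the function name is hypothetical, and the decision about which losses are worth reporting is left to the caller.

```python
import struct


def build_nack_fci(lost_seqs) -> bytes:
    """Pack lost RTP sequence numbers into Generic NACK FCI entries."""
    pending = sorted({seq & 0xFFFF for seq in lost_seqs})
    entries = []
    while pending:
        pid = pending.pop(0)  # first lost packet of this entry
        blp = 0               # bitmask of the following 16 packets
        remaining = []
        for seq in pending:
            offset = (seq - pid) & 0xFFFF  # distance modulo 2**16
            if 1 <= offset <= 16:
                blp |= 1 << (offset - 1)
            else:
                remaining.append(seq)
        pending = remaining
        entries.append(struct.pack("!HH", pid, blp))
    return b"".join(entries)
```

For example, losses of packets 100, 101 and 103 fit in a single 4-byte entry, while an isolated loss of packet 200 needs a second entry.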

701	   The RTP Retransmission Payload Format [RFC4588] offers the ability to
702	   retransmit lost packets based on NACK feedback.  Retransmission needs
703	   to be used with care in interactive real-time applications to ensure
704	   that the retransmitted packet arrives in time to be useful, but can
705	   be effective in environments with relatively low network RTT (an RTP
706	   sender can estimate the RTT to the receivers using the information in
707	   RTCP SR and RR packets).  The use of retransmissions can also
708	   increase the forward RTP bandwidth, and can potentially worsen the
709	   problem if the packet loss was caused by network congestion.  We
710	   note, however, that retransmission of an important lost packet to
711	   repair decoder state can have lower cost than sending a full intra
712	   frame.  It is not appropriate to blindly retransmit RTP packets in
713	   response to a NACK.  The importance of lost packets and the
714	   likelihood of them arriving in time to be useful needs to be
715	   considered before RTP retransmission is used.
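
As a sketch of the considerations above: the first function applies the round-trip computation of [RFC3550] to the LSR and DLSR fields of a reception report, and the second is a purely hypothetical policy that retransmits only important packets that are likely to arrive in time. The playout budget is an assumed input; this memo does not define how a sender learns it.

```python
def rtt_from_receiver_report(arrival_ntp_mid32: int, lsr: int, dlsr: int) -> float:
    """Round-trip time in seconds from one RTCP reception report block.

    All inputs are 32-bit values in units of 1/65536 second: the arrival
    time of the report (middle 32 bits of the NTP timestamp) and the LSR
    and DLSR fields taken from the report block.
    """
    rtt_units = (arrival_ntp_mid32 - lsr - dlsr) & 0xFFFFFFFF
    return rtt_units / 65536.0


def should_retransmit(rtt_s: float, playout_budget_s: float,
                      is_important: bool) -> bool:
    """Hypothetical policy: the repair takes about half an RTT to arrive."""
    return is_important and (rtt_s / 2.0) < playout_budget_s
```

A real sender would also weigh the extra bandwidth against its current congestion state before sending the repair packet.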

717	   Receivers are REQUIRED to implement support for RTP retransmission
718	   packets [RFC4588].  Senders MAY send RTP retransmission packets in
719	   response to NACKs if the RTP retransmission payload format has been
720	   negotiated for the session, and if the sender believes it is useful
721	   to send a retransmission of the packet(s) referenced in the NACK.  An
722	   RTP sender is not expected to retransmit every NACKed packet.

724	6.2.  Forward Error Correction (FEC)

726	   The use of Forward Error Correction (FEC) can provide an effective
727	   protection against some degree of packet loss, at the cost of steady
728	   bandwidth overhead.  There are several FEC schemes that are defined
729	   for use with RTP.  Some of these schemes are specific to a particular
730	   RTP payload format, others operate across RTP packets and can be used
731	   with any payload format.  It needs to be noted that using redundant
732	   encoding or FEC will lead to increased play out delay, which needs to
733	   be considered when choosing the redundancy or FEC formats and their
734	   respective parameters.

736	   If an RTP payload format negotiated for use in a WebRTC session
737	   supports redundant transmission or FEC as a standard feature of that
738	   payload format, then that support MAY be used in the WebRTC session,
739	   subject to any appropriate signalling.

741	   There are several block-based FEC schemes that are designed for use
742	   with RTP independent of the chosen RTP payload format.  At the time
743	   of this writing there is no consensus on which, if any, of these FEC
744	   schemes is appropriate for use in the WebRTC context.  Accordingly,
745	   this memo makes no recommendation on the choice of block-based FEC
746	   for WebRTC use.

748	7.  WebRTC Use of RTP: Rate Control and Media Adaptation

750	   WebRTC will be used in heterogeneous network environments using a
751	   varied set of link technologies, including both wired and wireless
752	   links, to interconnect potentially large groups of users around the
753	   world.  As a result, the network paths between users can have widely
754	   varying one-way delays, available bit-rates, load levels, and traffic
755	   mixtures.  Individual end-points can open one or more RTP sessions to
756	   each participant in a WebRTC conference, and there can be several
757	   participants.  Each of these RTP sessions can contain different types
758	   of media, and the type of media, bit rate, and number of flows can be
759	   highly asymmetric.  Non-RTP traffic can share network paths with RTP
760	   flows.  Since the network environment is not predictable or stable,
761	   WebRTC endpoints MUST ensure that the RTP traffic they generate can
762	   adapt to match changes in the available network capacity.

764	   The quality of experience for users of WebRTC implementations is very
765	   dependent on effective adaptation of the media to the limitations of
766	   the network.  End-points have to be designed so they do not transmit
767	   significantly more data than the network path can support, except for
768	   very short time periods, otherwise high levels of network packet loss
769	   or delay spikes will occur, causing media quality degradation.  The
770	   limiting factor on the capacity of the network path might be the link
771	   bandwidth, or it might be competition with other traffic on the link
772	   (this can be non-WebRTC traffic, traffic due to other WebRTC flows,
773	   or even competition with other WebRTC flows in the same session).

775	   An effective media congestion control algorithm is therefore an
776	   essential part of the WebRTC framework.  However, at the time of this
777	   writing, there is no standard congestion control algorithm that can
778	   be used for interactive media applications such as WebRTC flows.
779	   Some requirements for congestion control algorithms for WebRTC
780	   sessions are discussed in [I-D.jesup-rtp-congestion-reqs], and it is
781	   expected that a future version of this memo will mandate the use of a
782	   congestion control algorithm that satisfies these requirements.

784	7.1.  Boundary Conditions and Circuit Breakers

786	   In the absence of a concrete congestion control algorithm, all WebRTC
787	   implementations MUST implement the RTP circuit breaker algorithm that
788	   is described in [I-D.ietf-avtcore-rtp-circuit-breakers].  The circuit
789	   breaker defines a conservative boundary condition for safe operation,
790	   chosen such that applications that trigger the circuit breaker will
791	   almost certainly be causing severe network congestion.  Any future
792	   RTP congestion control algorithms are expected to operate within the
793	   envelope allowed by the circuit breaker.

795	   The session establishment signalling will also necessarily establish
796	   boundaries to which the media bit-rate will conform.  The choice of
797	   media codecs provides upper- and lower-bounds on the supported bit-
798	   rates that the application can utilise to provide useful quality, and
799	   the packetization choices that exist.  In addition, the signalling
800	   channel can establish maximum media bit-rate boundaries using the SDP
801	   "b=AS:" or "b=CT:" lines, and the RTP/AVPF Temporary Maximum Media
802	   Stream Bit Rate (TMMBR) Requests (see Section 5.1.6 of this memo).
803	   The combination of media codec choice and signalled bandwidth limits
804	   SHOULD be used to limit traffic based on known bandwidth limitations,
805	   for example the capacity of the edge links, to the extent possible.
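
A minimal sketch of combining these signalled limits, assuming the sender takes the smallest of the "b=AS:" values found in the SDP (expressed in kbit/s) and any TMMBR limit currently in force. The function name is hypothetical, session-level and media-level lines are not distinguished, and "b=CT:" is ignored for brevity.

```python
def signalled_bitrate_cap(sdp: str, tmmbr_limit_bps=None):
    """Return the tightest signalled limit in bit/s, or None if unset."""
    limits = []
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("b=AS:"):
            limits.append(int(line[len("b=AS:"):]) * 1000)  # kbit/s to bit/s
    if tmmbr_limit_bps is not None:
        limits.append(tmmbr_limit_bps)
    return min(limits) if limits else None


example_sdp = "m=video 9 RTP/SAVPF 96\r\nb=AS:512\r\na=rtpmap:96 VP8/90000\r\n"
cap = signalled_bitrate_cap(example_sdp, tmmbr_limit_bps=300000)
```

The result is only an upper bound; the congestion control and circuit breaker still decide the actual sending rate below it.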

807	7.2.  RTCP Extensions for Congestion Control

809	   As described in Section 5.1.6, the Temporary Maximum Media Stream Bit
810	   Rate (TMMBR) request is supported by WebRTC senders.  This request
811	   can be used by a media receiver to impose limitations on the media
812	   sender based on the receiver's determined bit-rate limitations, to
813	   provide a limited means of congestion control.

815	   (tbd: What other RTP/RTCP extensions are needed?)

817	   With proprietary congestion control algorithms, issues can arise when
818	   different algorithms and implementations interact in a communication
819	   session.  If the different implementations have made different
820	   choices with regard to the type of adaptation, for example one sender
821	   based, and one receiver based, then one could end up in a situation
822	   where one direction is dual controlled, while the other direction is
823	   not controlled.

825	   (tbd: How to ensure that both paths and sender and receiver based
826	   solutions can interact)

828	7.3.  RTCP Limitations for Congestion Control

830	   Experience with the congestion control algorithms of TCP [RFC5681],
831	   TFRC [RFC5348], and DCCP [RFC4341], [RFC4342], [RFC4828], has shown
832	   that feedback on packet arrivals needs to be sent roughly once per
833	   round trip time.  We note that the real-time media traffic might not
834	   have to adapt to changing path conditions as rapidly as needed for
835	   the elastic applications TCP was designed for, but frequent feedback
836	   is still needed to allow the congestion control algorithm to track
837	   the path dynamics.

839	   The total RTCP bandwidth is limited in its transmission rate to a
840	   fraction of the RTP traffic (by default 5%).  RTCP packets are larger
841	   than, e.g., TCP ACKs (even when non-compound RTCP packets are used).
842	   The RTP media stream bit rate thus limits the maximum feedback rate
843	   as a function of the mean RTCP packet size.

845	   Interactive communication might not be able to afford waiting for
846	   packet losses to occur to indicate congestion, because an increase in
847	   play out delay due to queuing (most prominent in wireless networks)
848	   can easily lead to packets being dropped due to late arrival at the
849	   receiver.  Therefore, more sophisticated cues might need to be
850	   reported -- to be defined in a suitable congestion control framework
851	   as noted above -- which, in turn, increase the report size again.
852	   For example, different RTCP XR report blocks (jointly) provide the
853	   necessary details to implement a variety of congestion control
854	   algorithms, but the (compound) report size grows quickly.

856	   In group communication, the RTCP bandwidth needs to be
857	   shared by all group members, reducing the capacity and thus the
858	   reporting frequency per node.

860	   Example: assuming 512 kbit/s video yields 3200 bytes/s RTCP
861	   bandwidth, split across two entities in a point-to-point session.  An
862	   endpoint could thus send a report of 100 bytes about every 70ms or
863	   for every other frame in a 30 fps video.
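
The arithmetic behind this example can be sketched as follows, assuming the default 5% RTCP fraction and an equal split between participants (the function name is hypothetical). For a bare 100-byte report the result is 62.5 ms; lower-layer headers, which this sketch does not count, presumably account for the "about every 70ms" figure.

```python
def rtcp_report_interval_s(media_bitrate_bps: float, report_size_bytes: float,
                           participants: int, rtcp_fraction: float = 0.05) -> float:
    """Average time between reports for one participant, in seconds."""
    rtcp_bytes_per_s = media_bitrate_bps * rtcp_fraction / 8.0
    share_bytes_per_s = rtcp_bytes_per_s / participants
    return report_size_bytes / share_bytes_per_s


# 512 kbit/s video, two endpoints, 100-byte reports.
interval = rtcp_report_interval_s(512000, 100, participants=2)
frames_per_report = interval * 30  # at 30 frames per second
```

Raising the number of participants or the report size lengthens the interval proportionally, which is the limitation this section describes.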

865	7.4.  Congestion Control Interoperability With Legacy Systems

867	   There are legacy implementations that do not implement RTCP, and
868	   hence do not provide any congestion feedback.  Congestion control
869	   cannot be performed with these end-points.  WebRTC implementations
870	   that need to interwork with such end-points MUST limit their
871	   transmission to a low rate, equivalent to a VoIP call using a low
872	   bandwidth codec, that is unlikely to cause any significant
873	   congestion.

875	   When interworking with legacy implementations that support RTCP using
876	   the RTP/AVP profile [RFC3551], congestion feedback is provided in
877	   RTCP RR packets every few seconds.  Implementations that have to
878	   interwork with such end-points MUST ensure that they keep within the
879	   RTP circuit breaker [I-D.ietf-avtcore-rtp-circuit-breakers]
880	   constraints to limit the congestion they can cause.

882	   If a legacy end-point supports RTP/AVPF, this enables negotiation of
883	   important parameters for frequent reporting, such as the "trr-int"
884	   parameter, and the possibility that the end-point supports some
885	   useful feedback format for congestion control purpose such as TMMBR
886	   [RFC5104].  Implementations that have to interwork with such end-
887	   points MUST ensure that they stay within the RTP circuit breaker
888	   [I-D.ietf-avtcore-rtp-circuit-breakers] constraints to limit the
889	   congestion they can cause, but might find that they can achieve
890	   better congestion response depending on the amount of feedback that
891	   is available.

893	8.  WebRTC Use of RTP: Performance Monitoring

895	   RTCP contains a basic set of RTP flow monitoring metrics like
896	   packet loss and jitter.  There are a number of extensions that could
897	   be included in the set to be supported.  However, in most cases which
898	   RTP monitoring is needed depends on the application, which makes
899	   it difficult to select which to include when the set of applications
900	   is very large.

902	   Exposing some metrics in the WebRTC API needs to be considered,
903	   allowing the application to gather the measurements of interest.
904	   However, the security implications of the different data sets exposed
905	   will need to be considered in this.

907	   (tbd: Whether any RTCP XR metrics need to be added is still an open
908	   question, but it is possible to extend this at a later stage.)

910	9.  WebRTC Use of RTP: Future Extensions

912	   It is possible that the core set of RTP protocols and RTP extensions
913	   specified in this memo will prove insufficient for the future needs
914	   of WebRTC applications.  In this case, future updates to this memo
915	   MUST be made following the Guidelines for Writers of RTP Payload
916	   Format Specifications [RFC2736] and Guidelines for Extending the RTP
917	   Control Protocol [RFC5968], and SHOULD take into account any future
918	   guidelines for extending RTP and related protocols that have been
919	   developed.

921	   Authors of future extensions are urged to consider the wide range of
922	   environments in which RTP is used when recommending extensions, since
923	   extensions that are applicable in some scenarios can be problematic
924	   in others.  Where possible, the WebRTC framework will adopt RTP
925	   extensions that are of general utility, to enable easy implementation
926	   of a gateway to other applications using RTP, rather than adopt
927	   mechanisms that are narrowly targeted at specific WebRTC use cases.

929	10.  Signalling Considerations

931	   RTP is built with the assumption of an external signalling channel
932	   that can be used to configure the RTP sessions and their features.
933	   The basic configuration of an RTP session consists of the following
934	   parameters:

936	   RTP Profile:  The name of the RTP profile to be used in the session.
937	      The RTP/AVP [RFC3551] and RTP/AVPF [RFC4585] profiles can
938	      interoperate on a basic level, as can their secure variants
939	      RTP/SAVP [RFC3711] and RTP/SAVPF [RFC5124].  The secure variants
940	      of the profiles do not directly interoperate with the non-secure
941	      variants, due to the presence of additional header fields in
942	      addition to any cryptographic transformation of the packet
943	      content.  As WebRTC requires the usage of the RTP/SAVPF profile,
944	      this can be inferred as there is only a single profile, but in
945	      SDP this is still information that has to be signalled.
946	      Interworking functions might transform this into RTP/SAVP for a
947	      legacy use case by indicating to the WebRTC end-point an RTP/SAVPF
948	      end-point and limiting the usage of the a=rtcp-fb attribute to
949	      indicate a trr-int value of 4 seconds.

951	   Transport Information:  Source and destination IP address(es) and
952	      ports for RTP and RTCP MUST be signalled for each RTP session.  In
953	      WebRTC these transport addresses will be provided by ICE that
954	      signals candidates and arrives at nominated candidate address
955	      pairs.  If RTP and RTCP multiplexing [RFC5761] is to be used, such
956	      that a single port is used for RTP and RTCP flows, this MUST be
957	      signalled (see Section 4.5).  If several RTP sessions are to be
958	      multiplexed onto a single transport layer flow, this MUST also be
959	      signalled (see Section 4.4).

961	   RTP Payload Types, media formats, and media format
962	   parameters:  The mapping between media type names (and hence the RTP
963	      payload formats to be used) and the RTP payload type numbers MUST
964	      be signalled.  Each media type MAY also have a number of media
965	      type parameters that MUST also be signalled to configure the codec
966	      and RTP payload format (the "a=fmtp:" line from SDP).

968	   RTP Extensions:  The RTP extensions to be used SHOULD be agreed upon,
969	      including any parameters for each respective extension.  At the
970	      very least, this will help avoid using bandwidth for features
971	      that the other end-point will ignore.  For certain mechanisms,
972	      however, such agreement is required, since interoperability
973	      failure otherwise occurs.

975	   RTCP Bandwidth:  Support for exchanging RTCP Bandwidth values between
976	      the end-points will be necessary.  This SHALL be done as described
977	      in "Session Description Protocol (SDP) Bandwidth Modifiers for RTP
978	      Control Protocol (RTCP) Bandwidth" [RFC3556], or something
979	      semantically equivalent.  This also ensures that the end-points
980	      have a common view of the RTCP bandwidth; this is important, as
981	      views of the bandwidth that differ too much can lead to failure to
982	      interoperate.

984	   These parameters are often expressed in SDP messages conveyed within
985	   an offer/answer exchange.  RTP does not depend on SDP or on the
986	   offer/answer model, but does require all the necessary parameters to
987	   be agreed upon, and provided to the RTP implementation.  We note that
988	   in the WebRTC context, how these parameters need to be configured
989	   depends on the signalling model and API, but they will need to be
990	   either set in the API or explicitly signalled between the peers.

992	11.  WebRTC API Considerations

994	   The WebRTC API and its media function have the concept of a WebRTC
995	   MediaStream that consists of zero or more tracks.  A track is an
996	   individual stream of media from any type of media source like a
997	   microphone or a camera, but also conceptual sources, like an audio
998	   mix or a video composition, are possible.  The tracks within a WebRTC
999	   MediaStream are expected to be synchronized.

1001	   A track corresponds to the media received with one particular SSRC.
1002	   There might be additional SSRCs associated with that SSRC, like for
1003	   RTP retransmission or Forward Error Correction.  However, one SSRC
1004	   will identify an RTP media stream and its timing.

1006	   As a result, a WebRTC MediaStream is a collection of SSRCs carrying
1007	   the different media included in the synchronised aggregate.
1008	   Therefore, the synchronisation state associated with the included
1009	   SSRCs is also part of the concept.  It is important to consider that
1010	   there can be multiple different WebRTC MediaStreams containing a
1011	   given Track (SSRC).  To avoid unnecessary duplication of media at the
1012	   transport level in such cases, a need arises for a binding defining
1013	   which WebRTC MediaStreams a given SSRC is associated with at the
1014	   signalling level.
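
One way to picture such a binding is a registry keyed by SSRC, so that a track shared by several WebRTC MediaStreams is still sent only once. The class and method names below are hypothetical and are not taken from the WebRTC API or from any signalling proposal.

```python
from collections import defaultdict


class MediaStreamBinding:
    """Hypothetical registry of which MediaStreams each SSRC belongs to."""

    def __init__(self):
        self._streams_by_ssrc = defaultdict(set)

    def bind(self, ssrc: int, media_stream_id: str) -> None:
        """Record that the track sent with this SSRC is part of a stream."""
        self._streams_by_ssrc[ssrc].add(media_stream_id)

    def streams_for(self, ssrc: int) -> list:
        """MediaStreams a receiver should attach this SSRC's track to."""
        return sorted(self._streams_by_ssrc.get(ssrc, ()))

    def ssrcs_to_send(self) -> list:
        """Each SSRC appears once, however many streams contain it."""
        return sorted(self._streams_by_ssrc)
```

The receiver uses the same mapping in reverse to rebuild each MediaStream from the SSRCs it receives.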

1016	   A proposal for how the binding between WebRTC MediaStreams and SSRC
1017	   can be done is specified in "Cross Session Stream Identification in
1018	   the Session Description Protocol" [I-D.alvestrand-rtcweb-msid].

1020	   (tbd: This text needs to be improved and consensus achieved on it.
1021	   The June 2012 interim meeting showed large differences of opinion.)

1023	   (tbd: It is an open question whether these considerations are best
1024	   discussed in this draft, in the W3C WebRTC API spec, or elsewhere.)

1026	12.  RTP Implementation Considerations

1028	   The following discussion provides some guidance on the implementation
1029	   of the RTP features described in this memo.  The focus is on a WebRTC
1030	   end-point implementation perspective, and while some mention is made
1031	   of the behaviour of middleboxes, that is not the focus of this memo.

1033	12.1.  RTP Sessions and PeerConnections

1035	   An RTP session is an association among RTP nodes, which have a single
1036	   shared SSRC space.  An RTP session can include a large number of end-
1037	   points and nodes, each sourcing, sinking, manipulating, or reporting
1038	   on the RTP media streams being sent within the RTP session.

1040	   A PeerConnection is a point-to-point association between an end-point
1041	   and some other peer node.  That peer node can be either an end-point
1042	   or a centralized processing node of some type.  Hence, an RTP session
1043	   can terminate immediately at the far end of a PeerConnection, or it
1044	   might continue as further discussed below for multiparty sessions
1045	   (Section 12.3) and sessions with multiple end points (Section 12.7).

1047	   A PeerConnection can contain one or more RTP sessions, depending on
1048	   how it is set up, and how many UDP flows it uses.  A common usage has
1049	   been to have one RTP session per media type, e.g. one for audio and
1050	   one for video, each sent over a different UDP flow.  However, the
1051	   default usage in WebRTC will be to use one RTP session for all media
1052	   types, with RTP and RTCP multiplexing (Section 4.5) also mandated.
1053	   This RTP session then uses only one UDP flow.  However, for legacy
1054	   interworking and flow-based network prioritization (Section 12.9), a
1055	   WebRTC end-point needs to support a mode of operation where one RTP
1056	   session per media type is used.  Currently, each RTP session has to
1057	   use its own UDP flow in this case, however it might be possible to
1058	   multiplex several RTP sessions over a single UDP flow, see
1059	   Section 4.4.

1061	   The multi-unicast- or mesh-based multi-party topology (Figure 1) is a
1062	   good example for this section as it concerns the relation between RTP
1063	   sessions and PeerConnections.  In this topology, each participant
1064	   sends individual unicast RTP/UDP/IP flows to each of the other
1065	   participants using independent PeerConnections in a full mesh.  This
1066	   topology has the benefit of not requiring central nodes.  The
1067	   downside is that it increases the used bandwidth at each sender by
1068	   requiring one copy of the RTP media streams for each participant that
1069	   is part of the same session beyond the sender itself.  Hence, this
1070	   topology is limited to scenarios with few participants unless the
1071	   media is very low bandwidth.

1073	                              +---+      +---+
1074	                              | A |<---->| B |
1075	                              +---+      +---+
1076	                                ^         ^
1077	                                 \       /
1078	                                  \     /
1079	                                   v   v
1080	                                   +---+
1081	                                   | C |
1082	                                   +---+

1084	                          Figure 1: Multi-unicast
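
   As a rough illustration of the sender-side cost of the mesh in
   Figure 1, the following Python sketch computes the uplink bit-rate
   that one participant needs.  The function name and the 500 kbps
   stream rate are illustrative assumptions, not values taken from this
   memo, and the sketch ignores RTCP and packet header overhead.

```python
def mesh_uplink_kbps(participants: int, stream_kbps: float) -> float:
    """Uplink needed by one sender in a full mesh.

    Each sender transmits one copy of its media to every other
    participant, so the cost grows linearly with the group size.
    """
    if participants < 1:
        raise ValueError("a session needs at least one participant")
    return (participants - 1) * stream_kbps


# Hypothetical 500 kbps stream sent in meshes of increasing size.
for n in (2, 4, 8):
    print(n, "participants:", mesh_uplink_kbps(n, 500.0), "kbps uplink")
```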

1086	   The multi-unicast topology could be implemented as a single RTP
1087	   session, spanning multiple peer-to-peer transport layer connections,
1088	   or as several pairwise RTP sessions, one between each pair of peers.
1089	   To maintain a coherent mapping between RTP sessions and
1090	   PeerConnections, we recommend implementing this topology as
1091	   individual RTP sessions.  The only downside is that end-point A will
1092	   not learn, based on RTCP, of the quality of any transmission between
1093	   B and C.  This has not been seen as a significant downside, as no
1094	   one has yet identified a clear need for A to know about the
1095	   communication between B and C.  An advantage of using separate RTP
1096	   sessions is that it enables using different media bit-rates to the
1097	   different peers.  If the transport from A to C is limited, B is then
1098	   not forced to endure the same quality reductions that C
1099	   experiences.

1101	12.2.  Multiple Sources

1103	   A WebRTC end-point might have multiple cameras, microphones or audio
1104	   inputs and thus a single end-point can source multiple RTP media
1105	   streams of the same media type concurrently.  Even if an end-point
1106	   does not have multiple media sources of the same media type it has to
1107	   support transmission using multiple SSRCs concurrently in the same
1108	   RTP session.  This is due to the requirement on a WebRTC end-point
1109	   to support multiple media types in one RTP session.  For example, one
1110	   audio and one video source can result in the end-point sending with
1111	   two different SSRCs in the same RTP session.  As multi-party
1112	   conferences are supported, as discussed below in Section 12.3, a
1113	   WebRTC end-point will need to be capable of receiving, decoding and
1114	   playing out multiple RTP media streams of the same type concurrently.

1116	   tbd: Is any mechanism needed to signal limitations in the number of
1117	   active SSRCs that an end-point can handle?

1119	12.3.  Multiparty

1121	   There are numerous situations and clear use cases for WebRTC
1122	   supporting multi-party RTP sessions.  This can be realized
1123	   in a number of ways using a number of different implementation
1124	   strategies.  In the following, the focus is on the different sets of
1125	   WebRTC end-point requirements that arise from different sets of
1126	   multi-party topologies.

1128	   The multi-unicast mesh (Figure 1)-based multi-party topology
1129	   discussed above provides a non-centralized solution but can incur a
1130	   heavy tax on the end-points' outgoing paths.  It can also consume a
1131	   large amount of encoding resources if each outgoing stream is
1132	   specifically encoded.  If an encoding is transmitted to multiple
1133	   parties, as in some implementations of the mesh case, the end-point
1134	   is required to be able to create RTP media streams that suit the
1135	   requirements of multiple destinations.  These requirements can
1136	   depend both on the transport path and on the different end-points'
1137	   preferences related to play out of the media.

1139	                    +---+      +------------+      +---+
1140	                    | A |<---->|            |<---->| B |
1141	                    +---+      |            |      +---+
1142	                               |   Mixer    |
1143	                    +---+      |            |      +---+
1144	                    | C |<---->|            |<---->| D |
1145	                    +---+      +------------+      +---+

1147	                Figure 2: RTP Mixer with Only Unicast Paths

1149	   A Mixer (Figure 2) is an RTP end-point that optimizes the
1150	   transmission of RTP media streams from certain perspectives, either
1151	   by only sending some of the received RTP media streams to any given
1152	   receiver or by providing a combined RTP media stream out of a set of
1153	   contributing streams.  There are various methods of implementation as
1154	   discussed in Appendix A.3.  A common aspect is that these central
1155	   nodes can use a number of tools to control the media encoding
1156	   provided by a WebRTC end-point.  This includes functions like
1157	   requesting that the encoder break the encoding chain and produce a
1158	   so-called Intra frame.  Another is limiting the bit-rate of a given
1159	   stream to better suit the mixer's view of the multiple down-streams.
1160	   Others are controlling the most suitable frame-rate, picture
1161	   resolution, and the trade-off between frame-rate and spatial quality.

1163	   A mixer has a significant responsibility to correctly perform
1164	   congestion control, identify sources, and manage synchronization
1165	   while providing the application with suitable media optimizations.

1167	   Mixers also need to be trusted nodes when it comes to security, as
1168	   they manipulate either RTP or the media itself before sending it on
1169	   towards the end-point(s); thus they need to be able to decrypt and
1170	   then encrypt it before sending it out.

1172	12.4.  SSRC Collision Detection

1174	   The RTP standard [RFC3550] requires any RTP implementation to have
1175	   support for detecting and handling SSRC collisions, i.e., resolve the
1176	   conflict when two different end-points use the same SSRC value.  This
1177	   requirement also applies to WebRTC end-points.  There are several
1178	   scenarios where SSRC collisions can occur.
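
   The bookkeeping behind this requirement can be sketched as follows.
   This is a deliberately simplified illustration, loosely modelled on
   the collision handling described in [RFC3550]: it does not
   distinguish loops from collisions, does not time out old entries,
   and only notes where an RTCP BYE for the old identifier would be
   sent.  The class and method names are hypothetical.

```python
import secrets


class SsrcTable:
    """Tracks which transport address each remote SSRC was first seen from."""

    def __init__(self):
        self.local_ssrc = secrets.randbits(32)
        self.remote = {}  # SSRC -> source transport address

    def on_packet(self, ssrc, source_addr):
        """Classify an incoming packet as 'ok' or as a kind of collision."""
        if ssrc == self.local_ssrc:
            # Someone else uses our identifier (or our own traffic looped
            # back).  A real stack would send an RTCP BYE for the old SSRC
            # here before switching to a freshly chosen random one.
            old = self.local_ssrc
            while self.local_ssrc == old or self.local_ssrc in self.remote:
                self.local_ssrc = secrets.randbits(32)
            return "local-collision"
        known_addr = self.remote.setdefault(ssrc, source_addr)
        if known_addr != source_addr:
            # Same SSRC arriving from a second transport address.
            return "remote-collision"
        return "ok"


table = SsrcTable()
print(table.on_packet(0x1234, ("192.0.2.1", 5004)))
print(table.on_packet(0x1234, ("192.0.2.9", 5004)))
```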

1180	   In a point-to-point session where each SSRC is associated with either
1181	   of the two end-points and where the main media carrying SSRC
1182	   identifier will be announced in the signalling channel, a collision
1183	   is less likely to occur due to the information about used SSRCs
1184	   provided by Source-Specific SDP Attributes [RFC5576].  Still, if both
1185	   end-points start using a new SSRC identifier prior to having
1186	   signalled it to the peer and received acknowledgement of the
1187	   signalling message, there can be collisions.  The Source-Specific SDP
1188	   Attributes [RFC5576] contain no mechanism to resolve SSRC collisions
1189	   or reject an end-point's usage of an SSRC.

1191	   SSRC values that are not signalled could also appear.  This is
1192	   more likely than it appears as certain RTP functions need extra SSRCs
1193	   to provide functionality related to another (the "main") SSRC, for
1194	   example, SSRC multiplexed RTP retransmission [RFC4588].  In those
1195	   cases, an end-point can create a new SSRC that strictly doesn't need
1196	   to be announced over the signalling channel to function correctly on
1197	   both RTP and PeerConnection level.

1199	   The more likely case for SSRC collision is that multiple end-points
1200	   in a multiparty conference create new sources and signal those
1201	   towards the central server.  In cases where the SSRC/CSRC are
1202	   propagated between the different end-points from the central node,
1203	   collisions can occur.

1205	   Another scenario is when the central node manages to connect an end-
1206	   point's PeerConnection to another PeerConnection the end-point
1207	   already has, thus forming a loop where the end-point will receive its
1208	   own traffic.  While this is clearly considered a bug, it is important
1209	   that the end-point is able to recognise and handle the case when it
1210	   occurs.  This case becomes even more problematic when media mixers,
1211	   and so on, are involved, where the stream received is a different
1212	   stream but still contains this client's input.

1214	   These SSRC/CSRC collisions can only be handled on RTP level as long
1215	   as the same RTP session is extended across multiple PeerConnections
1216	   by an RTP middlebox.  To resolve the more generic case where multiple
1217	   PeerConnections are interconnected, the identification of the media
1218	   source(s) that are part of a MediaStreamTrack being propagated across
1219	   multiple interconnected PeerConnections needs to be preserved across
1220	   these interconnections.

1222	12.5.  Contributing Sources and the CSRC List

1224	   RTP allows a mixer, or other RTP-layer middlebox, to combine media
1225	   flows from multiple sources to form a new media flow.  The RTP data
1226	   packets in that new flow will include a Contributing Source (CSRC)
1227	   list, indicating which original SSRCs contributed to the combined
1228	   packet.  As described in Section 4.1, implementations need to support
1229	   reception of RTP data packets containing a CSRC list and RTCP packets
1230	   that relate to sources present in the CSRC list.
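
   To illustrate what reception of a CSRC list involves, the sketch
   below extracts the SSRC and CSRC list from an RTP data packet using
   the fixed header layout of [RFC3550]: the CSRC count (CC) is the low
   four bits of the first octet, and the list follows the 12-octet
   fixed header.  The function name and the sample identifier values
   are illustrative assumptions; header extensions and payload are not
   examined.

```python
import struct
from typing import List, Tuple


def parse_csrc_list(packet: bytes) -> Tuple[int, List[int]]:
    """Return (SSRC, [CSRC, ...]) from the start of an RTP data packet."""
    if len(packet) < 12:
        raise ValueError("truncated RTP fixed header")
    if packet[0] >> 6 != 2:
        raise ValueError("not RTP version 2")
    cc = packet[0] & 0x0F              # number of CSRC entries (0..15)
    if len(packet) < 12 + 4 * cc:
        raise ValueError("truncated CSRC list")
    ssrc = struct.unpack_from("!I", packet, 8)[0]
    csrcs = list(struct.unpack_from("!%dI" % cc, packet, 12))
    return ssrc, csrcs


# Hypothetical mixer packet: version 2, CC=2, payload type 96.
header = struct.pack("!BBHII", 0x80 | 2, 96, 1, 1000, 0x11111111)
packet = header + struct.pack("!II", 0xAAAA0001, 0xAAAA0002)
ssrc, csrcs = parse_csrc_list(packet)
print(hex(ssrc), [hex(c) for c in csrcs])
```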

1232	   The CSRC list can change on a packet-by-packet basis, depending on
1233	   the mixing operation being performed.  Knowledge of what sources
1234	   contributed to a particular RTP packet can be important if the user
1235	   interface indicates which participants are active in the session.
1236	   Changes in the CSRC list included in packets need to be exposed to
1237	   the WebRTC application using some API, if the application is to be
1238	   able to track changes in session participation.  It is desirable to
1239	   map CSRC values back into WebRTC MediaStream identities as they cross
1240	   this API, to avoid exposing the SSRC/CSRC name space to JavaScript
1241	   applications.

1243	   If the mixer-to-client audio level extension [RFC6465] is being used
1244	   in the session (see Section 5.2.3), the information in the CSRC list
1245	   is augmented by audio level information for each contributing source.
1246	   This information can usefully be exposed in the user interface.

1248	   This memo does not require implementations to be able to add a CSRC
1249	   list to outgoing RTP packets.  It is expected that any CSRC list
1250	   will be added by a mixer or other middlebox that performs in-network
1251	   processing of RTP streams.  If there is a desire to allow end-system
1252	   mixing, the requirement in Section 4.1 will need to be updated to
1253	   support setting the CSRC list in outgoing RTP data packets.

1255	12.6.  Media Synchronization

1257	   When an end-point sends media from more than one media source, it
1258	   needs to consider if (and which of) these media sources are to be
1259	   synchronized.  In RTP/RTCP, synchronisation is provided by having a
1260	   set of RTP media streams be indicated as coming from the same
1261	   synchronisation context and logical end-point by using the same CNAME
1262	   identifier.

1264	   The next provision is that the internal clocks of all media sources,
1265	   i.e., what drives the RTP timestamp, can be correlated to a system
1266	   clock that is provided in RTCP Sender Reports encoded in an NTP
1267	   format.  By correlating all RTP timestamps to a common system clock
1268	   for all sources, the timing relation of the different RTP media
1269	   streams, also across multiple RTP sessions, can be derived at the
1270	   receiver and, if desired, the streams can be synchronized.  The
1271	   requirement is for the media sender to provide the correlation
1272	   information; it is up to the receiver to use it or not.
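
   As an illustration of how a receiver can use this correlation
   information, the sketch below maps an RTP timestamp onto the
   sender's reference clock using the NTP timestamp and RTP timestamp
   pair carried in the most recent RTCP Sender Report.  It assumes the
   NTP timestamp has already been converted to seconds and that the
   clock rate is known from the payload format.  The function name and
   all sample values are hypothetical.

```python
def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_ntp_seconds, clock_rate):
    """Map a 32-bit RTP timestamp to sender wall-clock time in seconds."""
    # Signed 32-bit difference, so timestamp wrap-around is handled.
    diff = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if diff >= 0x80000000:
        diff -= 0x100000000
    return sr_ntp_seconds + diff / clock_rate


# Hypothetical audio (48 kHz) and video (90 kHz) streams sharing a CNAME.
audio_time = rtp_to_wallclock(480000 + 48000, 480000, 1000.0, 48000)
video_time = rtp_to_wallclock(900000 + 45000, 900000, 1000.5, 90000)

# A positive offset means the video sample was captured later.
print("video minus audio capture time:", video_time - audio_time)
```

   Media units from different streams that map to the same wall-clock
   time were captured together, so a receiver that wants to synchronize
   them can delay the earlier-arriving stream accordingly.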

1274	12.7.  Multiple RTP End-points

1276	   Some usages of RTP beyond the recommended topologies result in a
1277	   WebRTC end-point sending media in an RTP session out over a single
1278	   PeerConnection receiving receiver reports from multiple RTP
1279	   receivers.  Note that receiving multiple receiver reports is expected
1280	   because any RTP node that has multiple SSRCs has to report to the
1281	   media sender.  The difference here is that the reporters are multiple
1282	   nodes, and thus will likely have different path characteristics.

1284	   RTP Mixers can create a situation where an end-point experiences
1285	   something in between a session with only two end-points and one with
1286	   multiple end-points.  Mixers are expected to not forward RTCP reports
1287	   regarding RTP media streams across themselves.  This is due to the
1288	   difference in the RTP media streams provided to the different end-
1289	   points.  The original media source lacks information about a mixer's
1290	   manipulations prior to sending it to the different receivers.  This
1291	   scenario also results in an end-point's feedback or requests going
1292	   to the mixer.  When the mixer can't act on this by itself, it is
1293	   forced to go to the original media source to fulfil the receiver's
1294	   request.  This will not necessarily be explicitly visible in any RTP
1295	   and RTCP traffic, but the interactions and the time to complete them
1296	   will indicate such dependencies.

1298	   The topologies in which an end-point receives receiver reports from
1299	   multiple other end-points are the centralized relay, multicast and an
1300	   end-point forwarding an RTP media stream.  Having multiple RTP nodes
1301	   receive an RTP flow and send reports and feedback about it has
1302	   several impacts.  As previously discussed (Section 12.3), any codec
1303	   control and rate control need to be capable of merging the
1304	   requirements and preferences to provide the single RTP media stream
1305	   encoding that best suits the situation.  Specifically, when it
1306	   comes to congestion control it needs to be capable of identifying the
1307	   different end-points to form independent congestion state information
1308	   for each different path.
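
   One way to keep independent state per reporting end-point is
   sketched below.  Keying the state on the reporter's SSRC and
   tracking only the fraction-lost field of RTCP receiver report blocks
   are simplifying assumptions made for this illustration, since a
   single node can report using several SSRCs.  The class names are
   hypothetical, and no particular congestion control algorithm is
   implied.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PathState:
    fraction_lost: float = 0.0   # latest reported loss, 0.0 to 1.0
    reports: int = 0             # number of reports seen on this path


@dataclass
class SenderFeedback:
    paths: Dict[int, PathState] = field(default_factory=dict)

    def on_receiver_report(self, reporter_ssrc: int, fraction_lost_byte: int):
        """Record one report block; RTCP carries the loss as an 8-bit fraction."""
        state = self.paths.setdefault(reporter_ssrc, PathState())
        state.fraction_lost = fraction_lost_byte / 256.0
        state.reports += 1

    def worst_path(self):
        """Return (reporter SSRC, state) for the lossiest path, or None."""
        return max(self.paths.items(),
                   key=lambda item: item[1].fraction_lost,
                   default=None)


feedback = SenderFeedback()
feedback.on_receiver_report(0xB0B, 0)     # hypothetical receiver B
feedback.on_receiver_report(0xC0C, 64)    # hypothetical receiver C
print(feedback.worst_path())
```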

1310	   Providing source authentication in multi-party scenarios is a
1311	   challenge.  In the mixer-based topologies, an end-point's source
1312	   authentication is based on, firstly, verifying that media comes from
1313	   the mixer by cryptographic verification and, secondly, trust in the
1314	   mixer to correctly identify any source towards the end-point.  In RTP
1315	   sessions where multiple end-points are directly visible to an end-
1316	   point, all end-points will have knowledge about each others' master
1317	   keys, and can thus inject packets claimed to come from another end-
1318	   point in the session.  Any node performing relay can perform non-
1319	   cryptographic mitigation by preventing forwarding of packets that
1320	   have SSRC fields that came from other end-points before.  For
1321	   cryptographic verification of the source, SRTP would require
1322	   additional security mechanisms, like TESLA for SRTP [RFC4383].

1324	12.8.  Simulcast

1326	   This section discusses simulcast in the meaning of providing a node,
1327	   for example a Mixer, with multiple different encoded versions of the
1328	   same media source.  In the WebRTC context, this could be accomplished
1329	   in two ways.  One is to establish multiple PeerConnections, all being
1330	   fed the same set of WebRTC MediaStreams.  Another method is to use
1331	   multiple WebRTC MediaStreams that are differently configured when it
1332	   comes to the media parameters.  This would result in multiple
1333	   different RTP Media Streams (SSRCs) being in use with different
1334	   encodings based on the same media source (camera, microphone).

1336	   When intending to use simulcast it is important that this is made
1337	   explicit so that the end-points don't automatically try to optimize
1338	   away the different encodings and provide a single common version.
1339	   Thus, some explicit indication that the intent really is to have
1340	   different media encodings is likely needed.  It is to be noted that
1341	   it might be a central node, rather than a WebRTC end-point, that
1342	   would benefit from receiving simulcast media sources.

1344	   tbd: How to perform simulcast needs to be determined and the
1345	   appropriate API or signalling for its usage needs to be defined.

1347	12.9.  Differentiated Treatment of Flows

1349	   There are use cases for differentiated treatment of RTP media
1350	   streams.  Such differentiation can happen at several places in the
1351	   system.  First of all is the prioritization within the end-point
1352	   sending the media, which controls both which RTP media streams
1353	   will be sent and their allocation of bit-rate out of the current
1354	   available aggregate as determined by the congestion control.

1356	   It is expected that the WebRTC API will allow the application to
1357	   indicate relative priorities for different MediaStreamTracks.  These
1358	   priorities can then be used to influence the local RTP processing,
1359	   especially when it comes to congestion control response in how to
1360	   divide the available bandwidth between the RTP flows.  Any changes in
1361	   relative priority will also need to be considered for RTP flows that
1362	   are associated with the main RTP flows, such as RTP retransmission
1363	   streams and FEC.  The importance of such associated RTP traffic flows
1364	   is dependent on the media type and codec used, in regards to how
1365	   robust that codec is to packet loss.  However, a default policy might
1366	   be to use the same priority for associated RTP flows as for the
1367	   primary RTP flow.
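
   This memo does not specify how relative priorities translate into
   bit-rates.  Purely as an illustration, the sketch below divides the
   aggregate rate allowed by congestion control in proportion to
   per-flow weights.  The proportional rule, the function name, and the
   weights are all assumptions.

```python
def allocate_kbps(available_kbps, weights):
    """Split the available rate between flows in proportion to their weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one flow needs a positive weight")
    return {flow: available_kbps * w / total for flow, w in weights.items()}


# Hypothetical weights; an associated flow (e.g. retransmission) would by
# default be accounted under the same priority as its primary flow.
print(allocate_kbps(1000, {"audio": 1, "video": 3}))
```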

1369	   Secondly, the network can prioritize packet flows, including RTP
1370	   media streams.  Typically, differential treatment includes two steps,
1371	   the first being identifying whether an IP packet belongs to a class
1372	   that has to be treated differently, the second the actual mechanism
1373	   to prioritize packets.  This is done according to three methods:

1375	   DiffServ:  The end-point marks a packet with a DiffServ code point to
1376	      indicate to the network that the packet belongs to a particular
1377	      class.

1379	   Flow based:  Packets that need to be given a particular treatment are
1380	      identified using a combination of IP address and port number.

1382	   Deep Packet Inspection:  A network classifier (DPI) inspects the
1383	      packet and tries to determine if the packet represents a
1384	      particular application and type that is to be prioritized.

1386	   Flow-based differentiation will provide the same treatment to all
1387	   packets within a flow, i.e., relative prioritization is not possible.
1388	   Moreover, if the resources are limited it might not be possible to
1389	   provide differential treatment compared to best-effort for all the
1390	   flows in a WebRTC application.  When flow-based differentiation is
1391	   available the WebRTC application needs to know about it so that it
1392	   can provide the separation of the RTP media streams onto different
1393	   UDP flows to enable a more granular usage of flow based
1394	   differentiation.  That way, it can at least provide different
1395	   prioritization of audio and video if desired by the application.

1397	   DiffServ assumes that either the end-point or a classifier can mark
1398	   the packets with an appropriate DSCP so that the packets are treated
1399	   according to that marking.  If the end-point is to mark the traffic,
1400	   two requirements arise in the WebRTC context: 1) The WebRTC
1401	   application or browser has to know which DSCPs to use and that it can
1402	   use them on some set of RTP media streams. 2) The information needs
1403	   to be propagated to the operating system when transmitting the
1404	   packet.  These issues are discussed in DSCP and other packet markings
1405	   for RTCWeb QoS [I-D.ietf-rtcweb-qos].
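
   The second requirement, propagating the marking to the operating
   system, can be sketched for IPv4 with the standard sockets API.  The
   DSCP occupies the upper six bits of the former TOS octet, so the
   value is shifted left by two bits before being passed to the IP_TOS
   socket option.  The choice of DSCP 46 (Expedited Forwarding) is
   purely illustrative, the helper names are hypothetical, and whether
   the operating system or the network honours the marking is platform
   and deployment dependent.

```python
import socket

EF_DSCP = 46  # Expedited Forwarding; an illustrative choice only


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP in the upper bits of the IPv4 TOS octet."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def open_marked_udp_socket(dscp: int) -> socket.socket:
    """Create a UDP socket whose outgoing IPv4 packets request this DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


print("TOS octet for EF:", hex(dscp_to_tos(EF_DSCP)))
```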

1407	   For packet based marking schemes it would be possible in this context
1408	   to mark individual RTP packets differently based on the relative
1409	   priority of the RTP payload.  For example, video codecs that have
1410	   I, P and B pictures could prioritise any payloads carrying only B
1411	   frames less, as these are less damaging to lose.  But as a default
1412	   policy, all RTP packets related to a media stream ought to be
1413	   provided with the same prioritization.

1415	   It is also important to consider how RTCP packets associated with a
1416	   particular RTP media flow need to be marked.  RTCP compound packets
1417	   with Sender Reports (SR) ought to be marked with the same priority
1418	   as the RTP media flow itself, so the RTCP-based round-trip time (RTT)
1419	   measurements are done using the same flow priority as the media flow
1420	   experiences.  RTCP compound packets containing RR packets ought to be
1421	   sent with the priority used by the majority of the RTP media flows
1422	   reported on.  RTCP packets containing time-critical feedback packets
1423	   can use higher priority to improve the timeliness and likelihood of
1424	   delivery of such feedback.

1426	13.  Open Issues

1428	   This section contains a summary of the open issues and to-do items
1429	   noted in the document:

1431	   1.   Need to add references to the RTP payload format for the Video
1432	        Codec chosen in Section 4.3.

1434	   2.   The methods and solutions for RTP multiplexing over a single
1435	        transport are not yet finalized in Section 4.4.

1437	   3.   RTP congestion control algorithms will probably require some
1438	        feedback information to be conveyed in RTCP.  Are the tools that
1439	        are mandated by this memo sufficient, or do we need additional
1440	        information (Section 7.2)?

1442	   4.   RTP congestion control could be implemented using either a
1443	        sender-based algorithm or a receiver-based algorithm.  To ensure
1444	        interoperability, does this memo need to mandate which end is in
1445	        charge of congestion control for a path (Section 7.2)?

1447	   5.   Still open if any RTCP XR performance metrics are needed, as
1448	        discussed in Section 8.

1450	   6.   The API mapping to RTP level concepts has to be agreed and
1451	        documented in Section 11.

1453	   7.   It is an open question whether any requirements are needed to
1454	        agree and limit the number of simultaneously used media sources
1455	        (SSRCs) within an RTP session.  See Section 12.2.

1457	   8.   The method for achieving simulcast of a media source has to be
1458	        decided as discussed in Section 12.8.

1460	   9.   Possible documentation of what support for differentiated
1461	        treatment is needed on RTP level as the API and the
1462	        network level specification mature, as discussed in
1463	        Section 12.9.

1465	   10.  Editing of Appendix A to remove redundancy between this and the
1466	        update of RTP Topologies
1467	        [I-D.westerlund-avtcore-rtp-topologies-update].

1469	14.  IANA Considerations

1471	   This memo makes no request of IANA.

1473	   Note to RFC Editor: this section is to be removed on publication as
1474	   an RFC.

1476	15.  Security Considerations

1478	   The overall security architecture for WebRTC is described in
1479	   [I-D.ietf-rtcweb-security-arch], and security considerations for the
1480	   WebRTC framework are described in [I-D.ietf-rtcweb-security].  These
1481	   considerations apply to this memo also.

1483	   The security considerations of the RTP specification, the RTP/SAVPF
1484	   profile, and the various RTP/RTCP extensions and RTP payload formats
1485	   that form the complete protocol suite described in this memo apply.
1486	   We do not believe there are any new security considerations resulting
1487	   from the combination of these various protocol extensions.

1489	   The Extended Secure RTP Profile for Real-time Transport Control
1490	   Protocol (RTCP)-Based Feedback [RFC5124] (RTP/SAVPF) provides
1491	   handling of fundamental issues by offering confidentiality, integrity
1492	   and partial source authentication.  A mandatory-to-implement media
1493	   security solution is (tbd).

1495	   RTCP packets convey a Canonical Name (CNAME) identifier that is used
1496	   to associate media flows that need to be synchronised across related
1497	   RTP sessions.  Inappropriate choice of CNAME values can be a privacy
1498	   concern, since long-term persistent CNAME identifiers can be used to
1499	   track users across multiple WebRTC calls.  Section 4.9 of this memo
1500	   provides guidelines for generation of untraceable CNAME values that
1501	   alleviate this risk.

1503	   The guidelines in [RFC6562] apply when using variable bit rate (VBR)
1504	   audio codecs such as Opus (see Section 4.3 for discussion of mandated
1505	   audio codecs).  These guidelines in [RFC6562] also apply, but are of
1506	   lesser importance, when using the client-to-mixer audio level header
1507	   extensions (Section 5.2.2) or the mixer-to-client audio level header
1508	   extensions (Section 5.2.3).

1510	16.  Acknowledgements

1512	   The authors would like to thank Harald Alvestrand, Cary Bran, Charles
1513	   Eckel and Cullen Jennings for valuable feedback.

1515	17.  References

1517	17.1.  Normative References

1519	   [I-D.ietf-avtcore-6222bis]
1520	              Rescorla, E. and A. Begen, "Guidelines for Choosing RTP
1521	              Control Protocol (RTCP) Canonical Names (CNAMEs)",
1522	              draft-ietf-avtcore-6222bis-00 (work in progress),
1523	              December 2012.

1525	   [I-D.ietf-avtcore-avp-codecs]
1526	              Terriberry, T., "Update to Recommended Codecs for the AVP
1527	              RTP Profile", draft-ietf-avtcore-avp-codecs-00 (work in
1528	              progress), January 2013.

1530	   [I-D.ietf-avtcore-multi-media-rtp-session]
1531	              Westerlund, M., Perkins, C., and J. Lennox, "Multiple
1532	              Media Types in an RTP Session",
1533	              draft-ietf-avtcore-multi-media-rtp-session-01 (work in
1534	              progress), October 2012.

1536	   [I-D.ietf-avtcore-rtp-circuit-breakers]
1537	              Perkins, C. and V. Singh, "Multimedia Congestion Control:
1538	              Circuit Breakers for Unicast RTP Sessions",
1539	              draft-ietf-avtcore-rtp-circuit-breakers-02 (work in
1540	              progress), February 2013.

1542	   [I-D.ietf-avtcore-srtp-encrypted-header-ext]
1543	              Lennox, J., "Encryption of Header Extensions in the Secure
1544	              Real-Time Transport Protocol (SRTP)",
1545	              draft-ietf-avtcore-srtp-encrypted-header-ext-05 (work in
1546	              progress), February 2013.

1548	   [I-D.ietf-avtext-multiple-clock-rates]
1549	              Petit-Huguenin, M. and G. Zorn, "Support for Multiple
1550	              Clock Rates in an RTP Session",
1551	              draft-ietf-avtext-multiple-clock-rates-08 (work in
1552	              progress), November 2012.

1554	   [I-D.ietf-mmusic-sdp-bundle-negotiation]
1555	              Holmberg, C., Alvestrand, H., and C. Jennings,
1556	              "Multiplexing Negotiation Using Session Description
1557	              Protocol (SDP) Port Numbers",
1558	              draft-ietf-mmusic-sdp-bundle-negotiation-03 (work in
1559	              progress), February 2013.

1561	   [I-D.ietf-rtcweb-audio]
1562	              Valin, J. and C. Bran, "WebRTC Audio Codec and Processing
1563	              Requirements", draft-ietf-rtcweb-audio-01 (work in
1564	              progress), November 2012.

1566	   [I-D.ietf-rtcweb-overview]
1567	              Alvestrand, H., "Overview: Real Time Protocols for Brower-
1568	              based Applications", draft-ietf-rtcweb-overview-06 (work
1569	              in progress), February 2013.

1571	   [I-D.ietf-rtcweb-security]
1572	              Rescorla, E., "Security Considerations for RTC-Web",
1573	              draft-ietf-rtcweb-security-04 (work in progress),
1574	              January 2013.

1576	   [I-D.ietf-rtcweb-security-arch]
1577	              Rescorla, E., "RTCWEB Security Architecture",
1578	              draft-ietf-rtcweb-security-arch-06 (work in progress),
1579	              January 2013.

1581	   [I-D.westerlund-avtcore-transport-multiplexing]
1582	              Westerlund, M. and C. Perkins, "Multiple RTP Sessions on a
1583	              Single Lower-Layer Transport",
1584	              draft-westerlund-avtcore-transport-multiplexing-04 (work
1585	              in progress), October 2012.

1587	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1588	              Requirement Levels", BCP 14, RFC 2119, March 1997.

1590	   [RFC2736]  Handley, M. and C. Perkins, "Guidelines for Writers of RTP
1591	              Payload Format Specifications", BCP 36, RFC 2736,
1592	              December 1999.

1594	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
1595	              Jacobson, "RTP: A Transport Protocol for Real-Time
1596	              Applications", STD 64, RFC 3550, July 2003.

1598	   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
1599	              Video Conferences with Minimal Control", STD 65, RFC 3551,
1600	              July 2003.

1602	   [RFC3556]  Casner, S., "Session Description Protocol (SDP) Bandwidth
1603	              Modifiers for RTP Control Protocol (RTCP) Bandwidth",
1604	              RFC 3556, July 2003.

1606	   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
1607	              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
1608	              RFC 3711, March 2004.

1610	   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
1611	              "Extended RTP Profile for Real-time Transport Control
1612	              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
1613	              July 2006.

1615	   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
1616	              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
1617	              July 2006.

1619	   [RFC4961]  Wing, D., "Symmetric RTP / RTP Control Protocol (RTCP)",
1620	              BCP 131, RFC 4961, July 2007.

1622	   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
1623	              "Codec Control Messages in the RTP Audio-Visual Profile
1624	              with Feedback (AVPF)", RFC 5104, February 2008.

1626	   [RFC5124]  Ott, J. and E. Carrara, "Extended Secure RTP Profile for
1627	              Real-time Transport Control Protocol (RTCP)-Based Feedback
1628	              (RTP/SAVPF)", RFC 5124, February 2008.

1630	   [RFC5285]  Singer, D. and H. Desineni, "A General Mechanism for RTP
1631	              Header Extensions", RFC 5285, July 2008.

1633	   [RFC5506]  Johansson, I. and M. Westerlund, "Support for Reduced-Size
1634	              Real-Time Transport Control Protocol (RTCP): Opportunities
1635	              and Consequences", RFC 5506, April 2009.

1637	   [RFC5761]  Perkins, C. and M. Westerlund, "Multiplexing RTP Data and
1638	              Control Packets on a Single Port", RFC 5761, April 2010.

1640	   [RFC5764]  McGrew, D. and E. Rescorla, "Datagram Transport Layer
1641	              Security (DTLS) Extension to Establish Keys for the Secure
1642	              Real-time Transport Protocol (SRTP)", RFC 5764, May 2010.

1644	   [RFC6051]  Perkins, C. and T. Schierl, "Rapid Synchronisation of RTP
1645	              Flows", RFC 6051, November 2010.

1647	   [RFC6464]  Lennox, J., Ivov, E., and E. Marocco, "A Real-time
1648	              Transport Protocol (RTP) Header Extension for Client-to-
1649	              Mixer Audio Level Indication", RFC 6464, December 2011.

1651	   [RFC6465]  Ivov, E., Marocco, E., and J. Lennox, "A Real-time
1652	              Transport Protocol (RTP) Header Extension for Mixer-to-
1653	              Client Audio Level Indication", RFC 6465, December 2011.

1655	   [RFC6562]  Perkins, C. and JM. Valin, "Guidelines for the Use of
1656	              Variable Bit Rate Audio with Secure RTP", RFC 6562,
1657	              March 2012.

1659	17.2.  Informative References

1661	   [I-D.alvestrand-rtcweb-msid]
1662	              Alvestrand, H., "Cross Session Stream Identification in
1663	              the Session Description Protocol",
1664	              draft-alvestrand-rtcweb-msid-02 (work in progress),
1665	              May 2012.

1667	   [I-D.ietf-avt-srtp-ekt]
1668	              Wing, D., McGrew, D., and K. Fischer, "Encrypted Key
1669	              Transport for Secure RTP", draft-ietf-avt-srtp-ekt-03
1670	              (work in progress), October 2011.

1672	   [I-D.ietf-rtcweb-qos]
1673	              Dhesikan, S., Druta, D., Jones, P., and J. Polk, "DSCP and
1674	              other packet markings for RTCWeb QoS",
1675	              draft-ietf-rtcweb-qos-00 (work in progress), October 2012.

1677	   [I-D.ietf-rtcweb-use-cases-and-requirements]
1678	              Holmberg, C., Hakansson, S., and G. Eriksson, "Web Real-
1679	              Time Communication Use-cases and Requirements",
1680	              draft-ietf-rtcweb-use-cases-and-requirements-10 (work in
1681	              progress), December 2012.

1683	   [I-D.jesup-rtp-congestion-reqs]
1684	              Jesup, R. and H. Alvestrand, "Congestion Control
1685	              Requirements For Real Time Media",
1686	              draft-jesup-rtp-congestion-reqs-00 (work in progress),
1687	              March 2012.

1689	   [I-D.westerlund-avtcore-multiplex-architecture]
1690	              Westerlund, M., Burman, B., Perkins, C., and H.
1691	              Alvestrand, "Guidelines for using the Multiplexing
1692	              Features of RTP",
1693	              draft-westerlund-avtcore-multiplex-architecture-02 (work
1694	              in progress), July 2012.

1696	   [I-D.westerlund-avtcore-rtp-topologies-update]
1697	              Westerlund, M. and S. Wenger, "RTP Topologies",
1698	              draft-westerlund-avtcore-rtp-topologies-update-02 (work in
1699	              progress), February 2013.

1701	   [RFC4341]  Floyd, S. and E. Kohler, "Profile for Datagram Congestion
1702	              Control Protocol (DCCP) Congestion Control ID 2: TCP-like
1703	              Congestion Control", RFC 4341, March 2006.

1705	   [RFC4342]  Floyd, S., Kohler, E., and J. Padhye, "Profile for
1706	              Datagram Congestion Control Protocol (DCCP) Congestion
1707	              Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342,
1708	              March 2006.

1710	   [RFC4383]  Baugher, M. and E. Carrara, "The Use of Timed Efficient
1711	              Stream Loss-Tolerant Authentication (TESLA) in the Secure
1712	              Real-time Transport Protocol (SRTP)", RFC 4383,
1713	              February 2006.

1715	   [RFC4828]  Floyd, S. and E. Kohler, "TCP Friendly Rate Control
1716	              (TFRC): The Small-Packet (SP) Variant", RFC 4828,
1717	              April 2007.

1719	   [RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP
1720	              Friendly Rate Control (TFRC): Protocol Specification",
1721	              RFC 5348, September 2008.

1723	   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
1724	              Media Attributes in the Session Description Protocol
1725	              (SDP)", RFC 5576, June 2009.

1727	   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
1728	              Control", RFC 5681, September 2009.

1730	   [RFC5968]  Ott, J. and C. Perkins, "Guidelines for Extending the RTP
1731	              Control Protocol (RTCP)", RFC 5968, September 2010.

1733	   [RFC6263]  Marjou, X. and A. Sollaud, "Application Mechanism for
1734	              Keeping Alive the NAT Mappings Associated with RTP / RTP
1735	              Control Protocol (RTCP) Flows", RFC 6263, June 2011.

1737	Appendix A.  Supported RTP Topologies

1739	   RTP supports both unicast and group communication, with participants
1740	   being connected using a wide range of transport-layer topologies.
1741	   Some of these topologies involve only the end-points, while others
1742	   use RTP translators and mixers to provide in-network processing.
1743	   Properties of some RTP topologies are discussed in
1744	   [I-D.westerlund-avtcore-rtp-topologies-update], and we further
1745	   describe those expected to be useful for WebRTC in the following.  We
1746	   also cover the important RTP session requirements that a topology or
1747	   implementation variant can place on a WebRTC end-point.

1749	   This section includes RTP topologies beyond the RECOMMENDED ones.
1750	   This is an attempt to highlight how they differ and that, in many
1751	   cases, only small implementation differences are needed to support
1752	   a larger set of possible topologies.

1754	   (tbd: This section needs reworking and clearer relation to
1755	   [I-D.westerlund-avtcore-rtp-topologies-update].)

1757	A.1.  Point to Point

1759	   The point-to-point RTP topology (Figure 3) is the simplest scenario
1760	   for WebRTC applications.  This is going to be very common for user to
1761	   user calls.

1763	                            +---+         +---+
1764	                            | A |<------->| B |
1765	                            +---+         +---+

1767	                         Figure 3: Point to Point

1769	   As this is the basic topology, let us use it to highlight a few
1770	   details that are common to all RTP usage in the WebRTC context.  The
1771	   first is the intention to multiplex RTP and RTCP over the same UDP
1772	   flow.  The second is the question of using only a single RTP session
1773	   or one per media type for legacy interoperability.  The third is the
1774	   question of using multiple sender sources (SSRCs) per end-point.

1776	   Historically, RTP and RTCP have been run on separate UDP ports.  With
1777	   the increased use of Network Address/Port Translation (NAPT) this has
1778	   become problematic, since maintaining multiple NAT bindings can be
1779	   costly.  It also complicates firewall administration, since multiple
1780	   ports need to be opened to allow RTP traffic.  To reduce these costs
1781	   and session set-up times, WebRTC end-points will support
1782	   multiplexing RTP data packets and RTCP control packets on a single
1783	   port [RFC5761].
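
   As an illustration of what single-port multiplexing asks of a
   receiver, the following Python sketch classifies an incoming datagram
   as RTP or RTCP.  It assumes the commonly used rule that follows from
   the constraints in [RFC5761]: RTCP packet types are kept within
   192-223 and RTP payload types 64-95 are avoided, so the second octet
   alone separates the two.  The function name classify_packet is
   hypothetical, and the sketch performs no further validation.

```python
def classify_packet(datagram: bytes) -> str:
    """Classify a datagram received on a shared RTP/RTCP port.

    Assumption: RTCP packet types stay within 192-223 and RTP payload
    types 64-95 are unused, so the second octet is unambiguous.
    """
    if len(datagram) < 2:
        return "invalid"
    version = datagram[0] >> 6          # top two bits of the first octet
    if version != 2:
        return "invalid"
    second_octet = datagram[1]          # RTCP: packet type; RTP: M bit + PT
    if 192 <= second_octet <= 223:
        return "rtcp"
    return "rtp"


# RTCP Sender Report (packet type 200) and RTP with payload type 96.
print(classify_packet(bytes([0x80, 200, 0, 6])))    # rtcp
print(classify_packet(bytes([0x80, 96, 0, 1])))     # rtp
```

   An RTP packet with the marker bit set and payload type 96 has a
   second octet of 224, which also falls outside the RTCP range.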

1785	   In cases where there is only one type of media (e.g., a voice-only
1786	   call) this topology will be implemented as a single RTP session, with
1787	   bidirectional flows of RTP and RTCP packets, all then multiplexed
1788	   onto a single 5-tuple.  If multiple types of media are to be used
1789	   (e.g., audio and video), then each type of media can be sent as a
1790	   separate RTP session using a different 5-tuple, allowing for separate
1791	   transport level treatment of each type of media.  Alternatively, all
1792	   types of media can be multiplexed onto a single 5-tuple as a single
1793	   RTP session, or as several RTP sessions if using a demultiplexing
1794	   shim.  Multiplexing different types of media onto a single 5-tuple
1795	   places some limitations on how RTP is used, as described in "RTP
1796	   Multiplexing Architecture"
1797	   [I-D.westerlund-avtcore-multiplex-architecture].  It is not expected
1798	   that these limitations will significantly affect the scenarios
1799	   targeted by WebRTC, but they can impact interoperability with legacy
1800	   systems.

1802	   An RTP session has good support for simultaneously transporting
1803	   multiple media sources.  Each media source uses a unique SSRC
1804	   identifier, and each SSRC has independent RTP sequence number and
1805	   timestamp spaces.  WebRTC makes use of this in several cases.  One
1806	   is to enable multiple media sources of the same type; for example, an
1807	   end-point that has two video cameras can potentially transmit video
1808	   from both to its peer(s).  Another is when a single RTP session is
1809	   used for multiple media types, so that an end-point can transmit
1810	   both audio and video to the peer(s).  The third is multi-party
1811	   communication, which, as discussed below, needs support for multiple
1812	   SSRCs of the same media type.

1814	   The two figures below introduce the notation used for a single
1815	   PeerConnection in a point-to-point set-up.  The first depicts a
1816	   set-up where the PeerConnection has two different RTP sessions, one
1817	   for audio and one for video.  The second uses a single RTP session.
1818	   In both cases, A has two video streams and one audio stream to send,
1819	   while B has only one audio stream and one video stream.  The figures
1820	   illustrate the relation between a PeerConnection, the UDP flow(s),
1821	   the RTP session(s), and the SSRCs; the same notation is used in the
1822	   later cases.  RTCP flows are not included in the figures below.  They
1823	   flow bi-directionally between the RTP session instances in the
1824	   different nodes.

1826	            +-A-------------+                 +-B-------------+
1827	            | +-PeerC1------|                 |-PeerC1------+ |
1828	            | | +-UDP1------|                 |-UDP1------+ | |
1829	            | | | +-RTP1----|                 |-RTP1----+ | | |
1830	            | | | | +-Audio-|                 |-Audio-+ | | | |
1831	            | | | | |    AA1|---------------->|       | | | | |
1832	            | | | | |       |<----------------|BA1    | | | | |
1833	            | | | | +-------|                 |-------+ | | | |
1834	            | | | +---------|                 |---------+ | | |
1835	            | | +-----------|                 |-----------+ | |
1836	            | |             |                 |             | |
1837	            | | +-UDP2------|                 |-UDP2------+ | |
1838	            | | | +-RTP2----|                 |-RTP2----+ | | |
1839	            | | | | +-Video-|                 |-Video-+ | | | |
1840	            | | | | |    AV1|---------------->|       | | | | |
1841	            | | | | |    AV2|---------------->|       | | | | |
1842	            | | | | |       |<----------------|BV1    | | | | |
1843	            | | | | +-------|                 |-------+ | | | |
1844	            | | | +---------|                 |---------+ | | |
1845	            | | +-----------|                 |-----------+ | |
1846	            | +-------------|                 |-------------+ |
1847	            +---------------+                 +---------------+

1849	              Figure 4: Point to Point: Multiple RTP sessions

1851	   As can be seen in Figure 4 (Point to Point: Multiple RTP sessions),
1852	   the single PeerConnection contains two RTP sessions over different
1853	   UDP flows, UDP1 and UDP2; that is, their 5-tuples differ, normally
1854	   in the source and destination ports.  The first RTP session (RTP1)
1855	   carries audio, with one stream in each direction (AA1 and BA1).  The
1856	   second RTP session (RTP2) carries two video streams from A to B (AV1
1857	   and AV2) and one from B to A (BV1).

1859	            +-A-------------+                 +-B-------------+
1860	            | +-PeerC1------|                 |-PeerC1------+ |
1861	            | | +-UDP1------|                 |-UDP1------+ | |
1862	            | | | +-RTP1----|                 |-RTP1----+ | | |
1863	            | | | | +-Audio-|                 |-Audio-+ | | | |
1864	            | | | | |    AA1|---------------->|       | | | | |
1865	            | | | | |       |<----------------|BA1    | | | | |
1866	            | | | | +-------|                 |-------+ | | | |
1867	            | | | |         |                 |         | | | |
1868	            | | | | +-Video-|                 |-Video-+ | | | |
1869	            | | | | |    AV1|---------------->|       | | | | |
1870	            | | | | |    AV2|---------------->|       | | | | |
1871	            | | | | |       |<----------------|BV1    | | | | |
1872	            | | | | +-------|                 |-------+ | | | |
1873	            | | | +---------|                 |---------+ | | |
1874	            | | +-----------|                 |-----------+ | |
1875	            | +-------------|                 |-------------+ |
1876	            +---------------+                 +---------------+

1878	               Figure 5: Point to Point: Single RTP session.

1880	   In Figure 5 there is only a single UDP flow and RTP session (RTP1).
1881	   This RTP session carries a total of five (5) RTP media streams
1882	   (SSRCs): audio (AA1) and two video streams (AV1 and AV2) from A to
1883	   B, and audio (BA1) and video (BV1) from B to A.
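
   The relation between UDP flows, RTP sessions, and SSRCs in Figures 4
   and 5 can be written down as a small data model.  The Python sketch
   below is only an illustration; the class and function names are
   hypothetical, and the SSRC labels are the ones used in the figures
   rather than real 32-bit identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class RtpSession:
    name: str
    ssrcs: list = field(default_factory=list)  # labels as used in the figures


@dataclass
class UdpFlow:
    name: str
    sessions: list = field(default_factory=list)


def count_streams(flows):
    """Return the total number of RTP media streams (SSRCs) over all flows."""
    return sum(len(s.ssrcs) for f in flows for s in f.sessions)


# Figure 4: two UDP flows, one RTP session each.
figure4 = [
    UdpFlow("UDP1", [RtpSession("RTP1", ["AA1", "BA1"])]),
    UdpFlow("UDP2", [RtpSession("RTP2", ["AV1", "AV2", "BV1"])]),
]
# Figure 5: one UDP flow carrying a single RTP session.
figure5 = [
    UdpFlow("UDP1", [RtpSession("RTP1", ["AA1", "BA1", "AV1", "AV2", "BV1"])]),
]

print(count_streams(figure4), count_streams(figure5))  # 5 5
```

   Both arrangements carry the same five media streams; only the
   grouping into RTP sessions and 5-tuples differs.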

1885	A.2.  Multi-Unicast (Mesh)

1887	   For small multiparty calls, it is practical to set up a multi-unicast
1888	   topology (Figure 6).  In this topology, each participant sends
1889	   individual unicast RTP/UDP/IP flows to each of the other participants
1890	   using independent PeerConnections in a full mesh.

1892	                              +---+      +---+
1893	                              | A |<---->| B |
1894	                              +---+      +---+
1895	                                ^         ^
1896	                                 \       /
1897	                                  \     /
1898	                                   v   v
1899	                                   +---+
1900	                                   | C |
1901	                                   +---+

1903	                          Figure 6: Multi-unicast

1905	   This topology has the benefit of not requiring central nodes.  The
1906	   downside is that it increases the bandwidth used at each sender,
1907	   which has to send one copy of its RTP media streams to each of the
1908	   other participants in the same session.  Hence, this topology is
1909	   limited to scenarios with few participants unless the media is very
1910	   low bandwidth.  The multi-unicast topology could be implemented as a
1911	   single RTP session, spanning multiple peer-to-peer transport layer
1912	   connections, or as several pairwise RTP sessions, one between each
1913	   pair of peers.  To maintain a coherent mapping between RTP sessions
1914	   and PeerConnections, we recommend implementing this as individual
1915	   RTP sessions.  The only downside is that end-point A will not learn,
1916	   based on RTCP, of the quality of any transmission happening between
1917	   B and C.  This has not been seen as a significant downside, as no one
1918	   has yet shown a need for A to know about the communication between B
1919	   and C.  An advantage of using separate RTP sessions is that it
1920	   enables using different media bit-rates to the different
1921	   peers, so that B is not forced to endure the same quality
1922	   reduction that C experiences when there are limitations in the
1923	   transport from A to C.
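
   The bandwidth cost of the mesh can be made concrete with a little
   arithmetic.  The Python sketch below assumes that every participant
   sends one stream of the same bit-rate to every other participant;
   the 500 kbit/s figure and the function names are illustrative
   assumptions only.

```python
def mesh_uplink_kbps(participants: int, stream_kbps: float) -> float:
    """Uplink rate for one sender that sends a copy to every other peer."""
    if participants < 2:
        return 0.0
    return (participants - 1) * stream_kbps


def mesh_peer_connections(participants: int) -> int:
    """Number of pairwise PeerConnections in a full mesh."""
    return participants * (participants - 1) // 2


for n in (3, 5, 8):
    print(n, mesh_uplink_kbps(n, 500.0), mesh_peer_connections(n))
# 3 1000.0 3
# 5 2000.0 10
# 8 3500.0 28
```

   The per-sender uplink grows linearly with the number of participants,
   while the total number of PeerConnections grows quadratically.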

1925	        +-A------------------------+              +-B-------------+
1926	        |+---+       +-PeerC1------|              |-PeerC1------+ |
1927	        ||MIC|       | +-UDP1------|              |-UDP1------+ | |
1928	        |+---+       | | +-RTP1----|              |-RTP1----+ | | |
1929	        | |  +----+  | | | +-Audio-|              |-Audio-+ | | | |
1930	        | +->|ENC1|--+-+-+-+--->AA1|------------->|       | | | | |
1931	        | |  +----+  | | | |       |<-------------|BA1    | | | | |
1932	        | |          | | | +-------|              |-------+ | | | |
1933	        | |          | | +---------|              |---------+ | | |
1934	        | |          | +-----------|              |-----------+ | |
1935	        | |          +-------------|              |-------------+ |
1936	        | |                        |              |---------------+
1937	        | |                        |
1938	        | |                        |              +-C-------------+
1939	        | |          +-PeerC2------|              |-PeerC2------+ |
1940	        | |          | +-UDP2------|              |-UDP2------+ | |
1941	        | |          | | +-RTP2----|              |-RTP2----+ | | |
1942	        | |  +----+  | | | +-Audio-|              |-Audio-+ | | | |
1943	        | +->|ENC2|--+-+-+-+--->AA2|------------->|       | | | | |
1944	        |    +----+  | | | |       |<-------------|CA1    | | | | |
1945	        |            | | | +-------|              |-------+ | | | |
1946	        |            | | +---------|              |---------+ | | |
1947	        |            | +-----------|              |-----------+ | |
1948	        |            +-------------|              |-------------+ |
1949	        +--------------------------+              +---------------+

1951	            Figure 7: Session structure for Multi-Unicast Setup

1953	   Let us review how the RTP sessions look from A's perspective by
1954	   considering both how the media is handled and which PeerConnections
1955	   and RTP sessions are set up in Figure 7.  A's microphone is captured,
1956	   and the digital audio can then be fed into two different encoder
1957	   instances, each associated with a different PeerConnection (PeerC1
1958	   and PeerC2) containing its own independent RTP session (RTP1 and
1959	   RTP2).  The SSRCs in each RTP session will be completely independent,
1960	   and the media bit-rate produced by each encoder can be tuned to
1961	   address any congestion control requirements between A and B
1962	   differently than for the path from A to C.

1964	   For media encodings which are more resource consuming, like video,
1965	   one could expect that it will be common for end-points that are
1966	   resource constrained to use a different implementation strategy
1967	   where the encoder is shared between the different PeerConnections,
1968	   as shown in Figure 8.
1969	        +-A----------------------+                 +-B-------------+
1970	        |+---+                   |                 |               |
1971	        ||CAM|     +-PeerC1------|                 |-PeerC1------+ |
1972	        |+---+     | +-UDP1------|                 |-UDP1------+ | |
1973	        |  |       | | +-RTP1----|                 |-RTP1----+ | | |
1974	        |  V       | | | +-Video-|                 |-Video-+ | | | |
1975	        |+----+    | | | |       |<----------------|BV1    | | | | |
1976	        ||ENC |----+-+-+-+--->AV1|---------------->|       | | | | |
1977	        |+----+    | | | +-------|                 |-------+ | | | |
1978	        |  |       | | +---------|                 |---------+ | | |
1979	        |  |       | +-----------|                 |-----------+ | |
1980	        |  |       +-------------|                 |-------------+ |
1981	        |  |                     |                 |---------------+
1982	        |  |                     |
1983	        |  |                     |                 +-C-------------+
1984	        |  |       +-PeerC2------|                 |-PeerC2------+ |
1985	        |  |       | +-UDP2------|                 |-UDP2------+ | |
1986	        |  |       | | +-RTP2----|                 |-RTP2----+ | | |
1987	        |  |       | | | +-Video-|                 |-Video-+ | | | |
1988	        |  +-------+-+-+-+--->AV2|---------------->|       | | | | |
1989	        |          | | | |       |<----------------|CV1    | | | | |
1990	        |          | | | +-------|                 |-------+ | | | |
1991	        |          | | +---------|                 |---------+ | | |
1992	        |          | +-----------|                 |-----------+ | |
1993	        |          +-------------|                 |-------------+ |
1994	        +------------------------+                 +---------------+

1996	               Figure 8: Single Encoder Multi-Unicast Setup

1998	   This will clearly save resources consumed by encoding, but it does
1999	   introduce the need for end-point A to make decisions on how it
2000	   encodes the media so that it suits delivery to both B and C.  This
2001	   is not limited to congestion control; the resolution each receiver
2002	   prefers, based on the display area available, is another aspect
2003	   requiring consideration.  The need for this type of decision logic
2004	   arises in several different topologies and implementations.

2006	A.3.  Mixer Based

2008	   A mixer (Figure 9) is a centralised point that selects or mixes
2009	   content in a conference to optimise the RTP session so that each
2010	   end-point only needs to connect to one entity, the mixer.  The mixer
2011	   can also reduce the bit-rate needed from the mixer down to a
2012	   conference participant, as the media sent from the mixer to the
2013	   end-point can be optimised in different ways.  These optimisations
2014	   include methods like choosing media only from the currently most
2015	   active speaker, or mixing audio together so that only one audio
2016	   stream is needed instead of 3 in the depicted scenario (Figure 9).

2018	                    +---+      +------------+      +---+
2019	                    | A |<---->|            |<---->| B |
2020	                    +---+      |            |      +---+
2021	                               |   Mixer    |
2022	                    +---+      |            |      +---+
2023	                    | C |<---->|            |<---->| D |
2024	                    +---+      +------------+      +---+

2026	                Figure 9: RTP Mixer with Only Unicast Paths

2028	   Mixers have two downsides.  The first is that the mixer has to be a
2029	   trusted node, as it either performs media operations or at least
2030	   re-packetizes the media.  When SRTP is used, both types of operation
2031	   require the mixer to verify integrity, decrypt the content, perform
2032	   its operation, form new RTP packets, and then encrypt and integrity
2033	   protect them.  This applies to all types of mixers described below.

2035	   The second downside is that all these operations and optimizations of
2036	   the session require processing.  How much depends on the
2037	   implementation, as will become evident below.

2039	   The implementation of a mixer can take several different forms, and
2040	   we will discuss the main variants available that do not break RTP.

2042	   Please note that a Mixer could also contain translator
2043	   functionalities, like a media transcoder to adjust the media bit-rate
2044	   or codec used on a particular RTP media stream.

2046	A.3.1.  Media Mixing

2048	   This type of mixer, which can clearly be called an RTP mixer, is
2049	   likely the one that most people think of when they hear the term
2050	   mixer.  Its basic pattern of operation is that it receives the
2051	   different participants' RTP media streams, selects which of them are
2052	   to be included in a media domain mix of the incoming RTP media
2053	   streams, and then creates a single outgoing stream from this mix.

2055	   Audio mixing is straightforward and commonly possible to do for a
2056	   number of participants.  Let us assume that you want to mix N
2057	   streams from different participants.  The mixer then needs to decode
2058	   N times and to produce N or N+1 mixes.  Different mixes are needed
2059	   so that each contributing source gets a mix that does not contain
2060	   its own audio, as this would result in an echo.  When N is lower
2061	   than the total number of participants, one can also produce a mix of
2062	   all N streams for the participants that are currently not included
2063	   in the mix, thus N+1 mixes.  These audio streams are then encoded
2064	   again, RTP packetized, and sent out.
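
   The N versus N+1 count can be checked with a short Python sketch.  It
   only tracks which sources go into which mix and does no audio
   processing; the function name build_mixes and the participant labels
   are hypothetical.

```python
def build_mixes(active, participants):
    """Return, per receiver, the set of sources in that receiver's mix.

    Each active source gets a mix without its own audio (N mixes).  Every
    other participant shares one mix of all N active sources (the +1).
    """
    active = set(active)
    mixes = {}
    for receiver in participants:
        mixes[receiver] = frozenset(active - {receiver})
    return mixes


participants = ["A", "B", "C", "D"]
mixes = build_mixes(["A", "B", "C"], participants)
distinct = set(mixes.values())
print(len(distinct))             # 4, i.e. N+1 with N = 3
print(sorted(mixes["A"]))        # ['B', 'C']
print(sorted(mixes["D"]))        # ['A', 'B', 'C']
```

   If all four participants were active, no extra mix would be needed
   and the mixer would produce exactly N = 4 mixes.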

2066	   Video can't really be "mixed" into something particularly useful for
2067	   the users; however, a composition can be created from the contributed
2068	   video streams.  This can be done in a number of ways: tiling the
2069	   different streams to create a chessboard is one; selecting someone as
2070	   more important and showing them large, with a number of other sources
2071	   shown smaller, is another.  Here too, one commonly needs to produce a
2072	   number of different compositions so that a contributing party does
2073	   not need to see themselves.  The mixer then re-encodes the created
2074	   video stream, RTP packetizes it, and sends it out.

2076	   Media mixing has two problems.  The first is that it consumes a large
2077	   amount of media processing and encoding resources.  The second is the
2078	   quality degradation created by decoding and re-encoding the RTP media
2079	   stream.  Its advantage is that it is quite simple for the clients to
2080	   handle, as they don't need to handle local mixing and composition.

2082	      +-A-------------+             +-MIXER--------------------------+
2083	      | +-PeerC1------|             |-PeerC1--------+                |
2084	      | | +-UDP1------|             |-UDP1--------+ |                |
2085	      | | | +-RTP1----|             |-RTP1------+ | |        +-----+ |
2086	      | | | | +-Audio-|             |-Audio---+ | | | +---+  |     | |
2087	      | | | | |    AA1|------------>|---------+-+-+-+-|DEC|->|     | |
2088	      | | | | |       |<------------|MA1 <----+ | | | +---+  |     | |
2089	      | | | | |       |             |(BA1+CA1)|\| | | +---+  |     | |
2090	      | | | | +-------|             |---------+ +-+-+-|ENC|<-| B+C | |
2091	      | | | +---------|             |-----------+ | | +---+  |     | |
2092	      | | +-----------|             |-------------+ |        |  M  | |
2093	      | +-------------|             |---------------+        |  E  | |
2094	      +---------------+             |                        |  D  | |
2095	                                    |                        |  I  | |
2096	      +-B-------------+             |                        |  A  | |
2097	      | +-PeerC2------|             |-PeerC2--------+        |     | |
2098	      | | +-UDP2------|             |-UDP2--------+ |        |  M  | |
2099	      | | | +-RTP2----|             |-RTP2------+ | |        |  I  | |
2100	      | | | | +-Audio-|             |-Audio---+ | | | +---+  |  X  | |
2101	      | | | | |    BA1|------------>|---------+-+-+-+-|DEC|->|  E  | |
2102	      | | | | |       |<------------|MA2 <----+ | | | +---+  |  R  | |
2103	      | | | | +-------|             |(AA1+CA1)|\| | | +---+  |     | |
2104	      | | | +---------|             |---------+ +-+-+-|ENC|<-| A+C | |
2105	      | | +-----------|             |-----------+ | | +---+  |     | |
2106	      | +-------------|             |-------------+ |        |     | |
2107	      +---------------+             |---------------+        |     | |
2108	                                    |                        |     | |
2109	      +-C-------------+             |                        |     | |
2110	      | +-PeerC3------|             |-PeerC3--------+        |     | |
2111	      | | +-UDP3------|             |-UDP3--------+ |        |     | |
2112	      | | | +-RTP3----|             |-RTP3------+ | |        |     | |
2113	      | | | | +-Audio-|             |-Audio---+ | | | +---+  |     | |
2114	      | | | | |    CA1|------------>|---------+-+-+-+-|DEC|->|     | |
2115	      | | | | |       |<------------|MA3 <----+ | | | +---+  |     | |
2116	      | | | | +-------|             |(AA1+BA1)|\| | | +---+  |     | |
2117	      | | | +---------|             |---------+ +-+-+-|ENC|<-| A+B | |
2118	      | | +-----------|             |-----------+ | | +---+  |     | |
2119	      | +-------------|             |-------------+ |        +-----+ |
2120	      +---------------+             |---------------+                |
2121	                                    +--------------------------------+

2123	            Figure 10: Session and SSRC details for Media Mixer

2125	   From an RTP perspective, media mixing can be very straightforward, as
2126	   can be seen in Figure 10.  The mixer presents one SSRC towards each
2127	   peer client, e.g., MA1 to Peer A, which is the media mix of the other
2128	   participants.  As each peer receives a different version produced by
2129	   the mixer, there is no actual relation between the different RTP
2130	   sessions in the media itself or in the transport level information.
2131	   There is, however, one connection between RTP1-RTP3 in this figure.
2132	   It has to do with the SSRC space and the identity information.  When
2133	   A receives the MA1 stream, which is a combination of the BA1 and CA1
2134	   streams in the other PeerConnections, RTP enables the mixer to
2135	   include CSRC information in the MA1 stream to identify the
2136	   contributing sources BA1 and CA1.

2138	   The CSRC in turn has utility in RTP extensions, like the RTP header
2139	   extension for Mixer to Client audio levels [RFC6465] discussed in
2140	   Section 5.2.3.  If the SSRCs from one PeerConnection are used as
2141	   CSRCs in another PeerConnection, then RTP1, RTP2 and RTP3 become one
2142	   joint session, as they have a common SSRC space.  At this stage one
2143	   also needs to consider which RTCP information needs to be exposed in
2144	   the different legs.  For the above situation, commonly nothing more
2145	   than the Source Description (SDES) information and RTCP BYE for CSRCs
2146	   needs to be exposed.  The main goal would be to enable the correct
2147	   binding against the application logic and other information sources.
2148	   This also enables loop detection in the RTP session.
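
   To show where the CSRC information sits on the wire, the Python
   sketch below packs the fixed RTP header from [RFC3550] followed by a
   CSRC list, as a mixer could do for the MA1 stream.  The helper names
   and the numeric SSRC values are hypothetical; padding, header
   extensions, and the marker bit are left out for brevity.

```python
import struct


def build_rtp_header(payload_type, seq, timestamp, ssrc, csrcs=()):
    """Pack a fixed RTP header followed by the CSRC list (RFC 3550)."""
    if len(csrcs) > 15:
        raise ValueError("the CC field is 4 bits, so at most 15 CSRCs fit")
    first = (2 << 6) | len(csrcs)        # V=2, P=0, X=0, CC=len(csrcs)
    second = payload_type & 0x7F         # M=0, 7-bit payload type
    header = struct.pack("!BBHII", first, second, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc)
    return header + b"".join(struct.pack("!I", c) for c in csrcs)


def parse_csrcs(packet):
    """Return the CSRC identifiers carried in an RTP packet header."""
    cc = packet[0] & 0x0F
    return list(struct.unpack("!%dI" % cc, packet[12:12 + 4 * cc]))


# Hypothetical SSRC values standing in for MA1, BA1 and CA1.
MA1, BA1, CA1 = 0x4D410001, 0x42410001, 0x43410001
header = build_rtp_header(96, 1, 160, MA1, csrcs=[BA1, CA1])
print(len(header), [hex(c) for c in parse_csrcs(header)])
# 20 ['0x42410001', '0x43410001']
```

   The receiver can match the parsed CSRC values against the SDES
   information exposed for those sources.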

2150	A.3.1.1.  RTP Session Termination

2152	   One possible implementation choice is to keep the RTP sessions of the
2153	   different legs in the multi-party communication session separate, and
2154	   to only generate RTP media streams in each without carrying any
2155	   identity information about the contributing sources on the RTP/RTCP
2156	   level.  This removes the functionality that CSRC can provide, the
2157	   possibility to use any extensions that build on CSRC, and the loop
2158	   detection.  It might appear to be a simplification if an SSRC
2159	   collision occurs between two different end-points, as it does not
2160	   have to be resolved; the SSRCs can instead be remapped between the
2161	   independent sessions, if exposed at all.  However, such remapping
2162	   requires that SSRCs/CSRCs are never exposed to the WebRTC JavaScript
2163	   client to use as a reference.  This is because they only have local
2164	   significance; if they were used with multi-party session scope, the
2165	   result would be mis-referencing.  Also, SSRC collision handling will
2166	   still be needed, as it can occur between the mixer and the end-point.

2168	   Session termination might appear to resolve some issues.  However,
2169	   it creates other issues that need resolving, like loop detection,
2170	   identification of contributing sources and the need to handle mapped
2171	   identities and ensure that the right one is used towards the right
2172	   identities and never used directly between multiple end-points.

2174	A.3.2.  Media Switching

2176	   An RTP Mixer based on media switching avoids the media decoding and
2177	   encoding cycle in the mixer, but not the decryption and re-encryption
2178	   cycle as one rewrites RTP headers.  This both reduces the amount of
2179	   computational resources needed in the mixer and increases the media
2180	   quality per transmitted bit.  This is achieved by letting the mixer
2181	   have a number of SSRCs that represent conceptual or functional
2182	   streams the mixer produces.  These streams are created by selecting
2183	   media from one of the RTP media streams received by the mixer and
2184	   forwarding the media using the mixer's own SSRCs.  The mixer can then
2185	   switch between available sources if that is needed by the concept for
2186	   the source, like the currently active speaker.

2188	   To achieve a coherent RTP media stream from the mixer's SSRC, the
2189	   mixer is forced to rewrite the incoming RTP packet's header.  First,
2190	   the SSRC field has to be set to the value of the mixer's SSRC.
2191	   Secondly, the sequence number is set to the next in the sequence of
2192	   outgoing packets it sends.  Thirdly, the RTP timestamp value needs to
2193	   be adjusted using an offset that changes each time one switches media
2194	   source.  Finally, depending on the negotiation, the RTP payload type
2195	   value representing this particular RTP payload configuration might
2196	   have to be changed if the different PeerConnections have not arrived
2197	   at the same numbering for a given configuration.  This also requires
2198	   that the different end-points support a common set of codecs;
2199	   otherwise media transcoding for codec compatibility is still needed.
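
   The four rewriting steps above can be sketched as follows.  This is
   only an illustration under stated assumptions: the class and field
   names are hypothetical, the header is modelled as a small record
   rather than real packet bytes, and the timestamp step inserted at a
   switch (switch_gap) is an arbitrary choice, not something this memo
   specifies.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RtpHeader:
    ssrc: int
    seq: int
    timestamp: int
    payload_type: int


@dataclass
class SwitchedStream:
    """State for one outgoing mixer SSRC (such as MV1) on one PeerConnection."""
    out_ssrc: int
    next_seq: int = 0
    pt_map: Dict[int, int] = field(default_factory=dict)  # incoming PT -> outgoing PT
    switch_gap: int = 3000          # assumed timestamp step used at a switch
    source: Optional[int] = None    # SSRC currently switched in
    ts_offset: int = 0
    last_out_ts: int = 0

    def rewrite(self, pkt: RtpHeader) -> RtpHeader:
        if pkt.ssrc != self.source:
            # New source: pick an offset that continues the outgoing
            # timeline just after the last timestamp that was sent.
            target = (self.last_out_ts + self.switch_gap) % 2**32
            self.ts_offset = (target - pkt.timestamp) % 2**32
            self.source = pkt.ssrc
        out = RtpHeader(
            ssrc=self.out_ssrc,                                   # step 1
            seq=self.next_seq,                                    # step 2
            timestamp=(pkt.timestamp + self.ts_offset) % 2**32,   # step 3
            payload_type=self.pt_map.get(pkt.payload_type,
                                         pkt.payload_type),       # step 4
        )
        self.next_seq = (self.next_seq + 1) % 2**16
        self.last_out_ts = out.timestamp
        return out


mv1 = SwitchedStream(out_ssrc=0x1111, next_seq=100, pt_map={97: 96})
a = mv1.rewrite(RtpHeader(ssrc=0xB, seq=5000, timestamp=900000, payload_type=97))
b = mv1.rewrite(RtpHeader(ssrc=0xB, seq=5001, timestamp=903000, payload_type=97))
c = mv1.rewrite(RtpHeader(ssrc=0xE, seq=17, timestamp=42, payload_type=96))
```

   In this sketch the receiver sees one continuous stream from SSRC
   0x1111 even though the third packet comes from a different source.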

2201	   Let's consider the operation of a media switching mixer that supports
2202	   a video conference with six participants (A-F) where the two latest
2203	   speakers in the conference are shown to each participant.  Thus the
2204	   mixer has two SSRCs sending video to each peer.

2206	      +-A-------------+             +-MIXER--------------------------+
2207	      | +-PeerC1------|             |-PeerC1--------+                |
2208	      | | +-UDP1------|             |-UDP1--------+ |                |
2209	      | | | +-RTP1----|             |-RTP1------+ | |        +-----+ |
2210	      | | | | +-Video-|             |-Video---+ | | |        |     | |
2211	      | | | | |    AV1|------------>|---------+-+-+-+------->|     | |
2212	      | | | | |       |<------------|MV1 <----+-+-+-+-BV1----|     | |
2213	      | | | | |       |<------------|MV2 <----+-+-+-+-EV1----|     | |
2214	      | | | | +-------|             |---------+ | | |        |     | |
2215	      | | | +---------|             |-----------+ | |        |     | |
2216	      | | +-----------|             |-------------+ |        |  S  | |
2217	      | +-------------|             |---------------+        |  W  | |
2218	      +---------------+             |                        |  I  | |
2219	                                    |                        |  T  | |
2220	      +-B-------------+             |                        |  C  | |
2221	      | +-PeerC2------|             |-PeerC2--------+        |  H  | |
2222	      | | +-UDP2------|             |-UDP2--------+ |        |     | |
2223	      | | | +-RTP2----|             |-RTP2------+ | |        |  M  | |
2224	      | | | | +-Video-|             |-Video---+ | | |        |  A  | |
2225	      | | | | |    BV1|------------>|---------+-+-+-+------->|  T  | |
2226	      | | | | |       |<------------|MV3 <----+-+-+-+-AV1----|  R  | |
2227	      | | | | |       |<------------|MV4 <----+-+-+-+-EV1----|  I  | |
2228	      | | | | +-------|             |---------+ | | |        |  X  | |
2229	      | | | +---------|             |-----------+ | |        |     | |
2230	      | | +-----------|             |-------------+ |        |     | |
2231	      | +-------------|             |---------------+        |     | |
2232	      +---------------+             |                        |     | |
2233	                                    :                        :     : :
2234	                                    :                        :     : :
2235	      +-F-------------+             |                        |     | |
2236	      | +-PeerC6------|             |-PeerC6--------+        |     | |
2237	      | | +-UDP6------|             |-UDP6--------+ |        |     | |
2238	      | | | +-RTP6----|             |-RTP6------+ | |        |     | |
2239	      | | | | +-Video-|             |-Video---+ | | |        |     | |
2240	      | | | | |    FV1|------------>|---------+-+-+-+------->|     | |
2241	      | | | | |       |<------------|MV11 <---+-+-+-+-AV1----|     | |
2242	      | | | | |       |<------------|MV12 <---+-+-+-+-EV1----|     | |
2243	      | | | | +-------|             |---------+ | | |        |     | |
2244	      | | | +---------|             |-----------+ | |        |     | |
2245	      | | +-----------|             |-------------+ |        +-----+ |
2246	      | +-------------|             |---------------+                |
2247	      +---------------+             +--------------------------------+

2249	                   Figure 11: Media Switching RTP Mixer

2251	   The Media Switching RTP mixer can, similar to the Media Mixing one,
2252	   reduce the bit-rate needed towards the different peers by selecting
2253	   and switching in a sub-set of RTP media streams out of the ones it
2254	   receives from the conference participants.

2256	   To ensure that a media receiver can correctly decode the RTP media
2257	   stream after a switch, codecs that save state need to start from
2258	   their default state at the point of switching.  Thus one common
2259	   tool for video is to request that the encoder creates an intra
2260	   picture, something that isn't dependent on earlier state.  This can
2261	   be done using the Full Intra Request (FIR) RTCP codec control
2262	   message, as discussed in Section 5.1.1.
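
   As a rough sketch of what such a request looks like on the wire, the
   code below builds a single-entry Full Intra Request.  It assumes the
   packet layout defined in RFC 5104 (payload-specific feedback, PT=206,
   FMT=4, one 8-octet FCI entry); the function name and the SSRC values
   are hypothetical.

```python
import struct

PT_PSFB = 206   # RTCP payload-specific feedback
FMT_FIR = 4     # Full Intra Request


def build_fir(sender_ssrc, target_ssrc, seq_nr):
    """Build an RTCP FIR asking target_ssrc for a decoder refresh point."""
    first = (2 << 6) | FMT_FIR      # V=2, P=0, FMT
    length = 4                      # 20 octets = 5 words, minus one
    return struct.pack(
        "!BBHIIIB3x",
        first, PT_PSFB, length,
        sender_ssrc,                # SSRC of packet sender (the mixer)
        0,                          # "SSRC of media source" is unused in FIR
        target_ssrc,                # FCI: SSRC being asked for an intra picture
        seq_nr & 0xFF,              # FCI: command sequence number
    )


fir = build_fir(sender_ssrc=0x1111, target_ssrc=0xE0E0, seq_nr=1)
```

   A switching mixer would typically send such a request to the source
   it is about to switch in, and increment seq_nr for each new request.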

2264	   Also in this type of mixer one could consider terminating the RTP
2265	   sessions fully between the different PeerConnections.  The same
2266	   arguments and considerations as discussed in Appendix A.3.1.1 apply
2267	   here.

2269	A.3.3.  Media Projecting

2271	   Another method for handling media in the RTP mixer is to project all
2272	   potential sources (SSRCs) into a per end-point independent RTP
2273	   session.  The mixer can then select which of the potential sources
2274	   are currently actively transmitting media, even though the mixer
2275	   receives media from that end-point in another RTP session.  This is
2276	   similar to the media switching mixer but has some important
2277	   differences in RTP details.

2279	      +-A-------------+             +-MIXER--------------------------+
2280	      | +-PeerC1------|             |-PeerC1--------+                |
2281	      | | +-UDP1------|             |-UDP1--------+ |                |
2282	      | | | +-RTP1----|             |-RTP1------+ | |        +-----+ |
2283	      | | | | +-Video-|             |-Video---+ | | |        |     | |
2284	      | | | | |    AV1|------------>|---------+-+-+-+------->|     | |
2285	      | | | | |       |<------------|BV1 <----+-+-+-+--------|     | |
2286	      | | | | |       |<------------|CV1 <----+-+-+-+--------|     | |
2287	      | | | | |       |<------------|DV1 <----+-+-+-+--------|     | |
2288	      | | | | |       |<------------|EV1 <----+-+-+-+--------|     | |
2289	      | | | | |       |<------------|FV1 <----+-+-+-+--------|     | |
2290	      | | | | +-------|             |---------+ | | |        |     | |
2291	      | | | +---------|             |-----------+ | |        |     | |
2292	      | | +-----------|             |-------------+ |        |  S  | |
2293	      | +-------------|             |---------------+        |  W  | |
2294	      +---------------+             |                        |  I  | |
2295	                                    |                        |  T  | |
2296	      +-B-------------+             |                        |  C  | |
2297	      | +-PeerC2------|             |-PeerC2--------+        |  H  | |
2298	      | | +-UDP2------|             |-UDP2--------+ |        |     | |
2299	      | | | +-RTP2----|             |-RTP2------+ | |        |  M  | |
2300	      | | | | +-Video-|             |-Video---+ | | |        |  A  | |
2301	      | | | | |    BV1|------------>|---------+-+-+-+------->|  T  | |
2302	      | | | | |       |<------------|AV1 <----+-+-+-+--------|  R  | |
2303	      | | | | |       |<------------|CV1 <----+-+-+-+--------|  I  | |
2304	      | | | | |       | :    :    : |: :  : : : : : :  :  : :|  X  | |
2305	      | | | | |       |<------------|FV1 <----+-+-+-+--------|     | |
2306	      | | | | +-------|             |---------+ | | |        |     | |
2307	      | | | +---------|             |-----------+ | |        |     | |
2308	      | | +-----------|             |-------------+ |        |     | |
2309	      | +-------------|             |---------------+        |     | |
2310	      +---------------+             |                        |     | |
2311	                                    :                        :     : :
2312	                                    :                        :     : :
2313	      +-F-------------+             |                        |     | |
2314	      | +-PeerC6------|             |-PeerC6--------+        |     | |
2315	      | | +-UDP6------|             |-UDP6--------+ |        |     | |
2316	      | | | +-RTP6----|             |-RTP6------+ | |        |     | |
2317	      | | | | +-Video-|             |-Video---+ | | |        |     | |
2318	      | | | | |    FV1|------------>|---------+-+-+-+------->|     | |
2319	      | | | | |       |<------------|AV1 <----+-+-+-+--------|     | |
2320	      | | | | |       | :    :    : |: :  : : : : : :  :  : :|     | |
2321	      | | | | |       |<------------|EV1 <----+-+-+-+--------|     | |
2322	      | | | | +-------|             |---------+ | | |        |     | |
2323	      | | | +---------|             |-----------+ | |        |     | |
2324	      | | +-----------|             |-------------+ |        +-----+ |
2325	      | +-------------|             |---------------+                |
2326	      +---------------+             +--------------------------------+
2327	                     Figure 12: Media Projecting Mixer

2329	   In the six-participant conference depicted above (Figure 12), one
2330	   can see that end-point A will be aware of five incoming SSRCs,
2331	   BV1-FV1.  If this mixer intends to have the same behavior as in
2332	   Appendix A.3.2, where the mixer provides the end-points with the two
2333	   latest speaking end-points, then only two out of these five SSRCs
2334	   will concurrently transmit media to A.  As the mixer selects which
2335	   sources in the different RTP sessions transmit media to the
2336	   end-points, each RTP media stream will require some rewriting when
2337	   being projected from one session into another.  The main thing is
2338	   that the sequence number will need to be consecutively incremented
2339	   based on the packets actually being transmitted in each RTP session.
2340	   Thus the RTP sequence number offset will change each time a source
2341	   is turned on in an RTP session.
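
   The changing sequence number offset can be sketched as below.  This
   is a simplified illustration: the class name and its methods are
   hypothetical, one instance is assumed per projected source and RTP
   session, and packet reordering on the source to mixer leg is ignored.

```python
from dataclasses import dataclass

SEQ_MOD = 1 << 16


@dataclass
class ProjectedSource:
    """Sequence number state for one source projected into one session."""
    last_out_seq: int = SEQ_MOD - 1   # so the first packet is sent as 0
    offset: int = 0
    active: bool = False

    def turn_off(self) -> None:
        self.active = False

    def forward(self, in_seq: int) -> int:
        if not self.active:
            # Source turned on: recompute the offset so that this packet
            # directly follows the last packet sent in this session.
            self.offset = (self.last_out_seq + 1 - in_seq) % SEQ_MOD
            self.active = True
        out_seq = (in_seq + self.offset) % SEQ_MOD
        self.last_out_seq = out_seq
        return out_seq


bv1_to_a = ProjectedSource()
first = [bv1_to_a.forward(s) for s in (500, 501, 503)]   # 502 lost upstream
bv1_to_a.turn_off()               # packets 504-599 are not projected to A
after_switch = bv1_to_a.forward(600)
```

   Keeping a fixed offset while the source is on preserves gaps caused
   by loss before the mixer, while the recomputed offset hides the
   packets the mixer chose not to forward.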

2343	   As the RTP sessions are independent, the SSRC numbers used can also
2344	   be handled independently, thus working around any SSRC collisions by
2345	   having remapping tables between the RTP sessions.  However, the
2346	   related WebRTC MediaStream signalling needs to be correspondingly
2347	   changed to ensure consistent WebRTC MediaStream to SSRC mappings
2348	   between the different PeerConnections, and the same comment that
2349	   higher functions MUST NOT use SSRC as references to RTP media streams
2350	   also applies here.
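
   A remapping table of this kind could look like the following sketch.
   The class name, its interface and the policy of keeping the original
   SSRC whenever it is free are assumptions made for illustration only.

```python
import random


class SsrcRemapper:
    """Per-session table from a source's original SSRC to a local one."""

    def __init__(self, ssrcs_in_use):
        self.in_use = set(ssrcs_in_use)   # SSRCs already taken in this session
        self.table = {}

    def map(self, original):
        if original not in self.table:
            candidate = original
            # Keep the original value when possible; otherwise draw a new
            # random 32-bit identifier until it is unique in this session.
            while candidate in self.in_use:
                candidate = random.getrandbits(32)
            self.in_use.add(candidate)
            self.table[original] = candidate
        return self.table[original]


# End-point A already uses 0xAAAA in its own session towards the mixer.
session_a = SsrcRemapper({0xAAAA})
kept = session_a.map(0xBBBB)       # no collision, value is kept
remapped = session_a.map(0xAAAA)   # collides with A's own SSRC
```

   Any signalling that names these sources towards A would have to use
   the values from the table rather than the original SSRCs.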

2352	   The mixer will also be responsible for acting on any RTCP codec
2353	   control requests coming from an end-point and deciding if it can act
2354	   on them locally or needs to translate the request into the RTP
2355	   session that contains the media source.  Both end-points and the
2356	   mixer will need to implement conference-related codec control
2357	   functionality to provide a good experience: Full Intra Request to
2358	   ask the media source to provide switching points between the
2359	   sources, and Temporary Maximum Media Bit-rate Request (TMMBR) to
2360	   enable the mixer to aggregate congestion control responses towards
2361	   the media source and have it adjust its bit-rate in case the
2362	   limitation is not in the source to mixer link.

2364	   This version of the mixer also puts different requirements on the
2365	   end-point when it comes to decoder instances and handling of the RTP
2366	   media streams providing media.  As each projected SSRC can at any
2367	   time provide media, the end-point either needs to handle having that
2368	   many allocated decoder instances or have efficient switching of
2369	   decoder contexts in a more limited set of actual decoder instances to
2370	   cope with the switches.  The WebRTC application also gets more
2371	   responsibility to update how the media provided is to be presented to
2372	   the user.

2374	A.4.  Translator Based

2376	   There is also a variety of translators.  The core commonality is that
2377	   they do not need to make themselves visible at the RTP level by
2378	   having an SSRC themselves.  Instead they sit between one or more
2379	   end-points and perform translation at some level.  It can be media
2380	   transcoding, protocol translation, covering missing functionality
2381	   for a legacy end-point, or simply relaying packets between transport
2382	   domains or to realize multi-party.  We go into the details below.

2384	A.4.1.  Transcoder

2386	   A transcoder operates on the media level and is really used for two
2387	   purposes.  The first is to allow two end-points that don't have a
2388	   common set of media codecs to communicate by translating from one
2389	   codec to another.  The second is to change the bit-rate to a lower
2390	   one.  For WebRTC end-points communicating with each other, only the
2391	   first one is relevant.  In certain legacy deployments a media
2392	   transcoder will be necessary to ensure both codecs and bit-rate fall
2393	   within the envelope the legacy end-point supports.

2395	   As transcoding requires access to the media, the transcoder has to be
2396	   within the security context and have access to any media encryption
2397	   and integrity keys.  On the RTP plane a media transcoder will in
2398	   practice fork the RTP session into two different domains that are
2399	   highly decoupled when it comes to media parameters and reporting, but
2400	   not identities.  To maintain signalling bindings to SSRCs, a
2401	   transcoder likely needs to use the SSRC of one end-point to represent
2402	   the transcoded RTP media stream to the other end-point(s).  The
2403	   congestion control loop can be terminated in the transcoder as the
2404	   media bit-rate being sent by the transcoder can be adjusted
2405	   independently of the incoming bit-rate.  However, to optimize
2406	   performance and resource consumption, the translator needs to
2407	   consider what signals or bit-rate reductions it needs to send towards
2408	   the source end-point.  For example, receiving a 2.5 Mbps video stream
2409	   and then sending out a 250 kbps video stream after transcoding is a
2410	   waste of resources.  In most cases a 500 kbps video stream from the
2411	   source in the right resolution is likely to provide equal quality
2412	   after transcoding as the 2.5 Mbps source stream.  At the same time,
2413	   increasing the media bit-rate further than what is needed to
2414	   represent the incoming quality accurately is also wasted resources.

2416	       +-A-------------+             +-Translator------------------+
2417	       | +-PeerC1------|             |-PeerC1--------+             |
2418	       | | +-UDP1------|             |-UDP1--------+ |             |
2419	       | | | +-RTP1----|             |-RTP1------+ | |             |
2420	       | | | | +-Audio-|             |-Audio---+ | | | +---+       |
2421	       | | | | |    AA1|------------>|---------+-+-+-+-|DEC|----+  |
2422	       | | | | |       |<------------|BA1 <----+ | | | +---+    |  |
2423	       | | | | |       |             |         |\| | | +---+    |  |
2424	       | | | | +-------|             |---------+ +-+-+-|ENC|<-+ |  |
2425	       | | | +---------|             |-----------+ | | +---+  | |  |
2426	       | | +-----------|             |-------------+ |        | |  |
2427	       | +-------------|             |---------------+        | |  |
2428	       +---------------+             |                        | |  |
2429	                                     |                        | |  |
2430	       +-B-------------+             |                        | |  |
2431	       | +-PeerC2------|             |-PeerC2--------+        | |  |
2432	       | | +-UDP2------|             |-UDP2--------+ |        | |  |
2433	       | | | +-RTP1----|             |-RTP1------+ | |        | |  |
2434	       | | | | +-Audio-|             |-Audio---+ | | | +---+  | |  |
2435	       | | | | |    BA1|------------>|---------+-+-+-+-|DEC|--+ |  |
2436	       | | | | |       |<------------|AA1 <----+ | | | +---+    |  |
2437	       | | | | |       |             |         |\| | | +---+    |  |
2438	       | | | | +-------|             |---------+ +-+-+-|ENC|<---+  |
2439	       | | | +---------|             |-----------+ | | +---+       |
2440	       | | +-----------|             |-------------+ |             |
2441	       | +-------------|             |---------------+             |
2442	       +---------------+             +-----------------------------+

2444	                        Figure 13: Media Transcoder

2446	   Figure 13 exposes some important details.  First of all, you can see
2447	   that the SSRC identifiers used by the translator are those of the
2448	   corresponding end-points.  Secondly, there is a relation between the
2449	   RTP sessions in the two different PeerConnections, which is
2450	   represented by having both parts identified by the same label, and
2451	   they need to share certain contexts.  Also, certain types of RTCP
2452	   messages will need to be bridged between the two parts.  Certain RTCP
2453	   feedback messages likely need to be sourced by the translator in
2454	   response to actions by the translator and its media encoder.

2456	A.4.2.  Gateway / Protocol Translator

2458	   Gateways are used when some protocol feature that is needed is not
2459	   supported by an end-point that wants to participate in a session.
2460	   The RTP translator in Figure 14 takes on the role of ensuring that,
2461	   from the perspective of participant A, participant B appears as a
2462	   fully compliant WebRTC end-point (that is, it is the combination of
2463	   the Translator and participant B that looks like a WebRTC end-point).

2465	                               +------------+
2466	                               |            |
2467	                    +---+      | Translator |      +---+
2468	                    | A |<---->| to legacy  |<---->| B |
2469	                    +---+      | end-point  |      +---+
2470	                    WebRTC     |            |     Legacy
2471	                               +------------+

2473	       Figure 14: Gateway (RTP translator) towards legacy end-point

2475	   For WebRTC there are a number of requirements that could force the
2476	   need for a gateway if a WebRTC end-point is to communicate with a
2477	   legacy end-point, such as support of ICE and DTLS-SRTP for key
2478	   management.  On the RTP level, the main functions that might be
2479	   missing in a legacy implementation that otherwise supports RTP are
2480	   RTCP in general, an SRTP implementation, congestion control, and the
2481	   feedback messages needed to make it work.

2483	       +-A-------------+             +-Translator------------------+
2484	       | +-PeerC1------|             |-PeerC1------+               |
2485	       | | +-UDP1------|             |-UDP1------+ |               |
2486	       | | | +-RTP1----|             |-RTP1-----------------------+|
2487	       | | | | +-Audio-|             |-Audio---+                  ||
2488	       | | | | |    AA1|------------>|---------+----------------+ ||
2489	       | | | | |       |<------------|BA1 <----+--------------+ | ||
2490	       | | | | |       |<---RTCP---->|<--------+----------+   | | ||
2491	       | | | | +-------|             |---------+      +---+-+ | | ||
2492	       | | | +---------|             |---------------+| T   | | | ||
2493	       | | +-----------|             |-----------+ | || R   | | | ||
2494	       | +-------------|             |-------------+ || A   | | | ||
2495	       +---------------+             |               || N   | | | ||
2496	                                     |               || S   | | | ||
2497	       +-B-(Legacy)----+             |               || L   | | | ||
2498	       |               |             |               || A   | | | ||
2499	       |   +-UDP2------|             |-UDP2------+   || T   | | | ||
2500	       |   | +-RTP1----|             |-RTP1----------+| E   | | | ||
2501	       |   | | +-Audio-|             |-Audio---+      +---+-+ | | ||
2502	       |   | | |       |<---RTCP---->|<--------+----------+   | | ||
2503	       |   | | |    BA1|------------>|---------+--------------+ | ||
2504	       |   | | |       |<------------|AA1 <----+----------------+ ||
2505	       |   | | +-------|             |---------+                  ||
2506	       |   | +---------|             |----------------------------+|
2507	       |   +-----------|             |-----------+                 |
2508	       |               |             |                             |
2509	       +---------------+             +-----------------------------+

2511	                  Figure 15: RTP/RTCP Protocol Translator

2513	   The legacy gateway can be implemented in several ways, and what it
2514	   needs to change is highly dependent on what functions it needs to
2515	   proxy for the legacy end-point.  One possibility is depicted in
2516	   Figure 15, where the RTP media streams are compatible and forwarded
2517	   without changes.  However, their RTP header values are captured to
2518	   enable the RTCP translator to create RTCP reception information
2519	   related to the leg between the end-point and the translator.  This
2520	   can then be combined with the more basic RTCP reports that the legacy
2521	   end-point (B) provides to give compatible and expected RTCP reporting
2522	   to A, thus enabling at least full congestion control on the path
2523	   between A and the translator.  If B has limited possibilities for
2524	   congestion response for the media, then the translator might need the
2525	   capability to perform media transcoding to address cases where it
2526	   otherwise would need to terminate media transmission.
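
   One way the captured RTP header values could be turned into
   reception information is sketched below.  It follows the general
   idea of counting expected versus received packets from the sequence
   numbers seen on the leg between A and the translator.  The class
   name is hypothetical, and duplicate detection, jitter and the
   per-interval fraction lost are left out for brevity.

```python
class ReceptionStats:
    """Track loss for one SSRC from the sequence numbers of forwarded packets."""

    def __init__(self):
        self.base_seq = None   # first sequence number seen
        self.max_ext = 0       # highest sequence number seen, extended past 16 bits
        self.received = 0

    def on_packet(self, seq):
        if self.base_seq is None:
            self.base_seq = seq
            self.max_ext = seq
        else:
            # Distance from the highest value so far, modulo 2^16; small
            # forward steps advance it, large ones are treated as old packets.
            delta = (seq - self.max_ext) % 65536
            if delta < 32768:
                self.max_ext += delta
        self.received += 1

    def cumulative_lost(self):
        expected = self.max_ext - self.base_seq + 1
        return expected - self.received


stats = ReceptionStats()
for seq in (65534, 65535, 0, 2):   # wraps around; packet 1 never arrives
    stats.on_packet(seq)
lost = stats.cumulative_lost()
```

   The translator could place values like these in the RTCP report
   blocks it sends towards A on behalf of B.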

2528	   As the translator is generating RTP/RTCP traffic on behalf of B to A,
2529	   it will need to be able to correctly protect the packets that it
2530	   translates or generates.  Thus security context information is
2531	   needed in this type of translator if it operates on the RTP/RTCP
2532	   packet content or media.  In fact, one of the more likely scenarios
2533	   is that the translator (gateway) will need to have two different
2534	   security contexts, one towards A and one towards B, and for each
2535	   RTP/RTCP packet perform authenticity verification and decryption
2536	   followed by encryption and integrity protection to resolve the
2537	   mismatch in security systems.

2539	A.4.3.  Relay

2541	   There exists a class of translators that operate on the transport
2542	   level below RTP and thus do not affect RTP/RTCP packets directly.
2543	   They come in two distinct flavours: the first is used to bridge
2544	   between two different transport or address domains and functions
2545	   more as a gateway, and the second provides a group communication
2546	   feature as depicted below in Figure 16.

2548	                    +---+      +------------+      +---+
2549	                    | A |<---->|            |<---->| B |
2550	                    +---+      |            |      +---+
2551	                               | Translator |
2552	                    +---+      |            |      +---+
2553	                    | C |<---->|            |<---->| D |
2554	                    +---+      +------------+      +---+

2556	         Figure 16: RTP Translator (Relay) with Only Unicast Paths

2558	   The first kind is straightforward and is likely to exist in a WebRTC
2559	   context when a legacy end-point is compatible with the exception of
2560	   ICE, and thus needs a gateway that terminates ICE and then forwards
2561	   all the RTP/RTCP traffic and key management to the end-point, only
2562	   rewriting the IP/UDP header to forward the packet to the legacy node.

2564	   The second type is useful if one wants a less complex central node or
2565	   a central node that is outside of the security context and thus does
2566	   not have access to the media.  This relay takes on the role of
2567	   forwarding the media (RTP and RTCP) packets to the other end-points
2568	   but doesn't perform any RTP or media processing.  Such a device
2569	   simply forwards the media from each sender to all of the other
2570	   participants, and is sometimes called a transport-layer translator.
2571	   In Figure 16, participant A will only need to send media once to the
2572	   relay, which will redistribute it by sending a copy of the stream to
2573	   participants B, C, and D.  Participant A will still receive three
2574	   RTP streams with the media from B, C and D if they transmit
2575	   simultaneously.  From an RTP perspective, this results in an RTP
2576	   session that behaves equivalently to one transported over IP Any
2577	   Source Multicast (ASM).
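
   The forwarding behaviour of such a relay can be sketched in a few
   lines.  The class below is hypothetical and models only the fan-out
   decision; it treats packets as opaque bytes, which matches the point
   that the relay performs no RTP or media processing.

```python
class Relay:
    """Transport-level relay: copy every packet to all other participants."""

    def __init__(self):
        self.participants = set()

    def join(self, name):
        self.participants.add(name)

    def forward(self, sender, packet):
        # The packet is never parsed or modified, only duplicated.
        return [(dest, packet)
                for dest in sorted(self.participants) if dest != sender]


relay = Relay()
for name in ("A", "B", "C", "D"):
    relay.join(name)
copies = relay.forward("A", b"opaque (S)RTP packet from A")
```

   Here A sends its packet once and the relay produces three identical
   copies, one each for B, C and D.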

2579	   This results in one common RTP session between all participants,
2580	   even though independent PeerConnections will be created to the
2581	   translator, as depicted below in Figure 17.

2583	      +-A-------------+             +-RELAY--------------------------+
2584	      | +-PeerC1------|             |-PeerC1--------+                |
2585	      | | +-UDP1------|             |-UDP1--------+ |                |
2586	      | | | +-RTP1----|             |-RTP1-------------------------+ |
2587	      | | | | +-Video-|             |-Video---+                    | |
2588	      | | | | |    AV1|------------>|---------------------------+  | |
2589	      | | | | |       |<------------|BV1 <--------------------+ |  | |
2590	      | | | | |       |<------------|CV1 <------------------+ | |  | |
2591	      | | | | +-------|             |---------+             | | |  | |
2592	      | | | +---------|             |-------------------+   ^ ^ V  | |
2593	      | | +-----------|             |-------------+ |   |   | | |  | |
2594	      | +-------------|             |---------------+   |   | | |  | |
2595	      +---------------+             |                   |   | | |  | |
2596	                                    |                   |   | | |  | |
2597	      +-B-------------+             |                   |   | | |  | |
2598	      | +-PeerC2------|             |-PeerC2--------+   |   | | |  | |
2599	      | | +-UDP2------|             |-UDP2--------+ |   |   | | |  | |
2600	      | | | +-RTP2----|             |-RTP1--------------+   | | |  | |
2601	      | | | | +-Video-|             |-Video---+             | | |  | |
2602	      | | | | |    BV1|------------>|-----------------------+ | |  | |
2603	      | | | | |       |<------------|AV1 <----------------------+  | |
2604	      | | | | |       |<------------|CV1 <--------------------+ |  | |
2605	      | | | | +-------|             |---------+             | | |  | |
2606	      | | | +---------|             |-------------------+   | | |  | |
2607	      | | +-----------|             |-------------+ |   |   V ^ V  | |
2608	      | +-------------|             |---------------+   |   | | |  | |
2609	      +---------------+             |                   |   | | |  | |
2610	                                    :                   |   | | |  | |
2611	                                    :                   |   | | |  | |
2612	      +-C-------------+             |                   |   | | |  | |
2613	      | +-PeerC3------|             |-PeerC3--------+   |   | | |  | |
2614	      | | +-UDP3------|             |-UDP3--------+ |   |   | | |  | |
2615	      | | | +-RTP3----|             |-RTP1--------------+   | | |  | |
2616	      | | | | +-Video-|             |-Video---+             | | |  | |
2617	      | | | | |    CV1|------------>|-------------------------+ |  | |
2618	      | | | | |       |<------------|AV1 <----------------------+  | |
2619	      | | | | |       |<------------|BV1 <------------------+      | |
2620	      | | | | +-------|             |---------+                    | |
2621	      | | | +---------|             |------------------------------+ |
2622	      | | +-----------|             |-------------+ |                |
2623	      | +-------------|             |---------------+                |
2624	      +---------------+             +--------------------------------+

2626	                  Figure 17: Transport Multi-party Relay

2628	   As the Relay forwards RTP and RTCP packets between the UDP flows, as
2629	   indicated by the arrows for the media flow, a given WebRTC end-point,
2630	   like A, will see the remote sources BV1 and CV1.  There will also be
2631	   two different network paths, one between A and B and one between A
2632	   and C.  As a result, client A has to be capable of handling, when
2633	   determining congestion state, that multiple destinations might exist
2634	   on the far side of a PeerConnection and that these paths have to be
2635	   treated differently.  It also results in a requirement to combine the
2636	   different congestion states into a decision to transmit a particular
2637	   RTP media stream suitable to all participants.
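
   One simple way to combine the per-path congestion states, shown here
   purely as an assumption and not as a method this memo prescribes, is
   to keep a separate rate estimate for each remote receiver and send at
   the lowest of them.  The function name and the example figures are
   hypothetical.

```python
def combined_send_rate(per_receiver_rate_bps):
    """Pick a send rate that every path behind the PeerConnection can carry."""
    if not per_receiver_rate_bps:
        raise ValueError("no receiver estimates available")
    # The most constrained path limits a single stream sent to everyone.
    return min(per_receiver_rate_bps.values())


# Estimates A derived separately from the RTCP reports of B and C.
estimates = {"B": 1_200_000, "C": 450_000}
rate = combined_send_rate(estimates)
```

   A more elaborate sender might instead provide several encodings, but
   that goes beyond what a pure relay as described here requires.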

2639	   It is also important to note that the relay cannot perform selective
2640	   relaying of some sources and not others.  The reason is that the RTCP
2641	   reporting in that case becomes inconsistent and, without explicit
2642	   information about a source being blocked, has to be interpreted as
2643	   severe congestion.

2645	   In this usage it is also necessary that the session management has
2646	   configured a common RTP configuration, including RTP payload formats.
2647	   When A sends a packet with pt=97, it will arrive at both B and C
2648	   carrying pt=97 with the same packetization and encoding, because no
2649	   entity will have manipulated the packet.
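
   A minimal sketch of how session management might check for such a
   common configuration is given below.  The function name, the shape of
   the configuration tables, and the format strings are hypothetical and
   chosen only for illustration.

```python
def conflicting_payload_types(configs):
    """Return payload types not mapped identically by every end-point.

    `configs` maps an end-point name to its {payload type: format} table
    (a hypothetical representation of the negotiated RTP configuration).
    """
    all_pts = set()
    for table in configs.values():
        all_pts.update(table)
    conflicts = set()
    for pt in all_pts:
        # A missing entry shows up as None and also counts as a conflict.
        formats = {table.get(pt) for table in configs.values()}
        if len(formats) != 1:
            conflicts.add(pt)
    return conflicts


configs = {
    "A": {97: "VP8/90000", 111: "opus/48000/2"},
    "B": {97: "VP8/90000", 111: "opus/48000/2"},
    "C": {97: "H264/90000", 111: "opus/48000/2"},
}
conflicts = conflicting_payload_types(configs)
```

   In this example pt=97 is reported as conflicting, because C would
   interpret a relayed packet from A using a different payload format.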

2651	   When it comes to security, there are some additional requirements to
2652	   ensure that the property that the relay can't read the media traffic
2653	   is enforced.  First, the key to be used has to be agreed in such a
2654	   way that the relay doesn't get it, e.g., no DTLS-SRTP handshake with
2655	   the relay; some other method needs to be used instead.  Secondly, the
2656	   keying structure has to be capable of handling multiple end-points in
2657	   the same RTP session.

2659	   The second problem can basically be solved in two ways.  The first is
2660	   a common master key from which all end-points derive their per-source
2661	   keys for SRTP.  The second alternative, which might be more
2662	   practical, is that each end-point has its own key used to protect all
2663	   RTP/RTCP packets it sends.  Each participant's key is then
2664	   distributed to the other participants.  This second method could be
2665	   implemented using DTLS-SRTP to a special key server and then using
2666	   Encrypted Key Transport [I-D.ietf-avt-srtp-ekt] to distribute the key
2667	   actually used to the other participants in the RTP session (Figure
2668	   18).  The first one could be achieved using MIKEY messages in SDP.

2670	                 +---+                               +---+
2671	                 |   |         +-----------+         |   |
2672	                 | A |<------->| DTLS-SRTP |<------->| C |
2673	                 |   |<--   -->|   HOST    |<--   -->|   |
2674	                 +---+   \ /   +-----------+   \ /   +---+
2675	                          X                     X
2676	                 +---+   / \   +-----------+   / \   +---+
2677	                 |   |<--   -->|    RTP    |<--   -->|   |
2678	                 | B |<------->|   RELAY   |<------->| D |
2679	                 |   |         +-----------+         |   |
2680	                 +---+                               +---+

2682	             Figure 18: DTLS-SRTP host and RTP Relay Separated

2684	   The relay can still verify that a given SSRC isn't used or spoofed by
2685	   another participant within the multi-party session by binding SSRCs
2686	   on their first usage to a given source address and port pair.
2687	   Packets carrying that source SSRC from other addresses can be
2688	   suppressed to prevent spoofing.  This is possible as long as SRTP is
2689	   used, which leaves the SSRC of the packet originator in RTP and RTCP
2690	   packets in the clear.  If such a packet-level method for enforcing
2691	   source authentication within the group is not sufficient, there
2692	   exist cryptographic methods such as TESLA [RFC4383] that could be
2693	   used for true source authentication.
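
   The first-use binding described above could be sketched as follows.
   The class and method names are hypothetical, and the sketch ignores
   matters a real relay would need to handle, such as legitimate SSRC
   changes, binding timeouts, and address changes due to mobility.

```python
class SsrcBindingTable:
    """Binds each SSRC to the (address, port) pair that first used it."""

    def __init__(self):
        self._bindings = {}  # ssrc -> (address, port)

    def accept(self, ssrc, source):
        """Return True if a packet with `ssrc` from `source` may be relayed.

        The first packet seen for an SSRC creates the binding; later
        packets carrying that SSRC from another source are suppressed.
        """
        bound = self._bindings.setdefault(ssrc, source)
        return bound == source


table = SsrcBindingTable()
first = table.accept(0x1234ABCD, ("192.0.2.1", 5004))     # creates binding
same = table.accept(0x1234ABCD, ("192.0.2.1", 5004))      # same source
spoofed = table.accept(0x1234ABCD, ("198.51.100.7", 5004))  # other source
```

   The relay only needs the SSRC field, which SRTP leaves unencrypted,
   so this check does not require access to the media keys.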

2695	A.5.  End-point Forwarding

2697	   A WebRTC end-point (B in Figure 19) will receive a WebRTC MediaStream
2698	   (set of SSRCs) over a PeerConnection (from A).  For the moment it is
2699	   not decided whether the end-point is allowed to, in its turn, send
2700	   that WebRTC MediaStream over another PeerConnection to C.  This
2701	   section discusses the RTP and end-point implications of allowing such
2702	   functionality, which on the API level is extremely simple to
2703	   perform.

2705	                          +---+    +---+    +---+
2706	                          | A |--->| B |--->| C |
2707	                          +---+    +---+    +---+

2709	                     Figure 19: MediaStream Forwarding

2711	   There exist two main approaches to how B forwards the media from A to
2712	   C. The first one is to simply relay the RTP media stream.  The second
2713	   one is for B to act as a transcoder.  Let's consider both approaches.

2715	   With a relay approach, the WebRTC end-points will have to have the
2716	   same capabilities as those discussed for the Relay (Appendix A.4.3).
2717	   Thus A will see an RTP session that extends beyond the PeerConnection
2718	   and see two different receiving end-points with different path
2719	   characteristics (B and C).  A's congestion control therefore needs to
2720	   be capable of handling this.  The security solution can either
2721	   support a mechanism that allows A to inform C about the key A is
2722	   using, despite B and C having agreed on another set of keys, or B
2723	   can decrypt and then re-encrypt using a new key.  The relay-based
2724	   approach has the advantage that B does not need to transcode the
2725	   media, thus both maintaining the quality of the encoding and reducing
2726	   B's complexity requirements.  If the right security solutions are
2727	   supported, then C will also be able to verify the authenticity of the
2728	   media coming from A.  As a downside, A is forced to take both B and C
2729	   into consideration when delivering content.

2731	   The media transcoder approach is similar to having B act as a Mixer
2732	   terminating the RTP session, combined with the transcoder as
2733	   discussed in Appendix A.4.1.  A will only see B as the receiver of
2734	   its media.  B will be responsible for producing an RTP media stream
2735	   suitable for the B to C PeerConnection.  This might require media
2736	   transcoding for congestion control purposes to produce a suitable
2737	   bit-rate, thus losing media quality in the transcoding and forcing B
2738	   to spend resources on the transcoding.  The media transcoding does
2739	   result in a separation of the two different legs, removing almost all
2740	   dependencies.  B could choose to implement logic to optimize its
2741	   media transcoding operation, for example by requesting media
2742	   properties that are also suitable for C, thus trying to avoid having
2743	   to transcode the content and instead only forward the media payloads
2744	   between the two sides.  For that optimization to be practical, WebRTC
2745	   end-points have to support sufficiently good tools for codec control.
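
   The optimization described above could, as a rough sketch, be
   expressed as a per-stream decision at B.  The function name, its
   parameters, and the two criteria used (payload format supported by C
   and bit-rate fitting the B to C path) are assumptions made for this
   example and are not defined by this document.

```python
def choose_leg_action(incoming_format, incoming_bps,
                      c_formats, c_available_bps):
    """Decide whether B can forward A's payloads to C or has to transcode.

    All parameters are hypothetical inputs: the payload format and
    bit-rate of the stream received from A, the formats C accepts, and
    the rate B's congestion control allows towards C.
    """
    can_decode = incoming_format in c_formats
    fits_path = incoming_bps <= c_available_bps
    return "forward" if can_decode and fits_path else "transcode"


action_ok = choose_leg_action("VP8/90000", 400_000,
                              {"VP8/90000"}, 500_000)
action_rate = choose_leg_action("VP8/90000", 900_000,
                                {"VP8/90000"}, 500_000)
action_codec = choose_leg_action("VP8/90000", 400_000,
                                 {"H264/90000"}, 500_000)
```

   Only the first case avoids transcoding; in the other two B either
   has to reduce the bit-rate or change the encoding for C.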

2747	A.6.  Simulcast

2749	   This section discusses simulcast in the sense of providing a node,
2750	   for example a stream-switching Mixer, with multiple differently
2751	   encoded versions of the same media source.  In the WebRTC context
2752	   that appears to be most easily accomplished by establishing multiple
2753	   PeerConnections, all being fed the same set of WebRTC MediaStreams.
2754	   Each PeerConnection is then configured to deliver a particular media
2755	   quality and thus media bit-rate.  This will work well as long as the
2756	   end-point implements media encoding according to Figure 7.  Then each
2757	   PeerConnection will receive an independently encoded version, and the
2758	   codec parameters can be agreed specifically in the context of that
2759	   PeerConnection.

2761	   For simulcast to work, one needs to prevent the end-point from
2762	   delivering content encoded as depicted in Figure 8.  If a single
2763	   encoder instance is fed to multiple PeerConnections, the intention of
2764	   performing simulcast will fail.

2766	   Thus it needs to be considered whether to explicitly signal which of
2767	   the two implementation strategies is desired and which will be used.
2768	   At least the application, and possibly the central node interested
2769	   in receiving simulcast of an end-point's RTP media streams, needs to
2770	   be made aware of whether simulcast will function or not.

2772	Authors' Addresses

2774	   Colin Perkins
2775	   University of Glasgow
2776	   School of Computing Science
2777	   Glasgow  G12 8QQ
2778	   United Kingdom

2780	   Email: csp@csperkins.org

2782	   Magnus Westerlund
2783	   Ericsson
2784	   Farogatan 6
2785	   SE-164 80 Kista
2786	   Sweden

2788	   Phone: +46 10 714 82 87
2789	   Email: magnus.westerlund@ericsson.com

2791	   Joerg Ott
2792	   Aalto University
2793	   School of Electrical Engineering
2794	   Espoo  02150
2795	   Finland

2797	   Email: jorg.ott@aalto.fi