idnits 2.17.1 

draft-ietf-rtcweb-rtp-usage-13.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (April 23, 2014) is 3650 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-13) exists of
     draft-ietf-avtcore-multi-media-rtp-session-05

  == Outdated reference: A later version (-18) exists of
     draft-ietf-avtcore-rtp-circuit-breakers-05

  == Outdated reference: A later version (-12) exists of
     draft-ietf-avtcore-rtp-multi-stream-optimisation-02

  == Outdated reference: A later version (-11) exists of
     draft-ietf-avtcore-rtp-multi-stream-03

  == Outdated reference: A later version (-20) exists of
     draft-ietf-rtcweb-security-arch-09

  == Outdated reference: A later version (-12) exists of
     draft-ietf-rtcweb-security-06

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  ** Obsolete normative reference: RFC 5285 (Obsoleted by RFC 8285)

  == Outdated reference: A later version (-12) exists of
     draft-ietf-avtcore-multiplex-guidelines-02

  == Outdated reference: A later version (-10) exists of
     draft-ietf-avtcore-rtp-topologies-update-01

  == Outdated reference: A later version (-08) exists of
     draft-ietf-avtext-rtp-grouping-taxonomy-01

  == Outdated reference: A later version (-17) exists of
     draft-ietf-mmusic-msid-05

  == Outdated reference: A later version (-54) exists of
     draft-ietf-mmusic-sdp-bundle-negotiation-07

  == Outdated reference: A later version (-14) exists of
     draft-ietf-payload-rtp-howto-13

  == Outdated reference: A later version (-09) exists of
     draft-ietf-rmcat-cc-requirements-04

  == Outdated reference: A later version (-11) exists of
     draft-ietf-rtcweb-audio-05

  == Outdated reference: A later version (-19) exists of
     draft-ietf-rtcweb-overview-09

  == Outdated reference: A later version (-16) exists of
     draft-ietf-rtcweb-use-cases-and-requirements-14

  == Outdated reference: A later version (-18) exists of
     draft-ietf-tsvwg-rtcweb-qos-00


     Summary: 2 errors (**), 0 flaws (~~), 18 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	RTCWEB Working Group                                          C. Perkins
3	Internet-Draft                                     University of Glasgow
4	Intended status: Standards Track                           M. Westerlund
5	Expires: October 25, 2014                                       Ericsson
6	                                                                  J. Ott
7	                                                        Aalto University
8	                                                          April 23, 2014

10	  Web Real-Time Communication (WebRTC): Media Transport and Use of RTP
11	                     draft-ietf-rtcweb-rtp-usage-13

13	Abstract

15	   The Web Real-Time Communication (WebRTC) framework provides support
16	   for direct interactive rich communication using audio, video, text,
17	   collaboration, games, etc. between two peers' web-browsers.  This
18	   memo describes the media transport aspects of the WebRTC framework.
19	   It specifies how the Real-time Transport Protocol (RTP) is used in
20	   the WebRTC context, and gives requirements for which RTP features,
21	   profiles, and extensions need to be supported.

23	Status of This Memo

25	   This Internet-Draft is submitted in full conformance with the
26	   provisions of BCP 78 and BCP 79.

28	   Internet-Drafts are working documents of the Internet Engineering
29	   Task Force (IETF).  Note that other groups may also distribute
30	   working documents as Internet-Drafts.  The list of current Internet-
31	   Drafts is at http://datatracker.ietf.org/drafts/current/.

33	   Internet-Drafts are draft documents valid for a maximum of six months
34	   and may be updated, replaced, or obsoleted by other documents at any
35	   time.  It is inappropriate to use Internet-Drafts as reference
36	   material or to cite them other than as "work in progress."

38	   This Internet-Draft will expire on October 25, 2014.

40	Copyright Notice

42	   Copyright (c) 2014 IETF Trust and the persons identified as the
43	   document authors.  All rights reserved.

45	   This document is subject to BCP 78 and the IETF Trust's Legal
46	   Provisions Relating to IETF Documents
47	   (http://trustee.ietf.org/license-info) in effect on the date of
48	   publication of this document.  Please review these documents
49	   carefully, as they describe your rights and restrictions with respect
50	   to this document.  Code Components extracted from this document must
51	   include Simplified BSD License text as described in Section 4.e of
52	   the Trust Legal Provisions and are provided without warranty as
53	   described in the Simplified BSD License.

55	Table of Contents

57	   1.  Introduction
58	   2.  Rationale
59	   3.  Terminology
60	   4.  WebRTC Use of RTP: Core Protocols
61	     4.1.  RTP and RTCP
62	     4.2.  Choice of the RTP Profile
63	     4.3.  Choice of RTP Payload Formats
64	     4.4.  Use of RTP Sessions
65	     4.5.  RTP and RTCP Multiplexing
66	     4.6.  Reduced Size RTCP
67	     4.7.  Symmetric RTP/RTCP
68	     4.8.  Choice of RTP Synchronisation Source (SSRC)
69	     4.9.  Generation of the RTCP Canonical Name (CNAME)
70	     4.10. Handling of Leap Seconds
71	   5.  WebRTC Use of RTP: Extensions
72	     5.1.  Conferencing Extensions and Topologies
73	       5.1.1.  Full Intra Request (FIR)
74	       5.1.2.  Picture Loss Indication (PLI)
75	       5.1.3.  Slice Loss Indication (SLI)
76	       5.1.4.  Reference Picture Selection Indication (RPSI)
77	       5.1.5.  Temporal-Spatial Trade-off Request (TSTR)
78	       5.1.6.  Temporary Maximum Media Stream Bit Rate Request
79	               (TMMBR)
80	     5.2.  Header Extensions
81	       5.2.1.  Rapid Synchronisation
82	       5.2.2.  Client-to-Mixer Audio Level
83	       5.2.3.  Mixer-to-Client Audio Level
84	   6.  WebRTC Use of RTP: Improving Transport Robustness
85	     6.1.  Negative Acknowledgements and RTP Retransmission
86	     6.2.  Forward Error Correction (FEC)
87	   7.  WebRTC Use of RTP: Rate Control and Media Adaptation
88	     7.1.  Boundary Conditions and Circuit Breakers
89	     7.2.  RTCP Limitations for Congestion Control
90	     7.3.  Congestion Control Interoperability and Legacy Systems
91	   8.  WebRTC Use of RTP: Performance Monitoring
92	   9.  WebRTC Use of RTP: Future Extensions
93	   10. Signalling Considerations
94	   11. WebRTC API Considerations
95	   12. RTP Implementation Considerations
96	     12.1.  Configuration and Use of RTP Sessions
97	       12.1.1.  Use of Multiple Media Sources Within an RTP Session
98	       12.1.2.  Use of Multiple RTP Sessions
99	       12.1.3.  Differentiated Treatment of RTP Packet Streams
100	     12.2.  Media Source, RTP Packet Streams, and Participant
101	            Identification
102	       12.2.1.  Media Source
103	       12.2.2.  SSRC Collision Detection
104	       12.2.3.  Media Synchronisation Context
105	   13. Security Considerations
106	   14. IANA Considerations
107	   15. Acknowledgements
108	   16. References
109	     16.1.  Normative References
110	     16.2.  Informative References
111	   Authors' Addresses

113	1.  Introduction

115	   The Real-time Transport Protocol (RTP) [RFC3550] provides a framework
116	   for delivery of audio and video teleconferencing data and other real-
117	   time media applications.  Previous work has defined the RTP protocol,
118	   along with numerous profiles, payload formats, and other extensions.
119	   When combined with appropriate signalling, these form the basis for
120	   many teleconferencing systems.

122	   The Web Real-Time Communication (WebRTC) framework provides the
123	   protocol building blocks to support direct, interactive, real-time
124	   communication using audio, video, collaboration, games, etc., between
125	   two peers' web-browsers.  This memo describes how the RTP framework
126	   is to be used in the WebRTC context.  It proposes a baseline set of
127	   RTP features that are to be implemented by all WebRTC-aware end-
128	   points, along with suggested extensions for enhanced functionality.

130	   This memo specifies a protocol intended for use within the WebRTC
131	   framework, but is not restricted to that context.  An overview of the
132	   WebRTC framework is given in [I-D.ietf-rtcweb-overview].

134	   The structure of this memo is as follows.  Section 2 outlines our
135	   rationale in preparing this memo and choosing these RTP features.
136	   Section 3 defines terminology.  Requirements for core RTP protocols
137	   are described in Section 4 and suggested RTP extensions are described
138	   in Section 5.  Section 6 outlines mechanisms that can increase
139	   robustness to network problems, while Section 7 describes congestion
140	   control and rate adaptation mechanisms.  The discussion of mandated
141	   RTP mechanisms concludes in Section 8 with a review of performance
142	   monitoring and network management tools that can be used in the
143	   WebRTC context.  Section 9 gives some guidelines for future
144	   incorporation of other RTP and RTP Control Protocol (RTCP) extensions
145	   into this framework.  Section 10 describes requirements placed on the
146	   signalling channel.  Section 11 discusses the relationship between
147	   features of the RTP framework and the WebRTC application programming
148	   interface (API), and Section 12 discusses RTP implementation
149	   considerations.  The memo concludes with security considerations
150	   (Section 13) and IANA considerations (Section 14).

152	2.  Rationale

154	   The RTP framework comprises the RTP data transfer protocol, the RTP
155	   control protocol, and numerous RTP payload formats, profiles, and
156	   extensions.  This range of add-ons has allowed RTP to meet various
157	   needs that were not envisaged by the original protocol designers, and
158	   to support many new media encodings, but raises the question of what
159	   extensions are to be supported by new implementations.  The
160	   development of the WebRTC framework provides an opportunity to review
161	   the available RTP features and extensions, and to define a common
162	   baseline feature set for all WebRTC implementations of RTP.  This
163	   builds on the past 20 years of development of RTP to mandate the use of
164	   extensions that have shown widespread utility, while still remaining
165	   compatible with the wide installed base of RTP implementations where
166	   possible.

168	   RTP and RTCP extensions that are not discussed in this document can
169	   be implemented by WebRTC end-points if they are beneficial for new
170	   use cases.  However, they are not necessary to address the WebRTC use
171	   cases and requirements identified in
172	   [I-D.ietf-rtcweb-use-cases-and-requirements].

174	   While the baseline set of RTP features and extensions defined in this
175	   memo is targeted at the requirements of the WebRTC framework, it is
176	   expected to be broadly useful for other conferencing-related uses of
177	   RTP.  In particular, it is likely that this set of RTP features and
178	   extensions will be appropriate for other desktop or mobile video
179	   conferencing systems, or for room-based high-quality telepresence
180	   applications.

182	3.  Terminology

184	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
185	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
186	   document are to be interpreted as described in [RFC2119].  The RFC
187	   2119 interpretation of these key words applies only when written in
188	   ALL CAPS.  Lower- or mixed-case uses of these key words are not to be
189	   interpreted as carrying special significance in this memo.

191	   We define the following additional terms:

193	   WebRTC MediaStream:  The MediaStream concept defined by the W3C in
194	      the WebRTC API [W3C.WD-mediacapture-streams-20130903].

196	   Transport-layer Flow:  A uni-directional flow of transport packets
197	      that are identified by having a particular 5-tuple of source IP
198	      address, source port, destination IP address, destination port,
199	      and transport protocol used.

201	   Bi-directional Transport-layer Flow:  A bi-directional transport-
202	      layer flow is a transport-layer flow that is symmetric.  That is,
203	      the transport-layer flow in the reverse direction has a 5-tuple
204	      where the source and destination address and ports are swapped
205	      compared to the forward path transport-layer flow, and the
206	      transport protocol is the same.
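
   As a non-normative illustration, the two flow definitions above can
   be sketched in Python.  The Flow class and the is_bidirectional()
   helper are hypothetical names introduced for this example only; the
   addresses in the usage note are documentation addresses.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    """A uni-directional transport-layer flow, identified by its 5-tuple."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # e.g. "UDP"

    def reverse(self) -> "Flow":
        # Swap source and destination; the transport protocol is unchanged.
        return Flow(self.dst_ip, self.dst_port,
                    self.src_ip, self.src_port, self.protocol)


def is_bidirectional(forward: Flow, backward: Flow) -> bool:
    """True if the two flows together form a bi-directional flow."""
    return backward == forward.reverse()
```

   For example, Flow("192.0.2.1", 5004, "198.51.100.7", 6000, "UDP")
   and Flow("198.51.100.7", 6000, "192.0.2.1", 5004, "UDP") satisfy
   is_bidirectional(), whereas a pair that differs in transport
   protocol does not.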

208	   This document uses the terminology from
209	   [I-D.ietf-avtext-rtp-grouping-taxonomy].  Other terms are used
210	   according to their definitions from the RTP Specification [RFC3550].
211	   We especially note the following frequently used terms: RTP Packet
212	   Stream, RTP Session, and End-point.

214	4.  WebRTC Use of RTP: Core Protocols

216	   The following sections describe the core features of RTP and RTCP
217	   that need to be implemented, along with the mandated RTP profiles.
218	   Also described are the core extensions providing essential features
219	   that all WebRTC implementations need to implement to function
220	   effectively on today's networks.

222	4.1.  RTP and RTCP

224	   The Real-time Transport Protocol (RTP) [RFC3550] is REQUIRED to be
225	   implemented as the media transport protocol for WebRTC.  RTP itself
226	   comprises two parts: the RTP data transfer protocol, and the RTP
227	   control protocol (RTCP).  RTCP is a fundamental and integral part of
228	   RTP, and MUST be implemented in all WebRTC applications.

230	   The following RTP and RTCP features are sometimes omitted in limited
231	   functionality implementations of RTP, but are REQUIRED in all WebRTC
232	   implementations:

234	   o  Support for use of multiple simultaneous SSRC values in a single
235	      RTP session, including support for RTP end-points that send many
236	      SSRC values simultaneously, following [RFC3550] and
237	      [I-D.ietf-avtcore-rtp-multi-stream].  Support for the RTCP
238	      optimisations for multi-SSRC sessions defined in
239	      [I-D.ietf-avtcore-rtp-multi-stream-optimisation] is RECOMMENDED.

241	   o  Random choice of SSRC on joining a session; collision detection
242	      and resolution for SSRC values (see also Section 4.8).

244	   o  Support for reception of RTP data packets containing CSRC lists,
245	      as generated by RTP mixers, and RTCP packets relating to CSRCs.

247	   o  Sending correct synchronisation information in the RTCP Sender
248	      Reports, to allow receivers to implement lip-synchronisation;
249	      support for the rapid RTP synchronisation extensions (see
250	      Section 5.2.1) is RECOMMENDED.

252	   o  Support for multiple synchronisation contexts.  Participants that
253	      send multiple simultaneous RTP packet streams SHOULD do so as part
254	      of a single synchronisation context, using a single RTCP CNAME for
255	      all streams and allowing receivers to play the streams out in a
256	      synchronised manner.  For compatibility with potential future
257	      versions of this specification, or for interoperability with non-
258	      WebRTC devices through a gateway, receivers MUST support multiple
259	      synchronisation contexts, indicated by the use of multiple RTCP
260	      CNAMEs in an RTP session.  This specification requires the usage
261	      of a single CNAME when sending RTP Packet Streams in some
262	      circumstances, see Section 4.9.

264	   o  Support for sending and receiving RTCP SR, RR, SDES, and BYE
265	      packet types, with OPTIONAL support for other RTCP packet types
266	      unless mandated by other parts of this specification;
267	      implementations MUST ignore unknown RTCP packet types.  Note that
268	      additional RTCP Packet types are used by the RTP/SAVPF Profile
269	      (Section 4.2) and the other RTCP extensions (Section 5).

271	   o  Support for multiple end-points in a single RTP session, and for
272	      scaling the RTCP transmission interval according to the number of
273	      participants in the session; support for randomised RTCP
274	      transmission intervals to avoid synchronisation of RTCP reports;
275	      support for RTCP timer reconsideration.

277	   o  Support for configuring the RTCP bandwidth as a fraction of the
278	      media bandwidth, and for configuring the fraction of the RTCP
279	      bandwidth allocated to senders, e.g., using the SDP "b=" line
280	      [RFC4566][RFC3556].  Support for the reduced minimum RTCP
281	      reporting interval described in Section 6.2 of [RFC3550] is
282	      RECOMMENDED.
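
   The requirement to ignore unknown RTCP packet types can be
   illustrated with the following non-normative Python sketch.  It
   assumes the common RTCP header layout of [RFC3550] (version in the
   top two bits, an 8-bit packet type, and a 16-bit length counted in
   32-bit words minus one).  The function name split_compound() and the
   KNOWN_TYPES table are hypothetical, and only the four packet types
   listed above are recognised.

```python
import struct

# RTCP packet types defined in RFC 3550.
KNOWN_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE"}


def split_compound(data: bytes):
    """Yield (name, body) for each known RTCP packet in `data`.

    Unknown packet types are skipped using the length field, so that a
    receiver ignores them instead of rejecting the whole compound packet.
    """
    offset = 0
    while offset + 4 <= len(data):
        first, ptype, words = struct.unpack_from("!BBH", data, offset)
        if first >> 6 != 2:           # RTP/RTCP version must be 2
            raise ValueError("not an RTCP packet")
        size = (words + 1) * 4        # length is in 32-bit words, minus one
        if offset + size > len(data):
            raise ValueError("truncated RTCP packet")
        if ptype in KNOWN_TYPES:
            yield KNOWN_TYPES[ptype], data[offset + 4:offset + size]
        offset += size
```

   A real implementation would additionally dispatch the packet types
   required by the RTP/SAVPF profile and by the extensions in Section 5.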

284	   It is known that a significant number of legacy RTP implementations,
285	   especially those targeted at VoIP-only systems, do not support all of
286	   the above features, and in some cases do not support RTCP at all.
287	   Implementers are advised to consider the requirements for graceful
288	   degradation when interoperating with legacy implementations.

290	   Other implementation considerations are discussed in Section 12.

292	4.2.  Choice of the RTP Profile

294	   The complete specification of RTP for a particular application domain
295	   requires the choice of an RTP Profile.  For WebRTC use, the Extended
296	   Secure RTP Profile for RTCP-Based Feedback (RTP/SAVPF) [RFC5124], as
297	   extended by [RFC7007], MUST be implemented.  The RTP/SAVPF profile is
298	   the combination of basic RTP/AVP profile [RFC3551], the RTP profile
299	   for RTCP-based feedback (RTP/AVPF) [RFC4585], and the secure RTP
300	   profile (RTP/SAVP) [RFC3711].

302	   The RTCP-based feedback extensions [RFC4585] are needed for the
303	   improved RTCP timer model.  This allows more flexible transmission of
304	   RTCP packets in response to events, rather than strictly according to
305	   bandwidth, and is vital for being able to report congestion signals
306	   as well as media events.  These extensions also allow saving RTCP
307	   bandwidth, and an end-point will commonly only use the full RTCP
308	   bandwidth allocation if there are many events that require feedback.
309	   The timer rules are also needed to make use of the RTP conferencing
310	   extensions discussed in Section 5.1.

312	      Note: The enhanced RTCP timer model defined in the RTP/AVPF
313	      profile is backwards compatible with legacy systems that implement
314	      only the RTP/AVP or RTP/SAVP profile, given some constraints on
315	      parameter configuration such as the RTCP bandwidth value and "trr-
316	      int" (the most important factor for interworking with RTP/(S)AVP
317	      end-points via a gateway is to set the trr-int parameter to a
318	      value representing 4 seconds).

320	   The secure RTP (SRTP) profile extensions [RFC3711] are needed to
321	   provide media encryption, integrity protection, replay protection and
322	   a limited form of source authentication.  WebRTC implementations MUST
323	   NOT send packets using the basic RTP/AVP profile or the RTP/AVPF
324	   profile; they MUST employ the full RTP/SAVPF profile to protect all
325	   RTP and RTCP packets that are generated (i.e., implementations MUST
326	   use SRTP and SRTCP).  The RTP/SAVPF profile MUST be configured using
327	   the cipher suites, DTLS-SRTP protection profiles, keying mechanisms,
328	   and other parameters described in [I-D.ietf-rtcweb-security-arch].
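
   A gateway or test tool might check which RTP profile a session
   description names before sending media.  The following is a minimal,
   non-normative Python sketch.  It assumes that the profile appears as
   the last two components of the transport field of an SDP "m=" line
   (for example "RTP/SAVPF"); the function name uses_savpf() is
   hypothetical, and the check says nothing about the keying or cipher
   suite requirements of [I-D.ietf-rtcweb-security-arch].

```python
def uses_savpf(m_line: str) -> bool:
    """Return True if an SDP "m=" line names the RTP/SAVPF profile."""
    fields = m_line.strip().split()
    if len(fields) < 4 or not fields[0].startswith("m="):
        raise ValueError("not an SDP m= line")
    proto = fields[2]                 # m=<media> <port> <proto> <fmt> ...
    return proto.split("/")[-2:] == ["RTP", "SAVPF"]
```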

330	4.3.  Choice of RTP Payload Formats

332	   The set of mandatory-to-implement codecs and RTP payload formats for
333	   WebRTC is not specified in this memo; instead, it is defined in
334	   separate specifications, such as [I-D.ietf-rtcweb-audio].
335	   Implementations can support any codec for which an RTP payload format
336	   and associated signalling are defined.  Implementations cannot assume
338	   payload format, no matter how common; the mapping between RTP payload
339	   type numbers and specific configurations of particular RTP payload
340	   formats MUST be agreed before those payload types/formats can be
341	   used.  In an SDP context, this can be done using the "a=rtpmap:" and
342	   "a=fmtp:" attributes associated with an "m=" line, along with any
343	   other SDP attributes needed to configure the RTP payload format.

345	   End-points can signal support for multiple RTP payload formats, or
346	   multiple configurations of a single RTP payload format, as long as
347	   each unique RTP payload format configuration uses a different RTP
348	   payload type number.  As outlined in Section 4.8, the RTP payload
349	   type number is sometimes used to associate an RTP packet stream with
350	   a signalling context.  This association is possible provided unique
351	   RTP payload type numbers are used in each context.  For example, an
352	   RTP packet stream can be associated with an SDP "m=" line by
353	   comparing the RTP payload type numbers used by the RTP packet stream
354	   with payload types signalled in the "a=rtpmap:" lines in the media
355	   sections of the SDP.  If RTP packet streams are being associated with
356	   signalling contexts based on the RTP payload type, then the
357	   assignment of RTP payload type numbers MUST be unique across
358	   signalling contexts; if the same RTP payload format configuration is
359	   used in multiple contexts, then a different RTP payload type number
360	   has to be assigned in each context to ensure uniqueness.  If the RTP
361	   payload type number is not being used to associate RTP packet streams
362	   with a signalling context, then the same RTP payload type number can
363	   be used to indicate the exact same RTP payload format configuration
364	   in multiple contexts.  A single RTP payload type number MUST NOT be
365	   assigned to different RTP payload formats, or different
366	   configurations of the same RTP payload format, within a single RTP
367	   session (note that the different "m=" lines in an SDP bundle group
368	   [I-D.ietf-mmusic-sdp-bundle-negotiation] form a single RTP session).
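
   The rule that one RTP payload type number cannot name two different
   formats within an RTP session can be checked mechanically.  The
   non-normative Python sketch below treats every "m=" section of a
   session description as part of one RTP session, as in a bundle
   group.  It is a simplification: it compares only the encoding given
   in "a=rtpmap:" lines, and does not inspect "a=fmtp:" parameters, so
   different configurations of the same format are not detected.  The
   function name payload_type_map() is hypothetical.

```python
def payload_type_map(sdp: str) -> dict:
    """Map each RTP payload type number to its "a=rtpmap:" encoding.

    All "m=" sections are treated as one RTP session (as in a bundle
    group), so a number bound to two different encodings is an error.
    """
    mapping = {}
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith("a=rtpmap:"):
            continue
        number, encoding = line[len("a=rtpmap:"):].split(None, 1)
        pt = int(number)
        # setdefault() stores the first encoding seen for this number.
        if mapping.setdefault(pt, encoding) != encoding:
            raise ValueError(f"payload type {pt} is assigned twice")
    return mapping
```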

370	   An end-point that has signalled support for multiple RTP payload
371	   formats SHOULD be able to accept data in any of those payload formats
372	   at any time, unless it has previously signalled limitations on its
373	   decoding capability.  This requirement is constrained if several
374	   types of media (e.g., audio and video) are sent in the same RTP
375	   session.  In such a case, a source (SSRC) is restricted to switching
376	   only between the RTP payload formats signalled for the type of media
377	   that is being sent by that source; see Section 4.4.  To support rapid
378	   rate adaptation by changing codec, RTP does not require advance
379	   signalling for changes between RTP payload formats used by a single
380	   SSRC that were signalled during session set-up.

382	   An RTP sender that changes between two RTP payload types that use
383	   different RTP clock rates MUST follow the recommendations in
384	   Section 4.1 of [RFC7160].  RTP receivers MUST follow the
385	   recommendations in Section 4.3 of [RFC7160] in order to support
386	   sources that switch between clock rates in an RTP session (these
387	   recommendations for receivers are backwards compatible with the case
388	   where senders use only a single clock rate).

390	4.4.  Use of RTP Sessions

392	   An association amongst a set of end-points communicating using RTP is
393	   known as an RTP session [RFC3550].  An end-point can be involved in
394	   several RTP sessions at the same time.  In a multimedia session, each
395	   type of media has typically been carried in a separate RTP session
396	   (e.g., using one RTP session for the audio, and a separate RTP
397	   session using a different transport-layer flow for the video).
398	   WebRTC implementations of RTP are REQUIRED to implement support for
399	   multimedia sessions in this way, separating each session using
400	   different transport-layer flows for compatibility with legacy
401	   systems.

403	   In modern day networks, however, with the widespread use of network
404	   address/port translators (NAT/NAPT) and firewalls, it is desirable to
405	   reduce the number of transport-layer flows used by RTP applications.
406	   This can be done by sending all the RTP packet streams in a single
407	   RTP session, which will comprise a single transport-layer flow (this
408	   will prevent the use of some quality-of-service mechanisms, as
409	   discussed in Section 12.1.3).  Implementations are therefore also
410	   REQUIRED to support transport of all RTP packet streams, independent
411	   of media type, in a single RTP session using a single transport layer
412	   flow, according to [I-D.ietf-avtcore-multi-media-rtp-session].  If
413	   multiple types of media are to be used in a single RTP session, all
414	   participants in that RTP session MUST agree to this usage.  In an SDP
415	   context, [I-D.ietf-mmusic-sdp-bundle-negotiation] can be used to
416	   signal such a bundle of RTP packet streams forming a single RTP
417	   session.

419	   Further discussion about the suitability of different RTP session
420	   structures and multiplexing methods to different scenarios can be
421	   found in [I-D.ietf-avtcore-multiplex-guidelines].

423	4.5.  RTP and RTCP Multiplexing

425	   Historically, RTP and RTCP have been run on separate transport layer
426	   flows (e.g., two UDP ports for each RTP session, one port for RTP and
427	   one port for RTCP).  With the increased use of Network Address/Port
428	   Translation (NAT/NAPT) this has become problematic, since maintaining
429	   multiple NAT bindings can be costly.  It also complicates firewall
430	   administration, since multiple ports need to be opened to allow RTP
431	   traffic.  To reduce these costs and session set-up times, support for
432	   multiplexing RTP data packets and RTCP control packets on a single
433	   transport-layer flow for each RTP session is REQUIRED, provided it is
434	   negotiated in the signalling channel before use as specified in
435	   [RFC5761].  For backwards compatibility, implementations are also
436	   REQUIRED to support RTP and RTCP sent on separate transport-layer
437	   flows.

439	   Note that the use of RTP and RTCP multiplexed onto a single
440	   transport-layer flow ensures that there is occasional traffic sent on
441	   that port, even if there is no active media traffic.  This can be
442	   useful to keep NAT bindings alive, and is the recommended method for
443	   application level keep-alives of RTP sessions [RFC6263].
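
   When RTP and RTCP share a transport-layer flow, a receiver has to
   tell the two apart by inspecting each packet.  The non-normative
   Python sketch below assumes the payload type guidance of [RFC5761],
   under which RTP payload types 64-95 are not used, so that a second
   byte in the range 192-223 identifies an RTCP packet.  The function
   name is_rtcp() is hypothetical.

```python
def is_rtcp(packet: bytes) -> bool:
    """Classify a packet received on a port shared by RTP and RTCP.

    The second byte holds the RTCP packet type, or the marker bit plus
    the 7-bit payload type of an RTP packet.
    """
    if len(packet) < 2 or packet[0] >> 6 != 2:
        raise ValueError("not an RTP or RTCP packet")
    return 192 <= packet[1] <= 223
```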

445	4.6.  Reduced Size RTCP

447	   RTCP packets are usually sent as compound RTCP packets, and [RFC3550]
448	   requires that those compound packets start with a Sender Report (SR)
449	   or Receiver Report (RR) packet.  When using frequent RTCP feedback
450	   messages under the RTP/AVPF Profile [RFC4585] these statistics are
451	   not needed in every packet, and unnecessarily increase the mean RTCP
452	   packet size.  This can limit the frequency at which RTCP packets can
453	   be sent within the RTCP bandwidth share.

455	   To avoid this problem, [RFC5506] specifies how to reduce the mean
456	   RTCP message size and allow for more frequent feedback.  Frequent
457	   feedback, in turn, is essential to make real-time applications
458	   quickly aware of changing network conditions, and to allow them to
459	   adapt their transmission and encoding behaviour.  Support for non-
460	   compound RTCP feedback packets [RFC5506] is REQUIRED, but MUST be
461	   negotiated using the signalling channel before use.  For backwards
462	   compatibility, implementations are also REQUIRED to support the use
463	   of compound RTCP feedback packets if the remote end-point does not
464	   agree to the use of non-compound RTCP in the signalling exchange.
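
   The negotiation requirement can be pictured as a check that both
   sides of an offer/answer exchange carry the relevant attribute.  The
   non-normative Python sketch below assumes SDP signalling and the
   "a=rtcp-rsize" attribute of [RFC5506].  It is a simplification that
   looks at the whole session description instead of matching
   individual "m=" sections, and the function name attribute_agreed()
   is hypothetical.

```python
def attribute_agreed(offer: str, answer: str, attribute: str) -> bool:
    """True if both descriptions carry the given SDP attribute line."""
    wanted = "a=" + attribute

    def has(sdp: str) -> bool:
        return any(line.strip() == wanted for line in sdp.splitlines())

    return has(offer) and has(answer)
```

   If attribute_agreed(offer, answer, "rtcp-rsize") is false, the
   end-point falls back to compound RTCP feedback packets.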

466	4.7.  Symmetric RTP/RTCP

468	   To ease traversal of NAT and firewall devices, implementations are
469	   REQUIRED to implement and use Symmetric RTP [RFC4961].  The reason
470	   for using symmetric RTP is primarily to avoid issues with NATs and
471	   Firewalls by ensuring that the send and receive RTP packet streams,
472	   as well as RTCP, are actually bi-directional transport-layer flows.
473	   This will keep alive the NAT and firewall pinholes, and help indicate
474	   consent that the receive direction is a transport-layer flow the
475	   intended recipient actually wants.  In addition, it saves resources,
476	   specifically ports at the end-points, but also in the network, as NAT
477	   mappings or firewall state are not unnecessarily bloated.  The amount
478	   of per flow QoS state kept in the network is also reduced.
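
   In socket terms, symmetric operation means that an end-point sends
   from the same local address and port on which it receives.  The
   following non-normative Python sketch shows the idea using the
   standard socket module over the loopback interface; the function
   name open_symmetric_socket() is hypothetical, and ICE, DTLS, and
   SRTP processing are left out.

```python
import socket


def open_symmetric_socket(local_ip: str = "127.0.0.1") -> socket.socket:
    """Bind one UDP socket that is used for both sending and receiving."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((local_ip, 0))          # port 0: let the OS pick a free port
    return sock
```

   Because the socket is bound before anything is sent, the source
   port that the peer observes on incoming packets is the same port to
   which it can send its own packets, so the two directions form one
   bi-directional transport-layer flow.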

480	4.8.  Choice of RTP Synchronisation Source (SSRC)

482	   Implementations are REQUIRED to support signalled RTP synchronisation
483	   source (SSRC) identifiers, using the "a=ssrc:" SDP attribute defined
484	   in Section 4.1 and Section 5 of [RFC5576].  Implementations MUST also
485	   support the "previous-ssrc" source attribute defined in Section 6.2
486	   of [RFC5576].  Other per-SSRC attributes defined in [RFC5576] MAY be
487	   supported.

489	   Use of the "a=ssrc:" attribute to signal SSRC identifiers in an RTP
490	   session is OPTIONAL.  Implementations MUST be prepared to accept RTP
491	   and RTCP packets using SSRCs that have not been explicitly signalled
492	   ahead of time.  Implementations MUST support random SSRC assignment,
493	   and MUST support SSRC collision detection and resolution, according
494	   to [RFC3550].  When using signalled SSRC values, collision detection
495	   MUST be performed as described in Section 5 of [RFC5576].
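
   The random assignment and collision handling required above might
   be sketched as follows.  This is a simplified illustration: the
   function names are hypothetical, and a complete implementation
   would also send an RTCP BYE for the abandoned SSRC and apply the
   loop detection rules of [RFC3550].

```python
import secrets


def new_ssrc(in_use):
    """Pick a random 32-bit SSRC that is not already known in the session."""
    while True:
        candidate = secrets.randbits(32)
        if candidate not in in_use:
            return candidate


def handle_remote_ssrc(own_ssrc, remote_ssrc, in_use):
    """Return the SSRC to keep sending with after seeing remote_ssrc.

    Simplified: only the "someone else uses my SSRC" case is handled.
    """
    in_use.add(remote_ssrc)
    if remote_ssrc == own_ssrc:
        # Collision: abandon the old identifier and choose a fresh one.
        return new_ssrc(in_use)
    return own_ssrc
```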

497	   It is often desirable to associate an RTP packet stream with a non-
498	   RTP context.  For users of the WebRTC API a mapping between SSRCs and
499	   MediaStreamTracks is provided per Section 11.  For gateways or other
500	   usages it is possible to associate an RTP packet stream with an "m="
501	   line in a session description formatted using SDP.  If SSRCs are
502	   signalled this is straightforward (in SDP the "a=ssrc:" line will be
503	   at the media level, allowing a direct association with an "m=" line).
504	   If SSRCs are not signalled, the RTP payload type numbers used in an
505	   RTP packet stream are often sufficient to associate that packet
506	   stream with a signalling context (e.g., if RTP payload type numbers
507	   are assigned as described in Section 4.3 of this memo, the RTP
508	   payload types used by an RTP packet stream can be compared with
509	   values in SDP "a=rtpmap:" lines, which are at the media level in SDP,
510	   and so map to an "m=" line).
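
   The payload-type-based association described above can be sketched
   as a simple scan over the session description.  The helper name and
   the sample SDP are assumptions made for illustration; the sketch
   marks a payload type as ambiguous (None) if it appears under more
   than one "m=" line.

```python
def payload_type_to_mline(sdp: str) -> dict:
    """Map each payload type in an a=rtpmap: line to the index of its m= line."""
    mapping = {}
    m_index = -1
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            m_index += 1
        elif line.startswith("a=rtpmap:") and m_index >= 0:
            pt = int(line[len("a=rtpmap:"):].split()[0])
            # A payload type seen more than once cannot identify an m= line.
            mapping[pt] = None if pt in mapping else m_index
    return mapping


# Hypothetical session description fragment.
sample_sdp = (
    "m=audio 9 UDP/TLS/RTP/SAVPF 111\r\n"
    "a=rtpmap:111 opus/48000/2\r\n"
    "m=video 9 UDP/TLS/RTP/SAVPF 96\r\n"
    "a=rtpmap:96 VP8/90000\r\n"
)
print(payload_type_to_mline(sample_sdp))
```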

512	4.9.  Generation of the RTCP Canonical Name (CNAME)

514	   The RTCP Canonical Name (CNAME) provides a persistent transport-level
515	   identifier for an RTP end-point.  While the Synchronisation Source
516	   (SSRC) identifier for an RTP end-point can change if a collision is
517	   detected, or when the RTP application is restarted, its RTCP CNAME is
518	   meant to stay unchanged for the duration of an RTCPeerConnection
519	   [W3C.WD-webrtc-20130910], so that RTP end-points can be uniquely
520	   identified and associated with their RTP packet streams within a set
521	   of related RTP sessions.

523	   Each RTP end-point MUST have at least one RTCP CNAME, and that RTCP
524	   CNAME MUST be unique within the RTCPeerConnection.  RTCP CNAMEs
525	   identify a particular synchronisation context, i.e., all SSRCs
526	   associated with a single RTCP CNAME share a common reference clock.
527	   If an end-point has SSRCs that are associated with several
528	   unsynchronised reference clocks, and hence different synchronisation
529	   contexts, it will need to use multiple RTCP CNAMEs, one for each
530	   synchronisation context.

532	   Taking the discussion in Section 11 into account, a WebRTC end-point
533	   MUST NOT use more than one RTCP CNAME in the RTP sessions belonging
534	   to a single RTCPeerConnection (that is, an RTCPeerConnection forms a
535	   synchronisation context).  RTP middleboxes MAY generate RTP packet
536	   streams associated with more than one RTCP CNAME, to allow them to
537	   avoid having to resynchronize media from multiple different end-
538	   points that are part of a multi-party RTP session.

540	   The RTP specification [RFC3550] includes guidelines for choosing a
541	   unique RTP CNAME, but these are not sufficient in the presence of NAT
542	   devices.  In addition, long-term persistent identifiers can be
543	   problematic from a privacy viewpoint (Section 13).  Accordingly, a
544	   WebRTC endpoint MUST generate a new, unique, short-term persistent
545	   RTCP CNAME for each RTCPeerConnection, following [RFC7022], with a
546	   single exception: if explicitly requested at creation, an
547	   RTCPeerConnection MAY use the same CNAME as an existing
548	   RTCPeerConnection within their common same-origin context.
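
   A short-term persistent RTCP CNAME might be generated as sketched
   below.  The sketch assumes the approach described in [RFC7022] of
   base64-encoding at least 96 bits of cryptographically random data;
   the function name is illustrative.  The result is 16 ASCII
   characters long.

```python
import base64
import secrets


def short_term_cname() -> str:
    """Return a fresh CNAME built from 96 random bits, base64 encoded."""
    raw = secrets.token_bytes(12)  # 12 octets = 96 bits of randomness
    return base64.b64encode(raw).decode("ascii")


# One new value would be generated per RTCPeerConnection.
print(short_term_cname())
```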

550	   A WebRTC end-point MUST support reception of any CNAME that matches
551	   the syntax limitations specified by the RTP specification [RFC3550]
552	   and cannot assume that any CNAME will be chosen according to the form
553	   suggested above.

555	4.10.  Handling of Leap Seconds

557	   The guidelines regarding handling of leap seconds to limit their
558	   impact on RTP media playout and synchronization given in [RFC7164]
559	   SHOULD be followed.

561	5.  WebRTC Use of RTP: Extensions

563	   There are a number of RTP extensions that are either needed to obtain
564	   full functionality, or extremely useful to improve on the baseline
565	   performance, in the WebRTC application context.  One set of these
566	   extensions is related to conferencing, while others are more generic
567	   in nature.  The following subsections describe the various RTP
568	   extensions mandated or suggested for use within the WebRTC context.

570	5.1.  Conferencing Extensions and Topologies

572	   RTP is a protocol that inherently supports group communication.
573	   Groups can be implemented by having each endpoint send its RTP packet
574	   streams to an RTP middlebox that redistributes the traffic, by using
575	   a mesh of unicast RTP packet streams between endpoints, or by using
576	   an IP multicast group to distribute the RTP packet streams.  These
577	   topologies can be implemented in a number of ways as discussed in
578	   [I-D.ietf-avtcore-rtp-topologies-update].

580	   While the use of IP multicast groups is popular in IPTV systems, the
581	   topologies based on RTP middleboxes are dominant in interactive video
582	   conferencing environments.  Topologies based on a mesh of unicast
583	   transport-layer flows to create a common RTP session have not seen
584	   widespread deployment to date.  Accordingly, WebRTC implementations
585	   are not expected to support topologies based on IP multicast groups
586	   or to support mesh-based topologies, such as a point-to-multipoint
587	   mesh configured as a single RTP session (Topo-Mesh in the terminology
588	   of [I-D.ietf-avtcore-rtp-topologies-update]).  However, a point-to-
589	   multipoint mesh constructed using several RTP sessions, implemented
590	   in the WebRTC context using independent RTCPeerConnections, can be
591	   expected to be utilised by WebRTC applications and needs to be
592	   supported.

594	   WebRTC implementations of RTP endpoints implemented according to this
595	   memo are expected to support all the topologies described in
596	   [I-D.ietf-avtcore-rtp-topologies-update] where the RTP endpoints send
597	   and receive unicast RTP packet streams to and from some peer device,
598	   provided that peer can participate in performing congestion control
599	   on the RTP packet streams.  The peer device could be another RTP
600	   endpoint, or it could be an RTP middlebox that redistributes the RTP
601	   packet streams to other RTP endpoints.  This limitation means that
602	   some of the RTP middlebox-based topologies are not suitable for use
603	   in the WebRTC environment.  Specifically:

605	   o  Video switching MCUs (Topo-Video-switch-MCU) SHOULD NOT be used,
606	      since they make the use of RTCP for congestion control and quality
607	      of service reports problematic (see Section 3.8 of
608	      [I-D.ietf-avtcore-rtp-topologies-update]).

610	   o  The Relay-Transport Translator (Topo-PtM-Trn-Translator) topology
611	      SHOULD NOT be used because its safe use requires a congestion
612	      control algorithm or RTP circuit breaker that handles point to
613	      multipoint, which has not yet been standardised.

615	   The following topology can be used; however, it has some issues worth
616	   noting:

618	   o  Content modifying MCUs with RTCP termination (Topo-RTCP-
619	      terminating-MCU) MAY be used.  Note that in this RTP Topology, RTP
620	      loop detection and identification of active senders is the
621	      responsibility of the WebRTC application; since the clients are
622	      isolated from each other at the RTP layer, RTP cannot assist with
623	      these functions (see section 3.9 of
624	      [I-D.ietf-avtcore-rtp-topologies-update]).

626	   The RTP extensions described in Section 5.1.1 to Section 5.1.6 are
627	   designed to be used with centralised conferencing, where an RTP
628	   middlebox (e.g., a conference bridge) receives a participant's RTP
629	   packet streams and distributes them to the other participants.  These
630	   extensions are not necessary for interoperability; an RTP end-point
631	   that does not implement these extensions will work correctly, but
632	   might offer poor performance.  Support for the listed extensions will
633	   greatly improve the quality of experience and, to provide a
634	   reasonable baseline quality, some of these extensions are mandatory
635	   to be supported by WebRTC end-points.

637	   The RTCP conferencing extensions are defined in Extended RTP Profile
638	   for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/
639	   AVPF) [RFC4585] and the memo on Codec Control Messages (CCM) in RTP/
640	   AVPF [RFC5104]; they are fully usable by the Secure variant of this
641	   profile (RTP/SAVPF) [RFC5124].

643	5.1.1.  Full Intra Request (FIR)

645	   The Full Intra Request message is defined in Sections 3.5.1 and 4.3.1
646	   of the Codec Control Messages [RFC5104].  It is used by a mixer to
647	   request a new Intra picture from a participant in the session.
648	   This is used when switching between sources to ensure that the
649	   receivers can decode the video or other predictive media encoding
650	   with long prediction chains.  WebRTC senders MUST understand and
651	   react to FIR feedback messages they receive, since this greatly
652	   improves the user experience when using centralised mixer-based
653	   conferencing.  Support for sending FIR messages is OPTIONAL.

655	5.1.2.  Picture Loss Indication (PLI)

657	   The Picture Loss Indication message is defined in Section 6.3.1 of
658	   the RTP/AVPF profile [RFC4585].  It is used by a receiver to tell the
659	   sending encoder that it lost the decoder context and would like to
660	   have it repaired somehow.  This is semantically different from the
661	   Full Intra Request above as there could be multiple ways to fulfil
662	   the request.  WebRTC senders MUST understand and react to PLI
663	   feedback messages as a loss tolerance mechanism.  Receivers MAY send
664	   PLI messages.

666	5.1.3.  Slice Loss Indication (SLI)

668	   The Slice Loss Indication message is defined in Section 6.3.2 of the
669	   RTP/AVPF profile [RFC4585].  It is used by a receiver to tell the
670	   encoder that it has detected the loss or corruption of one or more
671	   consecutive macro blocks, and would like to have these repaired
672	   somehow.  It is RECOMMENDED that receivers generate SLI feedback
673	   messages if slices are lost when using a codec that supports the
674	   concept of macro blocks.  A sender that receives an SLI feedback
675	   message SHOULD attempt to repair the lost slice(s).

677	5.1.4.  Reference Picture Selection Indication (RPSI)

679	   Reference Picture Selection Indication (RPSI) messages are defined in
680	   Section 6.3.3 of the RTP/AVPF profile [RFC4585].  Some video encoding
681	   standards allow the use of older reference pictures than the most
682	   recent one for predictive coding.  If such a codec is in use, and if
683	   the encoder has learnt that encoder-decoder synchronisation has been
684	   lost, then a known-as-correct reference picture can be used as a base
685	   for future coding.  The RPSI message allows this to be signalled.
686	   Receivers that detect that encoder-decoder synchronisation has been
687	   lost SHOULD generate an RPSI feedback message if the codec being used
688	   supports reference picture selection.  An RTP packet stream sender
689	   that receives such an RPSI message SHOULD act on that message to
690	   change the reference picture, if it is possible to do so within the
691	   available bandwidth constraints, and with the codec being used.

693	5.1.5.  Temporal-Spatial Trade-off Request (TSTR)

695	   The temporal-spatial trade-off request and notification are defined
696	   in Sections 3.5.2 and 4.3.2 of [RFC5104].  This request can be used
697	   to ask the video encoder to change the trade-off it makes between
698	   temporal and spatial resolution, for example to prefer high spatial
699	   image quality but low frame rate.  Support for TSTR requests and
700	   notifications is OPTIONAL.

702	5.1.6.  Temporary Maximum Media Stream Bit Rate Request (TMMBR)

704	   The TMMBR feedback message is defined in Sections 3.5.4 and 4.2.1 of
705	   the Codec Control Messages [RFC5104].  This request and its
706	   notification message are used by a media receiver to inform the
707	   sending party that there is a current limitation on the amount of
708	   bandwidth available to this receiver.  There can be various reasons
709	   for this: for example, an RTP mixer can use this message to limit the
710	   media rate of the sender being forwarded by the mixer (without doing
711	   media transcoding) to fit the bottlenecks existing towards the other
712	   session participants.  WebRTC senders are REQUIRED to implement
713	   support for TMMBR messages, and MUST follow bandwidth limitations set
714	   by a TMMBR message received for their SSRC.  The sending of TMMBR
715	   requests is OPTIONAL.
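
   To follow a TMMBR limit, a sender first has to recover the bit rate
   from the feedback message.  The sketch below assumes the FCI entry
   layout given in Section 4.2.1.1 of [RFC5104] (a 32-bit SSRC, then a
   6-bit exponent, a 17-bit mantissa, and a 9-bit measured overhead);
   that layout should be checked against [RFC5104] before relying on
   it.  The function names are hypothetical.

```python
import struct


def decode_tmmbr_fci(entry: bytes):
    """Decode one 8-octet TMMBR FCI entry.

    Returns (ssrc, max_total_bitrate_bps, measured_overhead_octets).
    """
    ssrc, word = struct.unpack("!II", entry)
    exp = word >> 26                  # top 6 bits
    mantissa = (word >> 9) & 0x1FFFF  # next 17 bits
    overhead = word & 0x1FF           # low 9 bits
    return ssrc, mantissa * (2 ** exp), overhead


def limited_rate(target_bps: int, tmmbr_bps: int) -> int:
    """A sender never exceeds the limit requested for its SSRC."""
    return min(target_bps, tmmbr_bps)


# Illustrative entry: 125000 * 2^2 = 500000 bits per second.
example = struct.pack("!II", 0x12345678, (2 << 26) | (125000 << 9) | 40)
print(decode_tmmbr_fci(example))
```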

717	5.2.  Header Extensions

719	   The RTP specification [RFC3550] provides the capability to include
720	   RTP header extensions containing in-band data, but the format and
721	   semantics of the extensions are poorly specified.  The use of header
722	   extensions is OPTIONAL in the WebRTC context, but if they are used,
723	   they MUST be formatted and signalled following the general mechanism
724	   for RTP header extensions defined in [RFC5285], since this gives
725	   well-defined semantics to RTP header extensions.

727	   As noted in [RFC5285], the requirement from the RTP specification
728	   that header extensions are "designed so that the header extension may
729	   be ignored" [RFC3550] stands.  To be specific, header extensions MUST
730	   only be used for data that can safely be ignored by the recipient
731	   without affecting interoperability, and MUST NOT be used when the
732	   presence of the extension has changed the form or nature of the rest
733	   of the packet in a way that is not compatible with the way the stream
734	   is signalled (e.g., as defined by the payload type).  Valid examples
735	   of RTP header extensions might include metadata that is additional to
736	   the usual RTP information, but that can safely be ignored without
737	   compromising interoperability.

739	5.2.1.  Rapid Synchronisation

741	   Many RTP sessions require synchronisation between audio, video, and
742	   other content.  This synchronisation is performed by receivers, using
743	   information contained in RTCP SR packets, as described in the RTP
744	   specification [RFC3550].  This basic mechanism can be slow, however,
745	   so it is RECOMMENDED that the rapid RTP synchronisation extensions
746	   described in [RFC6051] be implemented in addition to RTCP SR-based
747	   synchronisation.  The rapid synchronisation extensions use the
748	   general RTP header extension mechanism [RFC5285], which requires
749	   signalling, but are otherwise backwards compatible.

751	5.2.2.  Client-to-Mixer Audio Level

753	   The Client to Mixer Audio Level extension [RFC6464] is an RTP header
754	   extension used by an endpoint to inform a mixer about the level of
755	   audio activity in the packet to which the header is attached.  This
756	   enables an RTP middlebox to make mixing or selection decisions
757	   without decoding or detailed inspection of the payload, reducing the
758	   complexity in some types of mixer.  It can also save decoding
759	   resources in receivers, which can choose to decode only the most
760	   relevant RTP packet streams based on audio activity levels.

762	   The Client-to-Mixer Audio Level [RFC6464] header extension is
763	   RECOMMENDED to be implemented.  If this header extension is
764	   implemented, it is REQUIRED that implementations are capable of
765	   encrypting the header extension according to [RFC6904] since the
766	   information contained in these header extensions can be considered
767	   sensitive.  It is further RECOMMENDED that this encryption is used,
768	   unless the encryption has been explicitly disabled through API or
769	   signalling.
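
   As an illustration of how little data this header extension
   carries, the sketch below builds the single data octet, assuming
   the format in [RFC6464]: a voice activity flag in the top bit and
   the audio level in the low seven bits, expressed as 0 to 127 in
   units of -dBov.  The function name is hypothetical, and the sketch
   does not cover the [RFC5285] framing or the [RFC6904] encryption.

```python
def audio_level_octet(level_dbov: float, voice_activity: bool) -> int:
    """Encode an audio level (0 dBov loudest, -127 dBov quietest) in one octet."""
    # The wire value is the magnitude of the (negative) dBov level,
    # clamped to the 7-bit range.
    magnitude = min(127, max(0, round(-level_dbov)))
    return (0x80 if voice_activity else 0x00) | magnitude


print(audio_level_octet(-30, True))
print(audio_level_octet(-200, False))
```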

771	5.2.3.  Mixer-to-Client Audio Level

773	   The Mixer to Client Audio Level header extension [RFC6465] provides
774	   an endpoint with the audio level of the different sources mixed into
775	   a common mix by an RTP mixer.  This enables a user interface to
776	   indicate the relative activity level of each session participant,
777	   rather than just being included or not based on the CSRC field.  This
778	   is a pure optimisation of non-critical functions, and is hence
779	   OPTIONAL to implement.  If this header extension is implemented, it
780	   is REQUIRED that implementations are capable of encrypting the header
781	   extension according to [RFC6904] since the information contained in
782	   these header extensions can be considered sensitive.  It is further
783	   RECOMMENDED that this encryption is used, unless the encryption has
784	   been explicitly disabled through API or signalling.

786	6.  WebRTC Use of RTP: Improving Transport Robustness

788	   There are tools that can make RTP packet streams robust against
789	   packet loss and reduce the impact of loss on media quality.  However,
790	   they all add overhead compared to a non-robust stream.  The overhead
791	   needs to be considered, and the aggregate bit-rate MUST be rate
792	   controlled to avoid causing network congestion (see Section 7).  As a
793	   result, improving robustness might require a lower base encoding
794	   quality, but has the potential to deliver that quality with fewer
795	   errors.  The mechanisms described in the following sub-sections can
796	   be used to improve tolerance to packet loss.

798	6.1.  Negative Acknowledgements and RTP Retransmission

800	   As a consequence of supporting the RTP/SAVPF profile, implementations
801	   can send negative acknowledgements (NACKs) for RTP data packets
802	   [RFC4585].  This feedback can be used to inform a sender of the loss
803	   of particular RTP packets, subject to the capacity limitations of the
804	   RTCP feedback channel.  A sender can use this information to optimise
805	   the user experience by adapting the media encoding to compensate for
806	   known lost packets.

808	   RTP packet stream Senders are REQUIRED to understand the Generic NACK
809	   message defined in Section 6.2.1 of [RFC4585], but MAY choose to
810	   ignore some or all of this feedback (following Section 4.2 of
811	   [RFC4585]).  Receivers MAY send NACKs for missing RTP packets.
812	   Guidelines on when to send NACKs are provided in [RFC4585].  It is
813	   not expected that a receiver will send a NACK for every lost RTP
814	   packet; rather, it needs to consider the cost of sending NACK
815	   feedback, and the importance of the lost packet, to make an informed
816	   decision on whether it is worth telling the sender about a packet
817	   loss event.
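
   The following sketch shows how a receiver that has decided which
   losses are worth reporting might pack them into Generic NACK FCI
   entries.  It assumes the format in Section 6.2.1 of [RFC4585]: a
   16-bit packet ID (PID) followed by a 16-bit bitmask (BLP) whose
   least significant bit refers to PID+1.  The function name is
   hypothetical, and the simple sort means losses that straddle a
   sequence number wrap-around are reported correctly but not in the
   fewest possible entries.

```python
import struct


def generic_nack_fci(lost):
    """Pack lost RTP sequence numbers into Generic NACK (PID, BLP) pairs."""
    remaining = sorted(set(seq & 0xFFFF for seq in lost))
    out = b""
    while remaining:
        pid = remaining[0]
        blp = 0
        rest = []
        for seq in remaining[1:]:
            offset = (seq - pid) & 0xFFFF
            if 1 <= offset <= 16:
                blp |= 1 << (offset - 1)  # bit i covers packet PID+i
            else:
                rest.append(seq)          # needs a further FCI entry
        out += struct.pack("!HH", pid, blp)
        remaining = rest
    return out


print(generic_nack_fci([100, 101, 103, 116, 117]).hex())
```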

819	   The RTP Retransmission Payload Format [RFC4588] offers the ability to
820	   retransmit lost packets based on NACK feedback.  Retransmission needs
821	   to be used with care in interactive real-time applications to ensure
822	   that the retransmitted packet arrives in time to be useful, but can
823	   be effective in environments with relatively low network RTT (an RTP
824	   sender can estimate the RTT to the receivers using the information in
825	   RTCP SR and RR packets, as described at the end of Section 6.4.1 of
826	   [RFC3550]).  The use of retransmissions can also increase the forward
827	   RTP bandwidth, and can potentially cause increased packet loss if
828	   the original packet loss was caused by network congestion.  We note,
829	   however, that retransmission of an important lost packet to repair
830	   decoder state can have lower cost than sending a full intra frame.
831	   It is not appropriate to blindly retransmit RTP packets in response
832	   to a NACK.  The importance of lost packets and the likelihood of them
833	   arriving in time to be useful needs to be considered before RTP
834	   retransmission is used.
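
   The RTT estimate mentioned above can be sketched as follows, using
   the calculation from Section 6.4.1 of [RFC3550]: the arrival time
   of the report block (middle 32 bits of the NTP timestamp) minus the
   LSR and DLSR fields, all in units of 1/65536 seconds.  The second
   function is a hypothetical heuristic added for illustration; this
   memo does not define how the usefulness of a retransmission is to
   be judged.

```python
def rtt_seconds(arrival_ntp_mid32: int, lsr: int, dlsr: int):
    """Estimate the round-trip time from one RTCP report block.

    Returns None when LSR is zero, i.e. the reporter has not yet
    received an SR from this sender.
    """
    if lsr == 0:
        return None
    delta = (arrival_ntp_mid32 - lsr - dlsr) & 0xFFFFFFFF
    return delta / 65536.0


def retransmission_might_help(rtt: float, playout_budget: float) -> bool:
    """Hypothetical check: a repair takes roughly one RTT to arrive."""
    return rtt < playout_budget


# Values taken from the worked example in Section 6.4.1 of [RFC3550].
print(rtt_seconds(0xB7108000, 0xB7052000, 0x00054000))
```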

836	   Receivers are REQUIRED to implement support for RTP retransmission
837	   packets [RFC4588].  Senders MAY send RTP retransmission packets in
838	   response to NACKs if the RTP retransmission payload format has been
839	   negotiated for the session, and if the sender believes it is useful
840	   to send a retransmission of the packet(s) referenced in the NACK.  An
841	   RTP sender does not need to retransmit every NACKed packet.

843	6.2.  Forward Error Correction (FEC)

845	   The use of Forward Error Correction (FEC) can provide an effective
846	   protection against some degree of packet loss, at the cost of steady
847	   bandwidth overhead.  There are several FEC schemes that are defined
848	   for use with RTP.  Some of these schemes are specific to a particular
849	   RTP payload format, others operate across RTP packets and can be used
850	   with any payload format.  It needs to be noted that using redundant
851	   encoding or FEC will lead to increased play out delay, which needs to
852	   be considered when choosing the redundancy or FEC formats and their
853	   respective parameters.

855	   If an RTP payload format negotiated for use in an RTCPeerConnection
856	   supports redundant transmission or FEC as a standard feature of that
857	   payload format, then that support MAY be used in the
858	   RTCPeerConnection, subject to any appropriate signalling.

860	   There are several block-based FEC schemes that are designed for use
861	   with RTP independent of the chosen RTP payload format.  At the time
862	   of this writing there is no consensus on which, if any, of these FEC
863	   schemes is appropriate for use in the WebRTC context.  Accordingly,
864	   this memo makes no recommendation on the choice of block-based FEC
865	   for WebRTC use.

867	7.  WebRTC Use of RTP: Rate Control and Media Adaptation

869	   WebRTC will be used in heterogeneous network environments using a
870	   varied set of link technologies, including both wired and wireless
871	   links, to interconnect potentially large groups of users around the
872	   world.  As a result, the network paths between users can have widely
873	   varying one-way delays, available bit-rates, load levels, and traffic
874	   mixtures.  Individual end-points can send one or more RTP packet
875	   streams to each participant in a WebRTC conference, and there can be
876	   several participants.  Each of these RTP packet streams can contain
877	   different types of media, and the type of media, bit rate, and number
878	   of RTP packet streams as well as transport-layer flows can be highly
879	   asymmetric.  Non-RTP traffic can share the network paths with RTP
880	   transport-layer flows.  Since the network environment is not
881	   predictable or stable, WebRTC end-points MUST ensure that the RTP
882	   traffic they generate can adapt to match changes in the available
883	   network capacity.

885	   The quality of experience for users of WebRTC implementations is very
886	   dependent on effective adaptation of the media to the limitations of
887	   the network.  End-points have to be designed so they do not transmit
888	   significantly more data than the network path can support, except for
889	   very short time periods, otherwise high levels of network packet loss
890	   or delay spikes will occur, causing media quality degradation.  The
891	   limiting factor on the capacity of the network path might be the link
892	   bandwidth, or it might be competition with other traffic on the link
893	   (this can be non-WebRTC traffic, traffic due to other WebRTC flows,
894	   or even competition with other WebRTC flows in the same session).

896	   An effective media congestion control algorithm is therefore an
897	   essential part of the WebRTC framework.  However, at the time of this
898	   writing, there is no standard congestion control algorithm that can
899	   be used for interactive media applications such as WebRTC's flows.
900	   Some requirements for congestion control algorithms for
901	   RTCPeerConnections are discussed in [I-D.ietf-rmcat-cc-requirements].
902	   It is expected that a future version of this memo will mandate the
903	   use of a congestion control algorithm that satisfies these
904	   requirements.

906	7.1.  Boundary Conditions and Circuit Breakers

908	   In the absence of a concrete congestion control algorithm, all WebRTC
909	   implementations MUST implement the RTP circuit breaker algorithm that
910	   is described in [I-D.ietf-avtcore-rtp-circuit-breakers].  The RTP
911	   circuit breaker is designed to enable applications to recognise and
912	   react to situations of extreme network congestion.  However, since
913	   the RTP circuit breaker might not be triggered until congestion
914	   becomes extreme, it cannot be considered a substitute for congestion
915	   control, and applications MUST also implement congestion control to
916	   allow them to adapt to changes in network capacity.  Any future RTP
917	   congestion control algorithms are expected to operate within the
918	   envelope allowed by the circuit breaker.

920	   The session establishment signalling will also necessarily establish
921	   boundaries to which the media bit-rate will conform.  The choice of
922	   media codecs provides upper- and lower-bounds on the supported bit-
923	   rates that the application can utilise to provide useful quality, and
924	   the packetization choices that exist.  In addition, the signalling
925	   channel can establish maximum media bit-rate boundaries using the SDP
926	   "b=AS:" or "b=CT:" lines, and the RTP/AVPF Temporary Maximum Media
927	   Stream Bit Rate (TMMBR) Requests (see Section 5.1.6 of this memo).
928	   The combination of media codec choice and signalled bandwidth limits
929	   SHOULD be used to limit traffic based on known bandwidth limitations,
930	   for example the capacity of the edge links, to the extent possible.
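
   One way to combine these boundaries is sketched below.  The
   function and parameter names are hypothetical.  The sketch assumes
   that the "b=AS:" value is given in kilobits per second, and for
   simplicity it ignores the differences in how each limit accounts
   for packet header overhead.

```python
def media_bitrate_bounds(codec_min_bps, codec_max_bps,
                         b_as_kbps=None, tmmbr_bps=None):
    """Return (floor, ceiling) for the media bit rate, in bits per second."""
    ceiling = codec_max_bps
    if b_as_kbps is not None:
        ceiling = min(ceiling, b_as_kbps * 1000)  # signalled session limit
    if tmmbr_bps is not None:
        ceiling = min(ceiling, tmmbr_bps)         # most recent TMMBR limit
    # If ceiling < codec_min_bps, the codec cannot provide useful
    # quality within the limits and the caller has to react.
    return codec_min_bps, ceiling


print(media_bitrate_bounds(30000, 2000000, b_as_kbps=1500, tmmbr_bps=800000))
```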

932	7.2.  RTCP Limitations for Congestion Control

934	   Experience with the congestion control algorithms of TCP [RFC5681],
935	   TFRC [RFC5348], and DCCP [RFC4341], [RFC4342], [RFC4828], has shown
936	   that feedback on packet arrivals needs to be sent frequently (roughly
937	   once per round trip time is common).  We note that the real-time
938	   media traffic might not be able to adapt to changing path conditions
939	   as rapidly as elastic applications using TCP, but frequent feedback,
940	   perhaps on the order of once per video frame, is still needed to
941	   allow the congestion control algorithm to track the path dynamics.

943	   As an example of the type of RTCP congestion control feedback that is
944	   possible, consider one of the simplest scenarios for WebRTC: a point
945	   to point video call between two end systems.  There will be four RTP
946	   flows in this scenario, two audio and two video, with all four flows
947	   being active for essentially all the time (the audio flows will
948	   likely use voice activity detection and comfort noise to reduce the
949	   packet rate during silent periods, but this doesn't cause them to
950	   stop).  Assume all four flows are sent in a single RTP session, each
951	   using a separate SSRC.  Further, assume each SSRC sends RTCP reports
952	   for all other SSRCs in the session (i.e., the optimisations in
953	   [I-D.ietf-avtcore-rtp-multi-stream-optimisation] are not used, giving
954	   the worst case for the RTCP overhead).  When all members are senders
955	   like this, the RTCP timing rules in Sections 6.2 and 6.3 of [RFC3550]
956	   and [RFC4585] reduce to:

958	               rtcp_interval = avg_rtcp_size * n / rtcp_bw

960	   where avg_rtcp_size is measured in octets, and the rtcp_bw is the
961	   bandwidth available for RTCP.  The average RTCP size will depend on
962	   the amount of feedback that is sent in each RTCP packet, on the
963	   number of members in the session, and on the size of source
964	   description (RTCP SDES) information sent.  As a baseline, each RTCP
965	   packet will be a compound RTCP packet that contains an RTCP SR and an
966	   RTCP SDES packet.  In the scenario above, each RTCP SR packet will
967	   contain three report blocks, one for each of the other RTP SSRCs
968	   sending data, for a total of 100 octets (this is 8 octets header, 20
969	   octets sender info, and 3 * 24 octets report blocks).  The RTCP SDES
970	   packet will comprise a header (4 octets), an originating SSRC (4
971	   octets), a CNAME chunk, and padding.  If the CNAME follows [RFC7022],
972	   it will be 19 octets in size, and require 1 octet of padding.
973	   The resulting compound RTCP packet will be 128 octets in size.  If
974	   sent in UDP/IPv4 with no IP options and using Secure RTP, which adds
975	   20 (IPv4) + 8 (UDP) + 14 (SRTP with 80 bit Authentication tag), the
976	   avg_rtcp_size will therefore be 170 octets, including the header
977	   overhead.  The value n in this scenario is 4, and the rtcp_bw is
978	   assumed to be 5% of the session bandwidth.

980	   If it is desired to send RTCP feedback packets on average 30 times
981	   per second, to correspond to one RTCP report every frame for 30fps
982	   video, we can invert the above rtcp_interval calculation to get an
983	   rtcp_bw that gives an interval of 1/30th of a second or lower.  This
984	   corresponds to an rtcp_bw of 20400 octets per second (since 1/30 =
985	   170 * 4 / 20400).  This is 163200 bits per second, which if 5% of the
986	   session bandwidth, gives a session bandwidth of approximately 3.3Mbps
987	   (i.e., 3.3Mbps media rate, plus an additional 5% for RTCP, to give a
988	   total data rate of approximately 3.4Mbps).  That is, RTCP can report
989	   on every frame of video provided the session bandwidth is 3.3Mbps or
990	   larger, when every SSRC sends a report for every video frame.  Please
991	   note that the actual RTCP transmission intervals will be within the
992	   interval [0.0135, 0.0406]s, but maintaining an average RTCP
993	   transmission interval of 0.033s.
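
   The arithmetic in this example can be reproduced with the short
   sketch below.  The function names are illustrative; the packet
   sizes are the ones given above (a 19-octet CNAME chunk with 1 octet
   of padding, and IPv4, UDP, and SRTP overhead of 20 + 8 + 14
   octets), and every member is assumed to be a sender.

```python
IP_UDP_SRTP_OVERHEAD = 20 + 8 + 14  # IPv4 + UDP + SRTP (80-bit auth tag)


def compound_rtcp_size(n: int) -> int:
    """Octets in a compound SR + SDES packet in a session of n senders."""
    sr = 8 + 20 + 24 * (n - 1)  # header, sender info, one block per other SSRC
    sdes = 4 + 4 + 19 + 1       # header, SSRC, CNAME chunk, padding
    return sr + sdes


def rtcp_bw_octets(n: int, reports_per_second: int) -> int:
    """Invert rtcp_interval = avg_rtcp_size * n / rtcp_bw for rtcp_bw."""
    avg_rtcp_size = compound_rtcp_size(n) + IP_UDP_SRTP_OVERHEAD
    return avg_rtcp_size * n * reports_per_second


def session_bandwidth_bps(n: int, reports_per_second: int,
                          rtcp_percent: int = 5) -> float:
    """Session bandwidth for which rtcp_percent of it covers the RTCP rate."""
    return rtcp_bw_octets(n, reports_per_second) * 8 * 100 / rtcp_percent


print(compound_rtcp_size(4))         # octets before header overhead
print(rtcp_bw_octets(4, 30))         # octets per second
print(session_bandwidth_bps(4, 30))  # bits per second
```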

995	      Note: To achieve the RTCP transmission intervals above, the RTP/
996	      SAVPF profile with T_rr_interval=0 is used, since even when using
997	      the reduced minimal transmission interval, the RTP/SAVP profile
998	      would only allow sending RTCP at most every 0.11s (every third
999	      frame of video).  Using RTP/SAVPF with T_rr_interval=0, however,
1000	      is capable of fully utilizing the configured 5% RTCP bandwidth
1001	      fraction.

1003	   If additional feedback beyond the standard report block is needed,
1004	   the session bandwidth needed will increase.  For example, with an
1005	   additional 20 octets data being reported in each RTCP packet, the
1006	   session bandwidth needed increases to 3.5Mbps for every SSRC to be
1007	   able to report on every frame.  However, the above baseline might not
1008	   be the most appropriate usage of the RTCP bandwidth.  Depending on
1009	   needs, a less frequent usage of regular RTCP compound packets,
1010	   controlled by T_rr_interval combined with using the reduced size RTCP
1011	   packets, can achieve more frequent and useful reporting.  Also the
1012	   reporting requirements defined in
1013	   [I-D.ietf-avtcore-rtp-multi-stream-optimisation] will reduce the
1014	   amount of bandwidth consumed for reporting when each endpoint has
1015	   multiple SSRCs.

1017	   Calculations such as these show that RTCP cannot be used to send per-
1018	   packet congestion feedback.  RTCP can, however, be used to send
1019	   congestion feedback on each frame of video sent in an interactive
1020	   video conferencing scenario, provided the RTCP parameters are
1021	   correctly configured and the overall session bandwidth exceeds a
1022	   couple of megabits per second (the exact rate depending on the number
1023	   of session participants, the RTCP bandwidth fraction, and whether
1024	   audio and video are sent in one or two RTP sessions).  Using similar
1025	   calculations, it can be shown that RTCP can likely also be used to
1026	   send feedback on a per-RTT basis, provided the RTT is not too low.

1028	   Interactive communication might not be able to afford to wait for
1029	   packet losses to occur to indicate congestion, because an increase in
1030	   play out delay due to queuing (most prominent in wireless networks)
1031	   can easily lead to packets being dropped due to late arrival at the
1032	   receiver.  Therefore, more sophisticated cues might need to be
1033	   reported -- to be defined in a suitable congestion control framework
1034	   as noted above -- which, in turn, increase the report size again.
1035	   For example, different RTCP XR report blocks (jointly) provide the
1036	   necessary details to implement a variety of congestion control
1037	   algorithms, but the (compound) report size grows quickly.

1039	7.3.  Congestion Control Interoperability and Legacy Systems

1041	   There are legacy RTP implementations that do not implement RTCP, and
1042	   hence do not provide any congestion feedback.  Congestion control
1043	   cannot be performed with these end-points.  WebRTC implementations
1044	   that need to interwork with such end-points MUST limit their
1045	   transmission to a low rate, equivalent to a VoIP call using a low
1046	   bandwidth codec, that is unlikely to cause any significant
1047	   congestion.

1049	   When interworking with legacy implementations that support RTCP using
1050	   the RTP/AVP profile [RFC3551], congestion feedback is provided in
1051	   RTCP RR packets every few seconds.  Implementations that have to
1052	   interwork with such end-points MUST ensure that they keep within the
1053	   RTP circuit breaker [I-D.ietf-avtcore-rtp-circuit-breakers]
1054	   constraints to limit the congestion they can cause.

1056	   If a legacy end-point supports RTP/AVPF, this enables negotiation of
1057	   important parameters for frequent reporting, such as the "trr-int"
1058	   parameter, and the possibility that the end-point supports some
1059	   useful feedback format for congestion control purposes, such as TMMBR
1060	   [RFC5104].  Implementations that have to interwork with such end-
1061	   points MUST ensure that they stay within the RTP circuit breaker
1062	   [I-D.ietf-avtcore-rtp-circuit-breakers] constraints to limit the
1063	   congestion they can cause, but might find that they can achieve
1064	   better congestion response depending on the amount of feedback that
1065	   is available.

1067	   With proprietary congestion control algorithms, issues can arise when
1068	   different algorithms and implementations interact in a communication
1069	   session.  If the different implementations have made different
1070	   choices with regard to the type of adaptation, for example one sender
1071	   based, and one receiver based, then one could end up in a situation
1072	   where one direction is dual controlled, while the other direction is
1073	   not controlled.  This memo cannot mandate behaviour for proprietary
1074	   congestion control algorithms, but implementations that use such
1075	   algorithms ought to be aware of this issue, and try to ensure that
1076	   effective congestion control is negotiated for media flowing in
1077	   both directions.  If the IETF were to standardise both sender- and
1078	   receiver-based congestion control algorithms for WebRTC traffic in
1079	   the future, the issues of interoperability, control, and ensuring
1080	   that both directions of media flow are congestion controlled would
1081	   also need to be considered.

1083	8.  WebRTC Use of RTP: Performance Monitoring

1085	   As described in Section 4.1, implementations are REQUIRED to generate
1086	   RTCP Sender Report (SR) and Reception Report (RR) packets relating to
1087	   the RTP packet streams they send and receive.  These RTCP reports can
1088	   be used for performance monitoring purposes, since they include basic
1089	   packet loss and jitter statistics.

1091	   A large number of additional performance metrics are supported by the
1092	   RTCP Extended Reports (XR) framework [RFC3611][RFC6792].  At the time
1093	   of this writing, it is not clear what extended metrics are suitable
1094	   for use in the WebRTC context, so there is no requirement that
1095	   implementations generate RTCP XR packets.  However, implementations
1096	   that can use detailed performance monitoring data MAY generate RTCP
1097	   XR packets as appropriate; the use of such packets SHOULD be
1098	   signalled in advance.

1100	   All WebRTC implementations MUST be prepared to receive RTCP XR
1101	   report packets, whether or not they were signalled.  There is no
1102	   requirement that the data contained in such reports be used, or
1103	   exposed to the JavaScript application, however.

1105	9.  WebRTC Use of RTP: Future Extensions

1107	   It is possible that the core set of RTP protocols and RTP extensions
1108	   specified in this memo will prove insufficient for the future needs
1109	   of WebRTC applications.  In this case, future updates to this memo
1110	   MUST be made following the Guidelines for Writers of RTP Payload
1111	   Format Specifications [RFC2736], How to Write an RTP Payload Format
1112	   [I-D.ietf-payload-rtp-howto] and Guidelines for Extending the RTP
1113	   Control Protocol [RFC5968], and SHOULD take into account any future
1114	   guidelines for extending RTP and related protocols that have been
1115	   developed.

1117	   Authors of future extensions are urged to consider the wide range of
1118	   environments in which RTP is used when recommending extensions, since
1119	   extensions that are applicable in some scenarios can be problematic
1120	   in others.  Where possible, the WebRTC framework will adopt RTP
1121	   extensions that are of general utility, to enable easy implementation
1122	   of a gateway to other applications using RTP, rather than adopt
1123	   mechanisms that are narrowly targeted at specific WebRTC use cases.

1125	10.  Signalling Considerations

1127	   RTP is built with the assumption that an external signalling channel
1128	   exists, and can be used to configure RTP sessions and their features.
1129	   The basic configuration of an RTP session consists of the following
1130	   parameters:

1132	   RTP Profile:  The name of the RTP profile to be used in the session.
1133	      The RTP/AVP [RFC3551] and RTP/AVPF [RFC4585] profiles can
1134	      interoperate on a basic level, as can their secure variants
1135	      RTP/SAVP [RFC3711] and RTP/SAVPF [RFC5124].  The secure variants
1136	      of the profiles do not directly interoperate with the non-secure
1137	      variants, due to the presence of additional header fields for
1138	      authentication in SRTP packets and cryptographic transformation
1139	      of the payload.  WebRTC requires the use of the RTP/SAVPF
1140	      profile, and this MUST be signalled if SDP is used.  Interworking
1141	      functions might transform this into the RTP/SAVP profile for a
1142	      legacy use case, by indicating to the WebRTC end-point that
1143	      RTP/SAVPF is used, and limiting the usage of the "a=rtcp-fb:"
1144	      attribute to indicate a trr-int value of 4 seconds.

1146	   Transport Information:  Source and destination IP address(es) and
1147	      ports for RTP and RTCP MUST be signalled for each RTP session.  In
1148	      WebRTC these transport addresses will be provided by ICE that
1149	      signals candidates and arrives at nominated candidate address
1150	      pairs.  If RTP and RTCP multiplexing [RFC5761] is to be used, such
1151	      that a single port, i.e. transport-layer flow, is used for RTP and
1152	      RTCP flows, this MUST be signalled (see Section 4.5).

1154	   RTP Payload Types, media formats, and format parameters:  The mapping
1155	      between media type names (and hence the RTP payload formats to be
1156	      used), and the RTP payload type numbers MUST be signalled.  Each
1157	      media type MAY also have a number of media type parameters that
1158	      MUST also be signalled to configure the codec and RTP payload
1159	      format (the "a=fmtp:" line from SDP).  Section 4.3 of this memo
1160	      discusses requirements for uniqueness of payload types.

1162	   RTP Extensions:  The RTP extensions to be used SHOULD be agreed upon,
1163	      including any parameters for each respective extension.  At the
1164	      very least, this will help avoid using bandwidth for features
1165	      that the other end-point will ignore.  For certain mechanisms,
1166	      however, such agreement is required, since interoperability
1167	      failure otherwise results.

1169	   RTCP Bandwidth:  Support for exchanging RTCP Bandwidth values to the
1170	      end-points will be necessary.  This SHALL be done as described in
1171	      "Session Description Protocol (SDP) Bandwidth Modifiers for RTP
1172	      Control Protocol (RTCP) Bandwidth" [RFC3556], or something
1173	      semantically equivalent.  This also ensures that the end-points
1174	      have a common view of the RTCP bandwidth.  This is important, as
1175	      views of the bandwidth that differ too much can lead to failure
1176	      to interoperate.

1178	   These parameters are often expressed in SDP messages conveyed within
1179	   an offer/answer exchange.  RTP does not depend on SDP or on the offer
1180	   /answer model, but does require all the necessary parameters to be
1181	   agreed upon, and provided to the RTP implementation.  We note that in
1182	   the WebRTC context it will depend on the signalling model and API how
1183	   these parameters need to be configured, but they will need to be
1184	   either set in the API or explicitly signalled between the peers.
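
   As an illustration of how the parameters listed above might appear
   when SDP is used, the following sketch pulls them out of a single
   media section.  The SDP text is a made-up example (the payload type,
   codec, format parameter, header extension and bandwidth values are
   all assumptions), and the parser is deliberately minimal rather than
   a complete SDP implementation.

```python
EXAMPLE_MEDIA_SECTION = """\
m=video 9 RTP/SAVPF 96
b=RS:4000
b=RR:4000
a=rtcp-mux
a=rtpmap:96 VP8/90000
a=fmtp:96 max-fs=3600
a=extmap:1 urn:ietf:params:rtp-hdrext:toffset
"""


def summarise_media_section(sdp_text):
    """Collect the RTP configuration items signalled in one m= section."""
    summary = {
        "profile": None, "rtcp_mux": False,
        "rtpmap": {}, "fmtp": {}, "extmap": {}, "rtcp_bw": {},
    }
    for line in sdp_text.splitlines():
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            summary["profile"] = line[2:].split()[2]
        elif line.startswith("b="):
            modifier, _, value = line[2:].partition(":")
            if modifier in ("RS", "RR"):  # RTCP bandwidth modifiers
                summary["rtcp_bw"][modifier] = int(value)
        elif line == "a=rtcp-mux":
            summary["rtcp_mux"] = True
        else:
            for attr in ("rtpmap", "fmtp", "extmap"):
                prefix = "a=" + attr + ":"
                if line.startswith(prefix):
                    key, _, rest = line[len(prefix):].partition(" ")
                    summary[attr][key] = rest
    return summary


summary = summarise_media_section(EXAMPLE_MEDIA_SECTION)
```

   A WebRTC end-point could use such a summary to check, for example,
   that summary["profile"] is "RTP/SAVPF" before accepting the section.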

1186	11.  WebRTC API Considerations

1188	   The WebRTC API [W3C.WD-webrtc-20130910] and the Media Capture and
1189	   Streams API [W3C.WD-mediacapture-streams-20130903] define and use
1190	   the concept of a MediaStream that consists of zero or more
1191	   MediaStreamTracks.  A MediaStreamTrack is an individual stream of
1192	   media from any type of media source like a microphone or a camera,
1193	   but conceptual sources, like an audio mix or a video composition,
1194	   are also possible.  It needs to be possible to play out the
1195	   MediaStreamTracks within a MediaStream in a synchronised manner.

1197	   A MediaStreamTrack's realisation in RTP in the context of an
1198	   RTCPeerConnection consists of a source packet stream, identified by
1199	   an SSRC, within an RTP session that is part of the RTCPeerConnection.
1200	   The MediaStreamTrack can also result in additional packet streams,
1201	   and thus SSRCs, in the same RTP session.  These can be dependent
1202	   packet streams from scalable encoding of the source stream
1203	   associated with the MediaStreamTrack, if such a media encoder is
1204	   used.  They can also be redundancy packet streams; these are created
1205	   when applying Forward Error Correction (Section 6.2) or RTP
1206	   retransmission (Section 6.1) to the source packet stream.

1208	   It is important to note that the same media source can be feeding
1209	   multiple MediaStreamTracks.  As different sets of constraints or
1210	   other parameters can be applied to the MediaStreamTrack, each
1211	   MediaStreamTrack instance added to an RTCPeerConnection SHALL result
1212	   in an independent source packet stream, with its own set of
1213	   associated packet streams, and thus different SSRC(s).  It will
1214	   depend on applied constraints and parameters if the source stream and
1215	   the encoding configuration will be identical between different
1216	   MediaStreamTracks sharing the same media source.  Thus it is possible
1217	   for multiple source packet streams to share encoded streams (but not
1218	   packet streams), but this is an implementation choice to try to
1219	   utilise such optimisations.  Note that such optimizations would need
1220	   to take into account that the constraints for one of the
1221	   MediaStreamTracks can at any moment change, meaning that the encoding
1222	   configurations might no longer be identical.

1224	   The same MediaStreamTrack can also be included in multiple
1225	   MediaStreams; thus, multiple sets of MediaStreams can implicitly need
1226	   to use the same synchronisation base.  To ensure that this works in
1227	   all cases, and does not force an end-point to change synchronisation
1228	   base and CNAME in the middle of an ongoing delivery of any packet
1229	   streams, which would cause media disruption, all MediaStreamTracks
1230	   and their associated SSRCs originating from the same end-point need
1231	   to be sent using the same CNAME within one RTCPeerConnection.  This
1232	   motivates the strong recommendation in Section 4.9 to only use a
1233	   single CNAME.

1235	      The requirement on using the same CNAME for all SSRCs that
1236	      originate from the same end-point does not require middleboxes
1237	      that forward traffic from multiple end-points to only use a
1238	      single CNAME.

1240	   Different CNAMEs normally need to be used for different
1241	   RTCPeerConnection instances, as specified in Section 4.9.  Having two
1242	   communication sessions with the same CNAME could enable tracking of a
1243	   user or device across different services (see Section 4.4.1 of
1244	   [I-D.ietf-rtcweb-security] for details).  A web application can
1245	   request that the CNAMEs used in different RTCPeerConnections within a
1246	   same-origin context be the same; this allows for synchronization of
1247	   the endpoint's RTP packet streams across the different
1248	   RTCPeerConnections.

1250	      Note: this doesn't result in a tracking issue, since the creation
1251	      of matching CNAMEs depends on existing tracking.
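
   The CNAME rules above can be illustrated with a small sketch: one
   CNAME for all SSRCs within a connection, a fresh CNAME for each new
   connection, and an explicit opt-in for sharing.  The class, the
   shared_cname parameter, and the choice of 96 random bits encoded in
   base64 are assumptions made for illustration; they are not taken
   from this memo or from the WebRTC API.

```python
import base64
import secrets


def new_cname():
    """Generate a random CNAME (assumed here: 96 random bits, base64)."""
    return base64.b64encode(secrets.token_bytes(12)).decode("ascii")


class ConnectionSketch:
    """Hypothetical stand-in for the RTP side of one RTCPeerConnection."""

    def __init__(self, shared_cname=None):
        # A different CNAME per connection unless the application asks
        # for a shared one (for example, within a same-origin context).
        self.cname = shared_cname if shared_cname is not None else new_cname()
        self.ssrc_to_cname = {}

    def add_stream(self, ssrc):
        # Every SSRC sent on this connection uses the same CNAME.
        self.ssrc_to_cname[ssrc] = self.cname
        return self.cname


pc1 = ConnectionSketch()
pc2 = ConnectionSketch()
pc3 = ConnectionSketch(shared_cname=pc1.cname)
```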

1253	   The above will currently force a WebRTC end-point that receives a
1254	   MediaStreamTrack on one RTCPeerConnection and adds it as an outgoing
1255	   track on any RTCPeerConnection to perform resynchronisation of the
1256	   stream.  This is because the sending party needs to change the CNAME,
1257	   which implies that it has to use a locally available system clock as
1258	   the timebase for the synchronisation.  Thus, the relation between
1259	   the timebase of the incoming stream and that of the sending system
1260	   needs to be defined.  This relation also needs monitoring for clock
1261	   drift, and the synchronisation is likely to need adjustment.  The
1262	   sending entity is also responsible for congestion control for the
1263	   streams it sends.  In cases of packet loss, the loss of incoming data
1264	   also needs to be handled.  This leads to the observation that the
1265	   method that is least likely to cause issues or interruptions in the
1266	   outgoing source packet stream is a model of full decoding, including
1267	   repair, etc., followed by encoding of the media again into the
1268	   outgoing packet stream.  Optimisations of this method are clearly
1269	   possible and implementation specific.

1271	   A WebRTC end-point MUST support receiving multiple MediaStreamTracks,
1272	   where each of the different MediaStreamTracks (and their sets of
1273	   associated packet streams) uses different CNAMEs.  However,
1274	   MediaStreamTracks that are received with different CNAMEs have no
1275	   defined synchronisation.

1277	      Note: The motivation for supporting reception of multiple CNAMEs
1278	      is to allow for forward compatibility with any future changes
1279	      that enable more efficient stream handling when end-points relay/
1280	      forward streams.  It also ensures that end-points can interoperate
1281	      with certain types of multi-stream middleboxes or end-points that
1282	      are not WebRTC.

1284	   The binding between the WebRTC MediaStreams, MediaStreamTracks and
1285	   the SSRC is done as specified in "Cross Session Stream Identification
1286	   in the Session Description Protocol" [I-D.ietf-mmusic-msid].  This
1287	   document [I-D.ietf-mmusic-msid] also defines, in section 4.1, how to
1288	   map unknown source packet stream SSRCs to MediaStreamTracks and
1289	   MediaStreams.  Commonly the RTP Payload Type of any incoming packets
1290	   will reveal if the packet stream is a source stream or a redundancy
1291	   or dependent packet stream.  The association to the correct source
1292	   packet stream depends on the payload format in use for the packet
1293	   stream.

1295	   Finally, this specification puts a requirement on the WebRTC API to
1296	   realize a method for determining the CSRC list (Section 4.1) as well
1297	   as the Mixer-to-Client audio levels (Section 5.2.3) (when supported);
1298	   the basic requirements for this are further discussed in
1299	   Section 12.2.1.

1301	12.  RTP Implementation Considerations

1303	   The following discussion provides some guidance on the implementation
1304	   of the RTP features described in this memo.  The focus is on a WebRTC
1305	   end-point implementation perspective, and while some mention is made
1306	   of the behaviour of middleboxes, that is not the focus of this memo.

1308	12.1.  Configuration and Use of RTP Sessions

1310	   A WebRTC end-point will be a simultaneous participant in one or more
1311	   RTP sessions.  Each RTP session can convey multiple media sources,
1312	   and can include media data from multiple end-points.  In the
1313	   following, we outline some ways in which WebRTC end-points can
1314	   configure and use RTP sessions.

1316	12.1.1.  Use of Multiple Media Sources Within an RTP Session

1318	   RTP is a group communication protocol, and every RTP session can
1319	   potentially contain multiple RTP packet streams.  There are several
1320	   reasons why this might be desirable:

1322	   Multiple media types:  Outside of WebRTC, it is common to use one RTP
1323	      session for each type of media source (e.g., one RTP session for
1324	      audio sources and one for video sources, each sent over different
1325	      transport layer flows).  However, to reduce the number of UDP
1326	      ports used, the default in WebRTC is to send all types of media in
1327	      a single RTP session, as described in Section 4.4, using RTP and
1328	      RTCP multiplexing (Section 4.5) to further reduce the number of
1329	      UDP ports needed.  This RTP session then uses only one bi-
1330	      directional transport-layer flow, but will contain multiple RTP
1331	      packet streams, each containing a different type of media.  A
1332	      common example might be an end-point with a camera and microphone
1333	      that sends two RTP packet streams, one video and one audio, into a
1334	      single RTP session.

1336	   Multiple Capture Devices:  A WebRTC end-point might have multiple
1337	      cameras, microphones, or other media capture devices, and so might
1338	      want to generate several RTP packet streams of the same media
1339	      type.  Alternatively, it might want to send media from a single
1340	      capture device in several different formats or quality settings at
1341	      once.  Both can result in a single end-point sending multiple RTP
1342	      packet streams of the same media type into a single RTP session at
1343	      the same time.

1345	   Associated Repair Data:  An end-point might send an RTP packet stream
1346	      that is somehow associated with another stream.  For example, it
1347	      might send an RTP packet stream that contains FEC or
1348	      retransmission data relating to another stream.  Some RTP payload
1349	      formats send this sort of associated repair data as part of the
1350	      source packet stream, while others send it as a separate packet
1351	      stream.

1353	   Layered or Multiple Description Coding:  An end-point can use a
1354	      layered media codec, for example H.264 SVC, or a multiple
1355	      description codec, that generates multiple RTP packet streams,
1356	      each with a distinct RTP SSRC, within a single RTP session.

1358	   RTP Mixers, Translators, and Other Middleboxes:  An RTP session, in
1359	      the WebRTC context, is a point-to-point association between an
1360	      end-point and some other peer device, where those devices share a
1361	      common SSRC space.  The peer device might be another WebRTC end-
1362	      point, or it might be an RTP mixer, translator, or some other form
1363	      of media processing middlebox.  In the latter cases, the middlebox
1364	      might send mixed or relayed RTP streams from several participants,
1365	      that the WebRTC end-point will need to render.  Thus, even though
1366	      a WebRTC end-point might only be a member of a single RTP session,
1367	      the peer device might be extending that RTP session to incorporate
1368	      other end-points.  WebRTC is a group communication environment and
1369	      end-points need to be capable of receiving, decoding, and playing
1370	      out multiple RTP packet streams at once, even in a single RTP
1371	      session.

1373	12.1.2.  Use of Multiple RTP Sessions

1375	   In addition to sending and receiving multiple RTP packet streams
1376	   within a single RTP session, a WebRTC end-point might participate in
1377	   multiple RTP sessions.  There are several reasons why a WebRTC end-
1378	   point might choose to do this:

1380	   To interoperate with legacy devices:  The common practice in the non-
1381	      WebRTC world is to send different types of media in separate RTP
1382	      sessions, for example using one RTP session for audio and another
1383	      RTP session, on a separate transport layer flow, for video.  All
1384	      WebRTC end-points need to support the option of sending different
1385	      types of media on different RTP sessions, so they can interwork
1386	      with such legacy devices.  This is discussed further in
1387	      Section 4.4.

1389	   To provide enhanced quality of service:  Some network-based quality
1390	      of service mechanisms operate on the granularity of transport
1391	      layer flows.  If it is desired to use these mechanisms to provide
1392	      differentiated quality of service for some RTP packet streams,
1393	      then those RTP packet streams need to be sent in a separate RTP
1394	      session using a different transport-layer flow, and with
1395	      appropriate quality of service marking.  This is discussed further
1396	      in Section 12.1.3.

1398	   To separate media with different purposes:  An end-point might want
1399	      to send RTP packet streams that have different purposes on
1400	      different RTP sessions, to make it easy for the peer device to
1401	      distinguish them.  For example, some centralised multiparty
1402	      conferencing systems display the active speaker in high
1403	      resolution, but show low resolution "thumbnails" of other
1404	      participants.  Such systems might configure the end-points to send
1405	      simulcast high- and low-resolution versions of their video using
1406	      separate RTP sessions, to simplify the operation of the RTP
1407	      middlebox.  In the WebRTC context, this can currently be
1408	      accomplished by establishing multiple WebRTC MediaStreamTracks
1409	      that have the same media source in one (or more)
1410	      RTCPeerConnections.  Each MediaStreamTrack is then configured to
1411	      deliver a particular media quality and thus media bit-rate, and
1412	      will produce an independently encoded version with the codec
1413	      parameters agreed specifically in the context of that
1414	      RTCPeerConnection.  The RTP middlebox can distinguish packets
1415	      corresponding to the low- and high-resolution streams by
1416	      inspecting their SSRC, RTP payload type, or some other information
1417	      contained in RTP payload, RTP header extension or RTCP packets,
1418	      but it can be easier to distinguish the RTP packet streams if they
1419	      arrive on separate RTP sessions on separate transport-layer flows.

1421	   To directly connect with multiple peers:  A multi-party conference
1422	      does not need to use an RTP middlebox.  Rather, a multi-unicast
1423	      mesh can be created, comprising several distinct RTP sessions,
1424	      with each participant sending RTP traffic over a separate RTP
1425	      session (that is, using an independent RTCPeerConnection object)
1426	      to every other participant, as shown in Figure 1.  This topology
1427	      has the benefit of not requiring an RTP middlebox node that is
1428	      trusted to access and manipulate the media data.  The downside is
1429	      that it increases the used bandwidth at each sender by requiring
1430	      one copy of the RTP packet streams for each participant that is
1431	      part of the same session, other than the sender itself.

1433	   +---+     +---+
1434	   | A |<--->| B |
1435	   +---+     +---+
1436	     ^         ^
1437	      \       /
1438	       \     /
1439	        v   v
1440	        +---+
1441	        | C |
1442	        +---+

1444	            Figure 1: Multi-unicast using several RTP sessions

1446	      The multi-unicast topology could also be implemented as a single
1447	      RTP session, spanning multiple peer-to-peer transport layer
1448	      connections, or as several pairwise RTP sessions, one between each
1449	      pair of peers.  To maintain a coherent mapping between RTP
1450	      sessions and RTCPeerConnection objects, we recommend that this
1451	      is implemented as several individual RTP sessions.  The only
1452	      downside is that end-point A will not learn of the quality of
1453	      any transmission happening between B and C, since it will not
1454	      see RTCP reports for the RTP session between B and C, whereas
1455	      it would if all three participants were part of a single RTP
1456	      session.  Experience with the Mbone tools (experimental RTP-
1457	      based multicast conferencing tools from the late 1990s) has shown
1458	      that RTCP reception quality reports for third parties can usefully
1459	      be presented to the users in a way that helps them understand
1460	      asymmetric network problems, and the approach of using separate
1461	      RTP sessions prevents this.  However, an advantage of using
1462	      separate RTP sessions is that it enables using different media
1463	      bit-rates and RTP session configurations between the different
1464	      peers, thus not forcing B to endure the same quality reductions as
1465	      C will if there are limitations in the transport from A to C.  It
1466	      is believed that these advantages outweigh the limitations in
1467	      debugging power.
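
      The bandwidth cost of the multi-unicast mesh can be estimated
      with a short sketch.  It assumes, for simplicity, that each
      sender uses the same bit-rate towards every peer (the text above
      notes that separate RTP sessions allow these to differ); the
      function name and result keys are hypothetical.

```python
def mesh_cost(participants, stream_bitrate_bps):
    """Estimate session count and upstream load for a full mesh."""
    peers = participants - 1
    return {
        # One pairwise RTP session between each pair of participants.
        "rtp_sessions_total": participants * peers // 2,
        # Each end-point takes part in one session per other participant.
        "rtp_sessions_per_endpoint": peers,
        # One copy of the outgoing streams is sent to every other peer.
        "upstream_bps_per_sender": peers * stream_bitrate_bps,
    }


figure_1 = mesh_cost(3, 1_000_000)  # A, B and C sending 1 Mbps each
```

      For the three participants of Figure 1 this gives three RTP
      sessions in total, two per end-point, and 2 Mbps of upstream
      traffic per sender.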

1469	   To indirectly connect with multiple peers:  A common scenario in
1470	      multi-party conferencing is to create indirect connections to
1471	      multiple peers, using an RTP mixer, translator, or some other type
1472	      of RTP middlebox.  Figure 2 outlines a simple topology that might
1473	      be used in a four-person centralised conference.  The middlebox
1474	      acts to optimise the transmission of RTP packet streams from
1475	      certain perspectives, either by only sending some of the received
1476	      RTP packet stream to any given receiver, or by providing a
1477	      combined RTP packet stream out of a set of contributing streams.

1479	   +---+      +-------------+      +---+
1480	   | A |<---->|             |<---->| B |
1481	   +---+      | RTP mixer,  |      +---+
1482	              | translator, |
1483	              | or other    |
1484	   +---+      | middlebox   |      +---+
1485	   | C |<---->|             |<---->| D |
1486	   +---+      +-------------+      +---+

1488	                Figure 2: RTP mixer with only unicast paths

1490	      There are various methods of implementation for the middlebox.  If
1491	      implemented as a standard RTP mixer or translator, a single RTP
1492	      session will extend across the middlebox and encompass all the
1493	      end-points in one multi-party session.  Other types of middlebox
1494	      might use separate RTP sessions between each end-point and the
1495	      middlebox.  A common aspect is that these RTP middleboxes can use
1496	      a number of tools to control the media encoding provided by a
1497	      WebRTC end-point.  This includes functions like requesting that
1498	      the encoding chain be broken and that the encoder produce a so-
1499	      called Intra frame.  Another is limiting the bit-rate of a given
1500	      stream to better suit the mixer view of the multiple down-streams.
1501	      Others are controlling the most suitable frame-rate, picture
1502	      resolution, and the trade-off between frame-rate and spatial
1503	      quality.  The middlebox has the significant responsibility to
1504	      correctly perform congestion control, identify sources, and manage
1505	      synchronisation while providing the application with suitable
1506	      media optimizations.  The middlebox also has to be a trusted
1507	      node when it comes to security, since it manipulates either the
1508	      RTP header or the media itself (or both) received from one end-
1509	      point, before sending it on towards the end-point(s); thus it
1510	      needs to be able to decrypt and then encrypt it before sending it
1511	      out.

1513	      RTP Mixers can create a situation where an end-point experiences a
1514	      situation in-between a session with only two end-points and
1515	      multiple RTP sessions.  Mixers are expected to not forward RTCP
1516	      reports regarding RTP packet streams across themselves.  This is
1517	      due to the difference in the RTP packet streams provided to the
1518	      different end-points.  The original media source lacks information
1519	      about a mixer's manipulations prior to sending it to the different
1520	      receivers.  This scenario also means that an end-point's
1521	      feedback or requests go to the mixer.  When the mixer can't act
1522	      on this by itself, it is forced to go to the original media source
1523	      to fulfil the receiver's request.  This will not necessarily be
1524	      explicitly visible in any RTP and RTCP traffic, but the interactions
1525	      and the time to complete them will indicate such dependencies.

1527	      Providing source authentication in multi-party scenarios is a
1528	      challenge.  In the mixer-based topologies, an end-point's source
1529	      authentication is based on, firstly, verifying that media comes
1530	      from the mixer by cryptographic verification and, secondly, trust
1531	      in the mixer to correctly identify any source towards the end-
1532	      point.  In RTP sessions where multiple end-points are directly
1533	      visible to an end-point, all end-points will have knowledge about
1534	      each other's master keys, and can thus inject packets claimed to
1535	      come from another end-point in the session.  Any node performing
1536	      relay can perform non-cryptographic mitigation by preventing
1537	      forwarding of packets that have SSRC fields that came from other
1538	      end-points before.  For cryptographic verification of the source
1539	      SRTP would require additional security mechanisms, for example
1540	      TESLA for SRTP [RFC4383], that are not part of the base WebRTC
1541	      standards.
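
      The non-cryptographic mitigation described above can be sketched
      as follows.  This is a minimal illustration, not a definitive
      implementation: the class name, the method name, and the use of an
      opaque end-point identifier (which in practice would have to be
      tied to an authenticated association) are all assumptions made
      for this example.

```python
from typing import Dict, Hashable


class SsrcOwnershipFilter:
    """Remembers which end-point first used each SSRC at a relay.

    Hypothetical helper: endpoint_id stands in for whatever
    authenticated identity the relay associates with a peer.
    """

    def __init__(self) -> None:
        # Maps SSRC -> identifier of the end-point that first sent with it.
        self._owner: Dict[int, Hashable] = {}

    def may_forward(self, ssrc: int, endpoint_id: Hashable) -> bool:
        """Return True if a packet carrying this SSRC may be forwarded."""
        # The first sender of an SSRC becomes its owner; later packets
        # with the same SSRC are only accepted from that owner.
        owner = self._owner.setdefault(ssrc, endpoint_id)
        return owner == endpoint_id


relay_filter = SsrcOwnershipFilter()
first = relay_filter.may_forward(0x1234ABCD, "alice")      # alice claims it
spoofed = relay_filter.may_forward(0x1234ABCD, "mallory")  # refused
repeat = relay_filter.may_forward(0x1234ABCD, "alice")     # still allowed
```

      A real relay would also need to expire entries when an end-point
      leaves or changes its SSRC; that is omitted here.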

1543	   To forward media between multiple peers:  It is sometimes desirable
1544	      for an end-point that receives an RTP packet stream to be able to
1545	      forward that RTP packet stream to a third party.  There are some
1546	      obvious security and privacy implications in supporting this, but
1547	      also potential uses.  This is supported in the W3C API by taking
1548	      the received and decoded media and using it as a media source that
1549	      is re-encoded and transmitted as a new stream.

1551	      At the RTP layer, media forwarding acts as a back-to-back RTP
1552	      receiver and RTP sender.  The receiving side terminates the RTP
1553	      session and decodes the media, while the sender side re-encodes
1554	      and transmits the media using an entirely separate RTP session.
1555	      The original sender will only see a single receiver of the media,
1556	      and will not be able to tell that forwarding is happening based on
1557	      RTP-layer information since the RTP session that is used to send
1558	      the forwarded media is not connected to the RTP session on which
1559	      the media was received by the node doing the forwarding.

1561	      The end-point that is performing the forwarding is responsible for
1562	      producing an RTP packet stream suitable for onwards transmission.
1563	      The outgoing RTP session that is used to send the forwarded media
1564	      is entirely separate to the RTP session on which the media was
1565	      received.  This will require media transcoding for congestion
1566	      control purposes, to produce a suitable bit-rate for the outgoing
1567	      RTP session, reducing media quality and forcing the forwarding
1568	      end-point to spend resources on the transcoding.  The media
1569	      transcoding does result in a separation of the two different legs
1570	      removing almost all dependencies, and allowing the forwarding end-
1571	      point to optimize its media transcoding operation.  The cost is
1572	      greatly increased computational complexity on the forwarding node.
1573	      Receivers of the forwarded stream will see the forwarding device
1574	      as the sender of the stream, and will not be able to tell from the
1575	      RTP layer that they are receiving a forwarded stream rather than
1576	      an entirely new RTP packet stream generated by the forwarding
1577	      device.

1579	12.1.3.  Differentiated Treatment of RTP Packet Streams

1581	   There are use cases for differentiated treatment of RTP packet
1582	   streams.  Such differentiation can happen at several places in the
1583	   system.  The first is prioritization within the end-point sending
1584	   the media, which controls both which RTP packet streams will be
1585	   sent and how the aggregate bit-rate currently made available by
1586	   congestion control is allocated among them.

1588	   It is expected that the WebRTC API [W3C.WD-webrtc-20130910] will
1589	   allow the application to indicate relative priorities for different
1590	   MediaStreamTracks.  These priorities can then be used to influence
1591	   the local RTP processing, especially when it comes to congestion
1592	   control response in how to divide the available bandwidth between the
1593	   RTP packet streams.  Any changes in relative priority will also need
1594	   to be considered for RTP packet streams that are associated with the
1595	   main RTP packet streams, such as redundant streams for RTP
1596	   retransmission and FEC.  The importance of such redundant RTP packet
1597	   streams is dependent on the media type and codec used, in regards to
1598	   how robust that codec is to packet loss.  However, a default policy
1599	   might be to use the same priority for a redundant RTP packet stream
1600	   as for the source RTP packet stream.
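
   As a rough sketch of the ideas above, the following shows a redundant
   stream inheriting the priority of its source stream, and a simple
   proportional split of the available bit-rate between source streams.
   The priority labels, the weights, and both function names are
   assumptions chosen for illustration; this memo does not define them,
   and a real congestion controller would be considerably more involved.

```python
from typing import Dict

# Hypothetical weights for application-level relative priorities.
PRIORITY_WEIGHT: Dict[str, int] = {"low": 1, "medium": 2, "high": 4}


def resolve_priorities(source_priority: Dict[str, str],
                       redundancy_for: Dict[str, str]) -> Dict[str, str]:
    """Give each redundant stream the priority of its source stream.

    redundancy_for maps a redundant stream (e.g. retransmission or FEC)
    to the source stream it protects.  An explicitly configured
    priority for a redundant stream is left untouched.
    """
    resolved = dict(source_priority)
    for redundant, source in redundancy_for.items():
        resolved.setdefault(redundant, source_priority[source])
    return resolved


def split_bitrate(total_bps: int,
                  priorities: Dict[str, str]) -> Dict[str, int]:
    """Divide the available bit-rate in proportion to priority weight."""
    total_weight = sum(PRIORITY_WEIGHT[p] for p in priorities.values())
    return {stream: total_bps * PRIORITY_WEIGHT[p] // total_weight
            for stream, p in priorities.items()}


source_priority = {"audio": "high", "video": "medium"}
resolved = resolve_priorities(source_priority, {"video-rtx": "video"})
shares = split_bitrate(900_000, source_priority)
```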

1602	   Secondly, the network can prioritize transport-layer flows and sub-
1603	   flows, including RTP packet streams.  Typically, differential
1604	   treatment includes two steps: first, identifying whether an IP
1605	   packet belongs to a class that has to be treated differently, and
1606	   second, applying the actual mechanism to prioritize packets.
1607	   Identification is done using one of three methods:

1609	   DiffServ:  The end-point marks a packet with a DiffServ code point to
1610	      indicate to the network that the packet belongs to a particular
1611	      class.

1613	   Flow based:  Packets that need to be given a particular treatment are
1614	      identified using a combination of IP address and port.

1616	   Deep Packet Inspection:  A network classifier (DPI) inspects the
1617	      packet and tries to determine if the packet represents a
1618	      particular application and type that is to be prioritized.

1620	   Flow-based differentiation will provide the same treatment to all
1621	   packets within a transport-layer flow, i.e., relative prioritization
1622	   is not possible.  Moreover, if the resources are limited it might not
1623	   be possible to provide differential treatment compared to best-effort
1624	   for all the RTP packet streams in a WebRTC application.  When flow-
1625	   based differentiation is available, the WebRTC application needs to
1626	   know about it so that it can separate the RTP packet streams onto
1627	   different UDP flows, enabling more granular use of flow-based
1628	   differentiation.  That way, audio and video can at least be given
1629	   different prioritization if the application so desires.

1631	   DiffServ assumes that either the end-point or a classifier can mark
1632	   the packets with an appropriate DSCP so that the packets are treated
1633	   according to that marking.  If the end-point is to mark the traffic,
1634	   two requirements arise in the WebRTC context: 1) The WebRTC
1635	   application or browser has to know which DSCPs to use and that it
1636	   can use them on some set of RTP packet streams. 2) The information
1637	   needs to be propagated to the operating system when transmitting
1638	   the packet.  Details of this process are outside the scope of this
1639	   memo and are further discussed in "DSCP and other packet markings
1640	   for RTCWeb QoS" [I-D.ietf-tsvwg-rtcweb-qos].
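
   The second requirement, propagating the marking to the operating
   system, might look roughly like the sketch below for an IPv4 UDP
   socket.  The helper names are hypothetical, the IP_TOS socket option
   is not available or honoured on every platform, and the choice of
   Expedited Forwarding is only an example; which DSCP to use is outside
   the scope of this memo.

```python
import socket


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP in the upper six bits of the IPv4 TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2


def mark_socket(sock: socket.socket, dscp: int) -> None:
    """Ask the operating system to mark packets sent on this socket.

    Assumes an IPv4 socket on a platform that exposes IP_TOS.
    """
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))


EF = 46  # Expedited Forwarding code point, used here purely as an example
tos = dscp_to_tos(EF)
```

   A caller would create its UDP socket as usual and then invoke
   mark_socket(sock, EF) before sending; whether the network honours the
   marking is a separate matter.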

1642	   For packet-based marking schemes it might be possible to mark
1643	   individual RTP packets differently based on the relative priority of
1644	   the RTP payload.  For example, video codecs that have I, P, and B
1645	   pictures could give lower priority to payloads carrying only B
1646	   frames, as these are less damaging to lose.  However, depending on
1647	   the QoS mechanism and which markings are applied, this can result
1648	   in not only different packet drop probabilities but also packet
1649	   reordering; see [I-D.ietf-tsvwg-rtcweb-qos] for further discussion.
1650	   As a default policy, all RTP packets related to an RTP packet stream
1651	   ought to be provided with the same prioritization; per-packet
1652	   prioritization is outside the scope of this memo, but might be
1653	   specified elsewhere in future.

1655	   It is also important to consider how RTCP packets associated with a
1656	   particular RTP packet stream need to be marked.  RTCP compound
1657	   packets with Sender Reports (SR) ought to be marked with the same
1658	   priority as the RTP packet stream itself, so the RTCP-based round-
1659	   trip time (RTT) measurements are done using the same transport-layer
1660	   flow priority as the RTP packet stream experiences.  RTCP compound
1661	   packets containing RR packets ought to be sent with the priority used
1662	   by the majority of the RTP packet streams reported on.  RTCP packets
1663	   containing time-critical feedback packets can use higher priority to
1664	   improve the timeliness and likelihood of delivery of such feedback.
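
   The marking rules for SR and RR compound packets above could be
   expressed along the following lines.  The function names and the
   string priority labels are assumptions for illustration only, and the
   handling of ties between equally common priorities is left to the
   implementation.

```python
from collections import Counter
from typing import Iterable


def sr_marking(stream_priority: str) -> str:
    """A compound packet with an SR uses the priority of its own stream."""
    return stream_priority


def rr_marking(reported_priorities: Iterable[str]) -> str:
    """A compound packet with RRs uses the priority of the majority of
    the RTP packet streams it reports on."""
    counts = Counter(reported_priorities)
    if not counts:
        raise ValueError("no reported streams")
    # most_common(1) yields the single most frequent priority label.
    return counts.most_common(1)[0][0]


sr_choice = sr_marking("high")
rr_choice = rr_marking(["high", "medium", "medium"])
```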

1666	12.2.  Media Source, RTP Packet Streams, and Participant Identification

1667	12.2.1.  Media Source

1669	   Each RTP packet stream is identified by a unique synchronisation
1670	   source (SSRC) identifier.  The SSRC identifier is carried in each of
1671	   the RTP packets comprising an RTP packet stream, and is also used to
1672	   identify that stream in the corresponding RTCP reports.  The SSRC is
1673	   chosen as discussed in Section 4.8.  The first stage in
1674	   demultiplexing RTP and RTCP packets received on a single transport
1675	   layer flow at a WebRTC end-point is to separate the RTP packet
1676	   streams based on their SSRC value; once that is done, additional
1677	   demultiplexing steps can determine how and where to render the media.
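
   A first demultiplexing stage of this kind might be sketched as below.
   It assumes RTP and RTCP share one transport-layer flow and are told
   apart by the second header byte falling in the range 192-223, a
   common heuristic when payload types 64-95 are avoided; the function
   name and that classification rule are assumptions of this example,
   not requirements stated in this section.  For RTCP, only the SSRC of
   the sender of the first packet in the compound is extracted.

```python
import struct
from collections import defaultdict
from typing import DefaultDict, List, Tuple


def classify(packet: bytes) -> Tuple[str, int]:
    """Return ("rtp" or "rtcp", ssrc) for a packet on a shared flow."""
    if len(packet) < 8 or packet[0] >> 6 != 2:
        raise ValueError("not an RTP/RTCP version 2 packet")
    if 192 <= packet[1] <= 223:
        # RTCP: the sender SSRC follows the 4-byte common header.
        (ssrc,) = struct.unpack_from("!I", packet, 4)
        return "rtcp", ssrc
    if len(packet) < 12:
        raise ValueError("truncated RTP header")
    # RTP: the SSRC follows the sequence number and timestamp.
    (ssrc,) = struct.unpack_from("!I", packet, 8)
    return "rtp", ssrc


# Hand-built example packets: an RTP packet with payload type 96 and
# the first 8 bytes of an RTCP Sender Report (packet type 200).
rtp_packet = struct.pack("!BBHII", 0x80, 96, 1, 0, 0xDEADBEEF)
rtcp_packet = struct.pack("!BBHI", 0x80, 200, 6, 0xDEADBEEF)

streams: DefaultDict[int, List[str]] = defaultdict(list)
for pkt in (rtp_packet, rtcp_packet):
    kind, ssrc = classify(pkt)
    streams[ssrc].append(kind)
```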

1679	   RTP allows a mixer, or other RTP-layer middlebox, to combine encoded
1680	   streams from multiple media sources to form a new encoded stream from
1681	   a new media source (the mixer).  The RTP packets in that new RTP
1682	   packet stream can include a Contributing Source (CSRC) list,
1683	   indicating which original SSRCs contributed to the combined source
1684	   stream.  As described in Section 4.1, implementations need to support
1685	   reception of RTP data packets containing a CSRC list and RTCP packets
1686	   that relate to sources present in the CSRC list.  The CSRC list can
1687	   change on a packet-by-packet basis, depending on the mixing operation
1688	   being performed.  Knowledge of what media sources contributed to a
1689	   particular RTP packet can be important if the user interface
1690	   indicates which participants are active in the session.  Changes in
1691	   the CSRC list included in packets need to be exposed to the WebRTC
1692	   application using some API, if the application is to be able to track
1693	   changes in session participation.  It is desirable to map CSRC values
1694	   back into WebRTC MediaStream identities as they cross this API, to
1695	   avoid exposing the SSRC/CSRC name space to JavaScript applications.

1697	   If the mixer-to-client audio level extension [RFC6465] is being used
1698	   in the session (see Section 5.2.3), the information in the CSRC list
1699	   is augmented by audio level information for each contributing source.
1700	   This information can usefully be exposed in the user interface.

1702	12.2.2.  SSRC Collision Detection

1704	   The RTP standard [RFC3550] requires any RTP implementation to have
1705	   support for detecting and handling SSRC collisions, i.e., resolve the
1706	   conflict when two different end-points use the same SSRC value.  This
1707	   requirement also applies to WebRTC end-points.  There are several
1708	   scenarios where SSRC collisions can occur:

1710	   o  In a point-to-point session where each SSRC is associated with
1711	      either of the two end-points and where the main media carrying
1712	      SSRC identifier will be announced in the signalling channel, a
1713	      collision is less likely to occur due to the information about
1714	      used SSRCs provided by Source-Specific SDP Attributes [RFC5576].

1716	      Still, collisions can occur if both end-points start using a new
1717	      SSRC identifier prior to having signalled it to the peer and
1718	      received acknowledgement of the signalling message.  The Source-
1719	      Specific SDP Attributes [RFC5576] contain no mechanism to resolve
1720	      SSRC collisions or reject an end-point's usage of an SSRC.

1722	   o  SSRC values that have not been signalled could also appear in an
1723	      RTP session.  This is more likely than it appears, since some RTP
1724	      functions use extra SSRCs to provide their functionality.  For
1725	      example, retransmission data might be transmitted using a separate
1726	      RTP packet stream that requires its own SSRC, separate to the SSRC
1727	      of the source RTP packet stream [RFC4588].  In those cases, an
1728	      end-point can create a new SSRC that strictly doesn't need to be
1729	      announced over the signalling channel to function correctly on
1730	      both RTP and RTCPeerConnection level.

1732	   o  Multiple end-points in a multiparty conference can create new
1733	      sources and signal those towards the RTP middlebox.  In cases
1734	      where the SSRC/CSRC are propagated between the different end-
1735	      points from the RTP middlebox, collisions can occur.

1737	   o  An RTP middlebox could connect an end-point's RTCPeerConnection to
1738	      another RTCPeerConnection from the same end-point, thus forming a
1739	      loop where the end-point will receive its own traffic.  While this
1740	      is clearly considered a bug, it is important that the end-point is
1741	      able to recognise and handle the case when it occurs.  This case
1742	      becomes even more problematic when media mixers, and so on, are
1743	      involved, where the stream received is a different stream but
1744	      still contains this client's input.

1746	   These SSRC/CSRC collisions can only be handled at the RTP level as
1747	   long as the same RTP session is extended across multiple
1748	   RTCPeerConnections by an RTP middlebox.  To resolve the more generic
1749	   case where multiple RTCPeerConnections are interconnected, the
1750	   identification of the media source(s) that are part of a
1751	   MediaStreamTrack propagated across those RTCPeerConnections needs to
1752	   be preserved across the interconnections.
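
   A much simplified sketch of RTP-level collision detection is given
   below.  It only covers two cases: a received packet using the local
   SSRC, and a known remote SSRC arriving from a different source.  The
   class, its method names, the returned labels, and the use of a
   source address as the distinguishing key are assumptions of this
   example; the full algorithm in [RFC3550], including loop detection
   and sending an RTCP BYE for the abandoned SSRC, is not reproduced.

```python
import secrets
from typing import Dict, Hashable


class SsrcTable:
    """Tracks the local SSRC and the remote SSRCs seen in a session."""

    def __init__(self) -> None:
        self.local_ssrc: int = secrets.randbits(32)
        self.remote: Dict[int, Hashable] = {}

    def on_packet(self, ssrc: int, source: Hashable) -> str:
        """Classify a received packet; pick a new local SSRC if needed."""
        if ssrc == self.local_ssrc:
            # Someone else (or a loop) is using our SSRC: abandon it.
            # The caller is expected to send an RTCP BYE for the old one.
            self.local_ssrc = self._new_ssrc()
            return "local-collision"
        known = self.remote.setdefault(ssrc, source)
        if known != source:
            return "remote-collision"
        return "ok"

    def _new_ssrc(self) -> int:
        """Choose a random SSRC not currently known to be in use."""
        while True:
            candidate = secrets.randbits(32)
            if candidate != self.local_ssrc and candidate not in self.remote:
                return candidate


table = SsrcTable()
old_ssrc = table.local_ssrc
other_ssrc = (old_ssrc + 1) % 2**32  # guaranteed to differ from old_ssrc
seen = table.on_packet(other_ssrc, ("192.0.2.1", 5004))
clash = table.on_packet(other_ssrc, ("192.0.2.2", 5004))
own = table.on_packet(old_ssrc, ("192.0.2.3", 5004))
```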

1754	12.2.3.  Media Synchronisation Context

1756	   When an end-point sends media from more than one media source, it
1757	   needs to consider if (and which of) these media sources are to be
1758	   synchronized.  In RTP/RTCP, synchronisation is provided by having a
1759	   set of RTP packet streams be indicated as coming from the same
1760	   synchronisation context and logical end-point by using the same RTCP
1761	   CNAME identifier.

1763	   The next provision is that the internal clocks of all media sources,
1764	   i.e., what drives the RTP timestamp, can be correlated to a system
1765	   clock that is provided in RTCP Sender Reports encoded in an NTP
1766	   format.  By correlating all RTP timestamps to a common system clock
1767	   for all sources, the timing relation of the different RTP packet
1768	   streams, also across multiple RTP sessions, can be derived at the
1769	   receiver and, if desired, the streams can be synchronized.  The
1770	   requirement is for the media sender to provide the correlation
1771	   information; it is up to the receiver to use it or not.
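
   The correlation described above can be sketched as follows: each
   RTP timestamp is mapped to the sender's system clock using the
   (NTP time, RTP timestamp) pair from the most recent Sender Report and
   the stream's clock rate.  The data class, the function name, and the
   representation of the NTP timestamp as floating-point seconds are
   simplifying assumptions of this example.

```python
from dataclasses import dataclass


@dataclass
class SenderReport:
    ntp_seconds: float   # NTP timestamp from the SR, as seconds
    rtp_timestamp: int   # RTP timestamp carried in the same SR


def to_wallclock(rtp_timestamp: int, sr: SenderReport,
                 clock_rate: int) -> float:
    """Map an RTP timestamp to the sender's system clock."""
    # Interpret the difference as a signed 32-bit value so that RTP
    # timestamp wrap-around is handled.
    delta = (rtp_timestamp - sr.rtp_timestamp) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return sr.ntp_seconds + delta / clock_rate


# Two streams sharing a CNAME: audio at 48 kHz and video at 90 kHz.
audio_sr = SenderReport(ntp_seconds=1000.0, rtp_timestamp=160_000)
video_sr = SenderReport(ntp_seconds=1000.5, rtp_timestamp=900_000)

audio_time = to_wallclock(208_000, audio_sr, 48_000)
video_time = to_wallclock(945_000, video_sr, 90_000)
skew = video_time - audio_time  # offset the receiver may compensate for
```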

1773	13.  Security Considerations

1775	   The overall security architecture for WebRTC is described in
1776	   [I-D.ietf-rtcweb-security-arch], and security considerations for the
1777	   WebRTC framework are described in [I-D.ietf-rtcweb-security].  These
1778	   considerations also apply to this memo.

1780	   The security considerations of the RTP specification, the RTP/SAVPF
1781	   profile, and the various RTP/RTCP extensions and RTP payload formats
1782	   that form the complete protocol suite described in this memo apply.
1783	   We do not believe there are any new security considerations resulting
1784	   from the combination of these various protocol extensions.

1786	   The Extended Secure RTP Profile for Real-time Transport Control
1787	   Protocol (RTCP)-Based Feedback [RFC5124] (RTP/SAVPF) provides
1788	   handling of fundamental issues by offering confidentiality, integrity
1789	   and partial source authentication.  A mandatory-to-implement media
1790	   security solution is created by combining this secured RTP profile and
1791	   DTLS-SRTP keying [RFC5764] as defined by Section 5.5 of
1792	   [I-D.ietf-rtcweb-security-arch].

1794	   RTCP packets convey a Canonical Name (CNAME) identifier that is used
1795	   to associate RTP packet streams that need to be synchronised across
1796	   related RTP sessions.  Inappropriate choice of CNAME values can be a
1797	   privacy concern, since long-term persistent CNAME identifiers can be
1798	   used to track users across multiple WebRTC calls.  Section 4.9 of
1799	   this memo provides guidelines for generation of untraceable CNAME
1800	   values that alleviate this risk.
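
   One way to obtain a CNAME that is hard to trace across calls is to
   generate a fresh random value per session.  The sketch below assumes
   96 bits of cryptographically strong randomness encoded in base64, in
   the spirit of the guidelines in [RFC7022]; the function name is
   hypothetical and the exact requirements are those of Section 4.9,
   not of this example.

```python
import base64
import secrets


def generate_cname() -> str:
    """Return a fresh random CNAME: 96 random bits, base64 encoded."""
    return base64.b64encode(secrets.token_bytes(12)).decode("ascii")


cname_call_1 = generate_cname()
cname_call_2 = generate_cname()
```

   Because 12 bytes encode to exactly 16 base64 characters, the result
   carries no padding and no information about the user or device.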

1802	   The guidelines in [RFC6562] apply when using variable bit rate (VBR)
1803	   audio codecs such as Opus (see Section 4.3 for discussion of mandated
1804	   audio codecs).  The guidelines in [RFC6562] also apply, but are of
1805	   lesser importance, when using the client-to-mixer audio level header
1806	   extensions (Section 5.2.2) or the mixer-to-client audio level header
1807	   extensions (Section 5.2.3).  The use of encryption of the header
1808	   extensions is RECOMMENDED, unless there are known reasons, like RTP
1809	   middleboxes or third party monitoring that will greatly benefit from
1810	   the information, and this has been expressed using API or signalling.

1812	   If further evidence is produced to show that information leakage is
1813	   significant from audio level indications, then use of encryption
1814	   needs to be mandated at that time.

1816	14.  IANA Considerations

1818	   This memo makes no request of IANA.

1820	   Note to RFC Editor: this section is to be removed on publication as
1821	   an RFC.

1823	15.  Acknowledgements

1825	   The authors would like to thank Bernard Aboba, Harald Alvestrand,
1826	   Cary Bran, Charles Eckel, Christian Groves, Cullen Jennings, Dan
1827	   Romascanu, Martin Thomson, and the other members of the IETF RTCWEB
1828	   working group for their valuable feedback.

1830	16.  References

1832	16.1.  Normative References

1834	   [I-D.ietf-avtcore-multi-media-rtp-session]
1835	              Westerlund, M., Perkins, C., and J. Lennox, "Sending
1836	              Multiple Types of Media in a Single RTP Session", draft-
1837	              ietf-avtcore-multi-media-rtp-session-05 (work in
1838	              progress), February 2014.

1840	   [I-D.ietf-avtcore-rtp-circuit-breakers]
1841	              Perkins, C. and V. Singh, "Multimedia Congestion Control:
1842	              Circuit Breakers for Unicast RTP Sessions", draft-ietf-
1843	              avtcore-rtp-circuit-breakers-05 (work in progress),
1844	              February 2014.

1846	   [I-D.ietf-avtcore-rtp-multi-stream-optimisation]
1847	              Lennox, J., Westerlund, M., Wu, W., and C. Perkins,
1848	              "Sending Multiple Media Streams in a Single RTP Session:
1849	              Grouping RTCP Reception Statistics and Other Feedback",
1850	              draft-ietf-avtcore-rtp-multi-stream-optimisation-02 (work
1851	              in progress), February 2014.

1853	   [I-D.ietf-avtcore-rtp-multi-stream]
1854	              Lennox, J., Westerlund, M., Wu, W., and C. Perkins,
1855	              "Sending Multiple Media Streams in a Single RTP Session",
1856	              draft-ietf-avtcore-rtp-multi-stream-03 (work in progress),
1857	              February 2014.

1859	   [I-D.ietf-rtcweb-security-arch]
1860	              Rescorla, E., "WebRTC Security Architecture", draft-ietf-
1861	              rtcweb-security-arch-09 (work in progress), February 2014.

1863	   [I-D.ietf-rtcweb-security]
1864	              Rescorla, E., "Security Considerations for WebRTC", draft-
1865	              ietf-rtcweb-security-06 (work in progress), January 2014.

1867	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1868	              Requirement Levels", BCP 14, RFC 2119, March 1997.

1870	   [RFC2736]  Handley, M. and C. Perkins, "Guidelines for Writers of RTP
1871	              Payload Format Specifications", BCP 36, RFC 2736, December
1872	              1999.

1874	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
1875	              Jacobson, "RTP: A Transport Protocol for Real-Time
1876	              Applications", STD 64, RFC 3550, July 2003.

1878	   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
1879	              Video Conferences with Minimal Control", STD 65, RFC 3551,
1880	              July 2003.

1882	   [RFC3556]  Casner, S., "Session Description Protocol (SDP) Bandwidth
1883	              Modifiers for RTP Control Protocol (RTCP) Bandwidth", RFC
1884	              3556, July 2003.

1886	   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
1887	              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
1888	              RFC 3711, March 2004.

1890	   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
1891	              Description Protocol", RFC 4566, July 2006.

1893	   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
1894	              "Extended RTP Profile for Real-time Transport Control
1895	              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July
1896	              2006.

1898	   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
1899	              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
1900	              July 2006.

1902	   [RFC4961]  Wing, D., "Symmetric RTP / RTP Control Protocol (RTCP)",
1903	              BCP 131, RFC 4961, July 2007.

1905	   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
1906	              "Codec Control Messages in the RTP Audio-Visual Profile
1907	              with Feedback (AVPF)", RFC 5104, February 2008.

1909	   [RFC5124]  Ott, J. and E. Carrara, "Extended Secure RTP Profile for
1910	              Real-time Transport Control Protocol (RTCP)-Based Feedback
1911	              (RTP/SAVPF)", RFC 5124, February 2008.

1913	   [RFC5285]  Singer, D. and H. Desineni, "A General Mechanism for RTP
1914	              Header Extensions", RFC 5285, July 2008.

1916	   [RFC5506]  Johansson, I. and M. Westerlund, "Support for Reduced-Size
1917	              Real-Time Transport Control Protocol (RTCP): Opportunities
1918	              and Consequences", RFC 5506, April 2009.

1920	   [RFC5761]  Perkins, C. and M. Westerlund, "Multiplexing RTP Data and
1921	              Control Packets on a Single Port", RFC 5761, April 2010.

1923	   [RFC5764]  McGrew, D. and E. Rescorla, "Datagram Transport Layer
1924	              Security (DTLS) Extension to Establish Keys for the Secure
1925	              Real-time Transport Protocol (SRTP)", RFC 5764, May 2010.

1927	   [RFC6051]  Perkins, C. and T. Schierl, "Rapid Synchronisation of RTP
1928	              Flows", RFC 6051, November 2010.

1930	   [RFC6464]  Lennox, J., Ivov, E., and E. Marocco, "A Real-time
1931	              Transport Protocol (RTP) Header Extension for Client-to-
1932	              Mixer Audio Level Indication", RFC 6464, December 2011.

1934	   [RFC6465]  Ivov, E., Marocco, E., and J. Lennox, "A Real-time
1935	              Transport Protocol (RTP) Header Extension for Mixer-to-
1936	              Client Audio Level Indication", RFC 6465, December 2011.

1938	   [RFC6562]  Perkins, C. and JM. Valin, "Guidelines for the Use of
1939	              Variable Bit Rate Audio with Secure RTP", RFC 6562, March
1940	              2012.

1942	   [RFC6904]  Lennox, J., "Encryption of Header Extensions in the Secure
1943	              Real-time Transport Protocol (SRTP)", RFC 6904, April
1944	              2013.

1946	   [RFC7007]  Terriberry, T., "Update to Remove DVI4 from the
1947	              Recommended Codecs for the RTP Profile for Audio and Video
1948	              Conferences with Minimal Control (RTP/AVP)", RFC 7007,
1949	              August 2013.

1951	   [RFC7022]  Begen, A., Perkins, C., Wing, D., and E. Rescorla,
1952	              "Guidelines for Choosing RTP Control Protocol (RTCP)
1953	              Canonical Names (CNAMEs)", RFC 7022, September 2013.

1955	   [RFC7160]  Petit-Huguenin, M. and G. Zorn, "Support for Multiple
1956	              Clock Rates in an RTP Session", RFC 7160, April 2014.

1958	   [RFC7164]  Gross, K. and R. Brandenburg, "RTP and Leap Seconds", RFC
1959	              7164, March 2014.

1961	   [W3C.WD-mediacapture-streams-20130903]
1962	              Burnett, D., Bergkvist, A., Jennings, C., and A.
1963	              Narayanan, "Media Capture and Streams", World Wide Web
1964	              Consortium WD WD-mediacapture-streams-20130903, September
1965	              2013, <http://www.w3.org/TR/2013/
1966	              WD-mediacapture-streams-20130903>.

1968	   [W3C.WD-webrtc-20130910]
1969	              Bergkvist, A., Burnett, D., Jennings, C., and A.
1970	              Narayanan, "WebRTC 1.0: Real-time Communication Between
1971	              Browsers", World Wide Web Consortium WD WD-
1972	              webrtc-20130910, September 2013,
1973	              <http://www.w3.org/TR/2013/WD-webrtc-20130910>.

1975	16.2.  Informative References

1977	   [I-D.ietf-avtcore-multiplex-guidelines]
1978	              Westerlund, M., Perkins, C., and H. Alvestrand,
1979	              "Guidelines for using the Multiplexing Features of RTP to
1980	              Support Multiple Media Streams", draft-ietf-avtcore-
1981	              multiplex-guidelines-02 (work in progress), January 2014.

1983	   [I-D.ietf-avtcore-rtp-topologies-update]
1984	              Westerlund, M. and S. Wenger, "RTP Topologies", draft-
1985	              ietf-avtcore-rtp-topologies-update-01 (work in progress),
1986	              October 2013.

1988	   [I-D.ietf-avtext-rtp-grouping-taxonomy]
1989	              Lennox, J., Gross, K., Nandakumar, S., and G. Salgueiro,
1990	              "A Taxonomy of Grouping Semantics and Mechanisms for Real-
1991	              Time Transport Protocol (RTP) Sources", draft-ietf-avtext-
1992	              rtp-grouping-taxonomy-01 (work in progress), February
1993	              2014.

1995	   [I-D.ietf-mmusic-msid]
1996	              Alvestrand, H., "WebRTC MediaStream Identification in the
1997	              Session Description Protocol", draft-ietf-mmusic-msid-05
1998	              (work in progress), March 2014.

2000	   [I-D.ietf-mmusic-sdp-bundle-negotiation]
2001	              Holmberg, C., Alvestrand, H., and C. Jennings,
2002	              "Negotiating Media Multiplexing Using the Session
2003	              Description Protocol (SDP)", draft-ietf-mmusic-sdp-bundle-
2004	              negotiation-07 (work in progress), April 2014.

2006	   [I-D.ietf-payload-rtp-howto]
2007	              Westerlund, M., "How to Write an RTP Payload Format",
2008	              draft-ietf-payload-rtp-howto-13 (work in progress),
2009	              January 2014.

2011	   [I-D.ietf-rmcat-cc-requirements]
2012	              Jesup, R., "Congestion Control Requirements For RMCAT",
2013	              draft-ietf-rmcat-cc-requirements-04 (work in progress),
2014	              April 2014.

2016	   [I-D.ietf-rtcweb-audio]
2017	              Valin, J. and C. Bran, "WebRTC Audio Codec and Processing
2018	              Requirements", draft-ietf-rtcweb-audio-05 (work in
2019	              progress), February 2014.

2021	   [I-D.ietf-rtcweb-overview]
2022	              Alvestrand, H., "Overview: Real Time Protocols for Brower-
2023	              based Applications", draft-ietf-rtcweb-overview-09 (work
2024	              in progress), February 2014.

2026	   [I-D.ietf-rtcweb-use-cases-and-requirements]
2027	              Holmberg, C., Hakansson, S., and G. Eriksson, "Web Real-
2028	              Time Communication Use-cases and Requirements", draft-
2029	              ietf-rtcweb-use-cases-and-requirements-14 (work in
2030	              progress), February 2014.

2032	   [I-D.ietf-tsvwg-rtcweb-qos]
2033	              Dhesikan, S., Druta, D., Jones, P., and J. Polk, "DSCP and
2034	              other packet markings for RTCWeb QoS", draft-ietf-tsvwg-
2035	              rtcweb-qos-00 (work in progress), April 2014.

2037	   [RFC3611]  Friedman, T., Caceres, R., and A. Clark, "RTP Control
2038	              Protocol Extended Reports (RTCP XR)", RFC 3611, November
2039	              2003.

2041	   [RFC4341]  Floyd, S. and E. Kohler, "Profile for Datagram Congestion
2042	              Control Protocol (DCCP) Congestion Control ID 2: TCP-like
2043	              Congestion Control", RFC 4341, March 2006.

2045	   [RFC4342]  Floyd, S., Kohler, E., and J. Padhye, "Profile for
2046	              Datagram Congestion Control Protocol (DCCP) Congestion
2047	              Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342,
2048	              March 2006.

2050	   [RFC4383]  Baugher, M. and E. Carrara, "The Use of Timed Efficient
2051	              Stream Loss-Tolerant Authentication (TESLA) in the Secure
2052	              Real-time Transport Protocol (SRTP)", RFC 4383, February
2053	              2006.

2055	   [RFC4828]  Floyd, S. and E. Kohler, "TCP Friendly Rate Control
2056	              (TFRC): The Small-Packet (SP) Variant", RFC 4828, April
2057	              2007.

2059	   [RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP
2060	              Friendly Rate Control (TFRC): Protocol Specification", RFC
2061	              5348, September 2008.

2063	   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
2064	              Media Attributes in the Session Description Protocol
2065	              (SDP)", RFC 5576, June 2009.

2067	   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
2068	              Control", RFC 5681, September 2009.

2070	   [RFC5968]  Ott, J. and C. Perkins, "Guidelines for Extending the RTP
2071	              Control Protocol (RTCP)", RFC 5968, September 2010.

2073	   [RFC6263]  Marjou, X. and A. Sollaud, "Application Mechanism for
2074	              Keeping Alive the NAT Mappings Associated with RTP / RTP
2075	              Control Protocol (RTCP) Flows", RFC 6263, June 2011.

2077	   [RFC6792]  Wu, Q., Hunt, G., and P. Arden, "Guidelines for Use of the
2078	              RTP Monitoring Framework", RFC 6792, November 2012.

2080	Authors' Addresses

2082	   Colin Perkins
2083	   University of Glasgow
2084	   School of Computing Science
2085	   Glasgow  G12 8QQ
2086	   United Kingdom

2088	   Email: csp@csperkins.org
2089	   URI:   http://csperkins.org/

2090	   Magnus Westerlund
2091	   Ericsson
2092	   Farogatan 6
2093	   SE-164 80 Kista
2094	   Sweden

2096	   Phone: +46 10 714 82 87
2097	   Email: magnus.westerlund@ericsson.com

2099	   Joerg Ott
2100	   Aalto University
2101	   School of Electrical Engineering
2102	   Espoo  02150
2103	   Finland

2105	   Email: jorg.ott@aalto.fi