RTCWEB Working Group                                          C. Perkins
Internet-Draft                                     University of Glasgow
Intended status: Standards Track                           M. Westerlund
Expires: October 25, 2014                                       Ericsson
                                                                  J. Ott
                                                        Aalto University
                                                          April 23, 2014

  Web Real-Time Communication (WebRTC): Media Transport and Use of RTP
                     draft-ietf-rtcweb-rtp-usage-13

Abstract

   The Web Real-Time Communication (WebRTC) framework provides support
   for direct interactive rich communication using audio, video, text,
   collaboration, games, etc. between two peers' web-browsers. This
   memo describes the media transport aspects of the WebRTC framework.
   It specifies how the Real-time Transport Protocol (RTP) is used in
   the WebRTC context, and gives requirements for which RTP features,
   profiles, and extensions need to be supported.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 25, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Rationale
   3. Terminology
   4. WebRTC Use of RTP: Core Protocols
     4.1. RTP and RTCP
     4.2. Choice of the RTP Profile
     4.3. Choice of RTP Payload Formats
     4.4. Use of RTP Sessions
     4.5. RTP and RTCP Multiplexing
     4.6. Reduced Size RTCP
     4.7. Symmetric RTP/RTCP
     4.8. Choice of RTP Synchronisation Source (SSRC)
     4.9. Generation of the RTCP Canonical Name (CNAME)
     4.10. Handling of Leap Seconds
   5. WebRTC Use of RTP: Extensions
     5.1. Conferencing Extensions and Topologies
       5.1.1. Full Intra Request (FIR)
       5.1.2. Picture Loss Indication (PLI)
       5.1.3. Slice Loss Indication (SLI)
       5.1.4. Reference Picture Selection Indication (RPSI)
       5.1.5. Temporal-Spatial Trade-off Request (TSTR)
       5.1.6. Temporary Maximum Media Stream Bit Rate Request (TMMBR)
     5.2. Header Extensions
       5.2.1. Rapid Synchronisation
       5.2.2. Client-to-Mixer Audio Level
       5.2.3. Mixer-to-Client Audio Level
   6. WebRTC Use of RTP: Improving Transport Robustness
     6.1. Negative Acknowledgements and RTP Retransmission
     6.2. Forward Error Correction (FEC)
   7. WebRTC Use of RTP: Rate Control and Media Adaptation
     7.1. Boundary Conditions and Circuit Breakers
     7.2. RTCP Limitations for Congestion Control
     7.3. Congestion Control Interoperability and Legacy Systems
   8. WebRTC Use of RTP: Performance Monitoring
   9. WebRTC Use of RTP: Future Extensions
   10. Signalling Considerations
   11. WebRTC API Considerations
   12. RTP Implementation Considerations
     12.1. Configuration and Use of RTP Sessions
       12.1.1. Use of Multiple Media Sources Within an RTP Session
       12.1.2. Use of Multiple RTP Sessions
       12.1.3. Differentiated Treatment of RTP Packet Streams
     12.2. Media Source, RTP Packet Streams, and Participant
           Identification
       12.2.1. Media Source
       12.2.2. SSRC Collision Detection
       12.2.3. Media Synchronisation Context
   13. Security Considerations
   14. IANA Considerations
   15. Acknowledgements
   16. References
     16.1. Normative References
     16.2. Informative References
   Authors' Addresses

1. Introduction

   The Real-time Transport Protocol (RTP) [RFC3550] provides a framework
   for delivery of audio and video teleconferencing data and other
   real-time media applications. Previous work has defined the RTP
   protocol, along with numerous profiles, payload formats, and other
   extensions. When combined with appropriate signalling, these form
   the basis for many teleconferencing systems.

   The Web Real-Time communication (WebRTC) framework provides the
   protocol building blocks to support direct, interactive, real-time
   communication using audio, video, collaboration, games, etc., between
   two peers' web-browsers. This memo describes how the RTP framework
   is to be used in the WebRTC context.
   It proposes a baseline set of RTP features that are to be
   implemented by all WebRTC-aware end-points, along with suggested
   extensions for enhanced functionality.

   This memo specifies a protocol intended for use within the WebRTC
   framework, but is not restricted to that context. An overview of the
   WebRTC framework is given in [I-D.ietf-rtcweb-overview].

   The structure of this memo is as follows. Section 2 outlines our
   rationale in preparing this memo and choosing these RTP features.
   Section 3 defines terminology. Requirements for core RTP protocols
   are described in Section 4 and suggested RTP extensions are described
   in Section 5. Section 6 outlines mechanisms that can increase
   robustness to network problems, while Section 7 describes congestion
   control and rate adaptation mechanisms. The discussion of mandated
   RTP mechanisms concludes in Section 8 with a review of performance
   monitoring and network management tools that can be used in the
   WebRTC context. Section 9 gives some guidelines for future
   incorporation of other RTP and RTP Control Protocol (RTCP) extensions
   into this framework. Section 10 describes requirements placed on the
   signalling channel. Section 11 discusses the relationship between
   features of the RTP framework and the WebRTC application programming
   interface (API), and Section 12 discusses RTP implementation
   considerations. The memo concludes with security considerations
   (Section 13) and IANA considerations (Section 14).

2. Rationale

   The RTP framework comprises the RTP data transfer protocol, the RTP
   control protocol, and numerous RTP payload formats, profiles, and
   extensions. This range of add-ons has allowed RTP to meet various
   needs that were not envisaged by the original protocol designers, and
   to support many new media encodings, but raises the question of what
   extensions are to be supported by new implementations.
   The development of the WebRTC framework provides an opportunity to
   review the available RTP features and extensions, and to define a
   common baseline feature set for all WebRTC implementations of RTP.
   This builds on the past 20 years of RTP development to mandate the
   use of extensions that have shown widespread utility, while still
   remaining compatible with the wide installed base of RTP
   implementations where possible.

   RTP and RTCP extensions that are not discussed in this document can
   be implemented by WebRTC end-points if they are beneficial for new
   use cases. However, they are not necessary to address the WebRTC use
   cases and requirements identified in
   [I-D.ietf-rtcweb-use-cases-and-requirements].

   While the baseline set of RTP features and extensions defined in this
   memo is targeted at the requirements of the WebRTC framework, it is
   expected to be broadly useful for other conferencing-related uses of
   RTP. In particular, it is likely that this set of RTP features and
   extensions will be appropriate for other desktop or mobile video
   conferencing systems, or for room-based high-quality telepresence
   applications.

3. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119]. The RFC
   2119 interpretation of these key words applies only when written in
   ALL CAPS. Lower- or mixed-case uses of these key words are not to be
   interpreted as carrying special significance in this memo.

   We define the following additional terms:

   WebRTC MediaStream: The MediaStream concept defined by the W3C in
      the WebRTC API [W3C.WD-mediacapture-streams-20130903].

   Transport-layer Flow: A uni-directional flow of transport packets
      that are identified by having a particular 5-tuple of source IP
      address, source port, destination IP address, destination port,
      and transport protocol used.

   Bi-directional Transport-layer Flow: A bi-directional
      transport-layer flow is a transport-layer flow that is symmetric.
      That is, the transport-layer flow in the reverse direction has a
      5-tuple where the source and destination address and ports are
      swapped compared to the forward path transport-layer flow, and the
      transport protocol is the same.

   This document uses the terminology from
   [I-D.ietf-avtext-rtp-grouping-taxonomy]. Other terms are used
   according to their definitions from the RTP Specification [RFC3550].
   We especially note the following frequently used terms: RTP Packet
   Stream, RTP Session, and End-point.

4. WebRTC Use of RTP: Core Protocols

   The following sections describe the core features of RTP and RTCP
   that need to be implemented, along with the mandated RTP profiles.
   Also described are the core extensions providing essential features
   that all WebRTC implementations need to implement to function
   effectively on today's networks.

4.1. RTP and RTCP

   The Real-time Transport Protocol (RTP) [RFC3550] is REQUIRED to be
   implemented as the media transport protocol for WebRTC. RTP itself
   comprises two parts: the RTP data transfer protocol, and the RTP
   control protocol (RTCP). RTCP is a fundamental and integral part of
   RTP, and MUST be implemented in all WebRTC applications.
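
   As a non-normative illustration of the RTP data transfer protocol
   half of this split, the following Python sketch parses the 12-octet
   fixed RTP header and any CSRC list, following the header layout in
   Section 5.1 of [RFC3550]. The names RtpHeader and parse_rtp_header
   are hypothetical; they are not defined by this memo or by any WebRTC
   API. Header extensions, padding removal, and SRTP processing are
   deliberately left out.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    """Fields of the fixed RTP header (illustrative container only)."""
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed RTP header and CSRC list of one RTP packet."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-octet fixed RTP header")
    # Network byte order: two flag octets, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    first, second, seq, timestamp, ssrc = struct.unpack(
        "!BBHII", packet[:12])
    version = first >> 6
    if version != 2:
        raise ValueError("unsupported RTP version: %d" % version)
    csrc_count = first & 0x0F
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    # Each CSRC is a 32-bit identifier, as inserted by RTP mixers.
    csrcs = struct.unpack("!%dI" % csrc_count, packet[12:end])
    return RtpHeader(
        version=version,
        padding=bool(first & 0x20),
        extension=bool(first & 0x10),
        marker=bool(second & 0x80),
        payload_type=second & 0x7F,
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
        csrcs=csrcs,
    )
```

   The sketch assumes it is handed a plain RTP packet; in a WebRTC
   end-point the SRTP processing required by Section 4.2 would have to
   be applied first.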

   The following RTP and RTCP features are sometimes omitted in limited
   functionality implementations of RTP, but are REQUIRED in all WebRTC
   implementations:

   o  Support for use of multiple simultaneous SSRC values in a single
      RTP session, including support for RTP end-points that send many
      SSRC values simultaneously, following [RFC3550] and
      [I-D.ietf-avtcore-rtp-multi-stream]. Support for the RTCP
      optimisations for multi-SSRC sessions defined in
      [I-D.ietf-avtcore-rtp-multi-stream-optimisation] is RECOMMENDED.

   o  Random choice of SSRC on joining a session; collision detection
      and resolution for SSRC values (see also Section 4.8).

   o  Support for reception of RTP data packets containing CSRC lists,
      as generated by RTP mixers, and RTCP packets relating to CSRCs.

   o  Sending correct synchronisation information in the RTCP Sender
      Reports, to allow receivers to implement lip-synchronisation;
      support for the rapid RTP synchronisation extensions (see
      Section 5.2.1) is RECOMMENDED.

   o  Support for multiple synchronisation contexts. Participants that
      send multiple simultaneous RTP packet streams SHOULD do so as part
      of a single synchronisation context, using a single RTCP CNAME for
      all streams and allowing receivers to play the streams out in a
      synchronised manner. For compatibility with potential future
      versions of this specification, or for interoperability with
      non-WebRTC devices through a gateway, receivers MUST support
      multiple synchronisation contexts, indicated by the use of
      multiple RTCP CNAMEs in an RTP session. This specification
      requires the usage of a single CNAME when sending RTP Packet
      Streams in some circumstances, see Section 4.9.

   o  Support for sending and receiving RTCP SR, RR, SDES, and BYE
      packet types, with OPTIONAL support for other RTCP packet types
      unless mandated by other parts of this specification;
      implementations MUST ignore unknown RTCP packet types. Note that
      additional RTCP Packet types are used by the RTP/SAVPF Profile
      (Section 4.2) and the other RTCP extensions (Section 5).

   o  Support for multiple end-points in a single RTP session, and for
      scaling the RTCP transmission interval according to the number of
      participants in the session; support for randomised RTCP
      transmission intervals to avoid synchronisation of RTCP reports;
      support for RTCP timer reconsideration.

   o  Support for configuring the RTCP bandwidth as a fraction of the
      media bandwidth, and for configuring the fraction of the RTCP
      bandwidth allocated to senders, e.g., using the SDP "b=" line
      [RFC4566][RFC3556]. Support for the reduced minimum RTCP
      reporting interval described in Section 6.2 of [RFC3550] is
      RECOMMENDED.

   It is known that a significant number of legacy RTP implementations,
   especially those targeted at VoIP-only systems, do not support all of
   the above features, and in some cases do not support RTCP at all.
   Implementers are advised to consider the requirements for graceful
   degradation when interoperating with legacy implementations.

   Other implementation considerations are discussed in Section 12.

4.2. Choice of the RTP Profile

   The complete specification of RTP for a particular application domain
   requires the choice of an RTP Profile. For WebRTC use, the Extended
   Secure RTP Profile for RTCP-Based Feedback (RTP/SAVPF) [RFC5124], as
   extended by [RFC7007], MUST be implemented. The RTP/SAVPF profile is
   the combination of the basic RTP/AVP profile [RFC3551], the RTP
   profile for RTCP-based feedback (RTP/AVPF) [RFC4585], and the secure
   RTP profile (RTP/SAVP) [RFC3711].
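
   As a non-normative illustration, the profile offered for a media
   section can be read from the "proto" field of an SDP "m=" line
   [RFC4566]. The Python sketch below assumes that an RTP/SAVPF media
   section carries a "proto" value that either is "RTP/SAVPF" or ends
   in "/RTP/SAVPF" (for example "UDP/TLS/RTP/SAVPF"). That matching
   rule and the function names media_proto and offers_savpf are
   assumptions made for illustration; they are not taken from this
   memo.

```python
def media_proto(m_line: str) -> str:
    """Return the proto field of an SDP "m=" line.

    The line has the form: m=media port proto fmt [fmt ...]
    """
    if not m_line.startswith("m="):
        raise ValueError('not an SDP "m=" line')
    fields = m_line[2:].split()
    if len(fields) < 4:
        raise ValueError('malformed SDP "m=" line')
    return fields[2]


def offers_savpf(m_line: str) -> bool:
    """Report whether the media section names the RTP/SAVPF profile.

    Assumption: any transport prefix is acceptable as long as the
    proto value ends with the RTP/SAVPF profile name.
    """
    proto = media_proto(m_line).upper()
    return proto == "RTP/SAVPF" or proto.endswith("/RTP/SAVPF")


if __name__ == "__main__":
    for line in ("m=audio 9 UDP/TLS/RTP/SAVPF 111 0",
                 "m=video 49170 RTP/AVP 96"):
        print(line, "->", offers_savpf(line))
```

   A check of this kind only inspects the signalled profile name; it
   says nothing about whether the keying and cipher suite requirements
   for the profile have been met.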

   The RTCP-based feedback extensions [RFC4585] are needed for the
   improved RTCP timer model. This allows more flexible transmission of
   RTCP packets in response to events, rather than strictly according to
   bandwidth, and is vital for being able to report congestion signals
   as well as media events. These extensions also allow saving RTCP
   bandwidth, and an end-point will commonly only use the full RTCP
   bandwidth allocation if there are many events that require feedback.
   The timer rules are also needed to make use of the RTP conferencing
   extensions discussed in Section 5.1.

      Note: The enhanced RTCP timer model defined in the RTP/AVPF
      profile is backwards compatible with legacy systems that implement
      only the RTP/AVP or RTP/SAVP profile, given some constraints on
      parameter configuration such as the RTCP bandwidth value and
      "trr-int" (the most important factor for interworking with
      RTP/(S)AVP end-points via a gateway is to set the trr-int
      parameter to a value representing 4 seconds).

   The secure RTP (SRTP) profile extensions [RFC3711] are needed to
   provide media encryption, integrity protection, replay protection and
   a limited form of source authentication. WebRTC implementations MUST
   NOT send packets using the basic RTP/AVP profile or the RTP/AVPF
   profile; they MUST employ the full RTP/SAVPF profile to protect all
   RTP and RTCP packets that are generated (i.e., implementations MUST
   use SRTP and SRTCP). The RTP/SAVPF profile MUST be configured using
   the cipher suites, DTLS-SRTP protection profiles, keying mechanisms,
   and other parameters described in [I-D.ietf-rtcweb-security-arch].

4.3. Choice of RTP Payload Formats

   The set of mandatory to implement codecs and RTP payload formats for
   WebRTC is not specified in this memo; instead, they are defined in
   separate specifications, such as [I-D.ietf-rtcweb-audio].
   Implementations can support any codec for which an RTP payload format
   and associated signalling is defined. Implementations cannot assume
   that the other participants in an RTP session understand any RTP
   payload format, no matter how common; the mapping between RTP payload
   type numbers and specific configurations of particular RTP payload
   formats MUST be agreed before those payload types/formats can be
   used. In an SDP context, this can be done using the "a=rtpmap:" and
   "a=fmtp:" attributes associated with an "m=" line, along with any
   other SDP attributes needed to configure the RTP payload format.

   End-points can signal support for multiple RTP payload formats, or
   multiple configurations of a single RTP payload format, as long as
   each unique RTP payload format configuration uses a different RTP
   payload type number. As outlined in Section 4.8, the RTP payload
   type number is sometimes used to associate an RTP packet stream with
   a signalling context. This association is possible provided unique
   RTP payload type numbers are used in each context. For example, an
   RTP packet stream can be associated with an SDP "m=" line by
   comparing the RTP payload type numbers used by the RTP packet stream
   with payload types signalled in the "a=rtpmap:" lines in the media
   sections of the SDP. If RTP packet streams are being associated with
   signalling contexts based on the RTP payload type, then the
   assignment of RTP payload type numbers MUST be unique across
   signalling contexts; if the same RTP payload format configuration is
   used in multiple contexts, then a different RTP payload type number
   has to be assigned in each context to ensure uniqueness.
   If the RTP payload type number is not being used to associate RTP
   packet streams with a signalling context, then the same RTP payload
   type number can be used to indicate the exact same RTP payload format
   configuration in multiple contexts. A single RTP payload type number
   MUST NOT be assigned to different RTP payload formats, or different
   configurations of the same RTP payload format, within a single RTP
   session (note that the different "m=" lines in an SDP bundle group
   [I-D.ietf-mmusic-sdp-bundle-negotiation] form a single RTP session).

   An end-point that has signalled support for multiple RTP payload
   formats SHOULD be able to accept data in any of those payload formats
   at any time, unless it has previously signalled limitations on its
   decoding capability. This requirement is constrained if several
   types of media (e.g., audio and video) are sent in the same RTP
   session. In such a case, a source (SSRC) is restricted to switching
   only between the RTP payload formats signalled for the type of media
   that is being sent by that source; see Section 4.4. To support rapid
   rate adaptation by changing codec, RTP does not require advance
   signalling for changes between RTP payload formats used by a single
   SSRC that were signalled during session set-up.

   An RTP sender that changes between two RTP payload types that use
   different RTP clock rates MUST follow the recommendations in
   Section 4.1 of [RFC7160]. RTP receivers MUST follow the
   recommendations in Section 4.3 of [RFC7160] in order to support
   sources that switch between clock rates in an RTP session (these
   recommendations for receivers are backwards compatible with the case
   where senders use only a single clock rate).

4.4. Use of RTP Sessions

   An association amongst a set of end-points communicating using RTP is
   known as an RTP session [RFC3550].
   An end-point can be involved in several RTP sessions at the same
   time. In a multimedia session, each type of media has typically been
   carried in a separate RTP session (e.g., using one RTP session for
   the audio, and a separate RTP session using a different
   transport-layer flow for the video). WebRTC implementations of RTP
   are REQUIRED to implement support for multimedia sessions in this
   way, separating each session using different transport-layer flows
   for compatibility with legacy systems.

   In modern-day networks, however, with the widespread use of network
   address/port translators (NAT/NAPT) and firewalls, it is desirable to
   reduce the number of transport-layer flows used by RTP applications.
   This can be done by sending all the RTP packet streams in a single
   RTP session, which will comprise a single transport-layer flow (this
   will prevent the use of some quality-of-service mechanisms, as
   discussed in Section 12.1.3). Implementations are therefore also
   REQUIRED to support transport of all RTP packet streams, independent
   of media type, in a single RTP session using a single transport-layer
   flow, according to [I-D.ietf-avtcore-multi-media-rtp-session]. If
   multiple types of media are to be used in a single RTP session, all
   participants in that RTP session MUST agree to this usage. In an SDP
   context, [I-D.ietf-mmusic-sdp-bundle-negotiation] can be used to
   signal such a bundle of RTP packet streams forming a single RTP
   session.

   Further discussion of the suitability of different RTP session
   structures and multiplexing methods for different scenarios can be
   found in [I-D.ietf-avtcore-multiplex-guidelines].

4.5. RTP and RTCP Multiplexing

   Historically, RTP and RTCP have been run on separate transport-layer
   flows (e.g., two UDP ports for each RTP session, one port for RTP and
   one port for RTCP).
   With the increased use of Network Address/Port Translation (NAT/NAPT)
   this has become problematic, since maintaining multiple NAT bindings
   can be costly. It also complicates firewall administration, since
   multiple ports need to be opened to allow RTP traffic. To reduce
   these costs and session set-up times, support for multiplexing RTP
   data packets and RTCP control packets on a single transport-layer
   flow for each RTP session is REQUIRED, provided it is negotiated in
   the signalling channel before use as specified in [RFC5761]. For
   backwards compatibility, implementations are also REQUIRED to support
   RTP and RTCP sent on separate transport-layer flows.

   Note that the use of RTP and RTCP multiplexed onto a single
   transport-layer flow ensures that there is occasional traffic sent on
   that port, even if there is no active media traffic. This can be
   useful to keep NAT bindings alive, and is the recommended method for
   application level keep-alives of RTP sessions [RFC6263].

4.6. Reduced Size RTCP

   RTCP packets are usually sent as compound RTCP packets, and [RFC3550]
   requires that those compound packets start with a Sender Report (SR)
   or Receiver Report (RR) packet. When using frequent RTCP feedback
   messages under the RTP/AVPF Profile [RFC4585] these statistics are
   not needed in every packet, and unnecessarily increase the mean RTCP
   packet size. This can limit the frequency at which RTCP packets can
   be sent within the RTCP bandwidth share.

   To avoid this problem, [RFC5506] specifies how to reduce the mean
   RTCP message size and allow for more frequent feedback. Frequent
   feedback, in turn, is essential to make real-time applications
   quickly aware of changing network conditions, and to allow them to
   adapt their transmission and encoding behaviour. Support for
   non-compound RTCP feedback packets [RFC5506] is REQUIRED, but MUST be
   negotiated using the signalling channel before use. For backwards
   compatibility, implementations are also REQUIRED to support the use
   of compound RTCP feedback packets if the remote end-point does not
   agree to the use of non-compound RTCP in the signalling exchange.

4.7. Symmetric RTP/RTCP

   To ease traversal of NAT and firewall devices, implementations are
   REQUIRED to implement and use Symmetric RTP [RFC4961]. The reason
   for using symmetric RTP is primarily to avoid issues with NATs and
   Firewalls by ensuring that the send and receive RTP packet streams,
   as well as RTCP, are actually bi-directional transport-layer flows.
   This will keep alive the NAT and firewall pinholes, and help indicate
   consent that the receive direction is a transport-layer flow the
   intended recipient actually wants. In addition, it saves resources,
   specifically ports at the end-points, but also in the network as NAT
   mappings or firewall state is not unnecessarily bloated. The amount
   of per flow QoS state kept in the network is also reduced.

4.8. Choice of RTP Synchronisation Source (SSRC)

   Implementations are REQUIRED to support signalled RTP synchronisation
   source (SSRC) identifiers, using the "a=ssrc:" SDP attribute defined
   in Section 4.1 and Section 5 of [RFC5576]. Implementations MUST also
   support the "previous-ssrc" source attribute defined in Section 6.2
   of [RFC5576]. Other per-SSRC attributes defined in [RFC5576] MAY be
   supported.

   Use of the "a=ssrc:" attribute to signal SSRC identifiers in an RTP
   session is OPTIONAL. Implementations MUST be prepared to accept RTP
   and RTCP packets using SSRCs that have not been explicitly signalled
   ahead of time. Implementations MUST support random SSRC assignment,
   and MUST support SSRC collision detection and resolution, according
   to [RFC3550].
   When using signalled SSRC values, collision detection MUST be
   performed as described in Section 5 of [RFC5576].

   It is often desirable to associate an RTP packet stream with a
   non-RTP context. For users of the WebRTC API a mapping between SSRCs
   and MediaStreamTracks is provided per Section 11. For gateways or
   other usages it is possible to associate an RTP packet stream with an
   "m=" line in a session description formatted using SDP. If SSRCs are
   signalled this is straightforward (in SDP the "a=ssrc:" line will be
   at the media level, allowing a direct association with an "m=" line).
   If SSRCs are not signalled, the RTP payload type numbers used in an
   RTP packet stream are often sufficient to associate that packet
   stream with a signalling context (e.g., if RTP payload type numbers
   are assigned as described in Section 4.3 of this memo, the RTP
   payload types used by an RTP packet stream can be compared with
   values in SDP "a=rtpmap:" lines, which are at the media level in SDP,
   and so map to an "m=" line).

4.9. Generation of the RTCP Canonical Name (CNAME)

   The RTCP Canonical Name (CNAME) provides a persistent transport-level
   identifier for an RTP end-point. While the Synchronisation Source
   (SSRC) identifier for an RTP end-point can change if a collision is
   detected, or when the RTP application is restarted, its RTCP CNAME is
   meant to stay unchanged for the duration of an RTCPeerConnection
   [W3C.WD-webrtc-20130910], so that RTP end-points can be uniquely
   identified and associated with their RTP packet streams within a set
   of related RTP sessions.

   Each RTP end-point MUST have at least one RTCP CNAME, and that RTCP
   CNAME MUST be unique within the RTCPeerConnection. RTCP CNAMEs
   identify a particular synchronisation context, i.e., all SSRCs
   associated with a single RTCP CNAME share a common reference clock.
527 If an end-point has SSRCs that are associated with several 528 unsynchronised reference clocks, and hence different synchronisation 529 contexts, it will need to use multiple RTCP CNAMEs, one for each 530 synchronisation context. 532 Taking the discussion in Section 11 into account, a WebRTC end-point 533 MUST NOT use more than one RTCP CNAME in the RTP sessions belonging 534 to a single RTCPeerConnection (that is, an RTCPeerConnection forms a 535 synchronisation context). RTP middleboxes MAY generate RTP packet 536 streams associated with more than one RTCP CNAME, to allow them to 537 avoid having to resynchronize media from multiple different end- 538 points that are part of a multi-party RTP session. 540 The RTP specification [RFC3550] includes guidelines for choosing a 541 unique RTP CNAME, but these are not sufficient in the presence of NAT 542 devices. In addition, long-term persistent identifiers can be 543 problematic from a privacy viewpoint (Section 13). Accordingly, a 544 WebRTC endpoint MUST generate a new, unique, short-term persistent 545 RTCP CNAME for each RTCPeerConnection, following [RFC7022], with a 546 single exception: if explicitly requested at creation, an 547 RTCPeerConnection MAY use the same CNAME as an existing 548 RTCPeerConnection within their common same-origin context. 550 A WebRTC end-point MUST support reception of any CNAME that matches 551 the syntax limitations specified by the RTP specification [RFC3550] 552 and cannot assume that any CNAME will be chosen according to the form 553 suggested above. 555 4.10. Handling of Leap Seconds 557 The guidelines regarding handling of leap seconds to limit their 558 impact on RTP media playout and synchronization given in [RFC7164] 559 SHOULD be followed. 561 5. WebRTC Use of RTP: Extensions 563 There are a number of RTP extensions that are either needed to obtain 564 full functionality, or extremely useful to improve on the baseline 565 performance, in the WebRTC application context. 
One set of these 566 extensions is related to conferencing, while others are more generic 567 in nature. The following subsections describe the various RTP 568 extensions mandated or suggested for use within the WebRTC context. 570 5.1. Conferencing Extensions and Topologies 572 RTP is a protocol that inherently supports group communication. 573 Groups can be implemented by having each endpoint send its RTP packet 574 streams to an RTP middlebox that redistributes the traffic, by using 575 a mesh of unicast RTP packet streams between endpoints, or by using 576 an IP multicast group to distribute the RTP packet streams. These 577 topologies can be implemented in a number of ways as discussed in 578 [I-D.ietf-avtcore-rtp-topologies-update]. 580 While the use of IP multicast groups is popular in IPTV systems, the 581 topologies based on RTP middleboxes are dominant in interactive video 582 conferencing environments. Topologies based on a mesh of unicast 583 transport-layer flows to create a common RTP session have not seen 584 widespread deployment to date. Accordingly, WebRTC implementations 585 are not expected to support topologies based on IP multicast groups 586 or to support mesh-based topologies, such as a point-to-multipoint 587 mesh configured as a single RTP session (Topo-Mesh in the terminology 588 of [I-D.ietf-avtcore-rtp-topologies-update]). However, a point-to- 589 multipoint mesh constructed using several RTP sessions, implemented 590 in the WebRTC context using independent RTCPeerConnections, can be 591 expected to be utilised by WebRTC applications and needs to be 592 supported. 
594 WebRTC implementations of RTP endpoints implemented according to this 595 memo are expected to support all the topologies described in 596 [I-D.ietf-avtcore-rtp-topologies-update] where the RTP endpoints send 597 and receive unicast RTP packet streams to and from some peer device, 598 provided that peer can participate in performing congestion control 599 on the RTP packet streams. The peer device could be another RTP 600 endpoint, or it could be an RTP middlebox that redistributes the RTP 601 packet streams to other RTP endpoints. This limitation means that 602 some of the RTP middlebox-based topologies are not suitable for use 603 in the WebRTC environment. Specifically: 605 o Video switching MCUs (Topo-Video-switch-MCU) SHOULD NOT be used, 606 since they make the use of RTCP for congestion control and quality 607 of service reports problematic (see Section 3.8 of 608 [I-D.ietf-avtcore-rtp-topologies-update]). 610 o The Relay-Transport Translator (Topo-PtM-Trn-Translator) topology 611 SHOULD NOT be used because its safe use requires a congestion 612 control algorithm or RTP circuit breaker that handles point to 613 multipoint, which has not yet been standardised. 615 The following topology can be used, however it has some issues worth 616 noting: 618 o Content modifying MCUs with RTCP termination (Topo-RTCP- 619 terminating-MCU) MAY be used. Note that in this RTP Topology, RTP 620 loop detection and identification of active senders is the 621 responsibility of the WebRTC application; since the clients are 622 isolated from each other at the RTP layer, RTP cannot assist with 623 these functions (see section 3.9 of 624 [I-D.ietf-avtcore-rtp-topologies-update]). 626 The RTP extensions described in Section 5.1.1 to Section 5.1.6 are 627 designed to be used with centralised conferencing, where an RTP 628 middlebox (e.g., a conference bridge) receives a participant's RTP 629 packet streams and distributes them to the other participants. 
These 630 extensions are not necessary for interoperability; an RTP end-point 631 that does not implement these extensions will work correctly, but 632 might offer poor performance. Support for the listed extensions will 633 greatly improve the quality of experience and, to provide a 634 reasonable baseline quality, some of these extensions are mandatory 635 to be supported by WebRTC end-points. 637 The RTCP conferencing extensions are defined in Extended RTP Profile 638 for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/ 639 AVPF) [RFC4585] and the memo on Codec Control Messages (CCM) in RTP/ 640 AVPF [RFC5104]; they are fully usable by the Secure variant of this 641 profile (RTP/SAVPF) [RFC5124]. 643 5.1.1. Full Intra Request (FIR) 645 The Full Intra Request message is defined in Sections 3.5.1 and 4.3.1 646 of the Codec Control Messages [RFC5104]. It is used to make the 647 mixer request a new Intra picture from a participant in the session. 648 This is used when switching between sources to ensure that the 649 receivers can decode the video or other predictive media encoding 650 with long prediction chains. WebRTC senders MUST understand and 651 react to FIR feedback messages they receive, since this greatly 652 improves the user experience when using centralised mixer-based 653 conferencing. Support for sending FIR messages is OPTIONAL. 655 5.1.2. Picture Loss Indication (PLI) 657 The Picture Loss Indication message is defined in Section 6.3.1 of 658 the RTP/AVPF profile [RFC4585]. It is used by a receiver to tell the 659 sending encoder that it lost the decoder context and would like to 660 have it repaired somehow. This is semantically different from the 661 Full Intra Request above as there could be multiple ways to fulfil 662 the request. WebRTC senders MUST understand and react to PLI 663 feedback messages as a loss tolerance mechanism. Receivers MAY send 664 PLI messages. 666 5.1.3. 
Slice Loss Indication (SLI) 668 The Slice Loss Indication message is defined in Section 6.3.2 of the 669 RTP/AVPF profile [RFC4585]. It is used by a receiver to tell the 670 encoder that it has detected the loss or corruption of one or more 671 consecutive macro blocks, and would like to have these repaired 672 somehow. It is RECOMMENDED that receivers generate SLI feedback 673 messages if slices are lost when using a codec that supports the 674 concept of macro blocks. A sender that receives an SLI feedback 675 message SHOULD attempt to repair the lost slice(s). 677 5.1.4. Reference Picture Selection Indication (RPSI) 679 Reference Picture Selection Indication (RPSI) messages are defined in 680 Section 6.3.3 of the RTP/AVPF profile [RFC4585]. Some video encoding 681 standards allow the use of older reference pictures than the most 682 recent one for predictive coding. If such a codec is in use, and if 683 the encoder has learnt that encoder-decoder synchronisation has been 684 lost, then a known-correct reference picture can be used as a base 685 for future coding. The RPSI message allows this to be signalled. 686 Receivers that detect that encoder-decoder synchronisation has been 687 lost SHOULD generate an RPSI feedback message if the codec being used 688 supports reference picture selection. An RTP packet stream sender 689 that receives such an RPSI message SHOULD act on that message to 690 change the reference picture, if it is possible to do so within the 691 available bandwidth constraints, and with the codec being used. 693 5.1.5. Temporal-Spatial Trade-off Request (TSTR) 695 The temporal-spatial trade-off request and notification are defined 696 in Sections 3.5.2 and 4.3.2 of [RFC5104]. This request can be used 697 to ask the video encoder to change the trade-off it makes between 698 temporal and spatial resolution, for example to prefer high spatial 699 image quality but low frame rate. Support for TSTR requests and 700 notifications is OPTIONAL. 
702 5.1.6. Temporary Maximum Media Stream Bit Rate Request (TMMBR) 704 The TMMBR feedback message is defined in Sections 3.5.4 and 4.2.1 of 705 the Codec Control Messages [RFC5104]. This request and its 706 notification message are used by a media receiver to inform the 707 sending party that there is a current limitation on the amount of 708 bandwidth available to this receiver. There can be various reasons 709 for this: for example, an RTP mixer can use this message to limit the 710 media rate of the sender being forwarded by the mixer (without doing 711 media transcoding) to fit the bottlenecks existing towards the other 712 session participants. WebRTC senders are REQUIRED to implement 713 support for TMMBR messages, and MUST follow bandwidth limitations set 714 by a TMMBR message received for their SSRC. The sending of TMMBR 715 requests is OPTIONAL. 717 5.2. Header Extensions 719 The RTP specification [RFC3550] provides the capability to include 720 RTP header extensions containing in-band data, but the format and 721 semantics of the extensions are poorly specified. The use of header 722 extensions is OPTIONAL in the WebRTC context, but if they are used, 723 they MUST be formatted and signalled following the general mechanism 724 for RTP header extensions defined in [RFC5285], since this gives 725 well-defined semantics to RTP header extensions. 727 As noted in [RFC5285], the requirement from the RTP specification 728 that header extensions are "designed so that the header extension may 729 be ignored" [RFC3550] stands. To be specific, header extensions MUST 730 only be used for data that can safely be ignored by the recipient 731 without affecting interoperability, and MUST NOT be used when the 732 presence of the extension has changed the form or nature of the rest 733 of the packet in a way that is not compatible with the way the stream 734 is signalled (e.g., as defined by the payload type). 
Valid examples 735 of RTP header extensions might include metadata that is additional to 736 the usual RTP information, but that can safely be ignored without 737 compromising interoperability. 739 5.2.1. Rapid Synchronisation 741 Many RTP sessions require synchronisation between audio, video, and 742 other content. This synchronisation is performed by receivers, using 743 information contained in RTCP SR packets, as described in the RTP 744 specification [RFC3550]. This basic mechanism can be slow, however, 745 so it is RECOMMENDED that the rapid RTP synchronisation extensions 746 described in [RFC6051] be implemented in addition to RTCP SR-based 747 synchronisation. The rapid synchronisation extensions use the 748 general RTP header extension mechanism [RFC5285], which requires 749 signalling, but are otherwise backwards compatible. 751 5.2.2. Client-to-Mixer Audio Level 753 The Client to Mixer Audio Level extension [RFC6464] is an RTP header 754 extension used by an endpoint to inform a mixer about the level of 755 audio activity in the packet to which the header is attached. This 756 enables an RTP middlebox to make mixing or selection decisions 757 without decoding or detailed inspection of the payload, reducing the 758 complexity in some types of mixer. It can also save decoding 759 resources in receivers, which can choose to decode only the most 760 relevant RTP packet streams based on audio activity levels. 762 The Client-to-Mixer Audio Level [RFC6464] header extension is 763 RECOMMENDED to be implemented. If this header extension is 764 implemented, it is REQUIRED that implementations are capable of 765 encrypting the header extension according to [RFC6904] since the 766 information contained in these header extensions can be considered 767 sensitive. It is further RECOMMENDED that this encryption is used, 768 unless the encryption has been explicitly disabled through API or 769 signalling. 771 5.2.3. 
Mixer-to-Client Audio Level 773 The Mixer to Client Audio Level header extension [RFC6465] provides 774 an endpoint with the audio level of the different sources mixed into 775 a common mix by an RTP mixer. This enables a user interface to 776 indicate the relative activity level of each session participant, 777 rather than just being included or not based on the CSRC field. This 778 is a pure optimisation of non-critical functions, and is hence 779 OPTIONAL to implement. If this header extension is implemented, it 780 is REQUIRED that implementations are capable of encrypting the header 781 extension according to [RFC6904] since the information contained in 782 these header extensions can be considered sensitive. It is further 783 RECOMMENDED that this encryption is used, unless the encryption has 784 been explicitly disabled through API or signalling. 786 6. WebRTC Use of RTP: Improving Transport Robustness 788 There are tools that can make RTP packet streams robust against 789 packet loss and reduce the impact of loss on media quality. However, 790 they all add overhead compared to a non-robust stream. The overhead 791 needs to be considered, and the aggregate bit-rate MUST be rate 792 controlled to avoid causing network congestion (see Section 7). As a 793 result, improving robustness might require a lower base encoding 794 quality, but has the potential to deliver that quality with fewer 795 errors. The mechanisms described in the following sub-sections can 796 be used to improve tolerance to packet loss. 798 6.1. Negative Acknowledgements and RTP Retransmission 800 As a consequence of supporting the RTP/SAVPF profile, implementations 801 can send negative acknowledgements (NACKs) for RTP data packets 802 [RFC4585]. This feedback can be used to inform a sender of the loss 803 of particular RTP packets, subject to the capacity limitations of the 804 RTCP feedback channel. 
A sender can use this information to optimise 805 the user experience by adapting the media encoding to compensate for 806 known lost packets. 808 RTP packet stream senders are REQUIRED to understand the Generic NACK 809 message defined in Section 6.2.1 of [RFC4585], but MAY choose to 810 ignore some or all of this feedback (following Section 4.2 of 811 [RFC4585]). Receivers MAY send NACKs for missing RTP packets. 812 Guidelines on when to send NACKs are provided in [RFC4585]. It is 813 not expected that a receiver will send a NACK for every lost RTP 814 packet, rather it needs to consider the cost of sending NACK 815 feedback, and the importance of the lost packet, to make an informed 816 decision on whether it is worth telling the sender about a packet 817 loss event. 819 The RTP Retransmission Payload Format [RFC4588] offers the ability to 820 retransmit lost packets based on NACK feedback. Retransmission needs 821 to be used with care in interactive real-time applications to ensure 822 that the retransmitted packet arrives in time to be useful, but can 823 be effective in environments with relatively low network RTT (an RTP 824 sender can estimate the RTT to the receivers using the information in 825 RTCP SR and RR packets, as described at the end of Section 6.4.1 of 826 [RFC3550]). The use of retransmissions can also increase the forward 827 RTP bandwidth, and can potentially cause increased packet loss if 828 the original packet loss was caused by network congestion. We note, 829 however, that retransmission of an important lost packet to repair 830 decoder state can have lower cost than sending a full intra frame. 831 It is not appropriate to blindly retransmit RTP packets in response 832 to a NACK. The importance of lost packets and the likelihood of them 833 arriving in time to be useful needs to be considered before RTP 834 retransmission is used. 836 Receivers are REQUIRED to implement support for RTP retransmission 837 packets [RFC4588]. 
Senders MAY send RTP retransmission packets in 838 response to NACKs if the RTP retransmission payload format has been 839 negotiated for the session, and if the sender believes it is useful 840 to send a retransmission of the packet(s) referenced in the NACK. An 841 RTP sender does not need to retransmit every NACKed packet. 843 6.2. Forward Error Correction (FEC) 845 The use of Forward Error Correction (FEC) can provide an effective 846 protection against some degree of packet loss, at the cost of steady 847 bandwidth overhead. There are several FEC schemes that are defined 848 for use with RTP. Some of these schemes are specific to a particular 849 RTP payload format, others operate across RTP packets and can be used 850 with any payload format. It needs to be noted that using redundant 851 encoding or FEC will lead to increased play out delay, which needs to 852 be considered when choosing the redundancy or FEC formats and their 853 respective parameters. 855 If an RTP payload format negotiated for use in an RTCPeerConnection 856 supports redundant transmission or FEC as a standard feature of that 857 payload format, then that support MAY be used in the 858 RTCPeerConnection, subject to any appropriate signalling. 860 There are several block-based FEC schemes that are designed for use 861 with RTP independent of the chosen RTP payload format. At the time 862 of this writing there is no consensus on which, if any, of these FEC 863 schemes is appropriate for use in the WebRTC context. Accordingly, 864 this memo makes no recommendation on the choice of block-based FEC 865 for WebRTC use. 867 7. WebRTC Use of RTP: Rate Control and Media Adaptation 869 WebRTC will be used in heterogeneous network environments using a 870 varied set of link technologies, including both wired and wireless 871 links, to interconnect potentially large groups of users around the 872 world. 
As a result, the network paths between users can have widely 873 varying one-way delays, available bit-rates, load levels, and traffic 874 mixtures. Individual end-points can send one or more RTP packet 875 streams to each participant in a WebRTC conference, and there can be 876 several participants. Each of these RTP packet streams can contain 877 different types of media, and the type of media, bit rate, and number 878 of RTP packet streams as well as transport-layer flows can be highly 879 asymmetric. Non-RTP traffic can share the network paths with RTP 880 transport-layer flows. Since the network environment is not 881 predictable or stable, WebRTC end-points MUST ensure that the RTP 882 traffic they generate can adapt to match changes in the available 883 network capacity. 885 The quality of experience for users of WebRTC implementations is very 886 dependent on effective adaptation of the media to the limitations of 887 the network. End-points have to be designed so they do not transmit 888 significantly more data than the network path can support, except for 889 very short time periods, otherwise high levels of network packet loss 890 or delay spikes will occur, causing media quality degradation. The 891 limiting factor on the capacity of the network path might be the link 892 bandwidth, or it might be competition with other traffic on the link 893 (this can be non-WebRTC traffic, traffic due to other WebRTC flows, 894 or even competition with other WebRTC flows in the same session). 896 An effective media congestion control algorithm is therefore an 897 essential part of the WebRTC framework. However, at the time of this 898 writing, there is no standard congestion control algorithm that can 899 be used for interactive media applications such as WebRTC's flows. 900 Some requirements for congestion control algorithms for 901 RTCPeerConnections are discussed in [I-D.ietf-rmcat-cc-requirements]. 
902 It is expected that a future version of this memo will mandate the 903 use of a congestion control algorithm that satisfies these 904 requirements. 906 7.1. Boundary Conditions and Circuit Breakers 908 In the absence of a concrete congestion control algorithm, all WebRTC 909 implementations MUST implement the RTP circuit breaker algorithm that 910 is described in [I-D.ietf-avtcore-rtp-circuit-breakers]. The RTP 911 circuit breaker is designed to enable applications to recognise and 912 react to situations of extreme network congestion. However, since 913 the RTP circuit breaker might not be triggered until congestion 914 becomes extreme, it cannot be considered a substitute for congestion 915 control, and applications MUST also implement congestion control to 916 allow them to adapt to changes in network capacity. Any future RTP 917 congestion control algorithms are expected to operate within the 918 envelope allowed by the circuit breaker. 920 The session establishment signalling will also necessarily establish 921 boundaries to which the media bit-rate will conform. The choice of 922 media codecs provides upper- and lower-bounds on the supported bit- 923 rates that the application can utilise to provide useful quality, and 924 the packetization choices that exist. In addition, the signalling 925 channel can establish maximum media bit-rate boundaries using the SDP 926 "b=AS:" or "b=CT:" lines, and the RTP/AVPF Temporary Maximum Media 927 Stream Bit Rate (TMMBR) Requests (see Section 5.1.6 of this memo). 928 The combination of media codec choice and signalled bandwidth limits 929 SHOULD be used to limit traffic based on known bandwidth limitations, 930 for example the capacity of the edge links, to the extent possible. 932 7.2. 
RTCP Limitations for Congestion Control 934 Experience with the congestion control algorithms of TCP [RFC5681], 935 TFRC [RFC5348], and DCCP [RFC4341], [RFC4342], [RFC4828], has shown 936 that feedback on packet arrivals needs to be sent frequently (roughly 937 once per round trip time is common). We note that the real-time 938 media traffic might not be able to adapt to changing path conditions 939 as rapidly as elastic applications using TCP, but frequent feedback, 940 perhaps on the order of once per video frame, is still needed to 941 allow the congestion control algorithm to track the path dynamics. 943 As an example of the type of RTCP congestion control feedback that is 944 possible, consider one of the simplest scenarios for WebRTC: a point 945 to point video call between two end systems. There will be four RTP 946 flows in this scenario, two audio and two video, with all four flows 947 being active for essentially all the time (the audio flows will 948 likely use voice activity detection and comfort noise to reduce the 949 packet rate during silent periods, but this doesn't cause transmissions to 950 stop). Assume all four flows are sent in a single RTP session, each 951 using a separate SSRC. Further, assume each SSRC sends RTCP reports 952 for all other SSRCs in the session (i.e., the optimisations in 953 [I-D.ietf-avtcore-rtp-multi-stream-optimisation] are not used, giving 954 the worst case for the RTCP overhead). When all members are senders 955 like this, the RTCP timing rules in Sections 6.2 and 6.3 of [RFC3550] 956 and [RFC4585] reduce to: 958 rtcp_interval = avg_rtcp_size * n / rtcp_bw 960 where avg_rtcp_size is measured in octets, and the rtcp_bw is the 961 bandwidth available for RTCP. The average RTCP size will depend on 962 the amount of feedback that is sent in each RTCP packet, on the 963 number of members in the session, and on the size of source 964 description (RTCP SDES) information sent. 
As a baseline, each RTCP 965 packet will be a compound RTCP packet that contains an RTCP SR and an 966 RTCP SDES packet. In the scenario above, each RTCP SR packet will 967 contain three report blocks, one for each of the other RTP SSRCs 968 sending data, for a total of 100 octets (this is 8 octets header, 20 969 octets sender info, and 3 * 24 octets report blocks). The RTCP SDES 970 packet will comprise a header (4 octets), an originating SSRC (4 971 octets), a CNAME chunk, and padding. If the CNAME follows [RFC7022], 972 it will be 19 octets in size, and require 1 octet of padding. 973 The resulting compound RTCP packet will be 128 octets in size. If 974 sent in UDP/IPv4 with no IP options and using Secure RTP, which adds 975 20 (IPv4) + 8 (UDP) + 14 (SRTP with 80 bit Authentication tag), the 976 avg_rtcp_size will therefore be 170 octets, including the header 977 overhead. The value n in this scenario is 4, and the rtcp_bw is 978 assumed to be 5% of the session bandwidth. 980 If it is desired to send RTCP feedback packets on average 30 times 981 per second, to correspond to one RTCP report every frame for 30fps 982 video, we can invert the above rtcp_interval calculation to get an 983 rtcp_bw that gives an interval of 1/30th of a second or lower. This 984 corresponds to an rtcp_bw of 20400 octets per second (since 1/30 = 985 170 * 4 / 20400). This is 163200 bits per second, which, if 5% of the 986 session bandwidth, gives a session bandwidth of approximately 3.3Mbps 987 (i.e., 3.3Mbps media rate, plus an additional 5% for RTCP, to give a 988 total data rate of approximately 3.4Mbps). That is, RTCP can report 989 on every frame of video provided the session bandwidth is 3.3Mbps or 990 larger, when every SSRC sends a report for every video frame. Please 991 note that the actual RTCP transmission intervals will be within the 992 interval [0.0135, 0.0406]s, but maintaining an average RTCP 993 transmission interval of 0.033s. 
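The arithmetic in the two paragraphs above can be reproduced with a short sketch. This is illustrative only and is not part of the memo: the function and constant names are hypothetical, the packet sizes are the ones stated in the text, and the randomisation bounds assume the rule from Section 6.3.1 of [RFC3550] (a factor drawn uniformly from [0.5, 1.5], then divided by e - 1.5).

```python
import math

# Octet counts taken from the scenario in the text (names are hypothetical).
SR_HEADER, SENDER_INFO, REPORT_BLOCK = 8, 20, 24
SDES_HEADER, SDES_SSRC, CNAME_CHUNK, SDES_PADDING = 4, 4, 19, 1
LOWER_LAYERS = 20 + 8 + 14  # IPv4 + UDP + SRTP with 80-bit authentication tag


def avg_rtcp_size(n_ssrcs):
    """Size in octets of one compound SR + SDES packet, with header overhead."""
    sr = SR_HEADER + SENDER_INFO + (n_ssrcs - 1) * REPORT_BLOCK
    sdes = SDES_HEADER + SDES_SSRC + CNAME_CHUNK + SDES_PADDING
    return sr + sdes + LOWER_LAYERS


def required_rtcp_bw(n_ssrcs, reports_per_second):
    """Invert rtcp_interval = avg_rtcp_size * n / rtcp_bw (octets per second)."""
    return avg_rtcp_size(n_ssrcs) * n_ssrcs * reports_per_second


def session_bandwidth_bps(rtcp_bw_octets, rtcp_fraction=0.05):
    """Session bandwidth in bits/s if RTCP is rtcp_fraction of the session."""
    return rtcp_bw_octets * 8 / rtcp_fraction


def randomised_bounds(interval):
    """Assumed RFC 3550 randomisation: [0.5, 1.5] * interval / (e - 1.5)."""
    compensation = math.e - 1.5
    return 0.5 * interval / compensation, 1.5 * interval / compensation


if __name__ == "__main__":
    bw = required_rtcp_bw(n_ssrcs=4, reports_per_second=30)
    print(avg_rtcp_size(4), "octets per RTCP packet")
    print(bw, "octets/s of RTCP bandwidth")
    print(session_bandwidth_bps(bw) / 1e6, "Mbps session bandwidth")
    print(randomised_bounds(0.033))
```

With four SSRCs and 30 reports per second, the sketch gives 170 octets per packet, 20400 octets per second of RTCP bandwidth, and a session bandwidth of about 3.26 Mbps, which the text rounds to 3.3Mbps.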
995 Note: To achieve the RTCP transmission intervals above the RTP/ 996 SAVPF profile with T_rr_interval=0 is used, since even when using 997 the reduced minimal transmission interval, the RTP/SAVP profile 998 would only allow sending RTCP at most every 0.11s (every third 999 frame of video). Using RTP/SAVPF with T_rr_interval=0 however is 1000 capable of fully utilizing the configured 5% RTCP bandwidth 1001 fraction. 1003 If additional feedback beyond the standard report block is needed, 1004 the session bandwidth needed will increase. For example, with an 1005 additional 20 octets of data being reported in each RTCP packet, the 1006 session bandwidth needed increases to 3.5Mbps for every SSRC to be 1007 able to report on every frame. However, the above baseline might not 1008 be the most appropriate usage of the RTCP bandwidth. Depending on 1009 needs, a less frequent usage of regular RTCP compound packets, 1010 controlled by T_rr_interval combined with using the reduced size RTCP 1011 packets, can achieve more frequent and useful reporting. Also the 1012 reporting requirements defined in 1013 [I-D.ietf-avtcore-rtp-multi-stream-optimisation] will reduce the 1014 amount of bandwidth consumed for reporting when each endpoint has 1015 multiple SSRCs. 1017 Calculations such as these show that RTCP cannot be used to send per- 1018 packet congestion feedback. RTCP can, however, be used to send 1019 congestion feedback on each frame of video sent in an interactive 1020 video conferencing scenario, provided the RTCP parameters are 1021 correctly configured and the overall session bandwidth exceeds a 1022 couple of megabits per second (the exact rate depending on the number 1023 of session participants, the RTCP bandwidth fraction, and whether 1024 audio and video are sent in one or two RTP sessions). Using similar 1025 calculations, it can be shown that RTCP can likely also be used to 1026 send feedback on a per-RTT basis, provided the RTT is not too low. 
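As a rough illustration of the per-RTT claim, the same rtcp_interval formula can be inverted with the interval set to the round trip time. The sketch below is not part of the memo: it assumes the baseline figures from the two-party example (170 octets per RTCP packet, n = 4, 5% RTCP fraction), and the function name and the RTT values in the loop are hypothetical.

```python
import math

AVG_RTCP_SIZE = 170   # octets, baseline compound packet from the example
N_SSRCS = 4           # two audio and two video flows
RTCP_FRACTION = 0.05  # RTCP share of the session bandwidth


def min_session_bw_bps(rtt_seconds):
    """Session bandwidth (bits/s) needed for one report per SSRC per RTT."""
    # rtcp_interval = avg_rtcp_size * n / rtcp_bw, with rtcp_interval = RTT
    rtcp_bw_octets = AVG_RTCP_SIZE * N_SSRCS / rtt_seconds
    return rtcp_bw_octets * 8 / RTCP_FRACTION


if __name__ == "__main__":
    for rtt_ms in (200, 100, 50, 20):
        mbps = min_session_bw_bps(rtt_ms / 1000) / 1e6
        print(f"RTT {rtt_ms:3d} ms -> session bandwidth >= {mbps:.2f} Mbps")
```

Under these assumptions a 100 ms RTT needs roughly 1.1 Mbps of session bandwidth, while a 20 ms RTT needs about 5.4 Mbps, which is why per-RTT feedback becomes impractical as the RTT shrinks.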
1028 Interactive communication might not be able to afford to wait for 1029 packet losses to occur to indicate congestion, because an increase in 1030 play out delay due to queuing (most prominent in wireless networks) 1031 can easily lead to packets being dropped due to late arrival at the 1032 receiver. Therefore, more sophisticated cues might need to be 1033 reported -- to be defined in a suitable congestion control framework 1034 as noted above -- which, in turn, increase the report size again. 1035 For example, different RTCP XR report blocks (jointly) provide the 1036 necessary details to implement a variety of congestion control 1037 algorithms, but the (compound) report size grows quickly. 1039 7.3. Congestion Control Interoperability and Legacy Systems 1041 There are legacy RTP implementations that do not implement RTCP, and 1042 hence do not provide any congestion feedback. Congestion control 1043 cannot be performed with these end-points. WebRTC implementations 1044 that need to interwork with such end-points MUST limit their 1045 transmission to a low rate, equivalent to a VoIP call using a low 1046 bandwidth codec, that is unlikely to cause any significant 1047 congestion. 1049 When interworking with legacy implementations that support RTCP using 1050 the RTP/AVP profile [RFC3551], congestion feedback is provided in 1051 RTCP RR packets every few seconds. Implementations that have to 1052 interwork with such end-points MUST ensure that they keep within the 1053 RTP circuit breaker [I-D.ietf-avtcore-rtp-circuit-breakers] 1054 constraints to limit the congestion they can cause. 1056 If a legacy end-point supports RTP/AVPF, this enables negotiation of 1057 important parameters for frequent reporting, such as the "trr-int" 1058 parameter, and the possibility that the end-point supports some 1059 useful feedback format for congestion control purpose such as TMMBR 1060 [RFC5104]. 
Implementations that have to interwork with such end- 1061 points MUST ensure that they stay within the RTP circuit breaker 1062 [I-D.ietf-avtcore-rtp-circuit-breakers] constraints to limit the 1063 congestion they can cause, but might find that they can achieve 1064 better congestion response depending on the amount of feedback that 1065 is available. 1067 With proprietary congestion control algorithms issues can arise when 1068 different algorithms and implementations interact in a communication 1069 session. If the different implementations have made different 1070 choices with regard to the type of adaptation, for example one sender 1071 based, and one receiver based, then one could end up in a situation 1072 where one direction is dual controlled, while the other direction is 1073 not controlled. This memo cannot mandate behaviour for proprietary 1074 congestion control algorithms, but implementations that use such 1075 algorithms ought to be aware of this issue, and try to ensure that 1076 effective congestion control is negotiated for media flowing in 1077 both directions. If the IETF were to standardise both sender- and 1078 receiver-based congestion control algorithms for WebRTC traffic in 1079 the future, the issues of interoperability, control, and ensuring 1080 that both directions of media flow are congestion controlled would 1081 also need to be considered. 1083 8. WebRTC Use of RTP: Performance Monitoring 1085 As described in Section 4.1, implementations are REQUIRED to generate 1086 RTCP Sender Report (SR) and Reception Report (RR) packets relating to 1087 the RTP packet streams they send and receive. These RTCP reports can 1088 be used for performance monitoring purposes, since they include basic 1089 packet loss and jitter statistics. 1091 A large number of additional performance metrics are supported by the 1092 RTCP Extended Reports (XR) framework [RFC3611][RFC6792]. 
   At the time of this writing, it is not clear what extended metrics
   are suitable for use in the WebRTC context, so there is no
   requirement that implementations generate RTCP XR packets. However,
   implementations that can use detailed performance monitoring data MAY
   generate RTCP XR packets as appropriate; the use of such packets
   SHOULD be signalled in advance.

   All WebRTC implementations MUST be prepared to receive RTCP XR report
   packets, whether or not they were signalled. There is no requirement
   that the data contained in such reports be used, or exposed to the
   JavaScript application, however.

9. WebRTC Use of RTP: Future Extensions

   It is possible that the core set of RTP protocols and RTP extensions
   specified in this memo will prove insufficient for the future needs
   of WebRTC applications. In this case, future updates to this memo
   MUST be made following the Guidelines for Writers of RTP Payload
   Format Specifications [RFC2736], How to Write an RTP Payload Format
   [I-D.ietf-payload-rtp-howto] and Guidelines for Extending the RTP
   Control Protocol [RFC5968], and SHOULD take into account any future
   guidelines for extending RTP and related protocols that have been
   developed.

   Authors of future extensions are urged to consider the wide range of
   environments in which RTP is used when recommending extensions, since
   extensions that are applicable in some scenarios can be problematic
   in others. Where possible, the WebRTC framework will adopt RTP
   extensions that are of general utility, to enable easy implementation
   of a gateway to other applications using RTP, rather than adopt
   mechanisms that are narrowly targeted at specific WebRTC use cases.

10. Signalling Considerations

   RTP is built with the assumption that an external signalling channel
   exists, and can be used to configure RTP sessions and their features.
   The basic configuration of an RTP session consists of the following
   parameters:

   RTP Profile: The name of the RTP profile to be used in the session.
      The RTP/AVP [RFC3551] and RTP/AVPF [RFC4585] profiles can
      interoperate on a basic level, as can their secure variants
      RTP/SAVP [RFC3711] and RTP/SAVPF [RFC5124]. The secure variants
      of the profiles do not directly interoperate with the non-secure
      variants, due to the presence of additional header fields for
      authentication in SRTP packets and cryptographic transformation of
      the payload. WebRTC requires the use of the RTP/SAVPF profile,
      and this MUST be signalled if SDP is used. Interworking functions
      might transform this into the RTP/SAVP profile for a legacy use
      case, by indicating to the WebRTC end-point that RTP/SAVPF is
      used, and limiting the usage of the "a=rtcp-fb:" attribute to
      indicate a trr-int value of 4 seconds.

   Transport Information: Source and destination IP address(es) and
      ports for RTP and RTCP MUST be signalled for each RTP session. In
      WebRTC these transport addresses will be provided by ICE that
      signals candidates and arrives at nominated candidate address
      pairs. If RTP and RTCP multiplexing [RFC5761] is to be used, such
      that a single port, i.e., a single transport-layer flow, is used
      for RTP and RTCP flows, this MUST be signalled (see Section 4.5).

   RTP Payload Types, media formats, and format parameters: The mapping
      between media type names (and hence the RTP payload formats to be
      used), and the RTP payload type numbers MUST be signalled. Each
      media type MAY also have a number of media type parameters that
      MUST also be signalled to configure the codec and RTP payload
      format (the "a=fmtp:" line from SDP). Section 4.3 of this memo
      discusses requirements for uniqueness of payload types.

   RTP Extensions: The RTP extensions to be used SHOULD be agreed upon,
      including any parameters for each respective extension. At the
      very least, this helps to avoid using bandwidth for features that
      the other end-point will ignore. For certain mechanisms, however,
      such agreement is required, since interoperability fails without
      it.

   RTCP Bandwidth: Support for exchanging RTCP Bandwidth values with the
      end-points will be necessary. This SHALL be done as described in
      "Session Description Protocol (SDP) Bandwidth Modifiers for RTP
      Control Protocol (RTCP) Bandwidth" [RFC3556], or something
      semantically equivalent. This also ensures that the end-points
      have a common view of the RTCP bandwidth. This is important,
      because views of the bandwidth that differ too much can lead to
      failure to interoperate.

   These parameters are often expressed in SDP messages conveyed within
   an offer/answer exchange. RTP does not depend on SDP or on the
   offer/answer model, but does require all the necessary parameters to
   be agreed upon, and provided to the RTP implementation. We note that
   in the WebRTC context it will depend on the signalling model and API
   how these parameters need to be configured, but they will need to be
   either set in the API or explicitly signalled between the peers.

11. WebRTC API Considerations

   The WebRTC API [W3C.WD-webrtc-20130910] and the Media Capture and
   Streams API [W3C.WD-mediacapture-streams-20130903] define and use
   the concept of a MediaStream that consists of zero or more
   MediaStreamTracks. A MediaStreamTrack is an individual stream of
   media from any type of media source, like a microphone or a camera,
   but conceptual sources, like an audio mix or a video composition,
   are also possible. It needs to be possible to play out the
   MediaStreamTracks within a MediaStream in a synchronised manner.
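
   The containment relation between MediaStreams and MediaStreamTracks
   described above can be pictured with a small data model. The
   following is only an illustrative sketch in Python, not the W3C API
   (which is a JavaScript API): the class layout, the add_track()
   helper, and the "source" labels are assumptions made for this
   example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class MediaStreamTrack:
    """One individual stream of media from a single source."""
    kind: str    # "audio" or "video"
    source: str  # illustrative label, e.g. "microphone" or "audio mix"


@dataclass(eq=False)
class MediaStream:
    """A group of zero or more tracks intended for synchronised play-out."""
    tracks: List[MediaStreamTrack] = field(default_factory=list)

    def add_track(self, track: MediaStreamTrack) -> None:
        # Track identity matters, not field equality: two tracks with the
        # same kind and source label are still distinct tracks.
        if not any(existing is track for existing in self.tracks):
            self.tracks.append(track)


# A MediaStream may be empty, or may group several tracks.
empty = MediaStream()
call = MediaStream()
mic = MediaStreamTrack("audio", "microphone")
cam = MediaStreamTrack("video", "camera")
call.add_track(mic)
call.add_track(cam)
call.add_track(mic)  # adding the same track again has no effect
```

   In this model, a conceptual source such as an audio mix is simply
   another MediaStreamTrack with a different source label.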

   A MediaStreamTrack's realisation in RTP in the context of an
   RTCPeerConnection consists of a source packet stream identified with
   an SSRC within an RTP session that is part of the RTCPeerConnection.
   The MediaStreamTrack can also result in additional packet streams,
   and thus SSRCs, in the same RTP session. These can be dependent
   packet streams from scalable encoding of the source stream associated
   with the MediaStreamTrack, if such a media encoder is used. They can
   also be redundancy packet streams; these are created when applying
   Forward Error Correction (Section 6.2) or RTP retransmission
   (Section 6.1) to the source packet stream.

   It is important to note that the same media source can be feeding
   multiple MediaStreamTracks. As different sets of constraints or
   other parameters can be applied to the MediaStreamTrack, each
   MediaStreamTrack instance added to an RTCPeerConnection SHALL result
   in an independent source packet stream, with its own set of
   associated packet streams, and thus different SSRC(s). It will
   depend on applied constraints and parameters if the source stream and
   the encoding configuration will be identical between different
   MediaStreamTracks sharing the same media source. Thus it is possible
   for multiple source packet streams to share encoded streams (but not
   packet streams), but this is an implementation choice to try to
   utilise such optimisations. Note that such optimizations would need
   to take into account that the constraints for one of the
   MediaStreamTracks can at any moment change, meaning that the encoding
   configurations might no longer be identical.

   The same MediaStreamTrack can also be included in multiple
   MediaStreams, thus multiple sets of MediaStreams can implicitly need
   to use the same synchronisation base.
   To ensure that this works in all cases, and does not force an
   end-point to change synchronisation base and CNAME in the middle of
   an ongoing delivery of any packet streams, which would cause media
   disruption, all MediaStreamTracks and their associated SSRCs
   originating from the same end-point need to be sent using the same
   CNAME within one RTCPeerConnection. This motivates the strong
   recommendation in Section 4.9 to only use a single CNAME.

      The requirement to use the same CNAME for all SSRCs that
      originate from the same end-point does not require middleboxes
      that forward traffic from multiple end-points to only use a
      single CNAME.

   Different CNAMEs normally need to be used for different
   RTCPeerConnection instances, as specified in Section 4.9. Having two
   communication sessions with the same CNAME could enable tracking of a
   user or device across different services (see Section 4.4.1 of
   [I-D.ietf-rtcweb-security] for details). A web application can
   request that the CNAMEs used in different RTCPeerConnections within a
   same-origin context be the same; this allows for synchronization of
   the endpoint's RTP packet streams across the different
   RTCPeerConnections.

      Note: this doesn't result in a tracking issue, since the creation
      of matching CNAMEs depends on existing tracking.

   The above will currently force a WebRTC end-point that receives a
   MediaStreamTrack on one RTCPeerConnection and adds it as an outgoing
   track on any RTCPeerConnection to perform resynchronisation of the
   stream. This is because the sending party needs to change the CNAME,
   which implies that it has to use a locally available system clock as
   timebase for the synchronisation. Thus, the relation between the
   timebase of the incoming stream and that of the system sending out
   needs to be defined.
   This relation also needs monitoring for clock drift and likely
   adjustments of the synchronisation. The sending entity is also
   responsible for congestion control for the streams it sends. In
   cases of packet loss, the loss of incoming data also needs to be
   handled. This leads to the observation that the method that is least
   likely to cause issues or interruptions in the outgoing source packet
   stream is a model of full decoding, including repair, etc., followed
   by encoding of the media again into the outgoing packet stream.
   Optimisations of this method are clearly possible and implementation
   specific.

   A WebRTC end-point MUST support receiving multiple MediaStreamTracks,
   where each of the different MediaStreamTracks (and their sets of
   associated packet streams) uses a different CNAME. However,
   MediaStreamTracks that are received with different CNAMEs have no
   defined synchronisation.

      Note: The motivation for supporting reception of multiple CNAMEs
      is to allow for forward compatibility with any future changes
      that enable more efficient stream handling when end-points relay/
      forward streams. It also ensures that end-points can interoperate
      with certain types of multi-stream middleboxes or end-points that
      are not WebRTC.

   The binding between the WebRTC MediaStreams, MediaStreamTracks and
   the SSRC is done as specified in "Cross Session Stream Identification
   in the Session Description Protocol" [I-D.ietf-mmusic-msid]. This
   document [I-D.ietf-mmusic-msid] also defines, in section 4.1, how to
   map unknown source packet stream SSRCs to MediaStreamTracks and
   MediaStreams. Commonly the RTP Payload Type of any incoming packets
   will reveal if the packet stream is a source stream or a redundancy
   or dependent packet stream. The association to the correct source
   packet stream depends on the payload format in use for the packet
   stream.
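
   The payload type check described above can be sketched as follows.
   This is a minimal illustration in Python under stated assumptions,
   not part of any WebRTC API: the payload type numbers (96, 97, 111),
   the table layout, and the classify() helper are hypothetical. The
   "rtx" entry follows the RTP retransmission payload format, whose
   "apt" format parameter names the payload type of the stream being
   repaired.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class PayloadInfo:
    encoding: str              # encoding name from signalling, e.g. "VP8"
    apt: Optional[int] = None  # associated payload type ("apt"), RTX only


# Hypothetical payload type table, as it might result from signalling.
PT_TABLE: Dict[int, PayloadInfo] = {
    96: PayloadInfo("VP8"),
    97: PayloadInfo("rtx", apt=96),
    111: PayloadInfo("opus"),
}


def classify(pt: int,
             table: Dict[int, PayloadInfo] = PT_TABLE
             ) -> Tuple[str, Optional[int]]:
    """Return (role, associated payload type) for an incoming packet."""
    info = table.get(pt)
    if info is None:
        return ("unknown", None)
    if info.encoding == "rtx":
        # A retransmission stream is a redundancy packet stream; "apt"
        # gives the payload type of the source stream it repairs.
        return ("redundancy", info.apt)
    return ("source", None)


video_role = classify(96)
rtx_role = classify(97)
stray_role = classify(50)
```

   The sketch only recovers the associated payload type. Finding the
   actual source packet stream (that is, its SSRC) still depends on the
   payload format in use, as noted above.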

   Finally, this specification puts a requirement on the WebRTC API to
   realize a method for determining the CSRC list (Section 4.1) as well
   as the Mixer-to-Client audio levels (Section 5.2.3) (when supported);
   the basic requirements for this are further discussed in
   Section 12.2.1.

12. RTP Implementation Considerations

   The following discussion provides some guidance on the implementation
   of the RTP features described in this memo. The focus is on a WebRTC
   end-point implementation perspective, and while some mention is made
   of the behaviour of middleboxes, that is not the focus of this memo.

12.1. Configuration and Use of RTP Sessions

   A WebRTC end-point will be a simultaneous participant in one or more
   RTP sessions. Each RTP session can convey multiple media sources,
   and can include media data from multiple end-points. In the
   following, we outline some ways in which WebRTC end-points can
   configure and use RTP sessions.

12.1.1. Use of Multiple Media Sources Within an RTP Session

   RTP is a group communication protocol, and every RTP session can
   potentially contain multiple RTP packet streams. There are several
   reasons why this might be desirable:

   Multiple media types: Outside of WebRTC, it is common to use one RTP
      session for each type of media source (e.g., one RTP session for
      audio sources and one for video sources, each sent over different
      transport layer flows). However, to reduce the number of UDP
      ports used, the default in WebRTC is to send all types of media in
      a single RTP session, as described in Section 4.4, using RTP and
      RTCP multiplexing (Section 4.5) to further reduce the number of
      UDP ports needed. This RTP session then uses only one
      bi-directional transport-layer flow, but will contain multiple RTP
      packet streams, each containing a different type of media.
      A common example might be an end-point with a camera and
      microphone that sends two RTP packet streams, one video and one
      audio, into a single RTP session.

   Multiple Capture Devices: A WebRTC end-point might have multiple
      cameras, microphones, or other media capture devices, and so might
      want to generate several RTP packet streams of the same media
      type. Alternatively, it might want to send media from a single
      capture device in several different formats or quality settings at
      once. Both can result in a single end-point sending multiple RTP
      packet streams of the same media type into a single RTP session at
      the same time.

   Associated Repair Data: An end-point might send an RTP packet stream
      that is somehow associated with another stream. For example, it
      might send an RTP packet stream that contains FEC or
      retransmission data relating to another stream. Some RTP payload
      formats send this sort of associated repair data as part of the
      source packet stream, while others send it as a separate packet
      stream.

   Layered or Multiple Description Coding: An end-point can use a
      layered media codec, for example H.264 SVC, or a multiple
      description codec, that generates multiple RTP packet streams,
      each with a distinct RTP SSRC, within a single RTP session.

   RTP Mixers, Translators, and Other Middleboxes: An RTP session, in
      the WebRTC context, is a point-to-point association between an
      end-point and some other peer device, where those devices share a
      common SSRC space. The peer device might be another WebRTC
      end-point, or it might be an RTP mixer, translator, or some other
      form of media processing middlebox. In the latter cases, the
      middlebox might send mixed or relayed RTP streams from several
      participants, that the WebRTC end-point will need to render.
      Thus, even though a WebRTC end-point might only be a member of a
      single RTP session, the peer device might be extending that RTP
      session to incorporate other end-points. WebRTC is a group
      communication environment and end-points need to be capable of
      receiving, decoding, and playing out multiple RTP packet streams
      at once, even in a single RTP session.

12.1.2. Use of Multiple RTP Sessions

   In addition to sending and receiving multiple RTP packet streams
   within a single RTP session, a WebRTC end-point might participate in
   multiple RTP sessions. There are several reasons why a WebRTC
   end-point might choose to do this:

   To interoperate with legacy devices: The common practice in the
      non-WebRTC world is to send different types of media in separate
      RTP sessions, for example using one RTP session for audio and
      another RTP session, on a separate transport layer flow, for
      video. All WebRTC end-points need to support the option of
      sending different types of media on different RTP sessions, so
      they can interwork with such legacy devices. This is discussed
      further in Section 4.4.

   To provide enhanced quality of service: Some network-based quality
      of service mechanisms operate on the granularity of transport
      layer flows. If it is desired to use these mechanisms to provide
      differentiated quality of service for some RTP packet streams,
      then those RTP packet streams need to be sent in a separate RTP
      session using a different transport-layer flow, and with
      appropriate quality of service marking. This is discussed further
      in Section 12.1.3.

   To separate media with different purposes: An end-point might want
      to send RTP packet streams that have different purposes on
      different RTP sessions, to make it easy for the peer device to
      distinguish them.
      For example, some centralised multiparty conferencing systems
      display the active speaker in high resolution, but show low
      resolution "thumbnails" of other participants. Such systems might
      configure the end-points to send simulcast high- and
      low-resolution versions of their video using separate RTP
      sessions, to simplify the operation of the RTP middlebox. In the
      WebRTC context this can currently be accomplished by establishing
      multiple WebRTC MediaStreamTracks that have the same media source
      in one (or more) RTCPeerConnection. Each MediaStreamTrack is then
      configured to deliver a particular media quality and thus media
      bit-rate, and will produce an independently encoded version with
      the codec parameters agreed specifically in the context of that
      RTCPeerConnection. The RTP middlebox can distinguish packets
      corresponding to the low- and high-resolution streams by
      inspecting their SSRC, RTP payload type, or some other information
      contained in RTP payload, RTP header extension or RTCP packets,
      but it can be easier to distinguish the RTP packet streams if they
      arrive on separate RTP sessions on separate transport-layer flows.

   To directly connect with multiple peers: A multi-party conference
      does not need to use an RTP middlebox. Rather, a multi-unicast
      mesh can be created, comprising several distinct RTP sessions,
      with each participant sending RTP traffic over a separate RTP
      session (that is, using an independent RTCPeerConnection object)
      to every other participant, as shown in Figure 1. This topology
      has the benefit of not requiring an RTP middlebox node that is
      trusted to access and manipulate the media data. The downside is
      that it increases the used bandwidth at each sender by requiring
      one copy of the RTP packet streams for each participant, other
      than the sender itself, that is part of the same session.

      +---+     +---+
      | A |<--->| B |
      +---+     +---+
        ^         ^
         \       /
          \     /
           v   v
           +---+
           | C |
           +---+

      Figure 1: Multi-unicast using several RTP sessions

      The multi-unicast topology could also be implemented as a single
      RTP session, spanning multiple peer-to-peer transport layer
      connections, or as several pairwise RTP sessions, one between each
      pair of peers. To maintain a coherent mapping between RTP
      sessions and RTCPeerConnection objects, we recommend that this be
      implemented as several individual RTP sessions. The only downside
      is that end-point A will not learn of the quality of any
      transmission happening between B and C, since it will not see RTCP
      reports for the RTP session between B and C, whereas it would if
      all three participants were part of a single RTP session.
      Experience with the Mbone tools (experimental RTP-based multicast
      conferencing tools from the late 1990s) has shown that RTCP
      reception quality reports for third parties can usefully be
      presented to the users in a way that helps them understand
      asymmetric network problems, and the approach of using separate
      RTP sessions prevents this. However, an advantage of using
      separate RTP sessions is that it enables using different media
      bit-rates and RTP session configurations between the different
      peers, so that B is not forced to endure the same quality
      reductions as C if there are limitations in the transport from A
      to C. It is believed that these advantages outweigh the
      limitations in debugging power.

   To indirectly connect with multiple peers: A common scenario in
      multi-party conferencing is to create indirect connections to
      multiple peers, using an RTP mixer, translator, or some other type
      of RTP middlebox. Figure 2 outlines a simple topology that might
      be used in a four-person centralised conference.
      The middlebox acts to optimise the transmission of RTP packet
      streams from certain perspectives, either by only sending some of
      the received RTP packet streams to any given receiver, or by
      providing a combined RTP packet stream out of a set of
      contributing streams.

      +---+      +-------------+      +---+
      | A |<---->|             |<---->| B |
      +---+      | RTP mixer,  |      +---+
                 | translator, |
                 | or other    |
      +---+      | middlebox   |      +---+
      | C |<---->|             |<---->| D |
      +---+      +-------------+      +---+

      Figure 2: RTP mixer with only unicast paths

      There are various methods of implementation for the middlebox. If
      implemented as a standard RTP mixer or translator, a single RTP
      session will extend across the middlebox and encompass all the
      end-points in one multi-party session. Other types of middlebox
      might use separate RTP sessions between each end-point and the
      middlebox. A common aspect is that these RTP middleboxes can use
      a number of tools to control the media encoding provided by a
      WebRTC end-point. This includes functions like requesting that
      the encoding chain be broken and that the encoder produce a
      so-called Intra frame. Another is limiting the bit-rate of a
      given stream to better suit the mixer's view of the multiple
      down-streams. Others are controlling the most suitable
      frame-rate, the picture resolution, and the trade-off between
      frame-rate and spatial quality. The middlebox gets the
      significant responsibility to correctly perform congestion
      control, identify sources, and manage synchronisation while
      providing the application with suitable media optimizations.
      The middlebox also has to be a trusted node when it comes to
      security, since it manipulates either the RTP header or the media
      itself (or both) received from one end-point, before sending it on
      towards the end-point(s); thus it needs to be able to decrypt and
      then encrypt it before sending it out.

      RTP Mixers can create a situation where an end-point experiences a
      situation in between a session with only two end-points and
      multiple RTP sessions. Mixers are expected to not forward RTCP
      reports regarding RTP packet streams across themselves. This is
      due to the difference in the RTP packet streams provided to the
      different end-points. The original media source lacks information
      about a mixer's manipulations prior to sending it to the different
      receivers. This scenario also means that an end-point's feedback
      or requests go to the mixer. When the mixer can't act on this by
      itself, it is forced to go to the original media source to fulfil
      the receiver's request. This will not necessarily be explicitly
      visible in any RTP and RTCP traffic, but the interactions and the
      time to complete them will indicate such dependencies.

      Providing source authentication in multi-party scenarios is a
      challenge. In the mixer-based topologies, an end-point's source
      authentication is based on, firstly, verifying that media comes
      from the mixer by cryptographic verification and, secondly, trust
      in the mixer to correctly identify any source towards the
      end-point. In RTP sessions where multiple end-points are directly
      visible to an end-point, all end-points will have knowledge about
      each other's master keys, and can thus inject packets claimed to
      come from another end-point in the session.
      Any node performing relay can perform non-cryptographic mitigation
      by preventing forwarding of packets that have SSRC fields that
      came from other end-points before. For cryptographic verification
      of the source, SRTP would require additional security mechanisms,
      for example TESLA for SRTP [RFC4383], that are not part of the
      base WebRTC standards.

   To forward media between multiple peers: It is sometimes desirable
      for an end-point that receives an RTP packet stream to be able to
      forward that RTP packet stream to a third party. There are some
      obvious security and privacy implications in supporting this, but
      also potential uses. This is supported in the W3C API by taking
      the received and decoded media and using it as a media source that
      is re-encoded and transmitted as a new stream.

      At the RTP layer, media forwarding acts as a back-to-back RTP
      receiver and RTP sender. The receiving side terminates the RTP
      session and decodes the media, while the sender side re-encodes
      and transmits the media using an entirely separate RTP session.
      The original sender will only see a single receiver of the media,
      and will not be able to tell that forwarding is happening based on
      RTP-layer information since the RTP session that is used to send
      the forwarded media is not connected to the RTP session on which
      the media was received by the node doing the forwarding.

      The end-point that is performing the forwarding is responsible for
      producing an RTP packet stream suitable for onwards transmission.
      The outgoing RTP session that is used to send the forwarded media
      is entirely separate from the RTP session on which the media was
      received.
      This will require media transcoding for congestion control
      purposes to produce a suitable bit-rate for the outgoing RTP
      session, reducing media quality and forcing the forwarding
      end-point to spend resources on the transcoding. The media
      transcoding does result in a separation of the two different legs,
      removing almost all dependencies, and allowing the forwarding
      end-point to optimize its media transcoding operation. The cost
      is greatly increased computational complexity on the forwarding
      node. Receivers of the forwarded stream will see the forwarding
      device as the sender of the stream, and will not be able to tell
      from the RTP layer that they are receiving a forwarded stream
      rather than an entirely new RTP packet stream generated by the
      forwarding device.

12.1.3. Differentiated Treatment of RTP Packet Streams

   There are use cases for differentiated treatment of RTP packet
   streams. Such differentiation can happen at several places in the
   system. First of all is the prioritization within the end-point
   sending the media, which controls both which RTP packet streams will
   be sent, and their allocation of bit-rate out of the current
   available aggregate as determined by the congestion control.

   It is expected that the WebRTC API [W3C.WD-webrtc-20130910] will
   allow the application to indicate relative priorities for different
   MediaStreamTracks. These priorities can then be used to influence
   the local RTP processing, especially when it comes to congestion
   control response in how to divide the available bandwidth between the
   RTP packet streams. Any changes in relative priority will also need
   to be considered for RTP packet streams that are associated with the
   main RTP packet streams, such as redundant streams for RTP
   retransmission and FEC.
   The importance of such redundant RTP packet streams is dependent on
   the media type and codec used, with regard to how robust that codec
   is to packet loss. However, a default policy might be to use the
   same priority for a redundant RTP packet stream as for the source
   RTP packet stream.

   Secondly, the network can prioritize transport-layer flows and
   sub-flows, including RTP packet streams. Typically, differential
   treatment includes two steps, the first being identifying whether an
   IP packet belongs to a class that has to be treated differently, the
   second the actual mechanism to prioritize packets. This can be done
   using one of three methods:

   DiffServ: The end-point marks a packet with a DiffServ code point to
      indicate to the network that the packet belongs to a particular
      class.

   Flow based: Packets that need to be given a particular treatment are
      identified using a combination of IP address and port.

   Deep Packet Inspection: A network classifier (DPI) inspects the
      packet and tries to determine if the packet represents a
      particular application and type that is to be prioritized.

   Flow-based differentiation will provide the same treatment to all
   packets within a transport-layer flow, i.e., relative prioritization
   is not possible. Moreover, if the resources are limited it might not
   be possible to provide differential treatment compared to best-effort
   for all the RTP packet streams in a WebRTC application. When
   flow-based differentiation is available, the WebRTC application needs
   to know about it so that it can separate the RTP packet streams onto
   different UDP flows to enable a more granular usage of flow-based
   differentiation. That way, it can at least provide different
   prioritization of audio and video, if desired by the application.
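
   The limitation of flow-based differentiation described above can be
   made concrete with a short sketch. The following Python fragment is
   an illustration under stated assumptions, not a model of any real
   network classifier: the Flow and Packet types, the treatment()
   helper, the class names ("low-latency", "assured", "best-effort"),
   and the documentation-range addresses and ports are all hypothetical.

```python
from typing import Dict, NamedTuple


class Flow(NamedTuple):
    """Transport-layer flow, identified by addresses, ports, and protocol."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


class Packet(NamedTuple):
    flow: Flow
    media: str  # "audio" or "video"; not visible to a flow-based classifier


def treatment(packet: Packet, policy: Dict[Flow, str]) -> str:
    # Only the flow is consulted, so every packet on the same flow
    # receives the same treatment regardless of the media it carries.
    return policy.get(packet.flow, "best-effort")


# Case 1: audio and video share one transport-layer flow.
shared = Flow("192.0.2.1", 50000, "198.51.100.2", 60000, "udp")
shared_policy = {shared: "low-latency"}
same_flow = {
    treatment(Packet(shared, "audio"), shared_policy),
    treatment(Packet(shared, "video"), shared_policy),
}

# Case 2: audio and video are separated onto different UDP flows.
audio_flow = Flow("192.0.2.1", 50002, "198.51.100.2", 60002, "udp")
video_flow = Flow("192.0.2.1", 50004, "198.51.100.2", 60004, "udp")
split_policy = {audio_flow: "low-latency", video_flow: "assured"}
audio_class = treatment(Packet(audio_flow, "audio"), split_policy)
video_class = treatment(Packet(video_flow, "video"), split_policy)
```

   In the first case both media types necessarily end up in one class;
   only the second arrangement lets the network treat audio and video
   differently.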

   DiffServ assumes that either the end-point or a classifier can mark
   the packets with an appropriate DSCP so that the packets are treated
   according to that marking. If the end-point is to mark the traffic,
   two requirements arise in the WebRTC context: 1) The WebRTC
   application or browser has to know which DSCPs to use and that it can
   use them on some set of RTP packet streams. 2) The information needs
   to be propagated to the operating system when transmitting the
   packet. Details of this process are outside the scope of this memo
   and are further discussed in "DSCP and other packet markings for
   RTCWeb QoS" [I-D.ietf-tsvwg-rtcweb-qos].

   For packet-based marking schemes it might be possible to mark
   individual RTP packets differently based on the relative priority of
   the RTP payload. For example, video codecs that have I, P, and B
   pictures could give lower priority to any payloads carrying only B
   frames, as these are less damaging to lose. However, depending on
   the QoS mechanism and which markings are applied, this can result in
   not only different packet drop probabilities but also packet
   reordering; see [I-D.ietf-tsvwg-rtcweb-qos] for further discussion.
   As a default policy, all RTP packets related to an RTP packet stream
   ought to be provided with the same prioritization; per-packet
   prioritization is outside the scope of this memo, but might be
   specified elsewhere in future.

   It is also important to consider how RTCP packets associated with a
   particular RTP packet stream need to be marked. RTCP compound
   packets with Sender Reports (SR) ought to be marked with the same
   priority as the RTP packet stream itself, so the RTCP-based
   round-trip time (RTT) measurements are done using the same
   transport-layer flow priority as the RTP packet stream experiences.
RTCP compound 1661 packets containing RR packets ought to be sent with the priority used 1662 by the majority of the RTP packet streams reported on. RTCP packets 1663 containing time-critical feedback packets can use higher priority to 1664 improve the timeliness and likelihood of delivery of such feedback. 1666 12.2. Media Source, RTP Packet Streams, and Participant Identification 1667 12.2.1. Media Source 1669 Each RTP packet stream is identified by a unique synchronisation 1670 source (SSRC) identifier. The SSRC identifier is carried in each of 1671 the RTP packets comprising an RTP packet stream, and is also used to 1672 identify that stream in the corresponding RTCP reports. The SSRC is 1673 chosen as discussed in Section 4.8. The first stage in 1674 demultiplexing RTP and RTCP packets received on a single transport 1675 layer flow at a WebRTC end-point is to separate the RTP packet 1676 streams based on their SSRC value; once that is done, additional 1677 demultiplexing steps can determine how and where to render the media. 1679 RTP allows a mixer, or other RTP-layer middlebox, to combine encoded 1680 streams from multiple media sources to form a new encoded stream from 1681 a new media source (the mixer). The RTP packets in that new RTP 1682 packet stream can include a Contributing Source (CSRC) list, 1683 indicating which original SSRCs contributed to the combined source 1684 stream. As described in Section 4.1, implementations need to support 1685 reception of RTP data packets containing a CSRC list and RTCP packets 1686 that relate to sources present in the CSRC list. The CSRC list can 1687 change on a packet-by-packet basis, depending on the mixing operation 1688 being performed. Knowledge of what media sources contributed to a 1689 particular RTP packet can be important if the user interface 1690 indicates which participants are active in the session. 
Changes in 1691 the CSRC list included in packets need to be exposed to the WebRTC 1692 application using some API, if the application is to be able to track 1693 changes in session participation. It is desirable to map CSRC values 1694 back into WebRTC MediaStream identities as they cross this API, to 1695 avoid exposing the SSRC/CSRC name space to JavaScript applications. 1697 If the mixer-to-client audio level extension [RFC6465] is being used 1698 in the session (see Section 5.2.3), the information in the CSRC list 1699 is augmented by audio level information for each contributing source. 1700 This information can usefully be exposed in the user interface. 1702 12.2.2. SSRC Collision Detection 1704 The RTP standard [RFC3550] requires any RTP implementation to have 1705 support for detecting and handling SSRC collisions, i.e., resolve the 1706 conflict when two different end-points use the same SSRC value. This 1707 requirement also applies to WebRTC end-points. There are several 1708 scenarios where SSRC collisions can occur: 1710 o In a point-to-point session where each SSRC is associated with 1711 either of the two end-points and where the main media-carrying 1712 SSRC identifier will be announced in the signalling channel, a 1713 collision is less likely to occur due to the information about 1714 used SSRCs provided by Source-Specific SDP Attributes [RFC5576]. 1716 Still, collisions can occur if both end-points start using a new 1717 SSRC identifier prior to having signalled it to the peer and 1718 received acknowledgement of the signalling message. The Source- 1719 Specific SDP Attributes [RFC5576] contain no mechanism to resolve 1720 SSRC collisions or reject an end-point's usage of an SSRC. 1722 o SSRC values that have not been signalled could also appear in an 1723 RTP session. This is more likely than it appears, since some RTP 1724 functions use extra SSRCs to provide their functionality. 
For 1725 example, retransmission data might be transmitted using a separate 1726 RTP packet stream that requires its own SSRC, separate from the SSRC 1727 of the source RTP packet stream [RFC4588]. In those cases, an 1728 end-point can create a new SSRC that strictly doesn't need to be 1729 announced over the signalling channel to function correctly at 1730 both the RTP and RTCPeerConnection levels. 1732 o Multiple end-points in a multiparty conference can create new 1733 sources and signal those towards the RTP middlebox. In cases 1734 where the SSRC/CSRC are propagated between the different end- 1735 points from the RTP middlebox, collisions can occur. 1737 o An RTP middlebox could connect an end-point's RTCPeerConnection to 1738 another RTCPeerConnection from the same end-point, thus forming a 1739 loop where the end-point will receive its own traffic. While this 1740 is clearly considered a bug, it is important that the end-point is 1741 able to recognise and handle the case when it occurs. This case 1742 becomes even more problematic when media mixers, and so on, are 1743 involved, where the stream received is a different stream but 1744 still contains this client's input. 1746 These SSRC/CSRC collisions can only be handled at the RTP level as long 1747 as the same RTP session is extended across multiple 1748 RTCPeerConnections by an RTP middlebox. To resolve the more generic 1749 case where multiple RTCPeerConnections are interconnected, 1750 identification of the media sources that are part of a MediaStreamTrack 1751 being propagated across multiple interconnected RTCPeerConnections 1752 needs to be preserved across these interconnections. 1754 12.2.3. Media Synchronisation Context 1756 When an end-point sends media from more than one media source, it 1757 needs to consider if (and which of) these media sources are to be 1758 synchronized. 
In RTP/RTCP, synchronisation is provided by having a 1759 set of RTP packet streams be indicated as coming from the same 1760 synchronisation context and logical end-point by using the same RTCP 1761 CNAME identifier. 1763 The next provision is that the internal clocks of all media sources, 1764 i.e., what drives the RTP timestamp, can be correlated to a system 1765 clock that is provided in RTCP Sender Reports encoded in an NTP 1766 format. By correlating all RTP timestamps to a common system clock 1767 for all sources, the timing relation of the different RTP packet 1768 streams, also across multiple RTP sessions, can be derived at the 1769 receiver and, if desired, the streams can be synchronized. The 1770 requirement is for the media sender to provide the correlation 1771 information; it is up to the receiver to use it or not. 1773 13. Security Considerations 1775 The overall security architecture for WebRTC is described in 1776 [I-D.ietf-rtcweb-security-arch], and security considerations for the 1777 WebRTC framework are described in [I-D.ietf-rtcweb-security]. These 1778 considerations also apply to this memo. 1780 The security considerations of the RTP specification, the RTP/SAVPF 1781 profile, and the various RTP/RTCP extensions and RTP payload formats 1782 that form the complete protocol suite described in this memo apply. 1783 We do not believe there are any new security considerations resulting 1784 from the combination of these various protocol extensions. 1786 The Extended Secure RTP Profile for Real-time Transport Control 1787 Protocol (RTCP)-Based Feedback [RFC5124] (RTP/SAVPF) provides 1788 handling of fundamental issues by offering confidentiality, integrity 1789 and partial source authentication. A mandatory-to-implement media 1790 security solution is created by combining this secured RTP profile and 1791 DTLS-SRTP keying [RFC5764] as defined by Section 5.5 of 1792 [I-D.ietf-rtcweb-security-arch]. 
1794 RTCP packets convey a Canonical Name (CNAME) identifier that is used 1795 to associate RTP packet streams that need to be synchronised across 1796 related RTP sessions. Inappropriate choice of CNAME values can be a 1797 privacy concern, since long-term persistent CNAME identifiers can be 1798 used to track users across multiple WebRTC calls. Section 4.9 of 1799 this memo provides guidelines for generation of untraceable CNAME 1800 values that alleviate this risk. 1802 The guidelines in [RFC6562] apply when using variable bit rate (VBR) 1803 audio codecs such as Opus (see Section 4.3 for discussion of mandated 1804 audio codecs). The guidelines in [RFC6562] also apply, but are of 1805 lesser importance, when using the client-to-mixer audio level header 1806 extensions (Section 5.2.2) or the mixer-to-client audio level header 1807 extensions (Section 5.2.3). Use of encryption of the header 1808 extensions is RECOMMENDED, unless there are known reasons, like RTP 1809 middleboxes or third-party monitoring that will greatly benefit from 1810 the information, and this has been expressed using API or signalling. 1812 If further evidence is produced to show that information leakage is 1813 significant from audio level indications, then use of encryption 1814 needs to be mandated at that time. 1816 14. IANA Considerations 1818 This memo makes no request of IANA. 1820 Note to RFC Editor: this section is to be removed on publication as 1821 an RFC. 1823 15. Acknowledgements 1825 The authors would like to thank Bernard Aboba, Harald Alvestrand, 1826 Cary Bran, Charles Eckel, Christian Groves, Cullen Jennings, Dan 1827 Romascanu, Martin Thomson, and the other members of the IETF RTCWEB 1828 working group for their valuable feedback. 1830 16. References 1832 16.1. Normative References 1834 [I-D.ietf-avtcore-multi-media-rtp-session] 1835 Westerlund, M., Perkins, C., and J. 
Lennox, "Sending 1836 Multiple Types of Media in a Single RTP Session", draft- 1837 ietf-avtcore-multi-media-rtp-session-05 (work in 1838 progress), February 2014. 1840 [I-D.ietf-avtcore-rtp-circuit-breakers] 1841 Perkins, C. and V. Singh, "Multimedia Congestion Control: 1842 Circuit Breakers for Unicast RTP Sessions", draft-ietf- 1843 avtcore-rtp-circuit-breakers-05 (work in progress), 1844 February 2014. 1846 [I-D.ietf-avtcore-rtp-multi-stream-optimisation] 1847 Lennox, J., Westerlund, M., Wu, W., and C. Perkins, 1848 "Sending Multiple Media Streams in a Single RTP Session: 1849 Grouping RTCP Reception Statistics and Other Feedback", 1850 draft-ietf-avtcore-rtp-multi-stream-optimisation-02 (work 1851 in progress), February 2014. 1853 [I-D.ietf-avtcore-rtp-multi-stream] 1854 Lennox, J., Westerlund, M., Wu, W., and C. Perkins, 1855 "Sending Multiple Media Streams in a Single RTP Session", 1856 draft-ietf-avtcore-rtp-multi-stream-03 (work in progress), 1857 February 2014. 1859 [I-D.ietf-rtcweb-security-arch] 1860 Rescorla, E., "WebRTC Security Architecture", draft-ietf- 1861 rtcweb-security-arch-09 (work in progress), February 2014. 1863 [I-D.ietf-rtcweb-security] 1864 Rescorla, E., "Security Considerations for WebRTC", draft- 1865 ietf-rtcweb-security-06 (work in progress), January 2014. 1867 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1868 Requirement Levels", BCP 14, RFC 2119, March 1997. 1870 [RFC2736] Handley, M. and C. Perkins, "Guidelines for Writers of RTP 1871 Payload Format Specifications", BCP 36, RFC 2736, December 1872 1999. 1874 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 1875 Jacobson, "RTP: A Transport Protocol for Real-Time 1876 Applications", STD 64, RFC 3550, July 2003. 1878 [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and 1879 Video Conferences with Minimal Control", STD 65, RFC 3551, 1880 July 2003. 
1882 [RFC3556] Casner, S., "Session Description Protocol (SDP) Bandwidth 1883 Modifiers for RTP Control Protocol (RTCP) Bandwidth", RFC 1884 3556, July 2003. 1886 [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. 1887 Norrman, "The Secure Real-time Transport Protocol (SRTP)", 1888 RFC 3711, March 2004. 1890 [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session 1891 Description Protocol", RFC 4566, July 2006. 1893 [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey, 1894 "Extended RTP Profile for Real-time Transport Control 1895 Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July 1896 2006. 1898 [RFC4588] Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R. 1899 Hakenberg, "RTP Retransmission Payload Format", RFC 4588, 1900 July 2006. 1902 [RFC4961] Wing, D., "Symmetric RTP / RTP Control Protocol (RTCP)", 1903 BCP 131, RFC 4961, July 2007. 1905 [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman, 1906 "Codec Control Messages in the RTP Audio-Visual Profile 1907 with Feedback (AVPF)", RFC 5104, February 2008. 1909 [RFC5124] Ott, J. and E. Carrara, "Extended Secure RTP Profile for 1910 Real-time Transport Control Protocol (RTCP)-Based Feedback 1911 (RTP/SAVPF)", RFC 5124, February 2008. 1913 [RFC5285] Singer, D. and H. Desineni, "A General Mechanism for RTP 1914 Header Extensions", RFC 5285, July 2008. 1916 [RFC5506] Johansson, I. and M. Westerlund, "Support for Reduced-Size 1917 Real-Time Transport Control Protocol (RTCP): Opportunities 1918 and Consequences", RFC 5506, April 2009. 1920 [RFC5761] Perkins, C. and M. Westerlund, "Multiplexing RTP Data and 1921 Control Packets on a Single Port", RFC 5761, April 2010. 1923 [RFC5764] McGrew, D. and E. Rescorla, "Datagram Transport Layer 1924 Security (DTLS) Extension to Establish Keys for the Secure 1925 Real-time Transport Protocol (SRTP)", RFC 5764, May 2010. 1927 [RFC6051] Perkins, C. and T. 
Schierl, "Rapid Synchronisation of RTP 1928 Flows", RFC 6051, November 2010. 1930 [RFC6464] Lennox, J., Ivov, E., and E. Marocco, "A Real-time 1931 Transport Protocol (RTP) Header Extension for Client-to- 1932 Mixer Audio Level Indication", RFC 6464, December 2011. 1934 [RFC6465] Ivov, E., Marocco, E., and J. Lennox, "A Real-time 1935 Transport Protocol (RTP) Header Extension for Mixer-to- 1936 Client Audio Level Indication", RFC 6465, December 2011. 1938 [RFC6562] Perkins, C. and JM. Valin, "Guidelines for the Use of 1939 Variable Bit Rate Audio with Secure RTP", RFC 6562, March 1940 2012. 1942 [RFC6904] Lennox, J., "Encryption of Header Extensions in the Secure 1943 Real-time Transport Protocol (SRTP)", RFC 6904, April 1944 2013. 1946 [RFC7007] Terriberry, T., "Update to Remove DVI4 from the 1947 Recommended Codecs for the RTP Profile for Audio and Video 1948 Conferences with Minimal Control (RTP/AVP)", RFC 7007, 1949 August 2013. 1951 [RFC7022] Begen, A., Perkins, C., Wing, D., and E. Rescorla, 1952 "Guidelines for Choosing RTP Control Protocol (RTCP) 1953 Canonical Names (CNAMEs)", RFC 7022, September 2013. 1955 [RFC7160] Petit-Huguenin, M. and G. Zorn, "Support for Multiple 1956 Clock Rates in an RTP Session", RFC 7160, April 2014. 1958 [RFC7164] Gross, K. and R. Brandenburg, "RTP and Leap Seconds", RFC 1959 7164, March 2014. 1961 [W3C.WD-mediacapture-streams-20130903] 1962 Burnett, D., Bergkvist, A., Jennings, C., and A. 1963 Narayanan, "Media Capture and Streams", World Wide Web 1964 Consortium WD WD-mediacapture-streams-20130903, September 1965 2013, . 1968 [W3C.WD-webrtc-20130910] 1969 Bergkvist, A., Burnett, D., Jennings, C., and A. 1970 Narayanan, "WebRTC 1.0: Real-time Communication Between 1971 Browsers", World Wide Web Consortium WD WD- 1972 webrtc-20130910, September 2013, 1973 . 1975 16.2. Informative References 1977 [I-D.ietf-avtcore-multiplex-guidelines] 1978 Westerlund, M., Perkins, C., and H. 
Alvestrand, 1979 "Guidelines for using the Multiplexing Features of RTP to 1980 Support Multiple Media Streams", draft-ietf-avtcore- 1981 multiplex-guidelines-02 (work in progress), January 2014. 1983 [I-D.ietf-avtcore-rtp-topologies-update] 1984 Westerlund, M. and S. Wenger, "RTP Topologies", draft- 1985 ietf-avtcore-rtp-topologies-update-01 (work in progress), 1986 October 2013. 1988 [I-D.ietf-avtext-rtp-grouping-taxonomy] 1989 Lennox, J., Gross, K., Nandakumar, S., and G. Salgueiro, 1990 "A Taxonomy of Grouping Semantics and Mechanisms for Real- 1991 Time Transport Protocol (RTP) Sources", draft-ietf-avtext- 1992 rtp-grouping-taxonomy-01 (work in progress), February 1993 2014. 1995 [I-D.ietf-mmusic-msid] 1996 Alvestrand, H., "WebRTC MediaStream Identification in the 1997 Session Description Protocol", draft-ietf-mmusic-msid-05 1998 (work in progress), March 2014. 2000 [I-D.ietf-mmusic-sdp-bundle-negotiation] 2001 Holmberg, C., Alvestrand, H., and C. Jennings, 2002 "Negotiating Media Multiplexing Using the Session 2003 Description Protocol (SDP)", draft-ietf-mmusic-sdp-bundle- 2004 negotiation-07 (work in progress), April 2014. 2006 [I-D.ietf-payload-rtp-howto] 2007 Westerlund, M., "How to Write an RTP Payload Format", 2008 draft-ietf-payload-rtp-howto-13 (work in progress), 2009 January 2014. 2011 [I-D.ietf-rmcat-cc-requirements] 2012 Jesup, R., "Congestion Control Requirements For RMCAT", 2013 draft-ietf-rmcat-cc-requirements-04 (work in progress), 2014 April 2014. 2016 [I-D.ietf-rtcweb-audio] 2017 Valin, J. and C. Bran, "WebRTC Audio Codec and Processing 2018 Requirements", draft-ietf-rtcweb-audio-05 (work in 2019 progress), February 2014. 2021 [I-D.ietf-rtcweb-overview] 2022 Alvestrand, H., "Overview: Real Time Protocols for Brower- 2023 based Applications", draft-ietf-rtcweb-overview-09 (work 2024 in progress), February 2014. 2026 [I-D.ietf-rtcweb-use-cases-and-requirements] 2027 Holmberg, C., Hakansson, S., and G. 
Eriksson, "Web Real- 2028 Time Communication Use-cases and Requirements", draft- 2029 ietf-rtcweb-use-cases-and-requirements-14 (work in 2030 progress), February 2014. 2032 [I-D.ietf-tsvwg-rtcweb-qos] 2033 Dhesikan, S., Druta, D., Jones, P., and J. Polk, "DSCP and 2034 other packet markings for RTCWeb QoS", draft-ietf-tsvwg- 2035 rtcweb-qos-00 (work in progress), April 2014. 2037 [RFC3611] Friedman, T., Caceres, R., and A. Clark, "RTP Control 2038 Protocol Extended Reports (RTCP XR)", RFC 3611, November 2039 2003. 2041 [RFC4341] Floyd, S. and E. Kohler, "Profile for Datagram Congestion 2042 Control Protocol (DCCP) Congestion Control ID 2: TCP-like 2043 Congestion Control", RFC 4341, March 2006. 2045 [RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for 2046 Datagram Congestion Control Protocol (DCCP) Congestion 2047 Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342, 2048 March 2006. 2050 [RFC4383] Baugher, M. and E. Carrara, "The Use of Timed Efficient 2051 Stream Loss-Tolerant Authentication (TESLA) in the Secure 2052 Real-time Transport Protocol (SRTP)", RFC 4383, February 2053 2006. 2055 [RFC4828] Floyd, S. and E. Kohler, "TCP Friendly Rate Control 2056 (TFRC): The Small-Packet (SP) Variant", RFC 4828, April 2057 2007. 2059 [RFC5348] Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP 2060 Friendly Rate Control (TFRC): Protocol Specification", RFC 2061 5348, September 2008. 2063 [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific 2064 Media Attributes in the Session Description Protocol 2065 (SDP)", RFC 5576, June 2009. 2067 [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion 2068 Control", RFC 5681, September 2009. 2070 [RFC5968] Ott, J. and C. Perkins, "Guidelines for Extending the RTP 2071 Control Protocol (RTCP)", RFC 5968, September 2010. 2073 [RFC6263] Marjou, X. and A. 
Sollaud, "Application Mechanism for 2074 Keeping Alive the NAT Mappings Associated with RTP / RTP 2075 Control Protocol (RTCP) Flows", RFC 6263, June 2011. 2077 [RFC6792] Wu, Q., Hunt, G., and P. Arden, "Guidelines for Use of the 2078 RTP Monitoring Framework", RFC 6792, November 2012. 2080 Authors' Addresses 2082 Colin Perkins 2083 University of Glasgow 2084 School of Computing Science 2085 Glasgow G12 8QQ 2086 United Kingdom 2088 Email: csp@csperkins.org 2089 URI: http://csperkins.org/ 2090 Magnus Westerlund 2091 Ericsson 2092 Farogatan 6 2093 SE-164 80 Kista 2094 Sweden 2096 Phone: +46 10 714 82 87 2097 Email: magnus.westerlund@ericsson.com 2099 Joerg Ott 2100 Aalto University 2101 School of Electrical Engineering 2102 Espoo 02150 2103 Finland 2105 Email: jorg.ott@aalto.fi