idnits 2.17.1 draft-ietf-rtcweb-rtp-usage-11.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does
     not match the current year

  -- The document date (December 16, 2013) is 3777 days in the past.  Is
     this intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative
     references to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-13) exists of
     draft-ietf-avtcore-multi-media-rtp-session-03

  == Outdated reference: A later version (-18) exists of
     draft-ietf-avtcore-rtp-circuit-breakers-03

  == Outdated reference: A later version (-12) exists of
     draft-ietf-avtcore-rtp-multi-stream-optimisation-00

  == Outdated reference: A later version (-11) exists of
     draft-ietf-avtcore-rtp-multi-stream-01

  == Outdated reference: A later version (-54) exists of
     draft-ietf-mmusic-sdp-bundle-negotiation-05

  == Outdated reference: A later version (-20) exists of
     draft-ietf-rtcweb-security-arch-07

  == Outdated reference: A later version (-12) exists of
     draft-ietf-rtcweb-security-05

  ** Obsolete normative reference: RFC 5285 (Obsoleted by RFC 8285)

  == Outdated reference: A later version (-07) exists of
     draft-dhesikan-tsvwg-rtcweb-qos-03

  == Outdated reference: A later version (-12) exists of
     draft-ietf-avtcore-multiplex-guidelines-01

  == Outdated reference: A later version (-10) exists of
     draft-ietf-avtcore-rtp-topologies-update-01

  == Outdated reference: A later version (-17) exists of
     draft-ietf-mmusic-msid-02

  == Outdated reference: A later version (-19) exists of
     draft-ietf-rtcweb-overview-08

  == Outdated reference: A later version (-16) exists of
     draft-ietf-rtcweb-use-cases-and-requirements-12

     Summary: 1 error (**), 0 flaws (~~), 14 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information
     about the items above.

--------------------------------------------------------------------------------

RTCWEB Working Group                                          C. Perkins
Internet-Draft                                     University of Glasgow
Intended status: Standards Track                           M. Westerlund
Expires: June 19, 2014                                          Ericsson
                                                                  J. Ott
                                                        Aalto University
                                                       December 16, 2013

   Web Real-Time Communication (WebRTC): Media Transport and Use of RTP
                      draft-ietf-rtcweb-rtp-usage-11

Abstract

   The Web Real-Time Communication (WebRTC) framework provides support
   for direct interactive rich communication using audio, video, text,
   collaboration, games, etc. between two peers' web-browsers.  This
   memo describes the media transport aspects of the WebRTC framework.
   It specifies how the Real-time Transport Protocol (RTP) is used in
   the WebRTC context, and gives requirements for which RTP features,
   profiles, and extensions need to be supported.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.
   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 19, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Rationale
   3.  Terminology
   4.  WebRTC Use of RTP: Core Protocols
       4.1.  RTP and RTCP
       4.2.  Choice of the RTP Profile
       4.3.  Choice of RTP Payload Formats
       4.4.  Use of RTP Sessions
       4.5.  RTP and RTCP Multiplexing
       4.6.  Reduced Size RTCP
       4.7.  Symmetric RTP/RTCP
       4.8.  Choice of RTP Synchronisation Source (SSRC)
       4.9.  Generation of the RTCP Canonical Name (CNAME)
   5.  WebRTC Use of RTP: Extensions
       5.1.  Conferencing Extensions
             5.1.1.  Full Intra Request (FIR)
             5.1.2.  Picture Loss Indication (PLI)
             5.1.3.  Slice Loss Indication (SLI)
             5.1.4.  Reference Picture Selection Indication (RPSI)
             5.1.5.  Temporal-Spatial Trade-off Request (TSTR)
             5.1.6.  Temporary Maximum Media Stream Bit Rate Request (TMMBR)
       5.2.  Header Extensions
             5.2.1.  Rapid Synchronisation
             5.2.2.  Client-to-Mixer Audio Level
             5.2.3.  Mixer-to-Client Audio Level
             5.2.4.  Associating RTP Media Streams and Signalling Contexts
   6.  WebRTC Use of RTP: Improving Transport Robustness
       6.1.  Negative Acknowledgements and RTP Retransmission
       6.2.  Forward Error Correction (FEC)
   7.  WebRTC Use of RTP: Rate Control and Media Adaptation
       7.1.  Boundary Conditions and Circuit Breakers
       7.2.  RTCP Limitations for Congestion Control
       7.3.  Congestion Control Interoperability and Legacy Systems
   8.  WebRTC Use of RTP: Performance Monitoring
   9.  WebRTC Use of RTP: Future Extensions
   10. Signalling Considerations
   11. WebRTC API Considerations
   12. RTP Implementation Considerations
       12.1.  Configuration and Use of RTP Sessions
              12.1.1.  Use of Multiple Media Flows Within an RTP Session
              12.1.2.  Use of Multiple RTP Sessions
              12.1.3.  Differentiated Treatment of Flows
       12.2.  Source, Flow, and Participant Identification
              12.2.1.  Media Streams
              12.2.2.  Media Streams: SSRC Collision Detection
              12.2.3.  Media Synchronisation Context
   13. Security Considerations
   14. IANA Considerations
   15. Open Issues
   16. Acknowledgements
   17. References
       17.1.  Normative References
       17.2.  Informative References
   Authors' Addresses

1.  Introduction

   The Real-time Transport Protocol (RTP) [RFC3550] provides a framework
   for delivery of audio and video teleconferencing data and other real-
   time media applications.  Previous work has defined the RTP protocol,
   along with numerous profiles, payload formats, and other extensions.
   When combined with appropriate signalling, these form the basis for
   many teleconferencing systems.

   The Web Real-Time communication (WebRTC) framework provides the
   protocol building blocks to support direct, interactive, real-time
   communication using audio, video, collaboration, games, etc., between
   two peers' web-browsers.  This memo describes how the RTP framework
   is to be used in the WebRTC context.  It proposes a baseline set of
   RTP features that are to be implemented by all WebRTC-aware end-
   points, along with suggested extensions for enhanced functionality.

   This memo specifies a protocol intended for use within the WebRTC
   framework, but is not restricted to that context.  An overview of the
   WebRTC framework is given in [I-D.ietf-rtcweb-overview].
   The structure of this memo is as follows.  Section 2 outlines our
   rationale in preparing this memo and choosing these RTP features.
   Section 3 defines terminology.  Requirements for core RTP protocols
   are described in Section 4, and suggested RTP extensions are
   described in Section 5.  Section 6 outlines mechanisms that can
   increase robustness to network problems, while Section 7 describes
   congestion control and rate adaptation mechanisms.  The discussion of
   mandated RTP mechanisms concludes in Section 8 with a review of
   performance monitoring and network management tools that can be used
   in the WebRTC context.  Section 9 gives some guidelines for future
   incorporation of other RTP and RTP Control Protocol (RTCP) extensions
   into this framework.  Section 10 describes requirements placed on the
   signalling channel.  Section 11 discusses the relationship between
   features of the RTP framework and the WebRTC application programming
   interface (API), and Section 12 discusses RTP implementation
   considerations.  The memo concludes with security considerations
   (Section 13) and IANA considerations (Section 14).

2.  Rationale

   The RTP framework comprises the RTP data transfer protocol, the RTP
   control protocol, and numerous RTP payload formats, profiles, and
   extensions.  This range of add-ons has allowed RTP to meet various
   needs that were not envisaged by the original protocol designers, and
   to support many new media encodings, but raises the question of which
   extensions are to be supported by new implementations.  The
   development of the WebRTC framework provides an opportunity for us to
   review the available RTP features and extensions, and to define a
   common baseline feature set for all WebRTC implementations of RTP.
   This builds on the past 20 years of development of RTP to mandate the
   use of extensions that have shown widespread utility, while still
   remaining compatible with the wide installed base of RTP
   implementations where possible.

   Other RTP and RTCP extensions not discussed in this document can be
   implemented by WebRTC end-points if they are beneficial for new use
   cases.  However, they are not necessary to address the WebRTC use
   cases and requirements identified to date
   [I-D.ietf-rtcweb-use-cases-and-requirements].

   While the baseline set of RTP features and extensions defined in this
   memo is targeted at the requirements of the WebRTC framework, it is
   expected to be broadly useful for other conferencing-related uses of
   RTP.  In particular, it is likely that this set of RTP features and
   extensions will be appropriate for other desktop or mobile video
   conferencing systems, or for room-based high-quality telepresence
   applications.

3.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].  The RFC
   2119 interpretation of these key words applies only when written in
   ALL CAPS.  Lower- or mixed-case uses of these key words are not to be
   interpreted as carrying special significance in this memo.

   We define the following terms:

   RTP Media Stream:  A sequence of RTP packets, and associated RTCP
      packets, using a single synchronisation source (SSRC) that
      together carries part or all of the content of a specific Media
      Type from a specific sender source within a given RTP session.

   RTP Session:  As defined by [RFC3550], the endpoints belonging to the
      same RTP Session are those that share a single SSRC space.  That
      is, those endpoints can see an SSRC identifier transmitted by any
      one of the other endpoints.
      An endpoint can see an SSRC either directly in RTP and RTCP
      packets, or as a contributing source (CSRC) in RTP packets from a
      mixer.  The RTP Session scope is hence decided by the endpoints'
      network interconnection topology, in combination with RTP and RTCP
      forwarding strategies deployed by endpoints and any
      interconnecting middle nodes.

   WebRTC MediaStream:  The MediaStream concept defined by the W3C in
      the API.

   Other terms are used according to their definitions from the RTP
   Specification [RFC3550].

4.  WebRTC Use of RTP: Core Protocols

   The following sections describe the core features of RTP and RTCP
   that need to be implemented, along with the mandated RTP profiles and
   payload formats.  Also described are the core extensions providing
   essential features that all WebRTC implementations need to implement
   to function effectively on today's networks.

4.1.  RTP and RTCP

   The Real-time Transport Protocol (RTP) [RFC3550] is REQUIRED to be
   implemented as the media transport protocol for WebRTC.  RTP itself
   comprises two parts: the RTP data transfer protocol, and the RTP
   control protocol (RTCP).  RTCP is a fundamental and integral part of
   RTP, and MUST be implemented in all WebRTC applications.

   The following RTP and RTCP features are sometimes omitted in limited
   functionality implementations of RTP, but are REQUIRED in all WebRTC
   implementations:

   o  Support for use of multiple simultaneous SSRC values in a single
      RTP session, including support for RTP end-points that send many
      SSRC values simultaneously, following [RFC3550] and
      [I-D.ietf-avtcore-rtp-multi-stream].  Support for the RTCP
      optimisations for multi-SSRC sessions defined in
      [I-D.ietf-avtcore-rtp-multi-stream-optimisation] is RECOMMENDED.

   o  Random choice of SSRC on joining a session; collision detection
      and resolution for SSRC values (see also Section 4.8).

   o  Support for reception of RTP data packets containing CSRC lists,
      as generated by RTP mixers, and RTCP packets relating to CSRCs.

   o  Sending correct synchronisation information in the RTCP Sender
      Reports, to allow receivers to implement lip-sync, with support
      for the rapid RTP synchronisation extensions (see Section 5.2.1)
      being RECOMMENDED.

   o  Support for multiple synchronisation contexts.  Participants that
      send multiple simultaneous RTP media streams MAY do so as part of
      a single synchronisation context, using a single RTCP CNAME for
      all streams and allowing receivers to play the streams out in a
      synchronised manner, or they MAY use different synchronisation
      contexts, and hence different RTCP CNAMEs, for some or all of the
      streams.  Receivers MUST support reception of multiple RTCP CNAMEs
      from each participant in an RTP session.  See also Section 4.9.

   o  Support for sending and receiving RTCP SR, RR, SDES, and BYE
      packet types, with OPTIONAL support for other RTCP packet types;
      implementations MUST ignore unknown RTCP packet types.  Note that
      additional RTCP packet types are needed by the RTP/SAVPF profile
      (Section 4.2) and the other RTCP extensions (Section 5).

   o  Support for multiple end-points in a single RTP session, and for
      scaling the RTCP transmission interval according to the number of
      participants in the session; support for randomised RTCP
      transmission intervals to avoid synchronisation of RTCP reports;
      support for RTCP timer reconsideration.

   o  Support for configuring the RTCP bandwidth as a fraction of the
      media bandwidth, and for configuring the fraction of the RTCP
      bandwidth allocated to senders, e.g., using the SDP "b=" line.

   It is known that a significant number of legacy RTP implementations,
   especially those targeted at VoIP-only systems, do not support all of
   the above features, and in some cases do not support RTCP at all.
   Implementers are advised to consider the requirements for graceful
   degradation when interoperating with legacy implementations.

   Other implementation considerations are discussed in Section 12.

4.2.  Choice of the RTP Profile

   The complete specification of RTP for a particular application domain
   requires the choice of an RTP Profile.  For WebRTC use, the Extended
   Secure RTP Profile for RTCP-Based Feedback (RTP/SAVPF) [RFC5124], as
   extended by [RFC7007], MUST be implemented.  This builds on the basic
   RTP/AVP profile [RFC3551], the RTP profile for RTCP-based feedback
   (RTP/AVPF) [RFC4585], and the secure RTP profile (RTP/SAVP)
   [RFC3711].

   The RTCP-based feedback extensions [RFC4585] are needed for the
   improved RTCP timer model, which allows more flexible transmission of
   RTCP packets in response to events, rather than strictly according to
   bandwidth.  This is vital for being able to report congestion events.
   These extensions also save RTCP bandwidth, and will commonly only use
   the full RTCP bandwidth allocation if there are many events that
   require feedback.  They are also needed to make use of the RTP
   conferencing extensions discussed in Section 5.1.

      Note: The enhanced RTCP timer model defined in the RTP/AVPF
      profile is backwards compatible with legacy systems that implement
      only the base RTP/AVP profile, given some constraints on parameter
      configuration such as the RTCP bandwidth value and "trr-int" (the
      most important factor for interworking with RTP/AVP end-points via
      a gateway is to set the trr-int parameter to a value representing
      4 seconds).

   The secure RTP profile [RFC3711] is needed to provide media
   encryption, integrity protection, replay protection, and a limited
   form of source authentication.  WebRTC implementations MUST NOT send
   packets using the basic RTP/AVP profile or the RTP/AVPF profile; they
   MUST employ the full RTP/SAVPF profile to protect all RTP and RTCP
   packets that are generated.  The default and mandatory-to-implement
   transforms listed in Section 5 of [RFC3711] SHALL apply.

   The keying mechanism(s) to be used with the RTP/SAVPF profile are
   defined in Section 5.5 of [I-D.ietf-rtcweb-security-arch] or its
   replacement.

4.3.  Choice of RTP Payload Formats

   The set of mandatory-to-implement codecs and RTP payload formats for
   WebRTC is not specified in this memo.  Implementations can support
   any codec for which an RTP payload format and associated signalling
   is defined.  Implementations cannot assume that the other
   participants in an RTP session understand any RTP payload format, no
   matter how common; the mapping between RTP payload type numbers and
   specific configurations of particular RTP payload formats MUST be
   agreed before those payload types/formats can be used.  In an SDP
   context, this can be done using the "a=rtpmap:" and "a=fmtp:"
   attributes associated with an "m=" line.

   Endpoints can signal support for multiple RTP payload formats, or
   multiple configurations of a single RTP payload format, as long as
   each unique RTP payload format configuration uses a different RTP
   payload type number.  As outlined in Section 4.8, the RTP payload
   type number is sometimes used to associate an RTP media stream with a
   signalling context.  This association is possible provided unique RTP
   payload type numbers are used in each context.  For example, an RTP
   media stream can be associated with an SDP "m=" line by comparing the
   RTP payload type numbers used by the media stream with the payload
   types signalled in the "a=rtpmap:" lines in the media sections of the
   SDP.
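   The payload-type-based association described above can be sketched in
   code.  This is a minimal illustration over a simplified SDP subset;
   the helper name is hypothetical, and a real implementation must parse
   the full SDP grammar rather than match line prefixes:

```python
# Sketch: associate an RTP media stream with an SDP "m=" line via its
# RTP payload type, by collecting the "a=rtpmap:" declarations of each
# media section.  Simplified SDP handling; illustrative only.

def payload_type_map(sdp: str) -> dict[int, int]:
    """Map RTP payload type number -> index of the SDP media section
    ("m=" line) that declares it.  Raises if a payload type is reused
    across media sections, since the association would be ambiguous."""
    pt_to_media: dict[int, int] = {}
    media_index = -1
    for line in sdp.splitlines():
        if line.startswith("m="):
            media_index += 1
        elif line.startswith("a=rtpmap:") and media_index >= 0:
            pt = int(line[len("a=rtpmap:"):].split()[0])
            if pt in pt_to_media and pt_to_media[pt] != media_index:
                raise ValueError(f"payload type {pt} is not unique")
            pt_to_media[pt] = media_index
    return pt_to_media

sdp = """v=0
m=audio 49170 RTP/SAVPF 111
a=rtpmap:111 opus/48000/2
m=video 49172 RTP/SAVPF 96
a=rtpmap:96 VP8/90000
"""
mapping = payload_type_map(sdp)
# An incoming RTP stream using payload type 96 maps to the second
# ("video") media section:
print(mapping[96])  # -> 1
```

   Note that this only works when payload type numbers are unique across
   media sections, which is exactly the uniqueness requirement stated in
   this section.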
   If RTP media streams are being associated with signalling contexts
   based on the RTP payload type, then the assignment of RTP payload
   type numbers MUST be unique across signalling contexts; if the same
   RTP payload format configuration is used in multiple contexts, then a
   different RTP payload type number has to be assigned in each context
   to ensure uniqueness.  If the RTP payload type number is not being
   used to associate RTP media streams with a signalling context, then
   the same RTP payload type number can be used to indicate the exact
   same RTP payload format configuration in multiple contexts.

   An endpoint that has signalled support for multiple RTP payload
   formats SHOULD accept data in any of those payload formats at any
   time, unless it has previously signalled limitations on its decoding
   capability.  This requirement is constrained if several types of
   media (e.g., audio and video) are sent in the same RTP session.  In
   such a case, a source (SSRC) is restricted to switching only between
   the RTP payload formats signalled for the type of media that is being
   sent by that source; see Section 4.4.  To support rapid rate
   adaptation by changing codec, RTP does not require advance signalling
   for changes between RTP payload formats that were signalled during
   session set-up.

   An RTP sender that changes between two RTP payload types that use
   different RTP clock rates MUST follow the recommendations in
   Section 4.1 of [I-D.ietf-avtext-multiple-clock-rates].  RTP receivers
   MUST follow the recommendations in Section 4.3 of
   [I-D.ietf-avtext-multiple-clock-rates], in order to support sources
   that switch between clock rates in an RTP session (these
   recommendations for receivers are backwards compatible with the case
   where senders use only a single clock rate).

4.4.  Use of RTP Sessions

   An association amongst a set of participants communicating using RTP
   is known as an RTP session.  A participant can be involved in several
   RTP sessions at the same time.  In a multimedia session, each type of
   media has typically been carried in a separate RTP session (e.g.,
   using one RTP session for the audio, and a separate RTP session using
   different transport addresses for the video).  WebRTC implementations
   of RTP are REQUIRED to implement support for multimedia sessions in
   this way, separating each session using different transport-layer
   addresses (e.g., different UDP ports) for compatibility with legacy
   systems.

   In modern-day networks, however, with the widespread use of network
   address/port translators (NAT/NAPT) and firewalls, it is desirable to
   reduce the number of transport-layer flows used by RTP applications.
   This can be done by sending all the RTP media streams in a single RTP
   session, which will comprise a single transport-layer flow (this will
   prevent the use of some quality-of-service mechanisms, as discussed
   in Section 12.1.3).  Implementations are REQUIRED to support
   transport of all RTP media streams, independent of media type, in a
   single RTP session according to
   [I-D.ietf-avtcore-multi-media-rtp-session].  If multiple types of
   media are to be used in a single RTP session, all participants in
   that session MUST agree to this usage.  In an SDP context,
   [I-D.ietf-mmusic-sdp-bundle-negotiation] can be used to signal this.

   It is also possible to use a shim-based approach to run multiple RTP
   sessions on a single transport-layer flow.  This gives advantages in
   some gateway scenarios, and makes it easy to distinguish groups of
   RTP media streams that might need distinct processing.  One way of
   doing this is described in
   [I-D.westerlund-avtcore-transport-multiplexing].
   At the time of this writing, there is no consensus to use a shim-
   based approach in WebRTC implementations.

   Further discussion about when different RTP session structures and
   multiplexing methods are suitable can be found in
   [I-D.ietf-avtcore-multiplex-guidelines].

4.5.  RTP and RTCP Multiplexing

   Historically, RTP and RTCP have been run on separate transport-layer
   addresses (e.g., two UDP ports for each RTP session, one port for RTP
   and one port for RTCP).  With the increased use of Network Address/
   Port Translation (NAPT) this has become problematic, since
   maintaining multiple NAT bindings can be costly.  It also complicates
   firewall administration, since multiple ports need to be opened to
   allow RTP traffic.  To reduce these costs and session set-up times,
   support for multiplexing RTP data packets and RTCP control packets on
   a single port for each RTP session is REQUIRED, as specified in
   [RFC5761].  For backwards compatibility, implementations are also
   REQUIRED to support RTP and RTCP sent on separate transport-layer
   addresses.

   Note that the use of RTP and RTCP multiplexed onto a single transport
   port ensures that there is occasional traffic sent on that port, even
   if there is no active media traffic.  This can be useful to keep NAT
   bindings alive, and is the recommended method for application-level
   keep-alives of RTP sessions [RFC6263].

4.6.  Reduced Size RTCP

   RTCP packets are usually sent as compound RTCP packets, and [RFC3550]
   requires that those compound packets start with a Sender Report (SR)
   or Receiver Report (RR) packet.  When using frequent RTCP feedback
   messages under the RTP/AVPF profile [RFC4585], these statistics are
   not needed in every packet, and unnecessarily increase the mean RTCP
   packet size.  This can limit the frequency at which RTCP packets can
   be sent within the RTCP bandwidth share.
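   The trade-off between mean RTCP packet size and achievable reporting
   frequency follows from the RTCP scheduling rules of [RFC3550]: the
   reporting interval grows linearly with the mean packet size.  A
   simplified sketch of the deterministic part of that calculation
   (omitting the randomisation factor, timer reconsideration, and the
   reduced-minimum rules; parameter values are illustrative):

```python
# Simplified deterministic RTCP interval, after RFC 3550, Section 6.3:
# the interval grows with the mean compound RTCP packet size, so a
# smaller mean size (e.g. via reduced-size RTCP) permits more frequent
# feedback.  Randomisation and timer reconsideration are omitted.

def rtcp_interval(members: int, avg_rtcp_size: float,
                  session_bw: float, rtcp_fraction: float = 0.05,
                  minimum: float = 5.0) -> float:
    """Deterministic RTCP transmission interval, in seconds.

    members:        current estimate of the number of session members
    avg_rtcp_size:  mean compound RTCP packet size, in octets
    session_bw:     session bandwidth, in octets per second
    rtcp_fraction:  share of session bandwidth given to RTCP (5% default)
    """
    rtcp_bw = session_bw * rtcp_fraction
    return max(minimum, members * avg_rtcp_size / rtcp_bw)

# A 64 kbit/s (8000 octets/s) session with 100 members: halving the
# mean RTCP packet size halves the computed interval.
print(rtcp_interval(100, avg_rtcp_size=120, session_bw=8000))  # 30.0
print(rtcp_interval(100, avg_rtcp_size=60, session_bw=8000))   # 15.0
```

   With only two members both values are clamped by the 5-second
   minimum; the size/frequency trade-off matters most in larger
   sessions and when the AVPF early-feedback rules apply.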
   To avoid this problem, [RFC5506] specifies how to reduce the mean
   RTCP message size and allow for more frequent feedback.  Frequent
   feedback, in turn, is essential to make real-time applications
   quickly aware of changing network conditions, and to allow them to
   adapt their transmission and encoding behaviour.  Support for non-
   compound RTCP feedback packets [RFC5506] is REQUIRED, but MUST be
   negotiated using the signalling channel before use.  For backwards
   compatibility, implementations are also REQUIRED to support the use
   of compound RTCP feedback packets if the remote endpoint does not
   agree to the use of non-compound RTCP in the signalling exchange.

4.7.  Symmetric RTP/RTCP

   To ease traversal of NAT and firewall devices, implementations are
   REQUIRED to implement and use symmetric RTP [RFC4961].  The primary
   reason for using symmetric RTP is to avoid problems with NATs and
   firewalls: it ensures that the flow is actually bi-directional, and
   is therefore kept alive and registered as a flow that the intended
   recipient wants.  It also saves resources, specifically ports at the
   end-points, and reduces state in the network, since NAT mappings and
   firewall state are not unnecessarily bloated; the amount of QoS state
   is also reduced.

4.8.  Choice of RTP Synchronisation Source (SSRC)

   Implementations are REQUIRED to support signalled RTP synchronisation
   source (SSRC) identifiers, using the "a=ssrc:" SDP attribute defined
   in Section 4.1 and Section 5 of [RFC5576].  Implementations MUST also
   support the "previous-ssrc" source attribute defined in Section 6.2
   of [RFC5576].  Other per-SSRC attributes defined in [RFC5576] MAY be
   supported.

   Use of the "a=ssrc:" attribute to signal SSRC identifiers in an RTP
   session is OPTIONAL.  Implementations MUST be prepared to accept RTP
   and RTCP packets using SSRCs that have not been explicitly signalled
   ahead of time.
   Implementations MUST support random SSRC assignment, and MUST support
   SSRC collision detection and resolution, according to [RFC3550].
   When using signalled SSRC values, collision detection MUST be
   performed as described in Section 5 of [RFC5576].

   It is often desirable to associate an RTP media stream with a non-RTP
   context (e.g., to associate an RTP media stream with an "m=" line in
   a session description formatted using SDP).  If SSRCs are signalled,
   this is straightforward (in SDP the "a=ssrc:" line will be at the
   media level, allowing a direct association with an "m=" line).  If
   SSRCs are not signalled, the RTP payload type numbers used in an RTP
   media stream are often sufficient to associate that media stream with
   a signalling context (e.g., if RTP payload type numbers are assigned
   as described in Section 4.3 of this memo, the RTP payload types used
   by an RTP media stream can be compared with values in SDP "a=rtpmap:"
   lines, which are at the media level in SDP, and so map to an "m="
   line).

4.9.  Generation of the RTCP Canonical Name (CNAME)

   The RTCP Canonical Name (CNAME) provides a persistent transport-level
   identifier for an RTP endpoint.  While the Synchronisation Source
   (SSRC) identifier for an RTP endpoint can change if a collision is
   detected, or when the RTP application is restarted, its RTCP CNAME is
   meant to stay unchanged, so that RTP endpoints can be uniquely
   identified and associated with their RTP media streams within a set
   of related RTP sessions.  For proper functionality, each RTP endpoint
   needs to have at least one unique RTCP CNAME value.  An endpoint MAY
   have multiple CNAMEs, as the CNAME also identifies a particular
   synchronisation context; i.e., all SSRCs associated with a CNAME
   share a common reference clock, so an endpoint that has SSRCs
   associated with different reference clocks will need to use multiple
   CNAMEs.  This ought not to be common; where possible, reference
   clocks ought to be mapped to each other, with one chosen for use with
   RTP and RTCP.

   The RTP specification [RFC3550] includes guidelines for choosing a
   unique RTP CNAME, but these are not sufficient in the presence of NAT
   devices.  In addition, long-term persistent identifiers can be
   problematic from a privacy viewpoint.  Accordingly, support for
   generating short-term persistent RTCP CNAMEs following [RFC7022] is
   RECOMMENDED.

   A WebRTC end-point MUST support reception of any CNAME that matches
   the syntax limitations specified by the RTP specification [RFC3550]
   and cannot assume that any CNAME will be chosen according to the form
   suggested above.

5.  WebRTC Use of RTP: Extensions

   There are a number of RTP extensions that are either needed to obtain
   full functionality, or are extremely useful to improve on the
   baseline performance, in the WebRTC application context.  One set of
   these extensions is related to conferencing, while others are more
   generic in nature.  The following subsections describe the various
   RTP extensions mandated or suggested for use within the WebRTC
   context.

5.1.  Conferencing Extensions

   RTP is inherently a group communication protocol.  Groups can be
   implemented using a centralised server, multi-unicast, or using IP
   multicast.  While IP multicast is popular in IPTV systems, overlay-
   based topologies dominate in interactive conferencing environments.
   Such overlay-based topologies typically use one or more central
   servers to connect end-points in a star or flat tree topology.  These
   central servers can be implemented in a number of ways as discussed
   in the memo on RTP Topologies
   [I-D.ietf-avtcore-rtp-topologies-update].

   Not all of the possible overlay-based topologies are suitable for use
   in the WebRTC environment.
Specifically:

o  The use of video switching MCUs makes the use of RTCP for congestion control and quality of service reports problematic (see Section 3.6.2 of [I-D.ietf-avtcore-rtp-topologies-update]).

o  The use of content-modifying MCUs with RTCP termination breaks RTP loop detection, and prevents receivers from identifying active senders (see Section 3.8 of [I-D.ietf-avtcore-rtp-topologies-update]).

Accordingly, only Point to Point (Topo-Point-to-Point), Multiple concurrent Point to Point (Mesh), and RTP Mixer (Topo-Mixer) topologies are needed to achieve the use-cases to be supported in WebRTC initially. These RECOMMENDED topologies are expected to be supported by all WebRTC end-points (these topologies require no special RTP-layer support in the end-point if the RTP features mandated in this memo are implemented).

The RTP extensions described in Section 5.1.1 to Section 5.1.6 are designed to be used with centralised conferencing, where an RTP middlebox (e.g., a conference bridge) receives a participant's RTP media streams and distributes them to the other participants. These extensions are not necessary for interoperability; an RTP endpoint that does not implement these extensions will work correctly, but might offer poor performance. Support for the listed extensions will greatly improve the quality of experience and, to provide a reasonable baseline quality, some of these extensions are mandatory to be supported by WebRTC end-points.

The RTCP conferencing extensions are defined in the Extended RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/AVPF) [RFC4585] and the "Codec Control Messages in the RTP Audio-Visual Profile with Feedback (AVPF)" (CCM) [RFC5104], and are fully usable by the Secure variant of this profile (RTP/SAVPF) [RFC5124].

5.1.1. Full Intra Request (FIR)

The Full Intra Request is defined in Sections 3.5.1 and 4.3.1 of the Codec Control Messages [RFC5104]. This message is used by a mixer to request a new Intra picture from a participant in the session. This is used when switching between sources to ensure that the receivers can decode the video or other predictive media encoding with long prediction chains. WebRTC senders MUST understand and react to the FIR feedback message, since it greatly improves the user experience when using centralised mixer-based conferencing; support for sending the FIR message is OPTIONAL.

5.1.2. Picture Loss Indication (PLI)

The Picture Loss Indication is defined in Section 6.3.1 of the RTP/AVPF profile [RFC4585]. It is used by a receiver to tell the sending encoder that it lost the decoder context and would like to have it repaired somehow. This is semantically different from the Full Intra Request above, as there could be multiple ways to fulfil the request. WebRTC senders MUST understand and react to this feedback message as a loss tolerance mechanism; receivers MAY send PLI messages.

5.1.3. Slice Loss Indication (SLI)

The Slice Loss Indicator is defined in Section 6.3.2 of the RTP/AVPF profile [RFC4585]. It is used by a receiver to tell the encoder that it has detected the loss or corruption of one or more consecutive macroblocks, and would like to have these repaired somehow. Support for this feedback message is OPTIONAL as a loss tolerance mechanism.

5.1.4. Reference Picture Selection Indication (RPSI)

Reference Picture Selection Indication (RPSI) is defined in Section 6.3.3 of the RTP/AVPF profile [RFC4585]. Some video coding standards allow the use of older reference pictures than the most recent one for predictive coding.
If such a codec is in use, and if the encoder has learned about a loss of encoder-decoder synchronisation, a known-as-correct reference picture can be used for future coding. The RPSI message allows this to be signalled. Support for RPSI messages is OPTIONAL.

5.1.5. Temporal-Spatial Trade-off Request (TSTR)

The temporal-spatial trade-off request and notification are defined in Sections 3.5.2 and 4.3.2 of [RFC5104]. This request can be used to ask the video encoder to change the trade-off it makes between temporal and spatial resolution, for example to prefer high spatial image quality but a low frame rate. Support for TSTR requests and notifications is OPTIONAL.

5.1.6. Temporary Maximum Media Stream Bit Rate Request (TMMBR)

This feedback message is defined in Sections 3.5.4 and 4.2.1 of the Codec Control Messages [RFC5104]. This message and its notification message are used by a media receiver to inform the sending party that there is a current limitation on the amount of bandwidth available to this receiver. There can be various reasons for this: for example, an RTP mixer can use this message to limit the media rate of the sender being forwarded by the mixer (without doing media transcoding) to fit the bottlenecks existing towards the other session participants. WebRTC senders are REQUIRED to implement support for TMMBR messages, and MUST follow bandwidth limitations set by a TMMBR message received for their SSRC. The sending of TMMBR requests is OPTIONAL.

5.2. Header Extensions

The RTP specification [RFC3550] provides the capability to include RTP header extensions containing in-band data, but the format and semantics of the extensions are poorly specified.
The use of header extensions is OPTIONAL in the WebRTC context, but if they are used, they MUST be formatted and signalled following the general mechanism for RTP header extensions defined in [RFC5285], since this gives well-defined semantics to RTP header extensions.

As noted in [RFC5285], the requirement from the RTP specification that header extensions are "designed so that the header extension may be ignored" [RFC3550] stands. To be specific, header extensions MUST only be used for data that can safely be ignored by the recipient without affecting interoperability, and MUST NOT be used when the presence of the extension has changed the form or nature of the rest of the packet in a way that is not compatible with the way the stream is signalled (e.g., as defined by the payload type). Valid examples might include metadata that is additional to the usual RTP information.

5.2.1. Rapid Synchronisation

Many RTP sessions require synchronisation between audio, video, and other content. This synchronisation is performed by receivers, using information contained in RTCP SR packets, as described in the RTP specification [RFC3550]. This basic mechanism can be slow, however, so it is RECOMMENDED that the rapid RTP synchronisation extensions described in [RFC6051] be implemented in addition to RTCP SR-based synchronisation. The rapid synchronisation extensions use the general RTP header extension mechanism [RFC5285], which requires signalling, but are otherwise backwards compatible.

5.2.2. Client-to-Mixer Audio Level

The Client to Mixer Audio Level extension [RFC6464] is an RTP header extension used by a client to inform a mixer about the level of audio activity in the packet to which the header is attached.
This enables a central node to make mixing or selection decisions without decoding or detailed inspection of the payload, reducing the complexity in some types of central RTP nodes. It can also save decoding resources in receivers, which can choose to decode only the most relevant RTP media streams based on audio activity levels.

The Client-to-Mixer Audio Level [RFC6464] extension is RECOMMENDED to be implemented. If it is implemented, it is REQUIRED that the header extensions are encrypted according to [RFC6904], since the information contained in these header extensions can be considered sensitive.

5.2.3. Mixer-to-Client Audio Level

The Mixer to Client Audio Level header extension [RFC6465] provides the client with the audio level of the different sources mixed into a common mix by an RTP mixer. This enables a user interface to indicate the relative activity level of each session participant, rather than just showing whether a participant is included in the mix or not based on the CSRC field. This is a pure optimisation of non-critical functions, and is hence OPTIONAL to implement. If it is implemented, it is REQUIRED that the header extensions are encrypted according to [RFC6904], since the information contained in these header extensions can be considered sensitive.

5.2.4. Associating RTP Media Streams and Signalling Contexts

(tbd: it seems likely that we need a mechanism to associate RTP media streams with signalling contexts. The mechanism by which this is done will likely be some combination of an RTP header extension, periodic transmission of a new RTCP SDES item, and some signalling extension. The semantics of those items are not yet settled; see draft-westerlund-avtext-rtcp-sdes-srcname, draft-ietf-mmusic-msid, and draft-even-mmusic-application-token for discussion).

6. WebRTC Use of RTP: Improving Transport Robustness

There are tools that can make RTP media streams robust against packet loss and reduce the impact of loss on media quality. However, they all add extra bits compared to a non-robust stream. The overhead of these extra bits needs to be considered, and the aggregate bit-rate MUST be rate controlled to avoid causing network congestion (see Section 7). As a result, improving robustness might require a lower base encoding quality, but has the potential to deliver that quality with fewer errors. The mechanisms described in the following subsections can be used to improve tolerance to packet loss.

6.1. Negative Acknowledgements and RTP Retransmission

As a consequence of supporting the RTP/SAVPF profile, implementations can support negative acknowledgements (NACKs) for RTP data packets [RFC4585]. This feedback can be used to inform a sender of the loss of particular RTP packets, subject to the capacity limitations of the RTCP feedback channel. A sender can use this information to optimise the user experience by adapting the media encoding to compensate for known lost packets, for example.

Senders are REQUIRED to understand the Generic NACK message defined in Section 6.2.1 of [RFC4585], but MAY choose to ignore this feedback (following Section 4.2 of [RFC4585]). Receivers MAY send NACKs for missing RTP packets; [RFC4585] provides some guidelines on when to send NACKs. It is not expected that a receiver will send a NACK for every lost RTP packet; rather, it needs to consider the cost of sending NACK feedback, and the importance of the lost packet, to make an informed decision on whether it is worth telling the sender about a packet loss event.

The RTP Retransmission Payload Format [RFC4588] offers the ability to retransmit lost packets based on NACK feedback.
Retransmission needs to be used with care in interactive real-time applications to ensure that the retransmitted packet arrives in time to be useful, but it can be effective in environments with relatively low network RTT (an RTP sender can estimate the RTT to the receivers using the information in RTCP SR and RR packets, as described at the end of Section 6.4.1 of [RFC3550]). The use of retransmissions can also increase the forward RTP bandwidth, and can potentially worsen the problem if the packet loss was caused by network congestion. We note, however, that retransmission of an important lost packet to repair decoder state can have lower cost than sending a full intra frame. It is not appropriate to blindly retransmit RTP packets in response to a NACK. The importance of lost packets and the likelihood of them arriving in time to be useful needs to be considered before RTP retransmission is used.

Receivers are REQUIRED to implement support for RTP retransmission packets [RFC4588]. Senders MAY send RTP retransmission packets in response to NACKs if the RTP retransmission payload format has been negotiated for the session, and if the sender believes it is useful to send a retransmission of the packet(s) referenced in the NACK. An RTP sender does not need to retransmit every NACKed packet.

6.2. Forward Error Correction (FEC)

The use of Forward Error Correction (FEC) can provide effective protection against some degree of packet loss, at the cost of a steady bandwidth overhead. There are several FEC schemes that are defined for use with RTP. Some of these schemes are specific to a particular RTP payload format; others operate across RTP packets and can be used with any payload format.
It needs to be noted that using redundant encoding or FEC will lead to increased play-out delay, which needs to be considered when choosing the redundancy or FEC formats and their respective parameters.

If an RTP payload format negotiated for use in a WebRTC session supports redundant transmission or FEC as a standard feature of that payload format, then that support MAY be used in the WebRTC session, subject to any appropriate signalling.

There are several block-based FEC schemes that are designed for use with RTP independent of the chosen RTP payload format. At the time of this writing there is no consensus on which, if any, of these FEC schemes is appropriate for use in the WebRTC context. Accordingly, this memo makes no recommendation on the choice of block-based FEC for WebRTC use.

7. WebRTC Use of RTP: Rate Control and Media Adaptation

WebRTC will be used in heterogeneous network environments using a variety of link technologies, including both wired and wireless links, to interconnect potentially large groups of users around the world. As a result, the network paths between users can have widely varying one-way delays, available bit-rates, load levels, and traffic mixtures. Individual end-points can send one or more RTP media streams to each participant in a WebRTC conference, and there can be several participants. Each of these RTP media streams can contain different types of media, and the type of media, bit rate, and number of flows can be highly asymmetric. Non-RTP traffic can share the network paths with RTP flows. Since the network environment is not predictable or stable, WebRTC endpoints MUST ensure that the RTP traffic they generate can adapt to match changes in the available network capacity.
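The adaptation requirement above can be illustrated with a minimal sketch. This is not part of any specification: the function name, parameters, and the trivial clamping rule are hypothetical, and a real congestion controller would be far more sophisticated.

```python
def adapt_target_bitrate(current_bps, estimated_capacity_bps,
                         min_bps, max_bps):
    """Illustrative only: clamp the sender's target bitrate to an
    estimate of the available network capacity, while staying within
    the codec's useful operating range (min_bps..max_bps)."""
    # Do not exceed what the path is estimated to carry...
    target = min(current_bps, estimated_capacity_bps)
    # ...and keep within the range where the codec gives useful quality.
    return max(min_bps, min(target, max_bps))
```

A sender would re-run such a rule whenever its capacity estimate changes, reducing the encoder target when the estimate drops and recovering it as capacity becomes available again.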
The quality of experience for users of WebRTC implementations is very dependent on effective adaptation of the media to the limitations of the network. End-points have to be designed so they do not transmit significantly more data than the network path can support, except for very short time periods; otherwise, high levels of network packet loss or delay spikes will occur, causing media quality degradation. The limiting factor on the capacity of the network path might be the link bandwidth, or it might be competition with other traffic on the link (this can be non-WebRTC traffic, traffic due to other WebRTC flows, or even competition with other WebRTC flows in the same session).

An effective media congestion control algorithm is therefore an essential part of the WebRTC framework. However, at the time of this writing, there is no standard congestion control algorithm that can be used for interactive media applications such as WebRTC flows. Some requirements for congestion control algorithms for WebRTC sessions are discussed in [I-D.jesup-rtp-congestion-reqs], and it is expected that a future version of this memo will mandate the use of a congestion control algorithm that satisfies these requirements.

7.1. Boundary Conditions and Circuit Breakers

In the absence of a concrete congestion control algorithm, all WebRTC implementations MUST implement the RTP circuit breaker algorithm that is described in [I-D.ietf-avtcore-rtp-circuit-breakers]. The RTP circuit breaker is designed to enable applications to recognise and react to situations of extreme network congestion. However, since the RTP circuit breaker might not be triggered until congestion becomes extreme, it cannot be considered a substitute for congestion control, and applications MUST also implement congestion control to allow them to adapt to changes in network capacity.
Any future RTP congestion control algorithms are expected to operate within the envelope allowed by the circuit breaker.

The session establishment signalling will also necessarily establish boundaries to which the media bit-rate will conform. The choice of media codecs provides upper and lower bounds on the supported bit-rates that the application can utilise to provide useful quality, as do the packetization choices that exist. In addition, the signalling channel can establish maximum media bit-rate boundaries using the SDP "b=AS:" or "b=CT:" lines, and the RTP/AVPF Temporary Maximum Media Stream Bit Rate (TMMBR) Requests (see Section 5.1.6 of this memo). The combination of media codec choice and signalled bandwidth limits SHOULD be used to limit traffic based on known bandwidth limitations, for example the capacity of the edge links, to the extent possible.

7.2. RTCP Limitations for Congestion Control

Experience with the congestion control algorithms of TCP [RFC5681], TFRC [RFC5348], and DCCP [RFC4341], [RFC4342], [RFC4828] has shown that feedback on packet arrivals needs to be sent roughly once per round trip time. We note that real-time media traffic might not have to adapt to changing path conditions as rapidly as is needed for the elastic applications TCP was designed for, but frequent feedback is still needed to allow the congestion control algorithm to track the path dynamics.

The total RTCP bandwidth is limited in its transmission rate to a fraction of the RTP traffic (by default 5%). RTCP packets are larger than, e.g., TCP ACKs (even when non-compound RTCP packets are used). The RTP media stream bit rate thus limits the maximum feedback rate as a function of the mean RTCP packet size.
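The relationship between media bit rate, RTCP packet size, and achievable reporting frequency can be sketched as follows. This is a simplification for illustration: the function name is hypothetical, and it assumes the RTCP bandwidth is split evenly between session members, which glosses over the more detailed sender/receiver split in the RFC 3550 timing rules.

```python
def rtcp_report_interval_s(media_bps, mean_rtcp_packet_bytes,
                           n_members=2, rtcp_fraction=0.05):
    """Mean interval (seconds) between RTCP reports for one member,
    assuming the RTCP bandwidth (default 5% of the media rate) is
    shared evenly among n_members."""
    rtcp_bytes_per_s = media_bps / 8 * rtcp_fraction   # total RTCP budget
    per_member = rtcp_bytes_per_s / n_members          # this member's share
    return mean_rtcp_packet_bytes / per_member

# 512 kbit/s media, 100-byte reports, point-to-point session:
# 3200 bytes/s of RTCP, 1600 bytes/s per member, one report every ~63 ms.
```

This reproduces the arithmetic behind the worked example later in this section: at 512 kbit/s the feedback interval is on the order of tens of milliseconds, i.e., roughly every other frame of a 30 fps video.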
Interactive communication might not be able to afford waiting for packet losses to occur to indicate congestion, because an increase in play-out delay due to queuing (most prominent in wireless networks) can easily lead to packets being dropped due to late arrival at the receiver. Therefore, more sophisticated cues might need to be reported -- to be defined in a suitable congestion control framework as noted above -- which, in turn, increase the report size again. For example, different RTCP XR report blocks (jointly) provide the necessary details to implement a variety of congestion control algorithms, but the (compound) report size grows quickly.

In group communication, the RTCP bandwidth needs to be shared by all group members, reducing the capacity, and thus the reporting frequency, per node.

Example: assuming 512 kbit/s video yields 3200 bytes/s of RTCP bandwidth, split across the two entities in a point-to-point session. An endpoint could thus send a report of 100 bytes about every 70 ms, or for every other frame in a 30 fps video.

7.3. Congestion Control Interoperability and Legacy Systems

There are legacy implementations that do not implement RTCP, and hence do not provide any congestion feedback. Congestion control cannot be performed with these end-points. WebRTC implementations that need to interwork with such end-points MUST limit their transmission to a low rate, equivalent to a VoIP call using a low bandwidth codec, that is unlikely to cause any significant congestion.

When interworking with legacy implementations that support RTCP using the RTP/AVP profile [RFC3551], congestion feedback is provided in RTCP RR packets every few seconds.
Implementations that have to interwork with such end-points MUST ensure that they keep within the RTP circuit breaker [I-D.ietf-avtcore-rtp-circuit-breakers] constraints to limit the congestion they can cause.

If a legacy end-point supports RTP/AVPF, this enables negotiation of important parameters for frequent reporting, such as the "trr-int" parameter, and the possibility that the end-point supports some useful feedback format for congestion control purposes, such as TMMBR [RFC5104]. Implementations that have to interwork with such end-points MUST ensure that they stay within the RTP circuit breaker [I-D.ietf-avtcore-rtp-circuit-breakers] constraints to limit the congestion they can cause, but might find that they can achieve better congestion response depending on the amount of feedback that is available.

With proprietary congestion control algorithms, issues can arise when different algorithms and implementations interact in a communication session. If the different implementations have made different choices with regard to the type of adaptation, for example one sender-based and one receiver-based, then one could end up in a situation where one direction is dual-controlled, while the other direction is not controlled at all. This memo cannot mandate behaviour for proprietary congestion control algorithms, but implementations that use such algorithms ought to be aware of this issue, and try to ensure that effective congestion control is negotiated for media flowing in both directions. If the IETF were to standardise both sender- and receiver-based congestion control algorithms for WebRTC traffic in the future, the issues of interoperability, control, and ensuring that both directions of media flow are congestion controlled would also need to be considered.

8. WebRTC Use of RTP: Performance Monitoring

As described in Section 4.1, implementations are REQUIRED to generate RTCP Sender Report (SR) and Reception Report (RR) packets relating to the RTP media streams they send and receive. These RTCP reports can be used for performance monitoring purposes, since they include basic packet loss and jitter statistics.

A large number of additional performance metrics are supported by the RTCP Extended Reports (XR) framework [RFC3611][RFC6792]. It is not yet clear what extended metrics are appropriate for use in the WebRTC context, so there is no requirement that implementations generate RTCP XR packets. However, implementations that can use detailed performance monitoring data MAY generate RTCP XR packets as appropriate; the use of such packets SHOULD be signalled in advance.

All WebRTC implementations MUST be prepared to receive RTCP XR report packets, whether or not they were signalled. There is no requirement that the data contained in such reports be used, or exposed to the Javascript application, however.

9. WebRTC Use of RTP: Future Extensions

It is possible that the core set of RTP protocols and RTP extensions specified in this memo will prove insufficient for the future needs of WebRTC applications. In this case, future updates to this memo MUST be made following the Guidelines for Writers of RTP Payload Format Specifications [RFC2736] and the Guidelines for Extending the RTP Control Protocol [RFC5968], and SHOULD take into account any future guidelines for extending RTP and related protocols that have been developed.

Authors of future extensions are urged to consider the wide range of environments in which RTP is used when recommending extensions, since extensions that are applicable in some scenarios can be problematic in others.
Where possible, the WebRTC framework will adopt RTP extensions that are of general utility, to enable easy implementation of a gateway to other applications using RTP, rather than adopting mechanisms that are narrowly targeted at specific WebRTC use cases.

10. Signalling Considerations

RTP is built with the assumption that an external signalling channel exists, and can be used to configure RTP sessions and their features. The basic configuration of an RTP session consists of the following parameters:

RTP Profile: The name of the RTP profile to be used in the session. The RTP/AVP [RFC3551] and RTP/AVPF [RFC4585] profiles can interoperate on a basic level, as can their secure variants RTP/SAVP [RFC3711] and RTP/SAVPF [RFC5124]. The secure variants of the profiles do not directly interoperate with the non-secure variants, due to the presence of additional header fields for authentication in SRTP packets and the cryptographic transformation of the payload. WebRTC requires the use of the RTP/SAVPF profile, and this MUST be signalled if SDP is used. Interworking functions might transform this into the RTP/SAVP profile for a legacy use case, by indicating to the WebRTC end-point that RTP/SAVPF is used, and limiting the usage of the "a=rtcp:" attribute to indicate a trr-int value of 4 seconds.

Transport Information: Source and destination IP address(es) and ports for RTP and RTCP MUST be signalled for each RTP session. In WebRTC these transport addresses will be provided by ICE, which signals candidates and arrives at nominated candidate address pairs. If RTP and RTCP multiplexing [RFC5761] is to be used, such that a single port is used for RTP and RTCP flows, this MUST be signalled (see Section 4.5). If several RTP sessions are to be multiplexed onto a single transport-layer flow, this MUST also be signalled (see Section 4.4).
RTP Payload Types, media formats, and format parameters: The mapping between media type names (and hence the RTP payload formats to be used) and the RTP payload type numbers MUST be signalled. Each media type MAY also have a number of media type parameters that MUST also be signalled to configure the codec and RTP payload format (the "a=fmtp:" line from SDP). Section 4.3 of this memo discusses requirements for uniqueness of payload types.

RTP Extensions: The RTP extensions to be used SHOULD be agreed upon, including any parameters for each respective extension. At the very least, this will help avoid wasting bandwidth on features that the other end-point will ignore. For certain mechanisms, however, such agreement is required, since interoperability failures will otherwise occur.

RTCP Bandwidth: Support for exchanging RTCP bandwidth values with the end-points will be necessary. This SHALL be done as described in "Session Description Protocol (SDP) Bandwidth Modifiers for RTP Control Protocol (RTCP) Bandwidth" [RFC3556], or something semantically equivalent. This also ensures that the end-points have a common view of the RTCP bandwidth; this is important, as too different views of the bandwidth can lead to a failure to interoperate.

These parameters are often expressed in SDP messages conveyed within an offer/answer exchange. RTP does not depend on SDP or on the offer/answer model, but does require all the necessary parameters to be agreed upon, and provided to the RTP implementation. We note that in the WebRTC context it will depend on the signalling model and API how these parameters need to be configured, but they will need to be either set via the API or explicitly signalled between the peers.

11. WebRTC API Considerations

The WebRTC API [W3C.WD-webrtc-20130910] and the Media Capture and Streams API [W3C.WD-mediacapture-streams-20130903] define and use the concept of a MediaStream that consists of zero or more MediaStreamTracks. A MediaStreamTrack is an individual stream of media from any type of media source, like a microphone or a camera, but conceptual sources, like an audio mix or a video composition, are also possible. It needs to be possible to play out the MediaStreamTracks within a MediaStream in a synchronised way.

A MediaStreamTrack's realisation in RTP, in the context of an RTCPeerConnection, consists of a source packet stream identified by an SSRC within an RTP session that is part of the RTCPeerConnection. The MediaStreamTrack can also result in additional packet streams, and thus SSRCs, in the same RTP session. These can be dependent packet streams from scalable encoding of the source stream associated with the MediaStreamTrack, if such a media encoder is used. They can also be redundancy packet streams; these are created when applying Forward Error Correction (Section 6.2) or RTP retransmission (Section 6.1) to the source packet stream.

Note: It is quite likely that a simulcast specification will result in multiple source packet streams, and thus SSRCs, based on the same source stream associated with the MediaStreamTrack being simulcasted. Each such source packet stream can have dependent and redundant packet streams associated with it. However, the final conclusion on this awaits the specification of simulcast. Simulcast will also require signalling to correctly separate and associate the source packet streams with their sets of dependent and/or redundant streams.

It is important to note that the same media source can be feeding multiple MediaStreamTracks.
As different sets of constraints or other parameters can be applied to each MediaStreamTrack, each MediaStreamTrack instance added to an RTCPeerConnection SHALL result in an independent source packet stream, with its own set of associated packet streams, and thus different SSRC(s). It will depend on the applied constraints and parameters whether the source stream and the encoding configuration will be identical between different MediaStreamTracks sharing the same media source. Thus it is possible for multiple source packet streams to share encoded streams (but not packet streams), but it is an implementation choice whether to utilise such optimisations. Note that such optimisations would need to take into account that the constraints for one of the MediaStreamTracks can change at any moment, meaning that the encoding configurations would no longer be identical.

The same MediaStreamTrack can also be included in multiple MediaStreams; thus multiple sets of MediaStreams can implicitly need to use the same synchronisation base. To ensure that this works in all cases, and does not force an endpoint to change the synchronisation base and CNAME in the middle of an ongoing delivery of any packet streams, which would cause media disruption, all MediaStreamTracks and their associated SSRCs originating from the same endpoint MUST be sent using the same CNAME within one RTCPeerConnection, as well as across all RTCPeerConnections that are part of the same communication session context, which for a browser is a single origin.

Note: It is important that the same CNAME is not used in different communication session contexts or origins, as that could enable tracking of a user and its device usage across different services. See Section 4.4.1 of Security Considerations for WebRTC [I-D.ietf-rtcweb-security] for further discussion.
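The per-context CNAME rule above can be sketched as follows. This is an illustrative sketch only: the helper name and the cache are hypothetical, but the CNAME generation itself follows the short-term persistent method of [RFC7022] (at least 96 bits of random data, base64-encoded), and one CNAME is reused across all RTCPeerConnections within a context while distinct contexts (e.g., origins) get distinct CNAMEs.

```python
import base64
import os

_cnames = {}  # one CNAME per communication session context

def cname_for_context(context_id):
    """Return the RTCP CNAME for a context (e.g., a browser origin),
    generating it on first use per RFC 7022: 96 random bits,
    base64-encoded (16 characters)."""
    if context_id not in _cnames:
        _cnames[context_id] = base64.b64encode(os.urandom(12)).decode("ascii")
    return _cnames[context_id]
```

All SSRCs sent within the same context would carry this CNAME in their RTCP SDES packets, while a fresh context starts with a fresh CNAME so that usage across services cannot be correlated.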
The reason to require the same CNAME across multiple RTCPeerConnections is to enable synchronisation of different MediaStreamTracks originating from one endpoint even when they are transported over different RTCPeerConnections.

The above currently forces a WebRTC endpoint that receives a MediaStreamTrack on one RTCPeerConnection, and adds it as outgoing on another RTCPeerConnection, to perform resynchronisation of the stream. This is because the sending party needs to change the CNAME, which implies that it has to use a locally available system clock as the timebase for the synchronisation. Thus, the relation between the timebase of the incoming stream and the system clock used for sending needs to be defined. This relation also needs monitoring for clock drift, and likely adjustments of the synchronisation. The sending entity is also responsible for congestion control for the streams it sends. In cases of packet loss, the loss of incoming data also needs to be handled. This leads to the observation that the method least likely to cause issues or interruptions in the outgoing source packet stream is full decoding, including repair and so on, followed by re-encoding of the media into the outgoing packet stream. Optimisations of this method are clearly possible and implementation specific.

A WebRTC endpoint MUST support receiving multiple MediaStreamTracks, where each of the different MediaStreamTracks (and their sets of associated packet streams) uses a different CNAME. However, MediaStreamTracks that are received with different CNAMEs have no defined synchronisation.

Note: The motivation for supporting reception of multiple CNAMEs is to allow for forward compatibility with any future changes that enable more efficient stream handling when endpoints relay or forward streams.
It also ensures that endpoints can interoperate 1147 with certain types of multi-stream middleboxes or endpoints that 1148 are not WebRTC. 1150 The binding between the WebRTC MediaStreams, MediaStreamTracks and 1151 the SSRC is done as specified in "Cross Session Stream Identification 1152 in the Session Description Protocol" [I-D.ietf-mmusic-msid]. This 1153 document [I-D.ietf-mmusic-msid] also defines, in section 4.1, how to 1154 map unknown source packet stream SSRCs to MediaStreamTracks and 1155 MediaStreams. Commonly the RTP Payload Type of any incoming packets 1156 will reveal if the packet stream is a source stream or a redundancy 1157 or dependent packet stream. The association to the correct source 1158 packet stream depends on the payload format in use for the packet 1159 stream. 1161 12. RTP Implementation Considerations 1163 The following discussion provides some guidance on the implementation 1164 of the RTP features described in this memo. The focus is on a WebRTC 1165 end-point implementation perspective, and while some mention is made 1166 of the behaviour of middleboxes, that is not the focus of this memo. 1168 12.1. Configuration and Use of RTP Sessions 1170 A WebRTC end-point will be a simultaneous participant in one or more 1171 RTP sessions. Each RTP session can convey multiple media flows, and 1172 can include media data from multiple end-points. In the following, 1173 we outline some ways in which WebRTC end-points can configure and use 1174 RTP sessions. 1176 12.1.1. Use of Multiple Media Flows Within an RTP Session 1178 RTP is a group communication protocol, and in a WebRTC context every 1179 RTP session can potentially contain multiple media flows. There are 1180 several reasons why this might be desirable: 1182 Multiple media types: Outside of WebRTC, it is common to use one RTP 1183 session for each type of media (e.g., one RTP session for audio 1184 and one for video, each sent on a different UDP port). 
However, 1185 to reduce the number of UDP ports used, the default in WebRTC is 1186 to send all types of media in a single RTP session, as described 1187 in Section 4.4, using RTP and RTCP multiplexing (Section 4.5) to 1188 further reduce the number of UDP ports needed. This RTP session 1189 then uses only one UDP flow, but will contain multiple RTP media 1190 streams, each containing a different type of media. A common 1191 example might be an end-point with a camera and microphone that 1192 sends two RTP streams, one video and one audio, into a single RTP 1193 session. 1195 Multiple Capture Devices: A WebRTC end-point might have multiple 1196 cameras, microphones, or other media capture devices, and so might 1197 want to generate several RTP media streams of the same media type. 1198 Alternatively, it might want to send media from a single capture 1199 device in several different formats or quality settings at once. 1200 Both can result in a single end-point sending multiple RTP media 1201 streams of the same media type into a single RTP session at the 1202 same time. 1204 Associated Repair Data: An end-point might send a media stream that 1205 is somehow associated with another stream. For example, it might 1206 send an RTP stream that contains FEC or retransmission data 1207 relating to another stream. Some RTP payload formats send this 1208 sort of associated repair data as part of the original media 1209 stream, while others send it as a separate stream. 1211 Layered or Multiple Description Coding: An end-point can use a 1212 layered media codec, for example H.264 SVC, or a multiple 1213 description codec, that generates multiple media flows, each with 1214 a distinct RTP SSRC, within a single RTP session. 1216 RTP Mixers, Translators, and Other Middleboxes: An RTP session, in 1217 the WebRTC context, is a point-to-point association between an 1218 end-point and some other peer device, where those devices share a 1219 common SSRC space. 
The peer device might be another WebRTC end-point, or it might be an RTP mixer, translator, or some other form of media processing middlebox. In the latter cases, the middlebox might send mixed or relayed RTP streams from several participants that the WebRTC end-point will need to render. Thus, even though a WebRTC end-point might only be a member of a single RTP session, the peer device might be extending that RTP session to incorporate other end-points. WebRTC is a group communication environment and end-points need to be capable of receiving, decoding, and playing out multiple RTP media streams at once, even in a single RTP session.

12.1.2. Use of Multiple RTP Sessions

In addition to sending and receiving multiple media streams within a single RTP session, a WebRTC end-point might participate in multiple RTP sessions. There are several reasons why a WebRTC end-point might choose to do this:

To interoperate with legacy devices: The common practice in the non-WebRTC world is to send different types of media in separate RTP sessions, for example using one RTP session for audio and another RTP session, on a different UDP port, for video. All WebRTC end-points need to support the option of sending different types of media on different RTP sessions, so they can interwork with such legacy devices. This is discussed further in Section 4.4.

To provide enhanced quality of service: Some network-based quality of service mechanisms operate on the granularity of UDP 5-tuples. If it is desired to use these mechanisms to provide differentiated quality of service for some RTP flows, then those RTP flows need to be sent in a separate RTP session using a different UDP port number, and with appropriate quality of service marking. This is discussed further in Section 12.1.3.
To separate media with different purposes: An end-point might want to send media streams that have different purposes on different RTP sessions, to make it easy for the peer device to distinguish them. For example, some centralised multiparty conferencing systems display the active speaker in high resolution, but show low resolution "thumbnails" of other participants. Such systems might configure the end-points to send simulcast high- and low-resolution versions of their video using separate RTP sessions, to simplify the operation of the central mixer. In the WebRTC context this appears to be most easily accomplished by establishing multiple RTCPeerConnections, all being fed the same set of WebRTC MediaStreams. Each RTCPeerConnection is then configured to deliver a particular media quality, and thus media bit-rate, and will produce an independently encoded version with the codec parameters agreed specifically in the context of that RTCPeerConnection. The central mixer can always distinguish packets corresponding to the low- and high-resolution streams by inspecting their SSRC, RTP payload type, or some other information contained in RTP header extensions or RTCP packets, but it can be easier to distinguish the flows if they arrive on separate RTP sessions on separate UDP ports.

To directly connect with multiple peers: A multi-party conference does not need to use a central mixer. Rather, a multi-unicast mesh can be created, comprising several distinct RTP sessions, with each participant sending RTP traffic over a separate RTP session (that is, using an independent RTCPeerConnection object) to every other participant, as shown in Figure 1. This topology has the benefit of not requiring a central mixer node that is trusted to access and manipulate the media data.
The downside is that it increases the bandwidth used at each sender, which needs to send one copy of its RTP media streams to each participant in the session other than itself.

          +---+     +---+
          | A |<--->| B |
          +---+     +---+
            ^         ^
             \       /
              \     /
               v   v
              +---+
              | C |
              +---+

     Figure 1: Multi-unicast using several RTP sessions

The multi-unicast topology could also be implemented as a single RTP session, spanning multiple peer-to-peer transport layer connections, or as several pairwise RTP sessions, one between each pair of peers. To maintain a coherent mapping between RTP sessions and RTCPeerConnection objects, we recommend that this be implemented as several individual RTP sessions. The only downside is that end-point A will not learn of the quality of any transmission happening between B and C, since it will not see RTCP reports for the RTP session between B and C, whereas it would if all three participants were part of a single RTP session. Experience with the Mbone tools (experimental RTP-based multicast conferencing tools from the late 1990s) has shown that RTCP reception quality reports for third parties can usefully be presented to the users in a way that helps them understand asymmetric network problems, and the approach of using separate RTP sessions prevents this. However, an advantage of using separate RTP sessions is that they allow different media bit-rates and RTP session configurations between the different peers, thus not forcing B to endure the same quality reductions as C if there are limitations in the transport from A to C. It is believed that these advantages outweigh the limitations in debugging power.
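The uplink cost of such a mesh follows from simple arithmetic: each sender transmits an independent copy of its streams to every other participant. A non-normative sketch (the function and parameter names are purely illustrative):

```python
def mesh_uplink_bitrate(per_stream_bps: int, participants: int) -> int:
    """Total send-side bit-rate for one endpoint in a full multi-unicast mesh.

    Each endpoint sends one copy of its media to every other participant,
    i.e. (participants - 1) copies in total.
    """
    return per_stream_bps * (participants - 1)

# Three-party mesh (A, B, C in Figure 1) at 1 Mbps per stream:
# each endpoint sends 2 Mbps in total.
```

This linear growth in uplink bandwidth is the main reason large conferences tend to use a central middlebox instead of a mesh.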
To indirectly connect with multiple peers: A common scenario in multi-party conferencing is to create indirect connections to multiple peers, using an RTP mixer, translator, or some other type of RTP middlebox. Figure 2 outlines a simple topology that might be used in a four-person centralised conference. The middlebox acts to optimise the transmission of RTP media streams from certain perspectives, either by only sending some of the received RTP media streams to any given receiver, or by providing a combined RTP media stream out of a set of contributing streams.

     +---+      +-------------+      +---+
     | A |<---->|             |<---->| B |
     +---+      | RTP mixer,  |      +---+
                | translator, |
                | or other    |
     +---+      | middlebox   |      +---+
     | C |<---->|             |<---->| D |
     +---+      +-------------+      +---+

     Figure 2: RTP mixer with only unicast paths

There are various methods of implementation for the middlebox. If implemented as a standard RTP mixer or translator, a single RTP session will extend across the middlebox and encompass all the end-points in one multi-party session. Other types of middlebox might use separate RTP sessions between each end-point and the middlebox. A common aspect is that these central nodes can use a number of tools to control the media encoding provided by a WebRTC end-point. This includes functions like requesting that the encoder break the encoding chain and produce a so-called intra frame. Another is limiting the bit-rate of a given stream to better suit the mixer's view of the multiple down-streams. Others control the most suitable frame rate, picture resolution, and the trade-off between frame rate and spatial quality. The middlebox takes on significant responsibility for correctly performing congestion control, source identification, and synchronisation management, while providing the application with suitable media optimisations.
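As a non-normative illustration of one such codec-control tool: a Full Intra Request (FIR), defined in RFC 5104, is a payload-specific RTCP feedback message (PT = 206, FMT = 4) whose FCI entry names the SSRC that should produce an intra frame. A minimal construction sketch, assuming the caller supplies the SSRCs and command sequence number:

```python
import struct

def build_rtcp_fir(sender_ssrc: int, target_ssrc: int, seq_nr: int) -> bytes:
    """Build an RTCP FIR (Full Intra Request) packet per RFC 5104.

    V=2, P=0, FMT=4, PT=206 (PSFB).  The packet-level "SSRC of media
    source" field is set to 0 as required for FIR; the FCI entry carries
    the target SSRC and an 8-bit command sequence number.
    """
    # First word: version/padding/FMT, packet type, length in 32-bit
    # words minus one (4 words follow the first word => length = 4).
    header = struct.pack("!BBH", (2 << 6) | 4, 206, 4)
    body = struct.pack("!II", sender_ssrc, 0)  # sender SSRC, media SSRC (0)
    fci = struct.pack("!IB3x", target_ssrc, seq_nr & 0xFF)
    return header + body + fci
```

A middlebox would send such a message, for example, when a new receiver joins and needs a decoder refresh point.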
The middlebox also has to be a trusted node when it comes to security, since it manipulates either the RTP header or the media itself (or both) received from one end-point before sending it on towards the other end-point(s); it therefore needs to be able to decrypt the media and re-encrypt it before sending it out.

RTP mixers can create a situation where an end-point experiences something in between a session with only two end-points and multiple RTP sessions. Mixers are expected not to forward RTCP reports regarding RTP media streams across themselves. This is due to the difference in the RTP media streams provided to the different end-points. The original media source lacks information about a mixer's manipulations prior to sending the streams to the different receivers. This scenario also means that an end-point's feedback or requests go to the mixer. When the mixer cannot act on this by itself, it is forced to go to the original media source to fulfil the receiver's request. This will not necessarily be explicitly visible in any RTP and RTCP traffic, but the interactions, and the time taken to complete them, will indicate such dependencies.

Providing source authentication in multi-party scenarios is a challenge. In the mixer-based topologies, an end-point's source authentication is based on, firstly, cryptographically verifying that media comes from the mixer and, secondly, trust in the mixer to correctly identify any source towards the end-point. In RTP sessions where multiple end-points are directly visible to an end-point, all end-points will have knowledge of each others' master keys, and can thus inject packets claimed to come from another end-point in the session. Any node performing relay can apply non-cryptographic mitigation by declining to forward packets whose SSRC fields previously came from other end-points.
Cryptographic verification of the source in SRTP would require additional security mechanisms, for example TESLA for SRTP [RFC4383], that are not part of the base WebRTC standards.

To forward media between multiple peers: It is sometimes desirable for an end-point that receives an RTP media stream to be able to forward that media stream to a third party. There are some obvious security and privacy implications in supporting this, but also potential uses. This is supported in the W3C API by taking the received and decoded media and using it as a media source that is re-encoded and transmitted as a new stream.

At the RTP layer, media forwarding acts as a back-to-back RTP receiver and RTP sender. The receiving side terminates the RTP session and decodes the media, while the sender side re-encodes and transmits the media using an entirely separate RTP session. The original sender will only see a single receiver of the media, and will not be able to tell that forwarding is happening based on RTP-layer information, since the RTP session that is used to send the forwarded media is not connected to the RTP session on which the media was received by the node doing the forwarding.

The end-point that is performing the forwarding is responsible for producing an RTP media stream suitable for onwards transmission. The outgoing RTP session that is used to send the forwarded media is entirely separate from the RTP session on which the media was received. This will require media transcoding for congestion control purposes, to produce a bit-rate suitable for the outgoing RTP session, reducing media quality and forcing the forwarding end-point to spend resources on the transcoding.
The media transcoding does result in a separation of the two legs, removing almost all dependencies between them, and allowing the forwarding end-point to optimise its media transcoding operation. The cost is greatly increased computational complexity on the forwarding node. Receivers of the forwarded stream will see the forwarding device as the sender of the stream, and will not be able to tell from the RTP layer that they are receiving a forwarded stream rather than an entirely new media stream generated by the forwarding device.

12.1.3. Differentiated Treatment of Flows

There are use cases for differentiated treatment of RTP media streams. Such differentiation can happen at several places in the system. First of all is the prioritisation within the end-point sending the media, which controls both which RTP media streams will be sent and their allocation of bit-rate out of the currently available aggregate, as determined by the congestion control.

It is expected that the WebRTC API will allow the application to indicate relative priorities for different MediaStreamTracks. These priorities can then be used to influence the local RTP processing, especially when it comes to how the congestion control response divides the available bandwidth between the RTP flows. Any changes in relative priority will also need to be considered for RTP flows that are associated with the main RTP flows, such as RTP retransmission streams and FEC. The importance of such associated RTP traffic flows depends on the media type and codec used, in particular on how robust that codec is to packet loss. However, a default policy might be to use the same priority for associated RTP flows as for the primary RTP flow.

Secondly, the network can prioritise packet flows, including RTP media streams.
Typically, differential treatment involves two steps: the first is identifying whether an IP packet belongs to a class that has to be treated differently, and the second is the actual mechanism for prioritising packets. This is done according to three methods:

DiffServ: The end-point marks a packet with a DiffServ code point to indicate to the network that the packet belongs to a particular class.

Flow based: Packets that need to be given a particular treatment are identified using a combination of IP addresses and ports.

Deep Packet Inspection: A network classifier (DPI) inspects the packet and tries to determine if the packet represents a particular application and type that is to be prioritised.

Flow-based differentiation will provide the same treatment to all packets within a flow, i.e., relative prioritisation is not possible. Moreover, if the resources are limited it might not be possible to provide differential treatment, compared to best effort, for all the flows in a WebRTC application. When flow-based differentiation is available, the WebRTC application needs to know about it so that it can separate the RTP media streams onto different UDP flows to enable more granular usage of flow-based differentiation, thereby at least allowing different prioritisation of audio and video if desired by the application.

DiffServ assumes that either the end-point or a classifier can mark the packets with an appropriate DSCP so that the packets are treated according to that marking. If the end-point is to mark the traffic, two requirements arise in the WebRTC context: 1) the WebRTC application or browser has to know which DSCPs to use, and that it can use them on some set of RTP media streams; 2) the information needs to be propagated to the operating system when transmitting the packet.
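Handing the chosen DSCP to the operating system, as in requirement 2) above, typically maps onto a per-socket option on POSIX-style stacks; the DSCP occupies the upper six bits of the (former) IP TOS octet. A non-normative sketch, using the Expedited Forwarding code point (46) purely as an example (portability across operating systems varies):

```python
import socket

DSCP_EF = 46  # Expedited Forwarding, used here only as an example

# Create a UDP socket and ask the stack to mark outgoing packets.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, DSCP_EF << 2)

# Packets sent on this socket now carry DSCP 46 (TOS byte 0xB8),
# subject to local policy and operating system support.
```

Whether the network honours the marking is, of course, up to the path; end-points cannot rely on it.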
Details of this process are outside the scope of this memo and are further discussed in "DSCP and other packet markings for RTCWeb QoS" [I-D.dhesikan-tsvwg-rtcweb-qos].

For packet-based marking schemes it might be possible to mark individual RTP packets differently based on the relative priority of the RTP payload. For example, video codecs that have I, P, and B pictures could give payloads carrying only B frames a lower priority, as these are less damaging to lose. As a default policy, all RTP packets related to a media stream ought to be provided with the same prioritisation; per-packet prioritisation is outside the scope of this memo, but might be specified elsewhere in future.

It is also important to consider how RTCP packets associated with a particular RTP media flow need to be marked. RTCP compound packets with Sender Reports (SR) ought to be marked with the same priority as the RTP media flow itself, so that RTCP-based round-trip time (RTT) measurements are made using the same flow priority as the media flow experiences. RTCP compound packets containing an RR packet ought to be sent with the priority used by the majority of the RTP media flows reported on. RTCP packets containing time-critical feedback packets can use a higher priority to improve the timeliness and likelihood of delivery of such feedback.

12.2. Source, Flow, and Participant Identification

12.2.1. Media Streams

Each RTP media stream is identified by a unique synchronisation source (SSRC) identifier. The SSRC identifier is carried in the RTP data packets comprising a media stream, and is also used to identify that stream in the corresponding RTCP reports. The SSRC is chosen as discussed in Section 4.8.
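For concreteness, the fixed RTP header defined in RFC 3550 carries the SSRC in octets 8-11, followed by the optional CSRC list whose length is given by the CC field. A minimal, non-normative parsing sketch (validation beyond the basics is omitted):

```python
import struct

def parse_ssrc_and_csrcs(packet: bytes):
    """Extract (ssrc, [csrc, ...]) from an RTP packet per RFC 3550."""
    if len(packet) < 12 or (packet[0] >> 6) != 2:
        raise ValueError("not an RTP version 2 packet")
    cc = packet[0] & 0x0F                              # CSRC count (0-15)
    ssrc = struct.unpack_from("!I", packet, 8)[0]      # octets 8-11
    csrcs = [struct.unpack_from("!I", packet, 12 + 4 * i)[0]
             for i in range(cc)]                       # CSRC list follows
    return ssrc, csrcs
```

A receiver would apply this before any further demultiplexing, since the SSRC is the key for routing packets to the right decoding context.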
The first stage in demultiplexing RTP and RTCP packets received at a WebRTC end-point is to separate the media streams based on their SSRC value; once that is done, additional demultiplexing steps can determine how and where to render the media.

RTP allows a mixer, or other RTP-layer middlebox, to combine media flows from multiple sources to form a new media flow. The RTP data packets in that new flow can include a Contributing Source (CSRC) list, indicating which original SSRCs contributed to the combined packet. As described in Section 4.1, implementations need to support reception of RTP data packets containing a CSRC list and RTCP packets that relate to sources present in the CSRC list. The CSRC list can change on a packet-by-packet basis, depending on the mixing operation being performed. Knowledge of what sources contributed to a particular RTP packet can be important if the user interface indicates which participants are active in the session. Changes in the CSRC list included in packets need to be exposed to the WebRTC application using some API, if the application is to be able to track changes in session participation. It is desirable to map CSRC values back into WebRTC MediaStream identities as they cross this API, to avoid exposing the SSRC/CSRC name space to JavaScript applications.

If the mixer-to-client audio level extension [RFC6465] is being used in the session (see Section 5.2.3), the information in the CSRC list is augmented by audio level information for each contributing source. This information can usefully be exposed in the user interface.

12.2.2. Media Streams: SSRC Collision Detection

The RTP standard [RFC3550] requires any RTP implementation to have support for detecting and handling SSRC collisions, i.e., to resolve the conflict when two different end-points use the same SSRC value.
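A minimal, non-normative sketch of the RFC 3550 remedy: on detecting that a remote source is using the local SSRC, abandon it and continue with a freshly chosen random value not already observed in the session (a real implementation would also send an RTCP BYE for the old SSRC; the function shapes are illustrative):

```python
import secrets

def choose_ssrc(in_use: set[int]) -> int:
    """Pick a random 32-bit SSRC not currently observed in the session."""
    while True:
        candidate = secrets.randbits(32)
        if candidate not in in_use:
            return candidate

def handle_own_collision(local_ssrc: int, observed: set[int]) -> int:
    """If a remote source uses our SSRC, switch to a new one (RFC 3550).

    Returns the SSRC to use from now on; unchanged if no collision.
    """
    if local_ssrc in observed:
        return choose_ssrc(observed | {local_ssrc})
    return local_ssrc
```

With 32-bit random SSRCs the collision probability is low but non-zero, which is why the handling logic is mandatory rather than optional.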
This requirement also applies to WebRTC end-points. There are several scenarios in which SSRC collisions can occur.

In a point-to-point session where each SSRC is associated with one of the two end-points, and where the main media-carrying SSRC identifier will be announced in the signalling channel, a collision is less likely to occur, due to the information about used SSRCs provided by Source-Specific SDP Attributes [RFC5576]. Still, if both end-points start using a new SSRC identifier prior to having signalled it to the peer and received acknowledgement of the signalling message, there can be collisions. The Source-Specific SDP Attributes [RFC5576] contain no mechanism to resolve SSRC collisions or to reject an end-point's usage of an SSRC.

SSRC values that are not signalled can also appear. This is more likely than it might seem, as certain RTP functions need extra SSRCs to provide functionality related to another (the "main") SSRC, for example, SSRC-multiplexed RTP retransmission [RFC4588]. In those cases, an end-point can create a new SSRC that, strictly speaking, does not need to be announced over the signalling channel to function correctly at both the RTP and RTCPeerConnection level.

A more likely case of SSRC collision is that multiple end-points in a multiparty conference create new sources and signal those towards the central server. In cases where the SSRCs/CSRCs are propagated between the different end-points from the central node, collisions can occur.

Another scenario is when the central node manages to connect an end-point's RTCPeerConnection to another RTCPeerConnection the end-point already has, thus forming a loop where the end-point will receive its own traffic. While this is clearly considered a bug, it is important that the end-point is able to recognise and handle the case when it occurs.
This case becomes even more problematic when media mixers, and so on, are involved, where the stream received is a different stream but still contains this client's input.

These SSRC/CSRC collisions can only be handled at the RTP level as long as the same RTP session is extended across multiple RTCPeerConnections by an RTP middlebox. To resolve the more generic case, where multiple RTCPeerConnections are interconnected, the identification of the media source(s) that are part of a MediaStreamTrack being propagated across multiple interconnected RTCPeerConnections needs to be preserved across these interconnections.

12.2.3. Media Synchronisation Context

When an end-point sends media from more than one media source, it needs to consider if (and which of) these media sources are to be synchronised. In RTP/RTCP, synchronisation is provided by indicating that a set of RTP media streams come from the same synchronisation context and logical end-point, by using the same RTCP CNAME identifier.

The next provision is that the internal clocks of all media sources, i.e., what drives the RTP timestamp, can be correlated to a system clock that is provided in RTCP Sender Reports encoded in an NTP format. By correlating all RTP timestamps to a common system clock for all sources, the timing relation of the different RTP media streams, also across multiple RTP sessions, can be derived at the receiver and, if desired, the streams can be synchronised. The requirement is for the media sender to provide the correlation information; it is up to the receiver to use it or not.

13. Security Considerations

The overall security architecture for WebRTC is described in [I-D.ietf-rtcweb-security-arch], and security considerations for the WebRTC framework are described in [I-D.ietf-rtcweb-security]. These considerations also apply to this memo.
The security considerations of the RTP specification, the RTP/SAVPF profile, and the various RTP/RTCP extensions and RTP payload formats that form the complete protocol suite described in this memo apply. We do not believe there are any new security considerations resulting from the combination of these various protocol extensions.

The Extended Secure RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback [RFC5124] (RTP/SAVPF) provides handling of fundamental issues by offering confidentiality, integrity, and partial source authentication. A mandatory-to-implement media security solution is created by combining this secured RTP profile with DTLS-SRTP keying [RFC5764], as defined by Section 5.5 of [I-D.ietf-rtcweb-security-arch].

RTCP packets convey a Canonical Name (CNAME) identifier that is used to associate media flows that need to be synchronised across related RTP sessions. Inappropriate choice of CNAME values can be a privacy concern, since long-term persistent CNAME identifiers can be used to track users across multiple WebRTC calls. Section 4.9 of this memo provides guidelines for generation of untraceable CNAME values that alleviate this risk.

The guidelines in [RFC6562] apply when using variable bit rate (VBR) audio codecs such as Opus (see Section 4.3 for discussion of mandated audio codecs). These guidelines in [RFC6562] also apply, but are of lesser importance, when using the client-to-mixer audio level header extension (Section 5.2.2) or the mixer-to-client audio level header extension (Section 5.2.3).

14. IANA Considerations

This memo makes no request of IANA.

Note to RFC Editor: this section is to be removed on publication as an RFC.

15. Open Issues

This section contains a summary of the open issues or items still to be done noted in this document:

1.
tbd: The discussion at IETF 88 confirmed that there is broad 1665 agreement to support simulcast, however the method for achieving 1666 simulcast of a media source has to be decided. 1668 16. Acknowledgements 1670 The authors would like to thank Bernard Aboba, Harald Alvestrand, 1671 Cary Bran, Charles Eckel, Cullen Jennings, Dan Romascanu, and the 1672 other members of the IETF RTCWEB working group for their valuable 1673 feedback. 1675 17. References 1677 17.1. Normative References 1679 [I-D.ietf-avtcore-multi-media-rtp-session] 1680 Westerlund, M., Perkins, C., and J. Lennox, "Sending 1681 Multiple Types of Media in a Single RTP Session", draft- 1682 ietf-avtcore-multi-media-rtp-session-03 (work in 1683 progress), July 2013. 1685 [I-D.ietf-avtcore-rtp-circuit-breakers] 1686 Perkins, C. and V. Singh, "Multimedia Congestion Control: 1687 Circuit Breakers for Unicast RTP Sessions", draft-ietf- 1688 avtcore-rtp-circuit-breakers-03 (work in progress), July 1689 2013. 1691 [I-D.ietf-avtcore-rtp-multi-stream-optimisation] 1692 Lennox, J., Westerlund, M., Wu, W., and C. Perkins, 1693 "Sending Multiple Media Streams in a Single RTP Session: 1694 Grouping RTCP Reception Statistics and Other Feedback", 1695 draft-ietf-avtcore-rtp-multi-stream-optimisation-00 (work 1696 in progress), July 2013. 1698 [I-D.ietf-avtcore-rtp-multi-stream] 1699 Lennox, J., Westerlund, M., Wu, W., and C. Perkins, 1700 "Sending Multiple Media Streams in a Single RTP Session", 1701 draft-ietf-avtcore-rtp-multi-stream-01 (work in progress), 1702 July 2013. 1704 [I-D.ietf-avtext-multiple-clock-rates] 1705 Petit-Huguenin, M. and G. Zorn, "Support for Multiple 1706 Clock Rates in an RTP Session", draft-ietf-avtext- 1707 multiple-clock-rates-11 (work in progress), November 1708 2013. 1710 [I-D.ietf-mmusic-sdp-bundle-negotiation] 1711 Holmberg, C., Alvestrand, H., and C. 
              Jennings, "Multiplexing Negotiation Using Session
              Description Protocol (SDP) Port Numbers",
              draft-ietf-mmusic-sdp-bundle-negotiation-05 (work in
              progress), October 2013.

   [I-D.ietf-rtcweb-security-arch]
              Rescorla, E., "WebRTC Security Architecture",
              draft-ietf-rtcweb-security-arch-07 (work in progress),
              July 2013.

   [I-D.ietf-rtcweb-security]
              Rescorla, E., "Security Considerations for WebRTC",
              draft-ietf-rtcweb-security-05 (work in progress), July
              2013.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2736]  Handley, M. and C. Perkins, "Guidelines for Writers of
              RTP Payload Format Specifications", BCP 36, RFC 2736,
              December 1999.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC
              3551, July 2003.

   [RFC3556]  Casner, S., "Session Description Protocol (SDP) Bandwidth
              Modifiers for RTP Control Protocol (RTCP) Bandwidth", RFC
              3556, July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol
              (SRTP)", RFC 3711, March 2004.

   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J.
              Rey, "Extended RTP Profile for Real-time Transport
              Control Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC
              4585, July 2006.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
              July 2006.

   [RFC4961]  Wing, D., "Symmetric RTP / RTP Control Protocol (RTCP)",
              BCP 131, RFC 4961, July 2007.

   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B.
              Burman, "Codec Control Messages in the RTP Audio-Visual
              Profile with Feedback (AVPF)", RFC 5104, February 2008.

   [RFC5124]  Ott, J. and E. Carrara, "Extended Secure RTP Profile for
              Real-time Transport Control Protocol (RTCP)-Based
              Feedback (RTP/SAVPF)", RFC 5124, February 2008.

   [RFC5285]  Singer, D. and H. Desineni, "A General Mechanism for RTP
              Header Extensions", RFC 5285, July 2008.

   [RFC5506]  Johansson, I. and M. Westerlund, "Support for Reduced-
              Size Real-Time Transport Control Protocol (RTCP):
              Opportunities and Consequences", RFC 5506, April 2009.

   [RFC5761]  Perkins, C. and M. Westerlund, "Multiplexing RTP Data and
              Control Packets on a Single Port", RFC 5761, April 2010.

   [RFC5764]  McGrew, D. and E. Rescorla, "Datagram Transport Layer
              Security (DTLS) Extension to Establish Keys for the
              Secure Real-time Transport Protocol (SRTP)", RFC 5764,
              May 2010.

   [RFC6051]  Perkins, C. and T. Schierl, "Rapid Synchronisation of RTP
              Flows", RFC 6051, November 2010.

   [RFC6464]  Lennox, J., Ivov, E., and E. Marocco, "A Real-time
              Transport Protocol (RTP) Header Extension for Client-to-
              Mixer Audio Level Indication", RFC 6464, December 2011.

   [RFC6465]  Ivov, E., Marocco, E., and J. Lennox, "A Real-time
              Transport Protocol (RTP) Header Extension for Mixer-to-
              Client Audio Level Indication", RFC 6465, December 2011.

   [RFC6562]  Perkins, C. and JM. Valin, "Guidelines for the Use of
              Variable Bit Rate Audio with Secure RTP", RFC 6562, March
              2012.

   [RFC6904]  Lennox, J., "Encryption of Header Extensions in the
              Secure Real-time Transport Protocol (SRTP)", RFC 6904,
              April 2013.

   [RFC7007]  Terriberry, T., "Update to Remove DVI4 from the
              Recommended Codecs for the RTP Profile for Audio and
              Video Conferences with Minimal Control (RTP/AVP)", RFC
              7007, August 2013.

   [RFC7022]  Begen, A., Perkins, C., Wing, D., and E.
              Rescorla, "Guidelines for Choosing RTP Control Protocol
              (RTCP) Canonical Names (CNAMEs)", RFC 7022, September
              2013.

   [W3C.WD-mediacapture-streams-20130903]
              Burnett, D., Bergkvist, A., Jennings, C., and A.
              Narayanan, "Media Capture and Streams", World Wide Web
              Consortium WD WD-mediacapture-streams-20130903, September
              2013.

   [W3C.WD-webrtc-20130910]
              Bergkvist, A., Burnett, D., Jennings, C., and A.
              Narayanan, "WebRTC 1.0: Real-time Communication Between
              Browsers", World Wide Web Consortium WD
              WD-webrtc-20130910, September 2013.

17.2.  Informative References

   [I-D.dhesikan-tsvwg-rtcweb-qos]
              Dhesikan, S., Druta, D., Jones, P., and J. Polk, "DSCP
              and other packet markings for RTCWeb QoS",
              draft-dhesikan-tsvwg-rtcweb-qos-03 (work in progress),
              December 2013.

   [I-D.ietf-avtcore-multiplex-guidelines]
              Westerlund, M., Perkins, C., and H. Alvestrand,
              "Guidelines for using the Multiplexing Features of RTP to
              Support Multiple Media Streams",
              draft-ietf-avtcore-multiplex-guidelines-01 (work in
              progress), July 2013.

   [I-D.ietf-avtcore-rtp-topologies-update]
              Westerlund, M. and S. Wenger, "RTP Topologies",
              draft-ietf-avtcore-rtp-topologies-update-01 (work in
              progress), October 2013.

   [I-D.ietf-mmusic-msid]
              Alvestrand, H., "Cross Session Stream Identification in
              the Session Description Protocol", draft-ietf-mmusic-
              msid-02 (work in progress), November 2013.

   [I-D.ietf-rtcweb-overview]
              Alvestrand, H., "Overview: Real Time Protocols for
              Browser-based Applications", draft-ietf-rtcweb-
              overview-08 (work in progress), September 2013.

   [I-D.ietf-rtcweb-use-cases-and-requirements]
              Holmberg, C., Hakansson, S., and G. Eriksson, "Web Real-
              Time Communication Use-cases and Requirements",
              draft-ietf-rtcweb-use-cases-and-requirements-12 (work in
              progress), October 2013.
   [I-D.jesup-rtp-congestion-reqs]
              Jesup, R. and H. Alvestrand, "Congestion Control
              Requirements For Real Time Media",
              draft-jesup-rtp-congestion-reqs-00 (work in progress),
              March 2012.

   [I-D.westerlund-avtcore-transport-multiplexing]
              Westerlund, M. and C. Perkins, "Multiplexing Multiple RTP
              Sessions onto a Single Lower-Layer Transport",
              draft-westerlund-avtcore-transport-multiplexing-07 (work
              in progress), October 2013.

   [RFC3611]  Friedman, T., Caceres, R., and A. Clark, "RTP Control
              Protocol Extended Reports (RTCP XR)", RFC 3611, November
              2003.

   [RFC4341]  Floyd, S. and E. Kohler, "Profile for Datagram Congestion
              Control Protocol (DCCP) Congestion Control ID 2: TCP-like
              Congestion Control", RFC 4341, March 2006.

   [RFC4342]  Floyd, S., Kohler, E., and J. Padhye, "Profile for
              Datagram Congestion Control Protocol (DCCP) Congestion
              Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC
              4342, March 2006.

   [RFC4383]  Baugher, M. and E. Carrara, "The Use of Timed Efficient
              Stream Loss-Tolerant Authentication (TESLA) in the Secure
              Real-time Transport Protocol (SRTP)", RFC 4383, February
              2006.

   [RFC4828]  Floyd, S. and E. Kohler, "TCP Friendly Rate Control
              (TFRC): The Small-Packet (SP) Variant", RFC 4828, April
              2007.

   [RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP
              Friendly Rate Control (TFRC): Protocol Specification",
              RFC 5348, September 2008.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, June 2009.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.

   [RFC5968]  Ott, J. and C. Perkins, "Guidelines for Extending the RTP
              Control Protocol (RTCP)", RFC 5968, September 2010.

   [RFC6263]  Marjou, X. and A.
              Sollaud, "Application Mechanism for Keeping Alive the NAT
              Mappings Associated with RTP / RTP Control Protocol
              (RTCP) Flows", RFC 6263, June 2011.

   [RFC6792]  Wu, Q., Hunt, G., and P. Arden, "Guidelines for Use of
              the RTP Monitoring Framework", RFC 6792, November 2012.

Authors' Addresses

   Colin Perkins
   University of Glasgow
   School of Computing Science
   Glasgow  G12 8QQ
   United Kingdom

   Email: csp@csperkins.org
   URI:   http://csperkins.org/

   Magnus Westerlund
   Ericsson
   Farogatan 6
   SE-164 80 Kista
   Sweden

   Phone: +46 10 714 82 87
   Email: magnus.westerlund@ericsson.com

   Joerg Ott
   Aalto University
   School of Electrical Engineering
   Espoo 02150
   Finland

   Email: jorg.ott@aalto.fi
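
Editor's note (not part of the draft): the Security Considerations above point to Section 4.9 and [RFC7022] for generating untraceable CNAME values. As an illustrative sketch only, the following shows one way an endpoint might implement the short-term persistent CNAME approach described in RFC 7022: base64-encode at least 96 bits of cryptographically strong random data. The function name `generate_cname` is a hypothetical helper, not an API defined by any of the referenced specifications.

```python
import base64
import os


def generate_cname() -> str:
    """Sketch of RFC 7022-style short-term CNAME generation.

    Draws 96 bits (12 bytes) of cryptographically strong random
    data and base64-encodes it, yielding a 16-character ASCII
    identifier with no long-term linkage to the user or device.
    A fresh value would be generated per session (or per call),
    so the CNAME cannot be used to track a user across calls.
    """
    random_bytes = os.urandom(12)  # 96 bits of randomness
    return base64.b64encode(random_bytes).decode("ascii")


if __name__ == "__main__":
    cname = generate_cname()
    print(cname)  # e.g. a 16-character base64 string, unique per run
```

Because 12 bytes encode to exactly 16 base64 characters, the result needs no padding and stays well under typical SDES item length limits.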