Network Working Group                                      M. Westerlund
Internet-Draft                                                 B. Burman
Intended status: Informational                                  Ericsson
Expires: August 17, 2020                                      C. Perkins
                                                   University of Glasgow
                                                           H. Alvestrand
                                                                  Google
                                                                 R. Even
                                                                  Huawei
                                                       February 14, 2020


    Guidelines for using the Multiplexing Features of RTP to Support
                         Multiple Media Streams
               draft-ietf-avtcore-multiplex-guidelines-10

Abstract

The Real-time Transport Protocol (RTP) is a flexible protocol that can be used in a wide range of applications, networks, and system topologies. That flexibility makes for wide applicability, but can complicate the application design process. One particular design question that has received much attention is how to support multiple media streams in RTP. This memo discusses the available options and design trade-offs, and provides guidelines on how to use the multiplexing features of RTP to support multiple media streams.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on August 17, 2020.

Copyright Notice

Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Definitions
      2.1. Terminology
      2.2. Subjects Out of Scope
   3. RTP Multiplexing Overview
      3.1. Reasons for Multiplexing and Grouping RTP Streams
      3.2. RTP Multiplexing Points
         3.2.1. RTP Session
         3.2.2. Synchronisation Source (SSRC)
         3.2.3. Contributing Source (CSRC)
         3.2.4. RTP Payload Type
      3.3. Issues Related to RTP Topologies
      3.4. Issues Related to RTP and RTCP Protocol
         3.4.1. The RTP Specification
         3.4.2. Multiple SSRCs in a Session
         3.4.3. Binding Related Sources
         3.4.4. Forward Error Correction
   4. Considerations for RTP Multiplexing
      4.1. Interworking Considerations
         4.1.1. Application Interworking
         4.1.2. RTP Translator Interworking
         4.1.3. Gateway Interworking
         4.1.4. Multiple SSRC Legacy Considerations
      4.2. Network Considerations
         4.2.1. Quality of Service
         4.2.2. NAT and Firewall Traversal
         4.2.3. Multicast
      4.3. Security and Key Management Considerations
         4.3.1. Security Context Scope
         4.3.2. Key Management for Multi-party Sessions
         4.3.3. Complexity Implications
   5. RTP Multiplexing Design Choices
      5.1. Multiple Media Types in One Session
      5.2. Multiple SSRCs of the Same Media Type
      5.3. Multiple Sessions for One Media Type
      5.4. Single SSRC per Endpoint
      5.5. Summary
   6. Guidelines
   7. IANA Considerations
   8. Security Considerations
   9. Contributors
   10. Acknowledgments
   11. References
      11.1. Normative References
      11.2. Informative References
   Appendix A. Dismissing Payload Type Multiplexing
   Appendix B. Signalling Considerations
      B.1. Session Oriented Properties
      B.2. SDP Prevents Multiple Media Types
      B.3. Signalling RTP Stream Usage
   Authors' Addresses

1. Introduction

The Real-time Transport Protocol (RTP) [RFC3550] is a commonly used protocol for real-time media transport. It is a protocol that provides great flexibility and can support a large set of different applications.
From the beginning, RTP was designed for multiple participants in a communication session. It supports many topology paradigms and usages, as defined in [RFC7667]. RTP has several multiplexing points designed for different purposes. These enable support of multiple RTP streams and switching between different encodings or packetizations of the media. By using multiple RTP sessions, sets of RTP streams can be structured for efficient processing or identification. Thus, an RTP application designer needs to understand how best to use the RTP session, the RTP stream identifier (SSRC), and the RTP payload type to meet the application's needs.

There has been increasing interest in more advanced usage of RTP. For example, multiple RTP streams can be used when a single endpoint has multiple media sources (like multiple cameras or microphones) that need to be sent simultaneously. Consequently, questions are raised regarding the most appropriate RTP usage. The limitations in some implementations, RTP/RTCP extensions, and signalling have also been exposed. The authors hope that clarification on the usefulness of some functionalities in RTP will result in more complete implementations in the future.

The purpose of this document is to provide clear information about the possibilities of RTP when it comes to multiplexing. The RTP application designer needs to understand the implications arising from a particular usage of the RTP multiplexing points. The document provides some guidelines and recommends against some usages as being unsuitable, in general or for particular purposes.

The document starts with some definitions and then goes into the existing RTP functionalities around multiplexing. Both the desired behaviour and the implications of a particular behaviour depend on which topologies are used, which requires some consideration.
This is followed by a discussion of some choices in multiplexing behaviour and their impacts. Some designs of RTP usage are discussed. Finally, some guidelines and examples are provided.

2. Definitions

2.1. Terminology

The definitions in Section 3 of [RFC3550] are referenced normatively.

The taxonomy defined in [RFC7656] is referenced normatively.

The following terms and abbreviations are used in this document:

Multiparty: A communication situation including multiple endpoints. In this document, it will be used to refer to situations where more than two endpoints communicate.

Multiplexing: The operation of taking multiple entities as input, aggregating them onto some common resource while keeping the individual entities addressable such that they can later be fully and unambiguously separated (de-multiplexed) again.

RTP Receiver: An Endpoint or Middlebox receiving RTP streams and RTCP messages. It uses at least one SSRC to send RTCP messages. An RTP Receiver may also be an RTP Sender.

RTP Sender: An Endpoint sending one or more RTP streams, but also sending RTCP messages.

RTP Session Group: One or more RTP sessions that are used together to perform some function. Examples are multiple RTP sessions used to carry different layers of a layered encoding. In an RTP Session Group, CNAMEs are assumed to be valid across all RTP sessions, and designate synchronisation contexts that can cross RTP sessions; i.e., SSRCs that map to a common CNAME can be assumed to have RTCP SR timing information derived from a common clock such that they can be synchronised for playout.

Signalling: The process of configuring endpoints to participate in one or more RTP sessions.

Note: The above definitions of RTP Receiver and RTP Sender are consistent with the usage in [RFC3550].

2.2. Subjects Out of Scope

This document is focused on issues that affect RTP.
Thus, issues that involve signalling protocols, such as whether SIP [RFC3261], Jingle [JINGLE], or some other protocol is in use for session configuration, the particular syntaxes used to define RTP session properties, or the constraints imposed by particular choices in the signalling protocols, are mentioned only as examples in order to describe the RTP issues more precisely.

This document assumes the applications will use RTCP. While there are applications that don't send RTCP, they do not conform to the RTP specification, and thus can be regarded as reusing the RTP packet format but not implementing the RTP protocol.

3. RTP Multiplexing Overview

3.1. Reasons for Multiplexing and Grouping RTP Streams

There are several reasons why an endpoint might choose to send multiple media streams. In the discussion below, please keep in mind that the reasons for having multiple RTP streams vary and include, but are not limited to, the following:

o Multiple media sources

o Multiple RTP streams might be needed to represent one media source (for instance, when using simulcast or scalable video coding)

o A retransmission stream might repeat some parts of the content of another RTP stream

o A Forward Error Correction (FEC) stream might provide material that can be used to repair another RTP stream

o Alternative encodings during simulcast, for instance, using different codecs for the same audio stream

o Alternative formats during simulcast, for instance, multiple resolutions of the same video stream

For each of these reasons, it is necessary to decide whether each additional RTP stream is sent within the same RTP session as the other RTP streams, or whether additional RTP sessions are needed to group the RTP streams. The choice suitable for one reason might not be suitable for another reason.
The clearest understanding is associated with multiplexing multiple media sources of the same media type. However, all reasons warrant discussion and clarification on how to deal with them. As the discussion below will show, in reality neither SSRC multiplexing nor RTP session multiplexing alone is suitable for all purposes. To utilise RTP well and as efficiently as possible, both are needed. The real issue is finding the right guidance on when to create additional RTP sessions and when additional RTP streams in the same RTP session are the right choice.

3.2. RTP Multiplexing Points

This section describes the multiplexing points present in the RTP protocol that can be used to distinguish RTP streams and groups of RTP streams. Figure 1 outlines the process of demultiplexing incoming RTP streams, starting at the socket that represents reception of one or more transport flows, e.g., a UDP destination port. This first stage also separates RTP/RTCP from any other protocols, such as STUN [RFC5389] and DTLS-SRTP [RFC5764], that share the same transport, as described in [RFC7983]. The Processing and Buffering (PB) step of Figure 1 terminates the RTP/RTCP protocol and prepares the RTP payload for input to the decoder.

                     |
                     | packets
        +--          v
        |     +------------+
        |     |   Socket   |  Transport Protocol Demultiplexing
        |     +------------+
        |       ||      ||
    RTP |  RTP/ ||      |+-----> DTLS (SRTP Keying, SCTP, etc)
Session |  RTCP ||      +------> STUN (multiplexed using same port)
        +--     ||
        +--     ||
        |   (split by MID/RID and/or SSRC)
        |     ||   ||   ||   ||
        |     ||   ||   ||   ||
    RTP |    +--+ +--+ +--+ +--+  Jitter buffer,
Streams |    |PB| |PB| |PB| |PB|  process RTCP, etc.
        |    +--+ +--+ +--+ +--+
        +--   |    |    |    |
           (select decoder based on PT)
        +--   |   /     |   /
        |    +----+     |  /
        |    /    |     | /
Payload |  +---+ +---+ +---+
Formats |  |Dec| |Dec| |Dec|  Decoders
        |  +---+ +---+ +---+
        +--

                  Figure 1: RTP Demultiplexing Process

3.2.1. RTP Session

An RTP session is the highest semantic layer in the RTP protocol, and represents an association between a group of communicating endpoints. RTP does not contain a session identifier, yet it must be possible to identify different RTP sessions both across different endpoints and within a single endpoint.

For RTP session separation across endpoints, the set of participants that form an RTP session is defined as those that share a single synchronisation source space [RFC3550]. That is, if a group of participants are each aware of the synchronisation source identifiers belonging to the other participants, then those participants are in a single RTP session. A participant can become aware of a synchronisation source identifier by receiving an RTP packet containing it in the SSRC field or CSRC list, by receiving an RTCP packet mentioning it in an SSRC field, or through signalling (e.g., the Session Description Protocol (SDP) [RFC4566] "a=ssrc:" attribute [RFC5576]). Thus, the scope of an RTP session is determined by the participants' network interconnection topology, in combination with RTP and RTCP forwarding strategies deployed by the endpoints and any middleboxes, and by the signalling.

For RTP session separation within a single endpoint, RTP relies on the underlying transport layer, and on the signalling to identify RTP sessions in a manner that is meaningful to the application.
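As a concrete illustration of the first stage of Figure 1, where RTP/RTCP is separated from other protocols sharing the same transport flow, the following minimal Python sketch classifies each received datagram. It assumes the first-octet ranges of [RFC7983] and, for telling RTP from RTCP, the second-octet rule used when RTP and RTCP share a port (values 192-223 treated as RTCP packet types, as in RFC 5761). The names "Proto" and "classify" are hypothetical, and the ZRTP and TURN channel ranges are omitted because Figure 1 does not show them.

```python
from enum import Enum


class Proto(Enum):
    STUN = "stun"
    DTLS = "dtls"
    RTP = "rtp"
    RTCP = "rtcp"
    UNKNOWN = "unknown"


def classify(packet: bytes) -> Proto:
    """Classify one datagram received on a shared socket.

    First-octet ranges follow RFC 7983 (ZRTP and TURN channel ranges are
    omitted). RTP and RTCP are told apart by the second octet: values
    192-223 are treated as RTCP packet types, as when RTP and RTCP share
    a port.
    """
    if not packet:
        return Proto.UNKNOWN
    first = packet[0]
    if first <= 3:
        return Proto.STUN
    if 20 <= first <= 63:
        return Proto.DTLS
    if 128 <= first <= 191:
        # Version 2 RTP and RTCP packets both start with the bits 10.
        if len(packet) >= 2 and 192 <= packet[1] <= 223:
            return Proto.RTCP
        return Proto.RTP
    return Proto.UNKNOWN


if __name__ == "__main__":
    rtp = bytes([0x80, 96]) + bytes(10)      # V=2, PT=96, marker clear
    rtcp_rr = bytes([0x81, 201]) + bytes(6)  # V=2, RC=1, packet type 201 (RR)
    stun = bytes([0x00, 0x01]) + bytes(18)   # STUN Binding Request type
    for name, pkt in [("rtp", rtp), ("rtcp", rtcp_rr), ("stun", stun)]:
        print(name, classify(pkt).value)
```

In this sketch, datagrams classified as RTP or RTCP would then be handed to the per-SSRC split shown in Figure 1, while DTLS and STUN datagrams go to their own handlers.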
A single endpoint can have one or more transport flows for the same RTP session, and a single RTP session can therefore span multiple transport layer flows even if all endpoints use a single transport layer flow per endpoint for that RTP session. The signalling layer might give RTP sessions an explicit identifier, or the identification might be implicit based on the addresses and ports used. Accordingly, a single RTP session can have multiple associated identifiers, explicit and implicit, belonging to different contexts. For example, when running RTP on top of UDP/IP, an endpoint can identify and delimit an RTP session from other RTP sessions by their UDP source and destination IP addresses and UDP port numbers. Regardless of whether an endpoint has one or more IP addresses, a single RTP session can use multiple IP/UDP flows for receiving and/or sending RTP packets to other endpoints or middleboxes. Another example is SDP media descriptions (the "m=" line and the following associated lines) that signal the transport flow and RTP session configuration for the endpoint's part of the RTP session. The SDP grouping framework [RFC5888] allows media descriptions to be labelled and grouped so that RTP Session Groups can be created. Through use of Negotiating Media Multiplexing Using the Session Description Protocol (SDP) [I-D.ietf-mmusic-sdp-bundle-negotiation], multiple media descriptions become part of a common RTP session, where each media description represents the RTP streams sent or received for a media source.

The RTP protocol makes no normative statements about the relationship between different RTP sessions; however, applications that use more than one RTP session will have some higher-layer understanding of the relationship between the sessions they create.

3.2.2. Synchronisation Source (SSRC)

A synchronisation source (SSRC) identifies a source of an RTP stream, or an RTP receiver when sending RTCP. Every endpoint has at least one SSRC identifier, even if it does not send RTP packets. RTP endpoints that are only RTP receivers still send RTCP and use their SSRC identifiers in the RTCP packets they send. An endpoint can have multiple SSRC identifiers if it sends multiple RTP streams. Endpoints that are both RTP sender and RTP receiver use the same SSRC in both roles.

The SSRC is a 32-bit identifier. It is present in every RTP and RTCP packet header, and in the payload of some RTCP packet types. It can also be present in SDP signalling. Unless pre-signalled, e.g., using the SDP "a=ssrc:" attribute [RFC5576], the SSRC is chosen at random. It is not dependent on the network address of the endpoint, and is intended to be unique within an RTP session. SSRC collisions can occur, and are handled as specified in [RFC3550] and [RFC5576], resulting in the SSRC of the colliding RTP streams or receivers changing. An endpoint that changes its network transport address during a session has to choose a new SSRC identifier to avoid being interpreted as a looped source, unless a mechanism providing a virtual transport (such as ICE [RFC8445]) abstracts the changes.

SSRC identifiers that belong to the same synchronisation context (i.e., that represent RTP streams that can be synchronised using information in RTCP SR packets) use identical CNAME chunks in corresponding RTCP SDES packets. SDP signalling can also be used to provide explicit SSRC grouping [RFC5576].

In some cases, the same SSRC identifier value is used to relate streams in two different RTP sessions, such as in RTP retransmission [RFC4588]. This is to be avoided, since there is no guarantee that SSRC values are unique across RTP sessions. For the RTP
For the RTP 387 retransmission [RFC4588] case it is recommended to use explicit 388 binding of the source RTP stream and the redundancy stream, e.g. 389 using the RepairedRtpStreamId RTCP SDES item [I-D.ietf-avtext-rid]. 391 Note that RTP sequence number and RTP timestamp are scoped by the 392 SSRC and thus specific per RTP stream. 394 Different types of entities use an SSRC to identify themselves, as 395 follows: 397 A real media source: Uses the SSRC to identify a "physical" media 398 source. 400 A conceptual media source: Uses the SSRC to identify the result of 401 applying some filtering function in a network node, for example a 402 filtering function in an RTP mixer that provides the most active 403 speaker based on some criteria, or a mix representing a set of 404 other sources. 406 An RTP receiver: Uses the SSRC to identify itself as the source of 407 its RTCP reports. 409 An endpoint that generates more than one media type, e.g. a 410 conference participant sending both audio and video, need not (and, 411 indeed, should not) use the same SSRC value across RTP sessions. 413 RTCP compound packets containing the CNAME SDES item is the 414 designated method to bind an SSRC to a CNAME, effectively cross- 415 correlating SSRCs within and between RTP Sessions as coming from the 416 same endpoint. The main property attributed to SSRCs associated with 417 the same CNAME is that they are from a particular synchronisation 418 context and can be synchronised at playback. 420 An RTP receiver receiving a previously unseen SSRC value will 421 interpret it as a new source. It might in fact be a previously 422 existing source that had to change SSRC number due to an SSRC 423 conflict. 
Use of the MID extension [I-D.ietf-mmusic-sdp-bundle-negotiation] helps to identify which media source the apparently new source belongs to, and use of the RID extension [I-D.ietf-mmusic-rid] helps to identify what encoding or redundancy stream it represents, even though the SSRC changed. However, the originator of the previous SSRC ought to have ended the conflicting source by sending an RTCP BYE for it prior to starting to send with the new SSRC, making the new SSRC a new source.

3.2.3. Contributing Source (CSRC)

The Contributing Source (CSRC) is not a separate identifier. Rather, an SSRC identifier is listed as a CSRC in the RTP header of a packet generated by an RTP mixer or video MCU/switch, if the corresponding SSRC was in the header of one of the packets that contributed to the output.

It is not possible, in general, to extract media represented by an individual CSRC, since it is typically the result of a media merge (e.g., mix) operation on the individual media streams corresponding to the CSRC identifiers. The exception is when only a single CSRC is indicated, since this represents forwarding of an RTP stream, possibly modified. The RTP header extension for Mixer-to-Client Audio Level Indication [RFC6465] expands on the receiver's information about a packet with a CSRC list. Due to these restrictions, CSRC will not be considered a fully qualified multiplexing point and will be disregarded in the rest of this document.

3.2.4. RTP Payload Type

Each RTP stream utilises one or more RTP payload formats. An RTP payload format describes how the output of a particular media codec is framed and encoded into RTP packets. The payload format is identified by the payload type (PT) field in the RTP packet header. The combination of SSRC and PT therefore identifies a specific RTP stream in a specific encoding format.
The format definition can be taken from [RFC3551] for statically allocated payload types, but ought to be explicitly defined in signalling, such as SDP, for both static and dynamic payload types. The term "format" here includes those aspects described by out-of-band signalling means; in SDP, the term "format" includes media type, RTP timestamp sampling rate, codec, codec configuration, payload format configurations, and various robustness mechanisms such as redundant encodings [RFC2198].

The RTP payload type is scoped by the sending endpoint within an RTP session. PT has the same meaning across all RTP streams in an RTP session. All SSRCs sent from a single endpoint share the same payload type definitions. The RTP payload type is designed such that only a single payload type is valid at any time instant in the RTP stream's timestamp time line, effectively time-multiplexing different payload types if any change occurs. The payload type can change on a per-packet basis for an SSRC, for example, a speech codec making use of generic comfort noise [RFC3389]. If there is a true need to send multiple payload types for the same SSRC that are valid for the same instant, then redundant encodings [RFC2198] can be used. Several constraints beyond those mentioned above need to be met to enable this use, one of which is that the combined payload sizes of the different payload types ought not to exceed the transport MTU.

Other aspects of RTP payload format use are described in How to Write an RTP Payload Format [RFC8088].

The payload type is not a multiplexing point at the RTP layer (see Appendix A for a detailed discussion of why using the payload type as an RTP multiplexing point does not work). The RTP payload type is, however, used to determine how to consume and decode an RTP stream.
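Because the payload type is scoped by the sending endpoint and is used to select how to decode a stream, a receiver might resolve the format of each packet with a lookup keyed on the sender and the PT. The following Python sketch assumes, hypothetically, that the sending endpoint is identified by its RTCP CNAME and that SSRCs have already been bound to CNAMEs from RTCP SDES packets; the class names are illustrative only. In the usage below, PT 13 is the static comfort noise assignment in [RFC3551], while the mapping of dynamic PT 111 to Opus is an assumed signalled value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PayloadFormat:
    encoding: str
    clock_rate: int


class PayloadTypeMap:
    """Hypothetical per-session table mapping (sending endpoint, PT) to a format.

    The sending endpoint is assumed to be identified by its RTCP CNAME, with
    SSRC-to-CNAME bindings learned from RTCP SDES packets.
    """

    def __init__(self) -> None:
        self.formats: dict[tuple[str, int], PayloadFormat] = {}
        self.ssrc_to_cname: dict[int, str] = {}

    def define(self, cname: str, pt: int, fmt: PayloadFormat) -> None:
        # Typically populated from signalling such as SDP.
        self.formats[(cname, pt)] = fmt

    def bind_ssrc(self, ssrc: int, cname: str) -> None:
        self.ssrc_to_cname[ssrc] = cname

    def format_for(self, ssrc: int, pt: int) -> PayloadFormat:
        """Select the decoder format for one packet; raises KeyError if unknown."""
        # All SSRCs from one endpoint share that endpoint's PT definitions,
        # and the PT may change from packet to packet within one SSRC.
        return self.formats[(self.ssrc_to_cname[ssrc], pt)]


if __name__ == "__main__":
    pts = PayloadTypeMap()
    pts.define("user@example.com", 111, PayloadFormat("opus", 48000))
    pts.define("user@example.com", 13, PayloadFormat("CN", 8000))
    pts.bind_ssrc(0x1111, "user@example.com")
    print(pts.format_for(0x1111, 111), pts.format_for(0x1111, 13))
```

The per-packet lookup reflects the time-multiplexing of payload types within one SSRC: a speech stream switching to comfort noise simply carries a different PT, and the same SSRC resolves to a different decoder for those packets.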
The RTP payload type number is sometimes used to associate an RTP stream with the signalling, which in general requires that unique RTP payload type numbers be used in each context. Use of MID, e.g., when bundling "m=" sections [I-D.ietf-mmusic-sdp-bundle-negotiation], can replace the payload type as the signalling association, and unique RTP payload types are then no longer required for that purpose.

3.3. Issues Related to RTP Topologies

The impact of how RTP multiplexing is performed will in general vary with how the RTP session participants are interconnected, as described in RTP Topologies [RFC7667].

Even the most basic use case, denoted Topo-Point-to-Point in [RFC7667], raises a number of considerations that are discussed in detail in the following sections. They range over such aspects as:

o Does my communication peer support RTP as defined with multiple SSRCs per RTP session?

o Do I need network differentiation in the form of QoS (Section 4.2.1)?

o Can the application more easily process and handle the media streams if they are in different RTP sessions?

o Do I need to use additional RTP streams for RTP retransmission or FEC?

For some point-to-multipoint topologies (e.g., Topo-ASM and Topo-SSM in [RFC7667]), multicast is used to interconnect the session participants. Special considerations (documented in Section 4.2.3) are then needed, as multicast is a one-to-many distribution system.

Sometimes an RTP communication can end up in a situation where the communicating peers are not compatible for various reasons:

o No common media codec for a media type, thus requiring transcoding.

o Different support for multiple RTP streams and RTP sessions.

o Usage of different media transport protocols, i.e., RTP or other.

o Usage of different transport protocols, e.g., UDP, DCCP, or TCP.
o Different security solutions, e.g., IPsec, TLS, DTLS, or SRTP with different keying mechanisms.

In many situations, this is resolved by the inclusion of a translator between the two peers, as described by Topo-PtP-Translator in [RFC7667]. The translator's main purpose is to make the peers appear compatible to each other. There can also be reasons other than compatibility to insert a translator in the form of a middlebox or gateway, for example, a need to monitor the RTP streams. Beware that changing the stream transport characteristics in the translator can require a thorough understanding of the application logic, specifically any congestion control or media adaptation, to ensure appropriate media handling.

Within the uses enabled by the RTP standard, the point-to-point topology can contain one or more RTP sessions with one or more media sources per session, each having one or more RTP streams per media source.

3.4. Issues Related to RTP and RTCP Protocol

Using multiple RTP streams is a well-supported feature of RTP. However, for most implementers or people writing RTP/RTCP applications or extensions attempting to apply multiple streams, it can be unclear when it is most appropriate to add an additional RTP stream in an existing RTP session and when it is better to use multiple RTP sessions. This section discusses the various considerations needed.

3.4.1. The RTP Specification

RFC 3550 contains some recommendations and a bullet list with five arguments for different aspects of RTP multiplexing. Please review Section 5.2 of [RFC3550]. These five aspects are quoted below.

   1. If, say, two audio streams shared the same RTP session and the same SSRC value, and one were to change encodings and thus acquire a different RTP payload type, there would be no general way of identifying which stream had changed encodings.
The first argument is for using a different SSRC for each individual RTP stream, which is fundamental to RTP operation.

   2. An SSRC is defined to identify a single timing and sequence number space. Interleaving multiple payload types would require different timing spaces if the media clock rates differ and would require different sequence number spaces to tell which payload type suffered packet loss.

The second argument advocates against demultiplexing RTP streams within a session based only on their RTP payload type numbers. It still stands, as can be seen from the extensive list of issues found in Appendix A.

   3. The RTCP sender and receiver reports (see Section 6.4) can only describe one timing and sequence number space per SSRC and do not carry a payload type field.

The third argument is yet another argument against payload type multiplexing.

   4. An RTP mixer would not be able to combine interleaved streams of incompatible media into one stream.

The fourth argument is against multiplexing RTP packets that require different handling into the same session. In most cases, the RTP mixer must embed application logic to handle streams; the separation of streams according to stream type is just another piece of application logic, which might or might not be appropriate for a particular application. One type of application that can mix different media sources blindly is the audio-only telephone bridge, although the ability to do that comes from the well-defined scenario that is aided by use of a single media type, even though individual streams may use incompatible codec types; most other types of applications need application-specific logic to perform the mix correctly.

   5. Carrying multiple media in one RTP session precludes: the use of different network paths or network resource allocations if appropriate; reception of a subset of the media if desired, for example just audio if video would exceed the available bandwidth; and receiver implementations that use separate processes for the different media, whereas using separate RTP sessions permits either single- or multiple-process implementations.

The fifth argument discusses network aspects that are described in Section 4.2. It also goes into aspects of implementation, like Split Component Terminal (see Section 3.10 of [RFC7667]) endpoints, where different processes or interconnected devices handle different aspects of the whole multimedia session.

In summary, RFC 3550's view on multiplexing is to use unique SSRCs for anything that is its own media/packet stream, and to use different RTP sessions for media streams that don't share a media type. This document fully supports the first point. The second point needs further discussion, as imposing a single solution on all usages of RTP is inappropriate. The Multiple Media Types in an RTP Session specification [I-D.ietf-avtcore-multi-media-rtp-session] provides a detailed analysis of the potential issues in having multiple media types in the same RTP session. This document provides a wider scope for an RTP session and considers multiple media types in one RTP session as a possible choice for the RTP application designer.

3.4.2. Multiple SSRCs in a Session

Using multiple SSRCs at one endpoint in an RTP session requires resolving some unclear aspects of the RTP specification. These could lead to interoperability issues as well as potentially significant inefficiencies, as further discussed in "RTP Considerations for Endpoints Sending Multiple Media Streams" [RFC8108].
   An RTP application designer should consider these issues and the possible application impact of a lack of appropriate RTP handling or optimization in the peer endpoints.

   Using multiple RTP sessions can potentially mitigate application issues caused by multiple SSRCs in an RTP session.

3.4.3.  Binding Related Sources

   A common problem in a number of RTP extensions has been how to bind related RTP streams together. This issue arises both when using additional SSRCs and when using multiple RTP sessions.

   The solutions can be divided into a few groups:

   o  RTP/RTCP based

   o  Signalling based (SDP)

   o  Grouping related RTP sessions

   o  Grouping SSRCs within an RTP session

   Most solutions are explicit, but some implicit methods have also been applied to the problem.

   The SDP-based signalling solutions are:

   SDP Media Description Grouping:  The SDP Grouping Framework [RFC5888] uses various semantics to group any number of media descriptions. This has primarily been used for grouping RTP sessions, but in combination with [I-D.ietf-mmusic-sdp-bundle-negotiation] it can also group multiple media descriptions within a single RTP session.

   SDP Media Multiplexing:  Negotiating Media Multiplexing Using the Session Description Protocol (SDP) [I-D.ietf-mmusic-sdp-bundle-negotiation] uses both SDP and RTCP information to associate RTP streams with SDP media descriptions. This allows both grouping the RTP streams belonging to an SDP media description and grouping multiple SDP media descriptions into a single RTP session.

   SDP SSRC grouping:  Source-Specific Media Attributes in SDP [RFC5576] includes a solution for grouping SSRCs in the same way as the SDP Grouping Framework groups media descriptions.

   The above grouping constructs support many use cases.
   Those solutions have shortcomings in cases where the session's dynamic properties make it difficult, or a drain on resources, to keep the list of related SSRCs up to date.

   An RTP/RTCP-based grouping solution is to use the RTCP SDES CNAME to bind related RTP streams to an endpoint or to a synchronization context. For applications with a single RTP stream per type (media, source, or redundancy stream), CNAME is sufficient for that purpose, independently of whether one or more RTP sessions are used. However, some applications choose not to use CNAME, because of perceived complexity or a desire not to implement RTCP. They instead use the same SSRC value to bind related RTP streams across multiple RTP sessions. RTP Retransmission [RFC4588] in multiple RTP session mode and Generic FEC [RFC5109] both use the CNAME method to relate the RTP streams. This may work but might have some downsides in RTP sessions with many participating SSRCs. Using identical SSRC values across RTP sessions to relate RTP streams is not recommended: when an SSRC collision occurs, it forces a change of that SSRC in all RTP sessions. That in turn resynchronizes all of them, instead of only the single media stream having the collision.

   Another method to implicitly bind SSRCs is used by RTP Retransmission [RFC4588] when retransmissions are sent in the same RTP session as the source RTP stream. The receiver missing a packet issues an RTP retransmission request and then awaits a new SSRC that carries the RTP retransmission payload and has the same CNAME. This limits a requester to having only one outstanding retransmission request on any new source SSRC per endpoint.
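Two binding methods described above can be sketched as receiver-side bookkeeping: CNAME-based binding, and the implicit binding used when retransmissions share the source stream's RTP session. The class `RelatedSsrcTracker` and its methods are hypothetical, and the sketch makes three assumptions. First, CNAMEs have already been learned from RTCP SDES. Second, "carries the retransmission payload" is a boolean supplied by the caller. Third, it follows this section's simplified rule of at most one unresolved retransmission binding per CNAME, rather than the normative text of [RFC4588].

```python
from __future__ import annotations


class RelatedSsrcTracker:
    """Hypothetical receiver-side bookkeeping for CNAME-based binding of related SSRCs."""

    def __init__(self) -> None:
        self.cname_of: dict[int, str] = {}         # SSRC -> CNAME learned from RTCP SDES
        self.rtx_to_original: dict[int, int] = {}  # retransmission SSRC -> original SSRC
        self.pending: dict[str, int] = {}          # CNAME -> original SSRC awaiting an RTX SSRC

    def on_sdes_cname(self, ssrc: int, cname: str) -> None:
        self.cname_of[ssrc] = cname

    def related_ssrcs(self, ssrc: int) -> set[int]:
        """All SSRCs sharing this SSRC's CNAME, i.e. bound to the same endpoint."""
        cname = self.cname_of.get(ssrc)
        if cname is None:
            return {ssrc}
        return {s for s, c in self.cname_of.items() if c == cname}

    def may_request_retransmission(self, original_ssrc: int) -> bool:
        """Allow a request only if the binding of the new RTX SSRC stays unambiguous."""
        if original_ssrc in self.rtx_to_original.values():
            return True  # the retransmission SSRC for this stream is already bound
        cname = self.cname_of[original_ssrc]
        waiting_for = self.pending.setdefault(cname, original_ssrc)
        return waiting_for == original_ssrc

    def on_new_ssrc(self, ssrc: int, cname: str, carries_rtx_payload: bool) -> int | None:
        """Bind a newly seen retransmission SSRC to the original SSRC it repairs."""
        self.cname_of[ssrc] = cname
        if not carries_rtx_payload:
            return None
        original = self.pending.pop(cname, None)
        if original is not None:
            self.rtx_to_original[ssrc] = original
        return original


t = RelatedSsrcTracker()
t.on_sdes_cname(0xA, "alice@example.org")  # e.g. Alice's audio SSRC
t.on_sdes_cname(0xB, "alice@example.org")  # e.g. Alice's video SSRC
t.on_sdes_cname(0xC, "bob@example.org")
print(sorted(t.related_ssrcs(0xA)))        # [10, 11]
print(t.may_request_retransmission(0xB))   # True
print(t.may_request_retransmission(0xA))   # False: a second unresolved binding for this CNAME
print(t.on_new_ssrc(0xD, "alice@example.org", carries_rtx_payload=True))  # 11 (0xB)
```

The `False` result shows the limitation: until the first retransmission SSRC appears, a second request from the same CNAME could not be told apart.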
   RTP Payload Format Restrictions [I-D.ietf-mmusic-rid] provides an RTP/RTCP-based mechanism to unambiguously identify the RTP streams within an RTP session. It also restricts the streams' payload format parameters in a codec-agnostic way, beyond what the regular payload types provide. The mapping is done by specifying an "a=rid" value in the SDP offer/answer signalling and carrying the corresponding RtpStreamId value as an SDES item and as an RTP header extension. The RID solution also includes a mechanism for binding redundancy RTP streams to their original source RTP streams, provided that those use RID identifiers.

   Section 8.3 of the RTP Specification [RFC3550] recommends using a single SSRC space across all RTP sessions for layered coding. Based on experience so far, however, we recommend using a solution with explicit binding between the RTP streams that is agnostic to the SSRC values used. That way, solutions using multiple RTP streams in a single RTP session and in multiple RTP sessions will use the same type of binding.

3.4.4.  Forward Error Correction

   There exist a number of Forward Error Correction (FEC) based schemes for reducing the packet loss experienced by the original streams. Most of the FEC schemes protect a single source flow. The protection is achieved by transmitting a certain amount of redundant information, encoded so that it can repair one or more packet losses over the set of packets it protects. This sequence of redundant information needs to be transmitted as its own media stream or, in some cases, instead of the original media stream. Thus, many of these schemes create a need for binding related flows as discussed above. Looking at the history of these schemes, there are schemes using multiple SSRCs, schemes using multiple RTP sessions, and some schemes that support both modes of operation.
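The sketch below shows how redundant information can repair a loss. It uses the simplest such code: a single XOR parity over a group of packets, which can repair exactly one loss in that group. It is loosely modelled on the XOR parity of Generic FEC [RFC5109], including the idea of recovering the payload length, but it is not that payload format. The function names are hypothetical, and RTP headers are ignored. In a deployment, the parity would be carried as its own RTP stream (a separate SSRC or RTP session), which is why it has to be bound to the source stream it protects.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, zero-padding the shorter one to the longer length."""
    n = max(len(a), len(b))
    return bytes(x ^ y for x, y in zip(a.ljust(n, b"\0"), b.ljust(n, b"\0")))


def with_length(payload: bytes) -> bytes:
    """Prefix a 16-bit length so a recovered payload can be trimmed to its true size."""
    return len(payload).to_bytes(2, "big") + payload


def make_parity(payloads: list) -> bytes:
    """Redundant information protecting one group of source packets."""
    return reduce(xor_bytes, (with_length(p) for p in payloads))


def recover(received: dict, parity: bytes, protected: list) -> dict:
    """Repair at most one lost packet among the protected sequence numbers."""
    missing = [seq for seq in protected if seq not in received]
    if len(missing) != 1:
        return dict(received)  # nothing lost, or more losses than one parity can repair
    acc = parity
    for seq in protected:
        if seq in received:
            acc = xor_bytes(acc, with_length(received[seq]))
    length = int.from_bytes(acc[:2], "big")
    repaired = dict(received)
    repaired[missing[0]] = acc[2:2 + length]
    return repaired


# Source packets keyed by RTP sequence number; the parity travels as a separate stream.
source = {10: b"hello", 11: b"rtp", 12: b"multiplexing"}
parity = make_parity([source[s] for s in (10, 11, 12)])
print(recover({10: b"hello", 12: b"multiplexing"}, parity, [10, 11, 12])[11])  # b'rtp'
```

Spending more redundancy, for example several parity packets over overlapping groups, buys the ability to repair more than one loss. That is the trade-off a receiver makes when choosing among FEC streams of different strength.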
   Using multiple RTP sessions supports the case where some set of receivers might not be able to utilise the FEC information. If the FEC is placed in a separate RTP session, and RTP sessions are separated at the transport level, such receivers can easily ignore the FEC already at the transport level, without considering any RTP-layer information.

   In usages involving multicast, having the FEC information in its own multicast group allows for similar flexibility. This is especially useful when receivers see heterogeneous packet loss rates. Based on measurement of the packet loss it experiences, a receiver can decide to join a multicast group with suitable FEC repair capabilities.

4.  Considerations for RTP Multiplexing

4.1.  Interworking Considerations

   There are several different kinds of interworking, and this section discusses two: interworking directly between different applications, and interworking of applications through an RTP Translator. The discussion includes the implications of potentially different RTP multiplexing point choices and limitations that have to be considered when working with some legacy applications.

4.1.1.  Application Interworking

   It is not uncommon for applications or services with similar but not identical usage, especially those intended for interactive communication, to encounter a situation where one wants to interconnect two or more of them.

   In these cases, a gateway might be used to interconnect the applications. This gateway must then either change the multiplexing structure or adhere to the respective limitations in each application.
   There are two fundamental approaches to building a gateway. The first is RTP Translator interworking (RTP bridging), where the gateway acts as an RTP Translator with the two interconnected applications being members of the same RTP session. The second is Gateway Interworking with RTP termination, where there are independent RTP sessions between each interconnected application and the gateway.

4.1.2.  RTP Translator Interworking

   From an RTP perspective, the RTP Translator approach could work if all the applications use the same codecs with the same payload types, have made the same multiplexing choices, have the same capabilities regarding the number of simultaneous RTP streams, and support the same set of RTP/RTCP extensions. Unfortunately, this might not always be true.

   When a gateway is implemented via an RTP Translator, an important consideration is whether the two applications being interconnected need to use the same approach to multiplexing. Suppose one side is using RTP session multiplexing and the other is using SSRC multiplexing with BUNDLE [I-D.ietf-mmusic-sdp-bundle-negotiation]. The RTP translator can then map the RTP streams between both sides using some method, e.g., if the number and order of SDP "m=" lines are the same on both sides. There are also challenges with SSRC collision handling. Unless SSRC translation is applied on the RTP translator, there may be a collision on the SSRC multiplexing side that the RTP session multiplexing side will not be aware of.
   Furthermore, one of the applications may be capable of working in several modes (such as being able to use additional RTP streams in one RTP session or multiple RTP sessions at will) while the other one is not. Successful interconnection then depends on locking the more flexible application into the operating mode where interconnection can succeed. This holds even if none of the participants is using the less flexible application when the RTP sessions are being created.

4.1.3.  Gateway Interworking

   When RTP sessions are terminated at the gateway, there are certain tasks that the gateway has to carry out:

   o  Generating appropriate RTCP reports for all RTP streams (possibly based on incoming RTCP reports), originating from SSRCs controlled by the gateway.

   o  Handling SSRC collision resolution in each application's RTP sessions.

   o  Signalling, choosing, and policing appropriate bit-rates for each session.

   For applications that use any security mechanism, e.g., in the form of SRTP, the gateway needs to be able to decrypt the incoming packets and verify their source integrity. It must then re-encrypt, integrity-protect, and sign the packets as a peer in the other application's security context. This is necessary even if all that is needed is a simple remapping of SSRC numbers. The gateway consequently also needs to be a member of the security contexts of both sides.

   The gateway might also need to apply transcoding (for incompatible codec types), media-level adaptations that cannot be solved through media negotiation (such as rescaling for incompatible video size requirements), suppression of content that is known not to be handled in the destination application, or the addition or removal of redundancy coding or scalability layers to fit the needs of the destination domain.
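Two of the gateway tasks listed above, SSRC remapping and SSRC collision handling, can be sketched for one direction of an RTP-terminating gateway. The class `GatewaySsrcMap` and its methods are hypothetical. The sketch handles only unencrypted packets, whose SSRC sits at bytes 8-11 of the fixed header. Under SRTP, the same rewrite would additionally require decryption with the old SSRC's security context and re-protection with the new one (Section 4.3.3). The caller is left two jobs: generating RTCP reports, and sending the RTCP BYE that Section 8.2 of [RFC3550] requires when a source abandons a colliding SSRC. The random source is injectable only so that the usage below is deterministic.

```python
from __future__ import annotations

import secrets
import struct
from typing import Callable


class GatewaySsrcMap:
    """Hypothetical SSRC remapping from side A to SSRCs the gateway originates on side B."""

    def __init__(self, randbits: Callable[[int], int] = secrets.randbits) -> None:
        self._randbits = randbits
        self.a_to_b: dict[int, int] = {}  # side-A SSRC -> SSRC the gateway uses on side B
        self.seen_on_b: set[int] = set()  # SSRCs used by other participants on side B

    def _fresh(self) -> int:
        """Draw random 32-bit SSRCs until one is unused on side B."""
        taken = self.seen_on_b | set(self.a_to_b.values())
        while True:
            candidate = self._randbits(32)
            if candidate not in taken:
                return candidate

    def outgoing_ssrc(self, ssrc_a: int) -> int:
        if ssrc_a not in self.a_to_b:
            self.a_to_b[ssrc_a] = self._fresh()
        return self.a_to_b[ssrc_a]

    def on_side_b_ssrc(self, ssrc_b: int) -> tuple[int, int] | None:
        """Record a side-B SSRC; on collision with one we originate, pick a new one.

        Returns (old, new) so the caller can send an RTCP BYE for the old SSRC."""
        self.seen_on_b.add(ssrc_b)
        for ssrc_a, ours in self.a_to_b.items():
            if ours == ssrc_b:
                new = self._fresh()
                self.a_to_b[ssrc_a] = new
                return ours, new
        return None

    def rewrite(self, packet: bytes) -> bytes:
        """Replace the SSRC field (bytes 8-11) of an unencrypted RTP packet."""
        (ssrc_a,) = struct.unpack_from("!I", packet, 8)
        return packet[:8] + struct.pack("!I", self.outgoing_ssrc(ssrc_a)) + packet[12:]


values = iter([100, 200, 300])
gw = GatewaySsrcMap(randbits=lambda bits: next(values))
gw.on_side_b_ssrc(100)            # a side-B participant already uses SSRC 100
print(gw.outgoing_ssrc(0xAAAA))   # 200 (100 is taken)
print(gw.on_side_b_ssrc(200))     # (200, 300): collision, so BYE 200 and switch to 300
pkt = struct.pack("!BBHII", 0x80, 0, 1, 160, 0xAAAA) + b"payload"
print(struct.unpack_from("!I", gw.rewrite(pkt), 8)[0])  # 300
```

Even this small amount of state is application-visible: a remote peer sees the SSRC change and must resynchronize that stream.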
   From the above, we can see that the gateway needs intimate knowledge of the application requirements; a gateway is by its nature application-specific, not a commodity product.

   These gateways might therefore hinder application evolution by blocking RTP and RTCP extensions that the applications have been extended with but that are unknown to the gateway.

   As can be seen from the above, the use of security functions such as SRTP incurs two costs. The first is additional risk, due to the requirement to have the gateway in the security association between the endpoints (unless the gateway is on the transport level). The second is additional complexity in the form of the decrypt-encrypt cycles needed for each forwarded packet. SRTP, due to its keying structure, also requires that each RTP session use different master keys. Using the same key in two RTP sessions can, for some ciphers, result in two-time pads that completely break the confidentiality of the packets.

4.1.4.  Multiple SSRC Legacy Considerations

   Historically, the most common RTP use cases have been point-to-point Voice over IP (VoIP) or streaming applications, commonly with no more than one media source per endpoint and media type (typically audio or video). Even in conferencing applications, especially voice-only ones, the conference focus or bridge has provided a single stream to each participant containing a mix of the other participants. It is also common to have individual RTP sessions between each endpoint and the RTP mixer, meaning that the mixer functions as an RTP-terminating gateway.

   Endpoints that have not been updated to handle multiple streams following these recommendations can have issues participating in RTP sessions that contain multiple SSRCs within a single session, as such endpoints need to:

   1.  Handle more than one stream simultaneously rather than replacing an already existing stream with a new one.

   2.  Be capable of decoding multiple streams simultaneously.

   3.  Be capable of rendering multiple streams simultaneously.

   Gateways attempting to interconnect with this class of devices therefore have to make sure of two things. Only one RTP stream of each media type may be delivered to the endpoint if it is expecting only one, and the multiplexing format must be what the device expects. It is highly unlikely that RTP translator-based interworking can be made to function successfully in such a context.

4.2.  Network Considerations

   The RTP implementer needs to consider that the RTP multiplexing choice also impacts network-level mechanisms.

4.2.1.  Quality of Service

   Quality of Service mechanisms are either flow-based or packet-marking-based. RSVP [RFC2205] is an example of a flow-based mechanism, while Diff-Serv [RFC2474] is an example of a packet-marking-based one.

   For a flow-based scheme, an additional SSRC will receive the same QoS as all other RTP streams that are part of the same 5-tuple (protocol, source address, destination address, source port, destination port), which is the most common selector for flow-based QoS.

   For a packet-marking-based scheme, the method of multiplexing will not affect the ability to use QoS. Different Differentiated Services Code Points (DSCPs) can be assigned to different packets within a flow as well as within an RTP stream. However, care must be taken regarding which forwarding behaviours are applied on path due to these DSCPs. In some cases, the forwarding behaviour can result in packet reordering. For more discussion of this, see [RFC7657].

   The method used for marking packets can affect how many RTP sessions to choose.
   If this marking is done using a network ingress function, that function can have difficulty discriminating between the different RTP streams. The network API on the endpoint also needs to be capable of setting the marking on a per-packet basis to provide the full functionality.

4.2.2.  NAT and Firewall Traversal

   In today's networks, there exist a large number of middleboxes. The ones that normally have the most impact on RTP are Network Address Translators (NAT) and Firewalls (FW).

   Below we analyse and comment on the impact of requiring more underlying transport flows in the presence of NATs and Firewalls:

   End-Point Port Consumption:  A given IP address only has 65536 available local ports per transport protocol, shared by all consumers of ports on the machine. This is normally never an issue for an end-user machine. It can become an issue for servers that handle a large number of simultaneous streams. However, if the application uses ICE to authenticate STUN requests, a server can serve multiple endpoints from the same local port. It can then use the whole 5-tuple (source and destination address, source and destination port, protocol) as the identifier of flows, after having securely bound them to the remote endpoint address using the STUN request. In theory, the minimum number of media server ports needed is the maximum number of simultaneous RTP sessions a single endpoint can use. In practice, implementations will probably benefit from using more server ports to simplify the implementation or avoid performance bottlenecks.

   NAT State:  If an endpoint sits behind a NAT, each flow it generates to an external address will result in a state that has to be kept in the NAT. That state is a limited resource. In home or Small Office/Home Office (SOHO) NATs, memory or processing are usually the most limited resources.
      For large-scale NATs serving many internal endpoints, available external ports are likely to be the scarce resource. Port limitation is primarily a problem for larger centralised NATs where endpoint-independent mapping requires each flow to use one port for the external IP address. This affects the maximum number of internal users per external IP address. However, as a comparison, a real-time video conference session with audio and video likely uses fewer than 10 UDP flows, compared to certain web applications that can use 100+ TCP flows to various servers from a single browser instance.

   NAT Traversal Extra Delay:  Performing the NAT/FW traversal takes a certain amount of time for each flow. This time falls in a fairly critical phase of communication, between accepting to communicate and the media path being established. The best-case additional NAT/FW traversal time after finding the first valid candidate pair following the specified ICE procedures is 1.5*RTT + Ta*(Additional_Flows-1), where Ta is the pacing timer. That assumes a message in one direction, immediately followed by a check back. The reason it is not more is that ICE first finds one candidate pair that works before attempting to establish multiple flows. Thus, there is no extra time until one has found a working candidate pair. Based on that working pair, the extra time is what is needed to establish the additional flows (in most cases two or three) in parallel. However, packet loss causes extra delays of at least 100 ms, which is the minimal retransmission timer for ICE.

   NAT Traversal Failure Rate:  Due to the need to establish more than a single flow through the NAT, there is some risk that establishing the first flow succeeds but that one or more of the additional flows fail.
      The risk that this happens is hard to quantify, but ought to be fairly low, as one flow from the same interfaces has just been successfully established. Thus, only rare events, such as NAT resource overload or the selection of particular port numbers that are filtered, ought to be reasons for failure.

   Deep Packet Inspection and Multiple Streams:  Firewalls differ in how deeply they inspect packets. There exists some risk that deeply inspecting firewalls will have legacy issues with multiple SSRCs similar to those of some RTP stack implementations.

   Using additional RTP streams in the same RTP session and transport flow does not introduce any additional NAT traversal complexities per RTP stream. This can be compared with the one or two additional transport flows normally needed per RTP session when using multiple RTP sessions. Additional lower-layer transport flows will be needed unless an explicit de-multiplexing layer is added between RTP and the transport protocol. At the time of writing, no such mechanism was defined.

4.2.3.  Multicast

   Multicast groups provide a powerful tool for a number of real-time applications, especially those that desire broadcast-like behaviour with one endpoint transmitting to a large number of receivers, as in IPTV. There is also an RTP/RTCP extension to better support Source-Specific Multicast (SSM) [RFC5760]. Many-to-many communication, which RTP [RFC3550] was originally built to support, has several limitations in common with multicast.

   One limitation is that, for any group, sender-side adaptation intended to suit all receivers would have to adapt to the most limited receiver. That is the receiver experiencing the worst conditions among the group participants, so all participants suffer the resulting degradation. For broadcast-type applications with a large number of receivers, this is not acceptable.
   Instead, various receiver-based solutions are employed to ensure that the receivers achieve the best possible performance. By using scalable encoding and placing each scalability layer in a different multicast group, the receiver can control the amount of traffic it receives. To place each scalability layer in a different multicast group, one RTP session per multicast group is used.

   In addition, the transport flow considerations in multicast are a bit different from unicast; NATs with port translation are not useful in the multicast environment, meaning that the entire port range of each multicast address is available for distinguishing between RTP sessions.

   Thus, for broadcast applications, it appears easiest and most straightforward to use multiple RTP sessions for sending the different media flows used for adapting to network conditions. It is also common that streams improving transport robustness are sent in their own multicast group to allow for interworking with legacy applications or to support different levels of protection.

   Many-to-many applications have different needs, and the most appropriate multiplexing choice will depend on how the actual application is realized. Multicast applications that are capable of using sender-side congestion control can avoid the multiple multicast sessions and RTP sessions that result from the use of receiver-side congestion control.

   A broadcast application using RTP multicast has the following properties:

   1.  It uses a group of RTP sessions, not just one. Each endpoint will need to be a member of a number of RTP sessions in order to perform well.

   2.  Within each RTP session, the number of RTP receivers is likely to be much larger than the number of RTP senders.

   3.  The applications need signalling functions to identify the relationships between RTP sessions.

   4.  The applications need signalling or RTP/RTCP functions to identify the relationships between SSRCs in different RTP sessions when needs beyond CNAME exist.

   Both broadcast and many-to-many multicast applications share a signalling requirement: all of the participants need the same RTP and payload type configuration. Otherwise, participant A could, for example, be using payload type 97 for the H.264 video codec while participant B thinks it is MPEG-2. SDP offer/answer [RFC3264] is not appropriate for ensuring this property in a broadcast/multicast context. The signalling aspects of broadcast/multicast are not explored further in this memo.

   Security solutions for this type of group communication are also challenging. First, the key-management and the security protocol need to support group communication. Second, source authentication requires special solutions. For more discussion on this, please review Options for Securing RTP Sessions [RFC7201].

4.3.  Security and Key Management Considerations

   When dealing with point-to-point, 2-member RTP sessions only, there are few security issues that are relevant to the choice of having one RTP session or multiple RTP sessions. However, there are a few aspects of multiparty sessions that might warrant consideration. For general information on possible methods of securing RTP, please review RTP Security Options [RFC7201].

4.3.1.  Security Context Scope

   When using SRTP [RFC3711], the scope of the security context is important and can be a necessary point of differentiation in some applications. As SRTP's crypto suites are (so far) built around symmetric keys, the receiver will need to have the same key as the sender. As a result, no one in a multi-party session can be certain that a received packet really was sent by the claimed sender and not by another party having access to the key.
   The only SRTP algorithm that does not have this property is TESLA source authentication [RFC4383]. However, TESLA adds delay to achieve source authentication. In most cases, symmetric ciphers provide sufficient security properties, but they create issues in a few cases.

   The first case is when someone leaves a multi-party session and one wants to ensure that the party that left can no longer access the RTP streams. This requires everyone to re-key without disclosing the new keys to the excluded party.

   A second case is when security is used as an enforcement mechanism for differentiating stream access between receivers. Take, for example, a scalable layer or a high-quality simulcast version that only premium users are allowed to access. The stream can be encrypted with a key that a user cannot obtain without paying a premium, with the key-management limiting access to that key. This prevents other receivers from getting the high-quality stream.

   SRTP [RFC3711] has no special functions for dealing with different sets of master keys for different SSRCs. The key-management functions have different capabilities to establish different sets of keys, normally on a per-endpoint basis. For example, DTLS-SRTP [RFC5764] and Security Descriptions [RFC4568] establish different keys for outgoing and incoming traffic from an endpoint. This key usage has to be written into the cryptographic context, possibly associated with different SSRCs.

4.3.2.  Key Management for Multi-party Sessions

   The capabilities of the key-management combined with the RTP multiplexing choices affect the resulting security properties, the control over the secured media, and who has access to it.

   Multi-party sessions contain at least one RTP stream from each active participant.
   Depending on the multi-party topology [RFC7667], each participant can both send and receive multiple RTP streams. Transport-translator-based sessions and multicast sessions can use neither Security Descriptions [RFC4568] nor DTLS-SRTP [RFC5764] without an extension, as each endpoint provides its own set of keys. In centralised conferences, the signalling counterpart is a conference server, and the transport translator is the media-plane unicast counterpart (to which DTLS messages would be sent). Thus, an extension such as Encrypted Key Transport [I-D.ietf-perc-srtp-ekt-diet], or a solution based on MIKEY [RFC3830], is needed that allows keying all session participants with the same master key.

   Privacy Enhanced RTP Conferencing (PERC) also enables a different trust model with semi-trusted media-switching RTP middleboxes [I-D.ietf-perc-private-media-framework].

4.3.3.  Complexity Implications

   The use of security functions can expose complexity implications of the choice of multiplexing and topology. This becomes especially evident in RTP topologies having any type of middlebox that processes or modifies RTP/RTCP packets. There is very little overhead for an RTP translator or mixer to rewrite an SSRC value in the RTP packet of an unencrypted session, but the cost is higher when using cryptographic security functions. For example, if using SRTP [RFC3711], the actual security context and exact crypto key are determined by the SSRC field value. If the SSRC is changed, the encryption and authentication must use another key. Thus, changing the SSRC value implies a decryption using the old SSRC and its security context, followed by an encryption using the new one.

5.  RTP Multiplexing Design Choices

   This section discusses how some RTP multiplexing design choices can be used in applications to achieve certain goals, and summarizes the implications of such choices. For each design, the benefits and downsides are discussed.

5.1.  Multiple Media Types in One Session

   This design uses a single RTP session for multiple different media types, like audio and video, and possibly also transport robustness mechanisms like FEC or retransmission. An endpoint can send zero, one, or more media sources per media type, resulting in a number of RTP streams of various media types for both source and redundancy streams.

   The Advantages:

   1.  Only a single RTP session is used, which implies:

       *  Minimal need to keep NAT/FW state.

       *  Minimal NAT/FW-traversal cost.

       *  Fate-sharing for all media flows.

       *  Minimal overhead for security association establishment.

   2.  Dynamic allocation of RTP streams can be handled almost entirely at the RTP level. How much of this can be kept local to the RTP level depends on the application's need for explicit indication of the stream usage and on how timely that indication can be signalled.

   The Disadvantages:

   a.  It is less suitable for interworking with other applications that use individual RTP sessions per media type or multiple sessions for a single media type, due to the risk of SSRC collision and the resulting potential need for SSRC translation.

   b.  Negotiation of individual bandwidths for the different media types is currently only possible in SDP when using RID [I-D.ietf-mmusic-rid].

   c.  It is not suitable for Split Component Terminal endpoints (see Section 3.10 of [RFC7667]).

   d.  Flow-based QoS cannot be used to provide separate treatment of some RTP streams compared to others in the single RTP session.

   e.  If there is significant asymmetry between the RTP streams' RTCP reporting needs, there are some challenges in configuration and usage to avoid wasting RTCP reporting on the RTP streams that do not need such frequent reporting.

   f.  It is not suitable for applications where some receivers want to receive only a subset of the RTP streams, especially if multicast or a transport translator is being used.

   g.  There is some additional concern with legacy implementations that do not support the RTP specification fully when it comes to handling multiple SSRCs per endpoint, as multiple simultaneous media types are sent as separate SSRCs in the same RTP session.

   h.  If the applications need finer control over which session participants are included in different sets of security associations, most key-management mechanisms will have difficulty establishing such a session.

5.2.  Multiple SSRCs of the Same Media Type

   In this design, each RTP session serves only a single media type. The RTP session can contain multiple RTP streams, either from a single endpoint or from multiple endpoints. This commonly creates a low number of RTP sessions, typically only one for audio and one for video, with a corresponding need for two listening ports when using RTP/RTCP multiplexing [RFC5761].

   The Advantages:

   1.  It works well with Split Component Terminal (see Section 3.10 of [RFC7667]) where the split is per media type.

   2.  It enables flow-based QoS with different prioritisation between media types.

   3.  Some applications use RTP streams dynamically, i.e., streams are frequently added and removed. For these, associating much of the state with the RTP session rather than with each individual SSRC can avoid the need for in-session signalling of meta-information about each SSRC.
       In the simple cases, this allows for unsignalled RTP streams
       where session-level information and RTCP SDES items (e.g.,
       CNAME) are sufficient.  In the more complex cases, where more
       source-specific metadata needs to be signalled, the SSRC can be
       associated with an intermediate identifier, e.g., the MID
       conveyed as an SDES item as defined in Section 15 of
       [I-D.ietf-mmusic-sdp-bundle-negotiation].

   4.  There is low overhead for security association establishment.

   The Disadvantages:

   a.  Slightly more RTP sessions are needed compared to Multiple Media
       Types in One Session (Section 5.1).  This implies:

       *  More NAT/FW state is needed.

       *  There is increased NAT/FW-traversal cost in both processing
          and delay.

   b.  There is some potential for concern with legacy implementations
       that do not fully support the RTP specification when it comes to
       handling multiple SSRCs per endpoint.

   c.  It is not possible to control security associations for sets of
       RTP streams within the same media type with today's
       key-management mechanisms, unless these are split into different
       RTP sessions (Section 5.3).

   For RTP applications where all RTP streams of the same media type
   share the same usage, this structure provides efficiency gains in
   the amount of network state used and provides more fate-sharing with
   other media flows of the same type.  At the same time, it retains
   almost all of the functionality for negotiating properties per
   individual media type, and also enables flow-based QoS
   prioritisation between media types.  It handles multi-party sessions
   well, independently of whether multicast or centralised transport
   distribution is used, as additional sources can dynamically enter
   and leave the session.

5.3.  Multiple Sessions for One Media Type

   This design goes one step further than the design in Section 5.2 by
   using multiple RTP sessions also for a single media type.  The main
   reason for going in this direction is that the RTP application needs
   separation of the RTP streams due to their usage, such as
   scalability over multicast, simulcast, a need for extended QoS
   prioritisation, or a need for fine-grained signalling using RTP
   session-focused signalling tools.

   The Advantages:

   1.  This is more suitable for multicast usage, where receivers can
       individually select which RTP sessions they want to participate
       in, assuming each RTP session has its own multicast group.

   2.  The application can indicate its usage of the RTP streams at the
       RTP session level, in case multiple different usages exist.

   3.  There is less need for SSRC-specific explicit signalling for
       each media stream, and thus a reduced need for explicit and
       timely signalling when RTP streams are added or removed.

   4.  It enables detailed QoS prioritisation for flow-based mechanisms.

   5.  It works well with a Split Component Terminal (see Section 3.10
       of [RFC7667]).

   6.  The scope of who is included in a security association can be
       structured around the different RTP sessions, thus enabling such
       functionality with existing key-management.

   The Disadvantages:

   a.  There is an increased amount of session configuration state
       compared to Multiple SSRCs of the Same Media Type, due to the
       increased number of RTP sessions.

   b.  For RTP streams that are part of scalability, simulcast, or
       transport robustness, a method to bind sources across multiple
       RTP sessions is needed.

   c.  There is some potential for concern with legacy implementations
       that do not fully support the RTP specification when it comes to
       handling multiple SSRCs per endpoint.

   d.  There is higher overhead for security association establishment,
       due to the increased number of RTP sessions.

   e.  If the application needs more fine-grained control than per RTP
       session over which participants are included in different sets
       of security associations, most of today's key-management
       mechanisms will have difficulty establishing such a session.

   For more complex RTP applications that have several different usages
   for RTP streams of the same media type, or that use scalability or
   simulcast, this solution can enable those functions at the cost of
   increased overhead associated with the additional sessions.  This
   type of structure is suitable for more advanced applications as well
   as multicast-based applications requiring differentiation between
   participants.

5.4.  Single SSRC per Endpoint

   In this design, each endpoint in a point-to-point session has only a
   single SSRC; thus, the RTP session contains only two SSRCs, one
   local and one remote.  This session can be used either
   unidirectionally, i.e., with only a single RTP stream, or
   bidirectionally, i.e., with both endpoints sending one RTP stream
   each.  If the application needs additional media flows between the
   endpoints, it will have to establish additional RTP sessions.

   The Advantages:

   1.  This design has great legacy interoperability potential, as it
       will not tax any RTP stack implementations.

   2.  The signalling has good possibilities to negotiate and describe
       the exact formats and bitrates for each RTP stream, especially
       using today's tools in SDP.

   3.  It is possible to control the security association per RTP
       stream with current key-management, since each RTP stream is
       directly related to an RTP session, and the most commonly used
       keying mechanisms operate on a per-session basis.

   The Disadvantages:

   a.  The amount of NAT/FW state grows linearly with the number of RTP
       streams.
   b.  There is increased delay and resource consumption from NAT/FW
       traversal.

   c.  There are likely to be larger signalling messages and higher
       signalling processing requirements due to the increased amount
       of session-related information.

   d.  There is a higher potential for a single RTP stream to fail
       during transport between the endpoints, due to the need for
       separate NAT/FW traversal for every RTP stream, since there is
       only one stream per session.

   e.  The amount of explicit state for relating RTP streams grows,
       depending on how the application relates RTP streams.

   f.  The port consumption might become a problem for centralised
       services, where the central node's port or 5-tuple filter
       consumption grows rapidly with the number of sessions.

   g.  For applications where the RTP stream usage is highly dynamic,
       i.e., streams frequently enter and leave, the amount of
       signalling can become high.  Issues can also arise from the need
       for timely establishment of additional RTP sessions.

   h.  If, against the recommendation, the same SSRC value is reused in
       multiple RTP sessions rather than being randomly chosen,
       interworking with applications that use a different multiplexing
       structure will require SSRC translation.

   RTP applications with a strong need to interwork with legacy RTP
   applications can potentially benefit from this structure.  However,
   a large number of media descriptions in SDP can also run into issues
   with existing implementations.  For any application needing a larger
   number of media flows, the overhead can become very significant.
   This structure is also not suitable for non-mixed multi-party
   sessions, as any given RTP stream from each participant, although
   having the same usage in the application, needs its own RTP session.
   In addition, the dynamic behaviour that can arise in multi-party
   applications can tax the signalling system and make timely media
   establishment more difficult.

5.5.  Summary

   Both the "Single SSRC per Endpoint" and the "Multiple Media Types in
   One Session" designs require full explicit signalling of the media
   stream relations.  However, they operate on two different levels:
   the first primarily enables session-level binding, while the second
   needs SSRC-level binding.  From another perspective, the two
   solutions are the two extreme points when it comes to the number of
   RTP sessions needed.

   The two other designs, "Multiple SSRCs of the Same Media Type" and
   "Multiple Sessions for One Media Type", are two examples that
   primarily allow for some implicit mapping of the role or usage of
   the RTP streams based on which RTP session they appear in.  They
   thus potentially allow for less signalling, and in particular reduce
   the need for real-time signalling in sessions with a dynamically
   changing number of RTP streams.  They also represent points in
   between the first two designs when it comes to the number of RTP
   sessions established, i.e., they represent an attempt to balance the
   number of RTP sessions with the functionality the communication
   session provides, both on the network level and on the signalling
   level.

6.  Guidelines

   This section contains a number of multi-stream guidelines for
   implementers or specification writers.

   Do not require use of the same SSRC value across RTP sessions:
      As discussed in Section 3.4.3, there are drawbacks in using the
      same SSRC in multiple RTP sessions as a mechanism to bind related
      RTP streams together.  It is instead recommended to use a
      mechanism to explicitly signal the relation, either in RTP/RTCP or
      in the signalling mechanism used to establish the RTP session(s).
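      The following Python code is a minimal, non-normative sketch of
      this guideline.  It keys each RTP stream by the pair (RTP session,
      SSRC), because an SSRC value is only unique within its own RTP
      session.  It relates streams solely through an explicitly
      signalled label, such as one learned from SDP grouping or an SDES
      item.  The StreamBinder class, its methods, and the session names
      are hypothetical illustrations, not an API defined by any
      specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# A stream is identified by (RTP session, SSRC): an SSRC value is only
# unique within its own RTP session, so it is never used alone as a key.
StreamKey = Tuple[str, int]


@dataclass
class StreamBinder:
    """Hypothetical helper (not defined by any RTP specification) that
    relates RTP streams via an explicitly signalled label."""

    groups: Dict[str, List[StreamKey]] = field(default_factory=dict)
    label_of: Dict[StreamKey, str] = field(default_factory=dict)

    def bind(self, label: str, session: str, ssrc: int) -> None:
        """Record that signalling placed stream (session, ssrc) in `label`."""
        key = (session, ssrc)
        previous = self.label_of.get(key)
        if previous is not None and previous != label:
            # Re-signalled into another group: drop the stale membership.
            self.groups[previous].remove(key)
        self.label_of[key] = label
        members = self.groups.setdefault(label, [])
        if key not in members:
            members.append(key)

    def related(self, session: str, ssrc: int) -> List[StreamKey]:
        """Streams sharing a label with (session, ssrc); equal SSRC
        values in different sessions play no role."""
        key = (session, ssrc)
        label = self.label_of.get(key)
        if label is None:
            return []
        return [k for k in self.groups[label] if k != key]


if __name__ == "__main__":
    binder = StreamBinder()
    # Base and enhancement layers of one camera, carried in two RTP
    # sessions with independently (randomly) chosen SSRCs.
    binder.bind("camera-1", "video-base", 0x1234ABCD)
    binder.bind("camera-1", "video-enh", 0x9F00E001)
    print(binder.related("video-base", 0x1234ABCD))
    # Same SSRC value in another session, never bound: not related.
    print(binder.related("audio", 0x1234ABCD))
```

      With this structure, a stream in another RTP session that happens
      to use the same SSRC value is not treated as related unless
      signalling binds it explicitly.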
   Use additional RTP streams for additional media sources:  In the
      cases where an RTP endpoint needs to transmit additional RTP
      streams of the same media type in the application, with the same
      processing requirements at the network and RTP layers, it is
      suggested to send them in the same RTP session.  For example, in
      a telepresence room with three cameras, each capturing two persons
      sitting at the table, it is suggested to send each camera's video
      as its own RTP stream within a single RTP session.

   Use additional RTP sessions for streams with different requirements:
      When RTP streams have different processing requirements from the
      network or the RTP layer at the endpoints, it is suggested that
      the different types of streams are put in different RTP sessions.
      This includes the case where different participants want different
      subsets of the set of RTP streams.

   When using multiple RTP sessions, use grouping:  When using multiple
      RTP session solutions, it is suggested to explicitly group the
      involved RTP sessions when needed using a signalling mechanism,
      for example the Session Description Protocol (SDP) Grouping
      Framework [RFC5888], using some appropriate grouping semantics.

   RTP/RTCP Extensions Support Multiple RTP Streams as Well as Multiple
   RTP Sessions:
      When defining an RTP or RTCP extension, the creator needs to
      consider whether this extension is applicable to use with
      additional SSRCs and multiple RTP sessions.  Any extension
      intended to be generic must support both.  Extensions that are
      not as generally applicable will have to consider whether
      interoperability is better served by defining a single solution or
      providing both options.
   Transport Support Extensions:  When defining new RTP/RTCP extensions
      intended for transport support, like the retransmission or FEC
      mechanisms, they must include support for both multiple RTP
      streams in the same RTP session and multiple RTP sessions, such
      that application developers can choose freely from the set of
      mechanisms without concerning themselves with which of the
      multiplexing choices a particular solution supports.

7.  IANA Considerations

   This document makes no request of IANA.

   Note to RFC Editor: this section can be removed on publication as an
   RFC.

8.  Security Considerations

   The security considerations of the RTP specification [RFC3550], any
   applicable RTP profile [RFC3551], [RFC4585], [RFC3711], and the
   extensions for sending multiple media types in a single RTP session
   [I-D.ietf-avtcore-multi-media-rtp-session], RID
   [I-D.ietf-mmusic-rid], BUNDLE
   [I-D.ietf-mmusic-sdp-bundle-negotiation], [RFC5760], and [RFC5761]
   apply if selected, and thus need to be considered in the evaluation.

   Section 4.3 discusses the security implications of choosing multiple
   SSRCs versus multiple RTP sessions.

9.  Contributors

   Hui Zheng (Marvin) contributed to WG draft versions -04 and -05 of
   the document.

10.  Acknowledgments

   The authors would like to acknowledge and thank Cullen Jennings, Dale
   R. Worley, Huang Yihong (Rachel), and Vijay Gurbani for review and
   comments.

11.  References

11.1.  Normative References

   [I-D.ietf-avtcore-multi-media-rtp-session]
              Westerlund, M., Perkins, C., and J. Lennox, "Sending
              Multiple Types of Media in a Single RTP Session",
              draft-ietf-avtcore-multi-media-rtp-session-13 (work in
              progress), December 2015.

   [I-D.ietf-mmusic-rid]
              Roach, A., "RTP Payload Format Restrictions",
              draft-ietf-mmusic-rid-15 (work in progress), May 2018.
   [I-D.ietf-mmusic-sdp-bundle-negotiation]
              Holmberg, C., Alvestrand, H., and C. Jennings,
              "Negotiating Media Multiplexing Using the Session
              Description Protocol (SDP)",
              draft-ietf-mmusic-sdp-bundle-negotiation-54 (work in
              progress), December 2018.

   [I-D.ietf-perc-srtp-ekt-diet]
              Jennings, C., Mattsson, J., McGrew, D., Wing, D., and F.
              Andreasen, "Encrypted Key Transport for DTLS and Secure
              RTP", draft-ietf-perc-srtp-ekt-diet-11 (work in progress),
              January 2020.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
              July 2003.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC 3551,
              DOI 10.17487/RFC3551, July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, DOI 10.17487/RFC3711, March 2004.

   [RFC3830]  Arkko, J., Carrara, E., Lindholm, F., Naslund, M., and K.
              Norrman, "MIKEY: Multimedia Internet KEYing", RFC 3830,
              DOI 10.17487/RFC3830, August 2004.

   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
              "Extended RTP Profile for Real-time Transport Control
              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
              DOI 10.17487/RFC4585, July 2006.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009.

   [RFC5760]  Ott, J., Chesterfield, J., and E. Schooler, "RTP Control
              Protocol (RTCP) Extensions for Single-Source Multicast
              Sessions with Unicast Feedback", RFC 5760,
              DOI 10.17487/RFC5760, February 2010.

   [RFC5761]  Perkins, C. and M.
              Westerlund, "Multiplexing RTP Data and Control Packets on
              a Single Port", RFC 5761, DOI 10.17487/RFC5761, April
              2010.

   [RFC7656]  Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
              B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms
              for Real-Time Transport Protocol (RTP) Sources", RFC 7656,
              DOI 10.17487/RFC7656, November 2015.

   [RFC7667]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
              DOI 10.17487/RFC7667, November 2015.

11.2.  Informative References

   [I-D.ietf-avtext-rid]
              Roach, A., Nandakumar, S., and P. Thatcher, "RTP Stream
              Identifier Source Description (SDES)",
              draft-ietf-avtext-rid-09 (work in progress), October 2016.

   [I-D.ietf-perc-private-media-framework]
              Jones, P., Benham, D., and C. Groves, "A Solution
              Framework for Private Media in Privacy Enhanced RTP
              Conferencing (PERC)",
              draft-ietf-perc-private-media-framework-12 (work in
              progress), June 2019.

   [JINGLE]   Ludwig, S., Beda, J., Saint-Andre, P., McQueen, R., Egan,
              S., and J. Hildebrand, "XEP-0166: Jingle", XMPP.org
              https://xmpp.org/extensions/xep-0166.html, September 2018.

   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
              Handley, M., Bolot, J., Vega-Garcia, A., and S.
              Fosse-Parisis, "RTP Payload for Redundant Audio Data",
              RFC 2198, DOI 10.17487/RFC2198, September 1997.

   [RFC2205]  Braden, R., Ed., Zhang, L., Berson, S., Herzog, S., and S.
              Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
              Functional Specification", RFC 2205, DOI 10.17487/RFC2205,
              September 1997.

   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
              "Definition of the Differentiated Services Field (DS
              Field) in the IPv4 and IPv6 Headers", RFC 2474,
              DOI 10.17487/RFC2474, December 1998.

   [RFC2974]  Handley, M., Perkins, C., and E.
              Whelan, "Session Announcement Protocol", RFC 2974,
              DOI 10.17487/RFC2974, October 2000.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              DOI 10.17487/RFC3261, June 2002.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              DOI 10.17487/RFC3264, June 2002.

   [RFC3389]  Zopf, R., "Real-time Transport Protocol (RTP) Payload for
              Comfort Noise (CN)", RFC 3389, DOI 10.17487/RFC3389,
              September 2002.

   [RFC4103]  Hellstrom, G. and P. Jones, "RTP Payload for Text
              Conversation", RFC 4103, DOI 10.17487/RFC4103, June 2005.

   [RFC4383]  Baugher, M. and E. Carrara, "The Use of Timed Efficient
              Stream Loss-Tolerant Authentication (TESLA) in the Secure
              Real-time Transport Protocol (SRTP)", RFC 4383,
              DOI 10.17487/RFC4383, February 2006.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, DOI 10.17487/RFC4566,
              July 2006.

   [RFC4568]  Andreasen, F., Baugher, M., and D. Wing, "Session
              Description Protocol (SDP) Security Descriptions for Media
              Streams", RFC 4568, DOI 10.17487/RFC4568, July 2006.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
              DOI 10.17487/RFC4588, July 2006.

   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
              "Codec Control Messages in the RTP Audio-Visual Profile
              with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
              February 2008.

   [RFC5109]  Li, A., Ed., "RTP Payload Format for Generic Forward Error
              Correction", RFC 5109, DOI 10.17487/RFC5109, December
              2007.

   [RFC5389]  Rosenberg, J., Mahy, R., Matthews, P., and D.
              Wing, "Session Traversal Utilities for NAT (STUN)",
              RFC 5389, DOI 10.17487/RFC5389, October 2008.

   [RFC5764]  McGrew, D. and E. Rescorla, "Datagram Transport Layer
              Security (DTLS) Extension to Establish Keys for the Secure
              Real-time Transport Protocol (SRTP)", RFC 5764,
              DOI 10.17487/RFC5764, May 2010.

   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
              Protocol (SDP) Grouping Framework", RFC 5888,
              DOI 10.17487/RFC5888, June 2010.

   [RFC6465]  Ivov, E., Ed., Marocco, E., Ed., and J. Lennox, "A
              Real-time Transport Protocol (RTP) Header Extension for
              Mixer-to-Client Audio Level Indication", RFC 6465,
              DOI 10.17487/RFC6465, December 2011.

   [RFC7201]  Westerlund, M. and C. Perkins, "Options for Securing RTP
              Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014.

   [RFC7657]  Black, D., Ed. and P. Jones, "Differentiated Services
              (Diffserv) and Real-Time Communication", RFC 7657,
              DOI 10.17487/RFC7657, November 2015.

   [RFC7826]  Schulzrinne, H., Rao, A., Lanphier, R., Westerlund, M.,
              and M. Stiemerling, Ed., "Real-Time Streaming Protocol
              Version 2.0", RFC 7826, DOI 10.17487/RFC7826, December
              2016.

   [RFC7983]  Petit-Huguenin, M. and G. Salgueiro, "Multiplexing Scheme
              Updates for Secure Real-time Transport Protocol (SRTP)
              Extension for Datagram Transport Layer Security (DTLS)",
              RFC 7983, DOI 10.17487/RFC7983, September 2016.

   [RFC8088]  Westerlund, M., "How to Write an RTP Payload Format",
              RFC 8088, DOI 10.17487/RFC8088, May 2017.

   [RFC8108]  Lennox, J., Westerlund, M., Wu, Q., and C. Perkins,
              "Sending Multiple RTP Streams in a Single RTP Session",
              RFC 8108, DOI 10.17487/RFC8108, March 2017.

   [RFC8445]  Keranen, A., Holmberg, C., and J.
              Rosenberg, "Interactive Connectivity Establishment (ICE):
              A Protocol for Network Address Translator (NAT)
              Traversal", RFC 8445, DOI 10.17487/RFC8445, July 2018.

Appendix A.  Dismissing Payload Type Multiplexing

   This section documents a number of reasons why using the payload
   type as a multiplexing point is unsuitable for most issues related
   to multiple RTP streams.  Attempting to use payload type
   multiplexing beyond its defined usage has well-known negative
   effects on RTP, discussed below.  Using the payload type as the
   single discriminator for multiple streams implies that all the
   different RTP streams are sent with the same SSRC, thus using the
   same timestamp and sequence number space.  This has many effects:

   1.   It puts constraints on the RTP timestamp rate for the
        multiplexed media.  For example, RTP streams that use different
        RTP timestamp rates cannot be combined, as the timestamp values
        need to be consistent across all multiplexed media frames.
        Thus, streams are forced to use the same RTP timestamp rate.
        When this is not possible, payload type multiplexing cannot be
        used.

   2.   Many RTP payload formats can fragment a media object over
        multiple RTP packets, like parts of a video frame.  These
        payload formats need to determine the order of the fragments to
        correctly decode them.  Thus, it is important to ensure that all
        fragments related to a frame or a similar media object are
        transmitted in sequence and without interruptions within the
        object.  This can be solved relatively simply on the sender side
        by ensuring that the fragments of each RTP stream are sent in
        sequence.

   3.   Some media formats require an uninterrupted sequence number
        space between media parts.  These are media formats where any
        missing RTP sequence number will result in decoding failure or
        invoking a repair mechanism within a single media context.
        The text/T140 payload format [RFC4103] is an example of such a
        format.

        These formats will need a sequence numbering abstraction
        function between RTP and the individual RTP stream before being
        used with payload type multiplexing.

   4.   Sending multiple streams in the same sequence number space makes
        it impossible to determine which payload type, and thus which
        stream, a packet loss relates to, and therefore to which stream
        to potentially apply packet loss concealment or other
        stream-specific loss mitigation mechanisms.

   5.   If RTP Retransmission [RFC4588] is used and there is a loss, it
        is possible to ask for the missing packet(s) by SSRC and
        sequence number, but not by payload type.  If only some of the
        payload-type-multiplexed streams are of interest, there is no
        way of telling which missing packet(s) belong to the interesting
        stream(s), and all lost packets need to be requested, wasting
        bandwidth.

   6.   The current RTCP feedback mechanisms are built around providing
        feedback on RTP streams based on stream ID (SSRC), packet
        (sequence numbers), and time interval (RTP timestamps).  There
        is almost never a field to indicate which payload type is
        reported, so sending feedback for a specific RTP payload type is
        difficult without extending existing RTCP reporting.

   7.   The current RTCP media control messages specification [RFC5104]
        is oriented around controlling particular media flows, i.e.,
        requests are done addressing a particular SSRC.  Such mechanisms
        would need to be redefined to support payload type multiplexing.

   8.   The number of payload types is inherently limited.
        Accordingly, using payload type multiplexing limits the number
        of streams that can be multiplexed and does not scale.
        This limitation is exacerbated if one uses solutions like RTP
        and RTCP multiplexing [RFC5761], where a number of payload types
        are blocked due to the overlap between RTP and RTCP.

   9.   At times, there is a need to group multiplexed streams.  This is
        currently possible for RTP sessions and for SSRCs, but there is
        no defined way to group payload types.

   10.  It is currently not possible to signal bandwidth requirements
        per RTP stream when using payload type multiplexing.

   11.  Most existing SDP media-level attributes cannot be applied on a
        per-payload-type level and would require re-definition in that
        context.

   12.  A legacy endpoint that does not understand the indication that
        different RTP payload types are different RTP streams might be
        slightly confused by the large number of possibly overlapping or
        identically defined RTP payload types.

Appendix B.  Signalling Considerations

   Signalling is not an architectural consideration for RTP itself, so
   this discussion has been moved to an appendix.  However, it is
   extremely important for anyone building complete applications, so it
   deserves discussion.

   We document salient issues here that need to be addressed by the WGs
   that use some form of signalling to establish RTP sessions.  These
   issues cannot simply be addressed by tweaking, extending, or
   profiling RTP, but require a dedicated and in-depth look at the
   signalling primitives that set up the RTP sessions.

   There exist various signalling solutions for establishing RTP
   sessions.  Many are SDP [RFC4566] based; however, SDP functionality
   is also dependent on the signalling protocols carrying the SDP.  RTSP
   [RFC7826] and SAP [RFC2974] both use SDP in a declarative fashion,
   while SIP [RFC3261] uses SDP with the additional definition of
   Offer/Answer [RFC3264].
   The impact on signalling, and especially on SDP, needs to
   be considered, as it can greatly affect how to deploy a certain
   multiplexing point choice.

B.1.  Session Oriented Properties

   One aspect of the existing signalling is that it is focused on RTP
   sessions, or, in the case of SDP, at least on the media description.
   A number of things are signalled at the media description level but
   are not necessarily strictly bound to an RTP session, and could be
   of interest to signal specifically for a particular RTP stream
   (SSRC) within the session.  The following properties have been
   identified as being potentially useful to signal not only at the RTP
   session level:

   o  Bitrate/bandwidth exists today only as an aggregate or as a
      common "any RTP stream" limit, unless either codec-specific
      bandwidth limiting or RTCP signalling using TMMBR is used.

   o  Which SSRC will use which RTP payload type (this will be visible
      from the first media packet, but it is sometimes useful to know
      before packet arrival).

   Some of these issues are clearly SDP's problem rather than RTP
   limitations.  However, if the aim is to deploy a solution using
   additional SSRCs that contains several sets of RTP streams with
   different properties (encoding/packetization parameters, bitrate,
   etc.), putting each set in a different RTP session would directly
   enable negotiation of the parameters for each set.  If insisting on
   additional SSRCs only, a number of signalling extensions are needed
   to clarify that there are multiple sets of RTP streams with
   different properties and that they in fact need to be kept
   different, since a single set will not satisfy the application's
   requirements.

   For some parameters, such as RTP payload type, resolution, and
   framerate, an SSRC-linked mechanism has been proposed in
   [I-D.ietf-mmusic-rid].

B.2.  SDP Prevents Multiple Media Types

   SDP chose to use the m= line both to delineate an RTP session and to
   specify the top level of the MIME media type: audio, video, text,
   image, or application.  This media type is used as the top-level
   media type for identifying the actual payload format and is bound to
   a particular payload type using the rtpmap attribute.  This binding
   has to be loosened in order to use SDP to describe RTP sessions
   containing multiple MIME top-level types.

   [I-D.ietf-mmusic-sdp-bundle-negotiation] describes how to let
   multiple SDP media descriptions use a single underlying transport in
   SDP, which allows one RTP session to be defined with media types
   having different MIME top-level types.

B.3.  Signalling RTP Stream Usage

   Each RTP stream transported in RTP has some particular usage in an
   RTP application.  In many applications so far, this usage has been
   signalled implicitly.  For example, an application might choose to
   take all incoming audio RTP streams, mix them, and play them out.
   However, in more advanced applications that use multiple RTP
   streams, there will be more than a single usage or purpose among the
   set of RTP streams being sent or received.  RTP applications will
   need to signal this usage somehow.  The signalling used will have to
   identify the RTP streams affected by their RTP-level identifiers,
   which means that they have to be identified either by their session
   or by their SSRC + session.

   In some applications, the receiver cannot utilise the RTP stream at
   all before it has received the signalling message describing the RTP
   stream and its usage.  In other applications, there exists a default
   handling that is appropriate.

   If all RTP streams in an RTP session are to be treated in the same
   way, identifying the session is enough.
   If SSRCs in a session are to be treated differently, signalling
   needs to identify both the session and the SSRC.

   If this signalling affects how any RTP central node, like an RTP
   mixer or translator that selects, mixes, or processes streams,
   treats the streams, the node will also need to receive the same
   signalling to know how to treat RTP streams with different usage in
   the right fashion.

Authors' Addresses

   Magnus Westerlund
   Ericsson
   Torshamnsgatan 23
   SE-164 80 Kista
   Sweden

   Phone: +46 10 714 82 87
   Email: magnus.westerlund@ericsson.com

   Bo Burman
   Ericsson
   Gronlandsgatan 31
   SE-164 60 Kista
   Sweden

   Email: bo.burman@ericsson.com

   Colin Perkins
   University of Glasgow
   School of Computing Science
   Glasgow  G12 8QQ
   United Kingdom

   Email: csp@csperkins.org

   Harald Tveit Alvestrand
   Google
   Kungsbron 2
   Stockholm  11122
   Sweden

   Email: harald@alvestrand.no

   Roni Even
   Huawei

   Email: roni.even@huawei.com