Network Working Group                                      M. Westerlund
Internet-Draft                                                 B. Burman
Intended status: Informational                                  Ericsson
Expires: May 3, 2018                                          C. Perkins
                                                   University of Glasgow
                                                           H. Alvestrand
                                                                  Google
                                                                 R. Even
                                                                H. Zheng
                                                                  Huawei
                                                        October 30, 2017


    Guidelines for using the Multiplexing Features of RTP to Support
                         Multiple Media Streams
               draft-ietf-avtcore-multiplex-guidelines-05

Abstract

   The Real-time Transport Protocol (RTP) is a flexible protocol that
   can be used in a wide range of applications, networks, and system
   topologies.  That flexibility makes for wide applicability, but can
   complicate the application design process.  One particular design
   question that has received much attention is how to support multiple
   media streams in RTP.  This memo discusses the available options and
   design trade-offs, and provides guidelines on how to use the
   multiplexing features of RTP to support multiple media streams.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 3, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document.  Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions
     2.1.  Terminology
     2.2.  Subjects Out of Scope
   3.  RTP Multiplexing Overview
     3.1.  Reasons for Multiplexing and Grouping RTP Media Streams
     3.2.  RTP Multiplexing Points
       3.2.1.  RTP Session
       3.2.2.  Synchronisation Source (SSRC)
       3.2.3.  Contributing Source (CSRC)
       3.2.4.  RTP Payload Type
     3.3.  Issues Related to RTP Topologies
     3.4.  Issues Related to RTP and RTCP Protocol
       3.4.1.  The RTP Specification
       3.4.2.  Multiple SSRCs in a Session
       3.4.3.  Binding Related Sources
       3.4.4.  Forward Error Correction
   4.  Particular Considerations for RTP Multiplexing
     4.1.  Interworking Considerations
       4.1.1.  Types of Interworking
       4.1.2.  RTP Translator Interworking
       4.1.3.  Gateway Interworking
       4.1.4.  Multiple SSRC Legacy Considerations
     4.2.  Network Considerations
       4.2.1.  Quality of Service
       4.2.2.  NAT and Firewall Traversal
       4.2.3.  Multicast
     4.3.  Security and Key Management Considerations
       4.3.1.  Security Context Scope
       4.3.2.  Key Management for Multi-party session
       4.3.3.  Complexity Implications
   5.  Archetypes
     5.1.  Single SSRC per Session
     5.2.  Multiple SSRCs of the Same Media Type
     5.3.  Multiple Sessions for one Media type
     5.4.  Multiple Media Types in one Session
     5.5.  Summary
   6.  Summary considerations and guidelines
     6.1.  Guidelines
   7.  Open Issues
   8.  IANA Considerations
   9.  Security Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Dismissing Payload Type Multiplexing
   Appendix B.  Signalling considerations
     B.1.  Signalling Aspects
       B.1.1.  Session Oriented Properties
       B.1.2.  SDP Prevents Multiple Media Types
       B.1.3.  Signalling Media Stream Usage
   Authors' Addresses

1.  Introduction

   The Real-time Transport Protocol (RTP) [RFC3550] is a commonly used
   protocol for real-time media transport.  It is a protocol that
   provides great flexibility and can support a large set of different
   applications.  RTP was from the beginning designed for multiple
   participants in a communication session.  It supports many paradigms
   of topologies and usages, as defined in [RFC7667].  RTP has several
   multiplexing points designed for different purposes.  These enable
   support of multiple media streams and switching between different
   encoding or packetization of the media.  By using multiple RTP
   sessions, sets of media streams can be structured for efficient
   processing or identification.  Thus the question for any RTP
   application designer is how to best use the RTP session, the SSRC and
   the payload type to meet the application's needs.

   There has been increased interest in more advanced usage of RTP.
   For example, multiple streams can occur when a single endpoint has
   multiple media sources, like multiple cameras or microphones, that
   need to be sent simultaneously.  Consequently, questions are raised
   regarding the most appropriate RTP usage.  The limitations in some
   implementations, RTP/RTCP extensions, and signalling have also been
   exposed.  The authors also hope that clarification on the usefulness
   of some functionalities in RTP will result in more complete
   implementations in the future.

   The purpose of this document is to provide clear information about
   the possibilities of RTP when it comes to multiplexing.  The RTP
   application designer needs to understand the implications that come
   from a particular usage of the RTP multiplexing points.  The document
   will recommend against some usages as being unsuitable, in general or
   for particular purposes.

   The document starts with some definitions and then goes into the
   existing RTP functionalities around multiplexing.  Both the desired
   behaviour and the implications of a particular behaviour depend on
   which topologies are used, which requires some consideration.  This
   is followed by a discussion of some choices in multiplexing behaviour
   and their impacts.  Some archetypes of RTP usage are discussed.
   Finally, some recommendations and examples are provided.

2.  Definitions

2.1.  Terminology

   The definitions in Section 3 of [RFC3550] are referenced normatively.

   The taxonomy defined in [RFC7656] is referenced normatively.

   The following terms and abbreviations are used in this document:

   Multiparty:  A communication situation including multiple endpoints.
      In this document it will be used to refer to situations where more
      than two endpoints communicate.

   RTP Source:  The originator or source of a particular Media Stream.
      Identified using an SSRC in a particular RTP session.  An RTP
      source is the source of a single media stream, and is associated
      with a single endpoint and a single Media Source.  An RTP Source
      is just called a Source in RFC 3550.

   RTP Sink:  A recipient of a Media Stream.  The Media Sink is
      identified using one or more SSRCs.  There can be more than one
      RTP Sink for one RTP source.

   Multiplexing:  The operation of taking multiple entities as input,
      aggregating them onto some common resource while keeping the
      individual entities addressable such that they can later be fully
      and unambiguously separated (de-multiplexed) again.

   RTP Session Group:  One or more RTP sessions that are used together
      to perform some function.  Examples are multiple RTP sessions used
      to carry different layers of a layered encoding.
      In an RTP Session Group, CNAMEs are assumed to be valid across
      all RTP sessions, and designate synchronisation contexts that can
      cross RTP sessions.

   Signalling:  The process of configuring endpoints to participate in
      one or more RTP sessions.

2.2.  Subjects Out of Scope

   This document is focused on issues that affect RTP.  Thus, issues
   that involve signalling protocols, such as whether SIP, Jingle or
   some other protocol is in use for session configuration, the
   particular syntaxes used to define RTP session properties, or the
   constraints imposed by particular choices in the signalling
   protocols, are mentioned only as examples in order to describe the
   RTP issues more precisely.

   This document assumes the applications will use RTCP.  While there
   are applications that don't send RTCP, they do not conform to the
   RTP specification, and thus can be regarded as reusing the RTP
   packet format but not implementing the RTP protocol.

3.  RTP Multiplexing Overview

3.1.  Reasons for Multiplexing and Grouping RTP Media Streams

   The reasons why an endpoint might choose to send multiple media
   streams are widespread.
   In the below discussion, please keep in mind that the reasons for
   having multiple media streams vary and include but are not limited
   to the following:

   o  Multiple Media Sources

   o  Multiple Media Streams might be needed to represent one Media
      Source (for instance when using layered encodings)

   o  A Retransmission stream might repeat the content of another Media
      Stream

   o  An FEC stream might provide material that can be used to repair
      another Media Stream

   o  Alternative Encodings, for instance different codecs for the same
      audio stream

   o  Alternative formats, for instance multiple resolutions of the same
      video stream

   For each of these, it is necessary to decide if each additional media
   stream gets its own SSRC multiplexed within an RTP Session, or if it
   is necessary to use additional RTP sessions to group the media
   streams.  The choice between these made due to one reason might not
   be the choice suitable for another reason.  The clearest
   understanding is associated with multiple media sources of the same
   media type.  However, all warrant discussion and clarification on how
   to deal with them.  As the discussion below will show, in reality we
   cannot choose a single one of the two solutions.  To utilise RTP well
   and as efficiently as possible, both are needed.  The real issue is
   finding the right guidance on when to create RTP sessions and when
   additional SSRCs in an RTP session are the right choice.

3.2.  RTP Multiplexing Points

   This section describes the multiplexing points present in the RTP
   protocol that can be used to distinguish media streams and groups of
   media streams.
   Figure 1 outlines the process of demultiplexing incoming RTP
   streams:

                      |
                      | packets
           +--        v
           |        +------------+
           |        |   Socket   |
           |        +------------+
           |            ||  ||
      RTP  |       RTP/ ||  |+-----> SCTP ( ...and any other protocols)
   Session |       RTCP ||  +------> STUN (multiplexed using same port)
           +--          ||
           +--          ||
           |      (split by SSRC)
           |      ||    ||    ||
           |      ||    ||    ||
     Media |     +--+  +--+  +--+
   Streams |     |PB|  |PB|  |PB| Jitter buffer, process RTCP, FEC, etc.
           |     +--+  +--+  +--+
           +--     |    |      |
             (pick rendering context based on PT)
           +--     |   /       |
           |       +---+       |
           |        /  |       |
   Payload |     +--+ +--+    +--+
   Formats |     |CR| |CR|    |CR|  Codecs and rendering
           |     +--+ +--+    +--+
           +--

                   Figure 1: RTP Demultiplexing Process

3.2.1.  RTP Session

   An RTP Session is the highest semantic layer in the RTP protocol, and
   represents an association between a group of communicating endpoints.
   The set of participants that form an RTP session is defined as those
   that share a single synchronisation source space [RFC3550].  That is,
   if a group of participants are each aware of the synchronisation
   source identifiers belonging to the other participants, then those
   participants are in a single RTP session.  A participant can become
   aware of a synchronisation source identifier by receiving an RTP
   packet containing it in the SSRC field or CSRC list, by receiving an
   RTCP packet mentioning it in an SSRC field, or through signalling
   (e.g., the SDP "a=ssrc:" attribute).  Thus, the scope of an RTP
   session is determined by the participants' network interconnection
   topology, in combination with RTP and RTCP forwarding strategies
   deployed by the endpoints and any middleboxes, and by the signalling.

   RTP does not contain a session identifier.  Rather, it relies on the
   underlying transport layer to separate different sessions, and on the
   signalling to identify sessions in a manner that is meaningful to the
   application.
   The signalling layer might give sessions an explicit identifier, or
   their identification might be implicit based on the addresses and
   ports used.  Accordingly, a single RTP Session can have multiple
   associated identifiers, explicit and implicit, belonging to
   different contexts.  For example, when running RTP on top of UDP/IP,
   an RTP endpoint can identify and delimit an RTP Session from other
   RTP Sessions using the UDP source and destination IP addresses and
   UDP port numbers.  Another example is when using the SDP grouping
   framework [RFC5888], which uses an identifier per "m="-line; if there
   is a one-to-one mapping between "m="-lines and RTP sessions, that
   grouping framework identifier will identify an RTP Session.
   [I-D.ietf-mmusic-sdp-bundle-negotiation] extends the "m="-line for
   bundled media, which adds complexity to demultiplexing media streams.
   Section 10.2 of [I-D.ietf-mmusic-sdp-bundle-negotiation] provides
   information about how RTP/RTCP streams are associated with SDP media
   descriptions.

   RTP sessions are globally unique, but their identity can only be
   determined by the communication context at an endpoint of the
   session, or by a middlebox that is aware of the session context.  The
   relationship between RTP sessions depends on the underlying
   application, transport, and signalling protocol.  The RTP protocol
   makes no normative statements about the relationship between
   different RTP sessions; however, the applications that use more than
   one RTP session will have some higher layer understanding of the
   relationship between the sessions they create.

3.2.2.  Synchronisation Source (SSRC)

   A synchronisation source (SSRC) identifies an RTP source or an RTP
   sink.
   Every endpoint will have at least one synchronisation source
   identifier, even if it does not send media (endpoints that are only
   RTP sinks still send RTCP, and use their synchronisation source
   identifier in the RTCP packets they send).  An endpoint can have
   multiple synchronisation source identifiers if it contains multiple
   RTP sources (i.e., if it sends multiple media streams).  Endpoints
   that are both RTP sources and RTP sinks use the same synchronisation
   sources in both roles.  At any given time, an RTP source has one and
   only one SSRC - although that can change over the lifetime of the RTP
   source or sink.

   The synchronisation source identifier is a 32-bit unsigned integer.
   It is present in every RTP and RTCP packet header, and in the payload
   of some RTCP packet types.  It can also be present in SDP signalling.
   Unless pre-signalled using the SDP "a=ssrc:" attribute [RFC5576], the
   synchronisation source identifier is chosen at random.  It is not
   dependent on the network address of the endpoint, and is intended to
   be unique within an RTP session.  Synchronisation source identifier
   collisions can occur, and are handled as specified in [RFC3550] and
   [RFC5576], resulting in the synchronisation source identifier of the
   affected RTP sources and/or sinks changing.  An RTP source that
   changes its RTP Session identifier (e.g. source transport address)
   during a session has to choose a new SSRC identifier to avoid being
   interpreted as a looped source.

   Synchronisation source identifiers that belong to the same
   synchronisation context (i.e., that represent media streams that can
   be synchronised using information in RTCP SR packets) are indicated
   by use of identical CNAME chunks in corresponding RTCP SDES packets.
   SDP signalling can also be used to provide explicit grouping of
   synchronisation sources [RFC5576].
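
   As an illustration of where the SSRC sits on the wire, the following
   is a minimal Python sketch that reads the fixed RTP header laid out
   in Section 5.1 of [RFC3550] and splits a batch of packets by SSRC.
   The function names parse_rtp_header and demux_by_ssrc are
   hypothetical and chosen only for this example.  The sketch ignores
   header extensions, padding, RTCP, and SSRC collision handling.

```python
import struct
from collections import defaultdict


def parse_rtp_header(packet: bytes) -> dict:
    """Parse the 12-byte fixed RTP header plus the CSRC list.

    Field layout follows RFC 3550, Section 5.1. The function name and
    the returned dict shape are assumptions made for this sketch.
    """
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    # V(2) P(1) X(1) CC(4) | M(1) PT(7) | sequence number | timestamp | SSRC
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    csrc_count = b0 & 0x0F
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("truncated CSRC list")
    csrcs = list(struct.unpack("!%dI" % csrc_count, packet[12:end]))
    return {
        "marker": bool(b1 >> 7),
        "pt": b1 & 0x7F,       # payload type: selects the rendering context
        "seq": seq,            # sequence number space is scoped by the SSRC
        "timestamp": timestamp,
        "ssrc": ssrc,          # 32-bit unsigned synchronisation source
        "csrcs": csrcs,        # contributing sources listed by a mixer
    }


def demux_by_ssrc(packets) -> dict:
    """Group the sequence numbers of the given packets by their SSRC."""
    streams = defaultdict(list)
    for packet in packets:
        header = parse_rtp_header(packet)
        streams[header["ssrc"]].append(header["seq"])
    return dict(streams)
```

   A receiver built along these lines would keep one jitter buffer per
   key of the returned mapping, and would consult the payload type only
   after the split by SSRC, matching the order shown in Figure 1.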

   In some cases, the same SSRC Identifier value is used to relate
   streams in two different RTP Sessions, such as in Multi-Session
   Transmission of scalable video [RFC6190].  This is to be avoided
   since there is no guarantee of uniqueness in SSRC values across
   RTP sessions.

   Note that RTP sequence number and RTP timestamp are scoped by the
   synchronisation source.  Each RTP source will have a different
   synchronisation source, and the corresponding media stream will have
   a separate RTP sequence number and timestamp space.

   An SSRC identifier is used by different types of sources as well as
   sinks:

   Real Media Source:  Connected to a "physical" media source, for
      example a camera or microphone.

   Processed Media Source:  A source with some attributed property
      generated by some network node, for example a filtering function
      in an RTP mixer that provides the most active speaker based on
      some criteria, or a mix representing a set of other sources.

   RTP Sink:  A source that does not generate any RTP media stream in
      itself (e.g. an endpoint or middlebox only receiving in an RTP
      session).  It still needs a sender SSRC for use as source in RTCP
      reports.

   Note that an endpoint that generates more than one media type, e.g.
   a conference participant sending both audio and video, need not (and
   commonly does not) use the same SSRC value across RTP sessions.  RTCP
   compound packets containing the CNAME SDES item are the designated
   method to bind an SSRC to a CNAME, effectively cross-correlating
   SSRCs within and between RTP Sessions as coming from the same
   endpoint.  The main property attributed to SSRCs associated with the
   same CNAME is that they are from a particular synchronisation context
   and can be synchronised at playback.

   An RTP receiver receiving a previously unseen SSRC value will
   interpret it as a new source.
   It might in fact be a previously existing source that had to change
   SSRC number due to an SSRC conflict.  However, the originator of the
   previous SSRC ought to have ended the conflicting source by sending
   an RTCP BYE for it prior to starting to send with the new SSRC, so
   the new SSRC is anyway effectively a new source.

3.2.3.  Contributing Source (CSRC)

   The Contributing Source (CSRC) is not a separate identifier.  Rather
   a synchronisation source identifier is listed as a CSRC in the RTP
   header of a packet generated by an RTP mixer if the corresponding
   SSRC was in the header of one of the packets that contributed to the
   mix.

   It is not possible, in general, to extract media represented by an
   individual CSRC since it is typically the result of a media mixing
   (merge) operation by an RTP mixer on the individual media streams
   corresponding to the CSRC identifiers.  The exception is the case
   when only a single CSRC is indicated as this represents forwarding of
   a media stream, possibly modified.  The RTP header extension for
   Mixer-to-Client Audio Level Indication [RFC6465] expands on the
   receiver's information about a packet with a CSRC list.  Due to these
   restrictions, CSRC will not be considered a fully qualified
   multiplexing point and will be disregarded in the rest of this
   document.

3.2.4.  RTP Payload Type

   Each Media Stream utilises one or more RTP payload formats.  An RTP
   payload format describes how the output of a particular media codec
   is framed and encoded into RTP packets.  The payload format used is
   identified by the payload type field in the RTP data packet header.
   The combination therefore identifies a specific Media Stream encoding
   format.  The format definition can be taken from [RFC3551] for
   statically allocated payload types, but ought to be explicitly
   defined in signalling, such as SDP, both for static and dynamic
   Payload Types.
   The term "format" here includes whatever can be described by
   out-of-band signalling means.  In SDP, the term "format" includes
   media type, RTP timestamp sampling rate, codec, codec configuration,
   payload format configurations, and various robustness mechanisms
   such as redundant encodings [RFC2198].

   The payload type is scoped by sending endpoint within an RTP Session.
   All synchronisation sources sent from a single endpoint share the
   same payload type definitions.  The RTP Payload Type is designed
   such that only a single Payload Type is valid at any time instant in
   the RTP source's RTP timestamp time line, effectively
   time-multiplexing different Payload Types if any change occurs.  The
   payload type used can change on a per-packet basis for an SSRC, for
   example a speech codec making use of generic comfort noise [RFC3389].
   If there is a true need to send multiple Payload Types for the same
   SSRC that are valid for the same instant, then redundant encodings
   [RFC2198] can be used.  Several constraints beyond those mentioned
   above need to be met to enable this use, one of which is that the
   combined payload sizes of the different Payload Types ought not
   exceed the transport MTU.

   Other aspects of RTP payload format use are described in RTP Payload
   HowTo [RFC8088].

   The payload type is not a multiplexing point at the RTP layer (see
   Appendix A for a detailed discussion of why using the payload type as
   an RTP multiplexing point does not work).  The RTP payload type is,
   however, used to determine how to render a media stream, and so can
   be viewed as selecting a rendering context.  The rendering context
   can be defined by the signalling, and the RTP payload type number is
   sometimes used to associate an RTP media stream with the signalling.
   This association is possible provided unique RTP payload type numbers
   are used in each context.
   For example, an RTP media stream can be associated with an SDP "m="
   line by comparing the RTP payload type numbers used by the media
   stream with payload types signalled in the "a=rtpmap:" lines in the
   media sections of the SDP.  If RTP media streams are being
   associated with signalling contexts based on the RTP payload type,
   then the assignment of RTP payload type numbers needs to be unique
   across signalling contexts; if the same RTP payload format
   configuration is used in multiple contexts, then a different RTP
   payload type number has to be assigned in each context to ensure
   uniqueness.  If the RTP payload type number is not being used to
   associate RTP media streams with a signalling context, then the same
   RTP payload type number can be used to indicate the exact same RTP
   payload format configuration in multiple contexts.  In case of
   bundled media, Section 10.2 of
   [I-D.ietf-mmusic-sdp-bundle-negotiation] provides more information on
   SDP signalling.

3.3.  Issues Related to RTP Topologies

   The impact of how RTP multiplexing is performed will in general vary
   with how the RTP Session participants are interconnected, described
   by RTP Topology [RFC7667].

   Even the most basic use case, denoted Topo-Point-to-Point in
   [RFC7667], raises a number of considerations that are discussed in
   detail in following sections.  They range over such aspects as:

   o  Does my communication peer support RTP as defined with multiple
      SSRCs?

   o  Do I need network differentiation in form of QoS?

   o  Can the application more easily process and handle the media
      streams if they are in different RTP sessions?

   o  Do I need to use additional media streams for RTP retransmission
      or FEC?

   o  etc.

   For some Point to Multi-point topologies (e.g. Topo-ASM and Topo-SSM
   in [RFC7667]), multicast is used to interconnect the session
   participants.
   Special considerations (documented in Section 4.2.3) need to be made
   as multicast is a one to many distribution system.

   Sometimes an RTP communication session can end up in a situation
   where the peers are not compatible with each other for various
   reasons:

   o  No common media codec for a media type thus requiring transcoding

   o  Different support for multiple RTP sources and RTP sessions

   o  Usage of different media transport protocols, i.e., RTP or other

   o  Usage of different transport protocols, e.g. UDP, DCCP, TCP

   o  Different security solutions, e.g. IPsec, TLS, DTLS, SRTP with
      different keying mechanisms

   In many situations this is resolved by the inclusion of a translator
   between the two peers, as described by Topo-PtP-Translator in
   [RFC7667].  The translator's main purpose is to make the peer look to
   the other peer like something it is compatible with.  There can also
   be other reasons than compatibility to insert a translator in the
   form of a middlebox or gateway, for example a need to monitor the
   media streams.  If the stream transport characteristics are changed
   by the translator, appropriate media handling can require thorough
   understanding of the application logic, specifically any congestion
   control or media adaptation.

   The point to point topology can contain one to many RTP sessions with
   one to many media sources per session, each having one or more RTP
   sources per media source.

3.4.  Issues Related to RTP and RTCP Protocol

   Using multiple media streams is a well supported feature of RTP.
   However, for most implementers, or people writing RTP/RTCP
   applications or extensions that use multiple streams, it can be
   unclear when it is most appropriate to add an additional SSRC to an
   existing RTP session and when it is better to use multiple RTP
   sessions.  This section discusses the various considerations needed.

3.4.1.  The RTP Specification

   RFC 3550 contains some recommendations and a bullet list with 5
   arguments for different aspects of RTP multiplexing.  Let's review
   Section 5.2 of [RFC3550], reproduced below:

   "For efficient protocol processing, the number of multiplexing points
   should be minimised, as described in the integrated layer processing
   design principle [ALF].  In RTP, multiplexing is provided by the
   destination transport address (network address and port number) which
   is different for each RTP session.  For example, in a teleconference
   composed of audio and video media encoded separately, each medium
   SHOULD be carried in a separate RTP session with its own destination
   transport address.

   Separate audio and video streams SHOULD NOT be carried in a single
   RTP session and demultiplexed based on the payload type or SSRC
   fields.  Interleaving packets with different RTP media types but
   using the same SSRC would introduce several problems:

   1.  If, say, two audio streams shared the same RTP session and the
       same SSRC value, and one were to change encodings and thus
       acquire a different RTP payload type, there would be no general
       way of identifying which stream had changed encodings.

   2.  An SSRC is defined to identify a single timing and sequence
       number space.  Interleaving multiple payload types would require
       different timing spaces if the media clock rates differ and would
       require different sequence number spaces to tell which payload
       type suffered packet loss.

   3.  The RTCP sender and receiver reports (see Section 6.4) can only
       describe one timing and sequence number space per SSRC and do not
       carry a payload type field.

   4.  An RTP mixer would not be able to combine interleaved streams of
       incompatible media into one stream.

   5.  Carrying multiple media in one RTP session precludes: the use of
       different network paths or network resource allocations if
       appropriate; reception of a subset of the media if desired, for
       example just audio if video would exceed the available bandwidth;
       and receiver implementations that use separate processes for the
       different media, whereas using separate RTP sessions permits
       either single- or multiple-process implementations.

   Using a different SSRC for each medium but sending them in the same
   RTP session would avoid the first three problems but not the last
   two.

   On the other hand, multiplexing multiple related sources of the same
   medium in one RTP session using different SSRC values is the norm for
   multicast sessions.  The problems listed above don't apply: an RTP
   mixer can combine multiple audio sources, for example, and the same
   treatment is applicable for all of them.  It might also be
   appropriate to multiplex streams of the same medium using different
   SSRC values in other scenarios where the last two problems do not
   apply."

   Let's consider one argument at a time.  The first is an argument for
   using different SSRC for each individual media stream, which is very
   applicable.

   The second argument is advocating against using payload type
   multiplexing, which still stands as can be seen by the extensive
   list of issues found in Appendix A.

   The third argument is yet another argument against payload type
   multiplexing.

   The fourth is an argument against multiplexing media streams that
   require different handling into the same session.  As we saw in the
   discussion of RTP mixers, the RTP mixer has to embed application
   logic in order to handle streams anyway; the separation of streams
   according to stream type is just another piece of application logic,
   which might or might not be appropriate for a particular application.
663 A type of application that can mix different media sources "blindly" 664 is the audio only "telephone" bridge; most other types of application 665 need application-specific logic to perform the mix correctly. 667 The fifth argument discusses network aspects that we will discuss 668 more below in Section 4.2. It also goes into aspects of 669 implementation, like decomposed endpoints where different processes 670 or inter-connected devices handle different aspects of the whole 671 multi-media session. 673 A summary of RFC 3550's view on multiplexing is to use unique SSRCs 674 for anything that is its own media/packet stream, and to use 675 different RTP sessions for media streams that don't share a media 676 type. This document supports the first point; it is very valid. The 677 latter point needs to be discussed further, as imposing a 678 single solution on all usages of RTP is inappropriate. The Multiple 679 Media Types in an RTP Session specification 680 [I-D.ietf-avtcore-multi-media-rtp-session] provides a detailed 681 analysis of the potential issues in having multiple media types in 682 the same RTP session. This document tries to provide a wider-scoped 683 consideration regarding the usage of RTP sessions and considers 684 multiple media types in one RTP session as a possible choice for the 685 RTP application designer. 687 3.4.2. Multiple SSRCs in a Session 689 Using multiple SSRCs in an RTP session at one endpoint requires 690 resolving some unclear aspects of the RTP specification. These could 691 potentially lead to some interoperability issues as well as some 692 potentially significant inefficiencies. These are further discussed in 693 "RTP Considerations for Endpoints Sending Multiple Media Streams" 694 [RFC8108]. An application designer needs to consider these issues and 695 the impact that the availability or lack of the optimizations in the endpoints 696 has on their application.
698 If an application is affected by the issues described, using 699 Multiple RTP sessions can mitigate them. 701 3.4.3. Binding Related Sources 703 A common problem in a number of RTP extensions has been how 704 to bind related RTP sources and their media streams together. This 705 issue is common to both using additional SSRCs and Multiple RTP 706 sessions. 708 The solutions can be divided into several groups: RTP/RTCP based, 709 Signalling based (SDP), grouping related RTP sessions, and grouping 710 SSRCs within an RTP session. Most solutions are explicit, but some 711 implicit methods have also been applied to the problem. 713 The SDP-based signalling solutions are: 715 SDP Media Description Grouping: The SDP Grouping Framework [RFC5888] 716 uses various semantics to group any number of media descriptions. 717 These have previously been considered primarily as grouping RTP 718 sessions, but [I-D.ietf-mmusic-sdp-bundle-negotiation] groups multiple 719 media descriptions as a single RTP session. 721 SDP SSRC grouping: Source-Specific Media Attributes in SDP [RFC5576] 722 includes a solution for grouping SSRCs the same way as the 723 Grouping framework groups Media Descriptions. 725 SDP MSID grouping: Media Stream Identifiers [I-D.ietf-mmusic-msid] 726 includes a solution for grouping SSRCs that is independent of 727 their allocation to RTP sessions. 729 These solutions support a large number of use cases. All of them have 730 shortcomings in cases where the session's dynamic properties are such 731 that it is difficult or resource consuming to keep the list of 732 related SSRCs up to date. 734 Within RTP/RTCP based solutions, when binding to an endpoint or 735 synchronization context (i.e. the CNAME) has not been sufficient, 736 one way to bind related streams in multiple RTP sessions has been to 737 use the same SSRC value across all the RTP sessions.
RTP 738 Retransmission [RFC4588] in multiple RTP session mode, Generic FEC 739 [RFC5109], as well as the RTP payload format for Scalable Video 740 Coding [RFC6190] in Multi Session Transmission (MST) mode use this 741 method. This method clearly works but might have some downside in 742 RTP sessions with many participating SSRCs. The birthday paradox 743 means that if a single session is populated with 9292 randomly chosen 32-bit SSRCs, 744 the chance that at least one collision 745 occurs is approximately 1% (1 - exp(-n(n-1)/2^33) with n = 9292). When a collision occurs, the endpoint is forced to change 746 SSRC in all RTP sessions and thus to resynchronize all of them instead 747 of only the single media stream having the collision. Therefore it 748 is not recommended to use this method. Using the terminology of [RFC7656], streams from 749 the same media source should use the same RTP session. 751 It can be noted that Section 8.3 of the RTP Specification [RFC3550] 752 recommends using a single SSRC space across all RTP sessions for 753 layered coding. 755 Another solution that has been applied to binding SSRCs has been an 756 implicit method used by RTP Retransmission [RFC4588] when doing 757 retransmissions in the same RTP session as the source RTP media 758 stream. The receiver issues an RTP retransmission request, and then awaits a 759 new SSRC that carries the RTP retransmission payload and 760 is from the same CNAME. This limits a requestor to having only one 761 outstanding request on any new source SSRCs per endpoint. 763 [I-D.ietf-mmusic-rid] provides an RTP/RTCP based mechanism capable of 764 supporting explicit association within an RTP session. 766 3.4.4. Forward Error Correction 768 There exist a number of Forward Error Correction (FEC) based schemes 769 for how to reduce the packet loss of the original streams. Most of 770 the FEC schemes will protect a single source flow.
The protection is 771 achieved by transmitting a certain amount of redundant information 772 that is encoded such that it can repair one or more packet losses 773 over the set of packets it protects. This sequence of redundant 774 information also needs to be transmitted as its own media stream, or 775 in some cases instead of the original media stream. Thus many of 776 these schemes create a need for binding related flows as discussed 777 above. Looking at the history of these schemes, there are schemes 778 using multiple SSRCs and schemes using multiple RTP sessions, and 779 some schemes that support both modes of operation. 781 Using multiple RTP sessions supports the case where some set of 782 receivers might not be able to utilise the FEC information. By 783 placing it in a separate RTP session, it can easily be ignored. 785 In usages involving multicast, having the FEC information on its own 786 multicast group allows for flexibility. This is especially useful 787 when receivers see very heterogeneous packet loss rates. Those 788 receivers that are not seeing packet loss don't need to join the 789 multicast group with the FEC data, and so avoid the overhead of 790 receiving unnecessary FEC packets, for example. 792 4. Particular Considerations for RTP Multiplexing 794 4.1. Interworking Considerations 796 There are several different kinds of interworking, and this section 797 discusses two related ones. The first is the interworking between different 798 applications and the implications of potentially different choices in the 799 usage of RTP's multiplexing points. The second relates to the 800 limitations that have to be considered when working with some legacy 801 applications. 803 4.1.1. Types of Interworking 805 It is not uncommon that applications or services of similar usage, 806 especially the ones intended for interactive communication, encounter 807 a situation where one wants to interconnect two or more of these 808 applications.
810 In these cases one ends up in a situation where one might use a 811 gateway to interconnect applications. This gateway then needs to 812 change the multiplexing structure or adhere to limitations in each 813 application. 815 There are two fundamental approaches to gatewaying: RTP Translator 816 interworking (RTP bridging), where the gateway acts as an RTP 817 Translator, and the two applications are members of the same RTP 818 session, and Gateway Interworking (with RTP termination), where there 819 are independent RTP sessions running from each interconnected 820 application to the gateway. 822 4.1.2. RTP Translator Interworking 824 From an RTP perspective the RTP Translator approach could work if all 825 the applications are using the same codecs with the same payload 826 types, have made the same multiplexing choices, have the same 827 capabilities in number of simultaneous media streams combined with 828 the same set of RTP/RTCP extensions being supported. Unfortunately 829 this might not always be true. 831 When one is gatewaying via an RTP Translator, a natural requirement 832 is that the two applications being interconnected need to use the 833 same approach to multiplexing. Furthermore, if one of the 834 applications is capable of working in several modes (such as being 835 able to use Additional SSRCs or Multiple RTP sessions at will), and 836 the other one is not, successful interconnection depends on locking 837 the more flexible application into the operating mode where 838 interconnection can be successful, even if no participants using the 839 less flexible application are present when the RTP sessions are being 840 created. 842 4.1.3. Gateway Interworking 844 When one terminates RTP sessions at the gateway, there are certain 845 tasks that the gateway has to carry out: 847 o Generating appropriate RTCP reports for all media streams 848 (possibly based on incoming RTCP reports), originating from SSRCs 849 controlled by the gateway. 
851 o Handling SSRC collision resolution in each application's RTP 852 sessions. 854 o Signalling, choosing and policing appropriate bit-rates for each 855 session. 857 For applications that use any security mechanism, e.g. in the form 858 of SRTP, the gateway needs to be able to decrypt incoming 859 packets and re-encrypt them in the other application's security 860 context. This is necessary even if all that's needed is a simple 861 remapping of SSRC numbers. If this is done, the gateway also needs 862 to be a member of the security contexts of both sides, of course. 864 Other tasks a gateway might need to apply include transcoding (for 865 incompatible codec types), rescaling (for incompatible video size 866 requirements), suppression of content that is known not to be handled 867 in the destination application, or the addition or removal of 868 redundancy coding or scalability layers to fit the needs of the 869 destination domain. 871 From the above, we can see that the gateway needs to have an intimate 872 knowledge of the application requirements; a gateway is by its nature 873 application specific, not a commodity product. 875 This fact reveals the potential for these gateways to block evolution 876 of the applications by blocking unknown RTP and RTCP extensions that 877 the regular application has been extended with. 879 If one uses security functions, like SRTP, they can as seen above 880 incur both additional risk, due to the gateway needing to be part of the 881 security association between the endpoints (unless the gateway is on 882 the transport level), and additional complexity in the form of the 883 decrypt-encrypt cycles needed for each forwarded packet. SRTP, due 884 to its keying structure, also requires that each RTP session use 885 different master keys, as use of the same key in two RTP sessions for 886 some ciphers can result in two-time pads that completely break the 887 confidentiality of the packets. 889 4.1.4.
Multiple SSRC Legacy Considerations 891 Historically, the most common RTP use cases have been point to point 892 Voice over IP (VoIP) or streaming applications, commonly with no more 893 than one media source per endpoint and media type (typically audio 894 and video). Even in conferencing applications, especially voice 895 only, the conference focus or bridge has provided a single stream 896 with a mix of the other participants to each participant. It is also 897 common to have individual RTP sessions between each endpoint and the 898 RTP mixer, meaning that the mixer functions as an RTP-terminating 899 gateway. 901 When establishing RTP sessions that can contain endpoints that aren't 902 updated to handle multiple streams following these recommendations, a 903 particular application can have issues with multiple SSRCs within a 904 single session. These issues include: 906 1. Need to handle more than one stream simultaneously rather than 907 replacing an already existing stream with a new one. 909 2. Be capable of decoding multiple streams simultaneously. 911 3. Be capable of rendering multiple streams simultaneously. 913 This indicates that gateways attempting to interconnect to this class 914 of devices have to make sure that only one media stream of each type 915 gets delivered to the endpoint if it's expecting only one, and that 916 the multiplexing format is what the device expects. It is highly 917 unlikely that RTP translator-based interworking can be made to 918 function successfully in such a context. 920 4.2. Network Considerations 922 The multiplexing choice has an impact on network level mechanisms that 923 need to be considered by the implementer. 925 4.2.1. Quality of Service 927 Quality of Service mechanisms are either flow 928 based or packet marking based. RSVP [RFC2205] is an example of a 929 flow based mechanism, while Diff-Serv [RFC2474] is an example of a 930 packet marking based one.
For a packet marking based scheme, the 931 method of multiplexing will not affect the possibility to use QoS. 933 However, for a flow based scheme there is a clear difference between 934 the methods. Additional SSRC will result in all media streams being 935 part of the same 5-tuple (protocol, source address, destination 936 address, source port, destination port) which is the most common 937 selector for flow based QoS. 939 It also needs to be noted that packet marking based QoS mechanisms 940 can have limitations. A general observation is that different DSCP 941 can be assigned to different packets within a flow as well as within 942 an RTP Media Stream. However, care needs to be taken when 943 considering which forwarding behaviours that are applied on path due 944 to these DSCPs. In some cases the forwarding behaviour can result in 945 packet reordering. For more discussion of this see [RFC7657]. 947 More specific to the choice between using one or more RTP session can 948 be the method for assigning marking to packets. If this is done 949 using a network ingress function, it can have issues discriminating 950 the different RTP media streams. The network API on the endpoint 951 also needs to be capable of setting the marking on a per packet basis 952 to reach the full functionality. 954 4.2.2. NAT and Firewall Traversal 956 In today's network there exist a large number of middleboxes. The 957 ones that normally have most impact on RTP are Network Address 958 Translators (NAT) and Firewalls (FW). 960 Below we analyse and comment on the impact of requiring more 961 underlying transport flows in the presence of NATs and Firewalls: 963 End-Point Port Consumption: A given IP address only has 65536 964 available local ports per transport protocol for all consumers of 965 ports that exist on the machine. This is normally never an issue 966 for an end-user machine. It can become an issue for servers that 967 handle large number of simultaneous streams. 
However, if the 968 application uses ICE to authenticate STUN requests, a server can 969 serve multiple endpoints from the same local port, and use the 970 whole 5-tuple (source and destination address, source and 971 destination port, protocol) as identifier of flows after having 972 securely bound them to the remote endpoint address using the STUN 973 request. In theory the minimum number of media server ports 974 needed is the maximum number of simultaneous RTP Sessions a 975 single endpoint can use. In practice, implementations will 976 probably benefit from using more server ports to simplify 977 implementation or avoid performance bottlenecks. 979 NAT State: If an endpoint sits behind a NAT, each flow it generates 980 to an external address will result in a state that has to be kept 981 in the NAT. That state is a limited resource. In home or Small 982 Office/Home Office (SOHO) NATs, memory or processing are usually 983 the most limited resources. For large scale NATs serving many 984 internal endpoints, available external ports are likely the scarce 985 resource. Port limitation is primarily a problem for larger 986 centralised NATs where endpoint independent mapping requires each 987 flow to use one port for the external IP address. This affects 988 the maximum number of internal users per external IP address. 989 However, it is worth pointing out that a real-time video 990 conference session with audio and video is likely using fewer than 991 10 UDP flows, compared to certain web applications that can use 992 100+ TCP flows to various servers from a single browser instance. 994 NAT Traversal Excess Time: Performing the NAT/FW traversal takes a 995 certain amount of time for each flow. It also takes time in a 996 phase of communication between accepting to communicate and the 997 media path being established which is fairly critical.
The best 998 case scenario for how much extra time it takes after finding the 999 first valid candidate pair following the specified ICE procedures 1000 is: 1.5*RTT + Ta*(Additional_Flows-1), where Ta is the pacing 1001 timer, which ICE specifies to be no smaller than 20 ms. That 1002 assumes a message in one direction, and then an immediate 1003 triggered check back. The reason it isn't more is that ICE first 1004 finds one candidate pair that works prior to attempting to 1005 establish multiple flows. Thus, there is no extra time until one 1006 has found a working candidate pair. Based on that working pair, 1007 the extra time needed is that of establishing, in parallel, the 1008 additional flows (in most cases 2-3). However, packet loss causes extra 1009 delays, at least 100 ms, which is the minimal retransmission timer 1010 for ICE. 1012 NAT Traversal Failure Rate: Due to the need to establish more than a 1013 single flow through the NAT, there is some risk that establishing 1014 the first flow succeeds but that one or more of the additional 1015 flows fail. The risk that this happens is hard to quantify, but 1016 ought to be fairly low as one flow from the same interfaces has 1017 just been successfully established. Thus only rare events such as 1018 NAT resource overload, or selecting particular port numbers that 1019 are filtered etc., ought to be reasons for failure. 1021 Deep Packet Inspection and Multiple Streams: Firewalls differ in how 1022 deeply they inspect packets. There exists some potential that 1023 deeply inspecting firewalls will have similar legacy issues with 1024 multiple SSRCs as some stack implementations. 1026 Additional SSRC keeps the additional media streams within one RTP 1027 Session and transport flow and does not introduce any additional NAT 1028 traversal complexities per media stream. This can be compared with 1029 normally one or two additional transport flows per RTP session when 1030 using multiple RTP sessions.
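To make the cost of those additional transport flows concrete, the best-case expression quoted under "NAT Traversal Excess Time" above, 1.5*RTT + Ta*(Additional_Flows-1), can be evaluated numerically. The following is only a sketch: the function name, the treatment of zero additional flows as zero extra time, and the modelling of each lost check as exactly one 100 ms retransmission timeout (a lower bound) are assumptions made for illustration and are not taken from the ICE specification.

```python
# Hypothetical helper: best-case extra ICE setup time, per the text's formula
#   1.5*RTT + Ta*(Additional_Flows - 1)
MIN_TA_MS = 20.0           # pacing timer lower bound quoted in the text
MIN_RETRANSMIT_MS = 100.0  # minimal ICE retransmission timer quoted in the text


def ice_extra_setup_ms(rtt_ms, additional_flows, ta_ms=MIN_TA_MS, lost_checks=0):
    """Return the extra setup time in milliseconds after the first valid pair."""
    if additional_flows < 1:
        # Assumption: with no additional flows there is nothing more to set up.
        return 0.0
    if ta_ms < MIN_TA_MS:
        raise ValueError("Ta is stated to be no smaller than 20 ms")
    best_case = 1.5 * rtt_ms + ta_ms * (additional_flows - 1)
    # Assumption: each lost check costs one minimal retransmission timeout.
    return best_case + lost_checks * MIN_RETRANSMIT_MS


# Three additional flows over a 100 ms round trip, no loss.
print(ice_extra_setup_ms(rtt_ms=100.0, additional_flows=3))  # 190.0
```

With these inputs the RTT term dominates; the pacing term only becomes significant when many additional flows are needed, which is the case the multiple RTP session approach has to consider.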
Additional lower layer transport flows 1031 will be needed, unless an explicit de-multiplexing layer is added 1032 between RTP and the transport protocol. At the time of writing no such 1033 mechanism was defined. 1035 4.2.3. Multicast 1037 Multicast groups provide powerful semantics for a number of real-time 1038 applications, especially the ones that desire broadcast-like 1039 behaviours with one endpoint transmitting to a large number of 1040 receivers, like in IPTV. But those same semantics do result in a 1041 certain number of limitations. 1043 One limitation is that for any group, sender side adaptation to the 1044 actual receiver properties causes degradation for all participants to 1045 what is supported by the receiver with the worst conditions among the 1046 group participants. In most cases this is not acceptable. Instead 1047 various receiver based solutions are employed to ensure that the 1048 receivers achieve the best possible performance. By using scalable 1049 encoding and placing each scalability layer in a different multicast 1050 group, the receiver can control the amount of traffic it receives. 1051 To have each scalability layer on a different multicast group, one 1052 RTP session per multicast group is used. 1054 In addition, the transport flow considerations in multicast are a bit 1055 different from unicast; NATs with port translation are not useful in 1056 the multicast environment, meaning that the entire port range of each 1057 multicast address is available for distinguishing between RTP 1058 sessions. 1060 Thus it appears easiest and most straightforward to use multiple RTP 1061 sessions for sending different media flows used for adapting to 1062 network conditions. It is also common that streams that improve 1063 transport robustness are sent in their own multicast group to allow 1064 for interworking with legacy endpoints or to support different levels of 1065 protection. 1067 Here are some common behaviours for RTP multicast: 1069 1.
Multicast applications use a group of RTP sessions, not one. 1070 Each endpoint will need to be a member of a number of RTP 1071 sessions in order to perform well. 1073 2. Within each RTP session, the number of RTP Sinks is likely to be 1074 much larger than the number of RTP sources. 1076 3. Multicast applications need signalling functions to identify the 1077 relationships between RTP sessions. 1079 4. Multicast applications need signalling functions to identify the 1080 relationships between SSRCs in different RTP sessions. 1082 All multicast configurations share a signalling requirement; all of 1083 the participants will need to have the same RTP and payload type 1084 configuration. Otherwise, A could for example be using payload type 1085 97 as the video codec H.264 while B thinks it is MPEG-2. It is to be 1086 noted that SDP offer/answer [RFC3264] is not appropriate for ensuring 1087 this property. The signalling aspects of multicast are not explored 1088 further in this memo. 1090 Security solutions for this type of group communications are also 1091 challenging. First of all the key-management and the security 1092 protocol need to support group communication. Source authentication 1093 requires special solutions. For more discussion on this please 1094 review Options for Securing RTP Sessions [RFC7201]. 1096 4.3. Security and Key Management Considerations 1098 When dealing with point-to-point, 2-member RTP sessions only, there 1099 are few security issues that are relevant to the choice of having one 1100 RTP session or multiple RTP sessions. However, there are a few 1101 aspects of multiparty sessions that might warrant consideration. For 1102 general information on possible methods of securing RTP, please 1103 review RTP Security Options [RFC7201]. 1105 4.3.1. Security Context Scope 1107 When using SRTP [RFC3711] the security context scope is important and 1108 can be a necessary differentiation in some applications.
As SRTP's 1109 crypto suites (so far) are built around symmetric keys, the receiver 1110 will need to have the same key as the sender. This results in that 1111 no one in a multi-party session can be certain that a received packet 1112 really was sent by the claimed sender or by another party having 1113 access to the key. In most cases this is a sufficient security 1114 property, but there are a few cases where this does create issues. 1116 The first case is when someone leaves a multi-party session and one 1117 wants to ensure that the party that left can no longer access the 1118 media streams. This requires that everyone re-keys without 1119 disclosing the keys to the excluded party. 1121 A second case is when using security as an enforcing mechanism for 1122 differentiation. Take for example a scalable layer or a high quality 1123 simulcast version which only premium users are allowed to access. 1124 The mechanism preventing a receiver from getting the high quality 1125 stream can be based on the stream being encrypted with a key that 1126 user can't access without paying premium, having the key-management 1127 limit access to the key. 1129 SRTP [RFC3711] has no special functions for dealing with different 1130 sets of master keys for different SSRCs. The key-management 1131 functions have different capabilities to establish different set of 1132 keys, normally on a per endpoint basis. For example, DTLS-SRTP 1133 [RFC5764] and Security Descriptions [RFC4568] establish different 1134 keys for outgoing and incoming traffic from an endpoint. This key 1135 usage has to be written into the cryptographic context, possibly 1136 associated with different SSRCs. 1138 4.3.2. Key Management for Multi-party session 1140 Performing key-management for multi-party session can be a challenge. 1141 This section considers some of the issues. 
1143 Multi-party sessions, such as transport translator based sessions and 1144 multicast sessions, cannot use either Security Descriptions [RFC4568] or 1145 DTLS-SRTP [RFC5764] without an extension, as each endpoint provides 1146 its own set of keys. In centralised conferences, the signalling 1147 counterpart is a conference server and the media plane unicast 1148 counterpart (to which DTLS messages would be sent) is the transport 1149 translator. Thus an extension like Encrypted Key Transport 1150 [I-D.ietf-avt-srtp-ekt] is needed or a MIKEY [RFC3830] based solution 1151 that allows for keying all session participants with the same master 1152 key. 1154 4.3.3. Complexity Implications 1156 The usage of security functions can surface complexity implications 1157 of the choice of multiplexing and topology. This becomes especially 1158 evident in RTP topologies having any type of middlebox that processes 1159 or modifies RTP/RTCP packets. Whereas there is very small overhead for 1160 an RTP translator or mixer to rewrite an SSRC value in the RTP packet 1161 of an unencrypted session, the cost of doing it when using 1162 cryptographic security functions is higher. For example if using 1163 SRTP [RFC3711], the actual security context and exact crypto key are 1164 determined by the SSRC field value. If one changes it, the 1165 encryption and the authentication tag computation need to be performed using another 1166 key. Thus changing the SSRC value implies a decryption using the old 1167 SSRC and its security context followed by an encryption using the new 1168 one. 1170 5. Archetypes 1172 This section discusses some archetypes of how RTP multiplexing can be 1173 used in applications to achieve certain goals and gives a summary of their 1174 implications. For each archetype there is a discussion of benefits and 1175 downsides. 1177 5.1.
Single SSRC per Session 1179 In this archetype each endpoint in a point-to-point session has only 1180 a single SSRC, thus the RTP session contains only two SSRCs, one 1181 local and one remote. This session can be used both unidirectional, 1182 i.e. only a single media stream or bi-directional, i.e. both 1183 endpoints have one media stream each. If the application needs 1184 additional media flows between the endpoints, they will have to 1185 establish additional RTP sessions. 1187 The Pros: 1189 1. This archetype has great legacy interoperability potential as it 1190 will not tax any RTP stack implementations. 1192 2. The signalling has good possibilities to negotiate and describe 1193 the exact formats and bit-rates for each media stream, especially 1194 using today's tools in SDP. 1196 3. It does not matter if usage or purpose of the media stream is 1197 signalled on media stream level or session level as there is no 1198 difference. 1200 4. It is possible to control security association per RTP media 1201 stream with current key-management, since each media stream is 1202 directly related to an RTP session, and the keying operates on a 1203 per-session basis. 1205 The Cons: 1207 a. The number of RTP sessions grows directly in proportion with the 1208 number of media streams, which has the implications: 1210 * Linear growth of the amount of NAT/FW state with number of 1211 media streams. 1213 * Increased delay and resource consumption from NAT/FW 1214 traversal. 1216 * Likely larger signalling message and signalling processing 1217 requirement due to the amount of session related information. 1219 * Higher potential for a single media stream to fail during 1220 transport between the endpoints. 1222 b. When the number of RTP sessions grows, the amount of explicit 1223 state for relating media stream also grows, linearly or possibly 1224 exponentially, depending on how the application needs to relate 1225 media streams. 1227 c. 
The port consumption might become a problem for centralised 1228 services, where the central node's port consumption grows rapidly 1229 with the number of sessions. 1231 d. For applications where the media streams are highly dynamic in 1232 their usage, i.e. entering and leaving, the amount of signalling 1233 can become large. Problems with the timely establishment of 1234 additional RTP sessions can also arise. 1236 e. Cross session RTCP requests might be needed, and the fact that 1237 they're impossible can cause issues. 1239 f. If the same SSRC value is reused in multiple RTP sessions rather 1240 than being randomly chosen, interworking with applications that 1241 use a different multiplexing structure than this application will 1242 require SSRC translation. 1244 g. Cannot be used with Any Source Multicast (ASM) as one cannot 1245 guarantee that only two endpoints participate as packet senders. 1246 Using Source-Specific Multicast (SSM), it is possible to restrict to these requirements if no 1247 RTCP feedback is injected back into the SSM group. 1249 h. For most security mechanisms, each RTP session or transport flow 1250 requires individual key-management and security association 1251 establishment thus increasing the overhead. 1253 RTP applications that need to inter-work with legacy RTP 1254 applications, like most deployed VoIP and video conferencing 1255 solutions, can potentially benefit from this structure. However, a 1256 large number of media descriptions in SDP can also run into issues 1257 with existing implementations. For any application needing a larger 1258 number of media flows, the overhead can become very significant. 1259 This structure is also not suitable for multi-party sessions, as any 1260 given media stream from each participant, although having the same usage 1261 in the application, needs its own RTP session.
In addition, the 1262 dynamic behaviour that can arise in multi-party applications can tax 1263 the signalling system and make timely media establishment more 1264 difficult. 1266 5.2. Multiple SSRCs of the Same Media Type 1268 In this archetype, each RTP session serves only a single media type. 1269 The RTP session can contain multiple media streams, either from a 1270 single endpoint or from multiple endpoints. This commonly creates a 1271 low number of RTP sessions, typically only one for audio and one for 1272 video, with a corresponding need for two listening ports when using 1273 RTP/RTCP multiplexing. 1275 The Pros: 1277 1. Low number of RTP sessions needed compared to single SSRC case. 1278 This implies: 1280 * Reduced NAT/FW state 1282 * Lower NAT/FW Traversal Cost in both processing and delay. 1284 2. Allows for early de-multiplexing in the processing chain in RTP 1285 applications where all media streams of the same type have the 1286 same usage in the application. 1288 3. Works well with endpoints that are decomposed by media type. 1290 4. Enables Flow-based QoS with different prioritisation between 1291 media types. 1293 5. For applications with dynamic usage of media streams, i.e. they 1294 come and go frequently, having much of the state associated with 1295 the RTP session rather than an individual SSRC can avoid the need 1296 for in-session signalling of meta-information about each SSRC. 1298 6. Low overhead for security association establishment. 1300 The Cons: 1302 a. May have some need for cross session RTCP requests for things 1303 that affect both media types in an asynchronous way. 1305 b. Some potential for concern with legacy implementations that do 1306 not support the RTP specification fully when it comes to handling 1307 multiple SSRCs per endpoint. 1309 c.
Will not be able to control security association for sets of 1310 media streams within the same media type with today's key- 1311 management mechanisms, unless these are split into different RTP 1312 sessions. 1314 For RTP applications where all media streams of the same media type 1315 share same usage, this structure provides efficiency gains in amount 1316 of network state used and provides more fate sharing with other media 1317 flows of the same type. At the same time, it is still maintaining 1318 almost all functionalities when it comes to negotiation in the 1319 signalling of the properties for the individual media type and also 1320 enabling flow based QoS prioritisation between media types. It 1321 handles multi-party session well, independently of multicast or 1322 centralised transport distribution, as additional sources can 1323 dynamically enter and leave the session. 1325 5.3. Multiple Sessions for one Media type 1327 In this archetype one goes one step further than in the above 1328 (Section 5.2) by using multiple RTP sessions also for a single media 1329 type, but still not as far as having a single SSRC per RTP session. 1330 The main reason for going in this direction is that the RTP 1331 application needs separation of the media streams due to their usage. 1332 Some typical reasons for going to this archetype are scalability over 1333 multicast, simulcast, need for extended QoS prioritisation of media 1334 streams due to their usage in the application, or the need for fine- 1335 grained signalling using today's tools. 1337 The Pros: 1339 1. More suitable for Multicast usage where receivers can 1340 individually select which RTP sessions they want to participate 1341 in, assuming each RTP session has its own multicast group. 1343 2. Indication of the application's usage of the media stream, where 1344 multiple different usages exist. 1346 3. 
Less need for SSRC specific explicit signalling for each media 1347 stream and thus reduced need for explicit and timely signalling. 1349 4. Enables detailed QoS prioritisation for flow based mechanisms. 1351 5. Works well with de-composite endpoints. 1353 6. Handles dynamic usage of media streams well. 1355 7. For transport translator based multi-party sessions, this 1356 structure allows for improved control of which type of media 1357 streams an endpoint receives. 1359 8. The scope for who is included in a security association can be 1360 structured around the different RTP sessions, thus enabling such 1361 functionality with existing key-management. 1363 The Cons: 1365 a. Increases the amount of RTP sessions compared to Multiple SSRCs 1366 of the Same Media Type. 1368 b. Increased amount of session configuration state. 1370 c. May need synchronised cross-session RTCP requests and require 1371 some consideration due to this. 1373 d. For media streams that are part of scalability, simulcast or 1374 transport robustness it will be needed to bind sources, which 1375 need to support multiple RTP sessions. 1377 e. Some potential for concern with legacy implementations that does 1378 not support the RTP specification fully when it comes to handling 1379 multiple SSRC per endpoint. 1381 f. Higher overhead for security association establishment. 1383 g. If the applications need finer control than on media type level 1384 over which session participants that are included in different 1385 sets of security associations, most of today's key-management 1386 will have difficulties establishing such a session. 1388 For more complex RTP applications that have several different usages 1389 for media streams of the same media type and / or uses scalability or 1390 simulcast, this solution can enable those functions at the cost of 1391 increased overhead associated with the additional sessions. 
   This type of structure is suitable for more advanced applications as
   well as multicast-based applications requiring differentiation to
   different participants.

5.4. Multiple Media Types in one Session

   This archetype is to use a single RTP session for multiple different
   media types, like audio and video, and possibly also transport
   robustness mechanisms like FEC or Retransmission. Each media stream
   uses its own SSRC, and a given SSRC value from a particular endpoint
   is never used for more than a single media type.

   The Pros:

   1.  Single RTP session, which implies:

       *  Minimal NAT/FW state.

       *  Minimal NAT/FW traversal cost.

       *  Fate-sharing for all media flows.

   2.  Enables separation of the different media types based on the
       payload types, so media type specific endpoint or central
       processing can still be supported despite the single session.

   3.  Can handle dynamic allocations of media streams well on an RTP
       level. This depends on the application's need for explicit
       indication of the stream usage and how timely that can be
       signalled.

   4.  Minimal overhead for security association establishment.

   The Cons:

   a.  Less suitable for interworking with other applications that use
       individual RTP sessions per media type or multiple sessions for a
       single media type, due to the need for SSRC translation.

   b.  Negotiation of bandwidth for the different media types is
       currently not possible in SDP. This requires SDP extensions to
       enable payload or source specific bandwidth. This is likely to be
       a problem due to the asymmetry between media types in required
       bandwidth.

   c.  Not suitable for de-composite endpoints.

   d.  Flow-based QoS cannot provide separate treatment to some media
       streams compared to others in the single RTP session.

   e.  If there is significant asymmetry between the media streams' RTCP
       reporting needs, there are some challenges in configuration and
       usage to avoid wasting RTCP reporting on a media stream that
       does not need such frequent reporting.

   f.  Not suitable for applications where some receivers would like to
       receive only a subset of the media streams, especially if
       multicast or a transport translator is being used.

   g.  Additional concern with legacy implementations that do not
       support the RTP specification fully when it comes to handling
       multiple SSRCs per endpoint, as multiple simultaneous media
       types also need to be handled.

   h.  If the application needs finer control over which session
       participants are included in different sets of security
       associations, most key-management mechanisms will have
       difficulties establishing such a session.

5.5. Summary

   There are some clear relations between these archetypes. Both the
   "single SSRC per RTP session" and the "multiple media types in one
   session" are cases which require full explicit signalling of the
   media stream relations. However, they operate on two different
   levels, where the first primarily enables session level binding, and
   the second needs to do it all on SSRC level. From another
   perspective, the two solutions are the two extreme points when it
   comes to the number of RTP sessions needed.

   The two other archetypes, "Multiple SSRCs of the Same Media Type" and
   "Multiple Sessions for one Media Type", are examples of two other
   cases that first of all allow for some implicit mapping of the role
   or usage of the media streams based on which RTP session they appear
   in. They thus potentially allow for less signalling and in particular
   reduced need for real-time signalling in dynamic sessions. They also
   represent points in between the first two when it comes to the number
   of RTP sessions established, i.e. representing an attempt to reduce
   the number of sessions as much as possible without compromising the
   functionality the session provides both on network level and on
   signalling level.

6. Summary considerations and guidelines

6.1. Guidelines

   This section contains a number of recommendations for implementers or
   specification writers when it comes to handling multiple streams.

   Do not Require the same SSRC across Sessions: As discussed in
      Section 3.4.3 there exist drawbacks in using the same SSRC in
      multiple RTP sessions as a mechanism to bind related media streams
      together. It is instead suggested that a mechanism to explicitly
      signal the relation is used, either in RTP/RTCP or in the
      signalling mechanism that establishes the RTP session(s).

   Use additional SSRCs for additional Media Sources: In the cases
      where an RTP endpoint needs to transmit additional media streams
      of the same media type in the application, with the same
      processing requirements at the network and RTP layers, it is
      suggested to send them as additional SSRCs in the same RTP
      session. For example, in a telepresence room with three cameras,
      each capturing two persons sitting at the table, it is suggested
      to send each camera as its own SSRC within a single RTP session.

   Use additional RTP sessions for streams with different requirements:
      When media streams have different processing requirements from the
      network or the RTP layer at the endpoints, it is suggested that
      the different types of streams are put in different RTP sessions.
      This includes the case where different participants want different
      subsets of the set of RTP streams.
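
   As an illustration only, the three guidelines above can be sketched
   as a small planning routine. The class MediaSource, the function
   plan_sessions, and the "requirements" tag are hypothetical names
   invented for this sketch; they are not defined by RTP or by this
   memo. The sketch assumes that sources sharing a media type and
   requirements tag can share one RTP session as additional SSRCs, and
   that anything else gets its own session. SSRCs are drawn at random
   rather than reused as a cross-session binding; keeping them unique
   across all sessions is a simplification of this sketch, not an RTP
   requirement.

```python
import secrets
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaSource:
    """Hypothetical description of one media source at an endpoint."""
    name: str          # illustrative label, e.g. "camera-left"
    media_type: str    # e.g. "audio" or "video"
    requirements: str  # illustrative tag for network/RTP-layer handling


def plan_sessions(sources):
    """Map sources to RTP sessions keyed by (media_type, requirements).

    Sources with the same key become additional SSRCs in one session;
    a different key yields a separate RTP session.
    """
    sessions = defaultdict(dict)
    used = set()
    for src in sources:
        ssrc = secrets.randbits(32)
        while ssrc in used:  # redraw on the (unlikely) collision
            ssrc = secrets.randbits(32)
        used.add(ssrc)
        sessions[(src.media_type, src.requirements)][src.name] = ssrc
    return dict(sessions)
```

   With three cameras tagged identically, one differently tagged video
   stream, and one audio stream, this yields three sessions, the first
   of which carries three SSRCs. Any relation between streams in
   different sessions would still have to be signalled explicitly.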

   When using multiple RTP Sessions use grouping: When using Multiple
      RTP session solutions, it is suggested to explicitly group the
      involved RTP sessions when needed using the signalling mechanism,
      for example the Session Description Protocol (SDP) Grouping
      Framework [RFC5888], using some appropriate grouping semantics.

   RTP/RTCP Extensions May Support Additional SSRCs as well as Multiple
   RTP sessions:
      When defining an RTP or RTCP extension, the creator needs to
      consider if this extension is applicable to usage with additional
      SSRCs and Multiple RTP sessions. Any extension intended to be
      generic is suggested to support both. Applications that are not
      as generally applicable will have to consider if interoperability
      is better served by defining a single solution or providing both
      options.

   Transport Support Extensions: When defining new RTP/RTCP extensions
      intended for transport support, like the retransmission or FEC
      mechanisms, they are expected to include support for both
      additional SSRCs and multiple RTP sessions so that application
      developers can choose freely from the set of mechanisms without
      concerning themselves with which of the multiplexing choices a
      particular solution supports.

7. Open Issues

   There are currently some issues that need to be resolved before this
   document is ready to be published:

   1.  Use of RFC 2119 language in the section on SSRC (3.2.2)

   2.  Better align source and sink terminology with Taxonomy
       (Section 3.2.2)

   3.  Section on Binding Related Sources (Section 3.4.3) needs more
       text on usage of the RID and other SDES based mechanisms created.

   4.  Does the MSID text need to be updated and clarified based on the
       evolution of MSID since the previous version? Section 3.4.3.

   5.  Section 4.1.2 (RTP Translator Interworking) needs to be updated.
       It is not obvious that it is a natural requirement that the same
       multiplexing is used. This needs better discussion.

   6.  Reference to Ta for ICE being 20 ms will need to be updated due
       to the ICE update.

   7.  In Section 4.3.2 (Key Management for Multi-party session) the
       reference to EKT needs to be updated; the question is whether
       draft-ietf-perc-ekt-diet is appropriate here.

   8.  Can we find a more appropriate term than archetypes?

   9.

8. IANA Considerations

   This document makes no request of IANA.

   Note to RFC Editor: this section can be removed on publication as an
   RFC.

9. Security Considerations

   There is discussion of the security implications of choosing SSRC vs
   multiple RTP sessions in Section 4.3.

10. References

10.1. Normative References

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
              July 2003.

   [RFC7656]  Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
              B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms
              for Real-Time Transport Protocol (RTP) Sources", RFC 7656,
              DOI 10.17487/RFC7656, November 2015.

10.2. Informative References

   [ALF]      Clark, D. and D. Tennenhouse, "Architectural
              Considerations for a New Generation of Protocols", SIGCOMM
              Symposium on Communications Architectures and
              Protocols (Philadelphia, Pennsylvania), pp. 200--208, IEEE
              Computer Communications Review, Vol. 20(4), September
              1990.

   [I-D.ietf-avt-srtp-ekt]
              Wing, D., McGrew, D., and K. Fischer, "Encrypted Key
              Transport for Secure RTP", draft-ietf-avt-srtp-ekt-03
              (work in progress), October 2011.

   [I-D.ietf-avtcore-multi-media-rtp-session]
              Westerlund, M., Perkins, C., and J. Lennox, "Sending
              Multiple Types of Media in a Single RTP Session",
              draft-ietf-avtcore-multi-media-rtp-session-13 (work in
              progress), December 2015.

   [I-D.ietf-mmusic-msid]
              Alvestrand, H., "WebRTC MediaStream Identification in the
              Session Description Protocol", draft-ietf-mmusic-msid-16
              (work in progress), February 2017.

   [I-D.ietf-mmusic-rid]
              Thatcher, P., Zanaty, M., Nandakumar, S., Burman, B.,
              Roach, A., and B. Campen, "RTP Payload Format
              Restrictions", draft-ietf-mmusic-rid-11 (work in
              progress), July 2017.

   [I-D.ietf-mmusic-sdp-bundle-negotiation]
              Holmberg, C., Alvestrand, H., and C. Jennings,
              "Negotiating Media Multiplexing Using the Session
              Description Protocol (SDP)",
              draft-ietf-mmusic-sdp-bundle-negotiation-39 (work in
              progress), August 2017.

   [I-D.lennox-mmusic-sdp-source-selection]
              Lennox, J. and H. Schulzrinne, "Mechanisms for Media
              Source Selection in the Session Description Protocol
              (SDP)", draft-lennox-mmusic-sdp-source-selection-05 (work
              in progress), October 2012.

   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
              Handley, M., Bolot, J., Vega-Garcia, A., and S.
              Fosse-Parisis, "RTP Payload for Redundant Audio Data",
              RFC 2198, DOI 10.17487/RFC2198, September 1997.

   [RFC2205]  Braden, R., Ed., Zhang, L., Berson, S., Herzog, S., and S.
              Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
              Functional Specification", RFC 2205, DOI 10.17487/RFC2205,
              September 1997.

   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
              "Definition of the Differentiated Services Field (DS
              Field) in the IPv4 and IPv6 Headers", RFC 2474,
              DOI 10.17487/RFC2474, December 1998.

   [RFC2974]  Handley, M., Perkins, C., and E. Whelan, "Session
              Announcement Protocol", RFC 2974, DOI 10.17487/RFC2974,
              October 2000.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              DOI 10.17487/RFC3261, June 2002.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              DOI 10.17487/RFC3264, June 2002.

   [RFC3389]  Zopf, R., "Real-time Transport Protocol (RTP) Payload for
              Comfort Noise (CN)", RFC 3389, DOI 10.17487/RFC3389,
              September 2002.

   [RFC3551]  Schulzrinne, H. and S. Casner, "RTP Profile for Audio and
              Video Conferences with Minimal Control", STD 65, RFC 3551,
              DOI 10.17487/RFC3551, July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, DOI 10.17487/RFC3711, March 2004.

   [RFC3830]  Arkko, J., Carrara, E., Lindholm, F., Naslund, M., and K.
              Norrman, "MIKEY: Multimedia Internet KEYing", RFC 3830,
              DOI 10.17487/RFC3830, August 2004.

   [RFC4103]  Hellstrom, G. and P. Jones, "RTP Payload for Text
              Conversation", RFC 4103, DOI 10.17487/RFC4103, June 2005.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, DOI 10.17487/RFC4566,
              July 2006.

   [RFC4568]  Andreasen, F., Baugher, M., and D. Wing, "Session
              Description Protocol (SDP) Security Descriptions for Media
              Streams", RFC 4568, DOI 10.17487/RFC4568, July 2006.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
              DOI 10.17487/RFC4588, July 2006.

   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
              "Codec Control Messages in the RTP Audio-Visual Profile
              with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
              February 2008.

   [RFC5109]  Li, A., Ed., "RTP Payload Format for Generic Forward Error
              Correction", RFC 5109, DOI 10.17487/RFC5109, December
              2007.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009.

   [RFC5761]  Perkins, C. and M. Westerlund, "Multiplexing RTP Data and
              Control Packets on a Single Port", RFC 5761,
              DOI 10.17487/RFC5761, April 2010.

   [RFC5764]  McGrew, D. and E. Rescorla, "Datagram Transport Layer
              Security (DTLS) Extension to Establish Keys for the Secure
              Real-time Transport Protocol (SRTP)", RFC 5764,
              DOI 10.17487/RFC5764, May 2010.

   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
              Protocol (SDP) Grouping Framework", RFC 5888,
              DOI 10.17487/RFC5888, June 2010.

   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
              "RTP Payload Format for Scalable Video Coding", RFC 6190,
              DOI 10.17487/RFC6190, May 2011.

   [RFC6465]  Ivov, E., Ed., Marocco, E., Ed., and J. Lennox, "A
              Real-time Transport Protocol (RTP) Header Extension for
              Mixer-to-Client Audio Level Indication", RFC 6465,
              DOI 10.17487/RFC6465, December 2011.

   [RFC7201]  Westerlund, M. and C. Perkins, "Options for Securing RTP
              Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014.

   [RFC7657]  Black, D., Ed. and P. Jones, "Differentiated Services
              (Diffserv) and Real-Time Communication", RFC 7657,
              DOI 10.17487/RFC7657, November 2015.

   [RFC7667]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
              DOI 10.17487/RFC7667, November 2015.

   [RFC7826]  Schulzrinne, H., Rao, A., Lanphier, R., Westerlund, M.,
              and M. Stiemerling, Ed., "Real-Time Streaming Protocol
              Version 2.0", RFC 7826, DOI 10.17487/RFC7826, December
              2016.

   [RFC8088]  Westerlund, M., "How to Write an RTP Payload Format",
              RFC 8088, DOI 10.17487/RFC8088, May 2017.

   [RFC8108]  Lennox, J., Westerlund, M., Wu, Q., and C. Perkins,
              "Sending Multiple RTP Streams in a Single RTP Session",
              RFC 8108, DOI 10.17487/RFC8108, March 2017.

Appendix A. Dismissing Payload Type Multiplexing

   This section documents a number of reasons why using the payload type
   as a multiplexing point for most things related to multiple streams
   is unsuitable. If one attempts to use Payload Type multiplexing
   beyond its defined usage, that has well-known negative effects on
   RTP. To use Payload Type as the single discriminator for multiple
   streams implies that all the different media streams are being sent
   with the same SSRC, thus using the same timestamp and sequence number
   space. This has many effects:

   1.   It constrains the RTP timestamp rate for the multiplexed
        media. For example, media streams that use different RTP
        timestamp rates cannot be combined, as the timestamp values need
        to be consistent across all multiplexed media frames. Thus
        streams are forced to use the same rate. When this is not
        possible, Payload Type multiplexing cannot be used.

   2.   Many RTP payload formats can fragment a media object over
        multiple packets, like parts of a video frame. These payload
        formats need to determine the order of the fragments to
        correctly decode them. Thus it is important to ensure that all
        fragments related to a frame or a similar media object are
        transmitted in sequence and without interruptions within the
        object. This can be solved relatively simply on the sender side
        by ensuring that the fragments of each media stream are sent in
        sequence.

   3.   Some media formats require uninterrupted sequence number space
        between media parts.
        These are media formats where any missing
        RTP sequence number will result in decoding failure or invoking
        of a repair mechanism within a single media context. The
        text/T140 payload format [RFC4103] is an example of such a
        format. These formats will need a sequence numbering abstraction
        function between RTP and the individual media stream before
        being used with Payload Type multiplexing.

   4.   Sending multiple streams in the same sequence number space makes
        it impossible to determine which Payload Type and thus which
        stream a packet loss relates to.

   5.   If RTP Retransmission [RFC4588] is used and there is a loss, it
        is possible to ask for the missing packet(s) by SSRC and
        sequence number, not by Payload Type. If only some of the
        Payload Type multiplexed streams are of interest, there is no
        way of telling which missing packet(s) belong to the interesting
        stream(s) and all lost packets need to be requested, wasting
        bandwidth.

   6.   The current RTCP feedback mechanisms are built around providing
        feedback on media streams based on stream ID (SSRC), packet
        (sequence numbers) and time interval (RTP Timestamps). There is
        almost never a field to indicate which Payload Type is reported,
        so sending feedback for a specific media stream is difficult
        without extending existing RTCP reporting.

   7.   The current RTCP media control messages [RFC5104] specification
        is oriented around controlling particular media flows, i.e.
        requests are done addressing a particular SSRC. Such mechanisms
        would need to be redefined to support Payload Type multiplexing.

   8.   The number of payload types is inherently limited.
        Accordingly, using Payload Type multiplexing limits the number
        of streams that can be multiplexed and does not scale.
        This limitation is exacerbated if one uses solutions like RTP
        and RTCP multiplexing [RFC5761] where a number of payload types
        are blocked due to the overlap between RTP and RTCP.

   9.   At times, there is a need to group multiplexed streams and this
        is currently possible for RTP Sessions and for SSRC, but there
        is no defined way to group Payload Types.

   10.  It is currently not possible to signal bandwidth requirements
        per media stream when using Payload Type Multiplexing.

   11.  Most existing SDP media level attributes cannot be applied on a
        per Payload Type level and would require re-definition in that
        context.

   12.  A legacy endpoint that does not understand the indication that
        different RTP payload types are different media streams might be
        slightly confused by the large amount of possibly overlapping or
        identically defined RTP Payload Types.

Appendix B. Signalling considerations

   Signalling is not an architectural consideration for RTP itself, so
   this discussion has been moved to an appendix. However, it is hugely
   important for anyone building complete applications, so it is
   deserving of discussion.

   The issues raised here need to be addressed in the WGs that deal with
   signalling; they cannot be addressed by tweaking, extending or
   profiling RTP.

B.1. Signalling Aspects

   There exist various signalling solutions for establishing RTP
   sessions. Many are SDP [RFC4566] based; however, SDP functionality
   also depends on the signalling protocols carrying the SDP. RTSP
   [RFC7826] and SAP [RFC2974] both use SDP in a declarative fashion,
   while SIP [RFC3261] uses SDP with the additional definition of
   Offer/Answer [RFC3264]. The impact on signalling and especially
   SDP needs to be considered as it can greatly affect how to deploy a
   certain multiplexing point choice.

B.1.1. Session Oriented Properties

   One aspect of the existing signalling is that it is focused around
   sessions, or at least in the case of SDP the media description.
   There are a number of things that are signalled at the session or
   media description level, but those are not necessarily strictly
   bound to an RTP session and could be of interest to signal
   specifically for a particular media stream (SSRC) within the session.
   The following properties have been identified as being potentially
   useful to signal not only on RTP session level:

   o  Bitrate/bandwidth exists today only as an aggregate or as a common
      limit for any media stream, unless either codec-specific bandwidth
      limiting or RTCP signalling using TMMBR is used.

   o  Which SSRC will use which RTP Payload Types (this will be
      visible from the first media packet, but is sometimes useful to
      know before packet arrival).

   Some of these issues are clearly SDP's problem rather than RTP
   limitations. However, if the aim is to deploy a solution using
   additional SSRCs that contains several sets of media streams with
   different properties (encoding/packetization parameter, bit-rate,
   etc.), putting each set in a different RTP session would directly
   enable negotiation of the parameters for each set. If insisting on
   additional SSRCs only, a number of signalling extensions are needed
   to clarify that there are multiple sets of media streams with
   different properties and that they in fact need to be kept different,
   since a single set will not satisfy the application's requirements.

   For some parameters, such as resolution and framerate, an SSRC-linked
   mechanism has been proposed:
   [I-D.lennox-mmusic-sdp-source-selection].

B.1.2. SDP Prevents Multiple Media Types

   SDP chose to use the m= line both to delineate an RTP session and to
   specify the top level of the MIME media type: audio, video, text,
   image, application. This media type is used as the top-level media
   type for identifying the actual payload format bound to a particular
   payload type using the rtpmap attribute. This binding has to be
   loosened in order to use SDP to describe RTP sessions containing
   multiple MIME top level types.

   There is an accepted WG item in the MMUSIC WG to define how multiple
   media lines describe a single underlying transport
   [I-D.ietf-mmusic-sdp-bundle-negotiation] and thus it becomes possible
   in SDP to define one RTP session with media types having different
   MIME top level types.

B.1.3. Signalling Media Stream Usage

   Media streams being transported in RTP have some particular usage in
   an RTP application. In many applications, this usage of the media
   stream has so far been implicitly signalled. For example, an
   application might choose to take all incoming audio RTP streams, mix
   them and play them out. However, in more advanced applications that
   use multiple media streams there will be more than a single usage or
   purpose among the set of media streams being sent or received. RTP
   applications will need to signal this usage somehow. The signalling
   used will have to identify the media streams affected by their
   RTP-level identifiers, which means that they have to be identified
   either by their session or by their SSRC + session.

   In some applications, the receiver cannot utilise the media stream at
   all before it has received the signalling message describing the
   media stream and its usage. In other applications, there exists a
   default handling that is appropriate.

   If all media streams in an RTP session are to be treated in the same
   way, identifying the session is enough.
   If SSRCs in a session are to be treated differently, signalling needs
   to identify both the session and the SSRC.

   If this signalling affects how any RTP central node, like an RTP
   mixer or translator that selects, mixes or processes streams, treats
   the streams, the node will also need to receive the same signalling
   to know how to treat media streams with different usage in the right
   fashion.

Authors' Addresses

   Magnus Westerlund
   Ericsson
   Torshamsgatan 23
   SE-164 80 Kista
   Sweden

   Phone: +46 10 714 82 87
   Email: magnus.westerlund@ericsson.com

   Bo Burman
   Ericsson
   Farogatan 6
   SE-164 80 Kista
   Sweden

   Phone: +46 10 714 13 11
   Email: bo.burman@ericsson.com

   Colin Perkins
   University of Glasgow
   School of Computing Science
   Glasgow G12 8QQ
   United Kingdom

   Email: csp@csperkins.org

   Harald Tveit Alvestrand
   Google
   Kungsbron 2
   Stockholm 11122
   Sweden

   Email: harald@alvestrand.no

   Roni Even
   Huawei

   Email: roni.even@huawei.com

   Hui Zheng
   Huawei

   Email: marvin.zhenghui@huawei.com