idnits 2.17.1

draft-ietf-avtcore-multiplex-guidelines-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords.

     RFC 2119 keyword, line 599: '... each medium SHOULD be carried in a ...'
     RFC 2119 keyword, line 602: '...nd video streams SHOULD NOT be carried...'

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (October 30, 2017) is 2369 days in the past. Is this
     intentional?

  Checking references for intended status: Informational
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-17) exists of
     draft-ietf-mmusic-msid-16

  == Outdated reference: A later version (-15) exists of
     draft-ietf-mmusic-rid-11

  == Outdated reference: A later version (-54) exists of
     draft-ietf-mmusic-sdp-bundle-negotiation-39

  -- Obsolete informational reference (is this intentional?): RFC 4566
     (Obsoleted by RFC 8866)

     Summary: 1 error (**), 0 flaws (~~), 4 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2  Network Working Group                                      M. Westerlund
3  Internet-Draft                                                 B. Burman
4  Intended status: Informational                                  Ericsson
5  Expires: May 3, 2018                                          C. Perkins
6                                                     University of Glasgow
7                                                             H. Alvestrand
8                                                                    Google
9                                                                   R. Even
10                                                                 H. Zheng
11                                                                   Huawei
12                                                         October 30, 2017

14     Guidelines for using the Multiplexing Features of RTP to Support
15                          Multiple Media Streams
16                draft-ietf-avtcore-multiplex-guidelines-04

18 Abstract

20    The Real-time Transport Protocol (RTP) is a flexible protocol that
21    can be used in a wide range of applications, networks, and system
22    topologies. That flexibility makes for wide applicability, but can
23    complicate the application design process. One particular design
24    question that has received much attention is how to support multiple
25    media streams in RTP. This memo discusses the available options and
26    design trade-offs, and provides guidelines on how to use the
27    multiplexing features of RTP to support multiple media streams.

29 Status of This Memo

31    This Internet-Draft is submitted in full conformance with the
32    provisions of BCP 78 and BCP 79.

34    Internet-Drafts are working documents of the Internet Engineering
35    Task Force (IETF). Note that other groups may also distribute
36    working documents as Internet-Drafts. The list of current Internet-
37    Drafts is at https://datatracker.ietf.org/drafts/current/.

39    Internet-Drafts are draft documents valid for a maximum of six months
40    and may be updated, replaced, or obsoleted by other documents at any
41    time. It is inappropriate to use Internet-Drafts as reference
42    material or to cite them other than as "work in progress."

44    This Internet-Draft will expire on May 3, 2018.

46 Copyright Notice

48    Copyright (c) 2017 IETF Trust and the persons identified as the
49    document authors. All rights reserved.

51    This document is subject to BCP 78 and the IETF Trust's Legal
52    Provisions Relating to IETF Documents
53    (https://trustee.ietf.org/license-info) in effect on the date of
54    publication of this document.
      Please review these documents
55    carefully, as they describe your rights and restrictions with respect
56    to this document. Code Components extracted from this document must
57    include Simplified BSD License text as described in Section 4.e of
58    the Trust Legal Provisions and are provided without warranty as
59    described in the Simplified BSD License.

61 Table of Contents

63    1. Introduction
64    2. Definitions
65      2.1. Terminology
66      2.2. Subjects Out of Scope
67    3. RTP Multiplexing Overview
68      3.1. Reasons for Multiplexing and Grouping RTP Media Streams
69      3.2. RTP Multiplexing Points
70        3.2.1. RTP Session
71        3.2.2. Synchronisation Source (SSRC)
72        3.2.3. Contributing Source (CSRC)
73        3.2.4. RTP Payload Type
74      3.3. Issues Related to RTP Topologies
75      3.4. Issues Related to RTP and RTCP Protocol
76        3.4.1. The RTP Specification
77        3.4.2. Multiple SSRCs in a Session
78        3.4.3. Binding Related Sources
79        3.4.4. Forward Error Correction
80    4. Particular Considerations for RTP Multiplexing
81      4.1. Interworking Considerations
82        4.1.1. Types of Interworking
83        4.1.2. RTP Translator Interworking
84        4.1.3. Gateway Interworking
85        4.1.4. Multiple SSRC Legacy Considerations
86      4.2. Network Considerations
87        4.2.1. Quality of Service
88        4.2.2. NAT and Firewall Traversal
89        4.2.3. Multicast
90      4.3. Security and Key Management Considerations
91        4.3.1. Security Context Scope
92        4.3.2. Key Management for Multi-party session
93        4.3.3. Complexity Implications
95    5. Archetypes
96      5.1. Single SSRC per Session
97      5.2. Multiple SSRCs of the Same Media Type
98      5.3. Multiple Sessions for one Media type
99      5.4. Multiple Media Types in one Session
100     5.5. Summary
101   6. Summary considerations and guidelines
102     6.1. Guidelines
103   7. Open Issues
104   8. IANA Considerations
105   9. Security Considerations
106   10. References
107     10.1. Normative References
108     10.2. Informative References
109   Appendix A. Dismissing Payload Type Multiplexing
110   Appendix B. Signalling considerations
111     B.1. Signalling Aspects
112       B.1.1. Session Oriented Properties
113       B.1.2. SDP Prevents Multiple Media Types
114       B.1.3. Signalling Media Stream Usage
115   Authors' Addresses

117 1. Introduction

119    The Real-time Transport Protocol (RTP) [RFC3550] is a commonly used
120    protocol for real-time media transport. It is a protocol that
121    provides great flexibility and can support a large set of different
122    applications. RTP was from the beginning designed for multiple
123    participants in a communication session. It supports many paradigms
124    of topologies and usages, as defined in [RFC7667]. RTP has several
125    multiplexing points designed for different purposes. These enable
126    support of multiple media streams and switching between different
127    encoding or packetization of the media. By using multiple RTP
128    sessions, sets of media streams can be structured for efficient
129    processing or identification. Thus the question for any RTP
130    application designer is how to best use the RTP session, the SSRC and
131    the payload type to meet the application's needs.

133    There has been increased interest in more advanced usage of RTP. For
134    example, multiple streams can occur when a single endpoint has
135    multiple media sources, such as multiple cameras or microphones, that
136    need to be sent simultaneously. Consequently, questions are raised
137    regarding the most appropriate RTP usage. The limitations in some
138    implementations, RTP/RTCP extensions, and signalling have also been
139    exposed. The authors also hope that clarification on the usefulness
140    of some functionalities in RTP will result in more complete
141    implementations in the future.

143    The purpose of this document is to provide clear information about
144    the possibilities of RTP when it comes to multiplexing. The RTP
145    application designer needs to understand the implications that come
146    from a particular usage of the RTP multiplexing points. The document
147    will recommend against some usages as being unsuitable, in general or
148    for particular purposes.

150    The document starts with some definitions and then goes into the
151    existing RTP functionalities around multiplexing. Both the desired
152    behaviour and the implications of a particular behaviour depend on
153    which topologies are used, which requires some consideration. This
154    is followed by a discussion of some choices in multiplexing behaviour
155    and their impacts. Some archetypes of RTP usage are discussed.
156    Finally, some recommendations and examples are provided.

158 2. Definitions

160 2.1. Terminology

162    The definitions in Section 3 of [RFC3550] are referenced normatively.

164    The taxonomy defined in [RFC7656] is referenced normatively.

166    The following terms and abbreviations are used in this document:

168    Multiparty: A communication situation including multiple endpoints.
169       In this document it will be used to refer to situations where more
170       than two endpoints communicate.

172    RTP Source: The originator or source of a particular Media Stream.
173       Identified using an SSRC in a particular RTP session. An RTP
174       source is the source of a single media stream, and is associated
175       with a single endpoint and a single Media Source. An RTP Source
176       is just called a Source in RFC 3550.

178    RTP Sink: A recipient of a Media Stream. The Media Sink is
179       identified using one or more SSRCs. There can be more than one
180       RTP Sink for one RTP source.

182    Multiplexing: The operation of taking multiple entities as input,
183       aggregating them onto some common resource while keeping the
184       individual entities addressable such that they can later be fully
185       and unambiguously separated (de-multiplexed) again.

187    RTP Session Group: One or more RTP sessions that are used together
188       to perform some function. Examples are multiple RTP sessions used
189       to carry different layers of a layered encoding.
          In an RTP
190       Session Group, CNAMEs are assumed to be valid across all RTP
191       sessions, and designate synchronisation contexts that can cross
192       RTP sessions.

194    Signalling: The process of configuring endpoints to participate in
195       one or more RTP sessions.

197 2.2. Subjects Out of Scope

199    This document is focused on issues that affect RTP. Thus, issues
200    that involve signalling protocols, such as whether SIP, Jingle or
201    some other protocol is in use for session configuration, the
202    particular syntaxes used to define RTP session properties, or the
203    constraints imposed by particular choices in the signalling
204    protocols, are mentioned only as examples in order to describe the
205    RTP issues more precisely.

207    This document assumes the applications will use RTCP. While there
208    are applications that do not send RTCP, they do not conform to
209    the RTP specification, and thus can be regarded as reusing the RTP
210    packet format but not implementing the RTP protocol.

212 3. RTP Multiplexing Overview

214 3.1. Reasons for Multiplexing and Grouping RTP Media Streams

216    The reasons why an endpoint might choose to send multiple media
217    streams are widespread.
       In the discussion below, please keep in mind
218    that the reasons for having multiple media streams vary and include
219    but are not limited to the following:

221    o Multiple Media Sources

223    o Multiple Media Streams might be needed to represent one Media
224      Source (for instance when using layered encodings)

226    o A Retransmission stream might repeat the content of another Media
227      Stream

229    o An FEC stream might provide material that can be used to repair
230      another Media Stream

232    o Alternative Encodings, for instance different codecs for the same
233      audio stream

235    o Alternative formats, for instance multiple resolutions of the same
236      video stream

238    For each of these, it is necessary to decide if each additional media
239    stream gets its own SSRC multiplexed within an RTP Session, or if it
240    is necessary to use additional RTP sessions to group the media
241    streams. A choice between these that suits one reason might not
242    be the choice suitable for another reason. The clearest
243    understanding is associated with multiple media sources of the same
244    media type. However, all warrant discussion and clarification on how
245    to deal with them. As the discussion below will show, in reality we
246    cannot choose a single one of the two solutions. To utilise RTP well
247    and as efficiently as possible, both are needed. The real issue is
248    finding the right guidance on when to create RTP sessions and when
249    additional SSRCs in an RTP session are the right choice.

251 3.2. RTP Multiplexing Points

253    This section describes the multiplexing points present in the RTP
254    protocol that can be used to distinguish media streams and groups of
255    media streams.
       Figure 1 outlines the process of demultiplexing
256    incoming RTP streams:

258                     |
260                     | packets
262          +--        v
264          |      +------------+
266          |      |   Socket   |
268          |      +------------+
270          |          ||  ||
272     RTP  |   RTP/   ||  |+-----> SCTP ( ...and any other protocols)
274  Session |   RTCP   ||  +------> STUN (multiplexed using same port)
276          +--        ||
278          +--        ||
280          |    (split by SSRC)
282          |       ||    ||    ||
284          |       ||    ||    ||
286   Media  |      +--+  +--+  +--+
288  Streams |      |PB|  |PB|  |PB| Jitter buffer, process RTCP, FEC, etc.
290          |      +--+  +--+  +--+
292          +--     |     |     |
294              (pick rendering context based on PT)
296          +--     |    /      |
298          |       +---+       |
300          |        /  |       |
302  Payload |      +--+ +--+  +--+
304  Formats |      |CR| |CR|  |CR| Codecs and rendering
306          |      +--+ +--+  +--+
308          +--

310                 Figure 1: RTP Demultiplexing Process

312 3.2.1. RTP Session

314    An RTP Session is the highest semantic layer in the RTP protocol, and
315    represents an association between a group of communicating endpoints.
316    The set of participants that form an RTP session is defined as those
317    that share a single synchronisation source space [RFC3550]. That is,
318    if a group of participants are each aware of the synchronisation
319    source identifiers belonging to the other participants, then those
320    participants are in a single RTP session. A participant can become
321    aware of a synchronisation source identifier by receiving an RTP
322    packet containing it in the SSRC field or CSRC list, by receiving an
323    RTCP packet mentioning it in an SSRC field, or through signalling
324    (e.g., the SDP "a=ssrc:" attribute). Thus, the scope of an
325    RTP session is determined by the participants' network
326    interconnection topology, in combination with RTP and RTCP forwarding
327    strategies deployed by the endpoints and any middleboxes, and by the
328    signalling.

330    RTP does not contain a session identifier.
       Rather, it relies on the
331    underlying transport layer to separate different sessions, and on the
332    signalling to identify sessions in a manner that is meaningful to the
333    application. The signalling layer might give sessions an explicit
334    identifier, or their identification might be implicit based on the
335    addresses and ports used. Accordingly, a single RTP Session can have
336    multiple associated identifiers, explicit and implicit, belonging to
337    different contexts. For example, when running RTP on top of UDP/IP,
338    an RTP endpoint can identify and delimit an RTP Session from other
339    RTP Sessions using the UDP source and destination IP addresses and
340    UDP port numbers. Another example is when using the SDP grouping
341    framework [RFC5888], which uses an identifier per "m="-line; if
342    there is a one-to-one mapping between "m="-lines and RTP
343    sessions, that grouping framework identifier will identify an RTP
344    Session. [I-D.ietf-mmusic-sdp-bundle-negotiation] extends the
345    "m="-line for bundled media, which adds complexity to
346    demultiplexing media streams. Section 10.2 of
347    [I-D.ietf-mmusic-sdp-bundle-negotiation] provides information about
348    how RTP/RTCP streams are associated with an SDP media description.

350    RTP sessions are globally unique, but their identity can only be
351    determined by the communication context at an endpoint of the
352    session, or by a middlebox that is aware of the session context. The
353    relationship between RTP sessions depends on the underlying
354    application, transport, and signalling protocol. The RTP protocol
355    makes no normative statements about the relationship between
356    different RTP sessions; however, applications that use more than
357    one RTP session will have some higher layer understanding of the
358    relationship between the sessions they create.

360 3.2.2. Synchronisation Source (SSRC)

362    A synchronisation source (SSRC) identifies an RTP source or an RTP
363    sink.
       Every endpoint will have at least one synchronisation source
364    identifier, even if it does not send media (endpoints that are only
365    RTP sinks still send RTCP, and use their synchronisation source
366    identifier in the RTCP packets they send). An endpoint can have
367    multiple synchronisation source identifiers if it contains multiple
368    RTP sources (i.e., if it sends multiple media streams). Endpoints
369    that are both RTP sources and RTP sinks use the same synchronisation
370    sources in both roles. At any given time, an RTP source has one and
371    only one SSRC - although that can change over the lifetime of the RTP
372    source or sink.

374    The synchronisation source identifier is a 32-bit unsigned integer.
375    It is present in every RTP and RTCP packet header, and in the payload
376    of some RTCP packet types. It can also be present in SDP signalling.
377    Unless pre-signalled using the SDP "a=ssrc:" attribute
378    [RFC5576], the synchronisation source identifier is chosen at random.
379    It is not dependent on the network address of the endpoint, and is
380    intended to be unique within an RTP session. Synchronisation source
381    identifier collisions can occur, and are handled as specified in
382    [RFC3550] and [RFC5576], resulting in the synchronisation source
383    identifier of the affected RTP sources and/or sinks changing. An
384    RTP source that changes its RTP Session identifier (e.g. source
385    transport address) during a session has to choose a new SSRC
386    identifier to avoid being interpreted as a looped source.

388    Synchronisation source identifiers that belong to the same
389    synchronisation context (i.e., that represent media streams that can
390    be synchronised using information in RTCP SR packets) are indicated
391    by use of identical CNAME chunks in corresponding RTCP SDES packets.
392    SDP signalling can also be used to provide explicit grouping of
393    synchronisation sources [RFC5576].
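
The following Python sketch illustrates the two ideas above: reading the 32-bit SSRC from the fixed RTP header, and grouping SSRCs that share a CNAME into one synchronisation context. It is a minimal illustration under stated assumptions, not part of the draft. The names parse_rtp_header and group_ssrcs_by_cname are hypothetical, and the (ssrc, cname) pairs are assumed to have been extracted from RTCP SDES packets elsewhere. The header layout follows Section 5.1 of [RFC3550].

```python
import struct
from collections import defaultdict

# Fixed 12-byte RTP header (RFC 3550, Section 5.1), network byte order:
# V/P/X/CC byte, M/PT byte, sequence number, timestamp, SSRC.
RTP_FIXED_HEADER = struct.Struct("!BBHII")


def parse_rtp_header(packet: bytes) -> dict:
    """Return the fields of the fixed RTP header that matter for demultiplexing."""
    if len(packet) < RTP_FIXED_HEADER.size:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, timestamp, ssrc = RTP_FIXED_HEADER.unpack_from(packet)
    version = b0 >> 6
    if version != 2:
        raise ValueError(f"unsupported RTP version {version}")
    return {
        "payload_type": b1 & 0x7F,   # low 7 bits; the top bit is the marker
        "sequence_number": seq,      # scoped by the SSRC
        "timestamp": timestamp,      # scoped by the SSRC
        "ssrc": ssrc,                # 32-bit synchronisation source identifier
    }


def group_ssrcs_by_cname(sdes_bindings):
    """Group SSRCs by CNAME, i.e. by synchronisation context.

    sdes_bindings is a hypothetical iterable of (ssrc, cname) pairs, as
    they would be learned from RTCP SDES packets.
    """
    contexts = defaultdict(set)
    for ssrc, cname in sdes_bindings:
        contexts[cname].add(ssrc)
    return dict(contexts)


# Usage: build a header by hand (version 2, PT 96) and parse it back.
packet = RTP_FIXED_HEADER.pack(0x80, 96, 1000, 160000, 0x1234ABCD) + b"payload"
header = parse_rtp_header(packet)

# Two SSRCs from one endpoint share a CNAME; a third belongs to another.
contexts = group_ssrcs_by_cname([
    (0x1234ABCD, "alice@example.invalid"),
    (0x0BADCAFE, "alice@example.invalid"),
    (0x00C0FFEE, "bob@example.invalid"),
])
```

A receiver would typically key its per-stream state (jitter buffer, sequence number tracking) on header["ssrc"], and use the CNAME grouping only to decide which streams can be synchronised at playback.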

395    In some cases, the same SSRC Identifier value is used to relate
396    streams in two different RTP Sessions, such as in Multi-Session
397    Transmission of scalable video [RFC6190]. This is to be avoided
398    since there is no guarantee of uniqueness in SSRC values across
399    RTP sessions.

401    Note that RTP sequence number and RTP timestamp are scoped by the
402    synchronisation source. Each RTP source will have a different
403    synchronisation source, and the corresponding media stream will have
404    a separate RTP sequence number and timestamp space.

406    An SSRC identifier is used by different types of sources as well as
407    sinks:

409    Real Media Source: Connected to a "physical" media source,
410       for example a camera or microphone.

412    Processed Media Source: A source with some attributed property
413       generated by some network node, for example a filtering function
414       in an RTP mixer that provides the most active speaker based on
415       some criteria, or a mix representing a set of other sources.

417    RTP Sink: A source that does not generate any RTP media stream in
418       itself (e.g. an endpoint or middlebox only receiving in an RTP
419       session). It still needs a sender SSRC for use as source in RTCP
420       reports.

422    Note that an endpoint that generates more than one media type, e.g.
423    a conference participant sending both audio and video, need not (and
424    commonly does not) use the same SSRC value across RTP sessions. RTCP
425    Compound packets containing the CNAME SDES item are the designated
426    method to bind an SSRC to a CNAME, effectively cross-correlating
427    SSRCs within and between RTP Sessions as coming from the same
428    endpoint. The main property attributed to SSRCs associated with the
429    same CNAME is that they are from a particular synchronisation context
430    and can be synchronised at playback.

432    An RTP receiver receiving a previously unseen SSRC value will
433    interpret it as a new source.
       It might in fact be a previously
434    existing source that had to change SSRC number due to an SSRC
435    conflict. However, the originator of the previous SSRC ought to have
436    ended the conflicting source by sending an RTCP BYE for it prior to
437    starting to send with the new SSRC, so the new SSRC is
438    effectively a new source anyway.

440 3.2.3. Contributing Source (CSRC)

442    The Contributing Source (CSRC) is not a separate identifier. Rather,
443    a synchronisation source identifier is listed as a CSRC in the RTP
444    header of a packet generated by an RTP mixer if the corresponding
445    SSRC was in the header of one of the packets that contributed to the
446    mix.

448    It is not possible, in general, to extract media represented by an
449    individual CSRC since it is typically the result of a media mixing
450    (merge) operation by an RTP mixer on the individual media streams
451    corresponding to the CSRC identifiers. The exception is the case
452    when only a single CSRC is indicated, as this represents forwarding of
453    a media stream, possibly modified. The RTP header extension for
454    Mixer-to-Client Audio Level Indication [RFC6465] expands on the
455    receiver's information about a packet with a CSRC list. Due to these
456    restrictions, CSRC will not be considered a fully qualified
457    multiplexing point and will be disregarded in the rest of this
458    document.

460 3.2.4. RTP Payload Type

462    Each Media Stream utilises one or more RTP payload formats. An RTP
463    payload format describes how the output of a particular media codec
464    is framed and encoded into RTP packets. The payload format used is
465    identified by the payload type field in the RTP data packet header.
466    The combination therefore identifies a specific Media Stream encoding
467    format. The format definition can be taken from [RFC3551] for
468    statically allocated payload types, but ought to be explicitly
469    defined in signalling, such as SDP, both for static and dynamic
470    Payload Types.
       The term "format" here includes whatever can
471    be described by out-of-band signalling means. In SDP, the term
472    "format" includes media type, RTP timestamp sampling rate,
473    codec, codec configuration, payload format configurations, and
474    various robustness mechanisms such as redundant encodings [RFC2198].

476    The payload type is scoped by sending endpoint within an RTP Session.
477    All synchronisation sources sent from a single endpoint share the
478    same payload type definitions. The RTP Payload Type is designed
479    such that only a single Payload Type is valid at any time instant in
480    the RTP source's RTP timestamp time line, effectively time-
481    multiplexing different Payload Types if any change occurs. The
482    payload type used can change on a per-packet basis for an SSRC, for
483    example a speech codec making use of generic comfort noise [RFC3389].
484    If there is a true need to send multiple Payload Types for the same
485    SSRC that are valid for the same instant, then redundant encodings
486    [RFC2198] can be used. Several constraints beyond the ones
487    mentioned above need to be met to enable this use, one of which is
488    that the combined payload sizes of the different Payload Types ought
489    not exceed the transport MTU.

491    Other aspects of RTP payload format use are described in RTP Payload
492    HowTo [RFC8088].

494    The payload type is not a multiplexing point at the RTP layer (see
495    Appendix A for a detailed discussion of why using the payload type as
496    an RTP multiplexing point does not work). The RTP payload type is,
497    however, used to determine how to render a media stream, and so can
498    be viewed as selecting a rendering context. The rendering context
499    can be defined by the signalling, and the RTP payload type number is
500    sometimes used to associate an RTP media stream with the signalling.
501    This association is possible provided unique RTP payload type numbers
502    are used in each context.
       For example, an RTP media stream can be
503    associated with an SDP "m=" line by comparing the RTP payload
504    type numbers used by the media stream with payload types signalled in
505    the "a=rtpmap:" lines in the media sections of the SDP. If
506    RTP media streams are being associated with signalling contexts based
507    on the RTP payload type, then the assignment of RTP payload type
508    numbers needs to be unique across signalling contexts; if the same
509    RTP payload format configuration is used in multiple contexts, then a
510    different RTP payload type number has to be assigned in each context
511    to ensure uniqueness. If the RTP payload type number is not being
512    used to associate RTP media streams with a signalling context, then
513    the same RTP payload type number can be used to indicate the exact
514    same RTP payload format configuration in multiple contexts. In case
515    of bundled media, Section 10.2 of
516    [I-D.ietf-mmusic-sdp-bundle-negotiation] provides more information on
517    SDP signalling.

519 3.3. Issues Related to RTP Topologies

521    The impact of how RTP multiplexing is performed will in general vary
522    with how the RTP Session participants are interconnected, described
523    by RTP Topology [RFC7667].

525    Even the most basic use case, denoted Topo-Point-to-Point in
526    [RFC7667], raises a number of considerations that are discussed in
527    detail in the following sections. They range over such aspects as:

529    o Does my communication peer support RTP as defined with multiple
530      SSRCs?

532    o Do I need network differentiation in the form of QoS?

534    o Can the application more easily process and handle the media
535      streams if they are in different RTP sessions?

537    o Do I need to use additional media streams for RTP retransmission
538      or FEC?

540    o etc.

542    For some Point to Multi-point topologies (e.g. Topo-ASM and Topo-SSM
543    in [RFC7667]), multicast is used to interconnect the session
544    participants.
       Special considerations (documented in Section 4.2.3)
545    need to be made as multicast is a one to many distribution system.

547    Sometimes an RTP endpoint can end up in a situation where the
548    peer it is communicating with is not compatible with it,
549    for various reasons:

551    o No common media codec for a media type thus requiring transcoding

553    o Different support for multiple RTP sources and RTP sessions

555    o Usage of different media transport protocols, i.e., RTP or other.

557    o Usage of different transport protocols, e.g. UDP, DCCP, TCP

559    o Different security solutions, e.g. IPsec, TLS, DTLS, SRTP with
560      different keying mechanisms.

562    In many situations this is resolved by the inclusion of a translator
563    between the two peers, as described by Topo-PtP-Translator in
564    [RFC7667]. The translator's main purpose is to make each peer look to
565    the other peer like something it is compatible with. There can also
566    be other reasons than compatibility to insert a translator in the
567    form of a middlebox or gateway, for example a need to monitor the
568    media streams. If the stream transport characteristics are changed
569    by the translator, appropriate media handling can require thorough
570    understanding of the application logic, specifically any congestion
571    control or media adaptation.

573    The point to point topology can contain one to many RTP sessions with
574    one to many media sources per session, each having one or more RTP
575    sources per media source.

577 3.4. Issues Related to RTP and RTCP Protocol

579    Using multiple media streams is a well supported feature of RTP.
580    However, it can be unclear to implementers, and to people writing
581    RTP/RTCP applications or extensions that use multiple
582    streams, when it is most appropriate to add an additional SSRC in an
583    existing RTP session and when it is better to use multiple RTP
584    sessions. This section discusses the various considerations
585    needed.

587 3.4.1. The RTP Specification

589    RFC 3550 contains some recommendations and a bullet list with 5
590    arguments for different aspects of RTP multiplexing. Let's review
591    Section 5.2 of [RFC3550], reproduced below:

593    "For efficient protocol processing, the number of multiplexing
594    points should be minimised, as described in the integrated layer
595    processing design principle [ALF]. In RTP, multiplexing is provided
596    by the destination transport address (network address and port
597    number) which is different for each RTP session. For example, in a
598    teleconference composed of audio and video media encoded separately,
599    each medium SHOULD be carried in a separate RTP session with its own
600    destination transport address.

602    Separate audio and video streams SHOULD NOT be carried in a single
603    RTP session and demultiplexed based on the payload type or SSRC
604    fields. Interleaving packets with different RTP media types but
605    using the same SSRC would introduce several problems:

607    1. If, say, two audio streams shared the same RTP session and the
608       same SSRC value, and one were to change encodings and thus
609       acquire a different RTP payload type, there would be no general
610       way of identifying which stream had changed encodings.

612    2. An SSRC is defined to identify a single timing and sequence
613       number space. Interleaving multiple payload types would require
614       different timing spaces if the media clock rates differ and would
615       require different sequence number spaces to tell which payload
616       type suffered packet loss.

618    3. The RTCP sender and receiver reports (see Section 6.4) can only
619       describe one timing and sequence number space per SSRC and do not
620       carry a payload type field.

622    4. An RTP mixer would not be able to combine interleaved streams of
623       incompatible media into one stream.

625    5. Carrying multiple media in one RTP session precludes: the use of
626       different network paths or network resource allocations if
627       appropriate; reception of a subset of the media if desired, for
628       example just audio if video would exceed the available bandwidth;
629       and receiver implementations that use separate processes for the
630       different media, whereas using separate RTP sessions permits
631       either single- or multiple-process implementations.

633    Using a different SSRC for each medium but sending them in the same
634    RTP session would avoid the first three problems but not the last
635    two.

637    On the other hand, multiplexing multiple related sources of the same
638    medium in one RTP session using different SSRC values is the norm for
639    multicast sessions. The problems listed above don't apply: an RTP
640    mixer can combine multiple audio sources, for example, and the same
641    treatment is applicable for all of them. It might also be
642    appropriate to multiplex streams of the same medium using different
643    SSRC values in other scenarios where the last two problems do not
644    apply."

646    Let's consider one argument at a time. The first is an argument for
647    using a different SSRC for each individual media stream, which is very
648    applicable.

650    The second argument is advocating against using payload type
651    multiplexing, which still stands as can be seen by the extensive
652    list of issues found in Appendix A.

654    The third argument is yet another argument against payload type
655    multiplexing.

657    The fourth is an argument against multiplexing media streams that
658    require different handling into the same session. As we saw in the
659    discussion of RTP mixers, the RTP mixer has to embed application
660    logic in order to handle streams anyway; the separation of streams
661    according to stream type is just another piece of application logic,
662    which might or might not be appropriate for a particular application.
663 A type of application that can mix different media sources 664 "blindly" is the audio-only "telephone" bridge; most 665 other types of application need application-specific logic to perform 666 the mix correctly. 668 The fifth argument discusses network aspects that we will discuss 669 more below in Section 4.2. It also goes into aspects of 670 implementation, like decomposed endpoints where different processes 671 or inter-connected devices handle different aspects of the whole 672 multi-media session. 674 A summary of RFC 3550's view on multiplexing is to use unique SSRCs 675 for anything that is its own media/packet stream, and to use 676 different RTP sessions for media streams that don't share a media 677 type. This document supports the first point; it is very valid. The 678 latter is one thing which needs to be further discussed, as imposing a 679 single solution on all usages of RTP is inappropriate. The Multiple 680 Media Types in an RTP Session specification 681 [I-D.ietf-avtcore-multi-media-rtp-session] provides a detailed 682 analysis of the potential issues in having multiple media types in 683 the same RTP session. This document tries to provide a wider-scoped 684 consideration regarding the usage of RTP sessions and considers 685 multiple media types in one RTP session as a possible choice for the 686 RTP application designer. 688 3.4.2. Multiple SSRCs in a Session 690 Using multiple SSRCs in an RTP session at one endpoint requires 691 resolving some unclear aspects of the RTP specification. These could 692 potentially lead to some interoperability issues as well as some 693 potentially significant inefficiencies. These are further discussed in 694 "RTP Considerations for Endpoints Sending Multiple Media 695 Streams" [RFC8108]. An application designer needs to consider 696 these issues and the impact that the availability, or lack, of the optimization 697 in the endpoints has on their application.
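As an illustration of the SSRC-based separation discussed in Sections 3.4.1 and 3.4.2, the following is a minimal sketch, not a complete RTP stack. It parses only the 12-byte fixed RTP header and groups the packets of one RTP session by SSRC, so that a change of payload type within a stream does not create ambiguity. The function names and the make_rtp_packet() helper are hypothetical and exist only for this example; CSRC lists, header extensions, padding and RTCP are deliberately ignored.

```python
import struct
from collections import defaultdict


def make_rtp_packet(payload_type, seq, timestamp, ssrc, payload=b""):
    """Hypothetical helper: build a minimal RTP packet (V=2, no CSRC list)."""
    first_byte = 2 << 6  # version 2, no padding, no extension, CC=0
    return struct.pack("!BBHII", first_byte, payload_type & 0x7F,
                       seq, timestamp, ssrc) + payload


def parse_rtp_header(packet):
    """Parse the 12-byte fixed RTP header into a dictionary."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not an RTP version 2 packet")
    return {
        "payload_type": b1 & 0x7F,
        "marker": bool(b1 >> 7),
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


def demux_by_ssrc(packets):
    """Group the packets of one RTP session by SSRC, not by payload type."""
    streams = defaultdict(list)
    for packet in packets:
        header = parse_rtp_header(packet)
        streams[header["ssrc"]].append(header)
    return dict(streams)


if __name__ == "__main__":
    session = [
        make_rtp_packet(96, 1, 160, 0x1111),
        make_rtp_packet(97, 2, 320, 0x1111),  # same stream, new encoding
        make_rtp_packet(96, 1, 160, 0x2222),
    ]
    for ssrc, headers in demux_by_ssrc(session).items():
        print(hex(ssrc), [h["payload_type"] for h in headers])
```

Because each stream is keyed by its own SSRC, the payload type change in the first stream stays attributable to that stream, which is the property the first RFC 3550 argument asks for.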
699 If an application is affected by the issues described, using 700 Multiple RTP sessions can mitigate these issues. 702 3.4.3. Binding Related Sources 704 A common problem in a number of RTP extensions has been how 705 to bind related RTP sources and their media streams together. This 706 issue is common to both using additional SSRCs and Multiple RTP 707 sessions. 709 The solutions can be divided into several groups: RTP/RTCP based, 710 Signalling based (SDP), grouping related RTP sessions, and grouping 711 SSRCs within an RTP session. Most solutions are explicit, but some 712 implicit methods have also been applied to the problem. 714 The SDP-based signalling solutions are: 716 SDP Media Description Grouping: The SDP Grouping Framework [RFC5888] 717 uses various semantics to group any number of media descriptions. 719 These have previously been considered primarily as grouping RTP 720 sessions; [I-D.ietf-mmusic-sdp-bundle-negotiation] groups multiple 721 media descriptions as a single RTP session. 723 SDP SSRC grouping: Source-Specific Media Attributes in SDP [RFC5576] 724 includes a solution for grouping SSRCs the same way as the 725 Grouping framework groups Media Descriptions. 727 SDP MSID grouping: Media Stream Identifiers [I-D.ietf-mmusic-msid] 728 includes a solution for grouping SSRCs that is independent of 729 their allocation to RTP sessions. 731 This supports a lot of use cases. All these solutions have 732 shortcomings in cases where the session's dynamic properties are such 733 that it is difficult or resource consuming to keep the list of 734 related SSRCs up to date. 736 Within RTP/RTCP based solutions, when binding to an endpoint or 737 synchronization context (i.e. the CNAME) has not been sufficient, 738 one way to bind related streams in multiple RTP sessions has been to 739 use the same SSRC value across all the RTP sessions.
RTP 740 Retransmission [RFC4588] in multiple RTP session mode, Generic FEC 741 [RFC5109], as well as the RTP payload format for Scalable Video 742 Coding [RFC6190] in Multi Session Transmission (MST) mode use this 743 method. This method clearly works but might have some downsides in 744 RTP sessions with many participating SSRCs. The birthday paradox 745 implies that if you populate a single session with 9292 SSRCs at 746 random, the chances are approximately 1% that at least one collision 747 will occur. When a collision occurs, this forces one to change 748 SSRC in all RTP sessions, thus resynchronizing all of them instead 749 of only the single media stream having the collision. Therefore it 750 is not recommended to use such a method. In the terminology of [RFC7656], streams from 751 the same media source should use the same RTP session. 753 It can be noted that Section 8.3 of the RTP Specification [RFC3550] 754 recommends using a single SSRC space across all RTP sessions for 755 layered coding. 757 Another solution that has been applied to binding SSRCs has been an 758 implicit method used by RTP Retransmission [RFC4588] when doing 759 retransmissions in the same RTP session as the source RTP media 760 stream. The receiver issues an RTP retransmission request, and then awaits a 761 new SSRC carrying the RTP retransmission payload, where that SSRC 762 is from the same CNAME. This limits a requestor to having only one 763 outstanding request on any new source SSRCs per endpoint. 765 [I-D.ietf-mmusic-rid] provides an RTP/RTCP based mechanism capable of 766 supporting explicit association within an RTP session. 768 3.4.4. Forward Error Correction 770 There exist a number of Forward Error Correction (FEC) based schemes 771 for how to reduce the packet loss of the original streams. Most of 772 the FEC schemes will protect a single source flow.
The protection is 773 achieved by transmitting a certain amount of redundant information 774 that is encoded such that it can repair one or more packet losses 775 over the set of packets it protects. This sequence of redundant 776 information also needs to be transmitted as its own media stream, or 777 in some cases instead of the original media stream. Thus many of 778 these schemes create a need for binding related flows as discussed 779 above. Looking at the history of these schemes, there are schemes 780 using multiple SSRCs and schemes using multiple RTP sessions, and 781 some schemes that support both modes of operation. 783 Using multiple RTP sessions supports the case where some set of 784 receivers might not be able to utilise the FEC information. By 785 placing it in a separate RTP session, it can easily be ignored. 787 In usages involving multicast, having the FEC information on its own 788 multicast group allows for flexibility. This is especially useful 789 when receivers see very heterogeneous packet loss rates. Those 790 receivers that are not seeing packet loss don't need to join the 791 multicast group with the FEC data, and so avoid the overhead of 792 receiving unnecessary FEC packets, for example. 794 4. Particular Considerations for RTP Multiplexing 796 4.1. Interworking Considerations 798 There are several different kinds of interworking, and this section 799 discusses two related ones. The first is the interworking between different 800 applications and the implications of potentially different choices of 801 usage of RTP's multiplexing points. The second topic relates to what 802 limitations have to be considered when working with some legacy 803 applications. 805 4.1.1. Types of Interworking 807 It is not uncommon that applications or services of similar usage, 808 especially the ones intended for interactive communication, encounter 809 a situation where one wants to interconnect two or more of these 810 applications.
812 In these cases one ends up in a situation where one might use a 813 gateway to interconnect applications. This gateway then needs to 814 change the multiplexing structure or adhere to limitations in each 815 application. 817 There are two fundamental approaches to gatewaying: RTP Translator 818 interworking (RTP bridging), where the gateway acts as an RTP 819 Translator, and the two applications are members of the same RTP 820 session, and Gateway Interworking (with RTP termination), where there 821 are independent RTP sessions running from each interconnected 822 application to the gateway. 824 4.1.2. RTP Translator Interworking 826 From an RTP perspective the RTP Translator approach could work if all 827 the applications are using the same codecs with the same payload 828 types, have made the same multiplexing choices, have the same 829 capabilities in number of simultaneous media streams combined with 830 the same set of RTP/RTCP extensions being supported. Unfortunately 831 this might not always be true. 833 When one is gatewaying via an RTP Translator, a natural requirement 834 is that the two applications being interconnected need to use the 835 same approach to multiplexing. Furthermore, if one of the 836 applications is capable of working in several modes (such as being 837 able to use Additional SSRCs or Multiple RTP sessions at will), and 838 the other one is not, successful interconnection depends on locking 839 the more flexible application into the operating mode where 840 interconnection can be successful, even if no participants using the 841 less flexible application are present when the RTP sessions are being 842 created. 844 4.1.3. Gateway Interworking 846 When one terminates RTP sessions at the gateway, there are certain 847 tasks that the gateway has to carry out: 849 o Generating appropriate RTCP reports for all media streams 850 (possibly based on incoming RTCP reports), originating from SSRCs 851 controlled by the gateway. 
853 o Handling SSRC collision resolution in each application's RTP 854 sessions. 856 o Signalling, choosing and policing appropriate bit-rates for each 857 session. 859 For applications that use any security mechanism, e.g. in the form 860 of SRTP, the gateway needs to be able to decrypt incoming 861 packets and re-encrypt them in the other application's security 862 context. This is necessary even if all that's needed is a simple 863 remapping of SSRC numbers. If this is done, the gateway also needs 864 to be a member of the security contexts of both sides, of course. 866 Other tasks a gateway might need to apply include transcoding (for 867 incompatible codec types), rescaling (for incompatible video size 868 requirements), suppression of content that is known not to be handled 869 in the destination application, or the addition or removal of 870 redundancy coding or scalability layers to fit the needs of the 871 destination domain. 873 From the above, we can see that the gateway needs to have an intimate 874 knowledge of the application requirements; a gateway is by its nature 875 application specific, not a commodity product. 877 This fact reveals the potential for these gateways to block evolution 878 of the applications by blocking unknown RTP and RTCP extensions that 879 the regular application has been extended with. 881 If one uses security functions, like SRTP, they can as seen above 882 incur both additional risk due to the gateway needing to be in the 883 security association between the endpoints, unless the gateway is on 884 the transport level, and additional complexities in the form of the 885 decrypt-encrypt cycles needed for each forwarded packet. SRTP, due 886 to its keying structure, also requires that each RTP session use 887 different master keys, as use of the same key in two RTP sessions for 888 some ciphers can result in two-time pads that completely break the 889 confidentiality of the packets. 891 4.1.4.
Multiple SSRC Legacy Considerations 893 Historically, the most common RTP use cases have been point to point 894 Voice over IP (VoIP) or streaming applications, commonly with no more 895 than one media source per endpoint and media type (typically audio 896 and video). Even in conferencing applications, especially voice 897 only, the conference focus or bridge has provided a single stream 898 with a mix of the other participants to each participant. It is also 899 common to have individual RTP sessions between each endpoint and the 900 RTP mixer, meaning that the mixer functions as an RTP-terminating 901 gateway. 903 When establishing RTP sessions that can contain endpoints that aren't 904 updated to handle multiple streams following these recommendations, a 905 particular application can have issues with multiple SSRCs within a 906 single session. These issues include: 908 1. Need to handle more than one stream simultaneously rather than 909 replacing an already existing stream with a new one. 911 2. Be capable of decoding multiple streams simultaneously. 913 3. Be capable of rendering multiple streams simultaneously. 915 This indicates that gateways attempting to interconnect to this class 916 of devices have to make sure that only one media stream of each type 917 gets delivered to the endpoint if it's expecting only one, and that 918 the multiplexing format is what the device expects. It is highly 919 unlikely that RTP translator-based interworking can be made to 920 function successfully in such a context. 922 4.2. Network Considerations 924 The multiplexing choice has impact on network level mechanisms that 925 need to be considered by the implementer. 927 4.2.1. Quality of Service 929 When it comes to Quality of Service mechanisms, they are either flow 930 based or packet marking based. RSVP [RFC2205] is an example of a 931 flow based mechanism, while Diff-Serv [RFC2474] is an example of a 932 packet marking based one.
For a packet marking based scheme, the 933 method of multiplexing will not affect the possibility to use QoS. 935 However, for a flow based scheme there is a clear difference between 936 the methods. Additional SSRC will result in all media streams being 937 part of the same 5-tuple (protocol, source address, destination 938 address, source port, destination port) which is the most common 939 selector for flow based QoS. 941 It also needs to be noted that packet marking based QoS mechanisms 942 can have limitations. A general observation is that different DSCPs 943 can be assigned to different packets within a flow as well as within 944 an RTP Media Stream. However, care needs to be taken when 945 considering which forwarding behaviours are applied on the path due 946 to these DSCPs. In some cases the forwarding behaviour can result in 947 packet reordering. For more discussion of this see [RFC7657]. 949 More specific to the choice between using one or more RTP sessions can 950 be the method for assigning marking to packets. If this is done 951 using a network ingress function, it can have issues discriminating 952 the different RTP media streams. The network API on the endpoint 953 also needs to be capable of setting the marking on a per packet basis 954 to reach the full functionality. 956 4.2.2. NAT and Firewall Traversal 958 In today's networks there exist a large number of middleboxes. The 959 ones that normally have most impact on RTP are Network Address 960 Translators (NAT) and Firewalls (FW). 962 Below we analyse and comment on the impact of requiring more 963 underlying transport flows in the presence of NATs and Firewalls: 965 End-Point Port Consumption: A given IP address only has 65536 966 available local ports per transport protocol for all consumers of 967 ports that exist on the machine. This is normally never an issue 968 for an end-user machine. It can become an issue for servers that 969 handle a large number of simultaneous streams.
However, if the 970 application uses ICE to authenticate STUN requests, a server can 971 serve multiple endpoints from the same local port, and use the 972 whole 5-tuple (source and destination address, source and 973 destination port, protocol) as identifier of flows after having 974 securely bound them to the remote endpoint address using the STUN 975 request. In theory the minimum number of media server ports 976 needed is the maximum number of simultaneous RTP Sessions a 977 single endpoint can use. In practice, implementations will 978 probably benefit from using more server ports to simplify 979 implementation or avoid performance bottlenecks. 981 NAT State: If an endpoint sits behind a NAT, each flow it generates 982 to an external address will result in a state that has to be kept 983 in the NAT. That state is a limited resource. In home or Small 984 Office/Home Office (SOHO) NATs, memory or processing are usually 985 the most limited resources. For large scale NATs serving many 986 internal endpoints, available external ports are likely the scarce 987 resource. Port limitation is primarily a problem for larger 988 centralised NATs where endpoint independent mapping requires each 989 flow to use one port for the external IP address. This affects 990 the maximum number of internal users per external IP address. 991 However, it is worth pointing out that a real-time video 992 conference session with audio and video is likely using less than 993 10 UDP flows, compared to certain web applications that can use 994 100+ TCP flows to various servers from a single browser instance. 996 NAT Traversal Excess Time: Performing the NAT/FW traversal takes a 997 certain amount of time for each flow. This time is spent in a 998 fairly critical phase of communication, between accepting to communicate and the 999 media path being established.
The best 1000 case scenario for how much extra time it takes after finding the 1001 first valid candidate pair following the specified ICE procedures 1002 is: 1.5*RTT + Ta*(Additional_Flows-1), where Ta is the pacing 1003 timer, which ICE specifies to be no smaller than 20 ms. That 1004 assumes a message in one direction, and then an immediate 1005 triggered check back. The reason it isn't more is that ICE first 1006 finds one candidate pair that works prior to attempting to 1007 establish multiple flows. Thus, there is no extra time until one 1008 has found a working candidate pair. Based on that working pair 1009 the extra time needed is that for establishing, in parallel, the 1010 additional flows (in most cases 2-3). However, packet loss causes extra 1011 delays, at least 100 ms, which is the minimal retransmission timer 1012 for ICE. 1014 NAT Traversal Failure Rate: Due to the need to establish more than a 1015 single flow through the NAT, there is some risk that establishing 1016 the first flow succeeds but that one or more of the additional 1017 flows fail. The risk that this happens is hard to quantify, but 1018 ought to be fairly low as one flow from the same interfaces has 1019 just been successfully established. Thus only rare events such as 1020 NAT resource overload, or selecting particular port numbers that 1021 are filtered etc., ought to be reasons for failure. 1023 Deep Packet Inspection and Multiple Streams: Firewalls differ in how 1024 deeply they inspect packets. There exists some potential that 1025 deeply inspecting firewalls will have similar legacy issues with 1026 multiple SSRCs as some stack implementations. 1028 Additional SSRC keeps the additional media streams within one RTP 1029 Session and transport flow and does not introduce any additional NAT 1030 traversal complexities per media stream. This can be compared with 1031 normally one or two additional transport flows per RTP session when 1032 using multiple RTP sessions.
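To give a feel for the cost of those additional transport flows, the best-case estimate from the "NAT Traversal Excess Time" item above, 1.5*RTT + Ta*(Additional_Flows-1), can be evaluated with a small sketch. The function name and the choice to return zero when there are no additional flows are assumptions made for this example; the default of 20 ms for Ta is the lower bound quoted in the text, and packet loss (which adds at least 100 ms per retransmission) is not modelled.

```python
def ice_extra_setup_time_ms(rtt_ms, additional_flows, ta_ms=20.0):
    """Best-case extra ICE time, in ms, after the first valid candidate pair.

    Implements 1.5*RTT + Ta*(Additional_Flows-1) from the text above.
    Assumption of this sketch: with no additional flows there is no
    extra time, so 0.0 is returned instead of applying the formula.
    """
    if additional_flows < 1:
        return 0.0
    return 1.5 * rtt_ms + ta_ms * (additional_flows - 1)


if __name__ == "__main__":
    # Example: 100 ms round-trip time, one to three additional flows.
    for flows in (1, 2, 3):
        print(flows, ice_extra_setup_time_ms(100.0, flows))
```

With a 100 ms RTT, three additional flows give 1.5*100 + 20*2 = 190 ms in the best case, so the RTT term dominates and each further flow adds only one pacing interval.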
Additional lower layer transport flows 1033 will be needed, unless an explicit de-multiplexing layer is added 1034 between RTP and the transport protocol. At time of writing no such 1035 mechanism was defined. 1037 4.2.3. Multicast 1039 Multicast groups provide powerful semantics for a number of real- 1040 time applications, especially the ones that desire broadcast-like 1041 behaviours with one endpoint transmitting to a large number of 1042 receivers, like in IPTV. But those same semantics do result in a 1043 certain number of limitations. 1045 One limitation is that for any group, sender side adaptation to the 1046 actual receiver properties causes degradation for all participants to 1047 what is supported by the receiver with the worst conditions among the 1048 group participants. In most cases this is not acceptable. Instead 1049 various receiver based solutions are employed to ensure that the 1050 receivers achieve best possible performance. By using scalable 1051 encoding and placing each scalability layer in a different multicast 1052 group, the receiver can control the amount of traffic it receives. 1053 To have each scalability layer on a different multicast group, one 1054 RTP session per multicast group is used. 1056 In addition, the transport flow considerations in multicast are a bit 1057 different from unicast; NATs with port translation are not useful in 1058 the multicast environment, meaning that the entire port range of each 1059 multicast address is available for distinguishing between RTP 1060 sessions. 1062 Thus it appears easiest and most straightforward to use multiple RTP 1063 sessions for sending different media flows used for adapting to 1064 network conditions. It is also common that streams that improve 1065 transport robustness are sent in their own multicast group to allow 1066 for interworking with legacy or to support different levels of 1067 protection. 1069 Here are some common behaviours for RTP multicast: 1071 1.
Multicast applications use a group of RTP sessions, not one. 1072 Each endpoint will need to be a member of a number of RTP 1073 sessions in order to perform well. 1075 2. Within each RTP session, the number of RTP Sinks is likely to be 1076 much larger than the number of RTP sources. 1078 3. Multicast applications need signalling functions to identify the 1079 relationships between RTP sessions. 1081 4. Multicast applications need signalling functions to identify the 1082 relationships between SSRCs in different RTP sessions. 1084 All multicast configurations share a signalling requirement; all of 1085 the participants will need to have the same RTP and payload type 1086 configuration. Otherwise, A could for example be using payload type 1087 97 as the video codec H.264 while B thinks it is MPEG-2. It is to be 1088 noted that SDP offer/answer [RFC3264] is not appropriate for ensuring 1089 this property. The signalling aspects of multicast are not explored 1090 further in this memo. 1092 Security solutions for this type of group communications are also 1093 challenging. First of all the key-management and the security 1094 protocol need to support group communication. Source authentication 1095 requires special solutions. For more discussion on this please 1096 review Options for Securing RTP Sessions [RFC7201]. 1098 4.3. Security and Key Management Considerations 1100 When dealing with point-to-point, 2-member RTP sessions only, there 1101 are few security issues that are relevant to the choice of having one 1102 RTP session or multiple RTP sessions. However, there are a few 1103 aspects of multiparty sessions that might warrant consideration. For 1104 general information on possible methods of securing RTP, please 1105 review RTP Security Options [RFC7201]. 1107 4.3.1. Security Context Scope 1109 When using SRTP [RFC3711] the security context scope is important and 1110 can be a necessary differentiation in some applications.
As SRTP's 1111 crypto suites (so far) are built around symmetric keys, the receiver 1112 will need to have the same key as the sender. As a result, 1113 no one in a multi-party session can be certain whether a received packet 1114 really was sent by the claimed sender or by another party having 1115 access to the key. In most cases this is a sufficient security 1116 property, but there are a few cases where this does create issues. 1118 The first case is when someone leaves a multi-party session and one 1119 wants to ensure that the party that left can no longer access the 1120 media streams. This requires that everyone re-keys without 1121 disclosing the keys to the excluded party. 1123 A second case is when using security as an enforcing mechanism for 1124 differentiation. Take for example a scalable layer or a high quality 1125 simulcast version which only premium users are allowed to access. 1126 The mechanism preventing a receiver from getting the high quality 1127 stream can be based on the stream being encrypted with a key that 1128 the user can't access without paying a premium, with the key-management 1129 limiting access to the key. 1131 SRTP [RFC3711] has no special functions for dealing with different 1132 sets of master keys for different SSRCs. The key-management 1133 functions have different capabilities to establish different sets of 1134 keys, normally on a per endpoint basis. For example, DTLS-SRTP 1135 [RFC5764] and Security Descriptions [RFC4568] establish different 1136 keys for outgoing and incoming traffic from an endpoint. This key 1137 usage has to be written into the cryptographic context, possibly 1138 associated with different SSRCs. 1140 4.3.2. Key Management for Multi-party session 1142 Performing key-management for multi-party sessions can be a challenge. 1143 This section considers some of the issues.
1145 Multi-party sessions, such as transport translator based sessions and 1146 multicast sessions, can use neither Security Descriptions [RFC4568] nor 1147 DTLS-SRTP [RFC5764] without an extension, as each endpoint provides 1148 its own set of keys. In centralised conferences, the signalling 1149 counterpart is a conference server and the media plane unicast 1150 counterpart (to which DTLS messages would be sent) is the transport 1151 translator. Thus an extension like Encrypted Key Transport 1152 [I-D.ietf-avt-srtp-ekt] is needed or a MIKEY [RFC3830] based solution 1153 that allows for keying all session participants with the same master 1154 key. 1156 4.3.3. Complexity Implications 1158 The usage of security functions can surface complexity implications 1159 of the choice of multiplexing and topology. This becomes especially 1160 evident in RTP topologies having any type of middlebox that processes 1161 or modifies RTP/RTCP packets. Whereas there is very little overhead for 1162 an RTP translator or mixer to rewrite an SSRC value in the RTP packet 1163 of an unencrypted session, the cost of doing it when using 1164 cryptographic security functions is higher. For example if using 1165 SRTP [RFC3711], the actual security context and exact crypto key are 1166 determined by the SSRC field value. If one changes it, the 1167 encryption and the authentication tag computation need to be performed using another 1168 key. Thus changing the SSRC value implies a decryption using the old 1169 SSRC and its security context followed by an encryption using the new 1170 one. 1172 5. Archetypes 1174 This section discusses some archetypes of how RTP multiplexing can be 1175 used in applications to achieve certain goals and a summary of their 1176 implications. For each archetype there is discussion of benefits and 1177 downsides. 1179 5.1.
Single SSRC per Session 1181 In this archetype each endpoint in a point-to-point session has only 1182 a single SSRC, thus the RTP session contains only two SSRCs, one 1183 local and one remote. This session can be used either unidirectionally, 1184 i.e. with only a single media stream, or bi-directionally, i.e. with both 1185 endpoints having one media stream each. If the application needs 1186 additional media flows between the endpoints, they will have to 1187 establish additional RTP sessions. 1189 The Pros: 1191 1. This archetype has great legacy interoperability potential as it 1192 will not tax any RTP stack implementations. 1194 2. The signalling has good possibilities to negotiate and describe 1195 the exact formats and bit-rates for each media stream, especially 1196 using today's tools in SDP. 1198 3. It does not matter if usage or purpose of the media stream is 1199 signalled on media stream level or session level as there is no 1200 difference. 1202 4. It is possible to control security association per RTP media 1203 stream with current key-management, since each media stream is 1204 directly related to an RTP session, and the keying operates on a 1205 per-session basis. 1207 The Cons: 1209 a. The number of RTP sessions grows directly in proportion with the 1210 number of media streams, which has the implications: 1212 * Linear growth of the amount of NAT/FW state with number of 1213 media streams. 1215 * Increased delay and resource consumption from NAT/FW 1216 traversal. 1218 * Likely larger signalling message and signalling processing 1219 requirement due to the amount of session related information. 1221 * Higher potential for a single media stream to fail during 1222 transport between the endpoints. 1224 b. When the number of RTP sessions grows, the amount of explicit 1225 state for relating media streams also grows, linearly or possibly 1226 exponentially, depending on how the application needs to relate 1227 media streams. 1229 c.
The port consumption might become a problem for centralised 1230 services, where the central node's port consumption grows rapidly 1231 with the number of sessions. 1233 d. For applications where the media streams are highly dynamic in 1234 their usage, i.e. entering and leaving, the amount of signalling 1235 can grow large. Issues with the timely establishment of 1236 additional RTP sessions can also arise. 1238 e. Cross session RTCP requests might be needed, and the fact that 1239 they're impossible can cause issues. 1241 f. If the same SSRC value is reused in multiple RTP sessions rather 1242 than being randomly chosen, interworking with applications that 1243 use another multiplexing structure than this application will 1244 require SSRC translation. 1246 g. Cannot be used with Any Source Multicast (ASM) as one cannot 1247 guarantee that only two endpoints participate as packet senders. 1248 Using SSM, it is possible to restrict to these requirements if no 1249 RTCP feedback is injected back into the SSM group. 1251 h. For most security mechanisms, each RTP session or transport flow 1252 requires individual key-management and security association 1253 establishment thus increasing the overhead. 1255 RTP applications that need to inter-work with legacy RTP 1256 applications, like most deployed VoIP and video conferencing 1257 solutions, can potentially benefit from this structure. However, a 1258 large number of media descriptions in SDP can also run into issues 1259 with existing implementations. For any application needing a larger 1260 number of media flows, the overhead can become very significant. 1261 This structure is also not suitable for multi-party sessions, as any 1262 given media stream from each participant, although having the same usage 1263 in the application, needs its own RTP session.
In addition, the 1264 dynamic behaviour that can arise in multi-party applications can tax 1265 the signalling system and make timely media establishment more 1266 difficult. 1268 5.2. Multiple SSRCs of the Same Media Type 1270 In this archetype, each RTP session serves only a single media type. 1271 The RTP session can contain multiple media streams, either from a 1272 single endpoint or from multiple endpoints. This commonly creates a 1273 low number of RTP sessions, typically only one for audio and one for 1274 video, with a corresponding need for two listening ports when using 1275 RTP/RTCP multiplexing. 1277 The Pros: 1279 1. Low number of RTP sessions needed compared to single SSRC case. 1280 This implies: 1282 * Reduced NAT/FW state 1284 * Lower NAT/FW Traversal Cost in both processing and delay. 1286 2. Allows for early de-multiplexing in the processing chain in RTP 1287 applications where all media streams of the same type have the 1288 same usage in the application. 1290 3. Works well with media type de-composite endpoints. 1292 4. Enables Flow-based QoS with different prioritisation between 1293 media types. 1295 5. For applications with dynamic usage of media streams, i.e. they 1296 come and go frequently, having much of the state associated with 1297 the RTP session rather than an individual SSRC can avoid the need 1298 for in-session signalling of meta-information about each SSRC. 1300 6. Low overhead for security association establishment. 1302 The Cons: 1304 a. May have some need for cross session RTCP requests for things 1305 that affect both media types in an asynchronous way. 1307 b. Some potential for concern with legacy implementations that do 1308 not support the RTP specification fully when it comes to handling 1309 multiple SSRCs per endpoint. 1311 c.
It is not possible to control security associations for sets of 1312 media streams within the same media type with today's key- 1313 management mechanisms, unless these are split into different RTP 1314 sessions. 1316 For RTP applications where all media streams of the same media type 1317 share the same usage, this structure provides efficiency gains in the amount 1318 of network state used and provides more fate sharing with other media 1319 flows of the same type. At the same time, it still maintains 1320 almost all functionality when it comes to negotiation in the 1321 signalling of the properties for the individual media type, and it also 1322 enables flow based QoS prioritisation between media types. It 1323 handles multi-party sessions well, independently of multicast or 1324 centralised transport distribution, as additional sources can 1325 dynamically enter and leave the session. 1327 5.3. Multiple Sessions for one Media type 1329 In this archetype one goes one step further than in the above 1330 (Section 5.2) by using multiple RTP sessions also for a single media 1331 type, but still not as far as having a single SSRC per RTP session. 1332 The main reason for going in this direction is that the RTP 1333 application needs separation of the media streams due to their usage. 1334 Some typical reasons for going to this archetype are scalability over 1335 multicast, simulcast, need for extended QoS prioritisation of media 1336 streams due to their usage in the application, or the need for fine- 1337 grained signalling using today's tools. 1339 The Pros: 1341 1. More suitable for Multicast usage where receivers can 1342 individually select which RTP sessions they want to participate 1343 in, assuming each RTP session has its own multicast group. 1345 2. Indication of the application's usage of the media stream, where 1346 multiple different usages exist. 1348 3.
Less need for SSRC specific explicit signalling for each media 1349 stream and thus reduced need for explicit and timely signalling. 1351 4. Enables detailed QoS prioritisation for flow based mechanisms. 1353 5. Works well with de-composite endpoints. 1355 6. Handles dynamic usage of media streams well. 1357 7. For transport translator based multi-party sessions, this 1358 structure allows for improved control of which type of media 1359 streams an endpoint receives. 1361 8. The scope for who is included in a security association can be 1362 structured around the different RTP sessions, thus enabling such 1363 functionality with existing key-management. 1365 The Cons: 1367 a. Increases the number of RTP sessions compared to Multiple SSRCs 1368 of the Same Media Type. 1370 b. Increased amount of session configuration state. 1372 c. May need synchronised cross-session RTCP requests and require 1373 some consideration due to this. 1375 d. For media streams that are part of scalability, simulcast or 1376 transport robustness, it will be necessary to bind sources, and the 1377 binding mechanism needs to support multiple RTP sessions. 1379 e. Some potential for concern with legacy implementations that do 1380 not support the RTP specification fully when it comes to handling 1381 multiple SSRCs per endpoint. 1383 f. Higher overhead for security association establishment. 1385 g. If the applications need finer control than on media type level 1386 over which session participants are included in different 1387 sets of security associations, most of today's key-management 1388 will have difficulties establishing such a session. 1390 For more complex RTP applications that have several different usages 1391 for media streams of the same media type and / or use scalability or 1392 simulcast, this solution can enable those functions at the cost of 1393 increased overhead associated with the additional sessions.
This 1394 type of structure is suitable for more advanced applications as well 1395 as multicast based applications requiring differentiation to 1396 different participants. 1398 5.4. Multiple Media Types in one Session 1400 This archetype is to use a single RTP session for multiple different 1401 media types, like audio and video, and possibly also transport 1402 robustness mechanisms like FEC or Retransmission. Each media stream 1403 will use its own SSRC, and a given SSRC value from a particular 1404 endpoint will never be used for more than a single media type. 1406 The Pros: 1408 1. Single RTP session which implies: 1410 * Minimal NAT/FW state. 1412 * Minimal NAT/FW Traversal Cost. 1414 * Fate-sharing for all media flows. 1416 2. Enables separation of the different media types based on the 1417 payload types so media type specific endpoint or central 1418 processing can still be supported despite single session. 1420 3. Can handle dynamic allocations of media streams well on an RTP 1421 level. Depends on the application's needs for explicit 1422 indication of the stream usage and how timely that can be 1423 signalled. 1425 4. Minimal overhead for security association establishment. 1427 The Cons: 1429 a. Less suitable for interworking with other applications that use 1430 individual RTP sessions per media type or multiple sessions for a 1431 single media type, due to the need for SSRC translation. 1433 b. Negotiation of bandwidth for the different media types is 1434 currently not possible in SDP. This requires SDP extensions to 1435 enable payload or source specific bandwidth. Likely to be a 1436 problem due to media type asymmetry in needed bandwidth. 1438 c. Not suitable for de-composite endpoints. 1440 d. Flow based QoS cannot provide separate treatment to some media 1441 streams compared to others in the single RTP session. 1443 e.
If there is significant asymmetry between the media streams' RTCP 1444 reporting needs, there are some challenges in configuration and 1445 usage to avoid wasting RTCP reporting on the media stream that 1446 does not need that frequent reporting. 1448 f. Not suitable for applications where some receivers like to 1449 receive only a subset of the media streams, especially if 1450 multicast or transport translator is being used. 1452 g. Additional concern with legacy implementations that do not 1453 support the RTP specification fully when it comes to handling 1454 multiple SSRCs per endpoint, as multiple simultaneous media 1455 types also need to be handled. 1457 h. If the applications need finer control over which session 1458 participants are included in different sets of security 1459 associations, most key-management will have difficulties 1460 establishing such a session. 1462 5.5. Summary 1464 There are some clear relations between these archetypes. Both the 1465 "single SSRC per RTP session" and the "multiple media 1466 types in one session" are cases which require full explicit 1467 signalling of the media stream relations. However, they operate on 1468 two different levels where the first primarily enables session level 1469 binding, and the second needs to do it all on SSRC level. From 1470 another perspective, the two solutions are the two extreme points 1471 when it comes to number of RTP sessions needed. 1473 The two other archetypes "Multiple SSRCs of the Same Media 1474 Type" and "Multiple Sessions for one Media Type" are 1475 examples of two other cases that first of all allow for some 1476 implicit mapping of the role or usage of the media streams based on 1477 which RTP session they appear in. They thus potentially allow for 1478 less signalling and in particular reduced need for real-time 1479 signalling in dynamic sessions.
They also represent points in 1480 between the first two when it comes to amount of RTP sessions 1481 established, i.e. representing an attempt to reduce the amount of 1482 sessions as much as possible without compromising the functionality 1483 the session provides both on network level and on signalling level. 1485 6. Summary considerations and guidelines 1486 6.1. Guidelines 1488 This section contains a number of recommendations for implementers or 1489 specification writers when it comes to handling multi-stream. 1491 Do not Require the same SSRC across Sessions: As discussed in 1492 Section 3.4.3 there exist drawbacks in using the same SSRC in 1493 multiple RTP sessions as a mechanism to bind related media streams 1494 together. It is instead suggested that a mechanism to explicitly 1495 signal the relation is used, either in RTP/RTCP or in the used 1496 signalling mechanism that establishes the RTP session(s). 1498 Use additional SSRCs for additional Media Sources: In the cases 1499 where an RTP endpoint needs to transmit additional media streams 1500 of the same media type in the application, with the same 1501 processing requirements at the network and RTP layers, it is 1502 suggested to send them as additional SSRCs in the same RTP 1503 session. For example a telepresence room where there are three 1504 cameras, and each camera captures 2 persons sitting at the table, 1505 sending each camera as its own SSRC within a single RTP session is 1506 suggested. 1508 Use additional RTP sessions for streams with different requirements: 1510 When media streams have different processing requirements from the 1511 network or the RTP layer at the endpoints, it is suggested that 1512 the different types of streams are put in different RTP sessions. 1513 This includes the case where different participants want different 1514 subsets of the set of RTP streams. 
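The two preceding guidelines can be illustrated with a minimal sketch. It assumes that each media source is described by a media type and a tag for its processing requirements; the names MediaSource, plan_sessions and the tag values are hypothetical and are not defined by RTP or by this memo. Sources that share both properties are placed in one RTP session as additional SSRCs, and any difference in either property yields a separate RTP session.

```python
import secrets
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaSource:
    name: str          # hypothetical label, e.g. "camera-1"
    media_type: str    # "audio", "video", ...
    requirements: str  # hypothetical tag for network/RTP-layer handling


def plan_sessions(sources):
    """Group sources into RTP sessions and give each source its own SSRC.

    One session is created per (media type, requirements) pair. Within a
    session every source gets a random 32-bit SSRC that is unique in that
    session. SSRC values are not coordinated across sessions.
    """
    grouped = defaultdict(list)
    for src in sources:
        grouped[(src.media_type, src.requirements)].append(src)

    sessions = {}
    for key, members in grouped.items():
        used = set()
        assignment = {}
        for src in members:
            ssrc = secrets.randbits(32)
            while ssrc in used:  # re-draw on the (unlikely) collision
                ssrc = secrets.randbits(32)
            used.add(ssrc)
            assignment[src.name] = ssrc
        sessions[key] = assignment
    return sessions


# Telepresence room from the guideline: three cameras with the same
# requirements share one video session. The "slides" source is an assumed
# extra stream with different requirements, so it gets its own session.
telepresence = [
    MediaSource("camera-1", "video", "main"),
    MediaSource("camera-2", "video", "main"),
    MediaSource("camera-3", "video", "main"),
    MediaSource("room-mic", "audio", "main"),
    MediaSource("slides", "video", "presentation"),
]
plan = plan_sessions(telepresence)
```

In this sketch the three cameras end up as three SSRCs in a single video session, while the audio source and the differently handled video source each map to a session of their own. How such sessions are then grouped and signalled is outside the scope of the sketch.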
 1516 When using multiple RTP Sessions use grouping: When using Multiple 1517 RTP session solutions, it is suggested to explicitly group the 1518 involved RTP sessions when needed using the signalling mechanism, 1519 for example The Session Description Protocol (SDP) Grouping 1520 Framework [RFC5888], using some appropriate grouping semantics. 1522 RTP/RTCP Extensions May Support Additional SSRCs as well as Multiple 1523 RTP sessions: 1524 When defining an RTP or RTCP extension, the creator needs to 1525 consider if this extension is applicable to usage with additional 1526 SSRCs and Multiple RTP sessions. Any extension intended to be 1527 generic is suggested to support both. Applications that are not 1528 as generally applicable will have to consider if interoperability 1529 is better served by defining a single solution or providing both 1530 options. 1532 Transport Support Extensions: When defining new RTP/RTCP extensions 1533 intended for transport support, like the retransmission or FEC 1534 mechanisms, they are expected to include support for both 1535 additional SSRCs and multiple RTP sessions so that application 1536 developers can choose freely from the set of mechanisms without 1537 concerning themselves with which of the multiplexing choices a 1538 particular solution supports. 1540 7. Open Issues 1542 There are currently some issues that need to be resolved before this 1543 document is ready to be published: 1545 1. Use of RFC 2119 language in the section on SSRC (3.2.2) 1547 2. Better align source and sink terminology with Taxonomy 1548 (Section 3.2.2) 1550 3. Section on Binding Related Sources (Section 3.4.3) needs more 1551 text on usage of the RID and other SDES based mechanisms created. 1553 4. Does the MSID text need to be updated and clarified based on the 1554 evolution of MSID since the previous version? (Section 3.4.3) 1556 5. Section 4.1.2 (RTP Translator Interworking) needs to be updated.
 1557 It is not obvious that it is a natural requirement that the same 1558 multiplexing is used. This needs better discussion. 1560 6. Reference to Ta for ICE being 20 ms will need to be updated due to 1561 ICE update. 1563 7. In Section 4.3.2 (Key Management for Multi-party session) the 1564 reference to EKT needs to be updated; the question is whether draft-ietf- 1565 perc-ekt-diet is appropriate here. 1567 8. Can we find a more appropriate term than archetypes? 1569 9. 1571 8. IANA Considerations 1573 This document makes no request of IANA. 1575 Note to RFC Editor: this section can be removed on publication as an 1576 RFC. 1578 9. Security Considerations 1580 There is discussion of the security implications of choosing SSRC vs. 1581 multiple RTP sessions in Section 4.3. 1583 10. References 1585 10.1. Normative References 1587 [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. 1588 Jacobson, "RTP: A Transport Protocol for Real-Time 1589 Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550, 1590 July 2003, . 1592 [RFC7656] Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and 1593 B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms 1594 for Real-Time Transport Protocol (RTP) Sources", RFC 7656, 1595 DOI 10.17487/RFC7656, November 2015, 1596 . 1598 10.2. Informative References 1600 [ALF] Clark, D. and D. Tennenhouse, "Architectural 1601 Considerations for a New Generation of Protocols", SIGCOMM 1602 Symposium on Communications Architectures and 1603 Protocols (Philadelphia, Pennsylvania), pp. 200--208, IEEE 1604 Computer Communications Review, Vol. 20(4), September 1605 1990. 1607 [I-D.ietf-avt-srtp-ekt] 1608 Wing, D., McGrew, D., and K. Fischer, "Encrypted Key 1609 Transport for Secure RTP", draft-ietf-avt-srtp-ekt-03 1610 (work in progress), October 2011. 1612 [I-D.ietf-avtcore-multi-media-rtp-session] 1613 Westerlund, M., Perkins, C., and J.
Lennox, "Sending 1614 Multiple Types of Media in a Single RTP Session", draft- 1615 ietf-avtcore-multi-media-rtp-session-13 (work in 1616 progress), December 2015. 1618 [I-D.ietf-mmusic-msid] 1619 Alvestrand, H., "WebRTC MediaStream Identification in the 1620 Session Description Protocol", draft-ietf-mmusic-msid-16 1621 (work in progress), February 2017. 1623 [I-D.ietf-mmusic-rid] 1624 Thatcher, P., Zanaty, M., Nandakumar, S., Burman, B., 1625 Roach, A., and B. Campen, "RTP Payload Format 1626 Restrictions", draft-ietf-mmusic-rid-11 (work in 1627 progress), July 2017. 1629 [I-D.ietf-mmusic-sdp-bundle-negotiation] 1630 Holmberg, C., Alvestrand, H., and C. Jennings, 1631 "Negotiating Media Multiplexing Using the Session 1632 Description Protocol (SDP)", draft-ietf-mmusic-sdp-bundle- 1633 negotiation-39 (work in progress), August 2017. 1635 [I-D.lennox-mmusic-sdp-source-selection] 1636 Lennox, J. and H. Schulzrinne, "Mechanisms for Media 1637 Source Selection in the Session Description Protocol 1638 (SDP)", draft-lennox-mmusic-sdp-source-selection-05 (work 1639 in progress), October 2012. 1641 [RFC2198] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., 1642 Handley, M., Bolot, J., Vega-Garcia, A., and S. Fosse- 1643 Parisis, "RTP Payload for Redundant Audio Data", RFC 2198, 1644 DOI 10.17487/RFC2198, September 1997, 1645 . 1647 [RFC2205] Braden, R., Ed., Zhang, L., Berson, S., Herzog, S., and S. 1648 Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1 1649 Functional Specification", RFC 2205, DOI 10.17487/RFC2205, 1650 September 1997, . 1652 [RFC2474] Nichols, K., Blake, S., Baker, F., and D. Black, 1653 "Definition of the Differentiated Services Field (DS 1654 Field) in the IPv4 and IPv6 Headers", RFC 2474, 1655 DOI 10.17487/RFC2474, December 1998, 1656 . 1658 [RFC2974] Handley, M., Perkins, C., and E. Whelan, "Session 1659 Announcement Protocol", RFC 2974, DOI 10.17487/RFC2974, 1660 October 2000, . 
1662 [RFC3261] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, 1663 A., Peterson, J., Sparks, R., Handley, M., and E. 1664 Schooler, "SIP: Session Initiation Protocol", RFC 3261, 1665 DOI 10.17487/RFC3261, June 2002, 1666 . 1668 [RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model 1669 with Session Description Protocol (SDP)", RFC 3264, 1670 DOI 10.17487/RFC3264, June 2002, 1671 . 1673 [RFC3389] Zopf, R., "Real-time Transport Protocol (RTP) Payload for 1674 Comfort Noise (CN)", RFC 3389, DOI 10.17487/RFC3389, 1675 September 2002, . 1677 [RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio and 1678 Video Conferences with Minimal Control", STD 65, RFC 3551, 1679 DOI 10.17487/RFC3551, July 2003, 1680 . 1682 [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. 1683 Norrman, "The Secure Real-time Transport Protocol (SRTP)", 1684 RFC 3711, DOI 10.17487/RFC3711, March 2004, 1685 . 1687 [RFC3830] Arkko, J., Carrara, E., Lindholm, F., Naslund, M., and K. 1688 Norrman, "MIKEY: Multimedia Internet KEYing", RFC 3830, 1689 DOI 10.17487/RFC3830, August 2004, 1690 . 1692 [RFC4103] Hellstrom, G. and P. Jones, "RTP Payload for Text 1693 Conversation", RFC 4103, DOI 10.17487/RFC4103, June 2005, 1694 . 1696 [RFC4566] Handley, M., Jacobson, V., and C. Perkins, "SDP: Session 1697 Description Protocol", RFC 4566, DOI 10.17487/RFC4566, 1698 July 2006, . 1700 [RFC4568] Andreasen, F., Baugher, M., and D. Wing, "Session 1701 Description Protocol (SDP) Security Descriptions for Media 1702 Streams", RFC 4568, DOI 10.17487/RFC4568, July 2006, 1703 . 1705 [RFC4588] Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R. 1706 Hakenberg, "RTP Retransmission Payload Format", RFC 4588, 1707 DOI 10.17487/RFC4588, July 2006, 1708 . 1710 [RFC5104] Wenger, S., Chandra, U., Westerlund, M., and B. Burman, 1711 "Codec Control Messages in the RTP Audio-Visual Profile 1712 with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104, 1713 February 2008, . 
1715 [RFC5109] Li, A., Ed., "RTP Payload Format for Generic Forward Error 1716 Correction", RFC 5109, DOI 10.17487/RFC5109, December 1717 2007, . 1719 [RFC5576] Lennox, J., Ott, J., and T. Schierl, "Source-Specific 1720 Media Attributes in the Session Description Protocol 1721 (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009, 1722 . 1724 [RFC5761] Perkins, C. and M. Westerlund, "Multiplexing RTP Data and 1725 Control Packets on a Single Port", RFC 5761, 1726 DOI 10.17487/RFC5761, April 2010, 1727 . 1729 [RFC5764] McGrew, D. and E. Rescorla, "Datagram Transport Layer 1730 Security (DTLS) Extension to Establish Keys for the Secure 1731 Real-time Transport Protocol (SRTP)", RFC 5764, 1732 DOI 10.17487/RFC5764, May 2010, 1733 . 1735 [RFC5888] Camarillo, G. and H. Schulzrinne, "The Session Description 1736 Protocol (SDP) Grouping Framework", RFC 5888, 1737 DOI 10.17487/RFC5888, June 2010, 1738 . 1740 [RFC6190] Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis, 1741 "RTP Payload Format for Scalable Video Coding", RFC 6190, 1742 DOI 10.17487/RFC6190, May 2011, 1743 . 1745 [RFC6465] Ivov, E., Ed., Marocco, E., Ed., and J. Lennox, "A Real- 1746 time Transport Protocol (RTP) Header Extension for Mixer- 1747 to-Client Audio Level Indication", RFC 6465, 1748 DOI 10.17487/RFC6465, December 2011, 1749 . 1751 [RFC7201] Westerlund, M. and C. Perkins, "Options for Securing RTP 1752 Sessions", RFC 7201, DOI 10.17487/RFC7201, April 2014, 1753 . 1755 [RFC7657] Black, D., Ed. and P. Jones, "Differentiated Services 1756 (Diffserv) and Real-Time Communication", RFC 7657, 1757 DOI 10.17487/RFC7657, November 2015, 1758 . 1760 [RFC7667] Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, 1761 DOI 10.17487/RFC7667, November 2015, 1762 . 1764 [RFC7826] Schulzrinne, H., Rao, A., Lanphier, R., Westerlund, M., 1765 and M. Stiemerling, Ed., "Real-Time Streaming Protocol 1766 Version 2.0", RFC 7826, DOI 10.17487/RFC7826, December 1767 2016, . 
 1769 [RFC8088] Westerlund, M., "How to Write an RTP Payload Format", 1770 RFC 8088, DOI 10.17487/RFC8088, May 2017, 1771 . 1773 [RFC8108] Lennox, J., Westerlund, M., Wu, Q., and C. Perkins, 1774 "Sending Multiple RTP Streams in a Single RTP Session", 1775 RFC 8108, DOI 10.17487/RFC8108, March 2017, 1776 . 1778 Appendix A. Dismissing Payload Type Multiplexing 1780 This section documents a number of reasons why using the payload type 1781 as a multiplexing point for most things related to multiple streams 1782 is unsuitable. If one attempts to use Payload type multiplexing 1783 beyond its defined usage, this has well-known negative effects on 1784 RTP. To use Payload type as the single discriminator for multiple 1785 streams implies that all the different media streams are being sent 1786 with the same SSRC, thus using the same timestamp and sequence number 1787 space. This has many effects: 1789 1. Putting a restraint on the RTP timestamp rate for the multiplexed 1790 media. For example, media streams that use different RTP 1791 timestamp rates cannot be combined, as the timestamp values need 1792 to be consistent across all multiplexed media frames. Thus 1793 streams are forced to use the same rate. When this is not 1794 possible, Payload Type multiplexing cannot be used. 1796 2. Many RTP payload formats can fragment a media object over 1797 multiple packets, like parts of a video frame. These payload 1798 formats need to determine the order of the fragments to 1799 correctly decode them. Thus it is important to ensure that all 1800 fragments related to a frame or a similar media object are 1801 transmitted in sequence and without interruptions within the 1802 object. This can be solved relatively simply on the sender side 1803 by ensuring that the fragments of each media stream are sent in 1804 sequence. 1806 3. Some media formats require uninterrupted sequence number space 1807 between media parts.
These are media formats where any missing 1808 RTP sequence number will result in decoding failure or invoking 1809 of a repair mechanism within a single media context. The text/ 1810 T140 payload format [RFC4103] is an example of such a format. 1811 These formats will need a sequence numbering abstraction 1812 function between RTP and the individual media stream before 1813 being used with Payload Type multiplexing. 1815 4. Sending multiple streams in the same sequence number space makes 1816 it impossible to determine which Payload Type and thus which 1817 stream a packet loss relates to. 1819 5. If RTP Retransmission [RFC4588] is used and there is a loss, it 1820 is possible to ask for the missing packet(s) by SSRC and 1821 sequence number, not by Payload Type. If only some of the 1822 Payload Type multiplexed streams are of interest, there is no 1823 way of telling which missing packet(s) belong to the interesting 1824 stream(s) and all lost packets need be requested, wasting 1825 bandwidth. 1827 6. The current RTCP feedback mechanisms are built around providing 1828 feedback on media streams based on stream ID (SSRC), packet 1829 (sequence numbers) and time interval (RTP Timestamps). There is 1830 almost never a field to indicate which Payload Type is reported, 1831 so sending feedback for a specific media stream is difficult 1832 without extending existing RTCP reporting. 1834 7. The current RTCP media control messages [RFC5104] specification 1835 is oriented around controlling particular media flows, i.e. 1836 requests are done addressing a particular SSRC. Such mechanisms 1837 would need to be redefined to support Payload Type multiplexing. 1839 8. The number of payload types are inherently limited. 1840 Accordingly, using Payload Type multiplexing limits the number 1841 of streams that can be multiplexed and does not scale. 
This 1842 limitation is exacerbated if one uses solutions like RTP and 1843 RTCP multiplexing [RFC5761] where a number of payload types are 1844 blocked due to the overlap between RTP and RTCP. 1846 9. At times, there is a need to group multiplexed streams and this 1847 is currently possible for RTP Sessions and for SSRC, but there 1848 is no defined way to group Payload Types. 1850 10. It is currently not possible to signal bandwidth requirements 1851 per media stream when using Payload Type Multiplexing. 1853 11. Most existing SDP media level attributes cannot be applied on a 1854 per Payload Type level and would require re-definition in that 1855 context. 1857 12. A legacy endpoint that does not understand the indication that 1858 different RTP payload types are different media streams might be 1859 slightly confused by the large amount of possibly overlapping or 1860 identically defined RTP Payload Types. 1862 Appendix B. Signalling considerations 1864 Signalling is not an architectural consideration for RTP itself, so 1865 this discussion has been moved to an appendix. However, it is hugely 1866 important for anyone building complete applications, so it is 1867 deserving of discussion. 1869 The issues raised here need to be addressed in the WGs that deal with 1870 signalling; they cannot be addressed by tweaking, extending or 1871 profiling RTP. 1873 B.1. Signalling Aspects 1875 There exist various signalling solutions for establishing RTP 1876 sessions. Many are SDP [RFC4566] based; however, SDP functionality is 1877 also dependent on the signalling protocols carrying the SDP. 1878 RTSP [RFC7826] and SAP [RFC2974] both use SDP in a declarative 1879 fashion, while SIP [RFC3261] uses SDP with the additional definition 1880 of Offer/Answer [RFC3264]. The impact on signalling and especially 1881 SDP needs to be considered as it can greatly affect how to deploy a 1882 certain multiplexing point choice. 1884 B.1.1.
Session Oriented Properties 1886 One aspect of the existing signalling is that it is focused around 1887 sessions, or at least in the case of SDP the media description. 1888 There are a number of things that are signalled on a session level/ 1889 media description but those are not necessarily strictly bound to an 1890 RTP session and could be of interest to signal specifically for a 1891 particular media stream (SSRC) within the session. The following 1892 properties have been identified as being potentially useful to signal 1893 not only on RTP session level: 1895 o Bitrate/Bandwidth can today be signalled only as an aggregate or as a common 1896 limit applying to any media stream, unless either codec-specific bandwidth 1897 limiting or RTCP signalling using TMMBR is used. 1899 o Which SSRC will use which RTP Payload Types (this will be 1900 visible from the first media packet, but is sometimes useful to 1901 know before packet arrival). 1903 Some of these issues are clearly SDP's problem rather than RTP 1904 limitations. However, if the aim is to deploy a solution using 1905 additional SSRCs that contains several sets of media streams with 1906 different properties (encoding/packetization parameter, bit-rate, 1907 etc.), putting each set in a different RTP session would directly 1908 enable negotiation of the parameters for each set. If insisting on 1909 additional SSRCs only, a number of signalling extensions are needed to 1910 clarify that there are multiple sets of media streams with different 1911 properties and that they in fact need to be kept different, since a 1912 single set will not satisfy the application's requirements. 1914 For some parameters, such as resolution and framerate, an SSRC-linked 1915 mechanism has been proposed: 1916 [I-D.lennox-mmusic-sdp-source-selection]. 1918 B.1.2.
SDP Prevents Multiple Media Types 1920 SDP chose to use the m= line both to delineate an RTP session and to 1921 specify the top level of the MIME media type; audio, video, text, 1922 image, application. This media type is used as the top-level media 1923 type for identifying the actual payload format bound to a particular 1924 payload type using the rtpmap attribute. This binding has to be 1925 loosened in order to use SDP to describe RTP sessions containing 1926 multiple MIME top level types. 1928 There is an accepted WG item in the MMUSIC WG to define how multiple 1929 media lines describe a single underlying transport 1930 [I-D.ietf-mmusic-sdp-bundle-negotiation] and thus it becomes possible 1931 in SDP to define one RTP session with media types having different 1932 MIME top level types. 1934 B.1.3. Signalling Media Stream Usage 1936 Media streams being transported in RTP have some particular usage in 1937 an RTP application. This usage of the media stream has so far been 1938 implicitly signalled in many applications. For example, an 1939 application might choose to take all incoming audio RTP streams, mix 1940 them and play them out. However, in more advanced applications that 1941 use multiple media streams there will be more than a single usage or 1942 purpose among the set of media streams being sent or received. RTP 1943 applications will need to signal this usage somehow. The signalling 1944 used will have to identify the media streams affected by their RTP- 1945 level identifiers, which means that they have to be identified either 1946 by their session or by their SSRC + session. 1948 In some applications, the receiver cannot utilise the media stream at 1949 all before it has received the signalling message describing the 1950 media stream and its usage. In other applications, there exists a 1951 default handling that is appropriate. 1953 If all media streams in an RTP session are to be treated in the same 1954 way, identifying the session is enough.
If SSRCs in a session are to 1955 be treated differently, signalling needs to identify both the session 1956 and the SSRC. 1958 If this signalling affects how any RTP central node, like an RTP 1959 mixer or translator that selects, mixes or processes streams, treats 1960 the streams, the node will also need to receive the same signalling 1961 to know how to treat media streams with different usage in the right 1962 fashion. 1964 Authors' Addresses 1966 Magnus Westerlund 1967 Ericsson 1968 Torshamsgatan 23 1969 SE-164 80 Kista 1970 Sweden 1972 Phone: +46 10 714 82 87 1973 Email: magnus.westerlund@ericsson.com 1975 Bo Burman 1976 Ericsson 1977 Farogatan 6 1978 SE-164 80 Kista 1979 Sweden 1981 Phone: +46 10 714 13 11 1982 Email: bo.burman@ericsson.com 1984 Colin Perkins 1985 University of Glasgow 1986 School of Computing Science 1987 Glasgow G12 8QQ 1988 United Kingdom 1990 Email: csp@csperkins.org 1992 Harald Tveit Alvestrand 1993 Google 1994 Kungsbron 2 1995 Stockholm 11122 1996 Sweden 1998 Email: harald@alvestrand.no 1999 Roni Even 2000 Huawei 2002 Email: roni.even@huawei.com 2004 Hui Zheng 2005 Huawei 2007 Email: marvin.zhenghui@huawei.com