idnits 2.17.1

draft-ietf-mmusic-sdp-simulcast-08.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (March 13, 2017) is 2600 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-15) exists of
     draft-ietf-mmusic-rid-09

  == Outdated reference: A later version (-54) exists of
     draft-ietf-mmusic-sdp-bundle-negotiation-36

  == Outdated reference: A later version (-19) exists of
     draft-ietf-mmusic-sdp-mux-attributes-16

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  == Outdated reference: A later version (-12) exists of
     draft-ietf-avtcore-multiplex-guidelines-03

  -- Obsolete informational reference (is this intentional?): RFC 5285
     (Obsoleted by RFC 8285)

     Summary: 1 error (**), 0 flaws (~~), 5 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                          B. Burman
Internet-Draft                                             M. Westerlund
Intended status: Standards Track                                Ericsson
Expires: September 14, 2017                                S. Nandakumar
                                                               M. Zanaty
                                                                   Cisco
                                                          March 13, 2017

                Using Simulcast in SDP and RTP Sessions
                   draft-ietf-mmusic-sdp-simulcast-08

Abstract

   In some application scenarios it may be desirable to send multiple
   differently encoded versions of the same media source in different
   RTP streams. This is called simulcast. This document describes how
   to accomplish simulcast in RTP and how to signal it in SDP. The
   described solution uses an RTP/RTCP identification method to identify
   RTP streams belonging to the same media source, and makes an
   extension to SDP to relate those RTP streams as being different
   simulcast formats of that media source. The SDP extension consists
   of a new media level SDP attribute that expresses capability to send
   and/or receive simulcast RTP streams.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 14, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions
     2.1.  Terminology
     2.2.  Requirements Language
   3.  Use Cases
     3.1.  Reaching a Diverse Set of Receivers
     3.2.  Application Specific Media Source Handling
     3.3.  Receiver Media Source Preferences
   4.  Requirements
   5.  Overview
   6.  Detailed Description
     6.1.  Simulcast Attribute
     6.2.  Simulcast Capability
     6.3.  Offer/Answer Use
       6.3.1.  Generating the Initial SDP Offer
       6.3.2.  Creating the SDP Answer
       6.3.3.  Offerer Processing the SDP Answer
       6.3.4.  Modifying the Session
     6.4.  Use with Declarative SDP
     6.5.  Relating Simulcast Streams
     6.6.  Signaling Examples
       6.6.1.  Single-Source Client
       6.6.2.  Multi-Source Client
   7.  RTP Aspects
     7.1.  Outgoing from Endpoint with Media Source
     7.2.  RTP Middlebox to Receiver
       7.2.1.  Media-Switching Mixer
       7.2.2.  Selective Forwarding Middlebox
     7.3.  RTP Middlebox to RTP Middlebox
   8.  Network Aspects
     8.1.  Bitrate Adaptation
   9.  Limitation
   10. IANA Considerations
   11. Security Considerations
   12. Contributors
   13. Acknowledgements
   14. References
     14.1.  Normative References
     14.2.  Informative References
   Appendix A.  Changes From Earlier Versions
     A.1.  Modifications Between WG Version -07 and -08
     A.2.  Modifications Between WG Version -06 and -07
     A.3.  Modifications Between WG Version -05 and -06
     A.4.  Modifications Between WG Version -04 and -05
     A.5.  Modifications Between WG Version -03 and -04
     A.6.  Modifications Between WG Version -02 and -03
     A.7.  Modifications Between WG Version -01 and -02
     A.8.  Modifications Between WG Version -00 and -01
     A.9.  Modifications Between Individual Version -00 and WG
           Version -00
   Authors' Addresses

1.  Introduction

   Most of today's multiparty video conference solutions make use of
   centralized servers to reduce the bandwidth and CPU consumption in
   the endpoints. Those servers receive RTP streams from each
   participant and send some suitable set of possibly modified RTP
   streams to the rest of the participants, which usually have
   heterogeneous capabilities (screen size, CPU, bandwidth, codec, etc).
   One of the biggest issues is how to perform RTP stream adaptation to
   different participants' constraints with the minimum possible impact
   on both video quality and server performance.

   Simulcast is defined in this memo as the act of simultaneously
   sending multiple different encoded streams of the same media source,
   e.g. the same video source encoded with different video encoder types
   or image resolutions. This can be done in several ways and for
   different purposes. This document focuses on the case where it is
   desirable to provide a media source as multiple encoded streams over
   RTP [RFC3550] towards an intermediary so that the intermediary can
   provide the wanted functionality by selecting which RTP stream(s) to
   forward to other participants in the session, and more specifically
   how the identification and grouping of the involved RTP streams are
   done.

   The intended scope of the defined mechanism is to support negotiation
   and usage of simulcast when using SDP offer/answer and media
   transport over RTP. The media transport topologies considered are
   point to point RTP sessions as well as centralized multi-party RTP
   sessions, where a media sender will provide the simulcasted streams
   to an RTP middlebox or endpoint, and middleboxes may further
   distribute the simulcast streams to other middleboxes or endpoints.

   Usage of multicast or broadcast transport is out of scope and left
   for future extension.

   This document describes a few scenarios where the use of simulcast
   is motivated, and also defines the RTP/RTCP and SDP signaling needed
   for it.

2.  Definitions

2.1.  Terminology

   This document makes use of the terminology defined in RTP Taxonomy
   [RFC7656], and RTP Topologies [RFC7667]. The following terms are
   especially noted or here defined:

   RTP Mixer:  An RTP middle node, defined in [RFC7667] (Section 3.6 to
      3.9).

   RTP Switch:  A common short term for the terms "switching RTP mixer",
      "source projecting middlebox", and "video switching MCU" as
      discussed in [RFC7667].

   Simulcast Stream:  One encoded stream or dependent stream from a set
      of concurrently transmitted encoded streams and optional dependent
      streams, all sharing a common media source, as defined in
      [RFC7656]. For example, HD and thumbnail video simulcast versions
      of a single media source sent concurrently as separate RTP
      Streams.

   Simulcast Format:  Different formats of a simulcast stream serve the
      same purpose as alternative RTP payload types in non-simulcast
      SDP: to allow multiple alternative media formats for a given RTP
      stream. As for multiple RTP payload types on the m-line in
      offer/answer [RFC3264], any one of the negotiated alternative
      formats can be used in a single RTP stream at a given point in
      time, but not more than one (based on RTP timestamp). What format
      is used can change dynamically from one RTP packet to another.

   Simulcast Stream Identifier (SCID):  The identification value used to
      refer to an individual simulcast format, identical to the "rid-id"
      identification value for an RTP Payload Format Restriction
      [I-D.ietf-mmusic-rid] and the corresponding content of
      "RtpStreamId" RTCP SDES Item [I-D.ietf-avtext-rid].

2.2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Use Cases

   Many use cases of simulcast as described in this document relate to a
   multi-party communication session where one or more central nodes are
   used to adapt the view of the communication session towards
   individual participants, and facilitate the media transport between
   participants. Thus, these cases target the RTP Mixer type of
   topology.

   There are two principal approaches for an RTP Mixer to provide this
   adapted view of the communication session to each receiving
   participant:

   o  Transcoding (decoding and re-encoding) received RTP streams with
      characteristics adapted to each receiving participant. This often
      includes mixing or composition of media sources from multiple
      participants into a mixed media source originated by the RTP
      Mixer. The main advantage of this approach is that it achieves
      close to optimal adaptation to individual receiving participants.
      The main disadvantages are that it can be very computationally
      expensive to the RTP Mixer, typically degrades media Quality of
      Experience (QoE) such as end-to-end delay for the receiving
      participants, and requires RTP Mixer access to media content.

   o  Switching a subset of all received RTP streams or sub-streams to
      each receiving participant, where the used subset is typically
      specific to each receiving participant. The main advantages of
      this approach are that it is computationally cheap to the RTP
      Mixer, has very limited impact on media QoE, and does not require
      RTP Mixer (full) access to media content. The main disadvantage
      is that it can be difficult to combine a subset of received RTP
      streams into a perfect fit to the resource situation of a
      receiving participant.

   The use of simulcast relates to the latter approach, where it is more
   important to reduce the load on the RTP Mixer and/or minimize QoE
   impact than to achieve an optimal adaptation of resource usage.

3.1.  Reaching a Diverse Set of Receivers

   The media sources provided by a sending participant potentially need
   to reach several receiving participants that differ in terms of
   available resources. The receiver resources that typically differ
   include, but are not limited to:

   Codec:  This includes codec type (such as SDP MIME type) and can
      include codec configuration options (e.g. SDP fmtp parameters).
      A couple of codec resources that differ only in codec
      configuration will be "different" if they are somehow not
      "compatible", for example if they differ in video codec profile or
      in the transport packetization configuration.

   Sampling:  This relates to how the media source is sampled, in
      spatial as well as in temporal domain. For video streams, spatial
      sampling affects image resolution and temporal sampling affects
      video frame rate. For audio, spatial sampling relates to the
      number of audio channels and temporal sampling affects audio
      bandwidth. This may be used to suit different rendering
      capabilities or needs at the receiving endpoints, as well as a
      method to achieve different transport capabilities, bitrates and
      eventually QoE by controlling the amount of source data.

   Bitrate:  This relates to the amount of bits spent per second to
      transmit the media source as an RTP stream, which typically also
      affects the Quality of Experience (QoE) for the receiving user.

   Letting the sending participant create a simulcast of a few
   differently configured RTP streams per media source can be a good
   tradeoff when using an RTP switch as middlebox, instead of sending a
   single RTP stream and using an RTP mixer to create individual
   transcodings to each receiving participant.

   This requires that the receiving participants can be categorized in
   terms of available resources and that the sending participant can
   choose a matching configuration for a single RTP stream per category
   and media source.

   For example, assume for simplicity a set of receiving participants
   that differ only in that some have support to receive Codec A, and
   the others have support to receive Codec B. Further assume that the
   sending participant can send both Codec A and B. It can then reach
   all receivers by creating two simulcasted RTP streams from each media
   source; one for Codec A and one for Codec B.

   In another simple example, a set of receiving participants differ
   only in screen resolution; some are able to display video with at
   most 360p resolution and some support 720p resolution. A sending
   participant can then reach all receivers with best possible
   resolution by creating a simulcast of RTP streams with 360p and 720p
   resolution for each sent video media source.

   In more elaborate cases, the receiving participants differ both in
   available sampling and bitrate, and maybe also codec, and it is up to
   the RTP switch to find a good trade-off in which simulcasted stream
   to choose for each intended receiver. It is also the responsibility
   of the RTP switch to negotiate a good fit of simulcast streams with
   the sending participant.

   The maximum number of simulcasted RTP streams that can be sent is
   mainly limited by the amount of processing and uplink network
   resources available to the sending participant.

3.2.  Application Specific Media Source Handling

   The application logic that controls the communication session may
   include special handling of some media sources. It is, for example,
   commonly the case that the media from a sending participant is not
   sent back to itself.

   It is also common that a currently active speaker participant is
   shown in larger size or higher quality than other participants (the
   sampling or bitrate aspects of Section 3.1). Not sending the active
   speaker media back to itself means there is some other participant's
   media that instead has to receive special handling towards the active
   speaker; typically the previous active speaker. This way, the
   previously active speaker is needed both in larger size (to current
   active speaker) and in small size (to the rest of the participants),
   which can be solved with a simulcast from the previously active
   speaker to the RTP switch.

3.3.  Receiver Media Source Preferences

   The application logic that controls the communication session may
   allow receiving participants to apply preferences to the
   characteristics of the RTP stream they receive, for example in terms
   of the aspects listed in Section 3.1. Sending a simulcast of RTP
   streams is one way of accommodating receivers with conflicting or
   otherwise incompatible preferences.

4.  Requirements

   The following requirements need to be met to support the use cases in
   previous sections:

   REQ-1:  Identification:

      REQ-1.1:  It must be possible to identify a set of simulcasted RTP
         streams as originating from the same media source in SDP
         signaling.

      REQ-1.2:  An RTP endpoint must be capable of identifying the
         simulcast stream a received RTP stream is associated with,
         knowing the content of the SDP signaling.

   REQ-2:  Transport usage. The solution must work when using:

      REQ-2.1:  Legacy SDP with separate media transports per SDP media
         description.

      REQ-2.2:  Bundled [I-D.ietf-mmusic-sdp-bundle-negotiation] SDP
         media descriptions.

   REQ-3:  Capability negotiation. It must be possible that:

      REQ-3.1:  Sender can express capability of sending simulcast.

      REQ-3.2:  Receiver can express capability of receiving simulcast.

      REQ-3.3:  Sender can express maximum number of simulcast streams
         that can be provided.

      REQ-3.4:  Receiver can express maximum number of simulcast streams
         that can be received.

      REQ-3.5:  Sender can detail the characteristics of the simulcast
         streams that can be provided.

      REQ-3.6:  Receiver can detail the characteristics of the simulcast
         streams that it prefers to receive.

   REQ-4:  Distinguishing features. It must be possible to have
      different simulcast streams use different codec parameters, as can
      be expressed by SDP format values and RTP payload types.

   REQ-5:  Compatibility. It must be possible to use simulcast in
      combination with other RTP mechanisms that generate additional RTP
      streams:

      REQ-5.1:  RTP Retransmission [RFC4588].

      REQ-5.2:  RTP Forward Error Correction [RFC5109].

      REQ-5.3:  Related payload types such as audio Comfort Noise and/or
         DTMF.

      REQ-5.4:  A single simulcast stream can consist of multiple RTP
         streams, to support codecs where a dependent stream is
         dependent on a set of encoded and dependent streams, each
         potentially carried in their own RTP stream.

   REQ-6:  Interoperability. The solution must be possible to use in:

      REQ-6.1:  Interworking with non-simulcast legacy clients using a
         single media source per media type.

      REQ-6.2:  WebRTC environment with a single media source per SDP
         media description.

5.  Overview

   As an overview, the above requirements are met by signaling simulcast
   capability and configurations in SDP [RFC4566]:

   o  An offer or answer can contain a number of simulcast streams,
      separate for send and receive directions.

   o  An offer or answer can contain multiple, alternative simulcast
      stream formats in the same fashion as multiple, alternative
      formats can be offered in a media description.

   o  A single media source per SDP media description is assumed, which
      is aligned with the concepts defined in [RFC7656] and will
      specifically work in a WebRTC context, both with and without
      BUNDLE [I-D.ietf-mmusic-sdp-bundle-negotiation] grouping.

   o  The codec configuration for a simulcast stream is expressed
      through use of separately specified RTP payload format
      restrictions [I-D.ietf-mmusic-rid] with an associated RTP-level
      identification mechanism [I-D.ietf-avtext-rid] to identify which
      RTP payload format restrictions an RTP stream adheres to. This
      complements and effectively extends simulcast stream
      identification and configuration possibilities that could be
      provided by using only SDP formats as identifier. Use of multiple
      RTP streams with the same (non-redundancy) media type in the
      context of a single media source, where those RTP streams are
      using different RtpStreamId, is a strong but not totally
      unambiguous indication of those RTP streams being part of a
      simulcast.

   o  It is possible to use source-specific signaling [RFC5576] with the
      proposed solution, but it is only in certain cases possible to
      learn from that signaling which SSRC will belong to a particular
      simulcast stream.

6.  Detailed Description

   This section further details the overview above (Section 5). First,
   formal syntax is provided (Section 6.1), followed by the rest of the
   SDP attribute definition in Section 6.2. Relating Simulcast Streams
   (Section 6.5) provides the definition of the RTP/RTCP mechanisms
   used. The section is concluded with a number of examples.

6.1.  Simulcast Attribute

   This document defines a new SDP media-level "a=simulcast" attribute,
   with value according to the following ABNF [RFC5234] syntax:

   sc-value     = ( sc-send [SP sc-recv] ) / ( sc-recv [SP sc-send] )
   sc-send      = "send" SP sc-str-list
   sc-recv      = "recv" SP sc-str-list
   sc-str-list  = sc-alt-list *( ";" sc-alt-list )
   sc-alt-list  = sc-id *( "," sc-id )
   sc-id-paused = "~"
   sc-id        = [sc-id-paused] rid-id
   ; SP defined in [RFC5234]
   ; rid-id defined in [I-D.ietf-mmusic-rid]

                   Figure 1: ABNF for Simulcast Value

   The "a=simulcast" attribute has a parameter in the form of one or two
   simulcast stream descriptions, each consisting of a direction ("send"
   or "recv"), followed by a list of one or more simulcast streams.
   Each simulcast stream consists of one or more alternative simulcast
   formats. Each simulcast format is identified by a simulcast stream
   identifier (SCID). The SCID MUST have the form of an RTP stream
   identifier, as described by RTP Payload Format Restrictions
   [I-D.ietf-mmusic-rid].

   In the list of simulcast streams, each simulcast stream is separated
   by a semicolon (";"). Each simulcast stream can in turn be offered
   in one or more alternative formats, represented by SCIDs, separated
   by a comma (","). Each SCID can also be specified as initially
   paused [RFC7728], indicated by prepending a "~" to the SCID. The
   reason to allow separate initial pause states for each SCID is that
   pause capability can be specified individually for each RTP payload
   type referenced by an SCID. Since pause capability specified via the
   "a=rtcp-fb" attribute and SCID specified by "a=rid" can refer to
   common payload types, it is unfeasible to pause streams with SCID
   where any of the related RTP payload type(s) do not have pause
   capability.
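
   The grammar in Figure 1 is small enough to sketch as a hand-written
   parser. The following Python is an illustration only, not part of
   the specification: the names ScId, parse_alt_list and
   parse_simulcast are hypothetical, and the regular expression used
   for rid-id is a simplifying assumption, since the authoritative
   rid-id grammar is defined in [I-D.ietf-mmusic-rid].

```python
import re
from dataclasses import dataclass

# Assumed stand-in for "rid-id"; the real grammar lives in
# [I-D.ietf-mmusic-rid] and may differ from this pattern.
_RID_ID = re.compile(r"[A-Za-z0-9_-]+")


@dataclass(frozen=True)
class ScId:
    """One sc-id: a rid-id plus the optional "~" initial-pause marker."""
    rid: str
    paused: bool = False


def parse_alt_list(text):
    """Parse an sc-alt-list: comma-separated alternative sc-id values."""
    alternatives = []
    for token in text.split(","):
        paused = token.startswith("~")
        rid = token[1:] if paused else token
        if not _RID_ID.fullmatch(rid):
            raise ValueError(f"invalid SCID: {token!r}")
        alternatives.append(ScId(rid, paused))
    return alternatives


def parse_simulcast(value):
    """Parse an sc-value (the text after "a=simulcast:").

    Returns a dict mapping "send"/"recv" to a list of simulcast
    streams, each stream being a list of alternative ScId values.
    """
    parts = value.split(" ")
    if len(parts) not in (2, 4):
        raise ValueError("expected one or two '<direction> <list>' pairs")
    result = {}
    for direction, str_list in zip(parts[0::2], parts[1::2]):
        if direction not in ("send", "recv"):
            raise ValueError(f"unknown direction: {direction!r}")
        if direction in result:
            raise ValueError(f"direction listed twice: {direction!r}")
        # sc-str-list: simulcast streams are separated by ";".
        result[direction] = [parse_alt_list(s) for s in str_list.split(";")]
    return result


if __name__ == "__main__":
    parsed = parse_simulcast("send 1,2,3;~4,~5 recv 6;~7,~8")
    for direction, streams in parsed.items():
        print(direction, streams)
```

   The sketch keeps the pause marker on each SCID rather than on the
   stream, mirroring the grammar, where "~" is attached to an
   individual sc-id.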

   Examples:

   a=simulcast:send 1,2,3;~4,~5 recv 6;~7,~8
   a=simulcast:recv 1;4,5 send 6;7

                      Figure 2: Simulcast Examples

   Above are two examples of different "a=simulcast" lines.

   The first line is an example offer to send two simulcast streams and
   to receive two simulcast streams. The first simulcast stream in send
   direction can be sent in three different alternative formats (SCID 1,
   2, 3), and the second simulcast stream in send direction can be sent
   in two different alternative formats (SCID 4, 5). Both of the second
   simulcast stream alternative formats in send direction are offered as
   initially paused. The first simulcast stream in receive direction
   has no alternative formats (SCID 6). The second simulcast stream in
   receive direction has two alternative formats (SCID 7, 8) that are
   both offered as initially paused.

   The second line is an example answer to the first line, accepting to
   send and receive the two offered simulcast streams. Send and receive
   directions are specified in opposite order compared to the first
   line, which lets the answer keep the same order of simulcast streams
   in the SDP as in the offer, for convenience, even though
   directionality is reversed. This example answer has removed all
   offered alternative formats for the first simulcast stream (keeping
   only SCID 1), but kept alternative formats for the second simulcast
   stream in receive direction (4, 5). The answer thus accepts to send
   two simulcast streams, without alternatives. The answer does not
   accept initial pause of any simulcast streams, in either direction.
   More examples can be found in Section 6.6.

6.2.  Simulcast Capability

   Simulcast capability is expressed through a new media level SDP
   attribute, "a=simulcast" (Section 6.1). The meaning of the attribute
   on SDP session level is undefined, MUST NOT be used by
   implementations of this specification and MUST be ignored if received
   on session level. Extensions to this specification MAY define such
   session level usage. The meaning of including multiple "a=simulcast"
   lines in a single SDP media description is undefined, MUST NOT be
   used by implementations of this specification, and any additional
   "a=simulcast" lines beyond the first in a media description MUST be
   ignored if received.

   There are separate and independent sets of simulcast streams in send
   and receive directions. When listing multiple directions, each
   direction MUST NOT occur more than once on the same line.

   Simulcast streams using undefined SCID MUST NOT be used as valid
   simulcast streams by an RTP stream receiver. The direction for an
   SCID MUST be aligned with the direction specified for the
   corresponding RTP stream identifier on the "a=rid" line.

   The listed number of simulcast streams for a direction sets a limit
   to the number of supported simulcast streams in that direction. The
   order of the listed simulcast streams in the "send" direction
   suggests a proposed order of preference, in decreasing order: the
   SCID listed first is the most preferred and subsequent streams have
   progressively lower preference. The order of the listed SCID in the
   "recv" direction expresses which simulcast streams are preferred,
   with the leftmost being most preferred. This can be of importance if
   the number of actually sent simulcast streams has to be reduced for
   some reason.

   SCID that have explicit dependencies [RFC5583] [I-D.ietf-mmusic-rid]
   to other SCID (even in the same media description) MAY be used.
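
   The SCID rules above (every SCID must be defined, and its direction
   must agree with the "a=rid" line) lend themselves to a simple
   consistency check. The Python below is a minimal sketch under
   stated assumptions: the function names are hypothetical, the
   "a=rid" lines are assumed to have the form
   "a=rid:<rid-id> <direction> ..." as in [I-D.ietf-mmusic-rid], and
   the simulcast value is assumed to be already split into a dict of
   directions to streams. Dropping streams from the end of the list
   when fewer can be sent is one plausible reading of the preference
   order, not a normative procedure.

```python
def rid_directions(sdp_lines):
    """Map each rid-id to its direction, taken from "a=rid:" lines."""
    directions = {}
    for line in sdp_lines:
        if line.startswith("a=rid:"):
            fields = line[len("a=rid:"):].split(" ")
            if len(fields) >= 2:
                directions[fields[0]] = fields[1]
    return directions


def check_scids(simulcast, directions):
    """Return a list of problems found; an empty list means none.

    simulcast: {"send": [[scid, ...], ...], "recv": [...]} where each
    scid is a string, optionally prefixed with "~".
    """
    problems = []
    for direction, streams in simulcast.items():
        for stream in streams:
            for scid in stream:
                rid = scid.lstrip("~")
                if rid not in directions:
                    problems.append(f"SCID {rid} has no a=rid line")
                elif directions[rid] != direction:
                    problems.append(
                        f"SCID {rid} is '{directions[rid]}' on a=rid "
                        f"but '{direction}' on a=simulcast"
                    )
    return problems


def reduce_streams(streams, limit):
    """Keep the `limit` most preferred streams (leftmost first)."""
    return streams[:limit]


if __name__ == "__main__":
    sdp = ["a=rid:1 send pt=97", "a=rid:2 send pt=98", "a=rid:3 recv pt=97"]
    rids = rid_directions(sdp)
    print(check_scids({"send": [["1"], ["2"]], "recv": [["3"]]}, rids))
    print(check_scids({"send": [["1"], ["4"]], "recv": [["2"]]}, rids))
```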

   Use of more than a single, alternative simulcast format for a
   simulcast stream MAY be specified as part of the attribute parameters
   by expressing the simulcast stream as a comma-separated list of
   alternative SCID. In this case, it is not possible to align which
   alternative SCID are used across different simulcast streams, for
   example by requiring all simulcast streams to use SCID alternatives
   referring to the same codec format. The order of the SCID
   alternatives within a simulcast stream is significant; the SCID
   alternatives are listed from (left) most preferred to (right) least
   preferred. For the use of simulcast, this overrides the normal codec
   preference as expressed by format type ordering on the "m=" line,
   using regular SDP rules. This is to enable a separation of general
   codec preferences and simulcast stream configuration preferences.

   A simulcast stream can use a codec defined such that the same RTP
   SSRC can change RTP payload type multiple times during a session,
   possibly even on a per-packet basis. A typical example can be a
   speech codec that makes use of Comfort Noise [RFC3389] and/or DTMF
   [RFC4733] formats. In those cases, such "related" formats MUST NOT
   be defined as having their own SCID listed explicitly in the
   attribute parameters, since they are not strictly simulcast streams
   of the media source, but rather a specific way of generating the RTP
   stream of a single simulcast stream with varying RTP payload type.

   If RTP stream pause/resume [RFC7728] is supported, any SCID MAY be
   prefixed by a "~" character to indicate that the corresponding
   simulcast stream is initially paused already from start of the RTP
   session. In this case, support for RTP stream pause/resume MUST also
   be included under the same "m=" line where "a=simulcast" is included.
   All RTP payload types related to such initially paused simulcast
   stream MUST be listed in the SDP as pause/resume capable as specified
   by [RFC7728], e.g. by using the "*" wildcard format for "a=rtcp-fb".

   An initially paused simulcast stream in "send" direction MUST be
   considered equivalent to an unsolicited locally paused stream, and be
   handled accordingly. Initially paused simulcast streams are resumed
   as described by the RTP pause/resume specification. An RTP stream
   receiver that wishes to resume an unsolicited locally paused stream
   needs to know the SSRC of that stream. The SSRC of an initially
   paused simulcast stream can be obtained from an RTP stream sender
   RTCP Sender Report (SR) including both the desired SSRC as "SSRC of
   sender", and the SCID value in an RtpStreamId RTCP SDES item
   [I-D.ietf-avtext-rid].

   Including an initially paused simulcast stream in "recv" direction in
   an SDP towards an RTP sender SHOULD cause the remote RTP sender to
   put the stream as unsolicited locally paused, unless there are other
   RTP stream receivers that do not mark the simulcast stream as
   initially paused. The reason to require an initially paused "recv"
   stream to be considered locally paused by the remote RTP sender,
   instead of making it equivalent to implicitly sending a pause
   request, is that the pausing RTP sender cannot know which
   receiving SSRC owns the restriction when TMMBR/TMMBN are used for
   pause/resume signaling, since the RTP receiver's SSRC in send
   direction is sometimes not yet known.

   Use of the redundant audio data [RFC2198] format could be seen as a
   form of simulcast for loss protection purposes, but is not considered
   conflicting with the mechanisms described in this memo and MAY
   therefore be used as any other format. In this case the "red"
   format, rather than the carried formats, SHOULD be the one to list as
   a simulcast stream on the "a=simulcast" line.

   The media formats and corresponding characteristics of simulcast
   streams SHOULD be chosen such that they are different, e.g. as
   different SDP formats with differing "a=rtpmap" and/or "a=fmtp"
   lines, or as differently defined RTP payload format restrictions. If
   this difference is not required, RTP duplication [RFC7104] procedures
   SHOULD be considered instead of simulcast. To avoid complications in
   implementations, a single SCID MUST NOT occur more than once per
   "a=simulcast" line. Note that this does not eliminate use of
   simulcast as an RTP duplication mechanism, since it is possible to
   define multiple different SCID that are effectively equivalent.

6.3.  Offer/Answer Use

   Note:  The inclusion of "a=simulcast" or the use of simulcast does
      not change any of the interpretation or Offer/Answer procedures
      for other SDP attributes, like "a=fmtp" or "a=rid".

6.3.1.  Generating the Initial SDP Offer

   An offerer wanting to use simulcast SHALL include the "a=simulcast"
   attribute in the offer. An offerer listing a set of receive
   simulcast streams and/or alternative formats as SCID in the offer
   MUST be prepared to receive RTP streams for any of those simulcast
   streams and/or alternative formats from the answerer.

6.3.2.  Creating the SDP Answer

   An answerer that does not understand the concept of simulcast will
   also not know the attribute and will remove it in the SDP answer, as
   defined in existing SDP Offer/Answer [RFC3264] procedures.
   Similarly, an answerer that receives an offer with the "a=simulcast"
   attribute on session level SHALL remove it in the answer. An
   answerer that understands the attribute but receives multiple
   "a=simulcast" attributes in the same media description and that
   desires to use simulcast SHALL ignore and remove all but the first in
   the answer.

   An answerer that does understand the attribute and that wants to
   support simulcast in an indicated direction SHALL reverse
   directionality of the unidirectional direction parameters; "send"
   becomes "recv" and vice versa, and include it in the answer.

   An answerer that receives an offer with simulcast containing an
   "a=simulcast" attribute listing alternative SCID MAY keep all the
   alternative SCID in the answer, but it MAY also choose to remove any
   non-desirable alternative SCID in the answer. The answerer MUST NOT
   add any alternative SCID in send direction in the answer that were
   not present in the offer receive direction. The answerer MUST be
   prepared to receive any of the receive direction SCID alternatives,
   and MAY send any of the send direction alternatives that are kept in
   the answer.

   An answerer that receives an offer with simulcast that lists a number
   of simulcast streams MAY reduce the number of simulcast streams in
   the answer, but MUST NOT add simulcast streams.

   An answerer that receives an offer without RTP stream pause/resume
   capability MUST NOT mark any simulcast streams as initially paused in
   the answer.

   An RTP stream pause/resume capable answerer that receives an offer
   with RTP stream pause/resume capability MAY mark any SCID that refer
   to pause/resume capable formats as initially paused in the answer.

   An answerer that receives indication in an offer of an SCID being
   initially paused SHOULD mark that SCID as initially paused also in
   the answer, regardless of direction, unless it has good reason for
   the SCID not being initially paused.
One such reason could, for 675 example, be that the answerer would otherwise initially not receive 676 any media of that type at all. 678 6.3.3. Offerer Processing the SDP Answer 680 An offerer that receives an answer without "a=simulcast" MUST NOT use 681 simulcast towards the answerer. An offerer that receives an answer 682 with "a=simulcast" without any SCID in a specified direction MUST NOT 683 use simulcast in that direction. 685 An offerer that receives an answer where some SCID alternatives are 686 kept MUST be prepared to receive any of the kept send direction SCID 687 alternatives, and MAY send any of the kept receive direction SCID 688 alternatives. 690 An offerer that receives an answer where some of the SCID are removed 691 compared to the offer MAY release the corresponding resources (codec, 692 transport, etc) in its receive direction and MUST NOT send any RTP 693 packets corresponding to the removed SCID. 695 An offerer that offered some of its SCID as initially paused and that 696 receives an answer that does not indicate RTP stream pause/resume 697 capability, MUST NOT initially pause any simulcast streams. 699 An offerer with RTP stream pause/resume capability that receives an 700 answer where some SCID are marked as initially paused, SHOULD 701 initially pause those RTP streams regardless if they were marked as 702 initially paused also in the offer, unless it has good reason for 703 those RTP streams not being initially paused. One such reason could, 704 for example, be that the answerer would otherwise initially not 705 receive any media of that type at all. 707 6.3.4. Modifying the Session 709 Offers and answers inside an existing session follow the rules for 710 initial session negotiation, with the additional restriction that any 711 SCID marked as initially paused in such offer or answer MUST already 712 be paused, thus a new offer/answer MUST NOT replace use of RTP stream 713 pause/resume [RFC7728] in the session. 
Session modification 714 restrictions in section 6.5 of RTP payload format restrictions 715 [I-D.ietf-mmusic-rid] also apply. 717 6.4. Use with Declarative SDP 719 This document does not define the use of "a=simulcast" in declarative 720 SDP, partly motivated by use of the simulcast format identification 721 [I-D.ietf-mmusic-rid] not being defined for use in declarative SDP. 722 If concrete use cases for simulcast in declarative SDP are identified 723 in the future, we expect that additional specifications will address 724 such use. 726 6.5. Relating Simulcast Streams 728 Simulcast RTP streams MUST be related on RTP level through 729 RtpStreamId [I-D.ietf-avtext-rid], as specified in the SDP 730 "a=simulcast" attribute (Section 6.2) parameters. This is sufficient 731 as long as there is only a single media source per SDP media 732 description. When using BUNDLE 733 [I-D.ietf-mmusic-sdp-bundle-negotiation], where multiple SDP media 734 descriptions jointly specify a single RTP session, the SDES MID 735 identification mechanism in BUNDLE allows relating RTP streams back 736 to individual media descriptions, after which the above described 737 RtpStreamId relations can be used. Use of the RTP header extension 738 [RFC5285] for both MID and RtpStreamId identifications can be 739 important to ensure rapid initial reception, required to correctly 740 interpret and process the RTP streams. Implementers of this 741 specification MUST support the RTCP source description (SDES) item 742 method and SHOULD support RTP header extension method to signal 743 RtpStreamId on RTP level. 745 RTP streams MUST only use a single alternative SCID at a time (based 746 on RTP timestamps), but MAY change format (and SCID) on a per-RTP 747 packet basis. This corresponds to the existing (non-simulcast) SDP 748 offer/answer case when multiple formats are included on the "m=" line 749 in the SDP answer, enabling per-RTP packet change of RTP payload 750 type. 752 6.6. 
Signaling Examples

These examples describe a client connecting to a video conference service that uses a centralized media topology with an RTP mixer.

   +---+      +-----------+      +---+
   | A |<---->|           |<---->| B |
   +---+      |           |      +---+
              |   Mixer   |
   +---+      |           |      +---+
   | F |<---->|           |<---->| J |
   +---+      +-----------+      +---+

Figure 3: Four-party Mixer-based Conference

6.6.1. Single-Source Client

Alice is calling in to the mixer with a simulcast-enabled client capable of a single media source per media type. The client can send a simulcast of 2 video resolutions and frame rates: HD 1280x720p 30fps and thumbnail 320x180p 15fps. This is defined below using the "imageattr" attribute [RFC6236]. In this example, only the "pt" "a=rid" parameter is used, effectively achieving a 1:1 mapping between RtpStreamId and media formats (RTP payload types), to describe simulcast stream formats. Alice's Offer:

   v=0
   o=alice 2362969037 2362969040 IN IP4 192.0.2.156
   s=Simulcast Enabled Client
   t=0 0
   c=IN IP4 192.0.2.156
   m=audio 49200 RTP/AVP 0
   a=rtpmap:0 PCMU/8000
   m=video 49300 RTP/AVP 97 98
   a=rtpmap:97 H264/90000
   a=rtpmap:98 H264/90000
   a=fmtp:97 profile-level-id=42c01f; max-fs=3600; max-mbps=108000
   a=fmtp:98 profile-level-id=42c00b; max-fs=240; max-mbps=3600
   a=imageattr:97 send [x=1280,y=720] recv [x=1280,y=720]
   a=imageattr:98 send [x=320,y=180] recv [x=320,y=180]
   a=rid:1 send pt=97
   a=rid:2 send pt=98
   a=rid:3 recv pt=97
   a=simulcast:send 1;2 recv 3
   a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:RtpStreamId

Figure 4: Single-Source Simulcast Offer

The only thing in the SDP that indicates simulcast capability is the line in the video media description containing the "simulcast" attribute. The included "a=fmtp" and "a=imageattr" parameters indicate that sent simulcast streams can differ in video resolution.
The RTP header extension for RtpStreamId is offered to avoid issues with the initial binding between RTP streams (SSRCs) and the RtpStreamId identifying the simulcast stream and its format.

The Answer from the server indicates that it too is simulcast capable. Should it not have been simulcast capable, the "a=simulcast" line would not have been present and communication would have started with the media negotiated in the SDP. Also the usage of the RtpStreamId RTP header extension is accepted.

   v=0
   o=server 823479283 1209384938 IN IP4 192.0.2.2
   s=Answer to Simulcast Enabled Client
   t=0 0
   c=IN IP4 192.0.2.43
   m=audio 49672 RTP/AVP 0
   a=rtpmap:0 PCMU/8000
   m=video 49674 RTP/AVP 97 98
   a=rtpmap:97 H264/90000
   a=rtpmap:98 H264/90000
   a=fmtp:97 profile-level-id=42c01f; max-fs=3600; max-mbps=108000
   a=fmtp:98 profile-level-id=42c00b; max-fs=240; max-mbps=3600
   a=imageattr:97 send [x=1280,y=720] recv [x=1280,y=720]
   a=imageattr:98 send [x=320,y=180] recv [x=320,y=180]
   a=rid:1 recv pt=97
   a=rid:2 recv pt=98
   a=rid:3 send pt=97
   a=simulcast:recv 1;2 send 3
   a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:RtpStreamId

Figure 5: Single-Source Simulcast Answer

Since the server is the simulcast media receiver, it reverses the direction of the "simulcast" and "rid" attribute parameters.

6.6.2. Multi-Source Client

Fred is calling in to the same conference as in the example above with a two-camera, two-display system, thus capable of handling two separate media sources in each direction, where each media source is simulcast-enabled in the send direction. Fred's client is restricted to a single media source per media description.

The first two simulcast streams for the first media source use different codecs, H264-SVC [RFC6190] and H264 [RFC6184]. These two simulcast streams also have a temporal dependency.
Two different 850 video codecs, VP8 [RFC7741] and H264, are offered as alternatives for 851 the third simulcast stream for the first media source. Only the 852 highest fidelity simulcast stream is sent from start, the lower 853 fidelity streams being initially paused. 855 The second media source is offered with three different simulcast 856 streams. All video streams of this second media source are loss 857 protected by RTP retransmission [RFC4588]. Also here, all but the 858 highest fidelity simulcast stream are initially paused. 860 Fred's client is also using BUNDLE to send all RTP streams from all 861 media descriptions in the same RTP session on a single media 862 transport. Although using many different simulcast streams in this 863 example, the use of RtpStreamId as simulcast stream identification 864 enables use of a low number of RTP payload types. Note that the use 865 of both BUNDLE [I-D.ietf-mmusic-sdp-bundle-negotiation] and "a=rid" 866 [I-D.ietf-mmusic-rid] recommends using the RTP header extension 867 [RFC5285] for carrying these RTP stream identification fields, which 868 is consequently also included in the SDP. Note also that for 869 "a=rid", the corresponding SDES attribute is named RtpStreamId 870 [I-D.ietf-avtext-rid]. 
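
As an informal illustration that is not part of this specification, the following Python sketch parses the value of an "a=simulcast" attribute as written in these examples and produces the value an answerer would return after reversing the directions (Section 6.3.2). The function names parse_simulcast, reverse_directions, and format_simulcast are hypothetical. The sketch assumes well-formed input with at most one "send" and one "recv" list and does not implement the complete attribute grammar.

```python
def parse_simulcast(value):
    """Parse e.g. 'send 1;2;~4,3' into {direction: [[(scid, paused), ...], ...]}."""
    tokens = value.split()
    if len(tokens) not in (2, 4):
        raise ValueError("expected one or two '<direction> <list>' pairs")
    parsed, seen = {}, set()
    for direction, stream_list in zip(tokens[0::2], tokens[1::2]):
        if direction not in ("send", "recv") or direction in parsed:
            raise ValueError(f"bad or repeated direction: {direction!r}")
        streams = []
        for stream in stream_list.split(";"):   # ";" separates simulcast streams
            alternatives = []
            for alt in stream.split(","):       # "," separates alternatives, most preferred first
                paused = alt.startswith("~")    # "~" marks an initially paused stream
                scid = alt[1:] if paused else alt
                if not scid or scid in seen:    # an SCID may occur only once per line
                    raise ValueError(f"empty or repeated SCID in {stream!r}")
                seen.add(scid)
                alternatives.append((scid, paused))
            streams.append(alternatives)
        parsed[direction] = streams
    return parsed


def reverse_directions(parsed):
    """Swap "send" and "recv", as an answerer does when accepting the offer."""
    flip = {"send": "recv", "recv": "send"}
    return {flip[d]: streams for d, streams in parsed.items()}


def format_simulcast(parsed):
    """Serialize the parsed structure back into an attribute value."""
    return " ".join(
        d + " " + ";".join(
            ",".join(("~" if paused else "") + scid for scid, paused in alts)
            for alts in streams)
        for d, streams in parsed.items())


offer = parse_simulcast("send 1;2 recv 3")
answer_value = format_simulcast(reverse_directions(offer))
```

With the value from Alice's offer, answer_value becomes "recv 1;2 send 3", which matches the "a=simulcast" line in the server's answer.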
   v=0
   o=fred 238947129 823479223 IN IP6 2001:db8::c000:27d
   s=Offer from Simulcast Enabled Multi-Source Client
   t=0 0
   c=IN IP6 2001:db8::c000:27d
   a=group:BUNDLE foo bar zen

   m=audio 49200 RTP/AVP 99
   a=mid:foo
   a=rtpmap:99 G722/8000

   m=video 49600 RTP/AVPF 100 101 103
   a=mid:bar
   a=rtpmap:100 H264-SVC/90000
   a=rtpmap:101 H264/90000
   a=rtpmap:103 VP8/90000
   a=fmtp:100 profile-level-id=42400d; max-fs=3600; max-mbps=108000; \
     mst-mode=NI-TC
   a=fmtp:101 profile-level-id=42c00d; max-fs=3600; max-mbps=54000
   a=fmtp:103 max-fs=900; max-fr=30
   a=rid:1 send pt=100;max-width=1280;max-height=720;max-fps=60;depend=2
   a=rid:2 send pt=101;max-width=1280;max-height=720;max-fps=30
   a=rid:3 send pt=101;max-width=640;max-height=360
   a=rid:4 send pt=103;max-width=640;max-height=360
   a=depend:100 lay bar:101
   a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid
   a=extmap:2 urn:ietf:params:rtp-hdrext:sdes:RtpStreamId
   a=rtcp-fb:* ccm pause nowait
   a=simulcast:send 1;2;~4,3

   m=video 49602 RTP/AVPF 96 104
   a=mid:zen
   a=rtpmap:96 VP8/90000
   a=fmtp:96 max-fs=3600; max-fr=30
   a=rtpmap:104 rtx/90000
   a=fmtp:104 apt=96;rtx-time=200
   a=rid:1 send pt=96;max-fs=921600;max-fps=30
   a=rid:2 send pt=96;max-fs=614400;max-fps=15
   a=rid:3 send pt=96;max-fs=230400;max-fps=30
   a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid
   a=extmap:2 urn:ietf:params:rtp-hdrext:sdes:RtpStreamId
   a=rtcp-fb:* ccm pause nowait
   a=simulcast:send 1;~2;~3

Figure 6: Fred's Multi-Source Simulcast Offer

   Note: Empty lines in the SDP above are added only for readability and would not be present in an actual SDP.

7. RTP Aspects

This section discusses what the different entities in a simulcast media path can expect to happen on RTP level. This is explored from source to sink by starting in an endpoint with a media source that is simulcast to an RTP middlebox.
That RTP middlebox both sends media sources to other RTP middleboxes (cascaded middleboxes) and selects some simulcast format of the media source to send to receiving endpoints. Different types of RTP middleboxes and their usage of the different simulcast formats result in several different behaviors.

7.1. Outgoing from Endpoint with Media Source

The most straightforward simulcast case is the RTP streams being emitted from the endpoint that originates a media source. When simulcast has been negotiated in the sending direction, the endpoint can transmit up to the number of RTP streams needed for the negotiated simulcast streams for that media source. Each RTP stream (SSRC) is identified by associating (Section 6.5) it with an RtpStreamId SDES item, transmitted in RTCP and possibly also as an RTP header extension. In cases where multiple media sources have been negotiated for the same RTP session and thus BUNDLE [I-D.ietf-mmusic-sdp-bundle-negotiation] is used, the MID SDES item will also be sent, similarly to the RtpStreamId.

An RTP stream might not be transmitted continuously, for any of the following reasons: it is temporarily paused using Pause/Resume [RFC7728], sender-side application logic temporarily pauses it, or network resources are insufficient to transmit this simulcast stream. However, all simulcast streams that have been negotiated have an active and maintained SSRC (at least in regular RTCP reports), even if no RTP packets are currently transmitted. The relation between an RTP stream (SSRC) and a particular simulcast stream is not expected to change, except in exceptional situations such as SSRC collisions. At SSRC changes, the usage of MID and RtpStreamId should enable the receiver to correctly identify the RTP streams even after an SSRC change.

7.2.
RTP Middlebox to Receiver

RTP streams in a multi-party RTP session can be used in multiple different ways when the session utilizes simulcast at least on the media source to middlebox legs. This is to a large degree due to the different RTP middlebox behaviors, but also to the needs of the application. This text assumes that the RTP middlebox will select a media source and choose which simulcast stream for that media source to deliver to a specific receiver. In many cases, at most one simulcast stream per media source will be forwarded to a particular receiver at any instant in time, even if the selected simulcast stream may vary. For cases where this does not hold due to application needs, the RTP stream aspects will fall under the middlebox to middlebox case (Section 7.3).

The selection of which simulcast streams to forward towards the receiver is application specific. However, in conferencing applications, active speaker selection is common. In case the number of media sources possible to forward, N, is less than the total number of media sources available in a multimedia session, the current and previous speakers (up to N in total) are often the ones forwarded. To avoid the need for media-specific processing to determine the current speaker(s) in the RTP middlebox, the endpoint providing a media source may include metadata, such as the RTP Header Extension for Client-to-Mixer Audio Level Indication [RFC6464].

The possibilities for stream switching are media type specific, but for media types with significant interframe dependencies in the encoding, like most video coding, the switching needs to be made at suitable switching points in the media stream that break or otherwise deal with the dependency structure.
Even if switching 992 points can be included periodically, it is common to use mechanisms 993 like Full Intra Requests [RFC5104] to request switching points from 994 the endpoint performing the encoding of the media source. 996 Inclusion of the RtpStreamId SDES item for an SSRC in the middlebox 997 to receiver direction should only occur when use of RtpStreamId has 998 been negotiated in that direction. It is worth noting that one can 999 signal multiple RtpStreamIds when simulcast signalling indicates only 1000 a single simulcast stream, allowing one to use all of the 1001 RtpStreamIds as alternatives for that simulcast stream. One reason 1002 for including the RtpStreamId in the middlebox to receiver direction 1003 for an RTP stream is to let the receiver know which restrictions 1004 apply to the currently delivered RTP stream. In case the RtpStreamId 1005 is negotiated to be used, it is important to remember that the used 1006 identifiers will be specific to each signalling session. Even if the 1007 central entity can attempt to coordinate, it is likely that the 1008 RtpStreamIds need to be translated to the leg specific values. The 1009 below cases will have as base line that RtpStreamId is not used in 1010 the mixer to receiver direction. 1012 7.2.1. Media-Switching Mixer 1014 This section discusses the behavior in cases where the RTP middlebox 1015 behaves like the Media-Switching Mixer (Section 3.6.2) in RTP 1016 Topologies [RFC7667]. The fundamental aspect here is that the media 1017 sources delivered from the middlebox will be the mixer's conceptual 1018 or functional ones. For example, one media source may be the main 1019 speaker in high resolution video, while a number of other media 1020 sources are thumbnails of each participant. 1022 The above results in that the RTP stream produced by the mixer is one 1023 that switches between a number of received incoming RTP streams for 1024 different media sources and in different simulcast versions. 
The 1025 mixer selects the media source to be sent as one of the RTP streams, 1026 and then selects among the available simulcast streams for the most 1027 appropriate one. The selection criteria include available bandwidth 1028 on the mixer to receiver path and restrictions based on the 1029 functional usage of the RTP stream delivered to the receiver. An 1030 example of the latter, is that it is unnecessary to forward a full HD 1031 video to a receiver if the display area is just a thumbnail. Thus, 1032 restrictions may exist to not allow some simulcast streams to be 1033 forwarded for some of the mixer's media sources. 1035 This will result in a single RTP stream being used for a particular 1036 of the RTP mixer's media sources. This RTP stream is at any point in 1037 time a selection of one particular RTP stream arriving to the mixer, 1038 where the RTP header field values are rewritten to provide a 1039 consistent, single RTP stream. If the RTP mixer doesn't receive any 1040 incoming stream matched to this media source, the SSRC will not 1041 transmit, but be kept alive using RTCP. The SSRC and thus RTP stream 1042 for the mixer's media source is expected to be long term stable. It 1043 will only be changed by signalling or other disruptive events. Note 1044 that although the above talks about a single RTP stream, there can in 1045 some cases be multiple RTP streams carrying the selected simulcast 1046 stream for the originating media source, including repair or other 1047 auxiliary RTP streams. 1049 The mixer may communicate the identity of the originating media 1050 source to the receiver by including the CSRC field with the 1051 originating media source's SSRC value. Note that due to the 1052 possibility that the RTP mixer switches between simulcast versions of 1053 the media source, the CSRC value may change, even if the media source 1054 is kept the same. 
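
The selection step described in this section can be sketched as follows. This is only an illustration: the SimulcastStream record, its bitrate figures, and the select_stream function are assumptions made for the example, and the actual selection logic is application specific.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SimulcastStream:
    rid: str            # RtpStreamId on the source-to-mixer leg
    width: int
    height: int
    bitrate_kbps: int   # hypothetical nominal bitrate
    active: bool = True  # False while the incoming stream is paused


def select_stream(streams: Sequence[SimulcastStream],
                  available_kbps: int,
                  max_width: int,
                  max_height: int) -> Optional[SimulcastStream]:
    """Pick the highest-resolution active stream that fits both the
    mixer-to-receiver bandwidth and the receiver's display restriction."""
    candidates = [
        s for s in streams
        if s.active
        and s.bitrate_kbps <= available_kbps
        and s.width <= max_width
        and s.height <= max_height
    ]
    if not candidates:
        return None  # nothing to forward; the outgoing SSRC is kept alive by RTCP
    return max(candidates, key=lambda s: (s.width * s.height, s.bitrate_kbps))


hd = SimulcastStream("1", 1280, 720, 2500)
thumb = SimulcastStream("2", 320, 180, 150)
main_view = select_stream([hd, thumb], 3000, 1920, 1080)
thumbnail_view = select_stream([hd, thumb], 3000, 320, 180)
```

Here main_view is the HD stream, while thumbnail_view is the thumbnail stream even though bandwidth would allow HD, reflecting the restriction that full HD video is unnecessary when the display area is just a thumbnail.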
It is important to note that any MID SDES item from the originating media source needs to be removed and not be associated with the RTP stream's SSRC. This is because nothing in the signalling between the mixer and the receiver is structured around the originating media sources, only around the mixer's media sources. If such MID items were associated with the SSRC, the receiver would likely believe that there has been an SSRC collision, and that the RTP stream is spurious as it doesn't carry the identifiers used to relate it to the correct context. However, this is not true for CSRC values, as long as they are never used as SSRC. In these cases one could provide CNAME and MID as SDES items. A receiver could use this to determine which CSRC values are associated with the same originating media source.

If RtpStreamIds are used in this scenario, it should be noted that the RtpStreamId on a particular SSRC will change based on the actual simulcast stream selected for switching. These RtpStreamId identifiers will be local to this leg's signalling context. In addition, the defined RtpStreamIds and their parameters need to cover all the media sources and simulcast streams that can be switched into this media source.

7.2.2. Selective Forwarding Middlebox

This section discusses the behavior in cases where the RTP middlebox behaves like the Selective Forwarding Middlebox (Section 3.7) in RTP Topologies [RFC7667]. Applications for this type of RTP middlebox result in each originating media source having a corresponding media source on the leg between the middlebox and the receiver. An SFM could go as far as exposing all the simulcast streams for a media source. However, this section will focus on having a single simulcast stream that can contain any of the simulcast formats.
This section will assume that the SFM projection 1088 mechanism works on media source level, and maps one of the media 1089 source's simulcast streams onto one RTP stream from the SFM to the 1090 receiver. 1092 This usage will result in that the individual RTP stream(s) for one 1093 media source can switch between being active to paused, based on the 1094 subset of media sources the SFM wants to provide the receiver for the 1095 moment. With SFMs there exist no reasons to use CSRC to indicate the 1096 originating stream, as there is a one to one media source mapping. 1097 If the application requires knowing the simulcast version received to 1098 function well, then RtpStreamId should be negotiated on the SFM to 1099 receiver leg. Which simulcast stream that is being forwarded is not 1100 made explicit unless RtpStreamId is used on the leg. 1102 Any MID SDES items being sent by the SFM to the receiver are only 1103 those agreed between the SFM and the receiver, and no MID values from 1104 the originating side of the SFM are to be forwarded. 1106 A SFM could expose corresponding RTP streams for all the media 1107 sources and their simulcast streams, and then for any media source 1108 that is to be provided forward one selected simulcast stream. 1109 However, this is not recommended as it would unnecessarily increase 1110 the number of RTP streams and require the receiver to timely detect 1111 switching between simulcast streams. The above usage requires the 1112 same SFM functionality for switching, while avoiding the 1113 uncertainties of timely detecting that a RTP stream ends. The 1114 benefit would be that the received simulcast stream would be 1115 implicitly provided by which RTP stream would be active for a media 1116 source. However, using RtpStreamId to make this explicit also 1117 exposes which alternative format is used. The conclusion is that 1118 using one RTP stream per simulcast stream is unnecessary. 
The issue with timely detecting end of streams, independent of whether they are stopped temporarily or long term, is that there is no explicit indication that the transmission has intentionally been stopped. The RTCP-based Pause and Resume mechanism [RFC7728] includes a PAUSED indication that provides the last RTP sequence number transmitted prior to the pause. The timeliness of this solution depends on when delivery using RTCP can occur in relation to the transmission of the last RTP packet. If no explicit information is provided at all, then detection based on non-increasing RTCP SR field values and timers needs to be used to determine a pause in RTP packet delivery. As a result, when the last RTP packet arrives (if it arrives), one usually cannot determine that it is the last. That it was the last is something that one learns later.

7.3. RTP Middlebox to RTP Middlebox

This relates to the transmission of simulcast streams between RTP middleboxes or other usages where one wants to enable the delivery of multiple simultaneous simulcast streams per media source, but the transmitting entity is not the originating endpoint. For a particular direction between middlebox A and B, this looks very similar to the originating to middlebox case on a media source basis. However, in this case there are usually multiple media sources, originating from multiple endpoints. This can create situations where limitations in the number of simultaneously received media streams can arise, for example due to limitations in network bandwidth. In this case, a subset of not only the simulcast streams, but also media sources can be selected. As a result, individual RTP streams can become paused at any point and later be resumed based on various criteria.
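
One possible way to pick the subset of media sources and simulcast streams to forward when the leg between two middleboxes is bandwidth limited is sketched below. The greedy policy, the select_forwarded name, and the bitrate figures are assumptions made for illustration; this memo does not mandate any particular selection algorithm. Streams that are not selected would be paused and could be resumed when conditions change.

```python
def select_forwarded(sources, budget_kbps):
    """Greedily choose (source, stream) pairs that fit within budget_kbps.

    sources: list of (source_id, [(rid, bitrate_kbps), ...]) with the most
    important media source first and, within each source, the simulcast
    streams in priority order.
    """
    remaining = budget_kbps
    forwarded = []
    for source_id, streams in sources:
        for rid, kbps in streams:
            if kbps <= remaining:
                forwarded.append((source_id, rid))
                remaining -= kbps
            # streams that do not fit are left paused on this leg
    return forwarded


# Hypothetical input: two sources, each with an HD and a thumbnail stream.
sources = [
    ("A", [("1", 2500), ("2", 150)]),
    ("B", [("1", 2500), ("2", 150)]),
]
forwarded = select_forwarded(sources, 3000)
```

With a 3000 kbps budget, both streams of source A and only the thumbnail stream of source B are forwarded; the HD stream of source B stays paused until more bandwidth is available.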
1150 The MIDs used between A and B are the ones agreed between these two 1151 identities in signalling. The RtpStreamId values will also be 1152 provided to ensure explicit information about which simulcast stream 1153 they are. The RTP stream to MID and RtpStreamId associations should 1154 here be long term stable. 1156 8. Network Aspects 1158 Simulcast is in this memo defined as the act of sending multiple 1159 alternative encoded streams of the same underlying media source. 1160 When transmitting multiple independent streams that originate from 1161 the same source, it could potentially be done in several different 1162 ways using RTP. A general discussion on considerations for use of 1163 the different RTP multiplexing alternatives can be found in 1164 Guidelines for Multiplexing in RTP 1165 [I-D.ietf-avtcore-multiplex-guidelines]. Discussion and 1166 clarification on how to handle multiple streams in an RTP session can 1167 be found in [RFC8108]. 1169 The network aspects that are relevant for simulcast are: 1171 Quality of Service: When using simulcast it might be of interest to 1172 prioritize a particular simulcast stream, rather than applying 1173 equal treatment to all streams. For example, lower bit-rate 1174 streams may be prioritized over higher bit-rate streams to 1175 minimize congestion or packet losses in the low bit-rate streams. 1176 Thus, there is a benefit to use a simulcast solution with good QoS 1177 support. 1179 NAT/FW Traversal: Using multiple RTP sessions incurs more cost for 1180 NAT/FW traversal unless they can re-use the same transport flow, 1181 which can be achieved by Multiplexing Negotiation Using SDP Port 1182 Numbers [I-D.ietf-mmusic-sdp-bundle-negotiation]. 1184 8.1. Bitrate Adaptation 1186 Use of multiple simulcast streams can require a significant amount of 1187 network resources. 
If the amount of available network resources 1188 varies during an RTP session such that it does not match what is 1189 negotiated in SDP, the bitrate used by the different simulcast 1190 streams may have to be reduced dynamically. What simulcast streams 1191 to prioritize when allocating available bitrate among the simulcast 1192 streams in such adaptation SHOULD be taken from the simulcast stream 1193 order on the "a=simulcast" line and ordering of alternative simulcast 1194 formats Section 6.2. Simulcast streams that have pause/resume 1195 capability and that would be given such low bitrate by the adaptation 1196 process that they are considered not really useful can be temporarily 1197 paused until the limiting condition clears. 1199 9. Limitation 1201 The chosen approach has a limitation that relates to the use of a 1202 single RTP session for all simulcast formats of a media source, which 1203 comes from sending all simulcast streams related to a media source 1204 under the same SDP media description. 1206 It is not possible to use different simulcast streams on different 1207 media transports, limiting the possibilities to apply different QoS 1208 to different simulcast streams. When using unicast, QoS mechanisms 1209 based on individual packet marking are feasible, since they do not 1210 require separation of simulcast streams into different RTP sessions 1211 to apply different QoS. 1213 It is also not possible to separate different simulcast streams into 1214 different multicast groups to allow a multicast receiver to pick the 1215 stream it wants, rather than receive all of them. In this case, the 1216 only reasonable implementation is to use different RTP sessions for 1217 each multicast group so that reporting and other RTCP functions 1218 operate as intended. Such simulcast usage in multicast context is 1219 out of scope for the current document and would require additional 1220 specification. 1222 10. 
IANA Considerations

   This document requests the registration of a new media-level SDP
   attribute, "simulcast", in the "att-field (media level only)"
   registry within the SDP parameters registry, according to the
   procedures of [RFC4566] and [I-D.ietf-mmusic-sdp-mux-attributes].

   Contact name, email: IETF, contacted via mmusic@ietf.org, or a
   successor address designated by IESG

   Attribute name: simulcast

   Long-form attribute name: Simulcast stream description

   Charset dependent: No

   Attribute value: sc-value; see Section 6.1 of RFC XXXX.

   Purpose: Signals simulcast capability for a set of RTP streams

   MUX category: NORMAL

   Note to RFC Editor: Please replace "RFC XXXX" with the assigned
   number of this RFC.

11. Security Considerations

   The simulcast capability, configuration attributes, and parameters
   are vulnerable to attacks in signaling.

   A false inclusion of the "a=simulcast" attribute may result in
   simultaneous transmission of multiple RTP streams that would
   otherwise not be generated. The impact is limited by the media
   description joint bandwidth, shared by all simulcast streams
   irrespective of their number. There may however be a large number of
   unwanted RTP streams that will impact the share of bandwidth
   allocated for the originally wanted RTP stream.

   A hostile removal of the "a=simulcast" attribute will result in
   simulcast not being used.

   Neither of the above is likely to have any major consequences, and
   both can be mitigated by signaling that is at least integrity
   protected and source authenticated, which prevents an attacker from
   changing it.

   Security considerations related to the use of "a=rid" and the
   RtpStreamId SDES item are covered in [I-D.ietf-mmusic-rid] and
   [I-D.ietf-avtext-rid]. There are no additional security concerns
   related to their use in this specification.

12.
Contributors

   Morgan Lindqvist and Fredrik Jansson, both from Ericsson,
   contributed important material to the first versions of this
   document. Robert Hansen and Cullen Jennings, from Cisco, Peter
   Thatcher, from Google, and Adam Roach, from Mozilla, contributed
   significantly to subsequent versions.

13. Acknowledgements

   The authors would like to thank Bernard Aboba, Thomas Belling, Roni
   Even, Adam Roach, Inaki Baz Castillo, and Paul Kyzivat for the
   feedback they provided during the development of this document.

14. References

14.1. Normative References

   [I-D.ietf-avtext-rid]
              Roach, A., Nandakumar, S., and P. Thatcher, "RTP Stream
              Identifier Source Description (SDES)",
              draft-ietf-avtext-rid-09 (work in progress), October 2016.

   [I-D.ietf-mmusic-rid]
              Thatcher, P., Zanaty, M., Nandakumar, S., Burman, B.,
              Roach, A., and B. Campen, "RTP Payload Format
              Restrictions", draft-ietf-mmusic-rid-09 (work in
              progress), February 2017.

   [I-D.ietf-mmusic-sdp-bundle-negotiation]
              Holmberg, C., Alvestrand, H., and C. Jennings,
              "Negotiating Media Multiplexing Using the Session
              Description Protocol (SDP)",
              draft-ietf-mmusic-sdp-bundle-negotiation-36 (work in
              progress), October 2016.

   [I-D.ietf-mmusic-sdp-mux-attributes]
              Nandakumar, S., "A Framework for SDP Attributes when
              Multiplexing", draft-ietf-mmusic-sdp-mux-attributes-16
              (work in progress), December 2016.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
              July 2003.

   [RFC4566]  Handley, M., Jacobson, V., and C.
              Perkins, "SDP: Session Description Protocol", RFC 4566,
              DOI 10.17487/RFC4566, July 2006.

   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234,
              DOI 10.17487/RFC5234, January 2008.

   [RFC7728]  Burman, B., Akram, A., Even, R., and M. Westerlund, "RTP
              Stream Pause and Resume", RFC 7728, DOI 10.17487/RFC7728,
              February 2016.

14.2. Informative References

   [I-D.ietf-avtcore-multiplex-guidelines]
              Westerlund, M., Perkins, C., and H. Alvestrand,
              "Guidelines for using the Multiplexing Features of RTP to
              Support Multiple Media Streams",
              draft-ietf-avtcore-multiplex-guidelines-03 (work in
              progress), October 2014.

   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
              Handley, M., Bolot, J., Vega-Garcia, A., and S.
              Fosse-Parisis, "RTP Payload for Redundant Audio Data",
              RFC 2198, DOI 10.17487/RFC2198, September 1997.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              DOI 10.17487/RFC3264, June 2002.

   [RFC3389]  Zopf, R., "Real-time Transport Protocol (RTP) Payload for
              Comfort Noise (CN)", RFC 3389, DOI 10.17487/RFC3389,
              September 2002.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
              DOI 10.17487/RFC4588, July 2006.

   [RFC4733]  Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF
              Digits, Telephony Tones, and Telephony Signals", RFC 4733,
              DOI 10.17487/RFC4733, December 2006.

   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
              "Codec Control Messages in the RTP Audio-Visual Profile
              with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
              February 2008.

   [RFC5109]  Li, A., Ed., "RTP Payload Format for Generic Forward Error
              Correction", RFC 5109, DOI 10.17487/RFC5109, December
              2007.

   [RFC5285]  Singer, D. and H. Desineni, "A General Mechanism for RTP
              Header Extensions", RFC 5285, DOI 10.17487/RFC5285, July
              2008.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009.

   [RFC5583]  Schierl, T. and S. Wenger, "Signaling Media Decoding
              Dependency in the Session Description Protocol (SDP)",
              RFC 5583, DOI 10.17487/RFC5583, July 2009.

   [RFC6184]  Wang, Y., Even, R., Kristensen, T., and R. Jesup, "RTP
              Payload Format for H.264 Video", RFC 6184,
              DOI 10.17487/RFC6184, May 2011.

   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
              "RTP Payload Format for Scalable Video Coding", RFC 6190,
              DOI 10.17487/RFC6190, May 2011.

   [RFC6236]  Johansson, I. and K. Jung, "Negotiation of Generic Image
              Attributes in the Session Description Protocol (SDP)",
              RFC 6236, DOI 10.17487/RFC6236, May 2011.

   [RFC6464]  Lennox, J., Ed., Ivov, E., and E. Marocco, "A Real-time
              Transport Protocol (RTP) Header Extension for
              Client-to-Mixer Audio Level Indication", RFC 6464,
              DOI 10.17487/RFC6464, December 2011.

   [RFC7104]  Begen, A., Cai, Y., and H. Ou, "Duplication Grouping
              Semantics in the Session Description Protocol", RFC 7104,
              DOI 10.17487/RFC7104, January 2014.

   [RFC7656]  Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G., and
              B. Burman, Ed., "A Taxonomy of Semantics and Mechanisms
              for Real-Time Transport Protocol (RTP) Sources", RFC 7656,
              DOI 10.17487/RFC7656, November 2015.

   [RFC7667]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667,
              DOI 10.17487/RFC7667, November 2015.

   [RFC7741]  Westin, P., Lundin, H., Glover, M., Uberti, J., and F.
              Galligan, "RTP Payload Format for VP8 Video", RFC 7741,
              DOI 10.17487/RFC7741, March 2016.

   [RFC8108]  Lennox, J., Westerlund, M., Wu, Q., and C. Perkins,
              "Sending Multiple RTP Streams in a Single RTP Session",
              RFC 8108, DOI 10.17487/RFC8108, March 2017.

Appendix A. Changes From Earlier Versions

   NOTE TO RFC EDITOR: Please remove this section prior to publication.

A.1. Modifications Between WG Version -07 and -08

   o  Correcting syntax of SDP examples in section 6.6.1, as found by
      Inaki Baz Castillo.

   o  Changing ABNF to only define the sc-value, not the SDP attribute
      itself, as suggested by Paul Kyzivat.

   o  Changing I-D reference to newly published RFC 8108.

   o  Adding list of modifications between -06 and -07.

A.2. Modifications Between WG Version -06 and -07

   o  A scope clarification, as a result of the discussion with Roni
      Even.

   o  A reformulation of the identification requirements for simulcast
      streams.

   o  Correcting the statement related to source specific signalling
      (RFC 5576) to address Roni Even's comment.

   o  Update of the last paragraph in Section 6.2 regarding simulcast
      stream differences as well as forbidding multiple instances of the
      same SCID within a single a=simulcast line.

   o  Removal of note in Section 6.4 as a result of an issue raised by
      Roni Even.

   o  Use of "m=" has been changed to media description and a few other
      editorial improvements and clarifications.

A.3. Modifications Between WG Version -05 and -06

   o  Added section on RTP Aspects

   o  Added a requirement (5-4) that capability exchange must be
      capable of handling multi RTP stream cases.

   o  Added extmap attribute to the first signalling example as well,
      since it is a recommended mechanism.

   o  Clarified the definition of the simulcast attribute and how
      simulcast streams relate to simulcast formats and SCIDs.

   o  Updated References list and moved around some references between
      informative and normative categories.

   o  Editorial improvements and corrections.

A.4. Modifications Between WG Version -04 and -05

   o  Aligned with recent changes in draft-ietf-mmusic-rid and
      draft-ietf-avtext-rid.

   o  Modified the SDP offer/answer section to follow the generally
      accepted structure, also adding a brief text on modifying the
      session that is aligned with draft-ietf-mmusic-rid.

   o  Improved text around simulcast stream identification (as opposed
      to the simulcast stream itself) to consistently use the acronym
      SCID and defined that in the Terminology section.

   o  Changed references for RTP-level pause/resume and VP8 payload
      format that are now published as RFCs.

   o  Improved IANA registration text.

   o  Removed unused reference to
      draft-ietf-payload-flexible-fec-scheme.

   o  Editorial improvements and corrections.

A.5. Modifications Between WG Version -03 and -04

   o  Changed to only use RID identification, as was consensus during
      IETF 94.

   o  ABNF improvements.

   o  Clarified offer-answer rules for initially paused streams.

   o  Changed references for RTP topologies and RTP taxonomy documents
      that are now published as RFCs.

   o  Added reference to the new RID draft in AVTEXT.

   o  Re-structured section 6 to provide an easy reference by the
      updated IANA section.

   o  Added a sub-section 7.1 with a discussion of bitrate adaptation.

   o  Editorial improvements.

A.6. Modifications Between WG Version -02 and -03

   o  Removed text on multicast / broadcast from use cases, since it is
      not supported by the solution.

   o  Removed explicit references to unified plan draft.

   o  Added possibility to initiate simulcast streams in paused mode.

   o  Enabled an offerer to offer multiple stream identification (pt or
      rid) methods and have the answerer choose which to use.

   o  Added a preference indication also in send direction offers.

   o  Added a section on limitations of the current proposal, including
      identification method specific limitations.

A.7. Modifications Between WG Version -01 and -02

   o  Relying on the new RID solution for codec constraints and
      configuration identification. This has resulted in changes in
      syntax to identify if pt or RID is used to describe the simulcast
      stream.

   o  Renamed simulcast version and simulcast version alternative to
      simulcast stream and simulcast format respectively, and improved
      definitions for them.

   o  Clarification that it is possible to switch between simulcast
      version alternatives, but that only a single one is used at any
      point in time.

   o  Changed the definition so that the ordering of simulcast formats
      for a specific simulcast stream does have a preference order.

A.8. Modifications Between WG Version -00 and -01

   o  No changes. Only preventing expiry.

A.9. Modifications Between Individual Version -00 and WG Version -00

   o  Added this appendix.

Authors' Addresses
   Bo Burman
   Ericsson
   Gronlandsgatan 31
   SE-164 60 Stockholm
   Sweden

   Email: bo.burman@ericsson.com

   Magnus Westerlund
   Ericsson
   Farogatan 2
   SE-164 80 Stockholm
   Sweden

   Phone: +46 10 714 82 87
   Email: magnus.westerlund@ericsson.com

   Suhas Nandakumar
   Cisco
   170 West Tasman Drive
   San Jose, CA 95134
   USA

   Email: snandaku@cisco.com

   Mo Zanaty
   Cisco
   170 West Tasman Drive
   San Jose, CA 95134
   USA

   Email: mzanaty@cisco.com