idnits 2.17.1

draft-ietf-mmusic-sdp-simulcast-06.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (October 31, 2016) is 2731 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-15) exists of
     draft-ietf-mmusic-rid-08

  == Outdated reference: A later version (-54) exists of
     draft-ietf-mmusic-sdp-bundle-negotiation-36

  == Outdated reference: A later version (-19) exists of
     draft-ietf-mmusic-sdp-mux-attributes-14

  ** Obsolete normative reference: RFC 4566 (Obsoleted by RFC 8866)

  == Outdated reference: A later version (-12) exists of
     draft-ietf-avtcore-multiplex-guidelines-03

  -- Obsolete informational reference (is this intentional?): RFC 5285
     (Obsoleted by RFC 8285)

     Summary: 1 error (**), 0 flaws (~~), 5 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network Working Group                                          B. Burman
Internet-Draft                                             M. Westerlund
Intended status: Standards Track                                Ericsson
Expires: May 4, 2017                                       S. Nandakumar
                                                               M. Zanaty
                                                                   Cisco
                                                        October 31, 2016


                Using Simulcast in SDP and RTP Sessions
                   draft-ietf-mmusic-sdp-simulcast-06

Abstract

   In some application scenarios it may be desirable to send multiple
   differently encoded versions of the same media source in different
   RTP streams. This is called simulcast. This document describes how
   to accomplish simulcast in RTP and how to signal it in SDP. The
   described solution uses an RTP/RTCP identification method to identify
   RTP streams belonging to the same media source, and makes an
   extension to SDP to relate those RTP streams as being different
   simulcast formats of that media source. The SDP extension consists
   of a new media level SDP attribute that expresses capability to send
   and/or receive simulcast RTP streams.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 4, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions
     2.1.  Terminology
     2.2.  Requirements Language
   3.  Use Cases
     3.1.  Reaching a Diverse Set of Receivers
     3.2.  Application Specific Media Source Handling
     3.3.  Receiver Media Source Preferences
   4.  Requirements
   5.  Overview
   6.  Detailed Description
     6.1.  Simulcast Attribute
     6.2.  Simulcast Capability
     6.3.  Offer/Answer Use
       6.3.1.  Generating the Initial SDP Offer
       6.3.2.  Creating the SDP Answer
       6.3.3.  Offerer Processing the SDP Answer
       6.3.4.  Modifying the Session
     6.4.  Use with Declarative SDP
     6.5.  Relating Simulcast Streams
     6.6.  Signaling Examples
       6.6.1.  Single-Source Client
       6.6.2.  Multi-Source Client
   7.  RTP Aspects
     7.1.  Outgoing from Endpoint with Media Source
     7.2.  RTP Middlebox to Receiver
       7.2.1.  Media-Switching Mixer
       7.2.2.  Selective Forwarding Middlebox
     7.3.  RTP Middlebox to RTP Middlebox
   8.  Network Aspects
     8.1.  Bitrate Adaptation
   9.  Limitation
   10. IANA Considerations
   11. Security Considerations
   12. Contributors
   13. Acknowledgements
   14. References
     14.1.  Normative References
     14.2.  Informative References
   Appendix A.  Changes From Earlier Versions
     A.1.  Modifications Between WG Version -05 and -06
     A.2.  Modifications Between WG Version -04 and -05
     A.3.  Modifications Between WG Version -03 and -04
     A.4.  Modifications Between WG Version -02 and -03
     A.5.  Modifications Between WG Version -01 and -02
     A.6.  Modifications Between WG Version -00 and -01
     A.7.  Modifications Between Individual Version -00 and WG
           Version -00
   Authors' Addresses

1.  Introduction

   Most of today's multiparty video conference solutions make use of
   centralized servers to reduce the bandwidth and CPU consumption in
   the endpoints.
   Those servers receive RTP streams from each
   participant and send some suitable set of possibly modified RTP
   streams to the rest of the participants, which usually have
   heterogeneous capabilities (screen size, CPU, bandwidth, codec, etc.).
   One of the biggest issues is how to perform RTP stream adaptation to
   different participants' constraints with the minimum possible impact
   on both video quality and server performance.

   Simulcast is defined in this memo as the act of simultaneously
   sending multiple different encoded streams of the same media source,
   e.g. the same video source encoded with different video encoder types
   or image resolutions. This can be done in several ways and for
   different purposes. This document focuses on the case where it is
   desirable to provide a media source as multiple encoded streams over
   RTP [RFC3550] towards an intermediary so that the intermediary can
   provide the wanted functionality by selecting which RTP stream(s) to
   forward to other participants in the session, and more specifically
   how the identification and grouping of the involved RTP streams are
   done.

   This document describes a few scenarios that motivate the use of
   simulcast, and also defines the needed RTP/RTCP and SDP signaling for
   it.

2.  Definitions

2.1.  Terminology

   This document makes use of the terminology defined in RTP Taxonomy
   [RFC7656], and RTP Topologies [RFC7667]. The following terms are
   especially noted or here defined:

   RTP Mixer:  An RTP middle node, defined in [RFC7667] (Section 3.6 to
      3.9).

   RTP Switch:  A common short term for the terms "switching RTP mixer",
      "source projecting middlebox", and "video switching MCU" as
      discussed in [RFC7667].

   Simulcast Stream:  One encoded stream or dependent stream from a set
      of concurrently transmitted encoded streams and optional dependent
      streams, all sharing a common media source, as defined in
      [RFC7656]. For example, HD and thumbnail video simulcast versions
      of a single media source sent concurrently as separate RTP
      Streams.

   Simulcast Format:  Different formats of a simulcast stream serve the
      same purpose as alternative RTP payload types in non-simulcast
      SDP: to allow multiple alternative media formats for a given RTP
      stream. As for multiple RTP payload types on the m-line in offer/
      answer [RFC3264], any one of the negotiated alternative formats
      can be used in a single RTP stream at a given point in time, but
      not more than one (based on RTP timestamp). What format is used
      can change dynamically from one RTP packet to another.

   Simulcast Stream Identifier (SCID):  The identification value used to
      refer to an individual simulcast format, identical to the "rid-id"
      identification value for an RTP Payload Format Restriction
      [I-D.ietf-mmusic-rid] and the corresponding content of
      "RtpStreamId" RTCP SDES Item [I-D.ietf-avtext-rid].

2.2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Use Cases

   Many use cases of simulcast as described in this document relate to a
   multi-party communication session where one or more central nodes are
   used to adapt the view of the communication session towards
   individual participants, and facilitate the media transport between
   participants. Thus, these cases target the RTP Mixer type of
   topology.

   There are two principal approaches for an RTP Mixer to provide this
   adapted view of the communication session to each receiving
   participant:

   o  Transcoding (decoding and re-encoding) received RTP streams with
      characteristics adapted to each receiving participant. This often
      includes mixing or composition of media sources from multiple
      participants into a mixed media source originated by the RTP
      Mixer. The main advantage of this approach is that it achieves
      close to optimal adaptation to individual receiving participants.
      The main disadvantages are that it can be very computationally
      expensive to the RTP Mixer, typically degrades media Quality of
      Experience (QoE) such as end-to-end delay for the receiving
      participants, and requires RTP Mixer access to media content.

   o  Switching a subset of all received RTP streams or sub-streams to
      each receiving participant, where the used subset is typically
      specific to each receiving participant. The main advantages of
      this approach are that it is computationally cheap to the RTP
      Mixer, has very limited impact on media QoE, and does not require
      RTP Mixer (full) access to media content. The main disadvantage
      is that it can be difficult to combine a subset of received RTP
      streams into a perfect fit to the resource situation of a
      receiving participant.

   The use of simulcast relates to the latter approach, where it is more
   important to reduce the load on the RTP Mixer and/or minimize QoE
   impact than to achieve an optimal adaptation of resource usage.

3.1.  Reaching a Diverse Set of Receivers

   The media sources provided by a sending participant potentially need
   to reach several receiving participants that differ in terms of
   available resources.
   The receiver resources that typically differ
   include, but are not limited to:

   Codec:  This includes codec type (such as SDP MIME type) and can
      include codec configuration options (e.g. SDP fmtp parameters).
      Two codec resources that differ only in codec configuration are
      considered "different" if they are not "compatible", for example
      if they differ in video codec profile or in the transport
      packetization configuration.

   Sampling:  This relates to how the media source is sampled, in
      spatial as well as in temporal domain. For video streams, spatial
      sampling affects image resolution and temporal sampling affects
      video frame rate. For audio, spatial sampling relates to the
      number of audio channels and temporal sampling affects audio
      bandwidth. This may be used to suit different rendering
      capabilities or needs at the receiving endpoints, as well as a
      method to achieve different transport capabilities, bitrates and
      eventually QoE by controlling the amount of source data.

   Bitrate:  This relates to the number of bits spent per second to
      transmit the media source as an RTP stream, which typically also
      affects the Quality of Experience (QoE) for the receiving user.

   Letting the sending participant create a simulcast of a few
   differently configured RTP streams per media source can be a good
   tradeoff when using an RTP switch as middlebox, instead of sending a
   single RTP stream and using an RTP mixer to create individual
   transcodings to each receiving participant.

   This requires that the receiving participants can be categorized in
   terms of available resources and that the sending participant can
   choose a matching configuration for a single RTP stream per category
   and media source.

   For example, assume for simplicity a set of receiving participants
   that differ only in that some have support to receive Codec A, and
   the others have support to receive Codec B.
   Further assume that the
   sending participant can send both Codec A and B. It can then reach
   all receivers by creating two simulcasted RTP streams from each media
   source; one for Codec A and one for Codec B.

   In another simple example, a set of receiving participants differ
   only in screen resolution; some are able to display video with at
   most 360p resolution and some support 720p resolution. A sending
   participant can then reach all receivers with best possible
   resolution by creating a simulcast of RTP streams with 360p and 720p
   resolution for each sent video media source.

   In more elaborate cases, the receiving participants differ both in
   available sampling and bitrate, and maybe also codec, and it is up to
   the RTP switch to find a good trade-off in which simulcasted stream
   to choose for each intended receiver. It is also the responsibility
   of the RTP switch to negotiate a good fit of simulcast streams with
   the sending participant.

   The maximum number of simulcasted RTP streams that can be sent is
   mainly limited by the amount of processing and uplink network
   resources available to the sending participant.

3.2.  Application Specific Media Source Handling

   The application logic that controls the communication session may
   include special handling of some media sources. It is, for example,
   commonly the case that the media from a sending participant is not
   sent back to itself.

   It is also common that a currently active speaker participant is
   shown in larger size or higher quality than other participants (the
   sampling or bitrate aspects of Section 3.1). Not sending the active
   speaker media back to itself means there is some other participant's
   media that instead has to receive special handling towards the active
   speaker; typically the previous active speaker.
   This way, the
   previously active speaker is needed both in larger size (to current
   active speaker) and in small size (to the rest of the participants),
   which can be solved with a simulcast from the previously active
   speaker to the RTP switch.

3.3.  Receiver Media Source Preferences

   The application logic that controls the communication session may
   allow receiving participants to apply preferences to the
   characteristics of the RTP stream they receive, for example in terms
   of the aspects listed in Section 3.1. Sending a simulcast of RTP
   streams is one way of accommodating receivers with conflicting or
   otherwise incompatible preferences.

4.  Requirements

   The following requirements need to be met to support the use cases in
   previous sections:

   REQ-1:  Identification. It must be possible to identify a set of
      simulcasted RTP streams as originating from the same media source:

      REQ-1.1:  In SDP signaling.

      REQ-1.2:  On RTP/RTCP level, at least with prior knowledge of SDP
         (or similar) signaling.

   REQ-2:  Transport usage. The solution must work when using:

      REQ-2.1:  Legacy SDP with separate media transports per SDP media
         description.

      REQ-2.2:  Bundled [I-D.ietf-mmusic-sdp-bundle-negotiation] SDP
         media descriptions.

   REQ-3:  Capability negotiation. It must be possible that:

      REQ-3.1:  Sender can express capability of sending simulcast.

      REQ-3.2:  Receiver can express capability of receiving simulcast.

      REQ-3.3:  Sender can express maximum number of simulcast streams
         that can be provided.

      REQ-3.4:  Receiver can express maximum number of simulcast streams
         that can be received.

      REQ-3.5:  Sender can detail the characteristics of the simulcast
         streams that can be provided.

      REQ-3.6:  Receiver can detail the characteristics of the simulcast
         streams that it prefers to receive.

   REQ-4:  Distinguishing features.
      It must be possible to have
      different simulcast streams use different codec parameters, as can
      be expressed by SDP format values and RTP payload types.

   REQ-5:  Compatibility. It must be possible to use simulcast in
      combination with other RTP mechanisms that generate additional RTP
      streams:

      REQ-5.1:  RTP Retransmission [RFC4588].

      REQ-5.2:  RTP Forward Error Correction [RFC5109].

      REQ-5.3:  Related payload types such as audio Comfort Noise and/or
         DTMF.

      REQ-5.4:  A single simulcast stream can consist of multiple RTP
         streams, to support codecs where a dependent stream is
         dependent on a set of encoded and dependent streams, each
         potentially carried in their own RTP stream.

   REQ-6:  Interoperability. The solution must be possible to use in:

      REQ-6.1:  Interworking with non-simulcast legacy clients using a
         single media source per media type.

      REQ-6.2:  WebRTC environment with a single media source per SDP
         media description.

5.  Overview

   As an overview, the above requirements are met by signaling simulcast
   capability and configurations in SDP [RFC4566]:

   o  An offer or answer can contain a number of simulcast streams,
      separate for send and receive directions.

   o  An offer or answer can contain multiple, alternative simulcast
      stream formats in the same fashion as multiple, alternative
      formats can be offered in a media description.

   o  A single media source per SDP media description is assumed, which
      is aligned with the concepts defined in [RFC7656] and will
      specifically work in a WebRTC context, both with and without
      BUNDLE [I-D.ietf-mmusic-sdp-bundle-negotiation] grouping.

   o  The codec configuration for a simulcast stream is expressed
      through use of separately specified RTP payload format
      restrictions [I-D.ietf-mmusic-rid] with an associated RTP-level
      identification mechanism [I-D.ietf-avtext-rid] to identify which
      RTP payload format restrictions an RTP stream adheres to. This
      complements and effectively extends simulcast stream
      identification and configuration possibilities that could be
      provided by using only SDP formats as identifier. Use of multiple
      RTP streams with the same (non-redundancy) media type in the
      context of a single media source, where those RTP streams are
      using different RtpStreamId, is a strong but not totally
      unambiguous indication of those RTP streams being part of a
      simulcast.

   o  It is possible, but not required to use source-specific signaling
      [RFC5576] with the proposed solution.

6.  Detailed Description

   This section further details the overview above (Section 5). First,
   formal syntax is provided (Section 6.1), followed by the rest of the
   SDP attribute definition in Section 6.2. Relating Simulcast Streams
   (Section 6.5) provides the definition of the RTP/RTCP mechanisms
   used. The section is concluded with a number of examples.

6.1.  Simulcast Attribute

   This document defines a new SDP media-level "a=simulcast" attribute
   with the following ABNF [RFC5234] syntax:

   sc-attr      = "a=simulcast:" sc-value
   sc-value     = sc-str-list [SP sc-str-list]
   sc-str-list  = sc-dir SP sc-alt-list *( ";" sc-alt-list )
   sc-dir       = "send" / "recv"
   sc-alt-list  = sc-id *( "," sc-id )
   sc-id-paused = "~"
   sc-id        = [sc-id-paused] rid-identifier
   ; SP defined in [RFC5234]
   ; rid-identifier defined in [I-D.ietf-mmusic-rid]

                      Figure 1: ABNF for Simulcast

   The "a=simulcast" attribute has a parameter in the form of one or two
   simulcast stream descriptions, each consisting of a direction ("send"
   or "recv"), followed by a list of one or more simulcast streams.
   Each simulcast stream consists of one or more alternative simulcast
   formats. Each simulcast format is identified by a simulcast stream
   identification (SCID). The SCID MUST have the form of an RTP stream
   identifier, as described by RTP Payload Format Restrictions
   [I-D.ietf-mmusic-rid].

   In the list of simulcast streams, each simulcast stream is separated
   by a semicolon (";"). Each simulcast stream can in turn be offered
   in one or more alternative formats, represented by SCIDs, separated
   by a comma (","). Each SCID can also be specified as initially
   paused [RFC7728], indicated by prepending a "~" to the SCID. The
   reason to allow separate initial pause states for each SCID is that
   pause capability can be specified individually for each RTP payload
   type referenced by an SCID. Since pause capability specified via the
   "a=rtcp-fb" attribute and SCID specified by "a=rid" can refer to
   common payload types, it is unfeasible to pause streams with SCID
   where any of the related RTP payload type(s) do not have pause
   capability.
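
   As a non-normative illustration of the syntax in Figure 1, the
   following Python sketch parses a single "a=simulcast" line into its
   directions, simulcast streams and alternative SCIDs. The names
   ScId and parse_simulcast are hypothetical, and the character set
   accepted for rid-identifier (letters, digits, "-" and "_") is an
   assumption made for this sketch; the authoritative definition is in
   [I-D.ietf-mmusic-rid].

```python
import re
from dataclasses import dataclass

# Assumed rid-identifier character set (not taken from this document).
_RID = re.compile(r"[A-Za-z0-9_-]+")


@dataclass(frozen=True)
class ScId:
    """One alternative simulcast format: an SCID and its initial pause flag."""
    rid: str
    paused: bool = False


def parse_simulcast(line):
    """Parse one "a=simulcast:" line into {direction: [[ScId, ...], ...]}.

    The outer list holds simulcast streams (";"-separated), the inner
    list holds the alternative formats of one stream (","-separated).
    """
    prefix = "a=simulcast:"
    if not line.startswith(prefix):
        raise ValueError("not an a=simulcast line")
    tokens = line[len(prefix):].split(" ")
    if len(tokens) not in (2, 4):
        raise ValueError("expected one or two direction/list pairs")
    result = {}
    for direction, str_list in zip(tokens[0::2], tokens[1::2]):
        if direction not in ("send", "recv"):
            raise ValueError("unknown direction: " + direction)
        if direction in result:
            raise ValueError("direction listed twice: " + direction)
        streams = []
        for alt_list in str_list.split(";"):
            alternatives = []
            for item in alt_list.split(","):
                paused = item.startswith("~")
                rid = item[1:] if paused else item
                if not _RID.fullmatch(rid):
                    raise ValueError("bad SCID: " + repr(item))
                alternatives.append(ScId(rid, paused))
            streams.append(alternatives)
        result[direction] = streams
    return result
```

   The sketch rejects a line that repeats a direction, which matches
   the rule in Section 6.2 that each direction occurs at most once per
   line.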

   Examples:

   a=simulcast:send 1,2,3;~4,~5 recv 6;~7,~8
   a=simulcast:recv 1;4,5 send 6;7

                      Figure 2: Simulcast Examples

   Above are two examples of different "a=simulcast" lines.

   The first line is an example offer to send two simulcast streams and
   to receive two simulcast streams. The first simulcast stream in send
   direction can be sent in three different alternative formats (SCID 1,
   2, 3), and the second simulcast stream in send direction can be sent
   in two different alternative formats (SCID 4, 5). Both of the second
   simulcast stream alternative formats in send direction are offered as
   initially paused. The first simulcast stream in receive direction
   has no alternative formats (SCID 6). The second simulcast stream in
   receive direction has two alternative formats (SCID 7, 8) that are
   both offered as initially paused.

   The second line is an example answer to the first line, accepting to
   send and receive the two offered simulcast streams. The send and
   receive directions are specified in opposite order compared to the
   first line, which lets the answer keep the same order of simulcast
   streams in the SDP as in the offer, for convenience, even though
   directionality is reversed. This example answer has removed all
   offered alternative formats for the first simulcast stream (keeping
   only SCID 1), but kept alternative formats for the second simulcast
   stream in receive direction (4, 5). The answer thus accepts to send
   two simulcast streams, without alternatives. The answer does not
   accept initial pause of any simulcast streams, in either direction.
   More examples can be found in Section 6.6.

6.2.  Simulcast Capability

   Simulcast capability is expressed through a new media level SDP
   attribute, "a=simulcast" (Section 6.1).
   The meaning of the attribute
   on SDP session level is undefined, MUST NOT be used by
   implementations of this specification and MUST be ignored if received
   on session level. Extensions to this specification MAY define such
   session level usage. The meaning of including multiple "a=simulcast"
   lines in a single SDP media description is undefined, MUST NOT be
   used by implementations of this specification and any additional
   "a=simulcast" lines beyond the first under an "m=" line MUST be
   ignored if received.

   There are separate and independent sets of simulcast streams in send
   and receive directions. When listing multiple directions, each
   direction MUST NOT occur more than once on the same line.

   Simulcast streams using undefined SCID MUST NOT be used as valid
   simulcast streams by an RTP stream receiver. The direction for an
   SCID MUST be aligned with the direction specified for the
   corresponding RTP stream identifier on the "a=rid" line.

   The listed number of simulcast streams for a direction sets a limit
   to the number of supported simulcast streams in that direction. The
   order of the listed simulcast streams in the "send" direction
   suggests a proposed order of preference, in decreasing order: the
   SCID listed first is the most preferred and subsequent streams have
   progressively lower preference. The order of the listed SCID in the
   "recv" direction expresses which simulcast streams are preferred,
   with the leftmost being most preferred. This can be of importance
   if the number of actually sent simulcast streams has to be reduced
   for some reason.

   SCID that have explicit dependencies [RFC5583] [I-D.ietf-mmusic-rid]
   to other SCID (even in the same media description) MAY be used.
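
   The handling of session-level and repeated "a=simulcast" lines
   described at the start of this section can be sketched as follows.
   This is a non-normative Python illustration; the function name
   effective_simulcast_lines is hypothetical, and the sketch assumes
   the SDP is available as plain text with one SDP line per text line.

```python
def effective_simulcast_lines(sdp):
    """Return one entry per "m=" section of the SDP text.

    Each entry is the first "a=simulcast:" line found under that "m="
    line, or None when the media description has none. Session-level
    occurrences and any additional lines under the same "m=" line are
    ignored, as a receiver is required to do.
    """
    per_media = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # A new media description starts; nothing selected yet.
            per_media.append(None)
        elif line.startswith("a=simulcast:"):
            if not per_media:
                continue  # session level: ignored
            if per_media[-1] is None:
                per_media[-1] = line
            # otherwise: a repeated line under the same "m=", ignored
    return per_media
```

   Returning None for a media description without the attribute lets
   the caller treat that media description as not using simulcast.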

   Use of more than a single, alternative simulcast format for a
   simulcast stream MAY be specified as part of the attribute parameters
   by expressing the simulcast stream as a comma-separated list of
   alternative SCID. In this case, it is not possible to align which
   alternative SCID are used across different simulcast streams,
   like requiring all simulcast streams to use SCID alternatives
   referring to the same codec format. The order of the SCID
   alternatives within a simulcast stream is significant; the SCID
   alternatives are listed from (left) most preferred to (right) least
   preferred. For the use of simulcast, this overrides the normal codec
   preference as expressed by format type ordering on the "m=" line,
   using regular SDP rules. This is to enable a separation of general
   codec preferences and simulcast stream configuration preferences.

   A simulcast stream can use a codec defined such that the same RTP
   SSRC can change RTP payload type multiple times during a session,
   possibly even on a per-packet basis. A typical example can be a
   speech codec that makes use of Comfort Noise [RFC3389] and/or DTMF
   [RFC4733] formats. In those cases, such "related" formats MUST NOT
   be defined as having their own SCID listed explicitly in the
   attribute parameters, since they are not strictly simulcast streams
   of the media source, but rather a specific way of generating the RTP
   stream of a single simulcast stream with varying RTP payload type.

   If RTP stream pause/resume [RFC7728] is supported, any SCID MAY be
   prefixed by a "~" character to indicate that the corresponding
   simulcast stream is initially paused already from start of the RTP
   session. In this case, support for RTP stream pause/resume MUST also
   be included under the same "m=" line where "a=simulcast" is included.
   All RTP payload types related to such an initially paused simulcast
   stream MUST be listed in the SDP as pause/resume capable as specified
   by [RFC7728], e.g. by using the "*" wildcard format for "a=rtcp-fb".

   An initially paused simulcast stream in "send" direction MUST be
   considered equivalent to an unsolicited locally paused stream, and be
   handled accordingly. Initially paused simulcast streams are resumed
   as described by the RTP pause/resume specification. An RTP stream
   receiver that wishes to resume an unsolicited locally paused stream
   needs to know the SSRC of that stream. The SSRC of an initially
   paused simulcast stream can be obtained from an RTP stream sender
   RTCP Sender Report (SR) including both the desired SSRC as "SSRC of
   sender", and the SCID value in an RtpStreamId RTCP SDES item
   [I-D.ietf-avtext-rid].

   Including an initially paused simulcast stream in "recv" direction in
   an SDP towards an RTP sender SHOULD cause the remote RTP sender to
   put the stream as unsolicited locally paused, unless there are other
   RTP stream receivers that do not mark the simulcast stream as
   initially paused. The reason to require an initially paused "recv"
   stream to be considered locally paused by the remote RTP sender,
   instead of making it equivalent to implicitly sending a pause
   request, is that the pausing RTP sender cannot know which
   receiving SSRC owns the restriction when TMMBR/TMMBN are used for
   pause/resume signaling since the RTP receiver's SSRC in send
   direction is sometimes not yet known.

   Use of the redundant audio data [RFC2198] format could be seen as a
   form of simulcast for loss protection purposes, but is not considered
   conflicting with the mechanisms described in this memo and MAY
   therefore be used as any other format.
   In this case the "red"
   format, rather than the carried formats, SHOULD be the one to list as
   a simulcast stream on the "a=simulcast" line.

   The media formats and corresponding characteristics of simulcast
   streams SHOULD be chosen such that they are different, either as
   different SDP formats with differing "a=rtpmap" and/or "a=fmtp"
   lines, as differently defined RTP payload format restrictions, or
   both. If this difference is not required, RTP duplication [RFC7104]
   procedures SHOULD be considered instead of simulcast.

6.3.  Offer/Answer Use

      Note: The inclusion of "a=simulcast" or the use of simulcast does
      not change any of the interpretation or Offer/Answer procedures
      for other SDP attributes, like "a=fmtp" or "a=rid".

6.3.1.  Generating the Initial SDP Offer

   An offerer wanting to use simulcast SHALL include the "a=simulcast"
   attribute in the offer. An offerer listing a set of receive
   simulcast streams and/or alternative formats as SCID in the offer
   MUST be prepared to receive RTP streams for any of those simulcast
   streams and/or alternative formats from the answerer.

6.3.2.  Creating the SDP Answer

   An answerer that does not understand the concept of simulcast will
   also not know the attribute and will remove it in the SDP answer, as
   defined in existing SDP Offer/Answer [RFC3264] procedures.
   Similarly, an answerer that receives an offer with the "a=simulcast"
   attribute on session level SHALL remove it in the answer. An
   answerer that understands the attribute but receives multiple
   "a=simulcast" attributes under the same "m=" line SHALL ignore and
   remove all but the first in the answer.

   An answerer that does understand the attribute and that wants to
   support simulcast in an indicated direction SHALL reverse
   directionality of the unidirectional direction parameters; "send"
   becomes "recv" and vice versa, and include it in the answer.
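
   The direction reversal described above can be sketched in a few
   lines of Python. This is a non-normative illustration; the function
   name reverse_directions is hypothetical, and its input is assumed to
   be the attribute value, i.e. the text after "a=simulcast:".

```python
def reverse_directions(value):
    """Swap "send" and "recv" in an a=simulcast attribute value.

    The order of the direction/list pairs is kept, so the answer lists
    the simulcast streams in the same order as the offer.
    """
    swap = {"send": "recv", "recv": "send"}
    tokens = value.split(" ")
    if len(tokens) not in (2, 4):
        raise ValueError("expected one or two direction/list pairs")
    reversed_tokens = []
    for direction, str_list in zip(tokens[0::2], tokens[1::2]):
        if direction not in swap:
            raise ValueError("unknown direction: " + direction)
        reversed_tokens += [swap[direction], str_list]
    return " ".join(reversed_tokens)
```

   Applied to the offer value in Figure 2, this yields
   "recv 1,2,3;~4,~5 send 6;~7,~8", which is the starting point from
   which an answerer can then remove undesired alternatives, streams or
   initial pause markers.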
   An answerer that receives an offer with simulcast containing an
   "a=simulcast" attribute listing alternative SCID MAY keep all the
   alternative SCID in the answer, but it MAY also choose to remove any
   non-desirable alternative SCID in the answer. The answerer MUST NOT
   add any alternative SCID in send direction in the answer that were
   not present in the offer receive direction. The answerer MUST be
   prepared to receive any of the receive direction SCID alternatives,
   and MAY send any of the send direction alternatives that are kept in
   the answer.

   An answerer that receives an offer with simulcast that lists a number
   of simulcast streams, MAY reduce the number of simulcast streams in
   the answer, but MUST NOT add simulcast streams.

   An answerer that receives an offer without RTP stream pause/resume
   capability MUST NOT mark any simulcast streams as initially paused in
   the answer.

   An RTP stream pause/resume capable answerer that receives an offer
   with RTP stream pause/resume capability MAY mark any SCID that refer
   to pause/resume capable formats as initially paused in the answer.

   An answerer that receives indication in an offer of an SCID being
   initially paused SHOULD mark that SCID as initially paused also in
   the answer, regardless of direction, unless it has good reason for
   the SCID not being initially paused. One such reason could, for
   example, be that the answerer would otherwise initially not receive
   any media of that type at all.

6.3.3.  Offerer Processing the SDP Answer

   An offerer that receives an answer without "a=simulcast" MUST NOT use
   simulcast towards the answerer. An offerer that receives an answer
   with "a=simulcast" without any SCID in a specified direction MUST NOT
   use simulcast in that direction.
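   As an illustration of the "MUST NOT add" rules in Section 6.3.2, an
   offerer could sanity-check a received answer as sketched below. This
   is a simplified sketch and not a normative procedure: the function
   names are hypothetical, the value syntax (";" between simulcast
   streams, "," between alternatives, "~" marking initially paused) is
   inferred from the examples in Section 6.6, and the check is applied
   symmetrically to both directions as a simplifying assumption.

```python
# Hypothetical helpers, for illustration only.
OPPOSITE = {"send": "recv", "recv": "send"}


def scids_by_direction(value: str) -> dict:
    """Map each direction to a list of streams, each a list of SCIDs."""
    tokens = value.split()
    result = {}
    for direction, streams in zip(tokens[0::2], tokens[1::2]):
        result[direction] = [
            [alt.lstrip("~") for alt in stream.split(",")]
            for stream in streams.split(";")
        ]
    return result


def answer_is_consistent(offer_value: str, answer_value: str) -> bool:
    """Check that the answer adds no streams or SCIDs beyond the offer."""
    offer = scids_by_direction(offer_value)
    answer = scids_by_direction(answer_value)
    for direction, streams in answer.items():
        # The answer's "recv" corresponds to the offer's "send" and
        # vice versa.
        offered = offer.get(OPPOSITE[direction], [])
        if len(streams) > len(offered):
            return False  # the answer added simulcast streams
        offered_ids = {scid for stream in offered for scid in stream}
        for stream in streams:
            if any(scid not in offered_ids for scid in stream):
                return False  # the answer added an SCID
    return True
```

   With the offer value "send 1;2 recv 3", the answer values
   "recv 1;2 send 3" and "recv 1 send 3" pass this check, while
   "recv 1;2;4 send 3" does not.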
   An offerer that receives an answer where some SCID alternatives are
   kept MUST be prepared to receive any of the kept send direction SCID
   alternatives, and MAY send any of the kept receive direction SCID
   alternatives.

   An offerer that receives an answer where some of the SCID are removed
   compared to the offer MAY release the corresponding resources (codec,
   transport, etc.) in its receive direction and MUST NOT send any RTP
   packets corresponding to the removed SCID.

   An offerer that offered some of its SCID as initially paused and that
   receives an answer that does not indicate RTP stream pause/resume
   capability MUST NOT initially pause any simulcast streams.

   An offerer with RTP stream pause/resume capability that receives an
   answer where some SCID are marked as initially paused SHOULD
   initially pause those RTP streams regardless of whether they were
   marked as initially paused also in the offer, unless it has good
   reason for those RTP streams not being initially paused. One such
   reason could, for example, be that the answerer would otherwise
   initially not receive any media of that type at all.

6.3.4.  Modifying the Session

   Offers and answers inside an existing session follow the rules for
   initial session negotiation, with the additional restriction that any
   SCID marked as initially paused in such offer or answer MUST already
   be paused, thus a new offer/answer MUST NOT replace use of RTP stream
   pause/resume [RFC7728] in the session. Session modification
   restrictions in section 6.5 of RTP payload format restrictions
   [I-D.ietf-mmusic-rid] also apply.

6.4.  Use with Declarative SDP

   This document does not define the use of "a=simulcast" in declarative
   SDP, partly motivated by use of the simulcast format identification
   [I-D.ietf-mmusic-rid] not being defined for use in declarative SDP.
   If concrete use cases for simulcast in declarative SDP are identified
   in the future, we expect that additional specifications will address
   such use.

      Note: It may not be beneficial for declarative use to be limited
      to a single media source per "m=" line, as elaborated further in
      Section 9.

6.5.  Relating Simulcast Streams

   Simulcast RTP streams MUST be related on RTP level through
   RtpStreamId [I-D.ietf-avtext-rid], as specified in the SDP
   "a=simulcast" attribute (Section 6.2) parameters. This is sufficient
   as long as there is only a single media source per SDP media
   description. When using BUNDLE
   [I-D.ietf-mmusic-sdp-bundle-negotiation], where multiple SDP media
   descriptions jointly specify a single RTP session, the SDES MID
   identification mechanism in BUNDLE allows relating RTP streams back
   to individual media descriptions, after which the above described
   RtpStreamId relations can be used. Use of the RTP header extension
   [RFC5285] for both MID and RtpStreamId identifications can be
   important to ensure rapid initial reception, required to correctly
   interpret and process the RTP streams. Implementers of this
   specification MUST support the RTCP source description (SDES) item
   method and SHOULD support RTP header extension method to signal
   RtpStreamId on RTP level.

   RTP streams MUST only use a single alternative SCID at a time (based
   on RTP timestamps), but MAY change format on a per-RTP packet basis.
   This corresponds to the existing (non-simulcast) SDP offer/answer
   case when multiple formats are included on the "m=" line in the SDP
   answer.

6.6.  Signaling Examples

   These examples describe a client to video conference service, using a
   centralized media topology with an RTP mixer.

                +---+      +-----------+      +---+
                | A |<---->|           |<---->| B |
                +---+      |           |      +---+
                           |   Mixer   |
                +---+      |           |      +---+
                | F |<---->|           |<---->| J |
                +---+      +-----------+      +---+

             Figure 3: Four-party Mixer-based Conference

6.6.1.  Single-Source Client

   Alice is calling in to the mixer with a simulcast-enabled client
   capable of a single media source per media type. The client can send
   a simulcast of 2 video resolutions and frame rates: HD 1280x720p
   30fps and thumbnail 320x180p 15fps. This is defined below using the
   "imageattr" [RFC6236]. In this example, only the "pt" "a=rid"
   parameter is used, effectively achieving a 1:1 mapping between
   RtpStreamId and media formats (RTP payload types), to describe
   simulcast stream formats. Alice's Offer:

   v=0
   o=alice 2362969037 2362969040 IN IP4 192.0.2.156
   s=Simulcast Enabled Client
   t=0 0
   c=IN IP4 192.0.2.156
   m=audio 49200 RTP/AVP 0
   a=rtpmap:0 PCMU/8000
   m=video 49300 RTP/AVP 97 98
   a=rtpmap:97 H264/90000
   a=rtpmap:98 H264/90000
   a=fmtp:97 profile-level-id=42c01f; max-fs=3600; max-mbps=108000
   a=fmtp:98 profile-level-id=42c00b; max-fs=240; max-mbps=3600
   a=imageattr:97 send [x=1280,y=720] recv [x=1280,y=720]
   a=imageattr:98 send [x=320,y=180] recv [x=320,y=180]
   a=rid:1 send pt=97
   a=rid:2 send pt=98
   a=rid:3 recv pt=97
   a=simulcast:send 1;2 recv 3
   a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:RtpStreamId

                Figure 4: Single-Source Simulcast Offer

   The only thing in the SDP that indicates simulcast capability is the
   line in the video media description containing the "simulcast"
   attribute. The included "a=fmtp" and "a=imageattr" parameters
   indicate that sent simulcast streams can differ in video resolution.
   The RTP header extension for RtpStreamId is offered to avoid issues
   with the initial binding between RTP streams (SSRCs) and the
   RtpStreamId identifying the simulcast stream and its format.
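   To make the structure of the "a=simulcast" line more concrete, the
   following sketch parses such a line into per-direction lists of
   simulcast streams. It is illustrative only: the names
   "Alternative" and "parse_simulcast" are hypothetical, and the
   separators (";" between simulcast streams, "," between alternatives
   for one stream, a leading "~" for initially paused) are inferred
   from the examples in this section rather than taken from the ABNF
   in Section 6.1, which remains authoritative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alternative:
    """One SCID listed for a simulcast stream (hypothetical type)."""
    scid: str
    paused: bool  # True when the SCID was prefixed with "~"


def parse_simulcast(line: str) -> dict:
    """Parse an "a=simulcast:" line into {direction: [[Alternative]]}."""
    prefix = "a=simulcast:"
    if not line.startswith(prefix):
        raise ValueError("not an a=simulcast line")
    tokens = line[len(prefix):].split()
    if len(tokens) not in (2, 4):
        raise ValueError("expected one or two 'direction list' pairs")
    parsed = {}
    for direction, streams in zip(tokens[0::2], tokens[1::2]):
        if direction not in ("send", "recv") or direction in parsed:
            raise ValueError("bad or repeated direction: " + direction)
        # Outer list: simulcast streams; inner list: alternatives.
        parsed[direction] = [
            [Alternative(alt.lstrip("~"), alt.startswith("~"))
             for alt in stream.split(",")]
            for stream in streams.split(";")
        ]
    return parsed
```

   For the line "a=simulcast:send 1;2 recv 3" this gives two send
   streams with one alternative each and one receive stream. For a
   value such as "send 1;2;~4,3", the third send stream has two
   alternatives, of which the first is marked initially paused.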

   The Answer from the server indicates that it too is simulcast
   capable. Should it not have been simulcast capable, the
   "a=simulcast" line would not have been present and communication
   would have started with the media negotiated in the SDP. Also the
   usage of the RtpStreamId RTP header extension is accepted.

   v=0
   o=server 823479283 1209384938 IN IP4 192.0.2.2
   s=Answer to Simulcast Enabled Client
   t=0 0
   c=IN IP4 192.0.2.43
   m=audio 49672 RTP/AVP 0
   a=rtpmap:0 PCMU/8000
   m=video 49674 RTP/AVP 97 98
   a=rtpmap:97 H264/90000
   a=rtpmap:98 H264/90000
   a=fmtp:97 profile-level-id=42c01f; max-fs=3600; max-mbps=108000
   a=fmtp:98 profile-level-id=42c00b; max-fs=240; max-mbps=3600
   a=imageattr:97 send [x=1280,y=720] recv [x=1280,y=720]
   a=imageattr:98 send [x=320,y=180] recv [x=320,y=180]
   a=rid:1 recv pt=97
   a=rid:2 recv pt=98
   a=rid:3 send pt=97
   a=simulcast:recv 1;2 send 3
   a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:RtpStreamId

                Figure 5: Single-Source Simulcast Answer

   Since the server is the simulcast media receiver, it reverses the
   direction of the "simulcast" and "rid" attribute parameters.

6.6.2.  Multi-Source Client

   Fred is calling in to the same conference as in the example above
   with a two-camera, two-display system, thus capable of handling two
   separate media sources in each direction, where each media source is
   simulcast-enabled in the send direction. Fred's client is restricted
   to a single media source per media description.

   The first two simulcast streams for the first media source use
   different codecs, H264-SVC [RFC6190] and H264 [RFC6184]. These two
   simulcast streams also have a temporal dependency. Two different
   video codecs, VP8 [RFC7741] and H264, are offered as alternatives for
   the third simulcast stream for the first media source.
   Only the highest fidelity simulcast stream is sent from start, the
   lower fidelity streams being initially paused.

   The second media source is offered with three different simulcast
   streams. All video streams of this second media source are loss
   protected by RTP retransmission [RFC4588]. Also here, all but the
   highest fidelity simulcast stream are initially paused.

   Fred's client is also using BUNDLE to send all RTP streams from all
   media descriptions in the same RTP session on a single media
   transport. Although this example uses many different simulcast
   streams, the use of RtpStreamId as simulcast stream identification
   enables use of a low number of RTP payload types. Note that the use
   of both BUNDLE [I-D.ietf-mmusic-sdp-bundle-negotiation] and "a=rid"
   [I-D.ietf-mmusic-rid] recommends using the RTP header extension
   [RFC5285] for carrying these RTP stream identification fields, which
   is consequently also included in the SDP. Note also that for
   "a=rid", the corresponding SDES attribute is named RtpStreamId
   [I-D.ietf-avtext-rid].

   v=0
   o=fred 238947129 823479223 IN IP6 2001:db8::c000:27d
   s=Offer from Simulcast Enabled Multi-Source Client
   t=0 0
   c=IN IP6 2001:db8::c000:27d
   a=group:BUNDLE foo bar zen

   m=audio 49200 RTP/AVP 99
   a=mid:foo
   a=rtpmap:99 G722/8000

   m=video 49600 RTP/AVPF 100 101 103
   a=mid:bar
   a=rtpmap:100 H264-SVC/90000
   a=rtpmap:101 H264/90000
   a=rtpmap:103 VP8/90000
   a=fmtp:100 profile-level-id=42400d; max-fs=3600; max-mbps=108000; \
       mst-mode=NI-TC
   a=fmtp:101 profile-level-id=42c00d; max-fs=3600; max-mbps=54000
   a=fmtp:103 max-fs=900; max-fr=30
   a=rid:1 send pt=100;max-width=1280;max-height=720;max-fps=60;depend=2
   a=rid:2 send pt=101;max-width=1280;max-height=720;max-fps=30
   a=rid:3 send pt=101;max-width=640;max-height=360
   a=rid:4 send pt=103;max-width=640;max-height=360
   a=depend:100 lay bar:101
   a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid
   a=extmap:2 urn:ietf:params:rtp-hdrext:sdes:RtpStreamId
   a=rtcp-fb:* ccm pause nowait
   a=simulcast:send 1;2;~4,3

   m=video 49602 RTP/AVPF 96 104
   a=mid:zen
   a=rtpmap:96 VP8/90000
   a=fmtp:96 max-fs=3600; max-fr=30
   a=rtpmap:104 rtx/90000
   a=fmtp:104 apt=96;rtx-time=200
   a=rid:1 send pt=96;max-fs=921600;max-fps=30
   a=rid:2 send pt=96;max-fs=614400;max-fps=15
   a=rid:3 send pt=96;max-fs=230400;max-fps=30
   a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid
   a=extmap:2 urn:ietf:params:rtp-hdrext:sdes:RtpStreamId
   a=rtcp-fb:* ccm pause nowait
   a=simulcast:send 1;~2;~3

             Figure 6: Fred's Multi-Source Simulcast Offer

      Note: Empty lines in the SDP above are added only for readability
      and would not be present in an actual SDP.

7.  RTP Aspects

   This section discusses what the different entities in a simulcast
   media path can expect to happen on RTP level. This is explored from
   source to sink by starting in an endpoint with a media source that is
   simulcast to an RTP middlebox.
   That RTP middlebox sends media sources both to other RTP middleboxes
   (cascaded middleboxes), as well as selecting some simulcast format of
   the media source and sending it to receiving endpoints. Different
   types of RTP middleboxes and their usage of the different simulcast
   formats result in several different behaviors.

7.1.  Outgoing from Endpoint with Media Source

   The most straightforward simulcast case is the RTP streams being
   emitted from the endpoint that originates a media source. When
   simulcast has been negotiated in the sending direction, the endpoint
   can transmit up to the number of RTP streams needed for the
   negotiated simulcast streams for that media source. Each RTP stream
   (SSRC) is identified by associating (Section 6.5) it with an
   RtpStreamId SDES item, transmitted in RTCP and possibly also as an
   RTP header extension. In cases where multiple media sources have
   been negotiated for the same RTP session and thus BUNDLE
   [I-D.ietf-mmusic-sdp-bundle-negotiation] is used, also the MID SDES
   item will be sent similarly to the RtpStreamId.

   Each RTP stream may not be continuously transmitted due to any of the
   following reasons: temporarily paused using Pause/Resume [RFC7728],
   sender side application logic temporarily pausing it, or lack of
   network resources to transmit this simulcast stream. However, all
   simulcast streams that have been negotiated have active and
   maintained SSRC (at least in regular RTCP reports), even if no RTP
   packets are currently transmitted. The relation between an RTP
   Stream (SSRC) and a particular simulcast stream is not expected to
   change, except in exceptional situations such as SSRC collisions. At
   SSRC changes, the usage of MID and RtpStreamId should enable the
   receiver to correctly identify the RTP streams even after an SSRC
   change.

7.2.  RTP Middlebox to Receiver

   RTP streams in a multi-party RTP session can be used in multiple
   different ways, when the session utilizes simulcast at least on the
   media source to middlebox legs. This is to a large degree due to the
   different RTP middlebox behaviors, but also the needs of the
   application. This text assumes that the RTP middlebox will select a
   media source and choose which simulcast stream for that media source
   to deliver to a specific receiver. In many cases, at most one
   simulcast stream per media source will be forwarded to a particular
   receiver at any instant in time, even if the selected simulcast
   stream may vary. For cases where this does not hold due to
   application needs, the RTP stream aspects will fall under the
   middlebox to middlebox case (Section 7.3).

   The selection of which simulcast streams to forward towards the
   receiver is application specific. However, in conferencing
   applications, active speaker selection is common. In case the number
   of media sources possible to forward, N, is less than the total
   number of media sources available in a multi-media session, the
   current and previous speakers (up to N in total) are often the ones
   forwarded. To avoid the need for media specific processing to
   determine the current speaker(s) in the RTP middlebox, the endpoint
   providing a media source may include meta data, such as the RTP
   Header Extension for Client-to-Mixer Audio Level Indication
   [RFC6464].

   The possibilities for stream switching are media type specific, but
   for media types with significant interframe dependencies in the
   encoding, like most video coding, the switching needs to be made at
   suitable switching points in the media stream that break or
   otherwise deal with the dependency structure.
   Even if switching points can be included periodically, it is common
   to use mechanisms like Full Intra Requests [RFC5104] to request
   switching points from the endpoint performing the encoding of the
   media source.

   Inclusion of the RtpStreamId SDES item for an SSRC in the middlebox
   to receiver direction should only occur when use of RtpStreamId has
   been negotiated in that direction. It is worth noting that one can
   signal multiple RtpStreamIds when simulcast signalling indicates only
   a single simulcast stream, allowing one to use all of the
   RtpStreamIds as alternatives for that simulcast stream. One reason
   for including the RtpStreamId in the middlebox to receiver direction
   for an RTP stream is to let the receiver know which restrictions
   apply to the currently delivered RTP stream. In case the RtpStreamId
   is negotiated to be used, it is important to remember that the used
   identifiers will be specific to each signalling session. Even if the
   central entity can attempt to coordinate, it is likely that the
   RtpStreamIds need to be translated to the leg specific values. The
   below cases will have as base line that RtpStreamId is not used in
   the mixer to receiver direction.

7.2.1.  Media-Switching Mixer

   This section discusses the behavior in cases where the RTP middlebox
   behaves like the Media-Switching Mixer (Section 3.6.2) in RTP
   Topologies [RFC7667]. The fundamental aspect here is that the media
   sources delivered from the middlebox will be the mixer's conceptual
   or functional ones. For example, one media source may be the main
   speaker in high resolution video, while a number of other media
   sources are thumbnails of each participant.

   As a result, the RTP stream produced by the mixer is one that
   switches between a number of received incoming RTP streams for
   different media sources and in different simulcast versions.
   The mixer selects the media source to be sent as one of the RTP
   streams, and then selects among the available simulcast streams for
   the most appropriate one. The selection criteria include available
   bandwidth on the mixer to receiver path and restrictions based on the
   functional usage of the RTP stream delivered to the receiver. An
   example of the latter is that it is unnecessary to forward a full HD
   video to a receiver if the display area is just a thumbnail. Thus,
   restrictions may exist to not allow some simulcast streams to be
   forwarded for some of the mixer's media sources.

   This will result in a single RTP stream being used for a particular
   media source of the RTP mixer. This RTP stream is at any point in
   time a selection of one particular RTP stream arriving to the mixer,
   where the RTP header field values are rewritten to provide a
   consistent, single RTP stream. If the RTP mixer doesn't receive any
   incoming stream matched to this media source, the SSRC will not
   transmit, but be kept alive using RTCP. The SSRC and thus RTP stream
   for the mixer's media source is expected to be long term stable. It
   will only be changed by signalling or other disruptive events. Note
   that although the above talks about a single RTP stream, there can in
   some cases be multiple RTP streams carrying the selected simulcast
   stream for the originating media source, including repair or other
   auxiliary RTP streams.

   The mixer may communicate the identity of the originating media
   source to the receiver by including the CSRC field with the
   originating media source's SSRC value. Note that due to the
   possibility that the RTP mixer switches between simulcast versions of
   the media source, the CSRC value may change, even if the media source
   is kept the same.
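   The selection criteria mentioned earlier in this section (available
   bandwidth and restrictions from the functional usage of the
   delivered stream) can be sketched as follows. This is only one
   possible policy and not something this document specifies: the
   "SimulcastStream" type, its fields, and the "pick the highest
   bitrate that fits" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SimulcastStream:
    """Hypothetical description of one received simulcast stream."""
    rid: str
    bitrate_bps: int  # assumed average bitrate of the stream
    height: int       # assumed vertical resolution in pixels


def select_stream(
    streams: Sequence[SimulcastStream],
    available_bps: int,
    max_height: int,
) -> Optional[SimulcastStream]:
    """Pick the highest-bitrate stream that fits both limits.

    `available_bps` models the bandwidth on the mixer to receiver path
    and `max_height` models a functional restriction such as a
    thumbnail-sized display area. Returns None when nothing fits.
    """
    candidates = [
        s for s in streams
        if s.bitrate_bps <= available_bps and s.height <= max_height
    ]
    return max(candidates, key=lambda s: s.bitrate_bps, default=None)
```

   With an HD stream and a thumbnail stream available, a receiver whose
   display area is limited to thumbnail height would be given the
   thumbnail stream even when bandwidth would allow the HD stream. A
   real mixer would additionally have to wait for a suitable switching
   point before changing streams.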

   It is important to note that any MID SDES item from the originating
   media source needs to be removed and not be associated with the RTP
   stream's SSRC. This is because there is nothing in the signalling
   between the mixer and the receiver that is structured around the
   originating media sources, only the mixer's media sources. If they
   were associated with the SSRC, the receiver would likely believe
   that there has been an SSRC collision, and that the RTP stream is
   spurious as it doesn't carry the identifiers used to relate it to the
   correct context. However, this is not true for CSRC values, as long
   as they are never used as SSRC. In these cases one could provide
   CNAME and MID as SDES items. A receiver could use this to determine
   which CSRC values are associated with the same originating media
   source.

   If RtpStreamIds are used in this scenario, it should be noted that
   the RtpStreamId on a particular SSRC will change based on the actual
   simulcast stream selected for switching. These RtpStreamId
   identifiers will be local to this leg's signalling context. In
   addition, the defined RtpStreamIds and their parameters need to cover
   all the media sources and simulcast streams that can be switched into
   this media source.

7.2.2.  Selective Forwarding Middlebox

   This section discusses the behavior in cases where the RTP middlebox
   behaves like the Selective Forwarding Middlebox (Section 3.7) in RTP
   Topologies [RFC7667]. Applications of this type of RTP middlebox
   result in each originating media source having a corresponding media
   source on the leg between the middlebox and the receiver. An SFM
   could go as far as exposing all the simulcast streams for a media
   source, however this section will focus on having a single simulcast
   stream that can contain any of the simulcast formats.
   This section will assume that the SFM projection mechanism works on
   media source level, and maps one of the media source's simulcast
   streams onto one RTP stream from the SFM to the receiver.

   As a result of this usage, the individual RTP stream(s) for one
   media source can switch between being active and paused, based on the
   subset of media sources the SFM wants to provide the receiver for the
   moment. With SFMs there exist no reasons to use CSRC to indicate the
   originating stream, as there is a one to one media source mapping.
   If the application requires knowing the simulcast version received to
   function well, then RtpStreamId should be negotiated on the SFM to
   receiver leg. Which simulcast stream is being forwarded is not made
   explicit unless RtpStreamId is used on the leg.

   Any MID SDES items being sent by the SFM to the receiver are only
   those agreed between the SFM and the receiver, and no MID values from
   the originating side of the SFM are to be forwarded.

   An SFM could expose corresponding RTP streams for all the media
   sources and their simulcast streams, and then for any media source
   that is to be provided forward one selected simulcast stream.
   However, this is not recommended as it would unnecessarily increase
   the number of RTP streams and require the receiver to timely detect
   switching between simulcast streams. The above usage requires the
   same SFM functionality for switching, while avoiding the
   uncertainties of timely detecting that an RTP stream ends. The
   benefit would be that the received simulcast stream would be
   implicitly provided by which RTP stream would be active for a media
   source. However, using RtpStreamId to make this explicit also
   exposes which alternative format is used. The conclusion is that
   using one RTP stream per simulcast stream is unnecessary.
   The issue with timely detecting end of streams, regardless of whether
   they are stopped temporarily or long term, is that there is no
   explicit indication that the transmission has intentionally been
   stopped. The RTCP based Pause and Resume mechanism [RFC7728]
   includes a PAUSED indication that provides the last RTP sequence
   number transmitted prior to the pause. Due to usage, the timeliness
   of this solution depends on when delivery using RTCP can occur in
   relation to the transmission of the last RTP packet. If no explicit
   information is provided at all, then detection based on
   non-increasing RTCP SR field values and timers needs to be used to
   determine pause in RTP packet delivery. As a result, when the last
   RTP packet arrives (if it arrives), one usually cannot determine that
   it will be the last. That it was the last is something that one
   learns later.

7.3.  RTP Middlebox to RTP Middlebox

   This relates to the transmission of simulcast streams between RTP
   middleboxes or other usages where one wants to enable the delivery of
   multiple simultaneous simulcast streams per media source, but the
   transmitting entity is not the originating endpoint. For a
   particular direction between middlebox A and B, this looks very
   similar to the originating to middlebox case on a media source basis.
   However, in this case there are usually multiple media sources,
   originating from multiple endpoints. This can create situations
   where limitations in the number of simultaneous received media
   streams can arise, for example due to limitation in network
   bandwidth. In this case, a subset of not only the simulcast streams,
   but also media sources can be selected. As a result, individual RTP
   streams can become paused at any point and later be resumed based on
   various criteria.

   The MIDs used between A and B are the ones agreed between these two
   identities in signalling. The RtpStreamId values will also be
   provided to ensure explicit information about which simulcast stream
   they are. The RTP stream to MID and RtpStreamId associations should
   here be long term stable.

8.  Network Aspects

   Simulcast is in this memo defined as the act of sending multiple
   alternative encoded streams of the same underlying media source.
   When transmitting multiple independent streams that originate from
   the same source, it could potentially be done in several different
   ways using RTP. A general discussion on considerations for use of
   the different RTP multiplexing alternatives can be found in
   Guidelines for Multiplexing in RTP
   [I-D.ietf-avtcore-multiplex-guidelines]. Discussion and
   clarification on how to handle multiple streams in an RTP session can
   be found in [I-D.ietf-avtcore-rtp-multi-stream].

   The network aspects that are relevant for simulcast are:

   Quality of Service:  When using simulcast it might be of interest to
      prioritize a particular simulcast stream, rather than applying
      equal treatment to all streams. For example, lower bit-rate
      streams may be prioritized over higher bit-rate streams to
      minimize congestion or packet losses in the low bit-rate streams.
      Thus, there is a benefit in using a simulcast solution with good
      QoS support.

   NAT/FW Traversal:  Using multiple RTP sessions incurs more cost for
      NAT/FW traversal unless they can re-use the same transport flow,
      which can be achieved by Multiplexing Negotiation Using SDP Port
      Numbers [I-D.ietf-mmusic-sdp-bundle-negotiation].

8.1.  Bitrate Adaptation

   Use of multiple simulcast streams can require a significant amount of
   network resources.
   If the amount of available network resources varies during an RTP
   session such that it does not match what is negotiated in SDP, the
   bitrate used by the different simulcast streams may have to be
   reduced dynamically. What simulcast streams to prioritize when
   allocating available bitrate among the simulcast streams in such
   adaptation SHOULD be taken from the simulcast stream order on the
   "a=simulcast" line. Simulcast streams that have pause/resume
   capability and that would be given such low bitrate by the
   adaptation process that they are considered not really useful can be
   temporarily paused until the limiting condition clears.

9.  Limitation

   The chosen approach has a limitation that relates to the use of a
   single RTP session for all simulcast formats of a media source, which
   comes from sending all simulcast streams related to a media source
   under the same SDP media description.

   It is not possible to use different simulcast streams on different
   media transports, limiting the possibilities to apply different QoS
   to different simulcast streams. When using unicast, QoS mechanisms
   based on individual packet marking are feasible, since they do not
   require separation of simulcast streams into different RTP sessions
   to apply different QoS.

   It is also not possible to separate different simulcast streams into
   different multicast groups to allow a multicast receiver to pick the
   stream it wants, rather than receive all of them. In this case, the
   only reasonable implementation is to use different RTP sessions for
   each multicast group so that reporting and other RTCP functions
   operate as intended. Such simulcast usage in multicast context is
   out of scope for the current document and would require additional
   specification.

10.  IANA Considerations

   This document requests to register a new media-level SDP attribute,
   "simulcast", in the "att-field (media level only)" registry within
   the SDP parameters registry, according to the procedures of [RFC4566]
   and [I-D.ietf-mmusic-sdp-mux-attributes].

   Contact name, email:  IETF, contacted via mmusic@ietf.org, or a
      successor address designated by IESG

   Attribute name:  simulcast

   Long-form attribute name:  Simulcast stream description

   Charset dependent:  No

   Attribute value:  See Section 6.1 of RFC XXXX.

   Purpose:  Signals simulcast capability for a set of RTP streams

   MUX category:  NORMAL

   Note to RFC Editor: Please replace "RFC XXXX" with the assigned
   number of this RFC.

11.  Security Considerations

   The simulcast capability, configuration attributes, and parameters
   are vulnerable to attacks in signaling.

   A false inclusion of the "a=simulcast" attribute may result in
   simultaneous transmission of multiple RTP streams that would
   otherwise not be generated. The impact is limited by the media
   description joint bandwidth, shared by all simulcast streams
   irrespective of their number. There may however be a large number of
   unwanted RTP streams that will impact the share of bandwidth
   allocated for the originally wanted RTP stream.

   A hostile removal of the "a=simulcast" attribute will result in
   simulcast not being used.

   Neither of the above will likely have any major consequences and can
   be mitigated by signaling that is at least integrity and source
   authenticated to prevent an attacker from changing it.

   Security considerations related to the use of "a=rid" and the
   RtpStreamId SDES item are covered in [I-D.ietf-mmusic-rid] and
   [I-D.ietf-avtext-rid]. There are no additional security concerns
   related to their use in this specification.

12.  Contributors

   Morgan Lindqvist and Fredrik Jansson, both from Ericsson, have
   contributed with important material to the first versions of this
   document. Robert Hansen and Cullen Jennings, from Cisco, Peter
   Thatcher, from Google, and Adam Roach, from Mozilla, contributed
   significantly to subsequent versions.

13.  Acknowledgements

   The authors would like to thank Bernard Aboba, Thomas Belling, Roni
   Even, and Adam Roach for the feedback they provided during the
   development of this document.

14.  References

14.1.  Normative References

   [I-D.ietf-avtext-rid]
              Roach, A., Nandakumar, S., and P. Thatcher, "RTP Stream
              Identifier Source Description (SDES)",
              draft-ietf-avtext-rid-09 (work in progress), October 2016.

   [I-D.ietf-mmusic-rid]
              Thatcher, P., Zanaty, M., Nandakumar, S., Burman, B.,
              Roach, A., and B. Campen, "RTP Payload Format
              Restrictions", draft-ietf-mmusic-rid-08 (work in
              progress), October 2016.

   [I-D.ietf-mmusic-sdp-bundle-negotiation]
              Holmberg, C., Alvestrand, H., and C. Jennings,
              "Negotiating Media Multiplexing Using the Session
              Description Protocol (SDP)",
              draft-ietf-mmusic-sdp-bundle-negotiation-36 (work in
              progress), October 2016.

   [I-D.ietf-mmusic-sdp-mux-attributes]
              Nandakumar, S., "A Framework for SDP Attributes when
              Multiplexing", draft-ietf-mmusic-sdp-mux-attributes-14
              (work in progress), September 2016.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550,
              July 2003.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, DOI 10.17487/RFC4566,
              July 2006.
   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for
              Syntax Specifications: ABNF", STD 68, RFC 5234,
              DOI 10.17487/RFC5234, January 2008.

   [RFC7728]  Burman, B., Akram, A., Even, R., and M. Westerlund, "RTP
              Stream Pause and Resume", RFC 7728,
              DOI 10.17487/RFC7728, February 2016.

14.2. Informative References

   [I-D.ietf-avtcore-multiplex-guidelines]
              Westerlund, M., Perkins, C., and H. Alvestrand,
              "Guidelines for using the Multiplexing Features of RTP
              to Support Multiple Media Streams",
              draft-ietf-avtcore-multiplex-guidelines-03 (work in
              progress), October 2014.

   [I-D.ietf-avtcore-rtp-multi-stream]
              Lennox, J., Westerlund, M., Wu, Q., and C. Perkins,
              "Sending Multiple RTP Streams in a Single RTP Session",
              draft-ietf-avtcore-rtp-multi-stream-11 (work in
              progress), December 2015.

   [RFC2198]  Perkins, C., Kouvelas, I., Hodson, O., Hardman, V.,
              Handley, M., Bolot, J., Vega-Garcia, A., and S.
              Fosse-Parisis, "RTP Payload for Redundant Audio Data",
              RFC 2198, DOI 10.17487/RFC2198, September 1997.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              DOI 10.17487/RFC3264, June 2002.

   [RFC3389]  Zopf, R., "Real-time Transport Protocol (RTP) Payload
              for Comfort Noise (CN)", RFC 3389,
              DOI 10.17487/RFC3389, September 2002.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
              Hakenberg, "RTP Retransmission Payload Format",
              RFC 4588, DOI 10.17487/RFC4588, July 2006.

   [RFC4733]  Schulzrinne, H. and T. Taylor, "RTP Payload for DTMF
              Digits, Telephony Tones, and Telephony Signals",
              RFC 4733, DOI 10.17487/RFC4733, December 2006.

   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
              "Codec Control Messages in the RTP Audio-Visual Profile
              with Feedback (AVPF)", RFC 5104, DOI 10.17487/RFC5104,
              February 2008.

   [RFC5109]  Li, A., Ed., "RTP Payload Format for Generic Forward
              Error Correction", RFC 5109, DOI 10.17487/RFC5109,
              December 2007.

   [RFC5285]  Singer, D. and H. Desineni, "A General Mechanism for RTP
              Header Extensions", RFC 5285, DOI 10.17487/RFC5285,
              July 2008.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, DOI 10.17487/RFC5576, June 2009.

   [RFC5583]  Schierl, T. and S. Wenger, "Signaling Media Decoding
              Dependency in the Session Description Protocol (SDP)",
              RFC 5583, DOI 10.17487/RFC5583, July 2009.

   [RFC6184]  Wang, Y., Even, R., Kristensen, T., and R. Jesup, "RTP
              Payload Format for H.264 Video", RFC 6184,
              DOI 10.17487/RFC6184, May 2011.

   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
              "RTP Payload Format for Scalable Video Coding",
              RFC 6190, DOI 10.17487/RFC6190, May 2011.

   [RFC6236]  Johansson, I. and K. Jung, "Negotiation of Generic Image
              Attributes in the Session Description Protocol (SDP)",
              RFC 6236, DOI 10.17487/RFC6236, May 2011.

   [RFC6464]  Lennox, J., Ed., Ivov, E., and E. Marocco, "A Real-time
              Transport Protocol (RTP) Header Extension for
              Client-to-Mixer Audio Level Indication", RFC 6464,
              DOI 10.17487/RFC6464, December 2011.

   [RFC7104]  Begen, A., Cai, Y., and H. Ou, "Duplication Grouping
              Semantics in the Session Description Protocol",
              RFC 7104, DOI 10.17487/RFC7104, January 2014.

   [RFC7656]  Lennox, J., Gross, K., Nandakumar, S., Salgueiro, G.,
              and B. Burman, Ed., "A Taxonomy of Semantics and
              Mechanisms for Real-Time Transport Protocol (RTP)
              Sources", RFC 7656, DOI 10.17487/RFC7656, November
              2015.

   [RFC7667]  Westerlund, M. and S. Wenger, "RTP Topologies",
              RFC 7667, DOI 10.17487/RFC7667, November 2015.

   [RFC7741]  Westin, P., Lundin, H., Glover, M., Uberti, J., and F.
              Galligan, "RTP Payload Format for VP8 Video", RFC 7741,
              DOI 10.17487/RFC7741, March 2016.

Appendix A. Changes From Earlier Versions

   NOTE TO RFC EDITOR: Please remove this section prior to publication.

A.1. Modifications Between WG Version -05 and -06

   o  Added section on RTP Aspects.

   o  Added a requirement (5-4) that capability exchange must be
      capable of handling multiple RTP stream cases.

   o  Added the extmap attribute to the first signalling example as
      well, since it is a recommended mechanism.

   o  Clarified the definition of the simulcast attribute and how
      simulcast streams relate to simulcast formats and SCIDs.

   o  Updated References list and moved around some references between
      informative and normative categories.

   o  Editorial improvements and corrections.

A.2. Modifications Between WG Version -04 and -05

   o  Aligned with recent changes in draft-ietf-mmusic-rid and
      draft-ietf-avtext-rid.

   o  Modified the SDP offer/answer section to follow the generally
      accepted structure, and added brief text on modifying the
      session, aligned with draft-ietf-mmusic-rid.

   o  Improved text around simulcast stream identification (as opposed
      to the simulcast stream itself) to consistently use the acronym
      SCID and defined that in the Terminology section.

   o  Changed references for RTP-level pause/resume and VP8 payload
      format that are now published as RFCs.

   o  Improved IANA registration text.

   o  Removed unused reference to
      draft-ietf-payload-flexible-fec-scheme.

   o  Editorial improvements and corrections.

A.3. Modifications Between WG Version -03 and -04

   o  Changed to only use RID identification, as was consensus during
      IETF 94.

   o  ABNF improvements.

   o  Clarified offer-answer rules for initially paused streams.

   o  Changed references for RTP topologies and RTP taxonomy documents
      that are now published as RFCs.

   o  Added reference to the new RID draft in AVTEXT.

   o  Restructured Section 6 so that the updated IANA section can
      reference it easily.

   o  Added a sub-section 7.1 with a discussion of bitrate adaptation.

   o  Editorial improvements.

A.4. Modifications Between WG Version -02 and -03

   o  Removed text on multicast / broadcast from use cases, since it is
      not supported by the solution.

   o  Removed explicit references to unified plan draft.

   o  Added possibility to initiate simulcast streams in paused mode.

   o  Enabled an offerer to offer multiple stream identification (pt or
      rid) methods and have the answerer choose which to use.

   o  Added a preference indication in send direction offers as well.

   o  Added a section on limitations of the current proposal, including
      identification method specific limitations.

A.5. Modifications Between WG Version -01 and -02

   o  Now relies on the new RID solution for codec constraints and
      configuration identification.  This has resulted in syntax
      changes to identify whether pt or RID is used to describe the
      simulcast stream.

   o  Renamed simulcast version and simulcast version alternative to
      simulcast stream and simulcast format respectively, and improved
      definitions for them.

   o  Clarified that it is possible to switch between simulcast
      version alternatives, but that only a single one is used at any
      point in time.

   o  Changed the definition so that the ordering of simulcast formats
      for a specific simulcast stream does have a preference order.

A.6. Modifications Between WG Version -00 and -01

   o  No changes; updated only to prevent expiry.

A.7. Modifications Between Individual Version -00 and WG Version -00

   o  Added this appendix.

Authors' Addresses

   Bo Burman
   Ericsson
   Gronlandsgatan 31
   SE-164 60 Stockholm
   Sweden

   Email: bo.burman@ericsson.com


   Magnus Westerlund
   Ericsson
   Farogatan 2
   SE-164 80 Stockholm
   Sweden

   Phone: +46 10 714 82 87
   Email: magnus.westerlund@ericsson.com


   Suhas Nandakumar
   Cisco
   170 West Tasman Drive
   San Jose, CA 95134
   USA

   Email: snandaku@cisco.com


   Mo Zanaty
   Cisco
   170 West Tasman Drive
   San Jose, CA 95134
   USA

   Email: mzanaty@cisco.com