Network Working Group                                      M. Westerlund
Internet-Draft                                                 B. Burman
Intended status: Standards Track                            M. Lindqvist
Expires: January 17, 2013                                     F. Jansson
                                                                Ericsson
                                                           July 16, 2012


                    Using Simulcast in RTP sessions
               draft-westerlund-avtcore-rtp-simulcast-01

Abstract

   In some applications it may be necessary to send multiple media
   streams derived from the same media source. This is called
   Simulcast. This document discusses the best way of accomplishing
   this in RTP. It is concluded that a session based solution provides
   the best support for simulcast, and a solution for that is defined.
   There are two necessary extensions. The first extension is how to
   group RTP sessions belonging to the same simulcast source using the
   grouping framework, and the second is how to identify which SSRCs
   belong to the same media source by using a new RTCP SDES item,
   SRCNAME.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 17, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Definitions
     2.1. Terminology
     2.2. Requirements Language
   3. Simulcast and Applicability
     3.1. Simulcasting to RTP Mixer
       3.1.1. Simulcast Combined with Scalable Encoding
     3.2. Multicast Transported Simulcasted Media
       3.2.1. Diversity in Receiver Population
       3.2.2. Bit-rate Adaptation
     3.3. Simulcasting to a Consuming End-Point
     3.4. Same Encoding to Multiple Destinations
     3.5. Different Encoding to Independent Destinations
   4. Simulcast Alternatives
     4.1. Using the Payload Type
     4.2. Using Single RTP session
     4.3. Using Multiple RTP sessions
   5. Analysis
     5.1. RTP/RTCP Aspects
     5.2. Signalling Aspects
     5.3. Network Aspects
     5.4. Security Aspects
     5.5. Summary
   6. Signaling Support for Multiple RTP session based Simulcast
     6.1. Grouping Simulcast RTP Sessions
       6.1.1. Declarative Use
       6.1.2. Offer/Answer Use
     6.2. Media Stream Requirements
     6.3. Relating Alternative Encodings
     6.4. Multiple Stream handling
   7. Simulcast Signalling Examples
     7.1. Alice: Desktop Client
     7.2. Bob: Telepresence Room
     7.3. Fred: Dial-out to Legacy Client
     7.4. Joe: Dial-out to Desktop Client
   8. IANA Considerations
   9. Security Considerations
   10. Acknowledgements
   11. References
     11.1. Normative References
     11.2. Informative References
   Authors' Addresses

1. Introduction

   Simulcast is the act of simultaneously sending multiple different
   versions of the same media content, e.g. the same video source
   encoded with different video encoders. This can be done in several
   ways and for different purposes.
   This document focuses on the case
   where one wants to provide multiple streams with different encodings
   over RTP [RFC3550] towards an intermediary so that the intermediary
   can select which encoding to forward to other participants in the
   session, and more specifically how the grouping of the streams is
   defined.

   The different encodings of a media content considered in this
   document can differ in:

   Bit-rate: The difference is the number of bits spent to encode the
      media, thus giving different quality.

   Codec: Different media codecs are used to ensure that different
      receivers that do not have a common set of decoders can decode at
      least one of the versions. This can include codec configuration
      options that are not compatible, like video encoder profiles, or
      the capability of receiving the transport packetization.

   Sampling: Different sampling of media, in spatial as well as in
      temporal domain, may be used to suit different rendering
      capabilities or needs at the receiving endpoints, as well as a
      method to achieve different bit-rates. For video streams, spatial
      sampling affects image resolution and temporal sampling affects
      video frame rate. For audio, spatial sampling relates to the
      number of audio channels and temporal sampling affects audio
      bandwidth. Obviously, a difference in sampling may result in a
      difference in bit-rate.

   There are different reasons for an application to provide a single
   media source in different encodings. As soon as an application has
   the need to send multiple encodings, there is a potential need for
   simulcast. This need can arise even when using media codecs that
   have scalability features built in. The purpose of this document is
   to find the most suitable solution for the non-trivial variants of
   simulcast and in order to do this, different ways of multiplexing the
   different encodings are discussed.
   Following the presentation of the
   alternatives, an analysis is performed on how different aspects like
   RTP mechanisms, signaling possibilities, and network features are
   affected by the alternatives. This is a specific application of the
   aspects discussed in RTP Multiplexing Architecture
   [I-D.westerlund-avtcore-multiplex-architecture]. The discussion
   results in a conclusion, a solution, and a proposal for the
   standardization work required to support simulcast.

2. Definitions

2.1. Terminology

   The following terms and abbreviations are used in this document:

   Encoding: A particular encoding is the choice of the media encoder
      (codec) that has been used to compress the media and the fidelity
      of that encoding through the choice of sampling, bit-rate and
      other codec configuration parameters.

   Different encodings: An encoding is different when some parameter
      that characterizes the encoding of a particular media source is
      changed. Such changes can affect one or more of the following
      parameters: codec, codec configuration, bit-rate, sampling.

   Simulcast versions: Media streams used for simulcast that use
      different encodings and thus constitute different versions of the
      same media source.

2.2. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3. Simulcast and Applicability

   This section discusses different usage scenarios for the term
   simulcast and clarifies which of those this document focuses on. It
   also reviews why simulcast and scalable codecs can be a useful
   combination.

3.1. Simulcasting to RTP Mixer

   This scenario relates to a multi-party session where one or more
   central nodes are used to facilitate the media transport between the
   session participants.
   Thus, this targets the RTP Mixer Topology
   defined in [RFC5117] (Section 3.4: Topo-Mixer). This scenario is
   targeted for further discussion in this document.

   Simulcasting different media encodings of video that differ both in
   resolution and in bit-rate is highly applicable to video conferencing
   scenarios. For example, an RTP mixer selects the video of the most
   active speaker and sends that participant's video stream as a high
   resolution stream to the other participants, and in addition also
   sends a number of low resolution video streams of the other
   participants, enabling the receiving user to both display the current
   speaker in high quality and monitor the other participants in lower
   quality/resolution/size. As the participants should not receive the
   stream showing themselves, the set of streams will be unique to each
   participant.

   A number of alternatives exist to provide both high and low
   resolutions from an RTP Mixer:

   Simulcast: The clients send one stream for the low resolution and
      another for the high resolution.

   Scalable Video Coding: The clients are using a video encoder that
      can provide one stream that both provides the high resolution
      and also enables the mixer to extract a low resolution
      representation from that single stream.

   Transcoding in the Mixer: The clients send a high resolution stream
      to the RTP Mixer which performs a transcoding to a lower
      resolution stream.

   The Transcoding alternative requires that the RTP mixer has a
   sufficient amount of transcoding resources to produce the number of
   low resolution streams required. In the worst case, all participants'
   streams may need to be transcoded. If the resources are not
   available, a different solution is needed. There will also normally
   be a quality loss and an increase in latency associated with the
   transcoding operation.
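
   As an illustration of the video conferencing example earlier in this
   section, the following Python sketch shows one way a mixer could
   decide which simulcast version of each participant to forward to a
   given receiver. It is not part of the proposal. The names "Forward"
   and "select_streams", and the version labels "high" and "low", are
   hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Forward:
    source: str   # participant whose video is forwarded (hypothetical label)
    version: str  # simulcast version to forward: "high" or "low"


def select_streams(participants, active_speaker, receiver):
    """Return the streams a mixer might forward to one receiver.

    Assumption for this sketch: every participant sends exactly two
    simulcast versions, a high and a low resolution one.
    """
    selected = []
    for p in participants:
        if p == receiver:
            continue  # a participant does not receive its own video
        # The active speaker is forwarded in high resolution,
        # everyone else in low resolution.
        version = "high" if p == active_speaker else "low"
        selected.append(Forward(p, version))
    return selected
```

   Calling select_streams(["A", "B", "C"], "A", "B") yields the high
   resolution version of A and the low resolution version of C, while
   the same call for receiver "A" yields only low resolution versions,
   which shows why the set of forwarded streams differs per participant.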

   Scalable video encoding requires a more complex encoder compared to
   non-scalable encoding. Also, if the resolution difference between
   the streams is large, a scalable codec may in fact be only marginally
   more bandwidth efficient than the simulcast case where the different
   resolutions are sent as separate streams from the clients to the
   mixer. At the same time, with scalable video encoding, the
   transmission of all but the lowest resolution will consume more
   bandwidth from the mixer to the other participants than with a
   non-scalable encoding.

   Simulcasting has the benefit that it is conceptually simple. It
   enables the use of any media codec that the participants agree on,
   allowing the RTP mixer to be codec-agnostic. With the currently
   available video encoders, simulcasting may be less bit-rate efficient
   in the path from the sending client to the mixer but more efficient
   in the mixer to receiver path compared to Scalable Video Coding.

                 +------------+      +---+
      +---+      |            |----->| B |
      |   |=====>|            |      +---+
      | A |      |   Mixer    |
      |   |----->|            |      +---+
      +---+      |            |=====>| C |
                 +------------+      +---+

          Figure 1: RTP Mixer selecting from simulcast versions

   The sender A provides the mixer with both a high resolution version
   "===>" and a low resolution version "--->". The mixer selects who in
   its receiver population should get a particular version.

3.1.1. Simulcast Combined with Scalable Encoding

   As explained in the previous section, a scalable codec is not always
   more bandwidth efficient than simulcast, especially in the path from
   the mixer to the receiver.

   There are however cases where a combination of simulcast and scalable
   encoding can be beneficial. By using simulcast in cases where the
   scalable codec is less efficient, one can optimize the efficiency of
   the complete system.
   A good example of this usage would be where the
   video is encoded using SVC transported in RTP [RFC6190], where each
   simulcast stream has a different resolution, and each SVC media
   stream uses temporal scalability and signal to noise ratio (SNR)
   scalability within that single media stream. If only resolution and
   temporal variations are needed, this can be implemented using the
   non-scalable part of H.264, as each simulcast version provides the
   different resolution, and each media stream within a simulcast
   encoding has temporal scalability through the use of non-reference
   frames.

3.2. Multicast Transported Simulcasted Media

   When using multicast, particularly Source-Specific Multicast (SSM)
   [RFC3569] to distribute RTP/RTCP packets to a large receiver
   population one faces some issues. There are at least two different
   issues where simulcast can potentially be useful.

3.2.1. Diversity in Receiver Population

   If there is any diversity in the receivers regarding e.g. capability,
   codec support or code base, there are potentially restrictions in
   what streams can be delivered to the receivers. If using the lowest
   common denominator over a diverse receiver population isn't
   acceptable, simulcast can be one possible solution. By offering
   different stream alternatives, it is possible to let the receivers
   choose the simulcast version that matches their capabilities. By
   using explicit signalling for simulcast, it is not necessary for the
   stream distributor to handle multiple receiver configurations
   individually for a multi-media session, nor to ensure that each
   receiver gets an encoding that matches their capabilities.

   The simulcast version granularity the receivers can select will be on
   multicast group level. Thus, this use case puts a strict requirement
   on supporting RTP session multiplexing.
   The reason is that having
   a single RTP session straddle several multicast groups makes any
   reporting on the received sources very difficult to interpret. Using
   one RTP session per simulcast version instead provides consistency.

3.2.2. Bit-rate Adaptation

   If the network paths from the media sender to the receivers can
   support different bit-rates, there is a need to support media streams
   encoded to different bit-rates. If these path differences are of a
   more static nature, for example depending primarily on the underlying
   link layers, using simulcast has an advantage over scalable encoding.
   The reason is that the efficiency of scalable coding will never be
   better than encoding to a single target rate. When the receiver can
   determine current network interface connectivity, it can choose a
   simulcast version with certainty. That choice will also be correct
   until the event of another network interface becoming the active one.
   This assumes that the multicast transmission uses dedicated resources
   and will thus not be congested due to other network traffic. To
   support this behavior, the signalling must support indication of
   which media streams are alternatives to each other, and it is
   also necessary to be able to determine the aggregate bit-rate for the
   selected multicast group(s) compared to available network properties.

   Simulcast is possible to use also in more dynamic situations where
   each receiver continuously gathers reception statistics to detect
   path congestion and based on that may change which version to
   receive. The main issue with such usage is how to achieve a switch
   from one version to another with minimal playback interruption and
   without putting extra load on the network during the actual
   switch. Here, scalable encoding in general has better
   characteristics since scalability layers are typically synchronized.
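
   The static selection described above, where a receiver compares the
   bit-rate of each alternative with what its network path supports, can
   be sketched in Python as follows. This is a minimal sketch and not a
   defined mechanism. The function name "choose_version", the group
   labels and the bit-rate figures are assumptions made only for
   illustration.

```python
def choose_version(versions, available_kbps):
    """Pick the highest bit-rate simulcast version that fits the path.

    versions: dict mapping a (hypothetical) multicast group label to the
    signalled bit-rate of that simulcast version in kbit/s.
    available_kbps: bit-rate the receiver believes its path supports.
    Returns the chosen group label, or None if no version fits.
    """
    fitting = {group: rate for group, rate in versions.items()
               if rate <= available_kbps}
    if not fitting:
        return None
    # Among the versions that fit, prefer the one with the highest rate.
    return max(fitting, key=fitting.get)
```

   With versions signalled at 300, 1000 and 3000 kbit/s, a receiver on a
   1500 kbit/s path would join the group carrying the 1000 kbit/s
   version. The sketch assumes the signalled bit-rates are available to
   the receiver, which is the signalling requirement stated above.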

   When comparing simulcast and scalable encoding, the trade-offs are
   different and the down-sides occur at different places. Simulcast
   will have a higher bit-rate load at a media sender and that will also
   be the case for any network path shared between receivers of multiple
   simulcast versions. However, for parts of the network path where
   there is only a single simulcast version, the achievable quality at a
   given bit-rate will be slightly higher for simulcast. It will also
   be more difficult to seamlessly switch between simulcast versions
   than between different scalable encodings, as simulcast actually
   switches from one media stream version to another instead of adding
   or removing some enhancement layers.

3.3. Simulcasting to a Consuming End-Point

   This scenario is based on an RTP Transport Translator (Section 3.3:
   Topo-Trn-Translator) [RFC5117]. The transport translator functions
   as a relay and transmits all streams received from one participant to
   all other participants. For example, when simulcasting a low
   resolution and a high resolution video stream, the RTP Translator
   would send all the streams to all clients. This clearly increases
   the bit-rate transmitted on the paths to the clients compared to the
   mixer case in the previous section. The only simulcast benefit for
   the receiving client over a single stream scenario would be reduced
   decoding complexity for the low resolution streams. A single stream
   scenario which only transmits the high resolution stream would allow
   the receiver to decode it and scale it down to the desired
   resolution.

   The usage of transport translator and simulcast becomes efficient if
   each receiving client is allowed to control or configure the relay
   with respect to which version it wants to receive. However, such
   usage of RTP has some potential issues with RTCP.
   One example is
   when a receiver has indicated to the transport translator that it
   does not want to receive a particular stream, but at the same time it
   is receiving and reporting on other streams from the same sender. In
   this case, the sender will receive no RTCP messages about the
   non-forwarded stream and therefore get the impression that the stream
   is somehow lost. Thus some consideration and mechanism are needed to
   support such a use case in order not to break RTCP reception
   reporting.

   This scenario is considered in the continuation of the document but
   with less emphasis than on the RTP mixer case.

3.4. Same Encoding to Multiple Destinations

   One interpretation of simulcast is when one encoding is sent to
   multiple receivers. This is well supported in RTP by simply copying
   all outgoing RTP and RTCP traffic to several transport destinations,
   if the intention is to create a common RTP session. As long as all
   participants do the same, a full mesh is constructed and everyone in
   the multi-party session has a similar view of the joint RTP session.
   This is analogous to an Any Source Multicast (ASM) session but without
   the traffic optimization, as multiple copies of the same content are
   likely to have to pass over the same link.

                      +---+      +---+
                      | A |<---->| B |
                      +---+      +---+
                        ^         ^
                         \       /
                          \     /
                           v   v
                           +---+
                           | C |
                           +---+

                  Figure 2: Full Mesh / Multi-unicast

   As this type of simulcast is analogous to ASM usage and RTP has good
   support for ASM sessions, no further consideration for this scenario
   is made in this document.

3.5. Different Encoding to Independent Destinations

   Another alternative interpretation of simulcast is multiple
   destinations, where each destination gets a specifically tailored
   version, but where the destinations are independent.
   A typical
   example for this would be a streaming server distributing the same
   live session to a number of receivers, adapting the quality and
   resolution of the multi-media session to each receiver's capability
   and available bit-rate. This case can be solved in RTP by having
   independent RTP sessions between the sender and the receivers. Thus
   this case is not considered further.

4. Simulcast Alternatives

   Simulcast is defined in this document as the act of sending multiple
   alternative encodings of the same underlying media source. When
   transmitting multiple independent streams that originate from the
   same source, it could potentially be done in several different ways
   using RTP. The below sub-sections describe potential ways of
   achieving stream multiplexing and identification of which streams are
   alternative encodings of the same source. The descriptions also
   cover how this interacts with multiple sources (SSRCs) in the same
   RTP session for other reasons than simulcast. Multiple SSRCs may
   occur for various reasons such as multiple participants in
   multipoint topologies such as multicast, transport relays or full
   mesh transport simulcasting, multiple source devices, such as
   multiple cameras or microphones at one end-point, or other RTP
   mechanisms such as RTP Retransmission [RFC4588].

4.1. Using the Payload Type

   This alternative uses only the RTP payload type to identify the
   different simulcast streams. Thus all simulcast streams would be
   sent in the same RTP session using only a single SSRC per actual
   media source. However, as discussed in Guidelines for using the
   Multiplexing Features of RTP
   [I-D.westerlund-avtcore-multiplex-architecture], using Payload Type
   Multiplexing does not work and is hereby dismissed as a potential
   solution.

4.2. Using Single RTP session

   This idea is based on using a unique SSRC for each alternative
   encoding of an actual media source within a single RTP session. The
   identification of how streams are considered to be alternative needs
   an additional mechanism, for example using SSRC grouping [RFC5576]
   and a new SDES item such as SRCNAME proposed in
   [I-D.westerlund-avtext-rtcp-sdes-srcname] with semantics that
   indicate them as alternatives of a particular media source. When
   there are multiple actual media sources in a session, each media
   source will have to use a number of SSRCs to represent the different
   alternatives it produces. For example, if all actual media sources
   are similar and produce the same number of simulcast versions, there
   will be n*m SSRCs in use in the RTP session, where n is the number of
   actual media sources and m the number of simulcast versions they can
   produce. Each SSRC can use any of the configured payload types for
   this RTP session. All session level attributes and parameters that
   are not source specific will apply and must function with all the
   alternative encodings intended to be used.

4.3. Using Multiple RTP sessions

   Using multiple RTP sessions means that each different simulcast
   version of an actual media source is transmitted in a separate RTP
   session, using whatever session identifier to distinguish the
   different versions. This solution needs explicit session grouping
   [RFC5888] with semantics that indicate them as alternatives. It is
   also important to identify the SSRCs in the different sessions that
   are alternative encodings of the same media source. This could be
   accomplished using the same SSRC across the sessions, but that is not
   robust against SSRC collisions and could potentially force cascading
   SSRC changes between sessions.
   A better choice would be to use the
   same value for the new SDES item proposed in
   [I-D.westerlund-avtext-rtcp-sdes-srcname]. Each RTP session will
   have its own set of configured RTP payload types available for use
   with any SSRC in that session. In addition, all other attributes for
   sessions or sources can be used as normal to indicate the
   configuration of that particular alternative.

5. Analysis

   This section provides an analysis of simulcast as a specific case of
   the aspects discussed in Guidelines for using the Multiplexing
   Features of RTP [I-D.westerlund-avtcore-multiplex-architecture] to
   determine which solution is the most suitable. The below section
   discusses the relevant points for simulcast and contrasts using only
   SSRCs with using both RTP sessions and SSRCs.

5.1. RTP/RTCP Aspects

   The RTP/RTCP aspects of relevance are:

   RTP Specification: From a base RTP specification point of view,
      there is no real difference between using a single RTP session
      and using multiple RTP sessions.

   Multiple SSRC Legacy Considerations: Dealing with legacy handling of
      multiple SSRCs in one RTP session for simulcast is a minor issue
      as end-points supporting simulcast will implement the necessary
      support. They should also determine if there is necessary support
      based on signalling. However, for cases where usage of simulcast
      is combined with legacy in the same scenario, multiple RTP
      sessions will have an advantage as the number of SSRCs in each
      session does not increase due to simulcast, only the number of
      sessions.

   Cross Session RTCP Requests: In the case of simulcast, the findings
      in the architecture document stand and might be relevant when
      switching between simulcast versions to configure current code
      control state.

   Binding Related Sources: Simulcast will require a clear binding
      between the SSRCs carrying the different simulcast versions.
      This issue will be independent of using one or multiple RTP
      sessions.

   Transport Translators: Transport translators and simulcast are not
      the best match, as the core of the functionality desired in
      simulcast is usually to be able to switch between alternatives,
      which is not really possible with transport translators as they do
      not manipulate the media streams. However, if one uses multiple
      RTP sessions, a session participant can control the simulcast
      version it receives in a very coarse grained fashion by joining
      the right RTP session. It is not, however, capable of switching
      individual sources within the sessions.

   Regarding RTP/RTCP aspects, a multiple RTP sessions based solution
   can handle legacy better, while a single RTP session solution has
   some advantage if there is a need for synchronized requests across
   multiple stream versions, but there are no major differences.

5.2. Signalling Aspects

   The signalling aspects are one of the major issues for simulcast. In
   the currently used signalling system based on SDP [RFC4566] and
   Offer/Answer [RFC3264], the properties of media streams are
   negotiated on RTP session level. This is discussed in Section 7.3.1
   of the Guidelines for using the Multiplexing Features of RTP
   [I-D.westerlund-avtcore-multiplex-architecture].

   As simulcast is all about being able to signal and negotiate what the
   different simulcast versions should be, it becomes important that the
   signalling supports such usage. An SSRC-only solution does not
   prevent such signalling from being developed, but SSRC centric
   signalling is currently almost non-existent. If a session and SSRC
   based solution is used instead, it is already possible to signal and
   negotiate the version properties on a session level. Negotiated
   media properties will apply to all media sources sent in the same
   RTP session, which is likely not an issue in most cases.
   For example, using a common
   simulcast version definition across all media sources at one
   end-point will allow an RTP mixer to choose both which media sources
   and which simulcast versions of them to forward towards the other
   end-points.

   From a signalling perspective, the only rapid way forward is a
   multiple RTP sessions based solution.

5.3. Network Aspects

   The network aspects that have any relevance for simulcast are:

   Quality of Service: When using simulcast it might be of interest to
      prioritize a particular simulcast version, rather than applying
      equal treatment of all versions. For example, lower bit-rate
      versions may be prioritized over higher bit-rate versions to
      minimize congestion or packet losses in the low bit-rate versions.
      Thus, there is a benefit to use a simulcast solution that supports
      QoS as well as possible. By using RTP sessions over different
      transport flows, a simulcast version can be prioritized by flow
      based QoS mechanisms. If the application would like to prioritize
      a particular media source in one simulcast version then the two
      proposals are equal.

   NAT/FW Traversal: Using multiple RTP sessions will incur more cost
      for NAT/FW traversal unless the solution for multiplexing multiple
      RTP sessions on a single lower layer transport
      [I-D.westerlund-avtcore-transport-multiplexing] is used, in which
      case they are basically equal. That holds both from a NAT/FW
      traversal perspective and for QoS possibilities. If flow based QoS
      with any differentiation is desirable, the cost for additional
      transport flows is likely necessary.

   Multicast: To enable simulcast to be combined with multicast, it
      will be required to use multiple RTP sessions. Multicast groups
      need to be separate for the different versions to allow a
      multicast receiver to pick the version it wants, rather than
      receive all of them.
      In this case, the only reasonable implementation is to use
      different RTP sessions for each multicast group so that reporting
      and other RTCP functions operate as intended.

   Using multiple RTP sessions is clearly the better choice when taking
   network aspects into account.  Multiple RTP sessions are required to
   support any multicast usage.  In addition, they can provide support
   for differentiated flow based QoS.  The extra NAT/FW traversal costs
   can be mitigated completely by multiplexing all RTP sessions over a
   single transport.

5.4.  Security Aspects

   The discussed security aspects have the following applicability or
   considerations when it comes to simulcast:

   Security Context Scope:  Both issues may be applicable to simulcast
      usage.  If differentiation enforcement is based on encryption and
      keying then multiple RTP session based simulcast has a slight
      benefit.

   Key-Management:  There is no significant difference in the solution
      except that multiple RTP sessions may require keying more
      contexts.  Having more contexts is also what brings additional
      freedom to make differentiation.

   There is a small difference in security aspects where multiple RTP
   sessions provide more freedom, but also a higher cost in the number
   of contexts needing to be keyed.

5.5.  Summary

   Defining multiple RTP sessions based simulcast appears to be the
   best choice.  It supports the most use cases including the multicast
   based one, it has better support for flow based QoS, and the NAT/FW
   costs can be mitigated.  When it comes to signalling, multiple RTP
   sessions based simulcast appears to require a modest set of
   extensions to work, while a single RTP session seems to require a
   large number of extensions to enable sets of SSRCs to negotiate
   different parameters that differentiate the simulcast versions.
   Multiple RTP sessions also provide greater flexibility when it comes
   to key-management choices for the applications.

   A single RTP session solution, as a complement to the multiple RTP
   sessions, is not considered due to the large number of extensions
   required for signalling.  The needed extensions to support single
   RTP session simulcast may be defined in the future.

6.  Signaling Support for Multiple RTP session based Simulcast

   To enable the usage of multiple RTP sessions based simulcast, some
   minimal additional signaling support is required.  That support is
   discussed in this section.  First of all, there is a need for a
   mechanism to identify the RTP sessions carrying simulcast versions
   from the same media source.  Secondly, a receiver needs to be able
   to identify the SSRCs in the different sessions belonging to the
   same media source.  Beyond the necessary signaling support for
   simulcast, some very useful optimizations regarding transmission of
   media streams are described that will also help RTP mixers to select
   which stream alternatives to deliver to a specific client, or
   request a client to encode in a particular way.

6.1.  Grouping Simulcast RTP Sessions

   The proposal is to define a new grouping semantics for the session
   groupings framework [RFC5888].  There is a need to separate the
   semantics of intent to send simulcast streams from the capability to
   recognize and receive simulcast streams.  For that reason two new
   simulcast grouping semantics are defined, "SimulCast Receive" (SCR)
   and "SimulCast Send" (SCS).  They both act as an indicator that
   session level simulcast is desired and provide one set of RTP
   sessions that carries simulcast versions of media sources.  There
   may be multiple sets of RTP sessions that carry simulcast versions.

6.1.1.
Declarative Use

   When used as a declarative media description, SCR indicates the
   configured end-point's required capability to recognize and receive
   a specified set of RTP streams as simulcast streams.  In the same
   fashion, SCS requests the end-point to send a specified set of RTP
   streams as simulcast streams.  SCR and SCS MAY be used independently
   and at the same time and they need not specify the same or even the
   same number of RTP sessions in the group.

6.1.2.  Offer/Answer Use

   When used in an offer, SCS indicates the SDP providing agent's
   intent of sending simulcast and the particular set of RTP sessions,
   and SCR indicates the agent's capability of receiving simulcast
   streams within the configured set of RTP sessions.  SCS and SCR MAY
   be used independently and at the same time and they need not specify
   the same or even the same number of RTP sessions in the group.  The
   answerer MUST change SCS to SCR and SCR to SCS in the answer, given
   that it has and wants to use the corresponding (reverse) capability.
   An answerer not supporting the SCS or SCR direction, or not
   supporting SCS or SCR grouping semantics at all, will remove that
   grouping attribute altogether, according to the grouping framework
   [RFC5888].  An offerer that receives an answer indicating lack of
   simulcast support in one or both directions, where SCR and/or SCS
   grouping are removed, MUST NOT use simulcast in the non-supported
   direction(s).

6.2.  Media Stream Requirements

   When doing simulcast, the media streams that are alternatives need
   certain considerations to ensure that switching between alternative
   streams is as issue-free as possible.
   The following considerations are needed:

   Same Clock Base:  To enable correct alignment of media packets on
      the source time-line, all alternative streams (SSRCs) MUST use
      the same underlying clock to relate their RTP timestamp values
      with the network time protocol (NTP) formatted sender time in the
      RTCP Sender Reports.

6.3.  Relating Alternative Encodings

   To ensure that simulcast streams can be related correctly, the usage
   of the SDES SRCNAME [I-D.westerlund-avtext-rtcp-sdes-srcname] with
   the same value across simulcast versions belonging to the same media
   source is REQUIRED.

6.4.  Multiple Stream handling

   The grouping semantics SCR and SCS SHOULD be combined with the SDP
   attributes "a=max-send-ssrc" and "a=max-recv-ssrc"
   [I-D.westerlund-avtcore-max-ssrc] to indicate the number of
   simultaneous streams of each encoding that may be sent or that can
   be handled in the receive direction.

7.  Simulcast Signalling Examples

   This example covers a client to video conference service case using
   a centralized media topology with an RTP mixer.  Alice and Bob call
   into a conference server for a conference call with audio and video
   sent to the RTP mixer, these clients being capable of sending a few
   video simulcast versions.  The conference server also dials out to
   Fred, who has a legacy client, resulting in fallback behavior.  When
   dialing out to Joe, more functionality is enabled as Joe has a
   client similar to Alice's.

      +---+      +-----------+      +---+
      | A |<---->|           |<---->| B |
      +---+      |           |      +---+
                 |   Mixer   |
      +---+      |           |      +---+
      | F |<---->|           |<---->| J |
      +---+      +-----------+      +---+

             Figure 3: Four-party Mixer-based Conference

   Example of Media plane for RTP mixer based multi-party conference
   with 4 participants.

7.1.
Alice: Desktop Client

   Alice is calling in to the mixer with an audiovisual single-stream
   desktop client that, compared to a legacy client, only adds the
   capability to send simulcast and announce SRCNAME.  The offer from
   Alice looks like:

      v=0
      o=alice 2362969037 2362969040 IN IP4 192.0.2.156
      s=Simulcast enabled Desktop Client
      t=0 0
      c=IN IP4 192.0.2.156
      b=AS:825
      a=group:SCS 2 3
      m=audio 49200 RTP/AVP 96 97 9 8
      b=AS:145
      a=rtpmap:96 G719/48000/2
      a=rtpmap:97 G719/48000
      a=rtpmap:9 G722/8000
      a=rtpmap:8 PCMA/8000
      a=ssrc:521923924 cname:alice@foo.example.com
      a=ssrc:521923924 srcname:a
      a=mid:1
      m=video 49300 RTP/AVP 96
      b=AS:520
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42c01e
      a=imageattr:* send [x=640,y=360] recv [x=640,y=360] [x=320,y=180]
      a=ssrc:192392452 cname:alice@foo.example.com
      a=ssrc:192392452 srcname:v
      a=mid:2
      a=content:main
      m=video 49400 RTP/AVP 96
      b=AS:160
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42c00d
      a=imageattr:96 send [x=320,y=180]
      a=ssrc:239245219 cname:alice@foo.example.com
      a=ssrc:239245219 srcname:v
      a=mid:3
      a=sendonly

           Figure 4: Alice Offer for a Simulcast Conference

   As can be seen from the SDP, Alice has a simulcast-enabled client
   and offers two different simulcast versions sent from her single
   camera, indicated by the SCS grouping tag and the two media IDs (2
   and 3).  The first video version with media ID 2 prefers 360p
   resolution (signaled via imageattr) and the second video version
   with media ID 3 prefers 180p resolution.  The first video media line
   also acts as the single receive video (making the media line
   sendrecv), while the second video media line is only related to
   simulcast transmission and is thus offered sendonly.
   The two simulcast encoding streams and their related audio stream
   are bound together using the SRCNAME SDES item with the identifier
   "v"; a single level is required in this case.  We also declare the
   end-point CNAME, as all sources belong to the same synchronization
   context.

7.2.  Bob: Telepresence Room

   Bob is calling in to the mixer with a telepresence client that has
   capability for sending multi-stream, for receiving and locally
   rendering those multiple streams, and for sending simulcast versions
   to the mixer.  More specifically, in this example the client has
   three cameras, each being sent in three different simulcast
   versions.  In the receive direction, up to two main screens can show
   video from a (multi-stream) conference participant being active
   speaker, and still more screen real estate can be used to show
   videos from up to 16 other conference listeners.  Each camera has a
   corresponding (stereo) microphone that can also be negotiated down
   to mono by removing the stereo payload type from the answer.  The
   capability to send and receive multiple SSRCs in the same RTP
   session is explicitly announced through use of RTP multi-stream
   signalling [I-D.westerlund-avtcore-max-ssrc].

      v=0
      o=bob 129384719 9834727 IN IP4 192.0.2.35
      s=Simulcast Enabled Multi Stream Telepresence Client
      t=0 0
      c=IN IP4 192.0.2.35
      b=AS:6035
      a=group:SCS 2 3 4
      m=audio 49200 RTP/AVP 96 97 9 8
      b=AS:435
      a=rtpmap:96 G719/48000/2
      a=rtpmap:97 G719/48000
      a=rtpmap:9 G722/8000
      a=rtpmap:8 PCMA/8000
      a=max-send-ssrc:* 3
      a=max-recv-ssrc:* 3
      a=ssrc:724847850 cname:bob@foo.example.com
      a=ssrc:724847850 srcname:a1
      a=ssrc:2847529901 cname:bob@foo.example.com
      a=ssrc:2847529901 srcname:a2
      a=ssrc:57289389 cname:bob@foo.example.com
      a=ssrc:57289389 srcname:a3
      a=mid:1
      m=video 49300 RTP/AVP 96
      b=AS:4500
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42c01f
      a=imageattr:* send [x=1280,y=720] recv [x=1280,y=720]
         [x=640,y=360] [x=320,y=180]
      a=max-send-ssrc:96 3
      a=max-recv-ssrc:96 2
      a=ssrc:75384768 cname:bob@foo.example.com
      a=ssrc:75384768 srcname:v1
      a=ssrc:2934825991 cname:bob@foo.example.com
      a=ssrc:2934825991 srcname:v2
      a=ssrc:3582594238 cname:bob@foo.example.com
      a=ssrc:3582594238 srcname:v3
      a=mid:2
      a=content:main
      m=video 49400 RTP/AVP 96
      b=AS:1560
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42c01e
      a=imageattr:* send [x=640,y=360]
      a=max-send-ssrc:96 3
      a=ssrc:1371234978 cname:bob@foo.example.com
      a=ssrc:1371234978 srcname:v1
      a=ssrc:897234694 cname:bob@foo.example.com
      a=ssrc:897234694 srcname:v2
      a=ssrc:239263879 cname:bob@foo.example.com
      a=ssrc:239263879 srcname:v3
      a=mid:3
      a=sendonly
      m=video 49500 RTP/AVP 96
      b=AS:420
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42c00d
      a=imageattr:96 send [x=320,y=180]
      a=max-send-ssrc:96 3
      a=ssrc:485723998 cname:bob@foo.example.com
      a=ssrc:485723998 srcname:v1
      a=ssrc:2345798212 cname:bob@foo.example.com
      a=ssrc:2345798212 srcname:v2
      a=ssrc:1295729848 cname:bob@foo.example.com
      a=ssrc:1295729848 srcname:v3
      a=mid:4
      a=sendonly
      m=video 49600 RTP/AVP 96 97 98
      b=AS:2600
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42c01f
      a=imageattr:96 recv [x=1280,y=720]
      a=rtpmap:97 H264/90000
      a=fmtp:97 profile-level-id=42c01e
      a=imageattr:97 recv [x=640,y=360]
      a=rtpmap:98 H264/90000
      a=fmtp:98 profile-level-id=42c00d
      a=imageattr:98 recv [x=320,y=180]
      a=max-recv-ssrc:96 1
      a=max-recv-ssrc:97 4
      a=max-recv-ssrc:98 16
      a=max-recv-ssrc:* 16
      a=mid:5
      a=recvonly
      a=content:alt

     Figure 5: Bob Offer for a Multi-stream and Simulcast Telepresence
                               Conference

   Bob has a three-camera, three-screen, simulcast-enabled client with
   even higher performance than Alice's and can additionally support
   720p video, as well as multiple receive streams of various
   resolutions.  The client implementor has thus decided to offer three
   simulcast streams for each camera, indicated by the SCS grouping tag
   and the three media IDs (2, 3, and 4) in the SDP.

   The first video media line with media ID 2 indicates the ability to
   send video from three simultaneous video sources (cameras) through
   the max-send-ssrc attribute with value 3.  This media line is also
   marked as the main video by using the content attribute from
   [RFC4796].  The receive direction has also declared the ability to
   handle multiple video sources; in this example, two.  The
   interpretation of content:main for those two streams in the receive
   direction is that the client expects and can present (in prime
   position) at most two main (active speaker) video streams from
   another multi-camera client.

   The second and third video media lines with media ID 3 and 4 are the
   sendonly simulcast streams.
   Through the grouping, they can implicitly be interpreted as also
   being content:main for the send direction, but are not marked as
   such since multiple media blocks with content:main could be
   confusing for a legacy client.

   The fourth video media line with media ID 5 is recvonly and is
   marked with content:alt.  That media line should, as was intended
   for that content attribute value, receive alternative content to the
   main speaker, such as "audience".  In a multi-party conference, that
   could for example be the next-to-most-active and/or non-active
   speakers.  The SDP describes that those streams can be presented in
   a set of different resolutions, indicated through the different
   payload types.  The maximum number of streams per payload type is
   indicated through the max-recv-ssrc attribute.  In this example, at
   most one stream can have payload type 96, preferably 720p, as
   indicated by the related imageattr line.  Similarly, at most 4
   streams can have payload type 97, preferably using 360p resolution,
   and at most 16 streams can have payload type 98, preferably of 180p
   resolution.  In any case, there must never be more than 16
   simultaneous streams of any payload type, but combinations of
   payload types may occur, such as for example two streams using
   payload type 97 and 8 streams using payload type 98.
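
   The limits described above for media ID 5 can be checked
   mechanically.  The following is a minimal Python sketch and not part
   of the proposal: the function name within_limits and the dictionary
   layout are hypothetical, and treating a payload type that has no
   limit line as not allowed is an assumption made only to keep the
   example short.

```python
from collections import Counter

# Limits copied from the a=max-recv-ssrc lines of media ID 5 in Bob's offer.
# "*" is the cap across all payload types in the RTP session.
MAX_RECV_SSRC = {"96": 1, "97": 4, "98": 16, "*": 16}


def within_limits(payload_types, limits):
    """Return True if the simultaneous streams respect every limit.

    payload_types: one payload type string per simultaneous stream (SSRC).
    limits: mapping of payload type (or "*") to the maximum stream count.
    """
    counts = Counter(payload_types)
    # Total cap across all payload types, if one was signalled.
    if "*" in limits and sum(counts.values()) > limits["*"]:
        return False
    for pt, count in counts.items():
        # Assumption of this sketch: a payload type without a limit
        # line is treated as not allowed.
        if pt not in limits or count > limits[pt]:
            return False
    return True


# The combination mentioned in the text: two PT 97 and eight PT 98 streams.
print(within_limits(["97"] * 2 + ["98"] * 8, MAX_RECV_SSRC))  # True
# Two 720p streams exceed the per-payload-type limit of one.
print(within_limits(["96"] * 2, MAX_RECV_SSRC))  # False
```

   A mixer could run such a check before adding another stream to the
   content:alt session; the per-payload-type limits and the "*" total
   are evaluated independently, so both must hold.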

   The answer from a simulcast-enabled RTP mixer to this last SDP could
   look like:

      v=0
      o=server 238947290 239573929 IN IP4 192.0.2.2
      s=Multi stream and Simulcast Telepresence Bob Answer
      c=IN IP4 192.0.2.43
      b=AS:7065
      a=group:SCR 2 3 4
      m=audio 49200 RTP/AVP 96
      b=AS:435
      a=rtpmap:96 G719/48000/2
      a=max-send-ssrc:96 3
      a=max-recv-ssrc:96 3
      a=ssrc:4111848278 cname:server@conf1.example.com
      a=ssrc:4111848278 srcname:r1
      a=ssrc:835978294 cname:server@conf1.example.com
      a=ssrc:835978294 srcname:r2
      a=ssrc:2938491278 cname:server@conf1.example.com
      a=ssrc:2938491278 srcname:r3
      a=mid:1
      m=video 49300 RTP/AVP 96
      b=AS:4650
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42c01f
      a=imageattr:* send [x=1280,y=720] [x=640,y=360] [x=320,y=180]
         recv [x=1280,y=720]
      a=max-recv-ssrc:96 3
      a=max-send-ssrc:96 2
      a=ssrc:2938746293 cname:server@conf1.example.com
      a=ssrc:2938746293 srcname:t1
      a=ssrc:1207102398 cname:server@conf1.example.com
      a=ssrc:1207102398 srcname:t2
      a=mid:2
      a=content:main
      m=video 49400 RTP/AVP 96
      b=AS:1560
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42c01e
      a=imageattr:* recv [x=640,y=360]
      a=max-recv-ssrc:96 3
      a=mid:3
      a=recvonly
      m=video 49500 RTP/AVP 96
      b=AS:420
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42c00d
      a=imageattr:96 recv [x=320,y=180]
      a=max-recv-ssrc:96 3
      a=mid:4
      a=recvonly
      m=video 49600 RTP/AVP 96 97 98
      b=AS:2600
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42c01f
      a=imageattr:96 send [x=1280,y=720]
      a=rtpmap:97 H264/90000
      a=fmtp:97 profile-level-id=42c01e
      a=imageattr:97 send [x=640,y=360]
      a=rtpmap:98 H264/90000
      a=fmtp:98 profile-level-id=42c00d
      a=imageattr:98 send [x=320,y=180]
      a=max-send-ssrc:96 1
      a=max-send-ssrc:97 4
      a=max-send-ssrc:98 8
      a=max-send-ssrc:* 8
      a=ssrc:2981523948
cname:server@conf1.example.com
      a=ssrc:2938237 cname:server@conf1.example.com
      a=ssrc:1230495879 cname:server@conf1.example.com
      a=ssrc:74835983 cname:server@conf1.example.com
      a=ssrc:3928594835 cname:server@conf1.example.com
      a=ssrc:948753 cname:server@conf1.example.com
      a=ssrc:1293456934 cname:server@conf1.example.com
      a=ssrc:4134923746 cname:server@conf1.example.com
      a=mid:5
      a=sendonly
      a=content:alt

      Figure 6: Server Answer for Bob Multi-stream and Simulcast
                        Telepresence Conference

   In this SDP answer, the grouping tag is changed to SCR, confirming
   that the sent simulcast streams will be received.  The
   directionality of the streams themselves as well as the
   directionality of multi-stream and bandwidth attributes are changed.
   The number of allowed streams in the content:alt video session has
   been reduced from 16 to 8 in the answer.

7.3.  Fred: Dial-out to Legacy Client

   Fred has a simple legacy client that knows nothing of the new
   signaling means discussed in this document.  In this example, the
   multi-stream and simulcast aware RTP mixer is calling out to Fred.
   Even though it is never actually sent, this would be Fred's offer
   SDP, should he have called in.  It is included here to improve the
   reader's understanding of Fred's response to the conference SDP.

      v=0
      o=fred 82342187 237429834 IN IP4 192.0.2.213
      s=Legacy Client
      t=0 0
      c=IN IP4 192.0.2.213
      m=audio 50132 RTP/AVP 9 8
      a=rtpmap:9 G722/8000
      a=rtpmap:8 PCMA/8000
      m=video 50134 RTP/AVP 96 97
      b=AS:405
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42c00c
      a=rtpmap:97 H263-2000/90000
      a=fmtp:97 profile=0;level=30

               Figure 7: Legacy Client Hypothetical Offer

   Fred would offer a single mono audio and a single video, each with a
   couple of different codec alternatives.

   The same conference server as in the previous example is calling out
   to Fred, offering the full set of multi-stream and simulcast
   features based on what the server itself can support.

      v=0
      o=server 323439283 2384192332 IN IP4 192.0.2.2
      s=Multi stream and Simulcast Dial-out Offer
      c=IN IP4 192.0.2.43
      b=AS:7065
      a=group:SCR 2 3 4
      m=audio 49200 RTP/AVP 96 97 9 8
      b=AS:435
      a=rtpmap:96 G719/48000/2
      a=rtpmap:97 G719/48000
      a=rtpmap:9 G722/8000
      a=rtpmap:8 PCMA/8000
      a=max-send-ssrc:* 4
      a=max-recv-ssrc:* 3
      a=ssrc:3293472833 cname:server@conf1.example.com
      a=ssrc:3293472833 srcname:q9
      a=ssrc:1734728348 cname:server@conf1.example.com
      a=ssrc:1734728348 srcname:Gr
      a=ssrc:1054453769 cname:server@conf1.example.com
      a=ssrc:1054453769 srcname:SO
      a=ssrc:3923447729 cname:server@conf1.example.com
      a=ssrc:3923447729 srcname:AJ
      a=mid:1
      m=video 49300 RTP/AVP 96
      b=AS:4650
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42c01f
      a=imageattr:* send [x=1280,y=720] [x=640,y=360] [x=320,y=180]
         recv [x=1280,y=720]
      a=max-recv-ssrc:96 3
      a=max-send-ssrc:96 3
      a=ssrc:78456398 cname:server@conf1.example.com
      a=ssrc:78456398 srcname:bj
      a=ssrc:3284726348 cname:server@conf1.example.com
      a=ssrc:3284726348 srcname:ON
      a=ssrc:2394871293 cname:server@conf1.example.com
      a=ssrc:2394871293 srcname:ya
      a=mid:2
      a=content:main
      m=video 49400 RTP/AVP 96
      b=AS:1560
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42c01e
      a=imageattr:* recv [x=640,y=360]
      a=max-recv-ssrc:96 3
      a=mid:3
      a=recvonly
      m=video 49500 RTP/AVP 96
      b=AS:420
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42c00d
      a=imageattr:96 recv [x=320,y=180]
      a=max-recv-ssrc:96 3
      a=mid:4
      a=recvonly
      m=video 49600 RTP/AVP 96 97 98
      b=AS:2600
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42c01f
      a=imageattr:96 send [x=1280,y=720]
      a=rtpmap:97 H264/90000
      a=fmtp:97 profile-level-id=42c01e
      a=imageattr:97 send [x=640,y=360]
      a=rtpmap:98 H264/90000
      a=fmtp:98 profile-level-id=42c00d
      a=imageattr:98 send [x=320,y=180]
      a=max-send-ssrc:96 1
      a=max-send-ssrc:97 4
      a=max-send-ssrc:98 8
      a=max-send-ssrc:* 8
      a=ssrc:2342872394 cname:server@conf1.example.com
      a=ssrc:1283741823 cname:server@conf1.example.com
      a=ssrc:3294823947 cname:server@conf1.example.com
      a=ssrc:1020408838 cname:server@conf1.example.com
      a=ssrc:1999343791 cname:server@conf1.example.com
      a=ssrc:2934192349 cname:server@conf1.example.com
      a=ssrc:2234347728 cname:server@conf1.example.com
      a=ssrc:3224283479 cname:server@conf1.example.com
      a=mid:5
      a=sendonly
      a=content:alt

     Figure 8: Server Dial-out Offer with Multi-stream and Simulcast

   The answer from Fred to this offer would look like:

      v=0
      o=fred 9842793823 239482793 IN IP4 192.0.2.213
      s=Legacy Client Answer to Server Dial-out
      t=0 0
      c=IN IP4 192.0.2.213
      m=audio 50132 RTP/AVP 9
      b=AS:80
      a=rtpmap:9 G722/8000
      m=video 50134 RTP/AVP 96
      b=AS:405
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42c00c
      m=video 0 RTP/AVP 96
      m=video 0 RTP/AVP 96
      m=video 0 RTP/AVP 96

           Figure 9: Legacy Client Answer to Server Dial-out

   As can be seen from the hypothetical offer, Fred does not understand
   any of the multi-stream or simulcast attributes, and also does not
   understand the grouping framework.  Thus, all those lines are
   removed from the answer SDP and any surplus video media blocks
   except for the first are rejected.  The media bandwidth is adjusted
   down to what Fred is actually prepared to receive.

7.4.
Joe: Dial-out to Desktop Client

   This example is almost identical to the one above, with the
   difference that the answering end-point has some limited simulcast
   and multi-stream capability.  As above, this is the offer SDP that
   Joe would have used, should he have called in.

      v=0
      o=joe 82342187 237429834 IN IP4 192.0.2.117
      s=Simulcast and Multistream enabled Desktop Client
      t=0 0
      c=IN IP4 192.0.2.117
      b=AS:985
      a=group:SCS 2 3
      m=audio 49200 RTP/AVP 96 97 9 8
      b=AS:145
      a=rtpmap:96 G719/48000/2
      a=rtpmap:97 G719/48000
      a=rtpmap:9 G722/8000
      a=rtpmap:8 PCMA/8000
      a=ssrc:1223883729 cname:joe@foo.example.com
      a=ssrc:1223883729 srcname:jV
      a=mid:1
      m=video 49300 RTP/AVP 96
      b=AS:520
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42c01e
      a=imageattr:96 send [x=640,y=360] recv [x=640,y=360] [x=320,y=180]
      a=ssrc:3842394823 cname:joe@foo.example.com
      a=ssrc:3842394823 srcname:BD
      a=mid:2
      a=content:main
      m=video 49400 RTP/AVP 96
      b=AS:160
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42c00d
      a=imageattr:96 send [x=320,y=180]
      a=ssrc:1214232284 cname:joe@foo.example.com
      a=ssrc:1214232284 srcname:BD
      a=mid:3
      a=sendonly
      m=video 49300 RTP/AVP 96
      b=AS:320
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42c00c
      a=imageattr:96 recv [x=320,y=180]
      a=max-recv-ssrc:* 2
      a=mid:4
      a=recvonly
      a=content:alt

              Figure 10: Desktop Client Hypothetical Offer

   Joe would send two versions of simulcast, 360p and 180p, from a
   single camera and can receive three sources of multi-stream, one
   360p and two 180p streams.

   Again, the same conference server is calling out to Joe and the
   offer SDP from the server would be almost identical to the one in
   the previous example.  It is therefore not included here.
   The response from Joe would look like:

      v=0
      o=joe 239482639 4702341992 IN IP4 192.0.2.117
      s=Answer from Desktop Client to Server Dial-out
      t=0 0
      c=IN IP4 192.0.2.117
      b=AS:985
      a=group:SCS 2 3
      m=audio 49200 RTP/AVP 96
      b=AS:145
      a=rtpmap:96 G719/48000/2
      a=ssrc:1223883729 cname:joe@foo.example.com
      a=ssrc:1223883729 srcname:iJ
      a=mid:1
      m=video 49300 RTP/AVP 96
      b=AS:520
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42c01e
      a=imageattr:96 send [x=640,y=360] recv [x=640,y=360] [x=320,y=180]
      a=ssrc:3842394823 cname:joe@foo.example.com
      a=ssrc:3842394823 srcname:YD
      a=mid:2
      a=content:main
      m=video 0 RTP/AVP 96
      a=mid:3
      m=video 49400 RTP/AVP 96
      b=AS:160
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42c00d
      a=imageattr:96 send [x=320,y=180]
      a=ssrc:1214232284 cname:joe@foo.example.com
      a=ssrc:1214232284 srcname:YD
      a=mid:4
      a=sendonly
      m=video 49300 RTP/AVP 96
      b=AS:320
      a=rtpmap:96 H264/90000
      a=fmtp:96 profile-level-id=42c00c
      a=imageattr:96 recv [x=320,y=180]
      a=max-recv-ssrc:* 2
      a=mid:5
      a=recvonly
      a=content:alt

          Figure 11: Desktop Client Answer to Server Dial-out

   Since the RTP mixer supports all of the features that Joe does and
   more, the SDP does not differ much from what it would have been in
   an offer.  It can be noted that, as stated in [RFC5888], all media
   lines need mid attributes, even the rejected ones, which is why
   mid:3 is present even though the mid quality simulcast version
   offered by the mixer is rejected by Joe.

8.  IANA Considerations

   This document requests that two new SDP grouping semantics, SCS and
   SCR, be registered.

   Formal registrations to be written.

9.  Security Considerations

   The Simulcast grouping semantics are vulnerable to attacks in the
   signalling.

   A false grouping of non-simulcast streams as simulcast would risk
   that some streams are incorrectly ignored by receivers that know
   simulcast and that are uninterested in the assumed simulcast
   streams.

   A hostile removal of simulcast grouping will prevent streams from
   being interpreted as simulcast, which obviously prevents use of the
   simulcast functionality.  It will also risk that intended simulcast
   streams are instead presented as separate, independent streams to a
   receiver.

   Neither of the above will likely have any major consequences, and
   both can be mitigated by signaling that is at least integrity and
   source authenticated to prevent an attacker from changing it.

10.  Acknowledgements

11.  References

11.1.  Normative References

   [I-D.westerlund-avtcore-max-ssrc]
              Westerlund, M., Burman, B., and F. Jansson, "Multiple
              Synchronization sources (SSRC) in RTP Session Signaling",
              draft-westerlund-avtcore-max-ssrc-02 (work in progress),
              July 2012.

   [I-D.westerlund-avtext-rtcp-sdes-srcname]
              Westerlund, M., Burman, B., and P. Sandgren, "RTCP SDES
              Item SRCNAME to Label Individual Sources",
              draft-westerlund-avtext-rtcp-sdes-srcname-01 (work in
              progress), July 2012.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, June 2009.

   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session
              Description Protocol (SDP) Grouping Framework", RFC 5888,
              June 2010.

11.2.  Informative References

   [I-D.westerlund-avtcore-multiplex-architecture]
              Westerlund, M., Burman, B., Perkins, C., and H.
              Alvestrand, "Guidelines for using the Multiplexing
              Features of RTP",
              draft-westerlund-avtcore-multiplex-architecture-02 (work
              in progress), July 2012.

   [I-D.westerlund-avtcore-transport-multiplexing]
              Westerlund, M. and C. Perkins, "Multiple RTP Sessions on
              a Single Lower-Layer Transport",
              draft-westerlund-avtcore-transport-multiplexing-03 (work
              in progress), July 2012.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

   [RFC3569]  Bhattacharyya, S., "An Overview of Source-Specific
              Multicast (SSM)", RFC 3569, July 2003.

   [RFC4588]  Rey, J., Leon, D., Miyazaki, A., Varsa, V., and R.
              Hakenberg, "RTP Retransmission Payload Format", RFC 4588,
              July 2006.

   [RFC4796]  Hautakorpi, J. and G. Camarillo, "The Session Description
              Protocol (SDP) Content Attribute", RFC 4796,
              February 2007.

   [RFC5117]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
              January 2008.

   [RFC6190]  Wenger, S., Wang, Y., Schierl, T., and A. Eleftheriadis,
              "RTP Payload Format for Scalable Video Coding", RFC 6190,
              May 2011.

Authors' Addresses

   Magnus Westerlund
   Ericsson
   Farogatan 6
   SE-164 80 Kista
   Sweden

   Phone: +46 10 714 82 87
   Email: magnus.westerlund@ericsson.com

   Bo Burman
   Ericsson
   Farogatan 6
   SE-164 80 Kista
   Sweden

   Phone: +46 10 714 13 11
   Email: bo.burman@ericsson.com

   Morgan Lindqvist
   Ericsson
   Farogatan 6
   Kista, SE-164 80
   Sweden

   Phone: +46 10 719 00 00
   Email: morgan.lindqvist@ericsson.com

   Fredrik Jansson
   Ericsson
   Farogatan 6
   Kista, SE-164 80
   Sweden

   Phone: +46 10 719 00 00
   Email: fredrik.k.jansson@ericsson.com