Network Working Group                                      M. Westerlund
Internet-Draft                                                 B. Burman
Intended status: Informational                                  Ericsson
Expires: August 6, 2012                                 February 3, 2012


                    Multi-Stream Media Conferencing
            draft-westerlund-clue-multistream-conference-00

Abstract

   This memo describes a multimedia multi-party conferencing
   architecture based on use of multiple Real-Time Transport Protocol
   (RTP) streams.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 6, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements
   3.  Use Cases
     3.1.  Point to Point
     3.2.  RTP Mixer
       3.2.1.  Incompatible Codecs
       3.2.2.  Low Quality End-Point
       3.2.3.  Medium Quality End-Point
       3.2.4.  Single Channel High Quality End-Point
       3.2.5.  Dual Channel High Quality End-Point
       3.2.6.  Mixer Source and Sink Selection
       3.2.7.  Media Composition
     3.3.  Multicast
   4.  RTP Usage
     4.1.  Use of SSRC and CSRC
     4.2.  Signaling Extensions
     4.3.  Optimizations
   5.  Security Considerations
   6.  IANA Considerations
   7.  Acknowledgements
   8.  Informative References
   Authors' Addresses

1.  Introduction

   Multimedia multi-party conferencing is being improved through the use
   of multiple media streams per media type.  At the same time, the
   number of different types of end-points that are capable of
   participating in a multimedia conference increases, and so does the
   number of access types that end-points may use to connect.  This memo
   describes a number of use cases relevant to that scenario.
   The use cases aim for as high media quality as possible, while making
   as efficient use of available resources as possible and accommodating
   a high degree of end-point and network diversity.

2.  Requirements

   The use cases in this memo should handle diversity in end-point
   capability such as media quality, processing power, and network
   interface.  The perceived end-user media quality is in turn impacted
   by media stream bitrate and the number of streams per media type, as
   well as by end-to-end delay, all of which should be taken into
   account.  Diversity in network capacity, network quality, and user
   preferences based on location, device, etc., should also be handled.
   These properties should be expected to change between conference
   sessions and even within the same session.

   Different types of conference topologies must be supported, including
   centralized, multicast, and point-to-point.  It should also be
   possible to use cascaded, centralized conferences as well as
   combinations of unicast and multicast.  The relevant RTP base
   topologies are described in [RFC5117].

   Use cases including an RTP Mixer should avoid adding delay or
   reducing quality by forwarding streams with as little modification as
   possible, with reduced processing requirements in the RTP Mixer as an
   added advantage.

   The conference use cases should strive for continuous presence,
   presenting media from as many participants as reasonably possible.
   At the same time, they should strive to use as high quality media as
   possible from each participant.  The bandwidth used for the
   conference should at the same time be as low as possible.  Those
   three requirements are in conflict and there is no generally
   accepted, optimum trade-off.

   The number of participants in the conference shall be assumed to be
   unlimited.  It may thus not be possible to create a continuous
   presence experience with media from all participants being presented
   to all other participants, and only media from a limited sub-set of
   participants may be presented to each individual participant.  This
   is seen as the hardest conferencing use case, and most other use
   cases should also be covered by it.  How many other participants'
   media are presented simultaneously on a certain end-point is allowed
   to vary and may depend on a number of preconditions.

3.  Use Cases

   The sub-sections below describe multi-stream conferencing use cases
   relevant to each RTP base topology.  Each use case is constructed so
   as to cover as many requirements as possible and must consequently
   include a number of different end-point types.

   In the figures below, the involved end-points are for clarity drawn
   as only sending or only receiving, but in a real scenario they may of
   course both send and receive.

   The common figure legends for the end-point capability categories
   used in all use cases are:

   D: Dual channel, high quality, for example conference room client.
      Streams in figures are denoted by '||' or '='.  The term 'channel'
      is chosen as a generic term and may apply to any media.  A channel
      is related to a single media source in the sender and a media sink
      in the receiver.  For video media, the channel source is typically
      a camera and the sink a screen or window.  For audio, the channel
      source is typically a microphone and the sink a loudspeaker.  A
      dual channel sending end-point thus has two independent media
      sources that can be sent simultaneously to a receiving end-point.
      Nothing prevents those dual channels from using other qualities
      than 'high', but that is chosen as an example in this memo to
      simplify the description.

   H: High quality, single channel, for example meeting room client.
      Streams in figures are denoted by '|' or '-'.

   M: Medium quality, single channel, for example desktop client.
      Streams in figures are denoted by ':' or '..'.

   L: Low quality, single channel, for example mobile client.  Streams
      in figures are denoted by '''.

   It is of course possible to divide end-points into more categories,
   but the chosen ones should make it possible to highlight most of the
   relevant topics.

   Note also that the use cases are kept as separated and clean-cut as
   possible to simplify the description.  Most use cases, especially the
   RTP Mixer ones (Section 3.2), could be combined into larger, more
   complex scenarios.

3.1.  Point to Point

   The point to point case is close to trivial, since it is assumed that
   capability exchange during setup will be able to negotiate the best
   quality between the two end-points.  When the number of channels
   differs between sender and receiver, the channel aspects of
   Section 3.2.5 apply.

3.2.  RTP Mixer

   Each link between the RTP Mixer and an end-point is in principle a
   Point to Point connection (Section 3.1).  One major difference from a
   Point to Point connection is that the RTP Mixer should represent and
   act according to some combination of the wishes and needs of multiple
   end-points on the other side of the Mixer.  That may include handling
   of conflicting or partly conflicting requirements.  The way to
   resolve those is not generally defined but will typically depend on
   RTP Mixer design, configuration, applied policies, or some
   combination.

3.2.1.  Incompatible Codecs

   This use case is the only feasible one when end-points have
   incompatible codecs, and transcoding is in that case unavoidable.
   The incompatibility is not necessarily only due to different codec
   types, but may also be caused by limited codec capacity, or
   limitations in media stream transport between the end-points.  This
   use case is trivial in the sense that the Mixer is assumed to always
   be capable of creating a dedicated media mix to each receiving
   end-point.  It may be used as an overall strategy, or as part of
   other use cases.

   A special case of transcoding is when the sender is configured to use
   scalable encoding and receivers do not support scalability.
   Transcoding of scalable streams to non-scalable streams is often a
   less complex operation than transcoding in general.

                   +---------+
                   | Codec 1 |
                   +---------+
                        |
                        v
                    +-------+
                    | Mixer |
                    +-------+
                     /     \    Trans-
                    /       \   coded
                   v         v  stream
           +---------+     +---------+
           | Codec 1 |     | Codec 2 |
           +---------+     +---------+

              Figure 1: Transcoding Incompatible Codecs

3.2.2.  Low Quality End-Point

   A low quality end-point has by definition the lowest media quality in
   the conference, and as a sender it is assumed that all other
   end-points can receive and present the media without restrictions.
   Some of the more capable end-points will have to choose how to
   present the received media in the best way, but it can always be
   presented.

   As a receiver, a low quality end-point is only capable of receiving
   streams from other low quality end-points (without transcoding).

   It is conceivable that a receiver of a certain quality category (not
   only low quality) can receive higher quality streams and reduce the
   quality locally such that it is feasible for presentation, but that
   will in general waste both bandwidth and processing resources.

               +---+
               | L |
               +---+
                 '
                 v
             +-------+    +---+---+
             | Mixer |'''>|   D   |
             +-------+    +---+---+
             '   '   '
            '    '    '
           v     v     v
         +---+ +---+ +---+
         | L | | M | | H |
         +---+ +---+ +---+

                    Figure 2: Low Quality Stream

3.2.3.  Medium Quality End-Point

   Similar to the low quality case above, the medium quality sender
   media stream is assumed to be receivable without restrictions by all
   but the low quality end-point.  For the medium quality media stream
   to also reach the low quality end-point, there are three options,
   described in the sub-sections below.

   When receiving, a medium quality end-point is capable of receiving
   other medium quality streams, as well as low quality streams (without
   transcoding).

3.2.3.1.  Transcoding

   Transcode the stream when sent towards low quality receivers, as
   described above (Section 3.2.1).  This could sometimes be feasible
   quality-wise, especially if the quality difference between the medium
   quality and the low quality streams is large, making the reduced
   medium quality stream relatively close quality-wise to un-encoded low
   quality media.  End-to-end delay will however always suffer.

               +---+
               | M |
               +---+
                 :
                 v
             +-------+    +---+---+
             | Mixer |...>|   D   |
             +-------+    +---+---+
             '   :   :
           T'    :    :
           v     v     v
         +---+ +---+ +---+
         | L | | M | | H |
         +---+ +---+ +---+
         T: Transcoded stream

             Figure 3: Medium Quality Stream Transcoding

3.2.3.2.  Simulcast

   Encode both a medium quality and a low quality stream from the same
   un-encoded source data and simulcast them.  The RTP Mixer can,
   without having to transcode, forward the low quality stream towards
   the low quality end-points, and forward the medium quality stream
   towards all other end-points.
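
   As a rough illustration of the simulcast forwarding rule, the Python
   sketch below picks, for each receiver, the highest offered simulcast
   version that does not exceed the receiver's quality category.  The
   Quality ordering, the function name pick_simulcast_version, and the
   receiver table are assumptions made for this example only; they are
   not defined by this memo.

```python
from enum import IntEnum


class Quality(IntEnum):
    """Hypothetical ordering of the L, M and H quality categories."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def pick_simulcast_version(offered, receiver_max):
    """Return the best offered quality the receiver can handle, or None.

    None means that no simulcast version fits this receiver, so the
    Mixer could not serve it by plain forwarding.
    """
    usable = [q for q in offered if q <= receiver_max]
    return max(usable) if usable else None


# A medium quality sender simulcasts a LOW and a MEDIUM version.
offered = {Quality.LOW, Quality.MEDIUM}

# Assumed receive capability per end-point category from the legend.
receivers = {
    "L": Quality.LOW,
    "M": Quality.MEDIUM,
    "H": Quality.HIGH,
    "D": Quality.HIGH,
}

forwarding = {
    name: pick_simulcast_version(offered, cap)
    for name, cap in receivers.items()
}
```

   In this sketch the low quality end-point is served the LOW version
   and all others the MEDIUM version.  A result of None would indicate
   that the Mixer has to fall back to transcoding (Section 3.2.3.1).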

               +---+
               | M |
               +---+
                ' :
                v v
             +-------+    +---+---+
             | Mixer |...>|   D   |
             +-------+    +---+---+
             '   :   :
            '    :    :
           v     v     v
         +---+ +---+ +---+
         | L | | M | | H |
         +---+ +---+ +---+

              Figure 4: Medium Quality Stream Simulcast

3.2.3.3.  Scalable Coding

   As a variant of simulcast, if it is possible to use a scalable codec,
   create a scalable stream with one low quality sub-stream and one
   sub-stream that together with the low quality sub-stream can
   reconstruct a medium quality stream.  Similar but not identical to
   the simulcast case, the RTP Mixer can, without having to transcode,
   forward the low quality sub-stream towards the low quality
   end-points, and forward both the low quality and the medium quality
   sub-streams (jointly describing a medium quality stream) to all other
   end-points.

               +---+
               | M |
               +---+
                ':
                vv
             +-------+    +---+---+
             | Mixer |...>|   D   |
             +-------+'''>+---+---+
             '   ':  ':
            '    ':   ':
           v     vv    vv
         +---+ +---+ +---+
         | L | | M | | H |
         +---+ +---+ +---+

               Figure 5: Medium Quality Scalable Stream

3.2.4.  Single Channel High Quality End-Point

   This use case is very similar to the medium quality end-point case
   (Section 3.2.3) above.  The difference is that there are now two
   (sample) categories of end-points that cannot receive the high
   quality stream, instead of one.  Simulcast and scalable streams must
   thus be extended to three versions or three sub-streams,
   respectively.

   As a receiver, a high quality end-point can receive all streams from
   other end-points.  The only exception is when multiple streams from a
   single end-point are used, such as from D.

3.2.4.1.  Transcoding

   Similar to Medium Quality (Section 3.2.3.1), except that the
   transcoding needs to produce two different qualities instead of one.

               +---+
               | H |
               +---+
                 |
                 v
             +-------+    +---+---+
             | Mixer |--->|   D   |
             +-------+    +---+---+
             '   :   |
           T'   T:    |
           v     v     v
         +---+ +---+ +---+
         | L | | M | | H |
         +---+ +---+ +---+
         T: Transcoded stream

              Figure 6: High Quality Stream Transcoding

3.2.4.2.  Simulcast

   Similar to Medium Quality (Section 3.2.3.2), except that three
   instead of two simulcast streams need to be sent.

               +---+
               | H |
               +---+
               ' : |
               v v v
             +-------+    +---+---+
             | Mixer |--->|   D   |
             +-------+    +---+---+
             '   :   |
            '    :    |
           v     v     v
         +---+ +---+ +---+
         | L | | M | | H |
         +---+ +---+ +---+

               Figure 7: High Quality Stream Simulcast

3.2.4.3.  Scalable Coding

   Similar to Medium Quality (Section 3.2.3.3), except that three
   instead of two scalable layers are used.

               +---+
               | H |
               +---+
                ':|
                vvv
             +-------+--->+---+---+
             | Mixer |...>|   D   |
             +-------+'''>+---+---+
             '   ':  ':|
            '    ':   ':|
           v     vv   vvv
         +---+ +---+ +---+
         | L | | M | | H |
         +---+ +---+ +---+

                Figure 8: High Quality Scalable Stream

3.2.5.  Dual Channel High Quality End-Point

   Again, this use case is very similar to the one above
   (Section 3.2.4).  The major difference is that this end-point is
   capable of sending and receiving dual, high quality streams where
   each stream has to be treated in a similar way to the previous
   section.

   When using multiple inter-related media, such as video with
   corresponding audio, those media streams not only need to be
   synchronized time-wise, just as for single channel end-points, but
   their spatial relation also needs to be established.  An example is a
   left camera with an attached microphone and a right camera with an
   attached microphone.  In general it is likely always desirable to be
   able to relate streams from a multi-channel end-point in a defined
   way, representing related sub-parts of a larger scene, both
   intra-media and inter-media.
   Description and signaling of stream relations is a complex problem in
   itself, which is currently work in progress in the CLUE WG
   [I-D.ietf-clue-framework] and will not be elaborated further in this
   memo.

   Another major, additional, aspect to account for is that the RTP
   Mixer needs to choose how to map dual (or multiple) streams onto a
   single stream, when forwarding towards end-points that have fewer
   receive channels than the sender.  This problem is similar to
   choosing a limited set of participants from a potentially unlimited
   set, which is described below (Section 3.2.6).

   A dual-channel (or multi-channel) receiving end-point that is
   receiving fewer simultaneous channel streams from a sending end-point
   than the maximum it can handle will have to decide which of the
   available receive channels should be used for each received stream.
   This decision can also be made by the RTP Mixer, if it knows the
   concept of multi-channel clients, has information about how many
   simultaneous channels the individual receiver supports, and knows how
   those channels should be related.

   There may of course be end-points that have capability for more than
   two simultaneous channels.  It is also possible to envision
   end-points where the number of receive channels differs from the
   number of send channels.

3.2.6.  Mixer Source and Sink Selection

   When a Mixer cannot forward all available streams to each client, it
   has to choose a small set out of a potentially very large set of
   streams in the conference.  Multiple strategies are possible for that
   choice.  The Mixer may use different strategies towards different
   receivers, depending on for example their capabilities or
   preferences.

   Another dimension of selection exists when the conference contains
   end-points with different numbers of channels (multiple media streams
   of the same media type).
   On the Mixer receive side, it may be necessary to select a few
   streams from a multi-stream media source.  On the Mixer send side, it
   may be necessary to select to which channel in a multi-stream capable
   receiver a certain stream should be sent.

   The choice of source may be either algorithmic (pre-configured) or
   manual (user controlled from one or more end-points).  Speech
   activity is a commonly used algorithmic measure to choose which
   participants' media streams to forward, but it is not the only
   conceivable measure.  Which and how many streams to select can either
   be based on some algorithm included in the RTP Mixer (for example the
   N most active speakers), or it can be controlled by the conference
   owner, or even by the receiving users individually.

   The figure below depicts the case where a receiving end-point
   explicitly selects streams through signaling (*) to the Mixer.  Both
   the media senders and their streams are numbered for clarity, and the
   conceptual signaling message is contained in {...}.

             +----+ +----+ +----+
             | L1 | | M2 | | S3 |
             +----+ +----+ +----+
               '      :      |
               1'    2:    3|
                 v    v    v
      +----+ 4  +-----------+  5 +----+
      | M4 |...>|   Mixer   |<'''| L5 |
      +----+    +-----------+    +----+
                    ^   |1,3,4
             {1,3,4}*   v
                    +---+
                    | H |
                    +---+

                 Figure 9: Receiver Stream Selection

   To be able to make an informed choice on what streams to select, the
   user at the receiving end-point will need information about which
   conference participant corresponds to which stream, and possibly also
   other meta-information about the streams and sending end-points.

3.2.7.  Media Composition

   When it is desirable that the RTP Mixer selects (Section 3.2.6) and
   forwards a larger number of simultaneous streams than what the
   receiving end-point can support, the Mixer has the option to compose
   multiple streams onto fewer streams, possibly onto only a single
   stream.
   Which and how many streams to compose is typically based on the
   selection, as described in the previous section.

   The composition operation is basically independent of selection.  In
   general, Mixer composition requires transcoding.  A Mixer composition
   use case example is depicted below.  To simplify the picture, only a
   single receiver is included.  Also, both the media senders and their
   streams are numbered for clarity.  In the example below, the Mixer
   has chosen to compose streams 1, 3 and 4 into the single stream sent
   to the receiving end-point.

             +----+ +----+ +----+
             | L1 | | M2 | | S3 |
             +----+ +----+ +----+
               '      :      |
               1'    2:    3|
                 v    v    v
      +----+ 4  +-----------+  5 +----+
      | M4 |...>|   Mixer   |<'''| L5 |
      +----+    +-----------+    +----+
                      |1,3,4
                      v
                    +---+
                    | H |
                    +---+

                    Figure 10: Mixer Composition

   An alternative to composition in the Mixer is to let the end-point do
   local composition by sending it multiple, un-composed, streams.  This
   could avoid transcoding at the cost of sending multiple streams.
   This is depicted below, where streams 2, 3 and 5 are sent to the
   receiving client for local composition, as an example.

             +----+ +----+ +----+
             | L1 | | M2 | | S3 |
             +----+ +----+ +----+
               '      :      |
               1'    2:    3|
                 v    v    v
      +----+ 4  +-----------+  5 +----+
      | M4 |...>|   Mixer   |<'''| L5 |
      +----+    +-----------+    +----+
                  2: 3| 5'
                   :  |  '
                   v  v  v
                   +-----+
                   |  H  |
                   +-----+

                    Figure 11: Local Composition

   When the senders offer streams of multiple qualities, either the
   Mixer or the local composition can select and combine media of
   different qualities.  Use of multiple qualities could help optimize
   resource utilization for transport, decoding and rendering.  In the
   figure below, one high quality (3), one medium quality (4) and two
   low quality (1 and 2) streams are selected for local composition, as
   an example.

           +----+ +----+ +----+
           | L1 | | M2 | | S3 |
           +----+ +----+ +----+
               '    ' :  ' : |
               1'  2'2: 3'3:3|
             4   v  v v  v v v
      +----+...>+-------------+  5 +----+
      | M4 | 4  |    Mixer    |<'''| L5 |
      +----+'''>+-------------+    +----+
                  3| 4: 1' 2'
                   |  :  '  '
                   v  v  v  v
                   +-------+
                   |   H   |
                   +-------+

             Figure 12: Multi Quality Local Composition

   This local composition scenario can be further enhanced by the Mixer
   providing different quality streams to the receiver, based on the
   Mixer selection algorithm.  One example could be to let the Mixer
   forward the stream from the most active speaker as a high quality
   stream, and forward the less active speakers as lower quality
   streams.  To use this in the local composition, the receiving
   end-point must know the streams' different roles, which requires a
   stream role agreement between the Mixer and the receiving end-point.
   In the figure above, this can be achieved by tagging for example the
   leftmost stream from the Mixer as having the "most active" role.  The
   role agreement can be made through signaling between Mixer and
   receiving end-point.

   When the receiving end-point supports reception and presentation of
   several channels (for example has several screens), it is possible to
   combine Mixer composition with local composition of multiple
   un-composed streams by sending one or more composed streams and one
   or more un-composed streams from the Mixer.

3.3.  Multicast

   In the multicast or multi-unicast case, each media stream from a
   single sender will reach multiple receivers unmodified.  This can be
   achieved by multicast addressing, or by multi-unicast and RTP
   Translators [RFC5117].

   This use case is similar to when an RTP Mixer is neither performing
   composition (Section 3.2.7) nor source selection (Section 3.2.6), but
   is forwarding all streams (and qualities, if more than one) to all
   receivers.
   The entire load of stream and quality selection for presentation is
   put on the receiving end-point.

   In the figure below, multi-unicast through the use of an RTP
   Translator is depicted since the figure becomes clearer than with a
   full mesh multicast.  The figure illustrates non-scalable streams,
   but it is of course also possible to multicast scalable streams.

            +---+
            | H |
            +---+
            ' : |
            v v v
      +---------------+--->+---+---+
      |  Translator   |...>|   D   |
      +---------------+'''>+---+---+
      ' : | ' : | ' : |
      ' : | ' : | ' : |
      v v v v v v v v v
      +---+ +---+ +---+
      | L | | M | | H |
      +---+ +---+ +---+

           Figure 13: Multi-Unicast of Multiple Qualities

4.  RTP Usage

   This section discusses how RTP transport [RFC3550] could be used with
   the scenarios discussed in the previous section.  It complements,
   extends and partially presents an alternative solution to what is
   described in [I-D.lennox-clue-rtp-usage].

4.1.  Use of SSRC and CSRC

   It is assumed that each RTP media stream in the use cases in the
   previous section is identified by an SSRC.  There already exist
   methods to convey information about each media stream and sending
   end-point in a conference [RFC4575].  Those methods also provide
   means to correlate that information with the stream SSRC.  This
   information could be sufficient for a receiving end-point to make
   informed media stream selection decisions (Section 3.2.6).

   When the RTP Mixer associates a role to a stream (Section 3.2.7), for
   example "most active speaker" or "leftmost channel", it is possible
   to associate that additional property to an SSRC belonging to the
   Mixer, while also keeping the original SSRC in the RTP packet as
   CSRC.  This way, it is possible to apply special treatment to
   received streams based on their SSRC without losing the ability to
   identify the original source, using existing RTP functionality.
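
   To make the SSRC and CSRC usage concrete, the Python sketch below
   builds an RTP fixed header [RFC3550] in which the SSRC field carries
   a Mixer-owned role SSRC and the CSRC list carries the original
   sender's SSRC.  The helper names, the payload type 96, and the small
   SSRC values 6 and 2 are illustrative assumptions only; real SSRCs are
   random 32-bit values.

```python
import struct


def build_rtp_header(payload_type, seq, timestamp, ssrc, csrcs=()):
    """Pack an RTP fixed header followed by an optional CSRC list."""
    if len(csrcs) > 15:
        raise ValueError("the CC field allows at most 15 CSRC entries")
    byte0 = (2 << 6) | len(csrcs)   # V=2, no padding, no extension, CC
    byte1 = payload_type & 0x7F     # marker bit left cleared
    header = struct.pack("!BBHII", byte0, byte1,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + b"".join(struct.pack("!I", c) for c in csrcs)


def parse_ssrc_and_csrcs(header):
    """Return (ssrc, [csrc, ...]) from a packed RTP header."""
    byte0, _, _, _, ssrc = struct.unpack("!BBHII", header[:12])
    count = byte0 & 0x0F
    csrcs = struct.unpack("!%dI" % count, header[12:12 + 4 * count])
    return ssrc, list(csrcs)


# Hypothetical values: the Mixer uses its own SSRC 6 for a role such
# as "most active speaker" and forwards media originating from SSRC 2.
ROLE_SSRC = 6
ORIGINAL_SSRC = 2

header = build_rtp_header(96, seq=1, timestamp=0,
                          ssrc=ROLE_SSRC, csrcs=[ORIGINAL_SSRC])
role, sources = parse_ssrc_and_csrcs(header)
```

   A receiver could key role-specific treatment on the parsed SSRC
   while still learning the original source from the CSRC list.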

             +----+ +----+ +----+
             | L1 | | M2 | | S3 |
             +----+ +----+ +----+
          SSRC1' SSRC2: SSRC3|
               1'    2:    3|
          SSRC4  v    v    v  SSRC5
      +----+ 4  +-----------+  5 +----+
      | M4 |...>|   Mixer   |<'''| L5 |
      +----+    +-----------+    +----+
              SSRC6 :2 5' SSRC7
             (CSRC2)v   v(CSRC5)
                    +---+
                    | H |
                    +---+

                 Figure 14: SSRC and CSRC from Mixer

   It is also possible in a receiving end-point to let each role (and
   Mixer SSRC) map towards a specific media decoder, since that Mixer
   SSRC would rarely (if ever) change during a session other than due to
   SSRC conflicts, while the CSRC would typically change every time a
   new stream is selected for that specific role, for example "active
   speaker".

   Note that when simulcast is used, different simulcast versions
   typically use different SSRCs.  When scalable coding is used,
   different layers can sometimes be sent within a single SSRC using a
   single Payload Type and thus cannot be distinguished on RTP level.
   Identification of different layers will then have to be codec
   specific.  Some scalable codecs can also send different layers on
   separate SSRCs or using separate Payload Types.

4.2.  Signaling Extensions

   End-points and Mixers supporting multiple channels (Section 3.2.5)
   need to know how many simultaneous channels can be accepted by a
   receiver and will be used by a sender.  Assuming that each channel is
   sent as a single SSRC, there should be signaling that limits the
   number of SSRCs in an RTP session [I-D.westerlund-avtcore-max-ssrc].
   This maps well with the above suggested relation between SSRC and
   media decoders, since the suggested limitation then also expresses
   the maximum simultaneously available decoding resources.
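
   The relation between a signaled SSRC limit and decoder resources can
   be sketched as below, in Python.  The DecoderPool class and its
   max_ssrc parameter are hypothetical names used only for this
   illustration; the actual signaling syntax is left to
   [I-D.westerlund-avtcore-max-ssrc] and is not reproduced here.

```python
class DecoderPool:
    """Binds each received SSRC to one decoder, up to a fixed limit."""

    def __init__(self, max_ssrc):
        # max_ssrc models the negotiated maximum number of SSRCs that
        # this receiver accepts simultaneously (an assumed parameter).
        self.max_ssrc = max_ssrc
        self.decoder_for = {}   # SSRC -> decoder index

    def decoder_index(self, ssrc):
        """Return the decoder index for ssrc, or None if none is free."""
        if ssrc in self.decoder_for:
            return self.decoder_for[ssrc]
        if len(self.decoder_for) >= self.max_ssrc:
            return None
        index = len(self.decoder_for)
        self.decoder_for[ssrc] = index
        return index


# A receiver with two decoders, as for a dual channel end-point.
pool = DecoderPool(max_ssrc=2)
first = pool.decoder_index(6)    # new SSRC, gets decoder 0
second = pool.decoder_index(7)   # new SSRC, gets decoder 1
again = pool.decoder_index(6)    # known SSRC keeps its decoder
third = pool.decoder_index(8)    # exceeds the limit, no decoder
```

   Because a Mixer role SSRC rarely changes, such a binding stays
   stable even when the CSRC behind the role changes.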

   When representing a specific media source in several different
   qualities and using simulcast (Section 3.2.3.2) to transport them,
   rather than sending them as scalable layers contained in a single
   stream, those separate streams need to be signaled as simulcast
   versions [I-D.westerlund-avtcore-rtp-simulcast], in order for the
   receiver to be able to apply correct selection logic
   (Section 3.2.6).

   When a conference is configured to let individual users at receiving
   end-points choose which streams to receive (Section 3.2.6),
   responsive selection signaling between end-point and RTP Mixer
   [I-D.westerlund-dispatch-stream-selection] is needed to initiate the
   stream selection.  This selection also applies to the media streams
   included in a composition.

   Note that when simulcast is used, different simulcast versions
   typically use different SSRCs.  When scalable coding is used,
   different layers can sometimes be sent within a single SSRC using a
   single Payload Type and thus cannot use the parts of SDP signaling
   that rely on those identifiers.  Identification of different layers
   will then have to be codec specific.  Some scalable codecs can also
   send different layers on separate SSRCs or using separate Payload
   Types.

4.3.  Optimizations

   When it is desirable to minimize the number of UDP ports used by an
   end-point, for example to reduce the resources for NAT and firewall
   traversal, it should be possible to send all media streams from all
   RTP sessions on a single UDP port
   [I-D.westerlund-avtcore-transport-multiplexing].  This should
   preferably be done without losing any important RTP functionality.
   Transport resource priority and Quality of Service handling are
   typically performed based on the 5-tuple (source and destination
   addresses and ports, and protocol), which together with desired
   differentiation of media stream priority can require use of more
   than one UDP port (5-tuple).

   In a conference use case with multiple sending end-points and where
   the receiving end-points do not make use of all available streams,
   there is a risk that some of the sent streams are not used by any
   receiver.  The probability of this increases when end-points provide
   multiple streams of different qualities.  The need for a certain
   stream can change very quickly, for example when the need is based
   on conditions of other streams such as speech activity.  To save
   bandwidth and processing resources in the sending end-point, it
   would thus be desirable for an RTP Mixer to be able to quickly turn
   off or pause individual streams
   [I-D.westerlund-avtext-rtp-stream-pause] that are no longer used in
   any media mix sent to receiving end-points, and even more
   importantly be able to quickly resume those streams when they are
   needed again.

   In use cases where multiple media streams (Section 3.2.5) are used
   in a single RTP session, when SDP is used as the signaling protocol,
   and specifically when the number of streams depends on the SDP
   negotiation outcome (Section 4.2), the currently defined bandwidth
   signaling attribute is only capable of describing the maximum
   possible bandwidth usage for the most demanding alternative.  It
   would be desirable to express bandwidth requirements in a more
   precise way [I-D.westerlund-mmusic-sdp-bw-attribute].
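
   The cost of describing only the most demanding alternative can be
   shown with simple arithmetic.  The sketch below is illustrative
   only: the outcome names and per-stream bitrates are made-up values,
   and the helper names ceiling_kbps and unused_headroom_kbps are
   hypothetical.  It computes the single maximum that has to cover
   every possible negotiation outcome, and how far that maximum
   overstates the bandwidth actually used by each outcome.

```python
from typing import Dict, List


def ceiling_kbps(alternatives: Dict[str, List[int]]) -> int:
    """Return the single maximum that must cover every alternative."""
    return max(sum(streams) for streams in alternatives.values())


def unused_headroom_kbps(alternatives: Dict[str, List[int]]) -> Dict[str, int]:
    """For each outcome, how much the single maximum overstates real use."""
    ceiling = ceiling_kbps(alternatives)
    return {name: ceiling - sum(streams)
            for name, streams in alternatives.items()}


# Hypothetical negotiation outcomes: per-stream bitrates in kbit/s.
OUTCOMES = {
    "one_stream": [1500],
    "three_streams": [1500, 500, 500],
}

if __name__ == "__main__":
    print("signaled maximum:", ceiling_kbps(OUTCOMES))
    print("overstatement per outcome:", unused_headroom_kbps(OUTCOMES))
```

   With these assumed values, an end-point that ends up with a single
   stream is still described by the figure needed for three streams,
   which is the imprecision a more expressive bandwidth attribute
   would remove.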

   While any RTP stream relations, such as spatial co-location of
   related audio and video streams, should be possible to express in
   session signaling or another application signaling protocol, there
   may be times when it is desirable that RTP stream SSRC relations
   [I-D.westerlund-avtext-rtcp-sdes-srcname] such as simulcast
   alternatives or related FEC streams can be seen directly in the RTP
   or RTCP streams.  This would allow for processing of media and
   related streams in middleboxes, without the need to have access to
   all higher layer signaling.  Keeping protocol layer separation will
   enable some architectural freedom and may ease future extensions.

5.  Security Considerations

   Any security considerations relevant to this memo are described in
   the RFCs and drafts referenced in the RTP Usage section (Section 4).

6.  IANA Considerations

   This document makes no request of IANA.

   Note to RFC Editor: this section may be removed on publication as an
   RFC.

7.  Acknowledgements

8.  Informative References

   [I-D.ietf-clue-framework]
              Romanow, A., Duckworth, M., Pepperell, A., and B. Baldino,
              "Framework for Telepresence Multi-Streams",
              draft-ietf-clue-framework-02 (work in progress),
              January 2012.

   [I-D.lennox-clue-rtp-usage]
              Lennox, J., Romanow, A., and P. Witty, "Real-Time
              Transport Protocol (RTP) Usage for Telepresence Sessions",
              draft-lennox-clue-rtp-usage-01 (work in progress),
              October 2011.

   [I-D.westerlund-avtcore-max-ssrc]
              Westerlund, M., Burman, B., and F. Jansson, "Multiple
              Synchronization sources (SSRC) in RTP Session Signaling",
              draft-westerlund-avtcore-max-ssrc-00 (work in progress),
              October 2011.

   [I-D.westerlund-avtcore-rtp-simulcast]
              Westerlund, M., Burman, B., Lindqvist, M., and F. Jansson,
              "Using Simulcast in RTP sessions",
              draft-westerlund-avtcore-rtp-simulcast-00 (work in
              progress), October 2011.

   [I-D.westerlund-avtcore-transport-multiplexing]
              Westerlund, M. and C. Perkins, "Multiple RTP Session on a
              Single Lower-Layer Transport",
              draft-westerlund-avtcore-transport-multiplexing-01 (work
              in progress), October 2011.

   [I-D.westerlund-avtext-rtcp-sdes-srcname]
              Westerlund, M., Burman, B., and P. Sandgren, "RTCP SDES
              Item SRCNAME to Label Individual Sources",
              draft-westerlund-avtext-rtcp-sdes-srcname-00 (work in
              progress), October 2011.

   [I-D.westerlund-avtext-rtp-stream-pause]
              Akram, A., Burman, B., Grondal, D., and M. Westerlund,
              "RTP Media Stream Pause and Resume",
              draft-westerlund-avtext-rtp-stream-pause-00 (work in
              progress), October 2011.

   [I-D.westerlund-dispatch-stream-selection]
              Grondal, D., Burman, B., and M. Westerlund, "Media Stream
              Selection (MESS)",
              draft-westerlund-dispatch-stream-selection-00 (work in
              progress), October 2011.

   [I-D.westerlund-mmusic-sdp-bw-attribute]
              Frankkila, T., Westerlund, M., and B. Burman, "Extensible
              Bandwidth Attribute for SDP",
              draft-westerlund-mmusic-sdp-bw-attribute-00 (work in
              progress), October 2011.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC4575]  Rosenberg, J., Schulzrinne, H., and O. Levin, "A Session
              Initiation Protocol (SIP) Event Package for Conference
              State", RFC 4575, August 2006.

   [RFC5117]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
              January 2008.

Authors' Addresses

   Magnus Westerlund
   Ericsson
   Farogatan 6
   SE-164 80 Kista
   Sweden

   Phone: +46 10 714 82 87
   Email: magnus.westerlund@ericsson.com

   Bo Burman
   Ericsson
   Farogatan 6
   SE-164 80 Kista
   Sweden

   Phone: +46 10 714 13 11
   Email: bo.burman@ericsson.com