Network Working Group                                      M. Westerlund
Internet-Draft                                                  Ericsson
Obsoletes: 5117 (if approved)                                  S. Wenger
Intended status: Informational                                     Vidyo
Expires: April 18, 2013                                 October 15, 2012


                             RTP Topologies
           draft-westerlund-avtcore-rtp-topologies-update-01

Abstract

   This document discusses multi-endpoint topologies used in Real-time
   Transport Protocol (RTP)-based environments.  In particular,
   centralized topologies commonly employed in the video conferencing
   industry are mapped to the RTP terminology.

   This document adds further topologies and is intended to replace
   RFC 5117.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 18, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Definitions
     2.1.  Glossary
     2.2.  Indicating Requirement Levels
   3.  Topologies
     3.1.  Point to Point
     3.2.  Point to Multipoint Using Multicast
       3.2.1.  Any Source Multicast (ASM)
       3.2.2.  Source Specific Multicast (SSM)
       3.2.3.  SSM with Local Unicast Resources
     3.3.  Point to Multipoint Using Mesh
     3.4.  Point to Multipoint Using the RFC 3550 Translator
       3.4.1.  Relay - Transport Translator
       3.4.2.  Media Translator
     3.5.  Point to Multipoint Using the RFC 3550 Mixer Model
       3.5.1.  Media Mixing
       3.5.2.  Media Switching
     3.6.  Source Projecting Middlebox
     3.7.  Point to Multipoint Using Video Switching MCUs
     3.8.  Point to Multipoint Using RTCP-Terminating MCU
     3.9.  De-composite Endpoint
     3.10. Non-Symmetric Mixer/Translators
     3.11. Combining Topologies
   4.  Comparing Topologies
     4.1.  Topology Properties
       4.1.1.  All to All Media Transmission
       4.1.2.  Transport or Media Interoperability
       4.1.3.  Per Domain Bit-Rate Adaptation
       4.1.4.  Aggregation of Media
       4.1.5.  View of All Session Participants
       4.1.6.  Loop Detection
     4.2.  Comparison of Topologies
   5.  Security Considerations
   6.  IANA Considerations
   7.  Acknowledgements
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Authors' Addresses

1.  Introduction

   When working on the Codec Control Messages [RFC5104], considerable
   confusion was noticed in the community with respect to terms such as
   Multipoint Control Unit (MCU), Mixer, and Translator, and their usage
   in various topologies.  This document tries to address this confusion
   by providing a common information basis for future discussion and
   specification work.  It attempts to clarify and explain sections of
   the Real-time Transport Protocol (RTP) specification [RFC3550] in an
   informal way.  It is not intended to update or change what is
   normatively specified within RFC 3550.

   When the Audio-Visual Profile with Feedback (AVPF) [RFC4585] was
   developed, the main emphasis lay in the efficient support of point to
   point and small multipoint scenarios without centralized multipoint
   control.  However, in practice, many small multipoint conferences
   operate utilizing devices known as Multipoint Control Units (MCUs).
   MCUs may implement Mixer or Translator (in RTP [RFC3550] terminology)
   functionality and signalling support.  They may also contain
   additional application functionality.  This document focuses on the
   media transport aspects of the MCU that can be realized using RTP, as
   discussed below.
   Further considered are the properties of Mixers and Translators, and
   how some types of deployed MCUs deviate from these properties.

2.  Definitions

2.1.  Glossary

   ASM:  Any Source Multicast

   AVPF:  The Extended RTP Profile for RTCP-based Feedback

   CSRC:  Contributing Source

   Link:  The data transport to the next IP hop

   MCU:  Multipoint Control Unit

   Path:  The concatenation of multiple links, resulting in an end-to-
      end data transfer.

   PtM:  Point to Multipoint

   PtP:  Point to Point

   SSM:  Source-Specific Multicast

   SSRC:  Synchronization Source

2.2.  Indicating Requirement Levels

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   The RFC 2119 language is used in this document to highlight those
   important requirements and/or resulting solutions that are necessary
   to address the issues raised in this document.

3.  Topologies

   This section defines several topologies that are relevant for codec
   control but also for RTP usage in other contexts.  The first ones
   relate to the RTP system model utilizing multicast and/or unicast, as
   envisioned in RFC 3550.  Two later topologies (MCU and RTCP
   terminating), in contrast, describe the deployed system models as
   used in many H.323 [H323] video conferences, where both the media
   streams and the RTP Control Protocol (RTCP) control traffic terminate
   at the MCU.  In these two cases, the media sender does not receive
   the (unmodified or Translator-modified) Receiver Reports from all
   sources (which it needs to interpret based on Synchronization Source
   (SSRC) values) and therefore does not have full information about the
   situation of all endpoints as reported in RTCP Receiver Reports
   (RRs).  More topologies can be constructed by combining any of the
   models; see Section 3.11.

   The topologies may be referenced in other documents by a shortcut
   name, indicated by the prefix "Topo-".

   For each of the RTP-defined topologies, we discuss how RTP, RTCP, and
   the carried media are handled.  With respect to RTCP, we also
   introduce the handling of RTCP feedback messages as defined in
   [RFC4585] and [RFC5104].  Any important differences between the two
   will be illuminated in the discussion.

3.1.  Point to Point

   Shortcut name: Topo-Point-to-Point

   The Point to Point (PtP) topology (Figure 1) consists of two
   endpoints, communicating using unicast.  Both RTP and RTCP traffic
   are conveyed endpoint-to-endpoint, using unicast traffic only (even
   if, in exotic cases, this unicast traffic happens to be conveyed over
   an IP-multicast address).

       +---+         +---+
       | A |<------->| B |
       +---+         +---+

                         Figure 1: Point to Point

   The main property of this topology is that A sends to B, and only B,
   while B sends to A, and only A.  This avoids all complexities of
   handling multiple endpoints and combining the requirements from them.
   Note that an endpoint can still use multiple RTP Synchronization
   Sources (SSRCs) in an RTP session.  Any number of RTP sessions can
   also be in use between A and B.

   RTCP feedback messages for the indicated SSRCs are communicated
   directly between the endpoints.  Therefore, this topology poses
   minimal (if any) issues for any feedback messages.

3.2.  Point to Multipoint Using Multicast

   Multicast is an IP-layer functionality that is available in some
   networks.  It comes in two main flavors.  In Any Source Multicast
   (ASM), any multicast group participant can send to the group address
   and expect the packet to reach all group participants.  In Source
   Specific Multicast (SSM), only a particular IP host is allowed to
   send to the multicast group.
   Both of these models are discussed below in their respective
   sections.

3.2.1.  Any Source Multicast (ASM)

   Shortcut name: Topo-ASM (was Topo-Multicast)

               +-----+
    +---+     /       \    +---+
    | A |----/         \---| B |
    +---+   /   Multi-  \  +---+
           +    Cast     +
    +---+   \  Network  /  +---+
    | C |----\         /---| D |
    +---+     \       /    +---+
               +-----+

               Figure 2: Point to Multipoint Using Multicast

   Point to Multipoint (PtM) is defined here as using a multicast
   topology as a transmission model, in which traffic from any
   participant reaches all the other participants, except for cases such
   as:

   o  packet loss, or

   o  when a participant does not wish to receive the traffic for a
      specific multicast group and therefore has not subscribed to the
      IP-multicast group in question.  This is for the cases where a
      multi-media session is distributed using two or more multicast
      groups.

   In the above context, "traffic" encompasses both RTP and RTCP
   traffic.  The number of participants can vary between one and many,
   as RTP and RTCP scale to very large multicast groups (the theoretical
   limit of the number of participants in a single RTP session is
   approximately two billion).  The above can be realized using Any
   Source Multicast (ASM).

   For feedback usage, it is useful to distinguish the subset of
   multicast sessions in which the number of participants in the
   multicast group is so low that it allows the participants to use
   early or immediate feedback, as defined in AVPF [RFC4585].  This
   document refers to those groups as "small multicast groups".  Some
   applications may still want to use larger multicast groups where the
   RTCP feedback possibilities are more limited.

   RTCP feedback messages in multicast will, like media, reach everyone
   (subject to packet losses and multicast group subscription).
   Therefore, the feedback suppression mechanism discussed in [RFC4585]
   is required.
   Each individual node needs to process every feedback message it
   receives to determine if it is affected or if the feedback message
   applies only to some other participant.

3.2.2.  Source Specific Multicast (SSM)

   In Any Source Multicast, any of the participants can send to all the
   other participants, simply by sending a packet to the multicast
   group.  That is not possible in Source Specific Multicast [RFC4607]
   where only a single source (Distribution Source) can send to the
   multicast group, creating a topology that looks like the one below:

   +--------+       +-----+
   |Media   |       |     |       Source-specific
   |Sender 1|<----->| D S |          Multicast
   +--------+       | I O |  +--+----------------> R(1)
                    | S U |  |  |                    |
   +--------+       | T R |  |  +-----------> R(2)   |
   |Media   |<----->| R C |->+  |          :    |    |
   |Sender 2|       | I E |  |  +------> R(n-1) |    |
   +--------+       | B   |  |  |          |    |    |
       :            | U   |  +--+--> R(n)  |    |    |
       :            | T +-|          |     |    |    |
       :            | I | |<---------+     |    |    |
   +--------+       | O |F|<---------------+    |    |
   |Media   |       | N |T|<--------------------+    |
   |Sender M|<----->|   | |<-------------------------+
   +--------+       +-----+       RTCP Unicast

   FT = Feedback Target
   Transport from the Feedback Target to the Distribution
   Source is via unicast or multicast RTCP if they are not
   co-located.

       Figure 3: Point to Multipoint using Source Specific Multicast

   In the SSM topology (Figure 3) a number of RTP sources (1 to M) are
   allowed to send media to the SSM group.  These send media to the
   distribution source which then forwards the media streams to the
   multicast group.  The media streams reach the Receivers (R(1) to
   R(n)).  The Receivers' RTCP cannot be sent to the multicast group.
   To support RTCP, an RTP extension for SSM [RFC5760] was defined to
   use unicast transmission to send RTCP from the receivers to one or
   more Feedback Targets (FT).
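
   As an informal illustration of the flows in Figure 3, the following
   Python sketch models media fan-out through the Distribution Source
   and unicast RTCP toward a Feedback Target.  It is a toy model under
   stated assumptions, not part of [RFC5760]: the class and method names
   are hypothetical, no network I/O takes place, and the Feedback Target
   is assumed to be co-located with the Distribution Source.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """A receiver R(i) subscribed to the SSM group (hypothetical model)."""
    name: str
    received: list = field(default_factory=list)

    def on_multicast(self, packet: str) -> None:
        self.received.append(packet)


@dataclass
class DistributionSource:
    """The only entity allowed to send to the SSM group.

    Assumption: the Feedback Target (FT) is co-located with it.
    """
    group: list = field(default_factory=list)     # subscribed receivers
    feedback: list = field(default_factory=list)  # RTCP collected by the FT

    def from_media_sender(self, sender: str, media: str) -> None:
        # Media senders reach the group only via the distribution source.
        for receiver in self.group:
            receiver.on_multicast(f"{sender}:{media}")

    def feedback_target(self, receiver: Receiver, rtcp: str) -> None:
        # Receivers cannot send to the group; RTCP arrives here by unicast.
        self.feedback.append((receiver.name, rtcp))


def demo():
    r1, r2 = Receiver("R(1)"), Receiver("R(2)")
    ds = DistributionSource(group=[r1, r2])
    ds.from_media_sender("Sender 1", "frame-0")
    ds.feedback_target(r2, "RR")
    return r1.received, r2.received, ds.feedback
```

   In this sketch, every receiver sees the forwarded media, while the
   Receiver Report from R(2) is visible only at the Feedback Target
   until the Distribution Source chooses to redistribute it.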

   The RTP extension for SSM deals with how feedback, both general
   reception information and specific feedback events, is handled.  The
   general property of multicast, that everyone will receive what the
   distribution source sends, needs to be accounted for.

   The result of this is some common behaviours for RTP multicast:

   1.  Multicast applications often use a group of RTP sessions, not
       one.  Each endpoint will need to be a member of a number of RTP
       sessions in order to perform well.

   2.  Within each RTP session, the number of media sinks is likely to
       be much larger than the number of RTP sources.

   3.  Multicast applications need signalling functions to identify the
       relationships between RTP sessions.

   4.  Multicast applications need signalling functions to identify the
       relationships between SSRCs in different RTP sessions.

   All multicast configurations share a signalling requirement: all of
   the participants will need to have the same RTP and payload type
   configuration.  Otherwise, A could, for example, be using payload
   type 97 for the H.264 video codec while B interprets it as MPEG-2.

   Security solutions for this type of group communication are also
   challenging.  First of all, the key management and the security
   protocol must support group communication.  Source authentication
   becomes more difficult and requires special solutions.  For more
   discussion on this, please review Options for Securing RTP Sessions
   [I-D.ietf-avtcore-rtp-security-options].

3.2.3.  SSM with Local Unicast Resources

   [RFC6285] "Unicast-Based Rapid Acquisition of Multicast RTP Sessions"
   results in additional extensions to the SSM topology.  These should
   be described.

3.3.  Point to Multipoint Using Mesh

   Shortcut name: Topo-Mesh

       +---+      +---+
       | A |<---->| B |
       +---+      +---+
         ^         ^
          \       /
           \     /
            v   v
            +---+
            | C |
            +---+

                 Figure 4: Point to Multi-Point using Mesh

   Based on the RTP session definition, it is clearly possible to have a
   joint RTP session over multiple unicast transport flows, such as the
   three-endpoint joint session above.  In this case, A needs to send
   its media streams and RTCP packets to both B and C over their
   respective transport flows.  As long as all participants do the same,
   everyone will have a joint view of the RTP session.

   This doesn't create any additional requirements beyond the need to
   have multiple transport flows associated with a single RTP session.
   Note that an endpoint may use a single local port to receive all
   these transport flows, or it might have separate local reception
   ports for each of the endpoints.

   There exists an alternative structure for establishing the above
   topology which uses independent RTP sessions between each pair of
   peers, i.e., three different RTP sessions.  Unless the streams are
   adapted independently, an endpoint could send the same RTP media
   stream in both of its RTP sessions.  The two structures differ in
   their RTCP behaviour: a joint session has one common RTCP bandwidth
   pool rather than three independent ones, and its RTCP reports make
   each peer aware of how the third leg is performing.

3.4.  Point to Multipoint Using the RFC 3550 Translator

   Shortcut name: Topo-Translator

   Two main categories of Translators can be distinguished: Transport
   Translators and Media Translators.  Both Translator types share
   common attributes that separate them from Mixers.  For each media
   stream that the Translator receives, it generates an individual
   stream in the other domain.
   A Translator always keeps the SSRC for a stream across the
   translation, whereas a Mixer can select a media stream, or send them
   out mixed, always under its own SSRC, using the CSRC field to
   indicate the source(s) of the content.

   As specified in Section 7.1 of [RFC3550], the SSRC space is common
   for all participants in the session, independent of which side of the
   Translator they are on.  Therefore, it is the responsibility of the
   participants to run SSRC collision detection, and the SSRC is a field
   the Translator should not change.

   A Translator commonly does not use an SSRC of its own, and is not
   visible as an active participant in the session.  One exception is
   when a Translator acts as a quality monitor that sends RTCP reports
   and therefore is required to have an SSRC.  Another example is the
   case when a Translator is prepared to use RTCP feedback messages.
   This may, for example, occur when it suffers packet loss of important
   video packets and wants to trigger repair by the media sender, by
   sending feedback messages.  This can be done using the SSRC of the
   target for the Translator, but this requires translation of the
   target's RTCP reports to make them consistent, so it is likely
   simpler to expose an additional SSRC in the session.

   In general, a Translator implementation should consider which RTCP
   feedback messages or codec-control messages it needs to understand in
   relation to the functionality of the Translator itself.  This is
   completely in line with the requirement to also translate RTCP
   messages between the domains.

   The RTCP translation process can be trivial, for example, when
   Transport Translators just need to adjust IP addresses and transport
   protocol ports, or it can be quite complex, as in the case of Media
   Translators.  See Section 7.2 of [RFC3550].

3.4.1.  Relay - Transport Translator

   Transport Translators (Topo-Trn-Translator) do not modify the media
   stream itself, but are concerned with transport parameters.
   Transport parameters, in the sense of this section, comprise the
   transport addresses (to bridge different domains) and the media
   packetization to allow other transport protocols to be interconnected
   to a session (in gateways).  Of the Transport Translators, this memo
   is primarily interested in those that use RTP on both sides, and this
   is assumed henceforth.  Translators that bridge between different
   protocol worlds need to be concerned about the mapping of the SSRC/
   CSRC (Contributing Source) concept to the non-RTP protocol.  When
   designing a Translator to a non-RTP-based media transport, one
   crucial factor lies in how to handle different sources and their
   identities.  This problem space is not discussed henceforth.

               +-----+
    +---+     /       \     +------------+      +---+
    | A |<---/         \    |            |<---->| B |
    +---+   /   Multi-  \   |            |      +---+
           +    Cast     +->| Translator |
    +---+   \  Network  /   |            |      +---+
    | C |<---\         /    |            |<---->| D |
    +---+     \       /     +------------+      +---+
               +-----+

               Figure 5: Point to Multipoint Using Multicast

   Figure 5 depicts an example of a Transport Translator performing at
   least IP address translation.  It allows the (non-multicast-capable)
   participants B and D to take part in an any source multicast session
   by having the Translator forward their unicast traffic to the
   multicast addresses in use, and vice versa.  It must also forward B's
   traffic to D, and vice versa, to provide each of B and D with a
   complete view of the session.

   A point to point communication can also end up in a situation where
   the peer an endpoint is communicating with needs basic Transport
   Translator functions.
   This includes NAT traversal by pinning the media path to a relay in a
   public address domain, network topologies in which the media flow is
   required to pass through a particular point by employing relaying,
   and preserving privacy by hiding each peer's transport addresses from
   the other party.

       +---+        +---+         +---+
       | A |<------>| T |<------->| B |
       +---+        +---+         +---+

                      Point to Point with Translator

   This type of very basic relay service should, in most cases, need no
   RTP functionality.  One might therefore think that it does not need
   to be included in this document.  However, the network-level
   addressing and the RTP-identifier view of the RTP session and of who
   the peer is no longer match as they do in a plain PtP unicast
   scenario, so this topology can raise additional requirements.

       +---+      +------------+      +---+
       | A |<---->|            |<---->| B |
       +---+      |            |      +---+
                  | Translator |
       +---+      |            |      +---+
       | C |<---->|            |<---->| D |
       +---+      +------------+      +---+

         Figure 6: RTP Translator (Relay) with Only Unicast Paths

   Another Translator scenario is depicted in Figure 6.  Herein, the
   Translator connects multiple users of a conference through unicast.
   This can be implemented using a very simple Transport Translator,
   which in this document is called a relay.  The relay forwards all
   traffic it receives, both RTP and RTCP, to all other participants.
   In doing so, a multicast network is emulated without relying on a
   multicast-capable network infrastructure.

   For RTCP feedback, this results in considerations similar to those
   that arise for the ASM RTP topology.  It also imposes some special
   signalling requirements; for example, a common configuration of RTP
   payload types is required.

3.4.2.  Media Translator

   Media Translators (Topo-Media-Translator), in contrast, modify the
   media stream itself.  This process is commonly known as transcoding.
   The modification of the media stream can be as small as removing
   parts of the stream, and it can go all the way to a full transcoding
   (down to the sample level or equivalent) utilizing a different media
   codec.  Media Translators are commonly used to connect entities
   without a common interoperability point.

   Stand-alone Media Translators are rare.  Most commonly, a combination
   of Transport and Media Translators is used to translate both the
   media stream and the transport aspects of a stream between two
   transport domains (or clouds).

   If B in Figure 5 were behind a limited network path, the Translator
   may perform media transcoding to allow the traffic received from the
   other participants to reach B without overloading the path.

   When, in the example depicted in Figure 5, the Translator acts only
   as a Transport Translator, then the RTCP traffic can simply be
   forwarded, similar to the media traffic.  However, when media
   translation occurs, the Translator's task becomes substantially more
   complex, even with respect to the RTCP traffic.  In this case, the
   Translator needs to rewrite B's RTCP Receiver Reports before
   forwarding them to D and the multicast network.  The rewriting is
   needed as the stream received by B is not the same stream as the
   other participants receive.  For example, the number of packets
   transmitted to B may be lower than what D receives, due to the
   different media format and data rate.  Therefore, if the Receiver
   Reports were forwarded without changes, the extended highest sequence
   number would indicate that B were substantially behind in reception,
   while it most likely would not be.  Therefore, the Translator must
   translate that number to a corresponding sequence number for the
   stream the Translator received.  Similar arguments can be made for
   most other fields in the RTCP Receiver Reports.
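
   The sequence number translation described above can be sketched as
   follows.  This is a minimal illustration, assuming the Translator
   records, for every packet it sends toward B, the highest extended
   sequence number of the original stream that contributed to it.  The
   class name SeqMap and its methods are hypothetical, and sequence
   numbers are treated as already-extended integers, so 16-bit
   wrap-around handling is out of scope.

```python
import bisect


class SeqMap:
    """Maps outgoing (translated) extended sequence numbers to incoming ones."""

    def __init__(self):
        self._out = []  # outgoing extended sequence numbers, ascending
        self._in = []   # highest incoming extended seq consumed for each

    def record(self, out_seq: int, in_seq: int) -> None:
        # Called once per packet the Translator sends toward the receiver.
        self._out.append(out_seq)
        self._in.append(in_seq)

    def translate(self, reported_out_seq: int) -> int:
        # Find the last outgoing packet at or below the reported value
        # and return the incoming sequence number it corresponds to.
        i = bisect.bisect_right(self._out, reported_out_seq) - 1
        if i < 0:
            raise ValueError("report precedes any recorded packet")
        return self._in[i]


# Example: the Translator merges three incoming packets into each
# outgoing packet, so B's numbers advance three times more slowly.
seq_map = SeqMap()
seq_map.record(1000, 5002)
seq_map.record(1001, 5005)
seq_map.record(1002, 5008)
rewritten = seq_map.translate(1001)
```

   With this mapping, a Receiver Report from B carrying an extended
   highest sequence number of 1001 would be rewritten to 5005 before
   being forwarded, which is the value the original sender can relate to
   its own stream.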

   A Media Translator may in some cases act on behalf of the "real"
   source and respond to RTCP feedback messages.  This may occur, for
   example, when a receiver requests a bandwidth reduction, and the
   Media Translator has not detected any congestion or other reasons for
   bandwidth reduction between the media source and itself.  In that
   case, it is sensible that the Media Translator reacts to the codec
   control messages itself, for example, by transcoding to a lower media
   rate.  If it were not reacting, the media quality in the media
   sender's domain may suffer, as a result of the media sender adjusting
   its media rate (and quality) according to the needs of the slow
   past-Translator endpoint, at the expense of the rate and quality of
   all other session participants.

   A variant of Translator behaviour worth pointing out is the one
   depicted in Figure 7, in which an endpoint A sends a media flow to B.
   On the path there is a device T that, on A's behalf, does something
   with the media streams, for example, adds an RTP session with FEC
   information for A's media streams.  T will in this case need to bind
   the new FEC streams to A's media stream, for example, by using the
   same CNAME as A.

       +------+        +------+         +------+
       |      |        |      |         |      |
       |  A   |------->|  T   |-------->|  B   |
       |      |        |      |---FEC-->|      |
       +------+        +------+         +------+

              Figure 7: When De-composition is a Translator

   This type of functionality, where T does something with the media
   stream on behalf of A, is clearly covered under the Media Translator
   definition.

3.5.  Point to Multipoint Using the RFC 3550 Mixer Model

   Shortcut name: Topo-Mixer

   A Mixer is a middlebox that aggregates multiple RTP streams, which
   are part of a session, by manipulating the media data and generating
   a new RTP stream.  One common application for a Mixer is to allow a
   participant to receive a session with a reduced amount of resources.

               +-----+
    +---+     /       \     +-----------+      +---+
    | A |<---/         \    |           |<---->| B |
    +---+   /   Multi-  \   |           |      +---+
           +    Cast     +->|   Mixer   |
    +---+   \  Network  /   |           |      +---+
    | C |<---\         /    |           |<---->| D |
    +---+     \       /     +-----------+      +---+
               +-----+

       Figure 8: Point to Multipoint Using the RFC 3550 Mixer Model

   A Mixer can be viewed as a device terminating the media streams
   received from other session participants.  Using the media data from
   the received media streams, a Mixer generates a media stream that is
   sent to the session participant.

   The content that the Mixer provides is the mixed aggregate of what
   the Mixer receives over the PtP or PtM paths, which are part of the
   same conference session.

   The Mixer is the content source, as it mixes the content (often in
   the uncompressed domain) and then encodes it for transmission to a
   participant.  The CSRC Count (CC) and CSRC fields in the RTP header
   are used to indicate the contributors to the newly generated stream.
   The SSRCs of the to-be-mixed streams on the Mixer input appear as the
   CSRCs at the Mixer output.  That output stream uses a unique SSRC
   that identifies the Mixer's stream.  The CSRC should be forwarded
   between the two domains to allow for loop detection and
   identification of sources that are part of the global session.  Note
   that Section 7.1 of RFC 3550 requires the SSRC space to be shared
   between domains for these reasons.

   The Mixer is responsible for generating RTCP packets in accordance
   with its role.  It is a receiver and should therefore send reception
   reports for the media streams it receives.  In its role as a media
   sender, it should also generate Sender Reports for those media
   streams sent.  As specified in Section 7.3 of RFC 3550, a Mixer must
   not forward RTCP unaltered between the two domains.
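
   To make the use of the CC and CSRC fields concrete, the following
   Python sketch builds the RTP fixed header of a mixed packet as laid
   out in Section 5.1 of RFC 3550: the Mixer's own SSRC goes into the
   SSRC field and the SSRCs of the mixed input streams become the CSRC
   list.  The function names and the example SSRC values are
   hypothetical; padding, header extensions, and the marker bit are
   assumed to be unused.

```python
import struct


def mixer_rtp_header(mixer_ssrc, input_ssrcs, seq, timestamp, payload_type):
    """Build an RTP header whose CSRC list names the mixed input SSRCs."""
    csrcs = list(input_ssrcs)[:15]   # CC is a 4-bit field: at most 15 CSRCs
    first = (2 << 6) | len(csrcs)    # V=2, P=0, X=0, CC
    second = payload_type & 0x7F     # M=0, 7-bit payload type
    header = struct.pack("!BBHII", first, second, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, mixer_ssrc)
    return header + b"".join(struct.pack("!I", c) for c in csrcs)


def parse_csrcs(header):
    """Return (ssrc, [csrc, ...]) from an RTP header built as above."""
    cc = header[0] & 0x0F
    ssrc = struct.unpack("!I", header[8:12])[0]
    csrcs = list(struct.unpack(f"!{cc}I", header[12:12 + 4 * cc]))
    return ssrc, csrcs


# Example: a Mixer with SSRC 0xAAAA0001 mixing two input streams.
hdr = mixer_rtp_header(0xAAAA0001, [0x11111111, 0x22222222],
                       seq=7, timestamp=160, payload_type=0)
```

   A receiver parsing this header sees one stream, identified by the
   Mixer's SSRC, and can use the two CSRC entries both to learn who
   contributed to the mix and to detect loops.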

   The Mixer depicted in Figure 8 is involved in three domains that need
   to be separated: the any source multicast network, participant B, and
   participant D.  The Mixer produces different mixed streams to B and
   D, as the one to B may contain content received from D, and vice
   versa.  However, the Mixer may only need one SSRC per media type in
   each domain where it is the receiving entity and transmitter of mixed
   content.

   In the multicast domain, a Mixer still needs to provide a mixed view
   of the other domains.  This makes the Mixer simpler to implement and
   avoids any issues with advanced RTCP handling or loop detection,
   which would be problematic if the Mixer were providing non-symmetric
   behavior.  Please see Section 3.10 for more discussion on this topic.
   However, the mixing operation in each domain could potentially be
   different.

   A Mixer is responsible for receiving RTCP feedback messages and
   handling them appropriately.  The definition of "appropriate" depends
   on the message itself and the context.  In some cases, the reception
   of a codec-control message may result in the generation and
   transmission of RTCP feedback messages by the Mixer to the
   participants in the other domain.  In other cases, a message is
   handled by the Mixer itself and therefore not forwarded to any other
   domain.

   When replacing the multicast network in Figure 8 (to the left of the
   Mixer) with individual unicast paths as depicted in Figure 9, the
   Mixer model is very similar to the one discussed in Section 3.8
   below.  Please see the discussion in Section 3.8 about the
   differences between these two models.
           +---+      +------------+      +---+
           | A |<---->|            |<---->| B |
           +---+      |            |      +---+
                      |   Mixer    |
           +---+      |            |      +---+
           | C |<---->|            |<---->| D |
           +---+      +------------+      +---+

              Figure 9: RTP Mixer with Only Unicast Paths

   Let us now discuss in more detail the different mixing operations
   that a mixer can perform and how they affect RTP and RTCP.

3.5.1. Media Mixing

   The media mixing mixer is likely the one that most people think of
   when they hear the term mixer. Its basic pattern of operation is
   that it receives the RTP media streams of the different
   participants, selects which of them are to be included in a
   media-domain mix, and then creates a single outgoing stream from
   this mix.

   The most commonly deployed media mixer is probably the audio mixer,
   used in voice conferencing, where the output consists of some mixture
   of all the input streams; this needs minimal signalling to be
   successful. Audio mixing is straightforward and commonly feasible
   for a number of participants. Assume that you want to mix N streams
   from different participants. The mixer then needs to perform N
   decodings, and it needs to produce N or N+1 mixes. Different mixes
   are needed so that each contributing source gets a mix that does not
   contain its own stream, as this would result in an echo. When N is
   lower than the total number of participants, one may produce an
   additional mix of all N streams for the group of participants that
   are currently not included in the mix, thus N+1 mixes. These audio
   streams are then encoded again, RTP packetised, and sent out.

   Video can't really be "mixed" to produce something particularly
   useful for the users; however, creating a composition out of the
   contributed video streams can be done.
   In fact, it can be done in a number of ways: tiling the different
   streams to create a chessboard is one; selecting someone as more
   important and showing them large, with a number of other sources as
   smaller overlays, is another. Here, too, one commonly needs to
   produce a number of different compositions so that the contributing
   participants do not need to see themselves. The mixer then
   re-encodes the created video stream, RTP packetises it, and sends it
   out.

   The problem with media mixing is twofold. First, it consumes a large
   amount of media processing and encoding resources. Second, decoding
   and re-encoding the RTP media stream degrades quality. Its advantage
   is that it is quite simple for the clients to handle, as they don't
   need to handle local mixing and composition.

      +-A---------+          +-MIXER----------------------+
      | +-RTP1----|          |-RTP1------+        +-----+ |
      | | +-Audio-|          |-Audio---+ | +---+  |     | |
      | | |    AA1|--------->|---------+-+-|DEC|->|     | |
      | | |       |<---------|MA1 <----+ | +---+  |     | |
      | | |       |          |(BA1+CA1)|\| +---+  |     | |
      | | +-------|          |---------+ +-|ENC|<-| B+C | |
      | +---------|          |-----------+ +---+  |     | |
      +-----------+          |                    |     | |
                             |                    |  M  | |
      +-B---------+          |                    |  E  | |
      | +-RTP2----|          |-RTP2------+        |  D  | |
      | | +-Audio-|          |-Audio---+ | +---+  |  I  | |
      | | |    BA1|--------->|---------+-+-|DEC|->|  A  | |
      | | |       |<---------|MA2 <----+ | +---+  |     | |
      | | +-------|          |(AA1+CA1)|\| +---+  |     | |
      | +---------|          |---------+ +-|ENC|<-| A+C | |
      +-----------+          |-----------+ +---+  |     | |
                             |                    |  M  | |
      +-C---------+          |                    |  I  | |
      | +-RTP3----|          |-RTP3------+        |  X  | |
      | | +-Audio-|          |-Audio---+ | +---+  |  E  | |
      | | |    CA1|--------->|---------+-+-|DEC|->|  R  | |
      | | |       |<---------|MA3 <----+ | +---+  |     | |
      | | +-------|          |(AA1+BA1)|\| +---+  |     | |
      | +---------|          |---------+ +-|ENC|<-| A+B | |
      +-----------+          |-----------+ +---+  +-----+ |
                             +----------------------------+

          Figure 10: Session and SSRC details for Media Mixer

   From an RTP perspective, media mixing can be very straightforward,
   as can be seen in Figure 10. The mixer presents one SSRC towards the
   peer client, e.g., MA1 to Peer A, which is the media mix of the other
   participants. As each peer receives a different version produced by
   the mixer, there is no actual relation between the different RTP
   sessions in the actual media or the transport-level information.
   There is, however, one connection between RTP1-RTP3 in this figure.
   It has to do with the SSRC space and the identity information. When
   A receives the MA1 stream, which is a combination of the BA1 and CA1
   streams, the mixer may include CSRC information in the MA1 stream to
   identify the contributing sources BA1 and CA1.

   The CSRC has, in turn, utility in RTP extensions, like the
   Mixer-to-Client audio level RTP header extension [RFC6465]. If the
   SSRCs from the endpoint-to-mixer legs are used as CSRCs in another
   RTP session, then RTP1, RTP2, and RTP3 become one joint session, as
   they have a common SSRC space. At this stage, the mixer also needs
   to consider which RTCP information it needs to expose in the
   different legs. For the above situation, commonly nothing more than
   the Source Description (SDES) information and RTCP BYE for the CSRCs
   needs to be exposed. The main goal would be to enable the correct
   binding against the application logic and other information sources.
   This also enables loop detection in the RTP session.

3.5.2. Media Switching

   An RTP Mixer based on media switching avoids the media decoding and
   encoding cycle in the mixer, but not the decryption and re-encryption
   cycle, as it rewrites RTP headers. This both reduces the amount of
   computational resources needed in the mixer and increases the media
   quality per transmitted bit.
   This is achieved by letting the mixer have a number of SSRCs that
   represent conceptual or functional streams that the mixer produces.
   These streams are created by selecting media from one of the RTP
   media streams received by the mixer and forwarding the media using
   the mixer's own SSRCs. The mixer can then switch between available
   sources if that is required by the concept for the source, like
   currently active speaker.

   To achieve a coherent RTP media stream from the mixer's SSRC, the
   mixer is forced to rewrite the incoming RTP packet's header. First,
   the SSRC field must be set to the value of the Mixer's SSRC.
   Second, the sequence number must be the next in the sequence of
   outgoing packets it has sent. Third, the RTP timestamp value needs
   to be adjusted using an offset that changes each time the mixer
   switches media source. Finally, depending on the negotiation, the
   RTP payload type value representing this particular RTP payload
   configuration may have to be changed if the different
   endpoint-to-mixer legs have not arrived at the same numbering for a
   given configuration. This also requires that the different
   end-points support a common set of codecs; otherwise, media
   transcoding for codec compatibility is still required.

   Let us consider the operation of a media switching mixer that
   supports a video conference with six participants (A-F), where the
   two latest speakers in the conference are shown to each participant.
   Thus, the mixer has two SSRCs sending video to each peer.
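   The four header rewriting steps for a media switching mixer can be
   sketched as follows. This is an illustrative sketch only: the class
   SwitchedStream, its fields, and the fixed timestamp gap applied at
   a switch (3000 ticks, roughly one frame at 30 fps with a 90 kHz
   clock) are assumptions made for the example. A real mixer would
   also pick random initial sequence number and timestamp values.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class SwitchedStream:
    """State for one outgoing mixer SSRC (hypothetical structure)."""
    mixer_ssrc: int
    pt_map: Dict[int, int] = field(default_factory=dict)  # in PT -> out PT
    ts_gap: int = 3000            # assumed timestamp step across a switch
    next_seq: int = 0
    ts_offset: int = 0
    last_out_ts: int = 0
    current_source: Optional[int] = None

    def forward(self, src_ssrc: int, seq: int, timestamp: int,
                payload_type: int) -> Tuple[int, int, int, int]:
        """Return (ssrc, seq, timestamp, pt) for the rewritten header.

        The incoming sequence number is discarded; the outgoing stream
        keeps its own consecutive numbering.
        """
        if src_ssrc != self.current_source:
            if self.current_source is not None:
                # Source switch: pick a new offset so that the outgoing
                # timestamps continue from the last packet sent.
                self.ts_offset = (self.last_out_ts + self.ts_gap
                                  - timestamp) & 0xFFFFFFFF
            self.current_source = src_ssrc
        out_ts = (timestamp + self.ts_offset) & 0xFFFFFFFF
        out_seq = self.next_seq
        self.next_seq = (self.next_seq + 1) & 0xFFFF
        self.last_out_ts = out_ts
        out_pt = self.pt_map.get(payload_type, payload_type)
        return (self.mixer_ssrc, out_seq, out_ts, out_pt)


stream = SwitchedStream(mixer_ssrc=0x1234, pt_map={96: 100})
first = stream.forward(0xA, 500, 90000, 96)
second = stream.forward(0xA, 501, 93000, 96)
switched = stream.forward(0xB, 7000, 10, 97)   # new source selected
```

   Note that rewriting these fields changes data covered by SRTP
   authentication, which is why such a mixer cannot avoid the
   decryption and re-encryption cycle.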
      +-A---------+             +-MIXER----------------------+
      | +-RTP1----|             |-RTP1------+        +-----+ |
      | | +-Video-|             |-Video---+ |        |     | |
      | | |    AV1|------------>|---------+-+------->|  S  | |
      | | |       |<------------|MV1 <----+-+-BV1----|  W  | |
      | | |       |<------------|MV2 <----+-+-EV1----|  I  | |
      | | +-------|             |---------+ |        |  T  | |
      | +---------|             |-----------+        |  C  | |
      +-----------+             |                    |  H  | |
                                |                    |     | |
      +-B---------+             |                    |  M  | |
      | +-RTP2----|             |-RTP2------+        |  A  | |
      | | +-Video-|             |-Video---+ |        |  T  | |
      | | |    BV1|------------>|---------+-+------->|  R  | |
      | | |       |<------------|MV3 <----+-+-AV1----|  I  | |
      | | |       |<------------|MV4 <----+-+-EV1----|  X  | |
      | | +-------|             |---------+ |        |     | |
      | +---------|             |-----------+        |     | |
      +-----------+             |                    |     | |
            :                   :                    :     :
            :                   :                    :     :
      +-F---------+             |                    |     | |
      | +-RTP6----|             |-RTP6------+        |     | |
      | | +-Video-|             |-Video---+ |        |     | |
      | | |    FV1|------------>|---------+-+------->|     | |
      | | |       |<------------|MV11 <---+-+-AV1----|     | |
      | | |       |<------------|MV12 <---+-+-EV1----|     | |
      | | +-------|             |---------+ |        |     | |
      | +---------|             |-----------+        +-----+ |
      +-----------+             +----------------------------+

                 Figure 11: Media Switching RTP Mixer

   The Media Switching RTP mixer can, similar to the Media Mixing one,
   reduce the bit-rate needed towards the different peers by selecting
   and switching in a subset of RTP media streams out of the ones it
   receives from the conference participants.

   To ensure that a media receiver can correctly decode the RTP media
   stream after a switch, state-saving codecs must start from the
   default state at the point of switching. Thus, one common tool for
   video is to request that the encoder creates an intra picture,
   something that isn't dependent on earlier state. This can be done
   using the Full Intra Request (FIR) RTCP codec control message
   [RFC5104].
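   For illustration, a minimal sketch of the FIR feedback message
   layout from RFC 5104 (payload-specific feedback, PT=206, FMT=4) with
   a single FCI entry is shown below. The function name build_fir is
   hypothetical. The sketch builds only the FIR block itself; a real
   sender would place it inside an RTCP packet that follows the
   applicable AVPF transmission rules.

```python
import struct


def build_fir(sender_ssrc: int, target_ssrc: int, command_seq: int) -> bytes:
    """Build one FIR feedback message with a single FCI entry.

    Hypothetical helper. target_ssrc is the media sender asked to
    produce an intra picture; command_seq is the 8-bit command
    sequence number, incremented for each new request.
    """
    fmt, packet_type = 4, 206          # FIR inside a PSFB packet
    length = 4                         # 20 bytes = 5 words, minus one
    byte0 = (2 << 6) | fmt             # V=2, P=0, FMT
    header = struct.pack("!BBH", byte0, packet_type, length)
    # The "SSRC of media source" field is not used by FIR and is set
    # to zero; the actual target is carried in the FCI entry.
    ssrcs = struct.pack("!II", sender_ssrc, 0)
    fci = struct.pack("!IB3x", target_ssrc, command_seq & 0xFF)
    return header + ssrcs + fci


# Example: the mixer (SSRC 1) asks source 2 for a decoder refresh
# point before switching that source into an outgoing stream.
fir = build_fir(sender_ssrc=1, target_ssrc=2, command_seq=7)
```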
   Also in this type of mixer, one could consider terminating the RTP
   sessions fully between the different end-point and mixer legs. The
   same arguments and considerations as discussed in Section 3.8 apply
   here.

3.6. Source Projecting Middlebox

   Another method for handling media in the RTP mixer is to project all
   potential RTP sources (SSRCs) into a per-end-point, independent RTP
   session. The mixer can then select which of the potential sources
   are currently actively transmitting media in that session, even
   though the mixer receives media from that end-point in another RTP
   session. This is similar to the media switching Mixer but has some
   important differences in RTP details.

      +-A---------+             +-MIXER---------------------+
      | +-RTP1----|             |-RTP1------+       +-----+ |
      | | +-Video-|             |-Video---+ |       |     | |
      | | |    AV1|------------>|---------+-+------>|     | |
      | | |       |<------------|BV1 <----+-+-------|  S  | |
      | | |       |<------------|CV1 <----+-+-------|  W  | |
      | | |       |<------------|DV1 <----+-+-------|  I  | |
      | | |       |<------------|EV1 <----+-+-------|  T  | |
      | | |       |<------------|FV1 <----+-+-------|  C  | |
      | | +-------|             |---------+ |       |  H  | |
      | +---------|             |-----------+       |     | |
      +-----------+             |                   |  M  | |
                                |                   |  A  | |
      +-B---------+             |                   |  T  | |
      | +-RTP2----|             |-RTP2------+       |  R  | |
      | | +-Video-|             |-Video---+ |       |  I  | |
      | | |    BV1|------------>|---------+-+------>|  X  | |
      | | |       |<------------|AV1 <----+-+-------|     | |
      | | |       |<------------|CV1 <----+-+-------|     | |
      | | |       |    : : :    |: : : : : : : : : :|     | |
      | | |       |<------------|FV1 <----+-+-------|     | |
      | | +-------|             |---------+ |       |     | |
      | +---------|             |-----------+       |     | |
      +-----------+             |                   |     | |
            :                   :                   :     :
            :                   :                   :     :
      +-F---------+             |                   |     | |
      | +-RTP6----|             |-RTP6------+       |     | |
      | | +-Video-|             |-Video---+ |       |     | |
      | | |    FV1|------------>|---------+-+------>|     | |
      | | |       |<------------|AV1 <----+-+-------|     | |
      | | |       |    : : :    |: : : : : : : : : :|     | |
      | | |       |<------------|EV1 <----+-+-------|     | |
      | | +-------|             |---------+ |       |     | |
      | +---------|             |-----------+       +-----+ |
      +-----------+             +---------------------------+

                  Figure 12: Media Projecting Mixer

   In the six-participant conference depicted above (Figure 12), one
   can see that end-point A will in this case be aware of five incoming
   SSRCs, BV1-FV1. If this mixer intends to have the same behaviour as
   in Section 3.5.2, where the mixer provides the end-points with the
   two latest speaking end-points, then only two out of these five
   SSRCs will concurrently transmit media to A. As the mixer selects
   which sources in the different RTP sessions transmit media to the
   end-points, each RTP media stream will require some rewriting when
   being projected from one session into another. The main thing is
   that the sequence number will need to be consecutively incremented
   based on the packets actually being transmitted in each RTP session.
   Thus, the RTP sequence number offset will change each time a source
   is turned on in an RTP session.

   As the RTP sessions are independent, the SSRC numbers used can also
   be handled independently, thus working around any SSRC collisions by
   having remapping tables between the RTP sessions. As a result, each
   endpoint may have a different view of the application usage of a
   particular SSRC. Thus, the application must not use SSRCs as
   references to RTP media streams when communicating with other peers
   directly.

   The mixer will also be responsible for acting on any RTCP codec
   control requests coming from an end-point and for deciding whether
   it can act on them locally or needs to translate the request into
   the RTP session that contains the media source. Both end-points and
   the mixer will need to implement conference-related codec control
   functionalities to provide a good experience.
   Examples are Full Intra Request, used to request that the media
   source provide switching points between the sources, and Temporary
   Maximum Media Bit-rate Request (TMMBR), which enables the mixer to
   aggregate congestion control responses towards the media source and
   have it adjust its bit-rate in case the limitation is not in the
   source-to-mixer link.

   This version of the mixer also puts different requirements on the
   end-point when it comes to decoder instances and handling of the RTP
   media streams providing media. As each projected SSRC can provide
   media at any time, the end-point either needs to handle having that
   many allocated decoder instances or needs efficient switching of
   decoder contexts in a more limited set of actual decoder instances
   to cope with the switches. The WebRTC application also gets more
   responsibility to update how the media provided is to be presented
   to the user.

   Note that this could potentially be seen as a media translator that
   includes on/off logic as part of its media translation. The main
   difference would be a common global SSRC space in the case of the
   Media Translator and the mapped one used in the above.

3.7. Point to Multipoint Using Video Switching MCUs

   Shortcut name: Topo-Video-switch-MCU

           +---+      +------------+      +---+
           | A |------| Multipoint |------| B |
           +---+      |  Control   |      +---+
                      |   Unit     |
           +---+      |   (MCU)    |      +---+
           | C |------|            |------| D |
           +---+      +------------+      +---+

       Figure 13: Point to Multipoint Using a Video Switching MCU

   This PtM topology is still deployed today, although the
   RTCP-terminating MCUs, as discussed in the next section, are perhaps
   more common. This topology, as well as the following one, reflects
   today's lack of wide availability of IP multicast technologies, as
   well as the simplicity of content switching when compared to content
   mixing.
964 The technology is commonly implemented in what is known as "Video 965 Switching MCUs". 967 A video switching MCU forwards to a participant a single media 968 stream, selected from the available streams. The criteria for 969 selection are often based on voice activity in the audio-visual 970 conference, but other conference management mechanisms (like 971 presentation mode or explicit floor control) are known to exist as 972 well. 974 The video switching MCU may also perform media translation to modify 975 the content in bit-rate, encoding, or resolution. However, it still 976 may indicate the original sender of the content through the SSRC. In 977 this case, the values of the CC and CSRC fields are retained. 979 If not terminating RTP, the RTCP Sender Reports are forwarded for the 980 currently selected sender. All RTCP Receiver Reports are freely 981 forwarded between the participants. In addition, the MCU may also 982 originate RTCP control traffic in order to control the session and/or 983 report on status from its viewpoint. 985 The video switching MCU has most of the attributes of a Translator. 986 However, its stream selection is a mixing behavior. This behavior 987 has some RTP and RTCP issues associated with it. The suppression of 988 all but one media stream results in most participants seeing only a 989 subset of the sent media streams at any given time, often a single 990 stream per conference. Therefore, RTCP Receiver Reports only report 991 on these streams. Consequently, the media senders that are not 992 currently forwarded receive a view of the session that indicates 993 their media streams disappear somewhere en route. This makes the use 994 of RTCP for congestion control, or any type of quality reporting, 995 very problematic. 997 To avoid the aforementioned issues, the MCU needs to implement two 998 features. 
First, it needs to act as a Mixer (see Section 3.5) and 999 forward the selected media stream under its own SSRC and with the 1000 appropriate CSRC values. Second, the MCU needs to modify the RTCP 1001 RRs it forwards between the domains. As a result, it is RECOMMENDED 1002 that one implement a centralized video switching conference using a 1003 Mixer according to RFC 3550, instead of the shortcut implementation 1004 described here. 1006 3.8. Point to Multipoint Using RTCP-Terminating MCU 1008 Shortcut name: Topo-RTCP-terminating-MCU 1010 +---+ +------------+ +---+ 1011 | A |<---->| Multipoint |<---->| B | 1012 +---+ | Control | +---+ 1013 | Unit | 1014 +---+ | (MCU) | +---+ 1015 | C |<---->| |<---->| D | 1016 +---+ +------------+ +---+ 1018 Figure 14: Point to Multipoint Using Content Modifying MCUs 1020 In this PtM scenario, each participant runs an RTP point-to-point 1021 session between itself and the MCU. This is a very commonly deployed 1022 topology in multipoint video conferencing. The content that the MCU 1023 provides to each participant is either: 1025 a. a selection of the content received from the other participants, 1026 or 1028 b. the mixed aggregate of what the MCU receives from the other PtP 1029 paths, which are part of the same conference session. 1031 In case a), the MCU may modify the content in bit-rate, encoding, or 1032 resolution. No explicit RTP mechanism is used to establish the 1033 relationship between the original media sender and the version the 1034 MCU sends. In other words, the outgoing sessions typically use a 1035 different SSRC, and may well use a different payload type (PT), even 1036 if this different PT happens to be mapped to the same media type. 1037 This is a result of the individually negotiated session for each 1038 participant. 1040 In case b), the MCU is the content source as it mixes the content and 1041 then encodes it for transmission to a participant. 
According to RTP 1042 [RFC3550], the SSRC of the contributors are to be signalled using the 1043 CSRC/CC mechanism. In practice, today, most deployed MCUs do not 1044 implement this feature. Instead, the identification of the 1045 participants whose content is included in the Mixer's output is not 1046 indicated through any explicit RTP mechanism. That is, most deployed 1047 MCUs set the CSRC Count (CC) field in the RTP header to zero, thereby 1048 indicating no available CSRC information, even if they could identify 1049 the content sources as suggested in RTP. 1051 The main feature that sets this topology apart from what RFC 3550 1052 describes is the breaking of the common RTP session across the 1053 centralized device, such as the MCU. This results in the loss of 1054 explicit RTP-level indication of all participants. If one were using 1055 the mechanisms available in RTP and RTCP to signal this explicitly, 1056 the topology would follow the approach of an RTP Mixer. The lack of 1057 explicit indication has at least the following potential problems: 1059 1. Loop detection cannot be performed on the RTP level. When 1060 carelessly connecting two misconfigured MCUs, a loop could be 1061 generated. 1063 2. There is no information about active media senders available in 1064 the RTP packet. As this information is missing, receivers cannot 1065 use it. It also deprives the client of information related to 1066 currently active senders in a machine-usable way, thus preventing 1067 clients from indicating currently active speakers in user 1068 interfaces, etc. 1070 Note that deployed MCUs (and endpoints) rely on signalling layer 1071 mechanisms for the identification of the contributing sources, for 1072 example, a SIP conferencing package [RFC4575]. This alleviates, to 1073 some extent, the aforementioned issues resulting from ignoring RTP's 1074 CSRC mechanism. 
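   To illustrate what is lost when the CC field is set to zero, the
   following sketch parses the SSRC and CSRC list of an RTP packet and
   performs a simplified check for the presence of a node's own
   source identifiers. This is not the full loop detection algorithm
   of RFC 3550, which also considers transport addresses; the function
   names are hypothetical.

```python
import struct
from typing import List, Set, Tuple


def parse_sources(packet: bytes) -> Tuple[int, List[int]]:
    """Return (SSRC, [CSRC, ...]) from an RTP packet's header."""
    if len(packet) < 12:
        raise ValueError("shorter than the RTP fixed header")
    cc = packet[0] & 0x0F                   # CSRC count
    if len(packet) < 12 + 4 * cc:
        raise ValueError("truncated CSRC list")
    ssrc = struct.unpack("!I", packet[8:12])[0]
    csrcs = list(struct.unpack("!%dI" % cc, packet[12:12 + 4 * cc]))
    return ssrc, csrcs


def contains_own_source(packet: bytes, own_ssrcs: Set[int]) -> bool:
    """Simplified check: does the packet carry one of our own SSRCs,
    either as its SSRC or as a contributing source?"""
    ssrc, csrcs = parse_sources(packet)
    return ssrc in own_ssrcs or any(c in own_ssrcs for c in csrcs)


# A mixed packet that lists 0x20 as contributor (CC=1) ...
with_csrc = bytes([0x81, 0]) + struct.pack("!HII", 1, 0, 0x10) \
    + struct.pack("!I", 0x20)
# ... and the same packet as an MCU with CC=0 would send it.
without_csrc = bytes([0x80, 0]) + struct.pack("!HII", 1, 0, 0x10)
```

   With the CSRC list present, a node owning SSRC 0x20 can recognize
   its own contribution coming back; with CC set to zero, the same
   check has nothing to work with.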
   As a result of the shortcomings of this topology, it is RECOMMENDED
   to instead implement the Mixer concept as specified by RFC 3550.

3.9. De-composite Endpoint

   The implementation of an application may desire to send a subset of
   the application's data to each of multiple devices, each with its
   own network address. A very basic use case for this would be to
   separate audio and video processing for a particular endpoint, like
   a conference room, into one device handling the audio and another
   handling the video, interconnected by some control functions that
   allow them to behave as a single endpoint in all aspects except for
   transport (Figure 15).

   Which decompositions are possible is highly dependent on the RTP
   session usage. It is not really feasible to decompose one logical
   end-point into two different transport nodes in one RTP session. To
   a third-party monitor of such an attempt, the two entities would
   look like two different end-points with a CNAME collision.
   Consequently, the only type of de-composite endpoint that RTP really
   supports is one where the different parts have separate RTP sessions
   to send and/or receive the media streams intended for them.

      +---------------------+
      |      Endpoint A     |
      | Local Area Network  |
      |      +------------+ |
      |   +->| Audio      |<+-RTP---\
      |   |  +------------+ |        \    +------+
      |   |  +------------+ |         +-->|      |
      |   +->| Video      |<+-RTP-------->|  B   |
      |   |  +------------+ |         +-->|      |
      |   |  +------------+ |        /    +------+
      |   +->| Control    |<+-SIP---/
      |      +------------+ |
      +---------------------+

                  Figure 15: De-composite End-Point

   In the above usage, let us assume that the RTP sessions are different
   for audio and video.
   The audio and video parts will use a common CNAME and also have a
   common clock to ensure that synchronisation and clock drift handling
   work despite the decomposition. The RTCP handling also works
   correctly as long as only one part of the de-composite endpoint is
   part of each RTP session. That way, any differences in the paths
   between A's audio entity and B, and between A's video entity and B,
   are related to different SSRCs in different RTP sessions.

   The requirement that can be derived from the above usage is that the
   transport flows for each RTP session might be under common control
   but still go to what look like different endpoints based on
   addresses and ports. This geometry cannot be accomplished using one
   RTP session, so in this case, multiple RTP sessions are needed.

3.10. Non-Symmetric Mixer/Translators

   Shortcut name: Topo-Asymmetric

   It is theoretically possible to construct an MCU that is a Mixer in
   one direction and a Translator in another. The main reason to
   consider this would be to allow topologies similar to Figure 8, where
   the Mixer does not need to mix in the direction from B or D towards
   the multicast domains with A and C. Instead, the media streams from B
   and D are forwarded without changes. Avoiding this mixing would save
   media processing resources that perform the mixing in cases where it
   isn't needed. However, there would still be a need to mix B's stream
   towards D. Only in the direction B -> multicast domain or D ->
   multicast domain would it be possible to work as a Translator. In
   all other directions, it would function as a Mixer.

   The Mixer/Translator would still need to process and change the RTCP
   before forwarding it in the directions of B or D to the multicast
   domain. One issue is that A and C do not know about the mixed-media
   stream the Mixer sends to either B or D. Thus, any reports related to
   these streams must be removed.
Also, receiver reports related to A 1151 and C's media stream would be missing. To avoid A and C thinking 1152 that B and D aren't receiving A and C at all, the Mixer needs to 1153 insert its Receiver Reports for the streams from A and C into B and 1154 D's Sender Reports. In the opposite direction, the Receiver Reports 1155 from A and C about B's and D's stream also need to be aggregated into 1156 the Mixer's Receiver Reports sent to B and D. Since B and D only have 1157 the Mixer as source for the stream, all RTCP from A and C must be 1158 suppressed by the Mixer. 1160 This topology is so problematic and it is so easy to get the RTCP 1161 processing wrong, that it is NOT RECOMMENDED to implement this 1162 topology. 1164 3.11. Combining Topologies 1166 Topologies can be combined and linked to each other using Mixers or 1167 Translators. However, care must be taken in handling the SSRC/CSRC 1168 space. A Mixer will not forward RTCP from sources in other domains, 1169 but will instead generate its own RTCP packets for each domain it 1170 mixes into, including the necessary Source Description (SDES) 1171 information for both the CSRCs and the SSRCs. Thus, in a mixed 1172 domain, the only SSRCs seen will be the ones present in the domain, 1173 while there can be CSRCs from all the domains connected together with 1174 a combination of Mixers and Translators. The combined SSRC and CSRC 1175 space is common over any Translator or Mixer. This is important to 1176 facilitate loop detection, something that is likely to be even more 1177 important in combined topologies due to the mixed behavior between 1178 the domains. Any hybrid, like the Topo-Video-switch-MCU or Topo- 1179 Asymmetric, requires considerable thought on how RTCP is dealt with. 1181 4. Comparing Topologies 1183 The topologies discussed in Section 3 have different properties. 1184 This section first lists these properties and then maps the different 1185 topologies to them. 
Please note that even if a certain property is 1186 supported within a particular topology concept, the necessary 1187 functionality may, in many cases, be optional to implement. 1189 Note: This section has not yet been updated with the new additions of 1190 topologies. 1192 4.1. Topology Properties 1194 4.1.1. All to All Media Transmission 1196 Multicast, at least Any Source Multicast (ASM), provides the 1197 functionality that everyone may send to, or receive from, everyone 1198 else within the session. MCUs, Mixers, and Translators may all 1199 provide that functionality at least on some basic level. However, 1200 there are some differences in which type of reachability they 1201 provide. 1203 The transport Translator function called "relay", in Section 3.4, is 1204 the one that provides the emulation of ASM that is closest to true 1205 IP-multicast-based, all to all transmission. Media Translators, 1206 Mixers, and the MCU variants do not provide a fully meshed forwarding 1207 on the transport level; instead, they only allow limited forwarding 1208 of content from the other session participants. 1210 The "all to all media transmission" requires that any media 1211 transmitting entity considers the path to the least capable receiver. 1212 Otherwise, the media transmissions may overload that path. 1213 Therefore, a media sender needs to monitor the path from itself to 1214 any of the participants, to detect the currently least capable 1215 receiver, and adapt its sending rate accordingly. As multiple 1216 participants may send simultaneously, the available resources may 1217 vary. RTCP's Receiver Reports help performing this monitoring, at 1218 least on a medium time scale. 1220 The transmission of RTCP automatically adapts to any changes in the 1221 number of participants due to the transmission algorithm, defined in 1222 the RTP specification [RFC3550], and the extensions in AVPF [RFC4585] 1223 (when applicable). 
That way, the resources utilized for RTCP stay 1224 within the bounds configured for the session. 1226 4.1.2. Transport or Media Interoperability 1228 Translators, Mixers, and RTCP-terminating MCU all allow changing the 1229 media encoding or the transport to other properties of the other 1230 domain, thereby providing extended interoperability in cases where 1231 the participants lack a common set of media codecs and/or transport 1232 protocols. 1234 4.1.3. Per Domain Bit-Rate Adaptation 1236 Participants are most likely to be connected to each other with a 1237 heterogeneous set of paths. This makes congestion control in a Point 1238 to Multipoint set problematic. For the ASM and "relay" scenario, 1239 each individual sender has to adapt to the receiver with the least 1240 capable path. This is no longer necessary when Media Translators, 1241 Mixers, or MCUs are involved, as each participant only needs to adapt 1242 to the slowest path within its own domain. The Translator, Mixer, or 1243 MCU topologies all require their respective outgoing streams to 1244 adjust the bit-rate, packet-rate, etc., to adapt to the least capable 1245 path in each of the other domains. That way one can avoid lowering 1246 the quality to the least-capable participant in all the domains at 1247 the cost (complexity, delay, equipment) of the Mixer or Translator. 1249 4.1.4. Aggregation of Media 1251 In the all to all media property mentioned above and provided by ASM, 1252 all simultaneous media transmissions share the available bit-rate. 1253 For participants with limited reception capabilities, this may result 1254 in a situation where even a minimal acceptable media quality cannot 1255 be accomplished. This is the result of multiple media streams 1256 needing to share the available resources. The solution to this 1257 problem is to provide for a Mixer or MCU to aggregate the multiple 1258 streams into a single one. This aggregation can be performed 1259 according to different methods. 
   Mixing and selection are two common methods.

4.1.5. View of All Session Participants

   The RTP protocol includes functionality to identify the session
   participants through the use of the SSRC and CSRC fields. In
   addition, it is capable of carrying some further identity information
   about these participants using the RTCP Source Descriptors (SDES).
   To maintain this functionality, it is necessary that RTCP is handled
   correctly in the domain-bridging function. This is specified for
   Translators and Mixers. The MCU described in Section 3.7 does not
   entirely fulfill this. The one described in Section 3.8 does not
   support this at all.

4.1.6. Loop Detection

   In complex topologies with multiple interconnected domains, it is
   possible to form media loops. RTP and RTCP support detecting such
   loops, as long as the SSRC and CSRC identities are correctly set in
   forwarded packets. It is likely that loop detection works for the
   MCU, described in Section 3.7, at least as long as it forwards the
   RTCP between the participants. However, the MCU in Section 3.8 will
   definitely break the loop detection mechanism.

4.2. Comparison of Topologies

   The table below attempts to summarize the properties of the different
   topologies. The topology abbreviations are: Topo-Point-to-Point
   (PtP), Topo-Multicast (Multic), Topo-Trns-Translator (TTrn),
   Topo-Media-Translator (including Transport Translator) (MTrn),
   Topo-Mixer (Mixer), Topo-Asymmetric (ASY), Topo-Video-switch-MCU
   (MCUs), and Topo-RTCP-terminating-MCU (MCUt). In the table below, Y
   indicates Yes or full support, N indicates No support, (Y) indicates
   partial support, and N/A indicates not applicable.

   Property              PtP  Multic TTrn MTrn Mixer ASY  MCUs MCUt
   ----------------------------------------------------------------
   All to All media      N    Y      Y    Y    (Y)   (Y)  (Y)  (Y)
   Interoperability      N/A  N      Y    Y    Y     Y    N    Y
   Per Domain Adaptation N/A  N      N    Y    Y     Y    N    Y
   Aggregation of media  N    N      N    N    Y     (Y)  Y    Y
   Full Session View     Y    Y      Y    Y    Y     Y    (Y)  N
   Loop Detection        Y    Y      Y    Y    Y     Y    (Y)  N

   Please note that the Media Translator also includes the transport
   Translator functionality.

5.  Security Considerations

   The use of Mixers and Translators has an impact on security and the
   security functions used.  The primary issue is that both Mixers and
   Translators modify packets, thus preventing the use of integrity and
   source authentication, unless they are trusted devices that take
   part in the security context, e.g., the device can send Secure
   Real-time Transport Protocol (SRTP) and Secure Real-time Transport
   Control Protocol (SRTCP) [RFC3711] packets to session endpoints.  If
   encryption is employed, the media Translator and Mixer need to be
   able to decrypt the media to perform their function.  A transport
   Translator may be used without access to the encrypted payload in
   cases where it translates parts that are not included in the
   encryption and integrity protection, for example, IP address and UDP
   port numbers in a media stream using SRTP [RFC3711].  However, in
   general, the Translator or Mixer needs to be part of the signalling
   context and get the necessary security associations (e.g., SRTP
   crypto contexts) established with its RTP session participants.

   Including the Mixer and Translator in the security context allows
   the entity, if subverted or misbehaving, to perform a number of very
   serious attacks, as it has full access.
   It can perform all the attacks possible (see RFC 3550 and any
   applicable profiles) as if the media session were not protected at
   all, while giving the impression to the session participants that
   they are protected.

   Transport Translators have no interactions with cryptography that
   works above the transport layer, such as SRTP, since that sort of
   Translator leaves the RTP header and payload unaltered.  Media
   Translators, on the other hand, have strong interactions with
   cryptography, since they alter the RTP payload.  A media Translator
   in a session that uses cryptographic protection needs to perform
   cryptographic processing on both inbound and outbound packets.

   A media Translator may need to use different cryptographic keys for
   the inbound and outbound processing.  For SRTP, different keys are
   required, because an RFC 3550 media Translator leaves the SSRC
   unchanged during its packet processing, and SRTP key sharing is only
   allowed when distinct SSRCs can be used to protect distinct packet
   streams.

   When the media Translator uses different keys to process inbound and
   outbound packets, each session participant needs to be provided with
   the appropriate key, depending on whether they are listening to the
   Translator or the original source.  (Note that there is an
   architectural difference between RTP media translation, in which
   participants can rely on the RTP Payload Type field of a packet to
   determine appropriate processing, and cryptographically protected
   media translation, in which participants must use information that
   is not carried in the packet.)

   When using security mechanisms with Translators and Mixers, it is
   possible that the Translator or Mixer could create different
   security associations for the different domains they are working in.
   Doing so has some implications:

   First, it might weaken security if the Mixer/Translator accepts a
   weaker algorithm or key in one domain than in another.  Therefore,
   care should be taken that appropriately strong security parameters
   are negotiated in all domains.  In many cases, "appropriate"
   translates to "similar" strength.  If a key management system does
   allow the negotiation of security parameters resulting in a
   different strength of the security, then this system SHOULD notify
   the participants in the other domains about this.

   Second, the number of crypto contexts (keys and security-related
   state) needed (for example, in SRTP [RFC3711]) may vary between
   Mixers and Translators.  A Mixer normally needs to represent only a
   single SSRC per domain and therefore needs to create only one
   security association (SRTP crypto context) per domain.  In contrast,
   a Translator needs one security association per participant it
   translates towards, in the opposite domain.  Considering Figure 5,
   the Translator needs two security associations towards the multicast
   domain, one for B and one for D.  It may be forced to maintain a set
   of totally independent security associations between itself and B
   and D respectively, so as to avoid two-time pad occurrences.  These
   contexts must also be capable of handling all the sources present in
   the other domains.  Hence, using completely independent security
   associations (for certain keying mechanisms) may force a Translator
   to handle N*DM keys and related state, where N is the total number
   of SSRCs used over all domains and DM is the total number of
   domains.

   There exist a number of different mechanisms to provide keys to the
   different participants.  One example is the choice between group
   keys and unique keys per SSRC.  The appropriate keying model is
   impacted by the topologies one intends to use.
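
   As a rough illustration of the state estimate given above, the
   following Python sketch compares the N*DM upper bound for a
   Translator that uses completely independent security associations
   with the one association per domain that a Mixer normally needs.
   The function names are hypothetical and exist only for this example;
   the actual amount of state depends on the keying mechanism in use.

```python
def translator_key_state(total_ssrcs: int, total_domains: int) -> int:
    """Upper-bound estimate N*DM of keys a Translator may have to handle.

    total_ssrcs   -- N, the total number of SSRCs used over all domains
    total_domains -- DM, the total number of domains
    Assumes completely independent security associations, as in the text.
    """
    if total_ssrcs < 0 or total_domains < 0:
        raise ValueError("counts must be non-negative")
    return total_ssrcs * total_domains


def mixer_key_state(total_domains: int) -> int:
    """A Mixer normally needs one security association per domain."""
    if total_domains < 0:
        raise ValueError("counts must be non-negative")
    return total_domains
```

   For example, with N = 4 SSRCs spread over DM = 2 domains, the sketch
   gives 8 for the Translator and 2 for the Mixer.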
   The final security properties are dependent on both the topologies
   in use and the keying mechanisms' properties, and need to be
   considered by the application.  Exactly which mechanisms are used is
   outside of the scope of this document.  Please review RTP Security
   Options [I-D.ietf-avtcore-rtp-security-options] to get a better
   understanding of most of the available options.

6.  IANA Considerations

   This document makes no request of IANA.

   Note to RFC Editor: this section may be removed on publication as an
   RFC.

7.  Acknowledgements

   The authors would like to thank Bo Burman, Umesh Chandra, Roni Even,
   Keith Lantz, Ladan Gharai, Geoff Hunt, and Mark Baugher for their
   help in reviewing this document.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC3711]  Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
              Norrman, "The Secure Real-time Transport Protocol (SRTP)",
              RFC 3711, March 2004.

   [RFC4575]  Rosenberg, J., Schulzrinne, H., and O. Levin, "A Session
              Initiation Protocol (SIP) Event Package for Conference
              State", RFC 4575, August 2006.

   [RFC4585]  Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
              "Extended RTP Profile for Real-time Transport Control
              Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585,
              July 2006.

8.2.  Informative References

   [H323]     ITU-T Recommendation H.323, "Packet-based multimedia
              communications systems", June 2006.

   [I-D.ietf-avtcore-rtp-security-options]
              Westerlund, M. and C. Perkins, "Options for Securing RTP
              Sessions", draft-ietf-avtcore-rtp-security-options-00
              (work in progress), July 2012.

   [RFC4607]  Holbrook, H. and B. Cain, "Source-Specific Multicast for
              IP", RFC 4607, August 2006.

   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
              "Codec Control Messages in the RTP Audio-Visual Profile
              with Feedback (AVPF)", RFC 5104, February 2008.

   [RFC5760]  Ott, J., Chesterfield, J., and E. Schooler, "RTP Control
              Protocol (RTCP) Extensions for Single-Source Multicast
              Sessions with Unicast Feedback", RFC 5760, February 2010.

   [RFC6285]  Ver Steeg, B., Begen, A., Van Caenegem, T., and Z. Vax,
              "Unicast-Based Rapid Acquisition of Multicast RTP
              Sessions", RFC 6285, June 2011.

   [RFC6465]  Ivov, E., Marocco, E., and J. Lennox, "A Real-time
              Transport Protocol (RTP) Header Extension for Mixer-to-
              Client Audio Level Indication", RFC 6465, December 2011.

Authors' Addresses

   Magnus Westerlund
   Ericsson
   Farogatan 6
   SE-164 80 Kista
   Sweden

   Phone: +46 10 714 82 87
   Email: magnus.westerlund@ericsson.com

   Stephan Wenger
   Vidyo
   433 Hackensack Ave
   Hackensack, NJ 07601
   USA

   Email: stewe@stewe.org