Network Working Group                                  Magnus Westerlund
INTERNET-DRAFT                                                  Ericsson
Expires: April 2008                                       Stephan Wenger
Intended Status: Informational                                     Nokia
                                                        26 October, 2007


                             RTP Topologies
                    draft-ietf-avt-topologies-07.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   This document discusses multi-endpoint topologies used in Real-time
   Transport Protocol (RTP)-based environments. In particular,
   centralized topologies commonly employed in the video conferencing
   industry are mapped to the RTP terminology.

TABLE OF CONTENTS

   1. Introduction
   2. Definitions
      2.1. Glossary
      2.2. Indicating Requirement Levels
   3. Topologies
      3.1. Point to Point
      3.2. Point to Multi-point using Multicast
      3.3. Point to Multipoint using the RFC 3550 translator
      3.4. Point to Multipoint using the RFC 3550 mixer model
      3.5. Point to Multipoint using video switching MCU
      3.6. Point to Multipoint using RTCP-terminating MCU
      3.7. Non-Symmetric Mixer/Translators
      3.8. Combining Topologies
   4. Comparing Topologies
      4.1. Topology Properties
         4.1.1. All to All media transmission
         4.1.2. Transport or Media Interoperability
         4.1.3. Per Domain Bit-rate Adaptation
         4.1.4. Aggregation of Media
         4.1.5. View of all session participants
         4.1.6. Loop Detection
      4.2. Comparison of topologies
   5. Security Considerations
   6. Acknowledgements
   7. IANA Considerations
   8. References
      8.1. Normative References
      8.2. Informative References
   9. Authors' Addresses

1. Introduction

   When working on the Codec Control Messages [CCM], considerable
   confusion was noticed in the community with respect to terms such
   as Multipoint Control Unit (MCU), mixer, and translator, and their
   usage in various topologies. This document tries to address this
   confusion by providing a common information basis for future
   discussion and specification work. It attempts to clarify and
   explain sections of the Real-time Transport Protocol (RTP)
   specification [RFC3550] in an informal way. It is not intended to
   update or change what is normatively specified within RFC 3550.

   When the Audio-Visual Profile with Feedback (AVPF) [RFC4585] was
   developed, the main emphasis lay in the efficient support of
   point-to-point and small multipoint scenarios without centralized
   multipoint control. However, in practice, many small multipoint
   conferences operate utilizing devices known as Multipoint Control
   Units (MCUs). MCUs may implement mixers and translators (in RTP
   [RFC3550] terminology), but also signalling support. They may also
   contain additional application functionality. This document
   focuses on the media transport aspects of the MCU that can be
   realized using RTP, as discussed below. Further considered are the
   properties of mixers and translators, and how some types of
   deployed MCUs deviate from these properties.

2. Definitions

2.1. Glossary

   ASM  - Any Source Multicast
   AVPF - The Extended RTP Profile for RTCP-based Feedback
   CSRC - Contributing Source
   Link - The data transport to the next IP hop
   MCU  - Multipoint Control Unit
   Path - The concatenation of multiple links, resulting in an
          end-to-end data transfer
   PtM  - Point to Multipoint
   PtP  - Point to Point
   SSM  - Source-Specific Multicast
   SSRC - Synchronization Source

2.2. Indicating Requirement Levels

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
   NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   RFC 2119 [RFC2119].

   The RFC 2119 language is used in this document to highlight those
   important requirements and/or resulting solutions that are
   necessary to address the issues raised in this document.

3. Topologies

   This section defines several basic topologies that are relevant
   for codec control. The first four relate to the RTP system model
   utilizing multicast and/or unicast, as envisioned in RFC 3550. The
   last two topologies, in contrast, describe the deployed system
   models as used in many H.323 [H323] video conferences, where both
   the media streams and the RTP Control Protocol (RTCP) control
   traffic terminate at the MCU. In these two cases, the media sender
   does not receive the (unmodified or translator-modified) receiver
   reports from all sources (which it needs to interpret based on
   Synchronization Source (SSRC) values), and therefore has no full
   information about all the endpoints' situations as reported in RTCP
   Receiver Reports (RRs). More topologies can be constructed by
   combining any of the models; see Section 3.8.

   The topologies may be referenced in other documents by a shortcut
   name, indicated by the prefix "Topo-".

   For each of the RTP-defined topologies, we discuss how RTP, RTCP,
   and the carried media are handled. With respect to RTCP, we also
   introduce the handling of RTCP feedback messages as defined in
   [RFC4585] and [CCM]. Any important differences between the two will
   be illuminated in the discussion.

3.1. Point to Point

   Shortcut name: Topo-Point-to-Point

   The Point to Point (PtP) topology (Figure 1) consists of two
   endpoints, communicating using unicast. Both RTP and RTCP traffic
   are conveyed endpoint-to-endpoint, using unicast traffic only (even
   if, in exotic cases, this unicast traffic happens to be conveyed
   over an IP-multicast address).

      +---+         +---+
      | A |<------->| B |
      +---+         +---+

   Figure 1 - Point to Point

   The main property of this topology is that A sends to B and only B,
   while B sends to A and only A. This avoids all complexities of
   handling multiple endpoints and combining the requirements from
   them. Note that an endpoint can still use multiple RTP
   Synchronization Sources (SSRCs) in an RTP session.

   RTCP feedback messages for the indicated SSRCs are communicated
   directly between the endpoints. Therefore, this topology poses
   minimal (if any) issues for any feedback messages.

3.2. Point to Multi-point using Multicast

   Shortcut name: Topo-Multicast

                 +-----+
      +---+     /       \    +---+
      | A |----/         \---| B |
      +---+   /   Multi-  \  +---+
             +    Cast     +
      +---+   \  Network  /  +---+
      | C |----\         /---| D |
      +---+     \       /    +---+
                 +-----+

   Figure 2 - Point to Multipoint using Multicast

   Point to Multipoint (PtM) is defined here as using a multicast
   topology as a transmission model, in which traffic from any
   participant reaches all the other participants, except for cases
   such as
   o  packet loss, or
   o  a participant does not wish to receive the traffic for a
      specific multicast group, and therefore has not subscribed to
      the IP multicast group in question. This is for the cases
      where a multi-media session is distributed using two or more
      multicast groups.

   In the above context, "traffic" encompasses both RTP and RTCP
   traffic. The number of participants can vary between one and many,
   as RTP and RTCP scale to very large multicast groups (the
   theoretical limit of the number of participants in a single RTP
   session is approximately two billion). The above can be realized
   using Any Source Multicast (ASM). Source-Specific Multicast (SSM)
   may also be used with RTP. However, then only the designated
   source may reach all receivers. Please review [RTCP-SSM] for how
   RTCP can be made to work in combination with SSM.

   This document is primarily interested in that subset of multicast
   sessions wherein the number of participants in the multicast group
   is so low that it allows the participants to use early or immediate
   feedback, as defined in AVPF [RFC4585]. This document refers to
   those groups as "small multicast groups".

   RTCP feedback messages in multicast will, like media, reach
   everyone (subject to packet losses and multicast group
   subscription). Therefore, the feedback suppression mechanism
   discussed in [RFC4585] is required.
   Each individual node needs to process every feedback message it
   receives, to determine if it is affected, or if the feedback
   message applies only to some other participant.

3.3. Point to Multipoint using the RFC 3550 translator

   Shortcut name: Topo-Translator

   Two main categories of Translators can be distinguished.

   Transport Translators (Topo-Trn-Translator) do not modify the media
   stream itself, but are concerned with transport parameters.
   Transport parameters, in the sense of this section, comprise the
   transport addresses (to bridge different domains) and the media
   packetization to allow other transport protocols to be
   interconnected to a session (in gateways). Of the transport
   translators, this memo is primarily interested in those that use
   RTP on both sides, and this is assumed henceforth. Translators
   that bridge between different protocol worlds need to be concerned
   about the mapping of the SSRC/CSRC (Contributing Source) concept to
   the non-RTP protocol. When designing a translator to a
   non-RTP-based media transport, one crucial factor is how to handle
   different sources and their identity. This problem space is not
   discussed henceforth.

   Media Translators (Topo-Media-Translator), in contrast, modify the
   media stream itself. This process is commonly known as
   transcoding. The modification of the media stream can be as small
   as removing parts of the stream, and can go all the way to a full
   transcoding (down to the sample level or equivalent) utilizing a
   different media codec. Media translators are commonly used to
   connect entities without a common interoperability point.

   Stand-alone Media Translators are rare. Most commonly, a
   combination of Transport and Media Translators is used to
   translate both the media stream and the transport aspects of a
   stream between two transport domains (or clouds).

   Both Translator types share common attributes that separate them
   from mixers. For each media stream that the translator receives,
   it generates an individual stream in the other domain. A
   translator always keeps the SSRC for a stream across the
   translation, whereas a mixer can select a media stream, or send
   them out mixed, always under its own SSRC, using the CSRC field to
   indicate the source(s) of the content.

   The RTCP translation process can be trivial (for example, when
   Transport translators just need to adjust IP addresses) or can be
   quite complex in the case of media translators. See Section 7.2 of
   [RFC3550].

                 +-----+
      +---+     /       \     +------------+      +---+
      | A |<---/         \    |            |<---->| B |
      +---+   /   Multi-  \   |            |      +---+
             +    Cast     +->| Translator |
      +---+   \  Network  /   |            |      +---+
      | C |<---\         /    |            |<---->| D |
      +---+     \       /     +------------+      +---+
                 +-----+

   Figure 3 - Point to Multipoint using a Translator

   Figure 3 depicts an example of a Transport Translator performing at
   least IP address translation. It allows the (non-multicast-capable)
   participants B and D to take part in a multicast session by having
   the translator forward their unicast traffic to the multicast
   addresses in use, and vice versa. It must also forward B's traffic
   to D and vice versa, to provide each of B and D with a complete
   view of the session.

   If B is behind a limited network path, the translator may perform
   media transcoding to allow the traffic received from the other
   participants to reach B without overloading the path.

   When, in the example depicted in Figure 3, the translator acts only
   as a Transport Translator, then the RTCP traffic can simply be
   forwarded, similar to the media traffic. However, when media
   translation occurs, the translator's task becomes substantially
   more complex, even with respect to the RTCP traffic.
   In this case, the translator needs to rewrite B's RTCP receiver
   reports before forwarding them to D and the multicast network. The
   rewriting is needed as the stream received by B is not the same
   stream as the other participants receive. For example, the number
   of packets transmitted to B may be lower than what D receives, due
   to the different media format. Therefore, if the receiver reports
   were forwarded without changes, the extended highest sequence
   number would indicate that B was substantially behind in reception,
   while it most likely would not be. Therefore, the translator must
   translate that number to a corresponding sequence number for the
   stream the translator received. Similar arguments can be made for
   most other fields in the RTCP receiver reports.

   As specified in Section 7.1 of [RFC3550], the SSRC space is common
   for all participants in the session, independent of which side of
   the translator they are on. Therefore, it is the responsibility of
   the participants to run SSRC collision detection, and the SSRC is a
   field the translator should not change.

      +---+      +------------+      +---+
      | A |<---->|            |<---->| B |
      +---+      |            |      +---+
                 | Translator |
      +---+      |            |      +---+
      | C |<---->|            |<---->| D |
      +---+      +------------+      +---+

   Figure 4 - RTP Translator (relay) with only unicast paths

   Another translator scenario is depicted in Figure 4. Herein, the
   translator connects multiple users of a conference through unicast.
   This can be implemented using a very simple transport translator,
   which in this document is called a relay. The relay forwards all
   traffic it receives, both RTP and RTCP, to all other participants.
   In doing so, a multicast network is emulated without relying on a
   multicast-capable network infrastructure.

   A translator normally does not use an SSRC of its own, and is not
   visible as an active participant in the session.
   One exception can be conceived when it acts as a quality monitor
   that sends RTCP reports, and therefore is required to have an SSRC.
   Another example is the case when a translator is prepared to use
   RTCP feedback messages. This may, for example, occur when it
   suffers packet loss of important video packets and wants to trigger
   repair by the media sender, by sending feedback messages. To be
   able to do this it needs to have a unique SSRC.

   A media translator may in some cases act on behalf of the "real"
   source and respond to RTCP feedback messages. This may occur, for
   example, when a receiver requests a bandwidth reduction and the
   media translator has not detected any congestion or other reasons
   for bandwidth reduction between the media source and itself. In
   that case, it is sensible that the media translator reacts to the
   codec control messages itself, for example by transcoding to a
   lower media rate. If it were not reacting, the media quality in
   the media sender's domain may suffer, as a result of the media
   sender adjusting its media rate (and quality) according to the
   needs of the slow endpoint beyond the translator, at the expense of
   the rate and quality of all other session participants.

   In general, a translator implementation should consider which RTCP
   feedback messages or codec control messages it needs to understand
   in relation to the functionality of the translator itself. This is
   completely in line with the requirement to also translate RTCP
   messages between the domains.

3.4. Point to Multipoint using the RFC 3550 mixer model

   Shortcut name: Topo-Mixer

   A mixer is a middlebox that aggregates multiple RTP streams that
   are part of a session, by mixing the media data and generating a
   new RTP stream. One common application for a mixer is to allow a
   participant to receive a session with a reduced amount of
   resources.
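
   As an illustration of how several received streams can be
   aggregated into one, the following minimal sketch mixes audio
   frames by summing samples. It is only an assumption for
   illustration: this document does not specify a mixing method, the
   function name mix_pcm16 is hypothetical, and the inputs are assumed
   to be already decoded, time-aligned frames of signed 16-bit linear
   PCM samples of equal length.

```python
def mix_pcm16(frames):
    """Mix equal-length frames of signed 16-bit PCM samples into one frame.

    Assumption for illustration: mixing is a per-sample sum that is
    clipped to the 16-bit range. Real mixers may scale or weight inputs.
    """
    if not frames:
        return []
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all frames must have the same number of samples")
    # zip(*frames) groups the samples that share the same position in time.
    return [max(-32768, min(32767, sum(samples))) for samples in zip(*frames)]


# Two participants' frames become a single frame of the same size.
mixed = mix_pcm16([[100, -200, 30000], [50, -100, 10000]])
```

   The output frame has the same size as one input frame, which is the
   reason a receiver of the mixed stream needs fewer resources than a
   receiver of all individual streams.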

                 +-----+
      +---+     /       \     +-----------+      +---+
      | A |<---/         \    |           |<---->| B |
      +---+   /   Multi-  \   |           |      +---+
             +    Cast     +->|   Mixer   |
      +---+   \  Network  /   |           |      +---+
      | C |<---\         /    |           |<---->| D |
      +---+     \       /     +-----------+      +---+
                 +-----+

   Figure 5 - Point to Multipoint using RFC 3550 mixer model

   A mixer can be viewed as a device terminating the media streams
   received from other session participants. Using the media data
   from the received media streams, a mixer generates a media stream
   that is sent to the session participant.

   The content that the mixer provides is the mixed aggregate of what
   the mixer receives over the PtP or PtM paths, which are part of the
   same conference session.

   The mixer is the content source, as it mixes the content (often in
   the uncompressed domain) and then encodes it for transmission to a
   participant. The CC and CSRC fields in the RTP header are used to
   indicate the contributors to the newly generated stream. The
   SSRCs of the to-be-mixed streams on the mixer input appear as the
   CSRCs at the mixer output. That output stream uses a unique SSRC
   that identifies the Mixer's stream. The CSRCs are forwarded between
   the two domains to allow for loop detection and identification of
   sources that are part of the global session. Note that Section 7.1
   of RFC 3550 requires the SSRC space to be shared between domains
   for these reasons.

   The mixer is responsible for generating RTCP packets in accordance
   with its role. It is a receiver and should therefore send reception
   reports for the media streams it receives. In its role as a media
   sender, it should also generate sender reports for those media
   streams sent. As specified in Section 7.3 of RFC 3550, a mixer
   must not forward RTCP unaltered between the two domains.
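
   The SSRC and CSRC handling described above can be sketched as
   follows. This is a minimal illustration, not a complete RTP
   implementation: the function names build_mixer_rtp_header and
   parse_ssrc_and_csrcs are hypothetical, padding, header extensions,
   and the marker bit are left at zero, and the field layout is the
   fixed RTP header of RFC 3550 (V, P, X, CC, M, PT, sequence number,
   timestamp, SSRC, then the CSRC list).

```python
import struct

RTP_VERSION = 2
MAX_CSRC = 15  # the CC field is 4 bits wide, so at most 15 CSRCs fit


def build_mixer_rtp_header(mixer_ssrc, input_ssrcs, seq, timestamp, payload_type):
    """Pack an RTP header for a mixed stream.

    The mixer's own SSRC identifies the output stream; the SSRCs of the
    streams that were mixed are listed as CSRCs, and CC carries their count.
    """
    csrcs = list(input_ssrcs)[:MAX_CSRC]
    first_octet = (RTP_VERSION << 6) | len(csrcs)  # V=2, P=0, X=0, CC
    second_octet = payload_type & 0x7F             # M=0, 7-bit payload type
    header = struct.pack(
        "!BBHII",
        first_octet,
        second_octet,
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        mixer_ssrc & 0xFFFFFFFF,
    )
    for csrc in csrcs:
        header += struct.pack("!I", csrc & 0xFFFFFFFF)
    return header


def parse_ssrc_and_csrcs(header):
    """Return (ssrc, [csrc, ...]) as a receiver would read them."""
    cc = header[0] & 0x0F
    (ssrc,) = struct.unpack("!I", header[8:12])
    csrcs = list(struct.unpack("!%dI" % cc, header[12:12 + 4 * cc]))
    return ssrc, csrcs


# A mixer with SSRC 0x1234 mixing the streams of two contributors.
header = build_mixer_rtp_header(0x1234, [0xA, 0xB], seq=1, timestamp=160,
                                payload_type=0)
```

   A receiver that parses the CSRC list in this way can identify the
   currently contributing sources, and a mixer that finds its own SSRC
   in the CSRC list of an incoming packet has an indication of a loop.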

   The mixer depicted in Figure 5 is involved in three domains that
   need to be separated: the multicast network, participant B, and
   participant D. The Mixer produces different mixed streams to B and
   D, as the one to B may contain content received from D and vice
   versa. However, the mixer needs only one SSRC in each domain, which
   acts as both the receiving entity and the transmitter of mixed
   content.

   In the multicast domain, a mixer still needs to provide a mixed
   view of the other domains. This makes the mixer simpler to
   implement and avoids any issues with advanced RTCP handling or loop
   detection, which would be problematic if the mixer were providing
   non-symmetric behavior. Please see Section 3.7 for more discussion
   on this topic.

   A mixer is responsible for receiving RTCP feedback messages and
   handling them appropriately. The definition of "appropriate"
   depends on the message itself and the context. In some cases, the
   reception of a codec control message may result in the generation
   and transmission of RTCP feedback messages by the mixer to the
   participants in the other domain. In other cases, a message is
   handled by the mixer itself and therefore not forwarded to any
   other domain.

   When the multicast network in Figure 5 (to the left of the mixer)
   is replaced with individual unicast paths, as depicted in Figure 6,
   the mixer model is very similar to the one discussed in Section 3.6
   below. Please see the discussion in Section 3.6 about the
   differences between these two models.

      +---+      +------------+      +---+
      | A |<---->|            |<---->| B |
      +---+      |            |      +---+
                 |   Mixer    |
      +---+      |            |      +---+
      | C |<---->|            |<---->| D |
      +---+      +------------+      +---+

   Figure 6 - RTP Mixer with only unicast paths

3.5. Point to Multipoint using video switching MCU

   Shortcut name: Topo-Video-switch-MCU

      +---+      +------------+      +---+
      | A |------| Multipoint |------| B |
      +---+      |  Control   |      +---+
                 |    Unit    |
      +---+      |   (MCU)    |      +---+
      | C |------|            |------| D |
      +---+      +------------+      +---+

   Figure 7 - Point to Multipoint using relaying MCU

   This PtM topology is still deployed today, although the
   RTCP-terminating MCUs, as discussed in the next section, are
   perhaps more common. This topology, as well as the following one,
   reflects today's lack of wide availability of IP multicast
   technologies, as well as the simplicity of content switching when
   compared to content mixing. The technology is commonly implemented
   in what is known as "Video Switching MCUs".

   A video switching MCU forwards to a participant a single media
   stream, selected from the available streams. The criteria for
   selection are often based on voice activity in the audio-visual
   conference, but other conference management mechanisms (like
   presentation mode or explicit floor control) are known to exist as
   well.

   The video switching MCU may also perform media translation, to
   modify the content in bit-rate, encoding, or resolution. However,
   it may still indicate the original sender of the content through
   the SSRC. In this case the values of the CC and CSRC fields are
   retained.

   If not terminating RTP, the RTCP Sender Reports are forwarded for
   the currently selected sender. All RTCP receiver reports are freely
   forwarded between the participants. In addition, the MCU may also
   originate RTCP control traffic in order to control the session
   and/or report on status from its viewpoint.

   The video switching MCU has mostly the attributes of a translator.
   However, its stream selection is a mixing behavior. This behavior
   has some RTP and RTCP issues associated with it.
   The suppression of all but one media stream results in most
   participants seeing only a subset of the sent media streams at any
   given time, often a single stream per conference. Therefore, RTCP
   receiver reports only report on these streams. Consequently, the
   media senders that are not currently forwarded receive a view of
   the session that indicates their media streams disappear somewhere
   en route. This makes the use of RTCP for congestion control or any
   type of quality reporting very problematic.

   To avoid the aforementioned issues, the MCU needs to implement two
   features. First, it needs to act as a mixer (see Section 3.4) and
   forward the selected media stream under its own SSRC and with the
   appropriate CSRC values. Second, the MCU needs to modify the RTCP
   RRs it forwards between the domains. As a result, it is
   RECOMMENDED that one implement a centralized video switching
   conference using a Mixer according to RFC 3550, instead of the
   shortcut implementation described here.

3.6. Point to Multipoint using RTCP-terminating MCU

   Shortcut name: Topo-RTCP-terminating-MCU

      +---+      +------------+      +---+
      | A |<---->| Multipoint |<---->| B |
      +---+      |  Control   |      +---+
                 |    Unit    |
      +---+      |   (MCU)    |      +---+
      | C |<---->|            |<---->| D |
      +---+      +------------+      +---+

   Figure 8 - Point to Multipoint using content modifying MCU

   In this PtM scenario, each participant runs an RTP point-to-point
   session between itself and the MCU. This is a very commonly
   deployed topology in multipoint video conferencing. The content
   that the MCU provides to each participant is either:

   a) a selection of the content received from the other
      participants, or

   b) the mixed aggregate of what the MCU receives from the other
      PtP paths, which are part of the same conference session.

   In case a) the MCU may modify the content in bit-rate, encoding, or
   resolution.
   No explicit RTP mechanism is used to establish the relationship
   between the original media sender and the version the MCU sends.
   In other words, the outgoing session typically uses a different
   SSRC, and may well use a different payload type (PT), even if this
   different PT happens to be mapped to the same media type. This is
   a result of the session to each participant being negotiated
   individually.

   In case b) the MCU is the content source as it mixes the content
   and then encodes it for transmission to a participant. According to
   RTP [RFC3550], the SSRCs of the contributors are to be signalled
   using the CSRC/CC mechanism. In practice, today, most deployed
   MCUs do not implement this feature. Instead, the identification of
   the participants whose content is included in the mixer's output is
   not indicated through any explicit RTP mechanism. That is, most
   deployed MCUs set the CSRC Count (CC) field in the RTP header to
   zero, thereby indicating no available CSRC information, even if
   they could identify the content sources as suggested in RTP.

   The main feature that sets this topology apart from what RFC 3550
   describes is the breaking of the common RTP session across the
   centralized device, such as the MCU. This results in the loss of
   explicit RTP level indication of all participants. If one were
   using the mechanisms available in RTP and RTCP to signal this
   explicitly, the topology would follow the approach of an RTP mixer.
   The lack of explicit indication has at least the following
   potential problems:

   1) Loop detection cannot be performed on the RTP level. When
      carelessly connecting two misconfigured MCUs, a loop could be
      generated.
   2) There is no information about active media senders available
      in the RTP packet. As this information is missing, receivers
      cannot use it.
      It also deprives the client of information related to currently
      active senders in a machine-usable way, thus preventing clients
      from indicating currently active speakers in user interfaces,
      etc.

   Note that deployed MCUs (and endpoints) rely on signalling layer
   mechanisms for the identification of the contributing sources, for
   example, a SIP conferencing package [RFC4575]. This alleviates to
   some extent the aforementioned issues resulting from ignoring RTP's
   CSRC mechanism.

   As a result of the shortcomings of this topology, it is RECOMMENDED
   to instead implement the Mixer concept as specified by RFC 3550.

3.7. Non-Symmetric Mixer/Translators

   Shortcut name: Topo-Asymmetric

   It is theoretically possible to construct an MCU that is a mixer in
   one direction and a translator in another. The main reason to
   consider this would be to allow topologies similar to Figure 5,
   where the mixer does not need to mix in the direction from B or D
   towards the multicast domain with A and C. Instead, the media
   streams from B and D are forwarded without changes. Avoiding this
   mixing would save media processing resources that perform the
   mixing in cases where it isn't needed. However, there would still
   be a need to mix B's stream towards D. Only in the direction B ->
   multicast domain or D -> multicast domain would it be possible to
   work as a translator. In all other directions it would function as
   a mixer.

   The mixer/translator would still need to process and change the
   RTCP before forwarding it in the direction from B or D to the
   multicast domain. One issue is that A and C do not know about the
   mixed media stream the mixer sends to either B or D. Thus, any
   reports related to these streams must be removed. Also, receiver
   reports related to A's and C's media streams would be missing.
   To avoid A and C thinking that B and D aren't receiving A and C at
   all, the mixer needs to insert its receiver reports for the streams
   from A and C into B's and D's Sender Reports. In the opposite
   direction, the receiver reports from A and C about B's and D's
   streams also need to be aggregated into the mixer's receiver
   reports sent to B and D. Because B and D have only the mixer as the
   source for the stream, all RTCP from A and C must be suppressed by
   the mixer.

   This topology is so problematic, and it is so easy to get the RTCP
   processing wrong, that it is NOT RECOMMENDED to implement this
   topology.

3.8. Combining Topologies

   Topologies can be combined and linked to each other using mixers or
   translators. However, care must be taken in handling the SSRC/CSRC
   space. A mixer will not forward RTCP from sources in other domains,
   but will instead generate its own RTCP packets for each domain it
   mixes into, including the necessary Source Description (SDES)
   information for both the CSRCs and the SSRCs. Thus, in a mixed
   domain the only SSRCs seen will be the ones present in the domain,
   while there can be CSRCs from all the domains connected together
   with a combination of mixers and translators. The combined SSRC and
   CSRC space is common over any translator or mixer. This is
   important to facilitate loop detection, something that is likely
   to be even more important in combined topologies due to the mixed
   behavior between the domains. Any hybrid, like the
   Topo-Video-switch-MCU or Topo-Asymmetric, requires considerable
   thought on how RTCP is dealt with.

4. Comparing Topologies

   The topologies discussed in Section 3 have different properties.
   This section first lists these properties and then maps the
   different topologies to them.
Please note that even if a certain 658 property is supported within a particular topology concept, the 659 necessary functionality may in many cases be optional to implement. 661 4.1. Topology Properties 663 4.1.1. All to All media transmission 665 Multicast, at least Any Source Multicast (ASM), provides the 666 functionality that everyone may send to, or receive from, everyone 667 else within the session. MCUs, Mixers and Translators may all 668 provide that functionality at least on some basic level. However 669 there are some differences in what type of reachability they 670 provide. 672 The transport translator function called "relay" in Section 3.3 is 673 the one that provides the emulation of ASM that is closest to true 674 IP-multicast-based, all-to-all transmission. Media Translators, 675 Mixers and the MCU variants do not provide a fully meshed 676 forwarding on the transport level, instead they only allow limited 677 forwarding of content from the other session participants. 679 The "all to all media transmission" requires that any media 680 transmitting entity considers the path to the least capable 681 receiver. Otherwise, the media transmissions may overload that 682 path. Therefore, a media sender needs to monitor the path from 683 itself to any of the participants, to detect the least capable 684 receiver at this time instance, and adapt its sending rate 685 accordingly. As multiple participants may send simultaneously, the 686 available resources may vary. RTCP's Receiver Reports help 687 performing this monitoring, at least on a medium time scale. 689 The transmission of RTCP automatically adapts to any changes in the 690 number of participants due to the transmission algorithm defined in 691 the RTP specification [RFC3550], and the extensions in AVPF 692 [RFC4585] (when applicable). That way, the resources utilized for 693 RTCP stay within the bounds configured for the session. 695 4.1.2. 
Transport or Media Interoperability 697 Translators, Mixers and RTCP-terminating MCU all allow changing the 698 media encoding or the transport to other properties of the other 699 domain, thereby providing extended interoperability in cases where 700 the participants lack a common set of media codecs and/or transport 701 protocols. 703 4.1.3. Per Domain Bit-rate Adaptation 704 Participants are most likely to be connected to each other with a 705 heterogenous set of paths. This makes congestion control in a point 706 to multi-point set problematic. For the ASM and "relay" scenario, 707 each individual sender has to adapt to the receiver with the least 708 capable path. This is no longer necessary when Media Translators, 709 Mixers or MCUs are involved, as each participant only needs to 710 adapt to the slowest path within its own domain. The Translator, 711 Mixer or MCU topologies all require their respective outgoing 712 streams to adjust the bit-rate, packet rate, etc, to adapt to the 713 least capable path in each of the other domains. That way one can 714 avoid lowering the quality to least capable participant in all the 715 domains, at the cost (complexity, delay, equipment) of the Mixer or 716 Translator. 718 4.1.4. Aggregation of Media 720 In the all-to-all-media property mentioned above and provided by 721 ASM, all simultaneous media transmissions share the available bit- 722 rate. For participants with limited reception capabilities this may 723 result in a situation where even a minimal acceptable media quality 724 cannot be accomplished. This is the result of multiple media 725 streams need to share the available resources. The solution to this 726 problem is to provide for a mixer or MCU to aggregate the multiple 727 streams into a single one. This aggregation can be performed 728 according to different methods. Mixing or selection are two 729 common methods. 731 4.1.5. 
View of all session participants 733 The RTP protocol includes functionality to identify the session 734 participants through the use of the SSRC and CSRC fields. In 735 addition, it is capable of carrying some further identity 736 information about these participants using the RTCP Source 737 Descriptors (SDES). To maintain this functionality, it is necessary 738 that RTCP is handled correctly in domain bridging function. This is 739 specified for translators and mixers. The MCU described in Section 740 3.5 does not fully fulfill this. The one described in Section 3.6 741 does not support this at all. 743 4.1.6. Loop Detection 745 In complex topologies with multiple domains interconnected, it is 746 possible to form media loops. RTP and RTCP support detecting such 747 loops, as long as the SSRC and CSRC identities are correctly set in 748 forwarded packets. It is likely that loop detection works for the 749 MCU described in Section 3.5, at least as long as it forwards the 750 RTCP between the participants. However, the MCU in section 3.6 will 751 definitely break the loop detection mechanism. 753 4.2. Comparision of topologies 755 The table below attempts to summarize the properties of the 756 different topologies. The legend to the topology abbrevations are: 757 Topo-Point-to-Point (PtP), Topo-Multicast (Multic), Topo-Trns- 758 Translator (TTrn), Topo-Media-Translator (including Transport 759 Translator) (MTrn), Topo-Mixer (Mixer), Topo-Asymmetric (ASY), 760 Topo-Video-switch-MCU (MCUs), and Topo-RTCP-terminating-MCU 761 (MCUt). In the below table Y indicates Y or full support, N 762 indicates No support, (Y) indicates partial support. N/A is not 763 applicable. 
765   Property               PtP  Multic  TTrn  MTrn  Mixer  ASY  MCUs  MCUt
766   -----------------------------------------------------------------------
767   All to All media       N    Y       Y     Y     (Y)    (Y)  (Y)   (Y)
768   Interoperability       N/A  N       Y     Y     Y      Y    N     Y
769   Per Domain Adaptation  N/A  N       N     Y     Y      Y    N     Y
770   Aggregation of media   N    N       N     N     Y      (Y)  Y     Y
771   Full Session View      Y    Y       Y     Y     Y      Y    (Y)   N
772   Loop Detection         Y    Y       Y     Y     Y      Y    (Y)   N

774   Please note that the Media Translator also includes the transport
775   translator functionality.

777   5. Security Considerations

779   The use of mixers and translators has an impact on security and the
780   security functions used. The primary issue is that both mixers and
781   translators modify packets, thus preventing the use of integrity
782   and source authentication, unless they are trusted devices that
783   take part in the security context, e.g. the device can send SRTP
784   and SRTCP [RFC3711] packets to session endpoints. If encryption is
785   employed, the media translator and mixer need to be able to
786   decrypt the media to perform their function. A transport translator
787   may be used without access to the encrypted payload in cases where
788   it translates parts that are not included in the encryption and
789   integrity protection, for example, IP address and UDP port numbers
790   in a media stream using SRTP [RFC3711]. However, in general the
791   translator or mixer needs to be part of the signalling context and
792   get the necessary security associations (e.g. SRTP crypto contexts)
793   established with its RTP session participants.

795   Including the mixer and translator in the security context allows
796   the entity, if subverted or misbehaving, to perform a number of
797   very serious attacks as it has full access. It can perform all the
798   attacks possible (see RFC 3550 and any applicable profiles) as if
799   the media session were not protected at all, while giving the
800   impression to the session participants that they are protected.
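
The transport translator case described above, in which only the IP
address and UDP port change, can be illustrated with a minimal sketch.
It is not taken from this document or from any implementation: the
names relay_datagram and ssrc_of, the addresses, and the sample packet
are all hypothetical. The sketch assumes a relay that forwards each
datagram to every participant except its sender and never parses or
rewrites the packet, which is why no SRTP key is needed at the relay.

```python
from typing import List, Tuple

Addr = Tuple[str, int]  # (IP address, UDP port); hypothetical representation


def relay_datagram(packet: bytes, sender: Addr,
                   participants: List[Addr]) -> List[Tuple[Addr, bytes]]:
    """Return (destination, packet) pairs for everyone except the sender.

    The packet is forwarded byte-for-byte; only the destination
    address and port differ between the copies.
    """
    return [(dest, packet) for dest in participants if dest != sender]


def ssrc_of(packet: bytes) -> int:
    """Read the SSRC from the fixed RTP header (octets 8..11, RFC 3550)."""
    return int.from_bytes(packet[8:12], "big")


# Hypothetical RTP packet: V=2, PT=96, seq=1, timestamp=160, SSRC=0x1234ABCD,
# followed by a payload the relay treats as opaque (e.g. SRTP ciphertext).
packet = (bytes([0x80, 96]) + (1).to_bytes(2, "big") + (160).to_bytes(4, "big")
          + (0x1234ABCD).to_bytes(4, "big") + b"opaque-ciphertext")

a, b, c = ("192.0.2.1", 5004), ("192.0.2.2", 5004), ("192.0.2.3", 5004)
out = relay_datagram(packet, a, [a, b, c])
```

Because every forwarded copy is identical to the received packet, the
receivers see the original SSRC and can verify the original sender's
integrity protection. A media translator or mixer cannot be built this
way, since it has to change the payload.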
802   Transport translators have no interactions with cryptography that
803   works above the transport layer, such as SRTP, since that sort of
804   translator leaves the RTP header and payload unaltered. Media
805   translators, on the other hand, have strong interactions with
806   cryptography, since they alter the RTP payload. A media translator
807   in a session that uses cryptographic protection needs to perform
808   cryptographic processing on both inbound and outbound packets.

810   A media translator may need to use different cryptographic keys for
811   the inbound and outbound processing. For SRTP, different keys are
812   required, because an RFC 3550 media translator leaves the SSRC
813   unchanged during its packet processing, and SRTP key sharing is
814   only allowed when distinct SSRCs can be used to protect distinct
815   packet streams.

817   When the media translator uses different keys to process inbound
818   and outbound packets, each session participant needs to be provided
819   with the appropriate key, depending on whether they are listening
820   to the translator or the original source. (Note that there is an
821   architectural difference between RTP media translation, in which
822   participants can rely on the RTP Payload Type field of a packet to
823   determine appropriate processing, and cryptographically protected
824   media translation, in which participants must use information that
825   is not carried in the packet.)

827   When using security mechanisms with translators and mixers, it is
828   possible that the translator or mixer creates different security
829   associations for the different domains they are working in. Doing
830   so has some implications.

832   First, it might weaken security if the mixer/translator accepts in
833   one domain a weaker algorithm or key than in another. Therefore,
834   care should be taken that appropriately strong security parameters
835   are negotiated in all domains. In many cases, "appropriate"
836   translates to "similar" strength. If a key management system does
837   allow the negotiation of security parameters resulting in a
838   different strength of the security, then this system SHOULD notify
839   the participants in the other domains about this.

841   Second, the number of crypto contexts (keys, security-related
842   state) needed (for example in SRTP [RFC3711]) may vary between
843   mixers and translators. A mixer normally needs to represent only a
844   single SSRC per domain, and therefore needs to create only one
845   security association (SRTP crypto context) per domain. In
846   contrast, a translator needs one security association per
847   participant it translates towards, in the opposite domain.
848   Considering Figure 3, the translator needs two security
849   associations towards the multicast domain, one for B and one for D.
850   It may be forced to maintain a set of totally independent security
851   associations between itself and B and D respectively, so as to avoid
852   a two-time pad. These contexts must also be capable of handling all
853   the sources present in the other domains. Hence, using completely
854   independent security associations (for certain keying mechanisms)
855   may force a translator to handle N*DM keys and related state, where
856   N is the total number of SSRCs used over all domains and DM is the
857   total number of domains.

859   There exist a number of different mechanisms to provide keys to the
860   different participants. One example is the choice between group
861   keys and unique keys per SSRC. The appropriate keying model is
862   impacted by the topologies one intends to use. The final security
863   properties are dependent on both the topologies in use and the
864   keying mechanisms' properties, and need to be considered by the
865   application. Exactly what mechanisms are used is outside of the
866   scope of this document.

868   6. Acknowledgements

870   The authors would like to thank Bo Burman, Umesh Chandra, Roni
871   Even, Keith Lantz, Ladan Gharai, Geoff Hunt and Mark Baugher for
872   their help in reviewing this document.

874   7. IANA Considerations

876   This document specifies no actions for IANA.

878   8. References

880   8.1. Normative References

882   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
883             Requirement Levels", BCP 14, RFC 2119, March 1997.
884   [RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
885             Jacobson, "RTP: A Transport Protocol for Real-Time
886             Applications", STD 64, RFC 3550, July 2003.
887   [RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
888             Norrman, "The Secure Real-time Transport Protocol (SRTP)",
889             RFC 3711, March 2004.
890   [RFC4575] Rosenberg, J., Schulzrinne, H., and O. Levin, "A Session
891             Initiation Protocol (SIP) Event Package for Conference
892             State", RFC 4575, August 2006.
893   [RFC4585] Ott, J., Wenger, S., Sato, N., Burmeister, C., and J. Rey,
894             "Extended RTP Profile for Real-time Transport Control
895             Protocol (RTCP)-Based Feedback (RTP/AVPF)", RFC 4585, July
896             2006.

898   8.2. Informative References

900   [CCM]     Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
901             "Codec Control Messages in the Audio-Visual Profile with
902             Feedback (AVPF)", Internet Draft, Work in Progress,
903             draft-ietf-avt-avpf-ccm-08.txt, July 2007.
904   [H323]    ITU-T Recommendation H.323, "Packet-based multimedia
905             communications systems", June 2006.
906   [RTCP-SSM] Ott, J., Chesterfield, J., and E. Schooler, "RTCP
907             Extensions for Single-Source Multicast Sessions with Unicast
908             Feedback", draft-ietf-avt-rtcpssm-13, Work in Progress,
909             March 2007.

911   9. Authors' Addresses

913   Magnus Westerlund
914   Ericsson Research
915   Ericsson AB
916   SE-164 80 Stockholm, SWEDEN

918   Phone: +46 8 7190000
919   EMail: magnus.westerlund@ericsson.com

921   Stephan Wenger
922   Nokia Corporation
923   P.O. Box 100
924   FIN-33721 Tampere
925   FINLAND

927   Phone: +358-50-486-0637
928   EMail: stewe@stewe.org

930   Full Copyright Statement

932   Copyright (C) The IETF Trust (2007).

934   This document is subject to the rights, licenses and restrictions
935   contained in BCP 78, and except as set forth therein, the authors
936   retain all their rights.

938   This document and the information contained herein are provided on an
939   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
940   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST
941   AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES,
942   EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT
943   THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY
944   IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR
945   PURPOSE.

947   Intellectual Property

949   The IETF takes no position regarding the validity or scope of any
950   Intellectual Property Rights or other rights that might be claimed to
951   pertain to the implementation or use of the technology described in
952   this document or the extent to which any license under such rights
953   might or might not be available; nor does it represent that it has
954   made any independent effort to identify any such rights. Information
955   on the procedures with respect to rights in RFC documents can be
956   found in BCP 78 and BCP 79.

958   Copies of IPR disclosures made to the IETF Secretariat and any
959   assurances of licenses to be made available, or the result of an
960   attempt made to obtain a general license or permission for the use of
961   such proprietary rights by implementers or users of this
962   specification can be obtained from the IETF on-line IPR repository at
963   http://www.ietf.org/ipr.
965   The IETF invites any interested party to bring to its attention any
966   copyrights, patents or patent applications, or other proprietary
967   rights that may cover technology that may be required to implement
968   this standard. Please address the information to the IETF at
969   ietf-ipr@ietf.org.

971   Acknowledgement

973   Funding for the RFC Editor function is provided by the IETF
974   Administrative Support Activity (IASA).

976   RFC Editor Considerations

978   None