CLUE WG                                                          R. Even
Internet-Draft                                       Huawei Technologies
Intended status: Standards Track                               J. Lennox
Expires: January 17, 2013                                          Vidyo
                                                           July 16, 2012


               Mapping RTP streams to CLUE media captures
                   draft-even-clue-rtp-mapping-03.txt

Abstract

   This document describes mechanisms and recommended practice for
   mapping RTP media streams defined in SDP to CLUE media captures.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 17, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  RTP topologies for CLUE
   4.  Mapping CLUE Media Captures to RTP streams
     4.1.  Static Mapping
     4.2.  Dynamic mapping
     4.3.  Recommendations
   5.  Application to CLUE Media Requirements
   6.  Examples
   7.  Acknowledgements
   8.  IANA Considerations
   9.  Security Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Authors' Addresses

1.  Introduction

   Telepresence systems can send and receive multiple media streams.
   The CLUE framework [I-D.ietf-clue-framework] defines a Media Capture
   (MC) as a source of Media, such as from one or more Capture Devices.
   A Media Capture may be the source of one or more Media streams.  A
   Media Capture may also be constructed from other Media streams.  A
   middle box can express Media Captures that it constructs from Media
   streams it receives.

   SIP offer/answer [RFC3264] uses SDP [RFC4566] to describe the RTP
   [RFC3550] media streams.  Each RTP stream has a payload type number
   and an SSRC.  The content of the RTP stream is created by the encoder
   in the endpoint.  This may be original content from a camera or
   content created by an intermediary device such as an MCU.

   This document makes recommendations, for this telepresence
   architecture, about how RTP and RTCP streams should be encoded and
   transmitted, and how their relation to CLUE Media Captures should be
   communicated.  The proposed solution supports multiple RTP
   topologies.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119] and
   indicate requirement levels for compliant RTP implementations.

3.  RTP topologies for CLUE

   The typical RTP topologies used by telepresence systems specify
   different behaviors for RTP and RTCP distribution.  The relevant
   topologies include point-to-point, as well as media mixers, media-
   switching mixers, and source-projection mixers.

   In the point-to-point topology, one peer communicates directly with a
   single peer over unicast.  There can be one or more RTP sessions, and
   each RTP session can carry multiple RTP streams identified by their
   SSRC.  The peers recognize all SSRCs based on the information in the
   RTCP SDES report, which includes the CNAME and SSRC of the sent RTP
   streams.  There are different point-to-point use cases, as specified
   in the CLUE use cases document
   [I-D.ietf-clue-telepresence-use-cases].  There may be a difference
   between the symmetric and asymmetric use cases.  In the symmetric use
   case, the typical mapping will be from a Media Capture device to a
   render device (e.g., camera to monitor).  In the asymmetric case, the
   render device may receive different capture information (an RTP
   stream from a different camera) if the receiver has fewer rendering
   devices (monitors).  In some cases, a CLUE session which, at a high
   level, is point-to-point may nonetheless have RTP which is best
   described by one of the mixer topologies below.
   For example, a CLUE endpoint can produce composited or switched
   captures for use by a receiving system with fewer displays than the
   sender has cameras.

   In the Media Mixer topology, the peers communicate only with the
   mixer.  The mixer provides mixed or composited media streams, using
   its own SSRC for the sent streams.  There are two cases here.  In the
   first case, the mixer may have separate RTP sessions with each peer
   (similar to the point-to-point topology), terminating the RTCP
   sessions on the mixer; this is known as Topo-RTCP-terminating-MCU in
   [RFC5117].  In the second case, the mixer can use a conference-wide
   RTP session similar to RFC 5117's Topo-Mixer or Topo-Video-switch-
   MCU.  The major difference is that in the second case the mixer uses
   conference-wide RTP sessions and distributes the RTCP reports to all
   the RTP session participants, enabling them to learn all the CNAMEs
   and SSRCs of the participants and to know the contributing source or
   sources (CSRCs) of the original streams from the RTP header.  In the
   first case, the mixer terminates the RTCP, and the participants
   cannot learn all the available sources from the RTCP information.
   The conference roster information, including conference participants,
   endpoints, media, and media-id (SSRC), can be made available using
   the conference event package [RFC4575].

   In the Media-Switching Mixer topology, the peer-to-mixer
   communication is unicast with mixer RTCP feedback.  It is
   conceptually similar to a compositing mixer as described in the
   previous paragraph, except that rather than compositing or mixing
   multiple sources, the mixer provides one or more conceptual sources,
   each selecting one source at a time from the original sources.  The
   mixer creates a conference-wide RTP session by sharing remote SSRC
   values as CSRCs with all conference participants.
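
   As an illustration of where a receiver finds the SSRC and any CSRCs
   discussed above, the following is a minimal sketch that reads them
   from the RTP fixed header, assuming the header layout of RFC 3550,
   Section 5.1.  The names RtpHeader, parse_rtp_header, and
   original_sources are hypothetical and are not defined by this
   document or by CLUE; header extensions and payload are ignored.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    """Fields of the RTP fixed header that matter for source mapping."""
    version: int
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed header and the CSRC list that follows it."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    version = first >> 6
    if version != 2:
        raise ValueError(f"unsupported RTP version {version}")
    csrc_count = first & 0x0F          # CC field: number of 32-bit CSRCs
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("RTP packet truncated inside the CSRC list")
    csrcs = struct.unpack(f"!{csrc_count}I", packet[12:end])
    return RtpHeader(version, second & 0x7F, seq, timestamp, ssrc, csrcs)


def original_sources(header: RtpHeader) -> Tuple[int, ...]:
    """Hypothetical helper: CSRCs if a mixer supplied them, else the SSRC."""
    return header.csrcs if header.csrcs else (header.ssrc,)


# Example: a mixer with SSRC 0x11111111 forwarding source 0xAAAAAAAA.
# First byte 0x81 = version 2, no padding, no extension, CC = 1.
mixed = struct.pack("!BBHIII", 0x81, 96, 1, 3000, 0x11111111, 0xAAAAAAAA)
header = parse_rtp_header(mixed)
```

   In this sketch a mixer that terminates RTCP and omits CSRCs would
   yield only its own SSRC from original_sources, which matches the
   case above where participants have to learn the sources by other
   means, such as the conference event package.
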

   In the Source-Projection Mixer topology, the peer-to-mixer
   communication is unicast with mixer RTCP feedback.  Every potential
   sender in the conference has a source which is "projected" by the
   mixer into every other session in the conference; thus, every
   original source is maintained with an independent RTP identity to
   every receiver, maintaining separate decoding state and its original
   RTCP SDES information.  However, RTCP is terminated at the mixer,
   which might also perform reliability, repair, rate adaptation, or
   transcoding on the stream.  Senders' SSRCs may be renumbered by the
   mixer.  The sender may turn the projected sources on and off at any
   time, depending on which sources it thinks are most relevant for the
   receiver; this is the primary reason why this topology must act as an
   RTP mixer rather than as a translator, as otherwise these disabled
   sources would appear to have enormous packet loss.  Source switching
   is accomplished through this process of enabling and disabling
   projected sources, with the higher-level semantic assignment of the
   reason for each RTP stream made externally.

   The above topologies demonstrate two major RTP/RTCP behaviors:

   1.  The mixer may either use the source SSRC when forwarding RTP
       packets, or use an SSRC that it creates itself.  In both cases
       the mixer distributes all RTCP information to all participants,
       creating conference-wide RTP session(s).  This allows the
       participants to learn the available RTP sources in each RTP
       session.  The original source information will be in the SSRC or
       in the CSRC, depending on the topology.  The point-to-point case
       behaves like this.

   2.  The mixer terminates the RTCP from the source, creating separate
       RTP sessions with the peers.  In this case the participants will
       not receive the source SSRC in the CSRC.
       Since this is usually a mixer topology, the source information is
       available from the SIP conference event package [RFC4575].
       Subscribing to the conference event package allows each
       participant to know the SSRCs of all sources in the conference.

4.  Mapping CLUE Media Captures to RTP streams

   The different topologies described in Section 3 support different
   SSRC distribution models and RTP stream multiplexing points.

   Most video conferencing systems today can separate multiple RTP
   sources by placing them into separate RTP sessions using the SDP
   description.  For example, main and slides video sources are
   separated into separate RTP sessions based on the content attribute
   [RFC4796].  This solution works straightforwardly if the multiplexing
   point is at the UDP transport level, where each RTP stream uses a
   separate RTP session.  This will also be true for mapping the RTP
   streams to Media Captures if each media capture uses a separate RTP
   session and the consumer can identify it based on the receiving RTP
   port.  In this case, SDP only needs to label the RTP session with an
   identifier that identifies the media capture in the CLUE description.
   The mapping does not change even if the RTP session is switched using
   the same or a different SSRC, because the multiplexing is not at the
   SSRC level.

   Even though session multiplexing is supported by CLUE, for scaling
   reasons CLUE recommends using SSRC multiplexing in a single session
   or in multiple sessions.  So we need to look at how to map RTP
   streams to Media Capture IDs when SSRC multiplexing is used.

   When looking at SSRC multiplexing, we can see that the SSRC behavior
   may differ between topologies:

   1.  The SSRCs are static (assigned by the MCU/Mixer), and there is an
       SSRC for each media capture encoding defined in the CLUE
       protocol.
       Source information may be conveyed using CSRC or, in the case of
       Topo-RTCP-terminating-MCU, is not conveyed.

   2.  The SSRCs are dynamic, representing the original source, and are
       relayed by the Mixer/MCU to the participants.

   In the above two cases the MCU/Mixer creates its own advertisement,
   with a virtual room capture scene.

   Another case we can envision is that the MCU/Mixer relays all the
   capture scenes from all advertisements to all consumers.  This means
   that the advertisement will include multiple capture scenes, each
   representing a separate TP room with its own coordinate system.  A
   general tool for distributing roster information is an event package,
   for example an extension of the conference event package.

4.1.  Static Mapping

   Static mapping is widely used in current MCU implementations.  It is
   also common for a point-to-point symmetric use case when both
   endpoints have the same capabilities.  For capture encodings with
   static SSRCs, it is most straightforward to indicate this mapping
   outside the media stream, in the CLUE or SDP signaling.  An SDP
   source attribute [RFC5576] could be defined to associate CLUE capture
   IDs with SSRCs in SDP.  Each SSRC will have a captureID value that
   will also be specified as an attribute of the CLUE media capture.
   The provider advertisement could, if it wished, use the same SSRC for
   media capture encodings that are mutually exclusive.  (This would be
   natural, for example, if two advertised captures are implemented as
   different configurations of the same physical camera, zoomed in or
   out.)

   Note: there may be more than one RTP session for a media capture, as
   in simulcast.  We still need to figure out how to describe it in SDP
   and CLUE.

   Another method for static mapping may be to use the provider
   advertisement to indicate the intended SSRC directly.
   The advantage of using the SDP SSRC attribute is that RFC 5576
   [RFC5576] addresses the issue of SSRC collisions and provides
   guidelines on how to handle them.

4.2.  Dynamic mapping

   Dynamic mapping using an RTP header extension is described in
   draft-lennox-clue-rtp-usage [I-D.lennox-clue-rtp-usage], Section
   10.2.  That draft does not specify what the capture ID value is.  As
   specified for the static case, there should be a capture ID attribute
   in the CLUE media capture information to enable this mapping.

4.3.  Recommendations

   The recommendation is that endpoints MUST support both the static
   declaration of capture encoding SSRCs and the RTP header extension
   method of sharing capture IDs, with the extension in every media
   packet.  For low bandwidth situations, this may be considered
   excessive overhead, in which case endpoints MAY support the combined
   approach from [I-D.lennox-clue-rtp-usage].  The SDP offer MAY specify
   the SSRC mapping to media capture.  In static mapping topologies
   there is no need to use the header extension in the media, since the
   SSRC for the RTP stream remains the same during the call unless a
   collision is detected and handled according to RFC 5576 [RFC5576].
   If the topology in use relies on dynamic mapping, then the RTP header
   extension will be used to indicate the RTP stream switch for the
   media capture.  In this case the SDP description may be used to
   negotiate the initial SSRC, but this is left to the implementation.
   Note that if the SSRC is defined explicitly in the SDP, SSRC
   collisions should be handled as in RFC 5576.

5.  Application to CLUE Media Requirements

   [I-D.lennox-clue-rtp-usage] offers a number of requirements that are
   believed to be necessary for a CLUE RTP mapping.  The solutions
   described in this document are believed to meet those requirements,
   though some of them are only possible for some of the topologies.
   (Since the requirements are generally of the form "it must be
   possible for a sender to do something", this is adequate; a sender
   which wishes to perform that action needs to choose a topology which
   allows the behavior it wants.)

   In this section we address only those requirements where the
   topologies or the association mechanisms treat the requirements
   differently.

   Media-4: It must be possible for an original source to move among
   switched captures (i.e., at one time be sent for one switched
   capture, and at a later time be sent for another one).

   This applies naturally for static sources with a Switched Mixer.  For
   dynamic sources with a Source-Projecting Mixer, this just requires
   the capture tag in the header extension element to be updated
   appropriately.

   Media-6: Whenever a given source is transmitted for a switched
   capture, it must be immediately possible for a receiver to determine
   the switched capture it corresponds to, and thus that any previous
   source is no longer being mapped to that switched capture.

   For a Switched Mixer, this applies naturally.  For a Source-
   Projecting Mixer, this is done based on the header extension.

   Media-7: It must be possible for a receiver to identify the original
   source that is currently being mapped to a switched capture, and
   correlate it with out-of-band (non-CLUE) information such as rosters.

   For a Switched Mixer, this is done based on the CSRC, if the mixer is
   providing CSRCs; for a Source-Projecting Mixer, this is done based on
   the SSRC.

   Media-8: It must be possible for a source to move among switched
   captures without requiring a refresh of decoder state (e.g., for
   video, a fresh I-frame), when this is unnecessary.  However, it must
   also be possible for a receiver to indicate when a refresh of decoder
   state is in fact necessary.

   This can be done by a Source-Projecting Mixer, but not by a Switching
   Mixer.  The last requirement can be accomplished through an FIR
   message [RFC5104], though potentially a faster mechanism (not
   requiring a round-trip time from the receiver) would be preferable.

   Media-9: If a given source is being sent on the same transport flow
   to satisfy more than one capture (e.g., if it corresponds to more
   than one switched capture at once, or to a static capture as well as
   a switched capture), it should be possible for a sender to send only
   one copy of the source.

   For a Source-Projecting Mixer, this can be accomplished by sending
   multiple dynamic capture IDs for the same source; this can also be
   done for an environment with a hybrid of mixer topologies and static
   and dynamic captures, described below in Section 6.  It is not
   possible for static captures from a Switched Mixer.

   Media-12: If multiple sources from a single synchronization context
   are being sent simultaneously, it must be possible for a receiver to
   associate and synchronize them properly, even for sources that are
   mapped to switched captures.

   For a Mixed or Switched Mixer topology, receivers will see only a
   single synchronization context (CNAME), corresponding to the mixer.
   For a Source-Projecting Mixer, separate projecting sources keep
   separate synchronization contexts based on their original CNAMEs,
   thus allowing independent synchronization of sources from independent
   rooms without needing global synchronization.  In hybrid cases,
   however (e.g., if audio is mixed), all sources which need to be
   synchronized with the mixed audio must get the same CNAME (and thus a
   mixer-provided timebase) as the mixed audio.

6.  Examples

   It is possible for a CLUE device to send multiple instances of the
   topologies in Section 3 simultaneously.
   For example, an MCU which uses a traditional audio bridge with
   switched video would be a Mixer topology for audio, but a Switched
   Mixer or a Source-Projecting Mixer for video.  In the latter case,
   the audio could be sent as a static source, whereas the video could
   be dynamic.

   More notably, it is possible for an endpoint to send the same sources
   both for static and dynamic captures.  Consider the example in
   Section 11.1 of [I-D.ietf-clue-framework], where an endpoint can
   provide both three cameras (VC0, VC1, and VC2) for left, center, and
   right views, as well as a switched view (VC3) of the loudest panel.

   It is possible for a consumer to request both the (VC0 - VC2) set and
   VC3.  It is worth noting that the content of VC3 is, at all times,
   exactly the content of one of VC0, VC1, or VC2.  Thus, if the sender
   uses the Source-Projecting Mixer topology for VC3, no additional
   media traffic needs to be sent to a consumer that receives these
   three sources beyond what is already sent for (VC0 - VC2).

   In this case, the advertiser could describe VC0, VC1, and VC2 in its
   initial advertisement or SDP with static SSRCs, whereas VC3 would
   need to be dynamic.  The role of VC3 would move among VC0, VC1, or
   VC2, indicated by the RTP header extension on those streams' RTP
   packets.

7.  Acknowledgements

   place holder

8.  IANA Considerations

   TBD

9.  Security Considerations

   TBD.

10.  References

10.1.  Normative References

   [I-D.ietf-clue-framework]
              Romanow, A., Duckworth, M., Pepperell, A., and B. Baldino,
              "Framework for Telepresence Multi-Streams",
              draft-ietf-clue-framework-05 (work in progress),
              February 2012.

   [I-D.lennox-clue-rtp-usage]
              Lennox, J., Witty, P., and A. Romanow, "Real-Time
              Transport Protocol (RTP) Usage for Telepresence Sessions",
              draft-lennox-clue-rtp-usage-04 (work in progress),
              June 2012.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

10.2.  Informative References

   [I-D.ietf-clue-telepresence-use-cases]
              Romanow, A., Botzko, S., Duckworth, M., Even, R., and I.
              Communications, "Use Cases for Telepresence Multi-
              streams", draft-ietf-clue-telepresence-use-cases-02 (work
              in progress), January 2012.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC4575]  Rosenberg, J., Schulzrinne, H., and O. Levin, "A Session
              Initiation Protocol (SIP) Event Package for Conference
              State", RFC 4575, August 2006.

   [RFC4796]  Hautakorpi, J. and G. Camarillo, "The Session Description
              Protocol (SDP) Content Attribute", RFC 4796,
              February 2007.

   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
              "Codec Control Messages in the RTP Audio-Visual Profile
              with Feedback (AVPF)", RFC 5104, February 2008.

   [RFC5117]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
              January 2008.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, June 2009.

   [RFC5888]  Camarillo, G. and H. Schulzrinne, "The Session Description
              Protocol (SDP) Grouping Framework", RFC 5888, June 2010.

   [RFC6184]  Wang, Y., Even, R., Kristensen, T., and R. Jesup, "RTP
              Payload Format for H.264 Video", RFC 6184, May 2011.

   [RFC6236]  Johansson, I. and K. Jung, "Negotiation of Generic Image
              Attributes in the Session Description Protocol (SDP)",
              RFC 6236, May 2011.

Authors' Addresses

   Roni Even
   Huawei Technologies
   Tel Aviv,
   Israel

   Email: even.roni@huawei.com


   Jonathan Lennox
   Vidyo, Inc.
   433 Hackensack Avenue
   Seventh Floor
   Hackensack, NJ 07601
   US

   Email: jonathan@vidyo.com