CLUE WG                                                         R. Even
Internet-Draft                                      Huawei Technologies
Intended status: Standards Track                              J. Lennox
Expires: August 5, 2013                                           Vidyo
                                                        February 1, 2013


             Mapping RTP streams to CLUE media captures
                  draft-even-clue-rtp-mapping-05.txt

Abstract

This document describes mechanisms and recommended practice for mapping RTP media streams defined in SDP to CLUE media captures.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF).  Note that other groups may also distribute working documents as Internet-Drafts.  The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on August 5, 2013.

Copyright Notice

Copyright (c) 2013 IETF Trust and the persons identified as the document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document.  Please review these documents carefully, as they describe your rights and restrictions with respect to this document.  Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Terminology
   3. RTP topologies for CLUE
   4. Mapping CLUE Media Captures to RTP streams
      4.1. Review of current directions in MMUSIC, AVText and AVTcore
      4.2. Requirements of a solution
      4.3. Static Mapping
      4.4. Dynamic mapping
         4.4.1. RTP header extension
         4.4.2. Restricted approach
      4.5. Recommendations
   5. Application to CLUE Media Requirements
   6. Examples
      6.1. Static mapping
      6.2. Dynamic Mapping
   7. Acknowledgements
   8. IANA Considerations
   9. Security Considerations
   10. References
      10.1. Normative References
      10.2. Informative References
   Authors' Addresses

1. Introduction

Telepresence systems can send and receive multiple media streams.  The CLUE framework [I-D.ietf-clue-framework] defines media captures as a source of media, such as from one or more capture devices.  A Media Capture (MC) may be the source of one or more media streams.  A Media Capture may also be constructed from other media streams.  A middle box can express Media Captures that it constructs from media streams it receives.

SIP offer/answer [RFC3264] uses SDP [RFC4566] to describe the RTP [RFC3550] media streams.  Each RTP stream has a unique SSRC within its RTP session.  The content of an RTP stream is created by an encoder in the endpoint.  This may be original content from a camera, or content created by an intermediary device such as an MCU.
This document makes recommendations, for this telepresence architecture, about how RTP and RTCP streams should be encoded and transmitted, and how their relation to CLUE Media Captures should be communicated.  The proposed solution supports multiple RTP topologies.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119] and indicate requirement levels for compliant RTP implementations.

3. RTP topologies for CLUE

The typical RTP topologies used by telepresence systems specify different behaviors for RTP and RTCP distribution.  A number of RTP topologies are described in [I-D.westerlund-avtcore-rtp-topologies-update].  For telepresence, the relevant topologies include point-to-point, as well as media mixers, media-switching mixers, and source-projection mixers.

In the point-to-point topology, one peer communicates directly with a single peer over unicast.  There can be one or more RTP sessions, and each RTP session can carry multiple RTP streams identified by their SSRC.  All SSRCs will be recognized by the peers based on the information in the RTCP SDES reports, which include the CNAME and SSRC of the sent RTP streams.  There are different point-to-point use cases, as specified in the CLUE use cases document [I-D.ietf-clue-telepresence-use-cases].  The symmetric and asymmetric use cases may differ: while in the symmetric use case the typical mapping is from a media capture device to a render device (e.g., camera to monitor), in the asymmetric case the render device may receive different capture information (an RTP stream from a different camera) if it has fewer rendering devices (monitors).  In some cases, a CLUE session which, at a high level, is point-to-point may nonetheless have RTP which is best described by one of the mixer topologies below.  For example, a CLUE endpoint can produce composited or switched captures for use by a receiving system with fewer displays than the sender has cameras.

In the Media Mixer topology, the peers communicate only with the mixer.  The mixer provides mixed or composited media streams, using its own SSRC for the sent streams.  There are two cases here.  In the first case, the mixer may have separate RTP sessions with each peer (similar to the point-to-point topology), terminating the RTCP sessions at the mixer; this is known as Topo-RTCP-terminating-MCU in [RFC5117].  In the second case, the mixer can use a conference-wide RTP session similar to RFC 5117's Topo-Mixer or Topo-Video-switch-MCU.  The major difference is that in the second case the mixer uses conference-wide RTP sessions and distributes the RTCP reports to all the RTP session participants, enabling them to learn all the CNAMEs and SSRCs of the participants and to know the contributing source or sources (CSRCs) of the original streams from the RTP header.  In the first case, the mixer terminates the RTCP and the participants cannot learn all the available sources from the RTCP information.  Conference roster information, including conference participants, endpoints, media and media-id (SSRC), can be made available using the conference event package [RFC4575].
In the Media-Switching Mixer topology, the peer-to-mixer communication is unicast, with mixer RTCP feedback.  It is conceptually similar to the compositing mixer described in the previous paragraph, except that rather than compositing or mixing multiple sources, the mixer provides one or more conceptual sources, selecting one source at a time from the original sources.  The mixer creates a conference-wide RTP session by sharing remote SSRC values as CSRCs with all conference participants.

In the Source-Projection Mixer topology, the peer-to-mixer communication is unicast, with mixer RTCP feedback.  Every potential sender in the conference has a source which is "projected" by the mixer into every other session in the conference; thus, every original source is maintained with an independent RTP identity to every receiver, maintaining separate decoding state and its original RTCP SDES information.  However, RTCP is terminated at the mixer, which might also perform reliability, repair, rate adaptation, or transcoding on the stream.  Senders' SSRCs may be renumbered by the mixer.  The sender may turn the projected sources on and off at any time, depending on which sources it thinks are most relevant for the receiver; this is the primary reason why this topology must act as an RTP mixer rather than as a translator, as otherwise these disabled sources would appear to have enormous packet loss.  Source switching is accomplished through this process of enabling and disabling projected sources, with the higher-level semantics (the reason each RTP stream is being sent) assigned externally.

The above topologies demonstrate two major RTP/RTCP behaviors:

1. The mixer may either use the source SSRC when forwarding RTP packets, or use its own created SSRC.  In either case the mixer distributes all RTCP information to all participants, creating one or more conference-wide RTP sessions.  This allows the participants to learn the available RTP sources in each RTP session.  The original source information will be in the SSRC or in the CSRC, depending on the topology.  The point-to-point case behaves like this.

2. The mixer terminates the RTCP from the source, creating separate RTP sessions with the peers.  In this case the participants will not receive the source SSRC in the CSRC.  Since this is usually a mixer topology, the source information is available from the SIP conference event package [RFC4575].  Subscribing to the conference event package allows each participant to know the SSRCs of all sources in the conference.

4. Mapping CLUE Media Captures to RTP streams

The different topologies described in Section 3 support different SSRC distribution models and RTP stream multiplexing points.

Most video conferencing systems today can separate multiple RTP sources by placing them into separate RTP sessions using the SDP description.  For example, main and slides video sources are separated into different RTP sessions based on the content attribute [RFC4796].  This solution works straightforwardly if the multiplexing point is at the UDP transport level, where each RTP stream uses a separate RTP session.  This will also be true for mapping the RTP streams to Media Captures if each media capture uses a separate RTP session and the consumer can identify it based on the receiving RTP port.  In this case, SDP only needs to label the RTP session with an identifier that identifies the media capture in the CLUE description.  The mapping does not change even if the RTP stream within the session is switched, using the same or a different SSRC, since the multiplexing is not at the SSRC level.
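For example, session-level multiplexing of a main video capture and a presentation capture could look as follows (an informative sketch only; the ports and payload types are illustrative, and the identifier tying each m-line to a CLUE media capture is not defined here):

   m=video 49200 RTP/AVP 99
   a=content:main
   a=rtpmap:99 H264/90000

   m=video 49300 RTP/AVP 99
   a=content:slides
   a=rtpmap:99 H264/90000

Each m-line is a separate RTP session, so the consumer can map incoming RTP streams to captures purely by the receiving transport address.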
Even though session multiplexing is supported by CLUE, for scaling reasons CLUE recommends using SSRC multiplexing, in a single session or in multiple sessions.  We therefore need to look at how to map RTP streams to Media Captures when SSRC multiplexing is used.

When looking at SSRC multiplexing, we can see that the SSRC behavior differs among the topologies:

1. The SSRCs are static (assigned by the MCU/Mixer), and there is an SSRC for each media capture encoding defined in the CLUE protocol.  Source information may be conveyed using the CSRC or, in the case of Topo-RTCP-terminating-MCU, is not conveyed.

2. The SSRCs are dynamic, representing the original sources, and are relayed by the Mixer/MCU to the participants.

In the above two cases the MCU/Mixer creates its own advertisement, with a virtual room capture scene.

Another case we can envision is that the MCU/Mixer relays all the capture scenes from all advertisements to all consumers.  This means that the advertisement will include multiple capture scenes, each representing a separate telepresence room with its own coordinate system.  A general tool for distributing roster information is an event package, for example an extension of the conference event package.

4.1. Review of current directions in MMUSIC, AVText and AVTcore

Editor's note: This section provides an overview of the RFCs and drafts that can be used as a basis for a mapping solution.  This section is for information only, and if the WG thinks that this is the right direction, the authors will bring the required work to the relevant WGs.

The solution also needs to support the simulcast case, where more than one RTP session may be advertised for a Media Capture.  Support of such simulcast is out of scope for CLUE.

When looking at the available tools, based on current work in MMUSIC, AVTcore and AVText, for supporting SSRC multiplexing, the following documents are considered relevant.

The SDP source attributes [RFC5576] provide mechanisms to describe specific attributes of RTP sources based on their SSRC.

Negotiation of generic image attributes in SDP [RFC6236] provides the means to negotiate the image size.  The image attribute can be used to offer different image parameters such as size, but in order to offer multiple RTP streams with different resolutions it uses a separate RTP session for each image option.

[I-D.westerlund-avtcore-max-ssrc] proposes a signaling solution for how to use multiple SSRCs within one RTP session.

[I-D.westerlund-avtext-rtcp-sdes-srcname] provides an identifier that may be sent in SDP, as RTCP SDES information, or as an RTP header extension, and that uniquely identifies a single media source.  It defines a hierarchical structure for the SRCNAME parameter that can be used, for example, to describe multiple resolutions of the same source (see Section 5.1 of [I-D.westerlund-avtcore-rtp-simulcast]).  Still, all the examples use RTP session multiplexing.

Other documents reviewed by the authors, but currently not used in the proposed solution, include:

[I-D.lennox-mmusic-sdp-source-selection] specifies how participants in a multimedia session can request a specific source from a remote party.

[I-D.westerlund-avtext-codec-operation-point] (expired) extends the codec control messages by specifying messages that let participants communicate a set of codec configuration parameters.

Using the above documents, it is possible to negotiate the maximum number of received and sent RTP streams inside an RTP session (m-line or bundled m-lines).  This also allows offering the allowed combinations of codec configurations using different payload type numbers.  Examples: max-recv-ssrc:{96:2 & 97:3}, where 96 and 97 are different payload type numbers, or max-send-ssrc:{*:4}.
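Expressed in an SDP offer, such a negotiation might look as follows (an informative sketch; the attribute syntax follows [I-D.westerlund-avtcore-max-ssrc], as also used in Section 6.1, and the payload types and fmtp values are illustrative only):

   m=video 49200 RTP/AVP 96 97
   a=rtpmap:96 H264/90000
   a=fmtp:96 profile-level-id=42e01f
   a=rtpmap:97 H264/90000
   a=fmtp:97 profile-level-id=640c1f
   a=max-recv-ssrc:{96:2 & 97:3}
   a=max-send-ssrc:{*:4}

Here the offerer would be declaring that it is willing to receive up to two streams using payload type 96 and up to three using payload type 97, and to send up to four streams with any of the offered payload types.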
In the next sections, the document proposes mechanisms for mapping RTP streams to media captures.

4.2. Requirements of a solution

This section briefly lists the requirements that a media architecture for CLUE telepresence needs to achieve, summarizing the discussion of the previous sections.  In this section, RFC 2119 [RFC2119] language refers to requirements on a solution, not on an implementation; thus, requirements keywords are not written in capital letters.

Media-1: It must not be necessary for a CLUE session to use more than a single transport flow for transport of a given media type (video or audio).

Media-2: It must, however, be possible for a CLUE session to use multiple transport flows for a given media type where it is considered valuable (for example, for distributed media, or differential quality-of-service).

Media-3: It must be possible for a CLUE endpoint or MCU to simultaneously send sources corresponding to static, to composited, and to switched captures, in the same transport flow.  (Any given device might not necessarily be able to send all of these source types; but for those that can, it must be possible for them to be sent simultaneously.)

Media-4: It must be possible for an original source to move among switched captures (i.e., at one time be sent for one switched capture, and at a later time be sent for another one).

Media-5: It must be possible for a source to be placed into a switched capture even if the source is a "late joiner", i.e., was added to the conference after the receiver requested the switched source.

Media-6: Whenever a given source is assigned to a switched capture, it must be immediately possible for a receiver to determine the switched capture it corresponds to, and thus that any previous source is no longer being mapped to that switched capture.

Media-7: It must be possible for a receiver to identify the actual source that is currently being mapped to a switched capture, and correlate it with out-of-band (non-CLUE) information such as rosters.

Media-8: It must be possible for a source to move among switched captures without requiring a refresh of decoder state (e.g., for video, a fresh I-frame), when this is unnecessary.  However, it must also be possible for a receiver to indicate when a refresh of decoder state is in fact necessary.

Media-9: If a given source is being sent on the same transport flow for more than one reason (e.g., if it corresponds to more than one switched capture at once, or to a static capture), it should be possible for a sender to send only one copy of the source.

Media-10: On the network, media flows should, as much as possible, look and behave like currently defined usages of existing protocols; established semantics of existing protocols must not be redefined.

Media-11: The solution should seek to minimize the processing burden for boxes that distribute media to decoding hardware.

Media-12: If multiple sources from a single synchronization context are being sent simultaneously, it must be possible for a receiver to associate and synchronize them properly, even for sources that are mapped to switched captures.
4.3. Static Mapping

Static mapping is widely used in current MCU implementations.  It is also common for the point-to-point symmetric use case, when both endpoints have the same capabilities.  For capture encodings with static SSRCs, it is most straightforward to indicate this mapping outside the media stream, in the CLUE or SDP signaling.  An SDP source attribute [RFC5576] can be used to associate CLUE capture IDs with SSRCs in SDP.  Each SSRC will have a captureID value that is also specified as an attribute of the CLUE media capture.  The provider advertisement could, if it wished, use the same SSRC for media capture encodings that are mutually exclusive.  (This would be natural, for example, if two advertised captures are implemented as different configurations of the same physical camera, zoomed in or out.)  Section 6 provides an example of an SDP offer and a CLUE advertisement.

4.4. Dynamic mapping

Dynamic mapping is done by tagging each media packet with the capture ID.  This means that a receiver immediately knows how to interpret received media, even when an unknown SSRC is seen.  As long as the media carries a known capture ID, it can be assumed that this media stream will replace the stream currently being received with that capture ID.

This gives a significant switching-latency advantage, as a switch between sources can be achieved without any form of negotiation with the receiver.  Note that [RFC5285] recommends that header extensions be used with caution.

However, the disadvantage of carrying a capture ID in the stream is that it introduces additional processing costs for every media packet: capture IDs are scoped only within one hop (i.e., within a cascaded conference, a capture ID used from the source to the first MCU is not meaningful between two MCUs, or between an MCU and a receiver), and so they may need to be added or modified at every stage.

Because capture IDs are chosen by the media sender, offering a particular capture to multiple recipients with the same ID allows the sender to produce only one version of the stream (assuming outgoing payload type numbers match).  This reduces the cost in the multicast case, although it does not necessarily help in the switching case.

An additional issue with putting capture IDs in the RTP packets comes from cases where a non-CLUE-aware endpoint is being switched by an MCU to a CLUE endpoint.  In this case, we may require up to an additional 12 bytes in the RTP header, which may push a media packet over the MTU.  However, as the MTU on either side of the switch may not match, it is possible that this could happen even without adding extra data to the RTP packet.  The 12 additional bytes per packet could also be a significant bandwidth increase in the case of very low bandwidth audio codecs.

4.4.1. RTP header extension

The capture ID could be carried within the RTP header extension field, using [RFC5285].  This is negotiated within the SDP, e.g.:

   a=extmap:1 urn:ietf:params:rtp-hdrext:clue-capture-id

Packets tagged by the sender with the capture ID will then contain a header extension as shown below:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  ID=1 | L=3   |                  capture id                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  capture id   |
   +-+-+-+-+-+-+-+-+

   Figure 1: RTP header extension encoding of the capture ID

Adding or modifying the capture ID can be an expensive operation, particularly if SRTP is used to authenticate the packet.  Modifying the contents of the RTP header requires reauthentication of the complete packet, and this could prove to be a limiting factor in the throughput of a multipoint device.  However, reauthentication may be required in any case due to the nature of SDP: SDP permits the receiver to choose payload types, so a similar need to modify the payload type in the packet header will also force reauthentication.
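As an informative illustration of the resulting bits on the wire, assume (as in Figure 1) a four-byte capture id field and, purely for this example, that the capture ID value 4 from Section 6.1 is carried as a 32-bit integer (the actual encoding of the capture id value is not defined by this document).  The one-byte-header extension block of [RFC5285] would then be:

   0xBE 0xDE 0x00 0x02     (one-byte-header extension, length = 2 words)
   0x13                    (ID=1, L=3, i.e. four data bytes follow)
   0x00 0x00 0x00 0x04     (capture id = 4)
   0x00 0x00 0x00          (padding to the next 32-bit boundary)

This example is only meant to show the [RFC5285] framing; the capture id value and its encoding are placeholders.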
4.4.2. Restricted approach

The drawbacks discussed above (the high-latency switching of pure SSRC multiplexing, and the high computational cost that per-packet capture IDs place on switching nodes) can be mitigated by sending the capture ID only on some packets of a stream.  In this approach, the capture ID is included in the packets belonging to the first frame of media (typically an IDR/GDR) following a change in the dynamic mapping.  After that, the SSRC is used to map sources to capture IDs.

Note: in the dynamic case there is a need to verify how this will work if not all RTP streams of the same media type are multiplexed in a single RTP session.

4.5. Recommendations

The recommendation is that endpoints MUST support both the static declaration of capture encoding SSRCs and the RTP header extension method of sharing capture IDs, with the extension in every media packet.  For low-bandwidth situations this may be considered excessive overhead, in which case endpoints MAY support the approach where capture IDs are sent selectively.  The SDP offer MAY specify the SSRC-to-media-capture mapping.  In the case of static mapping topologies there will be no need to use the header extension in the media, since the SSRC for the RTP stream will remain the same during the call unless a collision is detected and handled according to [RFC5576].  If the topology in use relies on dynamic mapping, then the RTP header extension will be used to indicate an RTP stream switch for the media capture.  In this case the SDP description may be used to negotiate the initial SSRC, but this is left to the implementation.  Note that if the SSRC is defined explicitly in the SDP, SSRC collisions should be handled as in [RFC5576].
5. Application to CLUE Media Requirements

The requirements section (Section 4.2) lists a number of requirements that are believed to be necessary for a CLUE RTP mapping.  The solutions described in this document are believed to meet these requirements, though some of them are only possible for some of the topologies.  (Since the requirements are generally of the form "it must be possible for a sender to do something", this is adequate; a sender which wishes to perform that action needs to choose a topology which allows the behavior it wants.)

In this section we address only those requirements where the topologies or the association mechanisms treat the requirements differently.

Media-4: It must be possible for an original source to move among switched captures (i.e., at one time be sent for one switched capture, and at a later time be sent for another one).

This applies naturally for static sources with a Switched Mixer.  For dynamic sources with a Source-Projecting Mixer, this just requires the capture tag in the header extension element to be updated appropriately.

Media-6: Whenever a given source is transmitted for a switched capture, it must be immediately possible for a receiver to determine the switched capture it corresponds to, and thus that any previous source is no longer being mapped to that switched capture.

For a Switched Mixer, this applies naturally.  For a Source-Projecting Mixer, this is done based on the header extension.

Media-7: It must be possible for a receiver to identify the original source that is currently being mapped to a switched capture, and correlate it with out-of-band (non-CLUE) information such as rosters.

For a Switched Mixer, this is done based on the CSRC, if the mixer is providing CSRCs; for a Source-Projecting Mixer, this is done based on the SSRC.

Media-8: It must be possible for a source to move among switched captures without requiring a refresh of decoder state (e.g., for video, a fresh I-frame), when this is unnecessary.  However, it must also be possible for a receiver to indicate when a refresh of decoder state is in fact necessary.

This can be done by a Source-Projecting Mixer, but not by a Switching Mixer.  The last requirement can be accomplished through an FIR message [RFC5104], though potentially a faster mechanism (not requiring a round-trip time from the receiver) would be preferable.

Media-9: If a given source is being sent on the same transport flow to satisfy more than one capture (e.g., if it corresponds to more than one switched capture at once, or to a static capture as well as a switched capture), it should be possible for a sender to send only one copy of the source.

For a Source-Projecting Mixer, this can be accomplished by sending multiple dynamic capture IDs for the same source; this can also be done for an environment with a hybrid of mixer topologies and static and dynamic captures, as described below in Section 6.  It is not possible for static captures from a Switched Mixer.

Media-12: If multiple sources from a single synchronization context are being sent simultaneously, it must be possible for a receiver to associate and synchronize them properly, even for sources that are mapped to switched captures.
For a Mixed or Switched Mixer topology, receivers will see only a single synchronization context (CNAME), corresponding to the mixer.  For a Source-Projecting Mixer, separate projected sources keep separate synchronization contexts based on their original CNAMEs, thus allowing independent synchronization of sources from independent rooms without needing global synchronization.  In hybrid cases, however (e.g., if audio is mixed), all sources which need to be synchronized with the mixed audio must get the same CNAME (and thus a mixer-provided timebase) as the mixed audio.

6. Examples

It is possible for a CLUE device to send multiple instances of the topologies in Section 3 simultaneously.  For example, an MCU which uses a traditional audio bridge with switched video would be a Mixer topology for audio, but a Switched Mixer or a Source-Projecting Mixer for video.  In the latter case, the audio could be sent as a static source, whereas the video could be dynamic.

More notably, it is possible for an endpoint to send the same sources both for static and for dynamic captures.  Consider the example in Section 11.1 of [I-D.ietf-clue-framework], where an endpoint can provide both three cameras (VC0, VC1, and VC2) for left, center, and right views, and a switched view (VC3) of the loudest panel.

It is possible for a consumer to request both the (VC0 - VC2) set and VC3.  It is worth noting that the content of VC3 is, at all times, exactly the content of one of VC0, VC1, or VC2.  Thus, if the sender uses the Source-Projection Mixer topology for VC3, a consumer that receives these captures would not require any additional media traffic over just receiving (VC0 - VC2).

In this case, the advertiser could describe VC0, VC1, and VC2 in its initial advertisement or SDP with static SSRCs, whereas VC3 would need to be dynamic.  The role of VC3 would move among VC0, VC1, or VC2, indicated by the RTP header extension on those streams' RTP packets.

6.1. Static mapping

Using the video capture example from the framework document [I-D.ietf-clue-framework], for a three-camera system with four monitors, one of which is for the presentation stream:

o VC0 - (the camera-left camera stream), purpose=main, switched:no

o VC1 - (the center camera stream), purpose=main, switched:no

o VC2 - (the camera-right camera stream), purpose=main, switched:no

o VC3 - (the loudest panel stream), purpose=main, switched:yes

o VC4 - (the loudest panel stream with PiPs), purpose=main, composed=true, switched:yes

o VC5 - (the zoomed-out view of all people in the room), purpose=main, composed=no, switched:no

o VC6 - (presentation stream), purpose=presentation, switched:no

Where the physical simultaneity information is:

   {VC0, VC1, VC2, VC3, VC4, VC6}

   {VC0, VC2, VC5, VC6}

In this case the provider can send up to six simultaneous streams and receive four, one for each monitor.  This is the maximum case; it can be further limited by the capture scene entries, which may propose sending only three camera streams and one presentation.  Still, since the consumer can select any media captures that can be sent simultaneously, the offer will specify six streams, where VC5 and VC1 use the same resource and are mutually exclusive.
In the Advertisement there may be two capture scenes.

The first capture scene may have four entries:

   {VC0, VC1, VC2}

   {VC3}

   {VC4}

   {VC5}

The second capture scene will have the following single entry:

   {VC6}

We assume that an intermediary will need to look at the CLUE information if it wants to make better decisions about handling specific RTP streams, for example based on their being part of the same capture scene; therefore the SDP does not group streams by capture scene.

The SIP offer may be:

   m=video 49200 RTP/AVP 99
   a=extmap:1 urn:ietf:params:rtp-hdrext:clue-capture-id  (for support of dynamic mapping)
   a=rtpmap:99 H264/90000
   a=max-send-ssrc:{*:6}
   a=max-recv-ssrc:{*:4}
   a=ssrc:11111 CaptureID:1
   a=ssrc:22222 CaptureID:2
   a=ssrc:33333 CaptureID:3
   a=ssrc:44444 CaptureID:4
   a=ssrc:55555 CaptureID:5
   a=ssrc:66666 CaptureID:6

In the above example the provider can send up to five main streams and one presentation stream.

We define a new media capture ID attribute, CaptureID, which carries the mapping to the related RTP stream.

Note that VC1 and VC5 have the same SSRC since they are using the same resource.

o VC0 - (the camera-left camera stream), purpose=main, switched:no, CaptureID=1

o VC1 - (the center camera stream), purpose=main, switched:no, CaptureID=2

o VC2 - (the camera-right camera stream), purpose=main, switched:no, CaptureID=3

o VC3 - (the loudest panel stream), purpose=main, switched:yes, CaptureID=4

o VC4 - (the loudest panel stream with PiPs), purpose=main, composed=true, switched:yes, CaptureID=5

o VC5 - (the zoomed-out view of all people in the room), purpose=main, composed=no, switched:no, CaptureID=2

o VC6 - (presentation stream), purpose=presentation, switched:no, CaptureID=6

Note: We could allocate an SSRC for each MC, which would not require the indirection of using a CaptureID.  However, if a switch to dynamic mapping is done, this would require providing information about which SSRC is being replaced by the new one.

6.2. Dynamic Mapping

For topologies that use dynamic mapping there is no need to provide the SSRCs in the offer (they may not be available, if the offers from the sources do not include them when connecting to the mixer or remote endpoint).  In this case the captureID (srcname) will be specified first in the advertisement.

The SIP offer may be:

   m=video 49200 RTP/AVP 99
   a=extmap:1 urn:ietf:params:rtp-hdrext:clue-capture-id
   a=rtpmap:99 H264/90000
   a=max-send-ssrc:{*:4}
   a=max-recv-ssrc:{*:4}

This will work for SSRC multiplexing.  It is not clear how it will work when RTP streams of the same media type are not multiplexed in a single RTP session: how would the consumer know which encoding will be in which of the different RTP sessions?

7. Acknowledgements

The authors would like to thank Allyn Romanow and Paul Witty for contributing text to this work.

8. IANA Considerations

TBD

9. Security Considerations

TBD.

10. References

10.1. Normative References

[I-D.ietf-clue-framework]
   Romanow, A., Duckworth, M., Pepperell, A., and B. Baldino, "Framework for Telepresence Multi-Streams", draft-ietf-clue-framework-06 (work in progress), July 2012.

[I-D.lennox-clue-rtp-usage]
   Lennox, J., Witty, P., and A. Romanow, "Real-Time Transport Protocol (RTP) Usage for Telepresence Sessions", draft-lennox-clue-rtp-usage-04 (work in progress), June 2012.
[I-D.westerlund-avtcore-max-ssrc]
   Westerlund, M., Burman, B., and F. Jansson, "Multiple Synchronization sources (SSRC) in RTP Session Signaling", draft-westerlund-avtcore-max-ssrc-02 (work in progress), July 2012.

[I-D.westerlund-avtext-rtcp-sdes-srcname]
   Westerlund, M., Burman, B., and P. Sandgren, "RTCP SDES Item SRCNAME to Label Individual Sources", draft-westerlund-avtext-rtcp-sdes-srcname-01 (work in progress), July 2012.

[RFC2119]
   Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

10.2. Informative References

[I-D.ietf-clue-telepresence-use-cases]
   Romanow, A., Botzko, S., Duckworth, M., Even, R., and I. Communications, "Use Cases for Telepresence Multi-streams", draft-ietf-clue-telepresence-use-cases-04 (work in progress), August 2012.

[I-D.lennox-mmusic-sdp-source-selection]
   Lennox, J. and H. Schulzrinne, "Mechanisms for Media Source Selection in the Session Description Protocol (SDP)", draft-lennox-mmusic-sdp-source-selection-04 (work in progress), March 2012.

[I-D.westerlund-avtcore-rtp-simulcast]
   Westerlund, M., Burman, B., Lindqvist, M., and F. Jansson, "Using Simulcast in RTP sessions", draft-westerlund-avtcore-rtp-simulcast-01 (work in progress), July 2012.

[I-D.westerlund-avtcore-rtp-topologies-update]
   Westerlund, M. and S. Wenger, "RTP Topologies", draft-westerlund-avtcore-rtp-topologies-update-01 (work in progress), October 2012.

[I-D.westerlund-avtext-codec-operation-point]
   Westerlund, M., Burman, B., and L. Hamm, "Codec Operation Point RTCP Extension", draft-westerlund-avtext-codec-operation-point-00 (work in progress), March 2012.

[RFC3264]
   Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model with Session Description Protocol (SDP)", RFC 3264, June 2002.

[RFC3550]
   Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[RFC4566]
   Handley, M., Jacobson, V., and C. Perkins, "SDP: Session Description Protocol", RFC 4566, July 2006.

[RFC4575]
   Rosenberg, J., Schulzrinne, H., and O. Levin, "A Session Initiation Protocol (SIP) Event Package for Conference State", RFC 4575, August 2006.

[RFC4796]
   Hautakorpi, J. and G. Camarillo, "The Session Description Protocol (SDP) Content Attribute", RFC 4796, February 2007.

[RFC5104]
   Wenger, S., Chandra, U., Westerlund, M., and B. Burman, "Codec Control Messages in the RTP Audio-Visual Profile with Feedback (AVPF)", RFC 5104, February 2008.

[RFC5117]
   Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117, January 2008.

[RFC5285]
   Singer, D. and H. Desineni, "A General Mechanism for RTP Header Extensions", RFC 5285, July 2008.

[RFC5576]
   Lennox, J., Ott, J., and T. Schierl, "Source-Specific Media Attributes in the Session Description Protocol (SDP)", RFC 5576, June 2009.

[RFC6236]
   Johansson, I. and K. Jung, "Negotiation of Generic Image Attributes in the Session Description Protocol (SDP)", RFC 6236, May 2011.

Authors' Addresses

Roni Even
Huawei Technologies
Tel Aviv
Israel

Email: roni.even@mail01.huawei.com

Jonathan Lennox
Vidyo, Inc.
433 Hackensack Avenue
Seventh Floor
Hackensack, NJ  07601
US

Email: jonathan@vidyo.com