CLUE WG                                                          R. Even
Internet-Draft                                       Huawei Technologies
Intended status: Standards Track                               J. Lennox
Expires: January 23, 2015                                          Vidyo
                                                           July 22, 2014

              Mapping RTP streams to CLUE media captures
                   draft-ietf-clue-rtp-mapping-02.txt

Abstract

This document describes mechanisms and recommended practice for mapping RTP media streams defined in SDP to CLUE media captures.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF).  Note that other groups may also distribute working documents as Internet-Drafts.  The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 23, 2015.

Copyright Notice

Copyright (c) 2014 IETF Trust and the persons identified as the document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document.  Please review these documents carefully, as they describe your rights and restrictions with respect to this document.  Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  RTP topologies for CLUE
   4.  Mapping CLUE Capture Encodings to RTP streams
       4.1.  Review of current directions in MMUSIC, AVText and AVTcore
       4.2.  Requirements of a solution
       4.3.  Static Mapping
       4.4.  Dynamic mapping
       4.5.  Recommendations
   5.  Application to CLUE Media Requirements
   6.  Examples
       6.1.  Static mapping
       6.2.  Dynamic Mapping
   7.  Acknowledgements
   8.  IANA Considerations
   9.  Security Considerations
   10. References
       10.1.  Normative References
       10.2.  Informative References
   Authors' Addresses

1. Introduction

Telepresence systems can send and receive multiple media streams.  The CLUE framework [I-D.ietf-clue-framework] defines Media Captures as sources of media, such as from one or more capture devices.  A Media Capture (MC) may be the source of one or more media streams.  A Media Capture may also be constructed from other media streams.  A middlebox can express conceptual Media Captures that it constructs from media streams it receives.

SIP offer/answer [RFC3264] uses SDP [RFC4566] to describe the RTP [RFC3550] media streams.  Each RTP stream has a unique SSRC within its RTP session.  The content of the RTP stream is created by an encoder in the endpoint.  This may be original content from a camera or content created by an intermediary device such as an MCU.

This document makes recommendations, for the telepresence architecture, about how RTP and RTCP streams should be encoded and transmitted, and how their relation to CLUE Media Captures should be communicated.  The proposed solution supports multiple RTP topologies.
2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119] and indicate requirement levels for compliant RTP implementations.

3. RTP topologies for CLUE

The typical RTP topologies used by telepresence systems specify different behaviors for RTP and RTCP distribution.  A number of RTP topologies are described in [I-D.westerlund-avtcore-rtp-topologies-update].  For telepresence, the relevant topologies include point-to-point, as well as media mixers, media-switching mixers, and source-projection middleboxes.

In the point-to-point topology, one peer communicates directly with a single peer over unicast.  There can be one or more RTP sessions, and each RTP session can carry multiple RTP streams identified by their SSRC.  All SSRCs will be recognized by the peers based on the information in the RTCP SDES report, which includes the CNAME and SSRC of the sent RTP streams.  There are different point-to-point use cases, as specified in the CLUE use cases [RFC7205], and there may be a difference between the symmetric and asymmetric ones.  While in the symmetric use case the typical mapping will be from a media capture device to a render device (e.g., camera to monitor), in the asymmetric case the render device may receive different capture information (an RTP stream from a different camera) if it has fewer rendering devices (monitors).  In some cases, a CLUE session which, at a high level, is point-to-point may nonetheless have RTP which is best described by one of the mixer topologies below.  For example, a CLUE endpoint can produce composited or switched captures for use by a receiving system with fewer displays than the sender has cameras.

In the Media Mixer topology, the peers communicate only with the mixer.  The mixer provides mixed or composited media streams, using its own SSRC for the sent streams.  There are two cases here.  In the first case, the mixer may have separate RTP sessions with each peer (similar to the point-to-point topology), terminating the RTCP sessions on the mixer; this is known as Topo-RTCP-terminating MCU in [RFC5117].  In the second case, the mixer can use a conference-wide RTP session, similar to RFC 5117's Topo-Mixer or Topo-Video-switching.  The major difference is that in the second case the mixer uses conference-wide RTP sessions and distributes the RTCP reports to all the RTP session participants, enabling them to learn all the CNAMEs and SSRCs of the participants and to know the contributing source or sources (CSRCs) of the original streams from the RTP header.  In the first case, the mixer terminates the RTCP, and the participants cannot learn all the available sources from the RTCP information.  The conference roster information, including conference participants, endpoints, media, and media-id (SSRC), can be made available using the conference event package [RFC4575].

In the Media-Switching Mixer topology, the peer-to-mixer communication is unicast, with mixer RTCP feedback.  It is conceptually similar to a compositing mixer as described in the previous paragraph, except that rather than compositing or mixing multiple sources, the mixer provides one or more conceptual sources, selecting one source at a time from the original sources.  The mixer creates a conference-wide RTP session by sharing remote SSRC values as CSRCs with all conference participants.

In the Source-Projection middlebox topology, the peer-to-mixer communication is unicast, with mixer RTCP feedback.  Every potential sender in the conference has a source which is "projected" by the mixer into every other session in the conference; thus, every original source is maintained with an independent RTP identity to every receiver, maintaining separate decoding state and its original RTCP SDES information.  However, RTCP is terminated at the mixer, which might also perform reliability, repair, rate adaptation, or transcoding on the stream.  Senders' SSRCs may be renumbered by the mixer.  The sender may turn the projected sources on and off at any time, depending on which sources it thinks are most relevant for the receiver; this is the primary reason why this topology must act as an RTP mixer rather than as a translator, as otherwise these disabled sources would appear to have enormous packet loss.  Source switching is accomplished through this process of enabling and disabling projected sources, with the higher-level semantic assignment (the reason for sending each RTP stream) handled externally.

The above topologies demonstrate two major RTP/RTCP behaviors:

   1.  The mixer may either use the source SSRC when forwarding RTP packets, or use its own created SSRC.  In either case the mixer distributes all RTCP information to all participants, creating conference-wide RTP session(s).  This allows the participants to learn the available RTP sources in each RTP session.  The original source information will be in the SSRC or in the CSRC, depending on the topology.  The point-to-point case behaves like this.

   2.  The mixer terminates the RTCP from the source, creating separate RTP sessions with the peers.  In this case the participants will not receive the source SSRC in the CSRC.  Since this is usually a mixer topology, the source information is available from the SIP conference event package [RFC4575].  Subscribing to the conference event package allows each participant to know the SSRCs of all sources in the conference.

4. Mapping CLUE Capture Encodings to RTP streams

The different topologies described in Section 3 support different SSRC distribution models and RTP stream multiplexing points.

Most video conferencing systems today can separate multiple RTP sources by placing them into separate RTP sessions using the SDP description.  For example, main and slides video sources are separated into separate RTP sessions based on the content attribute [RFC4796].  This solution works straightforwardly if the multiplexing point is at the UDP transport level, where each RTP stream uses a separate RTP session.  This will also be true for mapping the RTP streams to Media Capture Encodings if each media capture encoding uses a separate RTP session and the consumer can identify it based on the receiving RTP port.  In this case, SDP only needs to label the RTP session with an identifier that identifies the media capture in the CLUE description.  The mapping then does not change even if the RTP stream within the session is switched, using the same or a different SSRC (the multiplexing is not at the SSRC level).
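As an illustrative sketch of such per-session labeling, the main and slides video could be offered on separate m-lines, distinguished by the content attribute [RFC4796] and tied to the corresponding media captures by a label attribute (one possible choice of identifier; the label values below are arbitrary and chosen only for this example):

   m=video 49200 RTP/AVP 96
   a=rtpmap:96 H264/90000
   a=content:main
   a=label:VC0

   m=video 49202 RTP/AVP 96
   a=rtpmap:96 H264/90000
   a=content:slides
   a=label:VC6

A consumer can then map incoming RTP streams to captures purely by the receiving transport address, regardless of which SSRCs appear within each session.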
Even though session multiplexing is supported by CLUE, for scaling reasons CLUE recommends using SSRC multiplexing in a single session or in multiple sessions.  So we need to look at how to map RTP streams to Media Capture Encodings when SSRC multiplexing is used.

When looking at SSRC multiplexing, we can see that the SSRC behavior may differ between topologies:

   1.  The SSRCs are static (assigned by the MCU/mixer), and there is an SSRC for each media capture encoding defined in the CLUE protocol.  Source information may be conveyed using the CSRC or, in the case of a Topo-RTCP-terminating MCU, is not conveyed.

   2.  The SSRCs are dynamic, representing the original source, and are relayed by the mixer/MCU to the participants.

In the above two cases the MCU/mixer may create an advertisement with a virtual room capture scene.

Another case we can envision is that the MCU/mixer relays all the capture scenes from all advertisements to all consumers.  This means that the advertisement will include multiple capture scenes, each representing a separate telepresence room with its own coordinate system.

4.1. Review of current directions in MMUSIC, AVText and AVTcore

Editor's note: This section provides an overview of the RFCs and drafts that can be used as a base for a mapping solution.  This section is for information only, and if the WG thinks that this is the right direction, the authors will bring the required work to the relevant WGs.

The solution also needs to support the simulcast case, where more than one RTP session may be advertised for a Media Capture.  Support of such simulcast is out of scope for CLUE.

When looking at the available tools, based on current work in MMUSIC, AVTcore, and AVText, for supporting SSRC multiplexing, the following documents are considered relevant.

The SDP source attribute [RFC5576] provides mechanisms to describe specific attributes of RTP sources based on their SSRC.

Negotiation of generic image attributes in SDP [RFC6236] provides the means to negotiate the image size.  The image attribute can be used to offer different image parameters, such as size, but in order to offer multiple RTP streams with different resolutions it uses a separate RTP session for each image option.

[I-D.westerlund-avtcore-max-ssrc] proposes a signaling solution for how to use multiple SSRCs within one RTP session.

[I-D.westerlund-avtext-rtcp-sdes-srcname] provides an extension that may be sent in SDP, as RTCP SDES information, or as an RTP header extension, and that uniquely identifies a single media source.  It defines a hierarchical order of the SRCNAME parameter that can be used, for example, to describe multiple resolutions from the same source (see Section 5.1 of [I-D.westerlund-avtcore-rtp-simulcast]).  Still, all the examples use RTP session multiplexing.

Other documents reviewed by the authors, but currently not used in a proposed solution, include:

[I-D.lennox-mmusic-sdp-source-selection] specifies how participants in a multimedia session can request a specific source from a remote party.

[I-D.westerlund-avtext-codec-operation-point] (expired) extends the codec control messages by specifying messages that let participants communicate a set of codec configuration parameters.

Using the above documents it is possible to negotiate the maximum number of received and sent RTP streams inside an RTP session (m-line or bundled m-line).  This also allows offering allowed combinations of codec configurations using different payload type numbers.

Examples: max-recv-ssrc:{96:2 & 97:3}, where 96 and 97 are different payload type numbers, or max-send-ssrc:{*:4}.

In the next sections, this document proposes mechanisms for mapping the RTP streams to media captures.

4.2. Requirements of a solution

This section lists, more briefly, the requirements a media architecture for CLUE telepresence needs to achieve, summarizing the discussion of previous sections.  In this section, RFC 2119 [RFC2119] language refers to requirements on a solution, not on an implementation; thus, requirements keywords are not written in capital letters.

Media-1: It must not be necessary for a CLUE session to use more than a single transport flow for transport of a given media type (video or audio).

Media-2: It must, however, be possible for a CLUE session to use multiple transport flows for a given media type where this is considered valuable (for example, for distributed media, or differential quality-of-service).

Media-3: It must be possible for a CLUE endpoint or MCU to simultaneously send sources corresponding to static, to composited, and to switched captures, in the same transport flow.  (Any given device might not necessarily be able to send all of these source types; but for those that can, it must be possible for them to be sent simultaneously.)

Media-4: It must be possible for an original source to move among switched captures (i.e., at one time be sent for one switched capture, and at a later time be sent for another one).

Media-5: It must be possible for a source to be placed into a switched capture even if the source is a "late joiner", i.e., was added to the conference after the receiver requested the switched source.

Media-6: Whenever a given source is assigned to a switched capture, it must be immediately possible for a receiver to determine the switched capture it corresponds to, and thus that any previous source is no longer being mapped to that switched capture.

Media-7: It must be possible for a receiver to identify the actual source that is currently being mapped to a switched capture, and correlate it with out-of-band (non-CLUE) information such as rosters.

Media-8: It must be possible for a source to move among switched captures without requiring a refresh of decoder state (e.g., for video, a fresh I-frame) when this is unnecessary.  However, it must also be possible for a receiver to indicate when a refresh of decoder state is in fact necessary.

Media-9: If a given source is being sent on the same transport flow for more than one reason (e.g., if it corresponds to more than one switched capture at once, or to a static capture), it should be possible for a sender to send only one copy of the source.
Media-10: On the network, media flows should, as much as possible, look and behave like currently defined usages of existing protocols; established semantics of existing protocols must not be redefined.

Media-11: The solution should seek to minimize the processing burden for boxes that distribute media to decoding hardware.

Media-12: If multiple sources from a single synchronization context are being sent simultaneously, it must be possible for a receiver to associate and synchronize them properly, even for sources that are mapped to switched captures.

4.3. Static Mapping

Static mapping is widely used in current MCU implementations.  It is also common for the point-to-point symmetric use case, when both endpoints have the same capabilities.  For capture encodings with static SSRCs, it is most straightforward to indicate this mapping outside the media stream, in the CLUE or SDP signaling.  An SDP source attribute [RFC5576] can be used to associate CLUE capture encodings with SSRCs in SDP, using appIds [I-D.even-mmusic-application-token].  Each SSRC will have an appId value that will also be specified in the CLUE media capture as an attribute.  The provider advertisement could, if it wished, use the same SSRC for media capture encodings that are mutually exclusive.  (This would be natural, for example, if two advertised captures are implemented as different configurations of the same physical camera, zoomed in or out.)  Section 6 provides an example of an SDP offer and CLUE advertisement.

4.4. Dynamic mapping

Dynamic mapping is done by tagging each media packet with the appId.  This means that a receiver immediately knows how to interpret received media, even when an unknown SSRC is seen.  As long as the media carries a known appId, it can be assumed that this media stream will replace the stream currently being received with that appId.

This gives significant advantages for switching latency, as a switch between sources can be achieved without any form of negotiation with the receiver.

However, the disadvantage of using an appId in the stream is that it introduces additional processing costs for every media packet, as appIds are scoped only within one hop (i.e., within a cascaded conference an appId that is used from the source to the first MCU is not meaningful between two MCUs, or between an MCU and a receiver), and so they may need to be added or modified at every stage.

If the appIds are chosen by the media sender, then offering a particular capture encoding to multiple recipients with the same ID requires the sender to produce only one version of the stream (assuming outgoing payload type numbers match).  This reduces the cost in the multicast case, although it does not necessarily help in the switching case.

An additional issue with putting appIds in the RTP packets comes from cases where a non-CLUE-aware endpoint is being switched by an MCU to a CLUE endpoint.  In this case, we may require up to an additional 12 bytes in the RTP header, which may push a media packet over the MTU.  However, as the MTU on either side of the switch may not match, it is possible that this could happen even without adding extra data into the RTP packet.  The 12 additional bytes per packet could also be a significant bandwidth increase in the case of very low-bandwidth audio codecs.
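As an illustrative breakdown of where such an overhead can come from, assuming the one-byte-header form of the general RTP header extension mechanism [RFC5285] and, purely for the sake of the arithmetic, a 7-byte appId token:

      4 bytes   extension header (0xBEDE identifier + length)
    + 1 byte    extension element ID and length
    + 7 bytes   appId token
    ---------
     12 bytes   added to each packet (32-bit aligned, so no padding)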
4.5. Recommendations

The recommendation is that endpoints MUST support both the static declaration of capture encoding SSRCs and the appId in every media packet.  For low-bandwidth situations this may be considered excessive overhead, in which case endpoints MAY support an approach where appIds are sent selectively.  The SDP offer MAY specify the mapping of SSRCs to capture encodings.  In the case of static mapping topologies there will be no need to use the header extensions in the media, since the SSRC for the RTP stream will remain the same during the call unless a collision is detected and handled according to [RFC5576].  If the topology uses dynamic mapping, then the appId will be used to indicate the RTP stream switch for the media capture.  In this case the SDP description may be used to negotiate the initial SSRC, but this is left to the implementation.  Note that if the SSRC is defined explicitly in the SDP, SSRC collisions should be handled as in [RFC5576].

5. Application to CLUE Media Requirements

The requirements section (Section 4.2) lists a number of requirements that are believed to be necessary for a CLUE RTP mapping.  The solutions described in this document are believed to meet these requirements, though some of them are only possible for some of the topologies.  (Since the requirements are generally of the form "it must be possible for a sender to do something", this is adequate; a sender which wishes to perform that action needs to choose a topology which allows the behavior it wants.)

In this section we address only those requirements where the topologies or the association mechanisms treat the requirements differently.

Media-4: It must be possible for an original source to move among switched captures (i.e., at one time be sent for one switched capture, and at a later time be sent for another one).

This applies naturally for static sources with a Switched Mixer.  For dynamic sources with a Source-Projecting middlebox, this just requires the appId in the header extension element to be updated appropriately.

Media-6: Whenever a given source is transmitted for a switched capture, it must be immediately possible for a receiver to determine the switched capture it corresponds to, and thus that any previous source is no longer being mapped to that switched capture.

For a Switched Mixer, this applies naturally.  For a Source-Projecting middlebox, this is done based on the appId.

Media-7: It must be possible for a receiver to identify the original source that is currently being mapped to a switched capture, and correlate it with out-of-band (non-CLUE) information such as rosters.

For a Switched Mixer, this is done based on the CSRC, if the mixer is providing CSRCs; for a Source-Projecting middlebox, this is done based on the SSRC.

Media-8: It must be possible for a source to move among switched captures without requiring a refresh of decoder state (e.g., for video, a fresh I-frame) when this is unnecessary.  However, it must also be possible for a receiver to indicate when a refresh of decoder state is in fact necessary.

This can be done by a Source-Projecting middlebox, but not by a Switching Mixer.  The last requirement can be accomplished through an FIR message [RFC5104], though potentially a faster mechanism (not requiring a round-trip time from the receiver) would be preferable.

Media-9: If a given source is being sent on the same transport flow to satisfy more than one capture (e.g., if it corresponds to more than one switched capture at once, or to a static capture as well as a switched capture), it should be possible for a sender to send only one copy of the source.

For a Source-Projecting middlebox, this can be accomplished by sending multiple dynamic appIds for the same source; this can also be done for an environment with a hybrid of mixer topologies and static and dynamic captures, as described below in Section 6.  It is not possible for static captures from a Switched Mixer.

Media-12: If multiple sources from a single synchronization context are being sent simultaneously, it must be possible for a receiver to associate and synchronize them properly, even for sources that are mapped to switched captures.

For a Mixed or Switched Mixer topology, receivers will see only a single synchronization context (CNAME), corresponding to the mixer.  For a Source-Projecting middlebox, separate projected sources keep separate synchronization contexts based on their original CNAMEs, thus allowing independent synchronization of sources from independent rooms without needing global synchronization.  In hybrid cases, however (e.g., if audio is mixed), all sources which need to be synchronized with the mixed audio must get the same CNAME (and thus a mixer-provided timebase) as the mixed audio.

6. Examples

It is possible for a CLUE device to send multiple instances of the topologies in Section 3 simultaneously.  For example, an MCU which uses a traditional audio bridge with switched video would be a Mixer topology for audio, but a Switched Mixer or a Source-Projecting middlebox for video.  In the latter case, the audio could be sent as a static source, whereas the video could be dynamic.

More notably, it is possible for an endpoint to send the same sources both for static and dynamic captures.  Consider the example in [I-D.ietf-clue-framework], where an endpoint can provide both three cameras (VC0, VC1, and VC2) for left, center, and right views, and a switched view (VC3) of the loudest panel.

It is possible for a consumer to request both the (VC0 - VC2) set and VC3.  It is worth noting that the content of VC3 is, at all times, exactly the content of one of VC0, VC1, or VC2.  Thus, if the sender uses the Source-Projecting middlebox topology for VC3, no additional media traffic needs to be sent to a consumer that receives these sources, beyond what is needed for just (VC0 - VC2).

In this case, the advertiser could describe VC0, VC1, and VC2 in its initial advertisement or SDP with static SSRCs, whereas VC3 would need to be dynamic.  The role of VC3 would move among VC0, VC1, and VC2, indicated by the appId RTP header extension on those streams' RTP packets.
6.1. Static mapping

Using the video capture example from the framework document [I-D.ietf-clue-framework] for a three-camera system with four monitors, where one monitor is for the presentation stream:

   o  VC0 - (the camera-left camera stream), purpose=main, switched:no

   o  VC1 - (the center camera stream), purpose=main, switched:no

   o  VC2 - (the camera-right camera stream), purpose=main, switched:no

   o  VC3 - (the loudest panel stream), purpose=main, switched:yes

   o  VC4 - (the loudest panel stream with PiPs), purpose=main, composed=true, switched:yes

   o  VC5 - (the zoomed-out view of all people in the room), purpose=main, composed=no, switched:no

   o  VC6 - (presentation stream), purpose=presentation, switched:no

Where the physical simultaneity information is:

   {VC0, VC1, VC2, VC3, VC4, VC6}

   {VC0, VC2, VC5, VC6}

In this case the provider can send up to six simultaneous streams and receive four, one for each monitor.  This is the maximum case, but it can be further limited by the capture scene entries, which may propose sending only three camera streams and one presentation.  Still, since the consumer can select any media captures that can be sent simultaneously, the offer will specify six streams, where VC5 and VC1 use the same resource and are mutually exclusive.

In the advertisement there may be two capture scenes.

The first capture scene may have four entries:

   {VC0, VC1, VC2}

   {VC3}

   {VC4}

   {VC5}

The second capture scene will have the following single entry:

   {VC6}

We assume that an intermediary will need to look at the CLUE information if it wants to make better decisions on handling specific RTP streams, for example based on them being part of the same capture scene, so the SDP will not group streams by capture scene.

The SIP offer may be (the extmap attribute is included for support of dynamic mapping):

   m=video 49200 RTP/AVP 99
   a=extmap:1 urn:ietf:params:rtp-hdrex:appId
   a=rtpmap:99 H264/90000
   a=max-send-ssrc:{*:6}
   a=max-recv-ssrc:{*:4}
   a=ssrc:11111 appId:1
   a=ssrc:22222 appId:2
   a=ssrc:33333 appId:3
   a=ssrc:44444 appId:4
   a=ssrc:55555 appId:5
   a=ssrc:66666 appId:6

In the above example the provider can send up to five main streams and one presentation stream.

Note that VC1 and VC5 have the same SSRC, since they use the same resource.

   o  VC0 - (the camera-left camera stream), purpose=main, switched:no, appId=1

   o  VC1 - (the center camera stream), purpose=main, switched:no, appId=2

   o  VC2 - (the camera-right camera stream), purpose=main, switched:no, appId=3

   o  VC3 - (the loudest panel stream), purpose=main, switched:yes, appId=4

   o  VC4 - (the loudest panel stream with PiPs), purpose=main, composed=true, switched:yes, appId=5

   o  VC5 - (the zoomed-out view of all people in the room), purpose=main, composed=no, switched:no, appId=2

   o  VC6 - (presentation stream), purpose=presentation, switched:no, appId=6

Note: We could instead allocate an SSRC for each MC, which would not require the indirection of using an appId.  This would, however, require that if a switch to dynamic mapping is done, information be provided about which SSRC is being replaced by the new one.
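For completeness, a corresponding answer from a consumer with four monitors might look roughly as follows; this is only an illustrative sketch, and the port number is arbitrary:

   m=video 49300 RTP/AVP 99
   a=rtpmap:99 H264/90000
   a=extmap:1 urn:ietf:params:rtp-hdrex:appId
   a=max-recv-ssrc:{*:4}
   a=max-send-ssrc:{*:4}

The selection of captures itself is conveyed in CLUE signaling; the appId values from the advertisement then tie the incoming SSRCs to the selected captures.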
6.2. Dynamic Mapping

For topologies that use dynamic mapping there is no need to provide the SSRCs in the offer (they may not be available, if the offers from the sources do not include them when connecting to the mixer or remote endpoint).  In this case the appId will be specified first in the advertisement.

The SIP offer may be:

   m=video 49200 RTP/AVP 99
   a=extmap:1 urn:ietf:params:appId
   a=rtpmap:99 H264/90000
   a=max-send-ssrc:{*:4}
   a=max-recv-ssrc:{*:4}

This will work for SSRC multiplexing.  It is not clear how it will work when RTP streams of the same media type are not multiplexed in a single RTP session: how would one know which encoding will be in which of the different RTP sessions?

7. Acknowledgements

The authors would like to thank Allyn Romanow and Paul Witty for contributing text to this work.

8. IANA Considerations

TBD

9. Security Considerations

TBD.

10. References

10.1. Normative References

   [I-D.even-mmusic-application-token]
              Even, R., Lennox, J., and Q. Wu, "The Session Description
              Protocol (SDP) Application Token Attribute",
              draft-even-mmusic-application-token-03 (work in progress),
              April 2014.

   [I-D.ietf-clue-framework]
              Romanow, A., Duckworth, M., Pepperell, A., and B. Baldino,
              "Framework for Telepresence Multi-Streams",
              draft-ietf-clue-framework-16 (work in progress), June 2014.

   [I-D.lennox-clue-rtp-usage]
              Lennox, J., Witty, P., and A. Romanow, "Real-Time
              Transport Protocol (RTP) Usage for Telepresence Sessions",
              draft-lennox-clue-rtp-usage-04 (work in progress),
              June 2012.

   [I-D.westerlund-avtcore-max-ssrc]
              Westerlund, M., Burman, B., and F. Jansson, "Multiple
              Synchronization sources (SSRC) in RTP Session Signaling",
              draft-westerlund-avtcore-max-ssrc-02 (work in progress),
              July 2012.

   [I-D.westerlund-avtext-rtcp-sdes-srcname]
              Westerlund, M., Burman, B., and P. Sandgren, "RTCP SDES
              Item SRCNAME to Label Individual Sources",
              draft-westerlund-avtext-rtcp-sdes-srcname-01 (work in
              progress), July 2012.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

10.2. Informative References

   [I-D.lennox-mmusic-sdp-source-selection]
              Lennox, J. and H. Schulzrinne, "Mechanisms for Media
              Source Selection in the Session Description Protocol
              (SDP)", draft-lennox-mmusic-sdp-source-selection-04 (work
              in progress), March 2012.

   [I-D.westerlund-avtcore-rtp-simulcast]
              Westerlund, M., Burman, B., Lindqvist, M., and F. Jansson,
              "Using Simulcast in RTP sessions",
              draft-westerlund-avtcore-rtp-simulcast-04 (work in
              progress), July 2014.

   [I-D.westerlund-avtcore-rtp-topologies-update]
              Westerlund, M. and S. Wenger, "RTP Topologies",
              draft-westerlund-avtcore-rtp-topologies-update-01 (work in
              progress), October 2012.

   [I-D.westerlund-avtext-codec-operation-point]
              Westerlund, M., Burman, B., and L. Hamm, "Codec Operation
              Point RTCP Extension",
              draft-westerlund-avtext-codec-operation-point-00 (work in
              progress), March 2012.

   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer Model
              with Session Description Protocol (SDP)", RFC 3264,
              June 2002.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

   [RFC4575]  Rosenberg, J., Schulzrinne, H., and O. Levin, "A Session
              Initiation Protocol (SIP) Event Package for Conference
              State", RFC 4575, August 2006.

   [RFC4796]  Hautakorpi, J. and G. Camarillo, "The Session Description
              Protocol (SDP) Content Attribute", RFC 4796,
              February 2007.

   [RFC5104]  Wenger, S., Chandra, U., Westerlund, M., and B. Burman,
              "Codec Control Messages in the RTP Audio-Visual Profile
              with Feedback (AVPF)", RFC 5104, February 2008.

   [RFC5117]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
              January 2008.

   [RFC5285]  Singer, D. and H. Desineni, "A General Mechanism for RTP
              Header Extensions", RFC 5285, July 2008.

   [RFC5576]  Lennox, J., Ott, J., and T. Schierl, "Source-Specific
              Media Attributes in the Session Description Protocol
              (SDP)", RFC 5576, June 2009.

   [RFC6236]  Johansson, I. and K. Jung, "Negotiation of Generic Image
              Attributes in the Session Description Protocol (SDP)",
              RFC 6236, May 2011.

   [RFC7205]  Romanow, A., Botzko, S., Duckworth, M., and R. Even, "Use
              Cases for Telepresence Multistreams", RFC 7205,
              April 2014.

Authors' Addresses

   Roni Even
   Huawei Technologies
   Tel Aviv
   Israel

   Email: roni.even@mail01.huawei.com

   Jonathan Lennox
   Vidyo, Inc.
   433 Hackensack Avenue
   Seventh Floor
   Hackensack, NJ  07601
   US

   Email: jonathan@vidyo.com