DISPATCH Working Group                                      G. Camarillo
Internet-Draft                                                 S. Loreto
Intended status: Informational                                  Ericsson
Expires: August 24, 2010                               February 20, 2010


      Disaggregated Media in the Session Initiation Protocol (SIP)
            draft-loreto-dispatch-disaggregated-media-01.txt

Abstract

   Disaggregated media refers to the ability for a user to create a
   multimedia session combining different media streams, coming from
   different devices under his or her control, so that they are treated
   by the far end of the session as a single media session. This
   document lists several use cases that involve disaggregated media in
   SIP. Additionally, this document analyzes what types of
   disaggregated media can be implemented using existing protocol
   mechanisms, and the pros and cons of using each of those mechanisms.
   Finally, this document describes scenarios that are not covered by
   current mechanisms and proposes new IETF work to cover them.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on August 24, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

Table of Contents

   1.  Introduction
   2.  Disaggregated media: Use Cases
     2.1.  Using Two Separate Devices to Start a Conversation
     2.2.  Showing a Pre-recorded Video During a Conversation
     2.3.  Sending a File from a PC During a Conversation
     2.4.  Including Live Video in a Conversation
     2.5.  Including Remote Live Video in a Conversation
     2.6.  Answering a Call Using Two Separate Devices
     2.7.  Other possible use cases
   3.  Existing Mechanisms to Implement Disaggregated Media
     3.1.  Message Bus (Mbus)
       3.1.1.  Mbus issues
     3.2.  Megaco (H.248)
       3.2.1.  Megaco issues
     3.3.  Third Party Call Control (3pcc)
       3.3.1.  3pcc issues
   4.  Scenarios Not Covered by Existing Mechanisms
   5.  Security Considerations
   6.  IANA Considerations
   7.  Informational References
   Authors' Addresses

1.  Introduction

   Disaggregated media refers to the ability for a user to create a
   multimedia session combining different media streams, coming from
   different devices under his or her control, so that they are treated
   by the far end of the session as a single media session.

   The SIP specification [RFC3261] defines a multimedia session as "an
   exchange of data between an association of participants". The
   Session Description Protocol (SDP) is the default session
   description format in SIP. The SDP specification [RFC4566] defines a
   multimedia session as "a set of multimedia senders and receivers and
   the data streams flowing from senders to receivers".

   Generally, a given participant uses a single device to establish (or
   participate in) a given multimedia session. Consequently, the SIP
   signaling to manage the multimedia session and the actual media
   streams are typically co-located in the same device. In scenarios
   involving disaggregated media, a user wants to establish a single
   multimedia session combining different media streams coming from
   different devices under his or her control. This creates a need to
   coordinate the exchange of those media streams within the media
   session.

   The remainder of this document is organized as follows. Section 2
   contains use cases where different media streams, coming from
   different devices, are combined to establish a multimedia session.
   Section 3 describes what types of disaggregated media can be
   implemented using existing protocol mechanisms, and the pros and cons
   of using each of those mechanisms. Section 4 describes scenarios
   that are not covered by current mechanisms and proposes new IETF work
   to cover them.

2.  Disaggregated media: Use Cases

   This section lists several use cases where users participate in a
   multimedia session using multiple devices.
   That is, either the user
   initiating the session uses several devices in parallel during the
   session, or the user receiving the session invitation uses several
   devices in parallel during the session, or both.

2.1.  Using Two Separate Devices to Start a Conversation

   Laura is at her office. On her desk, she has a PC with a soft client
   and a desk phone. The PC has a low-quality built-in microphone and
   is connected to high-quality speakers.

   Laura wants to establish a voice session with Bob using the desk
   phone as the microphone and the soft client as the speaker.

   Laura wants Bob to treat the send-only audio stream from her
   desk phone and the receive-only audio stream from her soft client as
   part of the same communication space. That is, Laura wants Bob to
   treat both streams as belonging to the same multimedia session.

2.2.  Showing a Pre-recorded Video During a Conversation

   Bob has a voice-only phone and an IP-TV device. Laura has an
   integrated advanced multimedia phone with a camera.

   Bob is talking on his voice-only phone with Laura, who is on her
   multimedia phone.

   Bob wants to show Laura part of the TV show he recorded last night.
   Bob interacts, using his voice-only phone, with his IP-TV device and
   sends a video stream to Laura's device.

   Bob talks about the show with Laura on his voice-only phone while
   Laura watches the show.

   Bob wants Laura to treat the video stream from his IP-TV device and
   the voice stream from his voice-only phone as part of the same
   communication space. That is, Bob wants Laura to treat both streams
   as belonging to the same multimedia session.

2.3.  Sending a File from a PC During a Conversation

   Bob has a voice-only phone and a PC with a soft client. Laura has an
   integrated advanced multimedia phone with support for file transfers.

   Bob wants to send a file to Laura from his PC during his conversation
   with Laura on his voice-only phone.

   Bob interacts, using his voice-only phone, with his PC and starts a
   file transfer session to Laura's multimedia phone.

   Bob wants Laura to treat the file transfer from his PC and the voice
   stream from his voice-only phone as part of the same communication
   space. That is, Bob wants Laura to treat both streams as belonging
   to the same multimedia session.

2.4.  Including Live Video in a Conversation

   Bob has a voice-only phone and a PC which has a soft client, a
   webcam, and a low-quality built-in microphone. Laura has an
   integrated advanced multimedia phone with a camera.

   Bob wants to send live video to Laura from his PC during his
   conversation with Laura.

   Bob interacts, using his voice-only phone, with his PC and starts a
   live video stream to Laura's multimedia phone.

   Bob wants Laura to treat the live video stream from his PC and the
   voice stream from his voice-only phone as part of the same
   communication space. That is, Bob wants Laura to treat both streams
   as belonging to the same multimedia session.

2.5.  Including Remote Live Video in a Conversation

   Bob, who is at his office, has a multimedia phone. At his summer
   cottage, Bob has a webcam phone (e.g., a video-surveillance system).
   Laura has an integrated advanced multimedia phone with a camera.

   During his conversation with Laura, Bob wants to show her the new
   summer cottage he just bought. Bob interacts, using his multimedia
   phone, with his webcam phone at his summer cottage and starts a live
   video stream to Laura's multimedia phone.

   Bob wants Laura to treat the live video stream from his webcam phone
   and the voice stream from his multimedia phone as part of the same
   communication space. That is, Bob wants Laura to treat both streams
   as belonging to the same multimedia session.

2.6.  Answering a Call Using Two Separate Devices

   Laura has a PC with a soft client and a desk phone. Bob has an
   integrated advanced multimedia phone with a camera.

   Laura receives on her desk phone an incoming voice-video call from
   Bob.

   Laura decides to answer Bob's session invitation by establishing a
   voice session with Bob using the desk phone and a video session
   using her soft client. Two SIP dialogs need to be established:
   one between Bob's device and Laura's desk phone and one between Bob's
   device and Laura's soft client.

   Laura wants Bob to treat the audio stream from her desk phone and the
   video stream from her soft client as part of the same communication
   space. That is, Laura wants Bob to treat both streams as belonging
   to the same multimedia session.

2.7.  Other possible use cases

   For the sake of completeness, this section enumerates other possible
   use cases, similar to the ones elaborated in the previous sections.

   A user wants to start or answer a call combining:

   o  Voice and video streams from a desk phone and application sharing
      from a computer.
   o  A voice stream from a desk phone and a video stream to/from a TV
      attached to a set-top box with a built-in camera.
   o  The User Interface (UI) for the call on a mobile phone and the
      audio stream going in/out of a speakerphone that is in the same
      room as the user.

3.  Existing Mechanisms to Implement Disaggregated Media

   Figure 1 shows the media flow in the most generic scenario, where
   both the Caller and the Callee use disaggregated media, each
   involving different devices under his or her control in the
   multimedia session.

    /----------------\                     /--------------\
    |   ----         |                     |      ----    |
    |  | UA |====================================| UA |   |
    |   ----         |        video        |      ----    |
    |        ----    |                     |        ----  |
    |       | UA |=================================| UA | |
    |  ----  ----    |        audio        |  ----  ----  |
    | | UA |=================================| UA |       |
    |  ----          |        text         |  ----        |
    \----------------/                     \--------------/
          Laura                                  Bob

               Figure 1: Media Flows in Disaggregated Media

   All existing mechanisms to implement disaggregated media in SIP use a
   centralized approach whereby the far end of the session receives the
   same SIP signaling flow that it would receive if all the media
   streams came from a single device. This makes the fact that the
   caller is using separate devices for different media transparent to
   the far end of the session.

         ----                                 ----
        | UA |\                             /| UA |
         ----  \                           /  ----
                \                         /
       ----      \----               ----/    ----
      | UA |-----| UA |-------------| UA |---| UA |
       ----      /----      SIP      ----\    ----
                /                         \
         ----  /                           \  ----
        | UA |/                             \| UA |
         ----                                 ----
        Alice                                 Bob

                      Figure 2: Centralized scenario

   Figure 2 shows the generic signaling flow common to all centralized
   solutions, where a Central Node manages all the signaling messages
   needed to coordinate the user's different devices and hides from the
   far end of the session all the mechanisms used to distribute the
   media among those devices.

   Section 3.1, Section 3.2, and Section 3.3 analyze how different
   existing mechanisms can be used to implement disaggregated media in a
   centralized way.

3.1.  Message Bus (Mbus)

   The Message Bus (Mbus) [RFC3259] is a light-weight local coordination
   protocol for developing component-based distributed applications.
   Mbus provides a simple and flexible message-oriented communication
   channel for a group of components that may be distributed on multiple
   hosts in a local network.
   The transport services include useful
   features such as peer location, point-to-point and group
   communication, and security.

   In a disaggregated media scenario, a user can use Mbus to coordinate
   the different devices under his or her control in a loosely coupled
   conference control model and so involve them in the call. The
   devices communicate with one another using Mbus messages and let
   only one device, a call control engine, initiate, manage, and
   terminate call control relations to other SIP endpoints. In this
   case, the fact that the caller is using separate devices for
   different media becomes transparent to the callee.

   Figure 3 shows an example of the relation between a call control
   engine and the other components in an Mbus-enabled conferencing
   system.

+------------------ Mbus--------------------+
|                                           |
|                       +---------------+   |
|                       |Audio Component|   |
|                       +---------------+   |
|                               |           |
|                       +---------------+   |          +---------------+
| +---------------+     |     call      |   |   SIP    |      SIP      |
| |Video Component|-----|    control    |==============|  User Agent   |
| +---------------+     |    engine     |   |          |               |
|                       +---------------+   |          +---------------+
|                                           |
+-------------------------------------------+

                       Figure 3: Mbus Architecture

3.1.1.  Mbus issues

   The Mbus protocol introduces the following issues.

   Mbus support:  All of the user's devices need to support the Mbus
      protocol.
   Central point:  Because the call control engine has complete control
      over the call, it needs to be involved during the whole duration
      of the session. It cannot leave the session before the whole
      session ends (unless it transfers the controller role to one of
      the user's other devices).
   Local network:  Mbus uses multicast with a limited scope for message
      transport. The use of multicast limits the coordination mechanism
      to groups of devices that are connected to a local network.
      Mbus can therefore be used in a disaggregated media scenario only
      if all the devices under the user's control, or at least the ones
      the user wants to involve in the communication, are attached to
      the same local network.

3.2.  Megaco (H.248)

   The Megaco protocol [RFC3525] is used to establish and terminate
   calls across terminations (endpoints) connected to Media Gateways
   (MGs). Megaco instructs MGs to connect streams coming from outside a
   packet network onto a packet stream such as RTP. The architecture
   requires a Media Gateway Controller (MGC) that controls the MGs:
   acting as master, the MGC issues commands to send and receive media
   from addresses, to generate tones, and to modify configuration.
   However, Megaco does not constitute a complete system: a session
   initiation protocol is needed to establish communication between
   MGCs. SIP is the protocol normally used to establish calls across
   domains or across MGCs.

   Megaco can be used in a disaggregated media scenario to let one of
   the user's devices act as a Media Gateway Controller, coordinating
   all the other devices under the user's control, which in this case
   act as Media Gateways. In this case, the fact that the caller is
   using separate devices for different media becomes transparent to the
   callee.

3.2.1.  Megaco issues

   The Megaco protocol introduces the following issues.

   Megaco support:  All of the user's devices need to support the
      Megaco protocol.
   Central point:  Because the Media Gateway Controller has complete
      control over the call, it needs to be involved during the whole
      duration of the session. It cannot leave the session before the
      whole session ends.

3.3.  Third Party Call Control (3pcc)

   3pcc [RFC3725] allows one entity (called the Controller) to set up
   and manage a communications relationship between two or more other
   parties using SIP.

   In a disaggregated media scenario, a user can use the 3pcc mechanism
   only if at least one of the devices under his or her control supports
   this mechanism and is able to act as the Controller for the others in
   the call. In this case, the fact that the caller is using separate
   devices for different media becomes transparent to the callee. The
   Controller is a central point for the signaling on the caller side
   and has complete control over the call, so the callee sees a single
   dialog.

   The call flow for disaggregated media using 3pcc is shown in
   Figure 4. It is based on Flow IV documented in Section 4.4 of
   [RFC3725] and recommended for calls to unknown entities, or to
   entities known to represent people. The flow requires some SDP
   manipulation by the Controller to convert offer2 to offer2' and
   offer2'', and then to convert answer2' and answer2'' to answer2.

     Alice        Alice        Alice Controller          Bob
  UA Media X    UA Video           UA Audio              UA
       |            |(1) INVITE offer1 |                  |
       |            |no media          |                  |
       |            |<-----------------|                  |
       |            |(2) 200 answer1   |                  |
       |            |no media          |                  |
       |            |----------------->|                  |
       |            |(3) ACK           |                  |
       |            |<-----------------|                  |
       |(4) INVITE offer1'             |                  |
       |no media    |                  |                  |
       |<------------------------------|                  |
       |(5) 200 answer1'               |                  |
       |------------|----------------->|                  |
       |(6) ACK     |                  |                  |
       |<-----------|------------------|(7) INVITE no SDP |
       |            |                  |----------------->|
       |            |                  |(8) 200 OK offer2 |
       |            |(9) INVITE offer2'|<-----------------|
       |            |<-----------------|                  |
       |            |(10) 200 answer2' |                  |
       |            |----------------->|                  |
       |(11) INVITE offer2''           |                  |
       |<------------------------------|                  |
       |(12) 200 answer2''             |                  |
       |------------------------------>|(13) ACK answer2  |
       |            |(14) ACK          |----------------->|
       |(15) ACK    |<-----------------|                  |
       |<------------------------------|(16) RTP          |
       |            |(17) RTP          |..................|
       |(18) RTP    |.....................................|
       |..................................................|

                         Figure 4: 3pcc call flow

3.3.1.  3pcc issues

   The 3pcc mechanism introduces the following issues.

   Complexity:  Third party call control only uses protocol mechanisms
      specified in [RFC3261]. However, the usage of third party call
      control becomes more complex when aspects of the call utilize SIP
      extensions or optional features of SIP such as resource
      reservation.
   Central point:  Because the Controller has complete control over the
      call, it needs to be involved during the whole duration of the
      session. It cannot leave the session before the whole session
      ends (unless it transfers the controller role to one of the
      user's other devices).
   User experience:  3pcc results in a suboptimal user experience
      because the slave phones are not aware that they are involved in a
      disaggregated media call scenario. Indeed, the slave phones
      behave as if they were just involved in a normal call with the
      Controller. Moreover, the slave phones will be alerted without
      any media having been established yet.
   SDP manipulation:  The Controller cannot simply "proxy" the SIP
      messages received from one of the parties. In many cases, it has
      to modify the SDP exchanged between the participants in order to
      effect the changes.

4.  Scenarios Not Covered by Existing Mechanisms

   As discussed previously, all existing mechanisms implement
   disaggregated media using a centralized approach. Scenarios not
   covered by existing mechanisms include those where none of the nodes
   can act as a controller, either because it does not support the
   necessary functionality or because it will not participate in the
   whole session (transferring the SIP dialog from a controller to a new
   one using REFER and Replaces is complex and requires support from the
   far end). These scenarios are better implemented using a more
   distributed approach.
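
   The functionality a controller needs in the centralized approach
   includes the SDP manipulation described in Section 3.3.1. As a
   rough illustration only, the following Python sketch shows one way a
   Controller could derive offer2' and offer2'' from offer2 and later
   combine the per-device answers into answer2. The function names
   (parse, sub_offer, merge_answers), the controller address
   192.0.2.10, and the simplification of at most one stream per media
   type are assumptions made for this sketch; they are not taken from
   [RFC3725] or from any existing implementation.

```python
CRLF = "\r\n"


def parse(sdp):
    """Split an SDP body into session-level lines and per-media sections."""
    session, media = [], []
    for line in sdp.splitlines():
        if not line:
            continue
        if line.startswith("m="):
            media.append([line])      # a new media section starts at m=
        elif media:
            media[-1].append(line)    # line belongs to the current section
        else:
            session.append(line)      # everything before the first m=
    return session, media


def media_type(section):
    """Return the media type of a section, e.g. 'audio' for 'm=audio ...'."""
    return section[0][2:].split()[0]


def sub_offer(offer, wanted):
    """Derive offer2' or offer2'': session lines plus only the wanted media."""
    session, media = parse(offer)
    kept = [line for sec in media if media_type(sec) == wanted for line in sec]
    return CRLF.join(session + kept) + CRLF


def with_connection(session, section):
    """Give the section its own c= line, copied from session level if needed."""
    if any(line.startswith("c=") for line in section):
        return list(section)
    conn = [line for line in session if line.startswith("c=")]
    return [section[0]] + conn + section[1:]


def merge_answers(offer, answers, controller_ip="192.0.2.10"):
    """Build answer2 from per-device answers, in the m= order of the offer."""
    _, offered = parse(offer)
    answered = {}
    for answer in answers:
        session, media = parse(answer)
        for sec in media:
            # Hypothetical policy: the first device answering a type wins.
            answered.setdefault(media_type(sec), with_connection(session, sec))
    out = ["v=0", f"o=controller 1 1 IN IP4 {controller_ip}", "s=-",
           f"c=IN IP4 {controller_ip}", "t=0 0"]
    for sec in offered:
        kind = media_type(sec)
        if kind in answered:
            out += answered[kind]
        else:
            # No device handles this stream: reject it by setting port 0.
            fields = sec[0].split()
            fields[1] = "0"
            out.append(" ".join(fields))
    return CRLF.join(out) + CRLF
```

   In terms of Figure 4, sub_offer(offer2, "video") would produce the
   body sent in message (9), and merge_answers(offer2, [...]) would
   produce the body carried in message (13). Each device's connection
   address is pushed down to media level because the devices answer
   from different hosts, which is exactly why the Controller cannot
   forward the SDP unchanged.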

   In a distributed scenario, the far end of the session directly
   receives signaling messages from each of the devices involved in the
   multimedia session. That means that the receiver needs to treat all
   the signaling and media coming from different devices of the same
   user as part of the same media session.

       /------------\
       |   ---- ____|________________
       |  | UA |    |                \
       |   ----     |                 \   ------
       |   ----     |                    |      |
       |  | UA |-------------------------|  UA  |
       |   ----     |                    |      |
       |   ----     |                 /   ------
       |  | UA |____|________________/     Bob
       |   ----     |
       \------------/
           Alice

                      Figure 5: Distributed scenario

   Figure 5 shows the generic signaling flow in a distributed scenario,
   where all the signaling messages go from the user's different devices
   to the far end of the session.

   Since this type of scenario is not covered by existing mechanisms, we
   propose to initiate work on SIP extensions to support it. These
   extensions may require support from the far end of the session.
   While this may limit the usability of these extensions in some
   scenarios, scenarios where an administrator can deploy devices with
   support for a given extension (e.g., in an enterprise) could still
   benefit from it.

5.  Security Considerations

   To be done.

6.  IANA Considerations

   This document does not require any actions by the IANA.

7.  Informational References

   [RFC3087]  Campbell, B. and R. Sparks, "Control of Service Context
              using SIP Request-URI", RFC 3087, April 2001.

   [RFC3259]  Ott, J., Perkins, C., and D. Kutscher, "A Message Bus for
              Local Coordination", RFC 3259, April 2002.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC3525]  Groves, C., Pantaleo, M., Anderson, T., and T. Taylor,
              "Gateway Control Protocol Version 1", RFC 3525, June 2003.

   [RFC3725]  Rosenberg, J., Peterson, J., Schulzrinne, H., and G.
              Camarillo, "Best Current Practices for Third Party Call
              Control (3pcc) in the Session Initiation Protocol (SIP)",
              BCP 85, RFC 3725, April 2004.

   [RFC3911]  Mahy, R. and D. Petrie, "The Session Initiation Protocol
              (SIP) "Join" Header", RFC 3911, October 2004.

   [RFC4566]  Handley, M., Jacobson, V., and C. Perkins, "SDP: Session
              Description Protocol", RFC 4566, July 2006.

Authors' Addresses

   Gonzalo Camarillo
   Ericsson
   Hirsalantie 11
   Jorvas 02420
   Finland

   Email: Gonzalo.Camarillo@ericsson.com


   Salvatore Loreto
   Ericsson
   Hirsalantie 11
   Jorvas 02420
   Finland

   Email: salvatore.loreto@ericsson.com