CLUE WG                                                       A. Romanow
Internet-Draft                                             Cisco Systems
Intended status: Informational                                 S. Botzko
Expires: June 15, 2014                                         M. Barnes
                                                                 Polycom
                                                       December 12, 2013


              Requirements for Telepresence Multi-Streams
            draft-ietf-clue-telepresence-requirements-07.txt

Abstract

   This memo discusses the requirements for specifications that enable
   telepresence interoperability by describing behaviors and protocols
   for Controlling Multiple Streams for Telepresence (CLUE).  It also
   covers the problem statement and related definitions.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 15, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Definitions
   4.  Problem Statement
   5.  Requirements
   6.  Acknowledgements
   7.  IANA Considerations
   8.  Security Considerations
   9.  Informative References
   Appendix A.  Changes From Earlier Versions
     A.1.  Changes from draft -06
     A.2.  Changes from draft -05
     A.3.  Changes from draft -04
     A.4.  Changes from draft -03
     A.5.  Changes from draft -02
     A.6.  Changes from draft -01
     A.7.  Changes From Draft -00
   Authors' Addresses

1.  Introduction

   Telepresence systems greatly improve collaboration.  In a
   telepresence conference (as used herein), the goal is to create an
   environment that gives the users a feeling of (co-located) presence -
   the feeling that a local user is in the same room with other local
   users and the remote parties.  Currently, systems from different
   vendors often do not interoperate because they do the same tasks
   differently, as discussed in the Problem Statement (Section 4).

   The approach taken in this memo is to set requirements for one or
   more future specifications that, when implemented, provide for
   interoperability between telepresence systems based on IETF
   protocols.  A solution to these requirements is expected to involve
   the exchange of adequate information about participating sites,
   information that is currently not standardized by the IETF.

   The purpose of this document is to describe the requirements for a
   specification that enables interworking between different SIP-based
   [RFC3261] telepresence systems by exchanging and negotiating
   appropriate information.
   In the context of these requirements and related
   solution documents, interworking covers both point-to-point SIP
   sessions and SIP-based conferences as described in the SIP
   conferencing framework [RFC4353] and the SIP-based conference control
   [RFC4579] specifications.  Systems based on non-IETF protocols, such
   as ITU-T Rec. H.323, are out of scope.  These requirements apply to
   the specification; they are not requirements on the telepresence
   systems implementing the solution/protocol that will be specified.

   Telepresence systems from different vendors can currently follow
   radically different architectural approaches while offering a similar
   user experience.  CLUE will not dictate telepresence architectural
   and implementation choices; however, it will describe a protocol
   architecture for CLUE and how it relates to other protocols.  CLUE
   enables interoperability between telepresence systems by exchanging
   information about the systems' characteristics.  Systems can use this
   information to adapt their behavior so that they interoperate.

   A telepresence session requires at least one sending and one
   receiving endpoint.  Multiparty telepresence sessions include more
   than two endpoints and centralized infrastructure such as Multipoint
   Control Units (MCUs) or equivalent.  CLUE specifies the syntax,
   semantics, and control flow of information to enable the best
   possible user experience at those endpoints.

   Sending endpoints and MCUs are not required to use any of the CLUE
   specifications that describe their capabilities, attributes, or
   behavior.  Similarly, it is not envisioned that endpoints or MCUs
   will ever be required to take received information into account.
   However, by making available as
   much information as possible, and by taking into account as much
   received or exchanged information as possible, MCUs and endpoints are
   expected to select operation modes that enable the best possible user
   experience under their constraints.

   The document is structured as follows: Section 3 sets out
   definitions, Section 4 describes the telepresence interoperability
   problem that led to this work, and Section 5 enumerates and discusses
   the requirements for a specification that addresses the current
   shortcomings.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Definitions

   The following terms are used throughout this document and serve as a
   reference for other documents.

   Audio Mixing: refers to the accumulation of scaled audio signals
      to produce a single audio stream.  See RTP Topologies [RFC5117].

   Conference: used as defined in [RFC4353], A Framework for
      Conferencing with the Session Initiation Protocol (SIP).

   Endpoint: The logical point of final termination through
      receiving, decoding and rendering, and/or initiation through
      capturing, encoding, and sending of media streams.  An endpoint
      consists of one or more physical devices that source and sink
      media streams, and exactly one [RFC4353] Participant (which, in
      turn, includes exactly one SIP User Agent).  In contrast to an
      endpoint, an MCU may also send and receive media streams, but it
      is neither the initiator nor the final terminator in the sense
      that Media is Captured or Rendered.  Endpoints can be anything
      from multiscreen/multicamera rooms to handheld devices.

   Endpoint Characteristics: include placement of Capture and
      Rendering Devices, capture/render angle, resolution of cameras and
      screens, and spatial location and mixing parameters of
      microphones.  Endpoint characteristics are not specific to
      individual media streams sent by the endpoint.

   Layout: How rendered media streams are spatially arranged with
      respect to each other on a single screen/mono audio telepresence
      endpoint, and how rendered media streams are arranged with respect
      to each other on a multiple screen/speaker telepresence endpoint.
      Note that the term layout encompasses audio as well as video; in
      other words, it includes the placement of audio streams on
      speakers as well as video streams on video screens.

   Local: Sender and/or receiver physically co-located ("local") in
      the context of the discussion.

   MCU: Multipoint Control Unit (MCU) - a device that connects two or
      more endpoints together into one single multimedia conference
      [RFC5117].  An MCU may include a Mixer [RFC4353].

   Media: Any data that, after suitable encoding, can be conveyed
      over RTP, including audio, video, or timed text.

   Model: a set of assumptions a telepresence system of a given
      vendor adheres to and expects the remote telepresence system(s)
      also to adhere to.

   Remote: Sender and/or receiver on the other side of the
      communication channel (depending on context); not Local.  A Remote
      can be an Endpoint or an MCU.

   Render: the process of generating a representation from media,
      such as displayed motion video or sound emitted from loudspeakers.

   Telepresence: an environment that gives users or user groups who
      are not co-located a feeling of (co-located) presence - the
      feeling that a Local user is in the same room with other Local
      users and the Remote parties.
      The inclusion of Remote parties is
      achieved through multimedia communication including at least
      audio and video signals of high fidelity.

4.  Problem Statement

   In order to create a "being there" experience characteristic of
   telepresence, media inputs need to be transported, received, and
   coordinated between participating systems.  Different telepresence
   systems take diverse approaches in crafting a solution, or they
   implement similar solutions quite differently.

   They use disparate techniques, and they describe, control, and
   negotiate media in dissimilar fashions.  Such diversity creates an
   interoperability problem.  The same issues are solved in different
   ways by different systems, so that they are not directly
   interoperable.  This makes interworking difficult at best and
   sometimes impossible.

   Worse, many telepresence systems use proprietary protocol extensions
   to solve telepresence-related problems, even if those extensions are
   based on common standards such as SIP.

   Some degree of interworking between systems from different vendors is
   possible through transcoding and translation.  This requires
   additional devices, which are expensive, are often not entirely
   automatic, and sometimes introduce unwelcome side effects, such as
   additional delay or degraded performance.  Specialized knowledge is
   currently required to operate a telepresence conference with
   endpoints from different vendors, for example to configure
   transcoding and translating devices.  Often such conferences do not
   start as planned or are interrupted by difficulties that arise.

   The general problem that needs to be solved can be described as
   follows.  Today, each endpoint sends audio and video captures based
   upon an implicitly assumed model for rendering a realistic depiction
   from this information.
   If all endpoints are manufactured by the
   same vendor, they work with the same model and render the information
   according to the model implicitly assumed by the vendor.  However, if
   the devices are from different vendors, the models they each use for
   rendering presence can, and usually do, differ.  The result can be
   that the telepresence systems actually connect, but the user
   experience suffers, for example because one system assumes that the
   first video stream is captured from the right camera, whereas the
   other assumes the first video stream is captured from the left
   camera.

   If Alice and Bob are at different sites, Alice needs to tell Bob
   about the camera and sound equipment arrangement at her site so that
   Bob's receiver can create an accurate rendering of her site.  Alice
   and Bob need to agree on what the salient characteristics are as well
   as how to represent and communicate them.  Characteristics may
   include the number, placement, capture/render angle, and resolution
   of cameras and screens, and the spatial location and audio mixing
   parameters of microphones.

   The telepresence multi-stream work seeks to describe the sender
   situation in a way that allows the receiver to render it
   realistically even though it may have a different rendering model
   than the sender.

5.  Requirements

   Although some aspects of these requirements can be met by existing
   technology, such as SDP, they are stated here to provide a complete
   record of the requirements for CLUE, whether new work is needed or
   existing technology suffices.  Determining which applies is part of
   solution development rather than part of the requirements.  Note
   that the term "solution" is used in these requirements to mean the
   protocol specifications, including extensions to existing protocols
   as well as any new protocols, developed to support the use cases.
   The solution can introduce
   additional functionality that is not mapped directly to these
   requirements, e.g., the detailed information carried in the
   signaling protocol(s).  Where a requirement is directly related to a
   specific use case, a reference to that use case is provided.

   REQMT-1:  The solution MUST support a description of the spatial
             arrangement of source video images sent in video streams
             that enables a satisfactory reproduction of the original
             scene at the receiver.  This applies to each site in a
             point-to-point or multipoint meeting and refers to the
             spatial ordering within a site, not to the ordering of
             images between sites.

             Use case: point-to-point symmetric, and all other use
             cases.

             REQMT-1a:  The solution MUST support a means of allowing
                        the preservation of the order of images in the
                        captured scene.  For example, if John is to
                        Susan's right in the image capture, John is
                        also to Susan's right in the rendered image.

             REQMT-1b:  The solution MUST support a means of allowing
                        the preservation of the order of images in the
                        scene in two dimensions: horizontal and
                        vertical.

             REQMT-1c:  The solution MUST support a means to identify
                        the point of capture of individual video
                        captures in three dimensions.

             REQMT-1d:  The solution MUST support a means to identify
                        the area of coverage of individual video
                        captures in three dimensions.

   REQMT-2:  The solution MUST support a description of the spatial
             arrangement of captured source audio sent in audio streams
             that enables a satisfactory reproduction at the receiver
             in a spatially correct manner.  This applies to each site
             in a point-to-point or multipoint meeting and refers to
             the spatial ordering within a site, not the ordering of
             channels between sites.

             Use case: point-to-point symmetric, and all use cases,
             especially heterogeneous.

             REQMT-2a:  The solution MUST support a means of preserving
                        the spatial order of audio in the captured
                        scene.  For example, if John sounds as if he is
                        at Susan's right in the captured audio, John's
                        voice is also placed at Susan's right in the
                        rendered audio.

             REQMT-2b:  The solution MUST support a means to identify
                        the number and spatial arrangement of audio
                        channels including monaural, stereophonic
                        (2.0), and 3.0 (left, center, right) audio
                        channels.

             REQMT-2c:  The solution MUST support a means to identify
                        the point of capture of individual audio
                        captures in three dimensions.

             REQMT-2d:  The solution MUST support a means to identify
                        the area of coverage of individual audio
                        captures in three dimensions.

   REQMT-3:  The solution MUST enable individual audio streams to be
             associated with one or more video image captures, and
             individual video image captures to be associated with one
             or more audio captures, so that each can be rendered in
             its proper position.

             Use case: point-to-point symmetric, and all use cases.

   REQMT-4:  The solution MUST enable interoperability between
             endpoints that have a different number of similar devices.
             For example, one endpoint may have 1 screen, 1 speaker, 1
             camera, and 1 microphone, and another endpoint may have 3
             screens, 2 speakers, 3 cameras, and 2 microphones.  Or, in
             a multipoint conference, one endpoint may have 1 screen,
             another may have 2 screens, and a third may have 3
             screens.  This includes endpoints where the number of
             devices of a given type is zero.

             Use case: asymmetric point-to-point and multipoint.

   REQMT-5:  The solution MUST support a means of enabling
             interoperability between telepresence endpoints where
             cameras are of different picture aspect ratios.

   REQMT-6:  The solution MUST provide scaling information that
             enables rendering of a video image at the actual size of
             the captured scene.
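
   The kind of information that REQMT-1a, REQMT-1c, REQMT-1d, REQMT-2b,
   REQMT-3, and REQMT-6 ask a solution to convey can be illustrated with
   a short Python sketch.  It shows how a receiver that is told where
   each capture sits in the sender's scene can order streams left to
   right regardless of arrival order, which avoids the right-camera/
   left-camera ambiguity described in Section 4.  The data model, field
   names, channel labels such as "3.0", and the coordinate convention
   (meters, with x increasing toward the right as seen from the sender's
   cameras) are hypothetical assumptions for this example only; they are
   not defined by this document or by any CLUE specification.

```python
from dataclasses import dataclass

# Hypothetical data model for illustration only; it is NOT a CLUE syntax.
# Assumed convention: coordinates in meters in a sender-defined scene frame,
# with x increasing toward the right as seen from the sender's cameras.


@dataclass(frozen=True)
class Point3D:
    x: float
    y: float
    z: float


def rect(x_min: float, x_max: float, y_min: float, y_max: float, z: float) -> tuple:
    """Four corners of a planar, axis-aligned area of coverage at depth z."""
    return (Point3D(x_min, y_min, z), Point3D(x_max, y_min, z),
            Point3D(x_max, y_max, z), Point3D(x_min, y_max, z))


@dataclass(frozen=True)
class VideoCapture:
    capture_id: str
    point_of_capture: Point3D  # cf. REQMT-1c
    coverage: tuple            # corners of the area of coverage, cf. REQMT-1d

    def center_x(self) -> float:
        return sum(p.x for p in self.coverage) / len(self.coverage)

    def coverage_width_m(self) -> float:
        xs = [p.x for p in self.coverage]
        return max(xs) - min(xs)


@dataclass(frozen=True)
class AudioCapture:
    capture_id: str
    channels: str                 # hypothetical labels: "mono", "2.0", "3.0" (cf. REQMT-2b)
    associated_video: tuple = ()  # ids of associated video captures (cf. REQMT-3)


def left_to_right(videos) -> list:
    """Order captures by described position, not by stream order (cf. REQMT-1a)."""
    return [v.capture_id for v in sorted(videos, key=VideoCapture.center_x)]


def associated_video_ids(audio: AudioCapture, videos) -> list:
    """Video captures that an audio capture is associated with (cf. REQMT-3)."""
    return [v.capture_id for v in videos if v.capture_id in audio.associated_video]


def actual_size_width_px(video: VideoCapture, screen_px_per_m: float) -> float:
    """Rendered width that shows the captured scene at actual size (cf. REQMT-6)."""
    return video.coverage_width_m() * screen_px_per_m


# Streams arrive with the right-hand camera first, as in the Section 4 example.
received = [
    VideoCapture("VC2", Point3D(0.5, 1.2, 0.0), rect(1.0, 3.0, 0.0, 1.5, 3.0)),
    VideoCapture("VC0", Point3D(-0.5, 1.2, 0.0), rect(-3.0, -1.0, 0.0, 1.5, 3.0)),
    VideoCapture("VC1", Point3D(0.0, 1.2, 0.0), rect(-1.0, 1.0, 0.0, 1.5, 3.0)),
]
left_audio = AudioCapture("AC0", "mono", associated_video=("VC0",))

print(left_to_right(received))                     # ['VC0', 'VC1', 'VC2']
print(associated_video_ids(left_audio, received))  # ['VC0']
print(actual_size_width_px(received[0], 800.0))    # 1600.0
```

   Ordering by described position rather than by stream order is the
   point of the sketch.  It deliberately ignores coverage areas that are
   not axis-aligned, and receivers with fewer screens than the sender
   has captures (REQMT-4), which a real solution would also need to
   handle.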

   REQMT-7:  The solution MUST support a means of enabling
             interoperability between telepresence endpoints where
             displays are of different resolutions.

   REQMT-8:  The solution MUST support methods for handling different
             bit rates in the same conference.

   REQMT-9:  The solution MUST support a means of enabling
             interoperability between endpoints that send and receive
             different numbers of media streams.

             Use case: heterogeneous and multipoint.

   REQMT-10: The solution MUST ensure that endpoints that support
             telepresence extensions can establish a session with a SIP
             endpoint that does not support the telepresence
             extensions.  For example, in the case of a SIP endpoint
             that supports a single audio and a single video stream, an
             endpoint that supports the telepresence extensions would
             set up a session with a single audio and single video
             stream using existing SIP and SDP mechanisms.

   REQMT-11: The solution MUST support a mechanism for determining
             whether or not an endpoint or MCU supports the
             telepresence extensions.

   REQMT-12: The solution MUST support a means to enable more than two
             endpoints to participate in a teleconference.

             Use case: multipoint.

   REQMT-13: The solution MUST support both transcoding and switching
             approaches to providing multipoint conferences.

   REQMT-14: The solution MUST support mechanisms to allow media from
             one or more source endpoints to be sent to a remote
             endpoint at a particular point in time.  Which media is
             sent at a point in time may be based on local policy.

   REQMT-15: The solution MUST provide mechanisms to support the
             following:

             *  Presentations with different media sources

             *  Presentations for which the media streams are visible
                to all endpoints

             *  Multiple, simultaneous presentation media streams,
                including presentation media streams that are spatially
                related to each other.

             Use case: presentation.

   REQMT-16: The specification of any new protocols for the solution
             MUST provide extensibility mechanisms.

   REQMT-17: The solution MUST support a mechanism for allowing
             information about media captures to change during a
             conference.

   REQMT-18: The solution MUST provide a mechanism for the secure
             exchange of information about the media captures.

6.  Acknowledgements

   This draft has benefited from all the comments on the mailing list
   and a number of discussions.  So many people contributed that it is
   not possible to list them all.  However, the comments provided by
   Roberta Presta, Christian Groves, and Paul Coverdale during WGLC were
   particularly helpful in completing the WG document.

7.  IANA Considerations

   There are no IANA considerations associated with this specification.

8.  Security Considerations

   Requirement REQMT-18 identifies the need to securely transport the
   information about media captures.  Note that a telepresence session
   will use SIP for basic session setup, and either SIP or the
   Centralized Conferencing Manipulation Protocol (CCMP) for a
   multi-party telepresence session.  Information carried in the SIP
   signaling can be secured by the SIP security mechanisms as defined in
   [RFC3261].  In the case of conference control using CCMP, the
   security model and mechanisms as defined in the XCON Framework
   [RFC5239] and CCMP [RFC6503] documents would meet the requirement.
   Any additional signaling mechanism used to transport the information
   about media captures would need to define the mechanisms by which the
   information is secured.  The details of these mechanisms need to be
   defined and described in the CLUE framework document and related
   solution document(s).

9.  Informative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC4353]  Rosenberg, J., "A Framework for Conferencing with the
              Session Initiation Protocol (SIP)", RFC 4353,
              February 2006.

   [RFC4579]  Johnston, A. and O. Levin, "Session Initiation Protocol
              (SIP) Call Control - Conferencing for User Agents",
              BCP 119, RFC 4579, August 2006.

   [RFC5117]  Westerlund, M. and S. Wenger, "RTP Topologies", RFC 5117,
              January 2008.

   [RFC5239]  Barnes, M., Boulton, C., and O. Levin, "A Framework for
              Centralized Conferencing", RFC 5239, June 2008.

   [RFC6503]  Barnes, M., Boulton, C., Romano, S., and H. Schulzrinne,
              "Centralized Conferencing Manipulation Protocol",
              RFC 6503, March 2012.

Appendix A.  Changes From Earlier Versions

   Note to the RFC-Editor: please remove this section prior to
   publication as an RFC.

A.1.  Changes from draft -06

   Addressing IETF LC comments/editorial nits resulted in the following
   changes:

   o  Included expansion of CLUE in the abstract.

   o  Deleted definitions for "Left" and "Right".

   o  Section 5 - clarified that solution = protocol specifications to
      support requirements.

   o  REQMT-1d, REQMT-2d: Changed term "extent" to "area of coverage".

   o  REQMT-10 - clarified requirement with regard to interworking with
      non-CLUE endpoints.

   o  REQMT-15 - reworded to be more specific and normative.

   o  REQMT-16 - expanded on what is meant by "extensibility".

A.2.  Changes from draft -05

   Addressing WGLC comments resulted in the following changes:

   o  REQMT-12: Changed term "site" to "endpoint".

   o  Intro: clarified that SIP based conferencing also is relevant to
      CLUE.

   o  Intro: clarified that while CLUE doesn't dictate implementation
      choices, it does describe a framework for the protocol solution.

   o  Clarified that mapping to use cases isn't comprehensive (i.e.,
      only done when there is a direct correlation).

   o  Added text that the requirements do not reflect all those required
      for the solution - i.e., the solution can provide more
      functionality as needed.

   o  Editorial nits and clarifications - changed lowercase "must" to
      uppercase "MUST" (REQMT-17).

A.3.  Changes from draft -04

   o  Removed REQMT-2c, related to issue #37 in the tracker.

   o  Deleted REQMT-3b.  Condensed REQMT-3 to subsume REQMT-3a.  This is
      related to Issue #38 in the tracker.

   o  Updated REQMT-14 based on (mailing list) resolution of Issue #39.

   o  Deleted OPEN issue section as those were transferred to the ID
      tracker and have been resolved either by changes to this document
      or to earlier versions of the document.

A.4.  Changes from draft -03

   o  Added more text to the security section (Paragraph 18).

A.5.  Changes from draft -02

   o  Updated IANA section - i.e., no IANA registrations required.

   o  Added security requirement Paragraph 18.

   o  Added some initial text to the security section.

A.6.  Changes from draft -01

   o  Cleaned up the Problem Statement section, re-worded.

   o  Added Requirement Paragraph 17 in response to WG Issue #4 to make
      a requirement for dynamically changing information.  Approved by
      WG.

   o  Added requirements #1.c and #1.d.  Approved by WG.

   o  Added requirements #2.d and #2.e.  Approved by WG.

A.7.  Changes From Draft -00

   o  Requirement #2, The solution MUST support a means to identify
      monaural, stereophonic (2.0), and 3.0 (left, center, right) audio
      channels.

      changed to

      The solution MUST support a means to identify the number and
      spatial arrangement of audio channels including monaural,
      stereophonic (2.0), and 3.0 (left, center, right) audio channels.

   o  Added back references to the Use case document.

      *  Requirement #1 Use case point to point symmetric, and all other
         use cases.

      *  Requirement #2 Use case point to point symmetric, and all use
         cases, especially heterogeneous.

      *  Requirement #3 Use case point to point symmetric, and all use
         cases.

      *  Requirement #4 Use case is asymmetric point to point, and
         multipoint.

      *  Requirement #9 Use case heterogeneous and multipoint.

      *  Requirement #12 Use case multipoint.

Authors' Addresses

   Allyn Romanow
   Cisco Systems
   San Jose, CA 95134
   USA

   Email: allyn@cisco.com


   Stephen Botzko
   Polycom
   Andover, MA 01810
   US

   Email: stephen.botzko@polycom.com


   Mary Barnes
   Polycom

   Email: mary.ietf.barnes@gmail.com