CLUE WG                                                       A. Romanow
Internet-Draft                                                     Cisco
Intended status: Informational                                 S. Botzko
Expires: February 27, 2013                                  M. Duckworth
                                                                 Polycom
                                                            R. Even, Ed.
                                                     Huawei Technologies
                                                              T. Eubanks
                                                 Iformata Communications
                                                         August 26, 2012

              Use Cases for Telepresence Multi-streams
             draft-ietf-clue-telepresence-use-cases-04.txt

Abstract

   Telepresence conferencing systems seek to create the sense of really
   being present for the participants.  A number of techniques for
   handling audio and video streams are used to create this experience.
   When these techniques are not similar, interoperability between
   different systems is difficult at best, and often not possible.
   Conveying information about the relationships between multiple
   streams of media would allow senders and receivers to make the
   choices needed for telepresence systems to interwork.  This memo
   describes the most typical and important use cases for sending
   multiple streams in a telepresence conference.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 27, 2013.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Telepresence Scenarios Overview
   3.  Use Case Scenarios
     3.1.  Point to point meeting: symmetric
     3.2.  Point to point meeting: asymmetric
     3.3.  Multipoint meeting
     3.4.  Presentation
     3.5.  Heterogeneous Systems
     3.6.  Multipoint Education Usage
     3.7.  Multipoint Multiview (Virtual space)
     3.8.  Multiple presentation streams - Telemedicine
   4.  Acknowledgements
   5.  IANA Considerations
   6.  Security Considerations
   7.  Informative References
   Authors' Addresses

1.  Introduction

   Telepresence applications try to provide a "being there" experience
   for conversational video conferencing.  Often this telepresence
   application is described as "immersive telepresence" in order to
   distinguish it from traditional video conferencing, and from other
   forms of remote presence not related to conversational video
   conferencing, such as avatars and robots.  The salient
   characteristics of telepresence are often described as: actual-
   sized, immersive video, preserving interpersonal interaction, and
   allowing non-verbal communication.

   Although telepresence systems are based on open standards such as
   RTP [RFC3550], SIP [RFC3261], H.264, and the H.323 [ITU.H323] suite
   of protocols, they cannot easily interoperate with each other
   without operator assistance and expensive additional equipment which
   translates from one vendor's protocol to another.  A standard way of
   describing the multiple streams constituting the media flows, and
   the fundamental aspects of their behavior, would allow telepresence
   systems to interwork.

   This draft presents a set of use cases describing typical scenarios.
   Requirements will be derived from these use cases in a separate
   document.  The use cases are described from the viewpoint of the
   users.  They are illustrative of the user experience that needs to
   be supported.  It is possible to implement these use cases in a
   variety of different ways.

   Many different scenarios need to be supported.  This document
   describes in detail the most common and basic use cases.  These will
   cover most of the requirements.  There may be additional scenarios
   that bring new features and requirements which can be used to extend
   the initial work.

   Point-to-point and multipoint telepresence conferences are
   considered.  In some use cases the number of displays is the same at
   all sites; in others the number of displays differs at different
   sites.  Both use cases are considered.  Also included is a use case
   describing display of presentation material or content.

   The document structure is as follows: Section 2 gives an overview of
   scenarios, and Section 3 describes use cases.

2.  Telepresence Scenarios Overview

   This section describes the general characteristics of the use cases
   and what the scenarios are intended to show.  The typical setting is
   a business conference, which was the initial focus of telepresence.
   Recently, consumer products are also being developed.  We
   specifically do not include in our scenarios the infrastructure
   aspects of telepresence, such as room construction, layout and
   decoration.

   Telepresence systems are typically composed of one or more video
   cameras and encoders and one or more display monitors of large size
   (diagonal around 60").  Microphones pick up sound and audio codec(s)
   produce one or more audio streams.  We will call the cameras used to
   capture the telepresence users participant cameras (and likewise for
   the displays).  There may also be other cameras, such as for
   document display.  These will be referred to as presentation or
   content cameras, which generally have different formats, aspect
   ratios, and frame rates from the participant cameras.  The
   presentation streams may be shown on participant monitors, or on
   auxiliary display monitors.  A user's computer may also serve as a
   virtual content camera, generating an animation or playing back a
   video for display to the remote participants.

   We describe such a telepresence system as sending M video streams, N
   audio streams, and D content streams to the remote system(s).  (Note
   that the number of audio streams is generally not the same as the
   number of video streams.)

   The fundamental parameters describing today's typical telepresence
   scenario include:

   1.   The number of participating sites

   2.   The number of visible seats at a site

   3.   The number of cameras

   4.   The number and type of microphones

   5.   The number of audio channels

   6.   The screen size

   7.   The display capabilities - such as resolution, frame rate,
        aspect ratio

   8.   The arrangement of the monitors in relation to each other

   9.   Whether the number of primary monitors is the same at all
        sites

   10.  The type and number of presentation monitors

   11.  Multipoint conference display strategies - for example, the
        camera-to-display mappings may be static or dynamic

   12.  The camera viewpoint

   13.  The cameras' fields of view and how they do or do not overlap
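
   As a rough illustration only (no such structure is defined by this
   document or by any protocol), the per-site portion of this parameter
   list, together with the M/N/D stream counts above, could be
   summarized in a structure such as the following Python sketch.  All
   field names and example values are hypothetical.

      from dataclasses import dataclass, field

      @dataclass
      class SiteDescription:
          # Hypothetical summary of one site, following the list above.
          visible_seats: int
          cameras: int                  # produces M video streams
          microphones: int
          audio_channels: int           # produces N audio streams
          presentation_sources: int     # produces D content streams
          screen_diagonal_inches: float
          resolution: str               # e.g., "1080p60"
          aspect_ratio: str             # e.g., "16:9"
          monitor_arrangement: str      # e.g., "3 abreast"
          camera_fields_of_view: list = field(default_factory=list)

      # Example: a typical 3-camera, 3-screen room with stereo audio.
      room = SiteDescription(
          visible_seats=6, cameras=3, microphones=2, audio_channels=2,
          presentation_sources=1, screen_diagonal_inches=60.0,
          resolution="1080p60", aspect_ratio="16:9",
          monitor_arrangement="3 abreast",
          camera_fields_of_view=[(-60, -20), (-20, 20), (20, 60)])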

   The basic features that give telepresence its distinctive
   characteristics are implemented in disparate ways in different
   systems.  Currently, telepresence systems from diverse vendors
   interoperate to some extent, but this is not supported in a
   standards-based fashion.  Interworking requires that translation and
   transcoding devices be included in the architecture.  Such devices
   increase latency, reducing the quality of interpersonal interaction.
   Use of these devices is often not automatic; it frequently requires
   substantial manual configuration and a detailed understanding of the
   nature of the underlying audio and video streams.  This state of
   affairs is not acceptable for the continued growth of telepresence -
   telepresence systems should have the same ease of interoperability
   as do telephones.

   There is no agreed upon way to adequately describe the semantics of
   how streams of various media types relate to each other.  Without a
   standard for stream semantics to describe the particular roles and
   activities of each stream in the conference, interoperability is
   cumbersome at best.

   In a multiple screen conference, the video and audio streams sent
   from remote participants must be understood by receivers so that
   they can be presented in a coherent and life-like manner.  This
   includes the ability to present remote participants at their actual
   size for their apparent distance, while maintaining correct eye
   contact and gesticular cues, and simultaneously providing a spatial
   audio sound stage that is consistent with the displayed video.

   The receiving device that decides how to display incoming
   information needs to understand a number of variables, such as the
   spatial position of the speaker; the field of view of the cameras;
   the camera zoom; which media stream is related to each of the
   displays; etc.  It is not simply that individual streams must be
   adequately described (to a large extent this already exists), but
   rather that the semantics of the relationships between the streams
   must be communicated.  Note that all of this is still required even
   if the basic aspects of the streams, such as the bit rate, frame
   rate, and aspect ratio, are known.  Thus, this problem has aspects
   considerably beyond those encountered in interoperation of single-
   node video conferencing units.
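
   To make the distinction concrete, the following sketch shows the
   kind of per-stream relationship information the preceding paragraph
   calls for, beyond the codec-level parameters that can already be
   described.  This is purely illustrative; the field names are
   hypothetical and do not correspond to any defined protocol element.

      from dataclasses import dataclass

      @dataclass
      class StreamSemantics:
          # What a receiver needs beyond bit rate / frame rate / aspect
          # ratio in order to render the stream coherently.
          media: str          # "video" or "audio"
          role: str           # "participant" or "presentation"
          spatial_index: int  # 0 = leftmost capture, as seen from the
                              # sending room's own point of view
          field_of_view_deg: float = 0.0  # video only
          zoom: float = 1.0               # video only

      # A receiver can then relate streams to displays, e.g., sort the
      # participant video streams by spatial_index before placement.
      streams = [StreamSemantics("video", "participant", i, 40.0)
                 for i in range(3)]
      ordered = sorted(streams, key=lambda s: s.spatial_index)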

3.  Use Case Scenarios

   Our development of use cases is staged, initially focusing on what
   is currently typical and important.  Use cases that add future or
   more specialized features will be added later as needed.  Also,
   there are a number of possible variants for these use cases; for
   example, the audio supported may differ at the end points (such as
   mono or stereo versus surround sound).

   The use cases here are intended to be hierarchical, in that the
   earlier use cases describe basics of telepresence that will also be
   used by later use cases.

   Many of these systems offer a full conference room solution, where
   local participants sit on one side of a table and remote
   participants are displayed as if they are sitting on the other side
   of the table.  The cameras and screens are typically arranged to
   provide a panoramic (left to right from the local user's viewpoint)
   view of the remote room.

   The sense of immersion and non-verbal communication is fostered by a
   number of technical features, such as:

   1.  Good eye contact, which is achieved by careful placement of
       participants, cameras and screens.

   2.  Camera field of view and screen sizes are matched so that the
       images of the remote room appear to be full size.

   3.  The left side of each room is presented on the right display at
       the far end; similarly, the right side of the room is presented
       on the left display.  The effect of this is that participants of
       each site appear to be sitting across the table from each other.
       If two participants on the same site glance at each other, all
       participants can observe it.  Likewise, if a participant on one
       site gestures to a participant on the other site, all
       participants observe the gesture itself and the participants it
       includes.

3.1.  Point to point meeting: symmetric

   In this case each of the two sites has an identical number of
   screens, with cameras having fixed fields of view, and one camera
   for each screen.  The sound type is the same at each end.  As an
   example, there could be 3 cameras and 3 screens in each room, with
   stereo sound being sent and received at each end.

   The important thing here is that each of the 2 sites has the same
   number of screens.  Each screen is paired with a corresponding
   camera.  Each camera / screen pair is typically connected to a
   separate codec, producing a video encoded stream for transmission to
   the remote site, and receiving a similarly encoded stream from the
   remote site.

   Each system has one or multiple microphones for capturing audio.  In
   some cases, stereophonic microphones are employed.  In other
   systems, a microphone may be placed in front of each participant (or
   pair of participants).  In typical systems all the microphones are
   connected to a single codec that sends and receives the audio
   streams as either stereo or surround sound.  The number of
   microphones and the number of audio channels are often not the same
   as the number of cameras.  Also, the number of microphones is often
   not the same as the number of loudspeakers.

   The audio may be transmitted as multi-channel (stereo/surround
   sound) or as distinct and separate monophonic streams.  Audio levels
   should be matched, so the sound levels at both sites are identical.
   Loudspeaker and microphone placements are chosen so that the sound
   "stage" (orientation of apparent audio sources) is coordinated with
   the video.  That is, if a participant at one site speaks, the
   participants at the remote site perceive her voice as originating
   from her visual image.  In order to accomplish this, the audio needs
   to be mapped at the receiving site in the same fashion as the video.
   That is, audio received from the right side of the room needs to be
   output from loudspeaker(s) on the left side at the remote site, and
   vice versa.
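
   As a minimal sketch of the left/right mapping just described
   (assuming audio channels are simply numbered from left to right in
   the sending room; nothing here is mandated by this document), the
   channel reversal can be expressed as:

      def map_audio_channels(num_channels):
          # Channel 0 is the leftmost capture in the sending room.
          # Audio captured on the right is played out on the left at
          # the receiving site, and vice versa, so the sound stage
          # matches the mirrored video arrangement described above.
          return {src: num_channels - 1 - src
                  for src in range(num_channels)}

      # Stereo example: {0: 1, 1: 0} - left and right are swapped.
      print(map_audio_channels(2))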

3.2.  Point to point meeting: asymmetric

   In this case, each site has a different number of screens and
   cameras than the other site.  The important characteristic of this
   scenario is that the number of displays is different between the two
   sites.  This creates challenges which are handled differently by
   different telepresence systems.

   This use case builds on the basic scenario of 3 screens to 3
   screens.  Here, we use the common case of 3 screens and 3 cameras at
   one site, and 1 screen and 1 camera at the other site, connected by
   a point to point call.  The display sizes and camera fields of view
   at both sites are basically similar, such that each camera view is
   designed to show two people sitting side by side.  Thus the 1 screen
   room has up to 2 people seated at the table, while the 3 screen room
   may have up to 6 people at the table.

   The basic considerations of defining left and right and indicating
   relative placement of the multiple audio and video streams are the
   same as in the 3-3 use case.  However, handling the mismatch between
   the two sites in the number of displays and cameras requires more
   complicated manoeuvres.

   For the video sent from the 1 camera room to the 3 screen room,
   usually what is done is to simply use 1 of the 3 displays and keep
   the second and third displays inactive or, for example, put up the
   date.  This would maintain the "full size" image of the remote side.

   For the other direction, the 3 camera room sending video to the 1
   screen room, there are more complicated variations to consider.
   Here are several possible ways in which the video streams can be
   handled.

   1.  The 1 screen system might simply show only 1 of the 3 camera
       images, since the receiving side has only 1 screen.  Two people
       are seen at full size, but 4 people are not seen at all.  The
       choice of which 1 of the 3 streams to display could be fixed, or
       could be selected by the users.  It could also be made
       automatically based on who is speaking in the 3 screen room,
       such that the people in the 1 screen room always see the person
       who is speaking.  If the automatic selection is done at the
       sender, the transmission of streams that are not displayed could
       be suppressed, which would avoid wasting bandwidth.

   2.  The 1 screen system might be capable of receiving and decoding
       all 3 streams from all 3 cameras.  The 1 screen system could
       then compose the 3 streams into 1 local image for display on the
       single screen.  All six people would be seen, but smaller than
       full size.  This could be done in conjunction with reducing the
       image resolution of the streams, such that encode/decode
       resources and bandwidth are not wasted on streams that will be
       downsized for display anyway.

   3.  The 3 screen system might be capable of including all 6 people
       in a single stream to send to the 1 screen system.  For example,
       it could use PTZ (Pan Tilt Zoom) cameras to physically adjust
       the cameras such that 1 camera captures the whole room of six
       people.  Or it could recompose the 3 camera images into 1
       encoded stream to send to the remote site.  These variations
       also show all six people, but at a reduced size.

   4.  Or, there could be a combination of these approaches, such as
       simultaneously showing the speaker at full size with a composite
       of all 6 participants at a smaller size.

   The receiving telepresence system needs to have information about
   the content of the streams it receives to make any of these
   decisions.

   If the systems are capable of supporting more than one strategy,
   there needs to be some negotiation between the two sites to figure
   out which of the possible variations they will use in a specific
   point to point call.
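
   The following toy sketch (hypothetical names, not a protocol
   definition) illustrates what such a negotiation amounts to: each
   side enumerates the strategies above that it supports, and the two
   sides settle on one of them for the call.

      from enum import Enum, auto

      class DownmapStrategy(Enum):
          # The four options listed above for showing a 3-camera room
          # on a single screen.
          SWITCHED_SINGLE = auto()         # option 1
          RECEIVER_COMPOSE = auto()        # option 2
          SENDER_COMPOSE = auto()          # option 3
          SPEAKER_PLUS_COMPOSITE = auto()  # option 4

      def choose_strategy(offered, supported):
          # Pick the first strategy, in the offerer's preference order,
          # that the answerer also supports.
          for strategy in offered:
              if strategy in supported:
                  return strategy
          return DownmapStrategy.SWITCHED_SINGLE  # minimal fallback

      call_strategy = choose_strategy(
          [DownmapStrategy.SPEAKER_PLUS_COMPOSITE,
           DownmapStrategy.RECEIVER_COMPOSE],
          {DownmapStrategy.RECEIVER_COMPOSE,
           DownmapStrategy.SWITCHED_SINGLE})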

3.3.  Multipoint meeting

   In a multipoint telepresence conference, there are more than two
   sites participating.  Additional complexity is required to enable
   media streams from each participant to show up on the displays of
   the other participants.

   Clearly, there are a great number of topologies that can be used to
   display the streams from multiple sites participating in a
   conference.

   One major objective for telepresence is to be able to preserve the
   "being there" user experience.  However, in multi-site conferences
   it is often (in fact usually) not possible to simultaneously provide
   full size video, eye contact, and common perception of gestures and
   gaze by all participants.  Several policies can be used for stream
   distribution and display: all provide good results, but they all
   make different compromises.

   One common policy is called site switching.  Let's say the speaker
   is at site A and everyone else is at a "remote" site.  When the room
   at site A is shown, all the camera images from site A are forwarded
   to the remote sites.  Therefore, at each receiving remote site, all
   the screens display camera images from site A.  This can be used to
   preserve full size image display, and also provide full visual
   context of the displayed far end, site A.  In site switching, there
   is a fixed relation between the cameras in each room and the
   displays in remote rooms.  The room or participants being shown is
   switched from time to time based on who is speaking or by manual
   control, e.g., from site A to site B.

   Segment switching is another policy choice.  Still using site A as
   where the speaker is, and "remote" to refer to all the other sites,
   in segment switching, rather than sending all the images from site
   A, only the speaker at site A is shown.  The camera images of the
   current speaker and previous speakers (if any) are forwarded to the
   other sites in the conference.  Therefore the screens in each site
   are usually displaying images from different remote sites - the
   current speaker at site A and the previous ones.  This strategy can
   be used to preserve full size image display, and also capture the
   non-verbal communication between the speakers.  In segment
   switching, the display depends on the activity in the remote rooms
   (generally, but not necessarily, based on audio / speech detection).

   A third possibility is to reduce the image size so that multiple
   camera views can be composited onto one or more screens.  This does
   not preserve full size image display, but provides the most visual
   context (since more sites or segments can be seen).  Typically in
   this case the display mapping is static, i.e., each part of each
   room is shown in the same location on the display screens throughout
   the conference.

   Other policies and combinations are also possible.  For example,
   there can be a static display of all screens from all remote rooms,
   with part or all of one screen being used to show the current
   speaker at full size.
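
   A small sketch of the difference between the first two policies (the
   data model here - site names mapped to lists of camera stream ids -
   is invented purely for illustration):

      from enum import Enum, auto

      class SwitchPolicy(Enum):
          SITE = auto()     # forward all cameras of the speaker's site
          SEGMENT = auto()  # forward only the speaker's own segment

      def streams_to_forward(policy, sites, speaker_site,
                             speaker_camera, previous_speakers):
          # sites: dict mapping site name -> list of camera stream ids
          if policy is SwitchPolicy.SITE:
              # Site switching: every camera image from the speaker's
              # site is forwarded to the remote sites.
              return list(sites[speaker_site])
          # Segment switching: the current speaker's camera plus the
          # cameras that captured the previous speakers, if any.
          return [speaker_camera] + previous_speakers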

3.4.  Presentation

   In addition to the video and audio streams showing the participants,
   additional streams are used for presentations.

   In systems available today, generally only one additional video
   stream is available for presentations.  Often this presentation
   stream is half-duplex in nature, with presenters taking turns.  The
   presentation stream may be captured from a PC screen, or it may come
   from a multimedia source such as a document camera, camcorder or a
   DVD.  In a multipoint meeting, the presentation streams for the
   currently active presentation are always distributed to all sites in
   the meeting, so that the presentations are viewed by all.

   Some systems display the presentation streams on a screen that is
   mounted either above or below the three participant screens.  Other
   systems provide monitors on the conference table for observing
   presentations.  If multiple presentation monitors are used, they
   generally display identical content.  There is considerable
   variation in the placement, number, and size of presentation
   displays.

   In some systems presentation audio is pre-mixed with the room audio.
   In others, a separate presentation audio stream is provided (if the
   presentation includes audio).

   In H.323 [ITU.H323] systems, H.239 [ITU.H239] is typically used to
   control the video presentation stream.  In SIP systems, similar
   control mechanisms can be provided using BFCP [RFC4582] to manage
   the presentation token.  These mechanisms are suitable for managing
   a single presentation stream.

   Although today's systems remain limited to a single video
   presentation stream, there are obvious uses for multiple
   presentation streams:

   1.  Frequently the meeting convener is following a meeting agenda,
       and it is useful for her to be able to show that agenda to all
       participants during the meeting.  Other participants at various
       remote sites are able to make presentations during the meeting,
       with the presenters taking turns.  The presentations and the
       agenda are both shown, either on separate displays, or perhaps
       re-scaled and shown on a single display.

   2.  A single multimedia presentation can itself include multiple
       video streams that should be shown together.  For instance, a
       presenter may be discussing the fairness of media coverage.  In
       addition to slides which support the presenter's conclusions,
       she also has video excerpts from various news programs which she
       shows to illustrate her findings.  She uses a DVD player for the
       video excerpts so that she can pause and reposition the video as
       needed.

   3.  An educator who is presenting a multi-screen slide show.  This
       show requires that the placement of the images on the multiple
       displays at each site be consistent.

   There are many other examples where multiple presentation streams
   are useful.

3.5.  Heterogeneous Systems

   It is common in meeting scenarios for people to join the conference
   from a variety of environments, using different types of endpoint
   devices.  A multi-screen immersive telepresence conference may
   include someone on a PC-based video conferencing system, a
   participant calling in by phone, and (soon) someone on a handheld
   device.

   What experience/view will each of these devices have?

   Some may be able to handle multiple streams, and others can handle
   only a single stream.  (We are not here talking about legacy
   systems, but rather systems built to participate in such a
   conference, although they are single-stream only.)  In a single
   video stream, the stream may contain one or more compositions
   depending on the available screen space on the device.  In most
   cases an intermediate transcoding device will be relied upon to
   produce a single stream, perhaps with some kind of continuous
   presence.

   Bit rates will vary - the handheld and phone having lower bit rates
   than PC and multi-screen systems.

   Layout is accomplished according to different policies.  For
   example, a handheld and PC may receive the active speaker stream.
   The decision can either be made explicitly by the receiver, or by
   the sender if it can receive some kind of rendering hint.  The same
   is true for audio: the device receives either a mixed stream, or a
   number of the loudest speakers if mixing is not available in the
   network.

   For the PC based conferencing participant, the user's experience
   depends on the application.  It could be single stream, similar to a
   handheld but with a bigger screen.  Or, it could be multiple
   streams, similar to an immersive telepresence system but with a
   smaller screen.  Control for manipulation of streams can be local in
   the software application, or in another location and sent to the
   application over the network.

   The handheld device is the most extreme.  How will that participant
   be viewed and heard?  It should be an equal participant, though the
   bandwidth will be significantly less than an immersive system.  A
   receiver may choose to display output coming from a handheld
   differently based on the resolution, but that would be the case with
   any low resolution video stream, e.g., from a powerful PC on a bad
   network.

   The handheld will send and receive a single video stream, which
   could be a composite or a subset of the conference.  The handheld
   could say what it wants or could accept whatever the sender
   (conference server or sending endpoint) thinks is best.  The
   handheld will have to signal any actions it wants to take the same
   way that an immersive system signals actions.
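
   One way to picture the information a heterogeneous endpoint would
   have to make known is the following sketch; the structure and its
   fields are invented here for illustration and are not defined by
   this document.

      from dataclasses import dataclass

      @dataclass
      class ReceiverCapabilities:
          max_video_streams: int     # 1 for a phone or handheld,
                                     # 3 or more for an immersive room
          max_bitrate_kbps: int
          can_compose_locally: bool  # can the device lay out several
                                     # decoded streams by itself?
          accepts_rendering_hints: bool

      def needs_transcoder(cap, offered_streams):
          # A single-stream receiver that cannot compose locally relies
          # on an intermediate device to produce one stream, perhaps
          # with some kind of continuous presence.
          return (offered_streams > cap.max_video_streams
                  and not cap.can_compose_locally)

      handheld = ReceiverCapabilities(1, 512, False, True)
      print(needs_transcoder(handheld, 3))  # True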

3.6.  Multipoint Education Usage

   The importance of this example is that the multiple video streams
   are not used to create an immersive conferencing experience with
   panoramic views at all the sites.  Instead, the multiple streams are
   dynamically used to enable full participation of remote students in
   a university class.  In some instances the same video stream is
   displayed on multiple displays in the room; in other instances an
   available stream is not displayed at all.

   The main site is a university auditorium which is equipped with
   three cameras.  One camera is focused on the professor at the
   podium.  A second camera is mounted on the wall behind the professor
   and captures the class in its entirety.  The third camera is co-
   located with the second, and is designed to capture a close up view
   of a questioner in the audience.  It automatically zooms in on that
   student using sound localization.

   Although the auditorium is equipped with three cameras, it is only
   equipped with two screens.  One is a large screen located at the
   front so that the class can see it.  The other is located at the
   rear so the professor can see it.  When someone asks a question, the
   front screen shows the questioner.  Otherwise it shows the professor
   (ensuring everyone can easily see her).

   The remote sites are typical immersive telepresence rooms with three
   camera/screen pairs.

   All remote sites display the professor on the center screen at full
   size.  A second screen shows the entire classroom view when the
   professor is speaking.  However, when a student asks a question, the
   second screen shows the close up view of the student at full size.
   Sometimes the student is in the auditorium; sometimes the speaking
   student is at another remote site.  The remote systems never display
   the students that are actually in that room.

   If someone at a remote site asks a question, then the screen in the
   auditorium will show the remote student at full size (as if they
   were present in the auditorium itself).  The display in the rear
   also shows this questioner, allowing the professor to see and
   respond to the student without needing to turn her back on the main
   class.

   When no one is asking a question, the screen in the rear briefly
   shows a full-room view of each remote site in turn, allowing the
   professor to monitor the entire class (remote and local students).
   The professor can also use a control on the podium to see a
   particular site - she can choose either a full-room view or a single
   camera view.

   Realization of this use case does not require any negotiation
   between the participating sites.  Endpoint devices (and an MCU if
   present) need to know who is speaking and what video stream includes
   the view of that speaker.
   The remote systems need some knowledge of which stream should be
   placed in the center.  The ability of the professor to see specific
   sites (or for the system to show all the sites in turn) would also
   require the auditorium system to know what sites are available, and
   to be able to request a particular view of any site.  Bandwidth is
   optimized if video that is not being shown at a particular site is
   not distributed to that site.

3.7.  Multipoint Multiview (Virtual space)

   This use case describes a virtual space multipoint meeting with good
   eye contact and spatial layout of participants.  The use case was
   proposed very early in the development of video conferencing
   systems, as described in 1983 by Allardyce and Randall
   [virtualspace]; the term "virtual space" comes from their report,
   and the use case is illustrated in figure 2-5 of that report.  The
   virtual space expands the point to point case by having all
   multipoint conference participants "seated" in a virtual room.  In
   this case each participant has a fixed "seat" in the virtual room,
   so each participant expects to see a different view, having a
   different participant on his left and right side.  Today, the use
   case is implemented in multiple telepresence-type video conferencing
   systems on the market.  The main difference between the results
   obtained with modern systems and those from 1983 is larger display
   sizes.

   Virtual space multipoint as defined here assumes endpoints with
   multiple cameras and displays.  Usually there is the same number of
   cameras and displays at a given endpoint.  A camera is positioned
   above each display.  A key aspect of virtual space multipoint is the
   details of how the cameras are aimed.  The cameras are each aimed at
   the same area of view of the participants at the site.  Thus each
   camera takes a picture of the same set of people but from a
   different angle.  Each endpoint sender in the virtual space
   multipoint meeting therefore offers a choice of video streams to
   remote receivers, each stream representing a different viewpoint.
   For example, a camera positioned above a display to a participant's
   left may take video pictures of the participant's left ear while, at
   the same time, a camera positioned above a display to the
   participant's right may take video pictures of the participant's
   right ear.

   Since a sending endpoint has a camera associated with each display,
   an association is made between the receiving stream output on a
   particular display and the corresponding sending stream from the
   camera associated with that display.  These associations are
   repeated for each display/camera pair in a meeting.  The result of
   this system is a horizontal arrangement of video images from remote
   sites, one per display.  The image on each display is paired with
   the camera output from the camera above that display, resulting in
   excellent eye contact.
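
   A minimal sketch of that display/camera association (stream and
   display indices are hypothetical; both sites are assumed to have the
   same number of positions, numbered left to right):

      def virtual_space_associations(num_positions):
          # For each display d, the view shown on d is answered by the
          # stream captured by the camera mounted above display d.
          # Because every camera sees the whole room from a different
          # angle, selecting views this way lines up gaze directions
          # and preserves eye contact.
          return [{"display": d,
                   "receive_view": d,   # remote viewpoint shown here
                   "send_camera": d}    # local camera above display d
                  for d in range(num_positions)]

      for pair in virtual_space_associations(3):
          print(pair)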

3.8.  Multiple presentation streams - Telemedicine

   This use case describes a scenario where multiple presentation
   streams are used.  In this use case, the local site is a surgery
   room connected to one or more remote sites that may have different
   capabilities.  At the local site, three main cameras capture the
   whole room (the typical 3-camera telepresence case).  Also, multiple
   presentation inputs are available: a surgery camera which is used to
   provide a zoomed view of the operation, an endoscopic monitor, an
   X-ray CT image output device, a B-ultrasonic apparatus, a cardiogram
   generator, an MRI image instrument, etc.  These devices are used to
   provide multiple local video presentation streams to help the
   surgeon monitor the status of the patient and assist the process of
   the surgery.

   The local site may have three main screens and one (or more)
   presentation screen(s).  The main screens can be used to display the
   remote experts.  The presentation screen(s) can be used to display
   multiple presentation streams from local and remote sites
   simultaneously.  The three main cameras capture different parts of
   the surgery room.  The surgeon can decide the number, the size and
   the placement of the presentations displayed on the local
   presentation screen(s).  He can also indicate which local
   presentation captures are provided for the remote sites.  The local
   site can send multiple presentation captures to remote sites, and it
   can receive multiple presentations related to the patient or the
   procedure from them.

   One type of remote site is a single or dual screen and one camera
   system used by a consulting expert.  In the general case the remote
   sites can be part of a multipoint telepresence conference.  The
   presentation screens at the remote sites allow the experts to see
   the details of the operation and related data.  Like the main site,
   the experts can decide the number, the size and the placement of the
   presentations displayed on their presentation screens.  The
   presentation screens can display presentation streams from the
   surgery room or from other remote sites, and also local presentation
   streams.  Thus the experts can also start sending presentation
   streams, which can carry medical records, pathology data, or their
   reference and analysis, etc.

   Another type of remote site is a typical immersive telepresence room
   with three camera/screen pairs, allowing more experts to join the
   consultation.  These sites can also be used for education.  The
   teacher, who is not necessarily the surgeon, and the students are in
   different remote sites.  Students can observe and learn the details
   of the whole procedure, while the teacher can explain and answer
   questions during the operation.

   All remote education sites can display the surgery room.  Another
   option is to display the surgery room on the center screen, and the
   rest of the screens can show the teacher and the student who is
   asking a question.  For all the above sites, multiple presentation
   screens can be used to enhance visibility: one screen for the zoomed
   surgery stream and the others for medical image streams, such as MRI
   images, the cardiogram, B-ultrasonic images and pathology data.

4.  Acknowledgements

   The draft has benefitted from input from a number of people
   including Alex Eleftheriadis, Tommy Andre Nyquist, Mark Gorzynski,
   Charles Eckel, Nermeen Ismail, Mary Barnes, Pascal Buhler, and Jim
   Cole.

   Special acknowledgement to Lennard Xiao, who contributed the text
   for the telemedicine use case.

5.  IANA Considerations

   This document contains no IANA considerations.

6.  Security Considerations

   While there are likely to be security considerations for any
   solution for telepresence interoperability, this document has no
   security considerations.

7.  Informative References

   [ITU.H239]     "Role management and additional media channels for
                  H.300-series terminals", ITU-T Recommendation H.239,
                  September 2005.

   [ITU.H323]     "Packet-based Multimedia Communications Systems",
                  ITU-T Recommendation H.323, December 2009.

   [RFC3261]      Rosenberg, J., Schulzrinne, H., Camarillo, G.,
                  Johnston, A., Peterson, J., Sparks, R., Handley, M.,
                  and E. Schooler, "SIP: Session Initiation Protocol",
                  RFC 3261, June 2002.

   [RFC3550]      Schulzrinne, H., Casner, S., Frederick, R., and V.
                  Jacobson, "RTP: A Transport Protocol for Real-Time
                  Applications", STD 64, RFC 3550, July 2003.

   [RFC4582]      Camarillo, G., Ott, J., and K. Drage, "The Binary
                  Floor Control Protocol (BFCP)", RFC 4582,
                  November 2006.

   [virtualspace] Allardyce and Randall, "Development of
                  Teleconferencing Methodologies With Emphasis on
                  Virtual Space Video and Interactive Graphics", 1983.

Authors' Addresses

   Allyn Romanow
   Cisco
   San Jose, CA  95134
   US

   Email: allyn@cisco.com

   Stephen Botzko
   Polycom
   Andover, MA  01810
   US

   Email: stephen.botzko@polycom.com

   Mark Duckworth
   Polycom
   Andover, MA  01810
   US

   Email: mark.duckworth@polycom.com

   Roni Even (editor)
   Huawei Technologies
   Tel Aviv
   Israel

   Email: roni.even@mail01.huawei.com

   Marshall Eubanks
   Iformata Communications
   Dayton, Ohio  45402
   US

   Email: marshall.eubanks@ilformata.com