CLUE WG                                                       A. Romanow
Internet-Draft                                                     Cisco
Intended status: Informational                                 S. Botzko
Expires: January 10, 2012                                   M. Duckworth
                                                                 Polycom
                                                            R. Even, Ed.
                                                     Huawei Technologies
                                                              T. Eubanks
                                                 Iformata Communications
                                                            July 9, 2011

              Use Cases for Telepresence Multi-streams
              draft-ietf-clue-telepresence-use-cases-01.txt

Abstract

   Telepresence conferencing systems seek to create the sense of really
   being present.  A number of techniques for handling audio and video
   streams are used to create this experience.  When these techniques
   are not similar, interoperability between different systems is
   difficult at best, and often not possible.  Conveying information
   about the relationships between multiple streams of media would
   enable senders and receivers to make choices that allow telepresence
   systems to interwork.  This memo describes the most typical and
   important use cases for sending multiple streams in a telepresence
   conference.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 10, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Telepresence Scenarios Overview
   3.  Use Case Scenarios
     3.1.  Point to point meeting: symmetric
     3.2.  Point to point meeting: asymmetric
     3.3.  Multipoint meeting
     3.4.  Presentation
     3.5.  Heterogeneous Systems
     3.6.  Multipoint Education Usage
     3.7.  Multipoint Multiview (Virtual space)
   4.  Acknowledgements
   5.  IANA Considerations
   6.  Security Considerations
   7.  Informative References
   Authors' Addresses

1.  Introduction

   Telepresence applications try to provide a "being there" experience
   for conversational video conferencing.  Often this telepresence
   application is described as "immersive telepresence" in order to
   distinguish it from traditional video conferencing, and from other
   forms of remote presence not related to conversational video
   conferencing, such as avatars and robots.  The salient
   characteristics of telepresence are often described as: full-sized,
   immersive video, preserving interpersonal interaction, and allowing
   non-verbal communication.

   Although telepresence systems are based on open standards such as
   RTP [RFC3550], SIP [RFC3261], H.264, and the H.323 suite of
   protocols, they cannot easily interoperate with each other without
   operator assistance and expensive additional equipment that
   translates from one vendor's formats to another's.  A standard way
   of describing the multiple streams constituting the media flows, and
   the fundamental aspects of their behavior, would allow telepresence
   systems to interwork.

   This draft presents a set of use cases describing typical scenarios.
   Requirements will be derived from these use cases in a separate
   document.  The use cases are described from the viewpoint of the
   users.  They are illustrative of the user experience that needs to
   be supported.  It is possible to implement these use cases in a
   variety of different ways.

   Many different scenarios need to be supported.  Our strategy in this
   document is to describe in detail the most common and basic use
   cases, which will cover most of the requirements.  Additional
   scenarios that bring new features and requirements will be added as
   needed.

   We look at telepresence conferences that are point-to-point and
   multipoint.  In some settings the number of displays is the same at
   all sites; in others it differs from site to site.  Both cases are
   considered.  Also included is a use case describing display of
   presentation or content.

   The document structure is as follows: Section 2 gives an overview of
   the scenarios, and Section 3 describes the use cases.

2.  Telepresence Scenarios Overview
   This section describes the general characteristics of the use cases
   and what the scenarios are intended to show.  The typical setting is
   a business conference, which was the initial focus of telepresence.
   Recently, consumer products have also been developed.  We
   specifically do not include in our scenarios the infrastructure
   aspects of telepresence, such as room construction, layout and
   decoration.

   Telepresence systems are typically composed of one or more video
   cameras and encoders, and one or more display monitors of large size
   (around 60").  Microphones pick up sound, and audio codec(s) produce
   one or more audio streams.  We will call the cameras used to present
   the telepresence users "participant cameras" (and likewise for
   displays).  There may also be other cameras, such as for document
   display.  These will be referred to as presentation or content
   cameras; they generally have different formats, aspect ratios, and
   frame rates from the participant cameras.  The presentation videos
   may be shown on the participant screens or on auxiliary display
   screens.  A user's computer may also serve as a virtual content
   camera, generating an animation or playing back a video for display
   to the remote participants.

   We describe such a telepresence system as sending M video streams, N
   audio streams, and D content streams to the remote system(s).  (Note
   that the number of audio streams is generally not the same as the
   number of video streams.)

   The fundamental parameters describing today's typical telepresence
   scenario include:

   1.   The number of participating sites

   2.   The number of visible seats at a site

   3.   The number of cameras

   4.   The number of audio channels

   5.   The screen size

   6.   The display capabilities - such as resolution, frame rate,
        aspect ratio

   7.   The arrangement of the displays in relation to each other

   8.   Similar or dissimilar number of primary screens at all sites

   9.   Type and number of presentation displays

   10.  Multipoint conference display strategies - for example, the
        camera-to-display mappings may be static or dynamic

   11.  The camera viewpoint

   12.  The cameras' fields of view and how they do or do not overlap

   The basic features that give telepresence its distinctive
   characteristics are implemented in disparate ways in different
   systems.  Currently, telepresence systems from diverse vendors
   interoperate to some extent, but this is not supported in a
   standards-based fashion.  Interworking requires that translation and
   transcoding devices be included in the architecture.  Such devices
   increase latency, reducing the quality of interpersonal interaction.
   Use of these devices is often not automatic; it frequently requires
   substantial manual configuration and a detailed understanding of the
   nature of the underlying audio and video streams.  This state of
   affairs is not acceptable for the continued growth of telepresence -
   we believe telepresence systems should have the same ease of
   interoperability as telephones.

   There is no agreed-upon way to adequately describe the semantics of
   how streams of various media types relate to each other.  Without a
   standard for stream semantics to describe the particular roles and
   activities of each stream in the conference, interoperability is
   cumbersome at best.
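   To make the preceding parameters concrete, the following Python
   fragment sketches the kind of machine-readable, per-site description
   that a standard for stream semantics might carry.  It is purely
   illustrative: the class and field names (StreamDescription,
   SiteDescription, spatial_index, and so on) are hypothetical and do
   not correspond to any defined protocol element.

      from dataclasses import dataclass, field
      from typing import List

      @dataclass
      class StreamDescription:
          # Hypothetical per-stream attributes; names are illustrative.
          media_type: str           # "video", "audio", or "content"
          spatial_index: int        # left-to-right position in the stage
          field_of_view_deg: float  # horizontal field of view of a camera
          role: str                 # "participant" or "presentation"

      @dataclass
      class SiteDescription:
          # A site sends M video, N audio, and D content streams; as
          # noted above, M, N, and D are generally not equal.
          video: List[StreamDescription] = field(default_factory=list)
          audio: List[StreamDescription] = field(default_factory=list)
          content: List[StreamDescription] = field(default_factory=list)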
   In a multiple screen conference, the video and audio streams sent
   from remote participants must be understood by receivers so that
   they can be presented in a coherent and life-like manner.  This
   includes the ability to present remote participants at their true
   size for their apparent distance, while maintaining correct eye
   contact and gestural cues, and simultaneously providing a spatial
   audio sound stage that is consistent with the video presentation.

   The receiving device that decides how to display incoming
   information needs to understand a number of variables, such as the
   spatial position of the speaker, the field of view and zoom of the
   cameras, and which media stream is related to which display.  It is
   not simply that individual streams must be adequately described (to
   a large extent this already exists), but rather that the semantics
   of the relationships between the streams must be communicated.  Note
   that all of this is still required even if the basic aspects of the
   streams, such as the bit rate, frame rate, and aspect ratio, are
   known.  Thus, this problem has aspects considerably beyond those
   encountered in the interoperation of single-node video conferencing
   units.

3.  Use Case Scenarios

   Our development of use cases is staged, initially focusing on what
   is currently typical and important.  Use cases that add future or
   more specialized features will be added later as needed.  Also,
   there are a number of possible variants for these use cases; for
   example, the audio supported may differ at the endpoints (such as
   mono or stereo versus surround sound).

   The use cases here are intended to be hierarchical, in that the
   earlier use cases describe basics of telepresence that will also be
   used by later use cases.

   Many of these systems offer a full conference room solution, where
   local participants sit on one side of a table and remote
   participants are displayed as if they are sitting on the other side
   of the table.  The cameras and screens are typically arranged to
   provide a panoramic (left to right from the local user's viewpoint)
   view of the remote room.

   The sense of immersion and non-verbal communication is fostered by a
   number of technical features, such as:

   1.  Good eye contact, which is achieved by careful placement of
       participants, cameras and screens.

   2.  Camera field of view and screen sizes are matched so that the
       images of the remote room appear to be full size.

   3.  The left side of each room is presented on the right display at
       the far end; similarly, the right side of the room is presented
       on the left display.  The effect of this is that participants at
       each site appear to be sitting across the table from each other.
       If two participants on the same site glance at each other, all
       participants can observe it.  Likewise, if a participant on one
       site gestures to a participant on the other site, all
       participants observe the gesture itself and the participants it
       includes.

3.1.  Point to point meeting: symmetric

   In this case each of the two sites has an identical number of
   screens, with cameras having fixed fields of view, and one camera
   for each screen.  The sound type is the same at each end.  As an
   example, there could be 3 cameras and 3 screens in each room, with
   stereo sound being sent and received at each end.
   The important thing here is that each of the 2 sites has the same
   number of screens.  Each screen is paired with a corresponding
   camera.  Each camera/screen pair is typically connected to a
   separate codec, producing an encoded video stream for transmission
   to the remote site, and receiving a similarly encoded stream from
   the remote site.

   Each system has one or multiple microphones for capturing audio.  In
   some cases, stereophonic microphones are employed.  In other
   systems, a microphone may be placed in front of each participant (or
   pair of participants).  In typical systems all the microphones are
   connected to a single codec that sends and receives the audio
   streams as either stereo or surround sound.  The number of
   microphones and the number of audio channels are often not the same
   as the number of cameras.  Also, the number of microphones is often
   not the same as the number of loudspeakers.

   The audio may be transmitted as multi-channel (stereo/surround
   sound) or as distinct and separate monophonic streams.  Audio levels
   should be matched, so that the sound levels at both sites are
   identical.  Loudspeaker and microphone placements are chosen so that
   the sound "stage" (orientation of apparent audio sources) is
   coordinated with the video.  That is, if a participant at one site
   speaks, the participants at the remote site perceive her voice as
   originating from her visual image.  In order to accomplish this, the
   audio needs to be mapped at the receiving site in the same fashion
   as the video.  That is, audio received from the right side of the
   room needs to be output from loudspeaker(s) on the left side at the
   remote site, and vice versa.
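   This mirroring of the sound stage is mechanical enough to sketch in
   code.  The following Python fragment is a minimal illustration,
   assuming channels are numbered left to right using the hypothetical
   spatial_index convention from the earlier sketch; it is not a
   defined mapping, just one consistent realization of the behavior
   described above.

      def loudspeaker_for(received_index: int, num_channels: int) -> int:
          # Mirror the sound stage: audio captured toward the far
          # site's right (highest index) plays on the near site's left
          # (lowest index), and vice versa, so that each voice appears
          # to originate from the speaker's visual image.
          return (num_channels - 1) - received_index

      # With a 3-channel sound stage:
      assert loudspeaker_for(0, 3) == 2
      assert loudspeaker_for(2, 3) == 0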
3.2.  Point to point meeting: asymmetric

   In this case, each site has a different number of screens and
   cameras than the other site.  The important characteristic of this
   scenario is that the number of displays is different between the two
   sites.  This creates challenges that are handled differently by
   different telepresence systems.

   This use case builds on the basic scenario of 3 screens to 3
   screens.  Here, we use the common case of 3 screens and 3 cameras at
   one site, and 1 screen and 1 camera at the other site, connected by
   a point to point call.  The display sizes and camera fields of view
   at both sites are basically similar, such that each camera view is
   designed to show two people sitting side by side.  Thus the 1 screen
   room has up to 2 people seated at the table, while the 3 screen room
   may have up to 6 people at the table.

   The basic considerations of defining left and right and indicating
   relative placement of the multiple audio and video streams are the
   same as in the 3-3 use case.  However, handling the mismatch in the
   number of displays and cameras between the two sites requires more
   complicated maneuvers.

   For the video sent from the 1 camera room to the 3 screen room,
   usually what is done is to simply use 1 of the 3 displays and keep
   the second and third displays inactive, or put up the date, for
   example.  This maintains the "full size" image of the remote side.

   For the other direction, the 3 camera room sending video to the 1
   screen room, there are more complicated variations to consider.
   Here are several possible ways in which the video streams can be
   handled (a sketch of choosing among them closes this section):

   1.  The 1 screen system might simply show only 1 of the 3 camera
       images, since the receiving side has only 1 screen.  Two people
       are seen at full size, but 4 people are not seen at all.  The
       choice of which 1 of the 3 streams to display could be fixed, or
       could be selected by the users.  It could also be made
       automatically based on who is speaking in the 3 screen room,
       such that the people in the 1 screen room always see the person
       who is speaking.  If the automatic selection is done at the
       sender, the transmission of streams that are not displayed could
       be suppressed, which would avoid wasting bandwidth.

   2.  The 1 screen system might be capable of receiving and decoding
       all 3 streams from all 3 cameras.  The 1 screen system could
       then compose the 3 streams into 1 local image for display on the
       single screen.  All six people would be seen, but smaller than
       full size.  This could be done in conjunction with reducing the
       image resolution of the streams, such that encode/decode
       resources and bandwidth are not wasted on streams that will be
       downsized for display anyway.

   3.  The 3 screen system might be capable of including all 6 people
       in a single stream to send to the 1 screen system.  For example,
       it could use PTZ (Pan Tilt Zoom) cameras to physically adjust
       the cameras such that 1 camera captures the whole room of six
       people.  Or it could recompose the 3 camera images into 1
       encoded stream to send to the remote site.  These variations
       also show all six people, but at a reduced size.

   4.  There could also be a combination of these approaches, such as
       simultaneously showing the speaker at full size with a composite
       of all 6 participants at a smaller size.

   The receiving telepresence system needs to have information about
   the content of the streams it receives to make any of these
   decisions.

   If the systems are capable of supporting more than one strategy,
   there needs to be some negotiation between the two sites to figure
   out which of the possible variations they will use in a specific
   point to point call.
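   Closing this section, the following Python fragment sketches one way
   such a choice might be made once capabilities are known.  It is
   purely illustrative; the capability flags and strategy names are
   hypothetical, not negotiated protocol elements.

      def choose_asymmetric_strategy(receiver_decodes_all: bool,
                                     sender_can_compose: bool,
                                     prefer_full_size: bool) -> str:
          # Strategy 1: one full-size view, switched on the active
          # speaker; undisplayed streams can be suppressed at the
          # sender to avoid wasting bandwidth.
          if prefer_full_size:
              return "single-stream-speaker-switched"
          # Strategy 2: receive all 3 streams and compose locally,
          # ideally at reduced resolution.
          if receiver_decodes_all:
              return "receive-all-compose-locally"
          # Strategy 3: the sender re-frames (PTZ) or recomposes the
          # whole room into a single stream.
          if sender_can_compose:
              return "sender-composed-full-room"
          # Fallback: show one camera view at full size.
          return "single-stream-speaker-switched"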
3.3.  Multipoint meeting

   In a multipoint telepresence conference, there are more than two
   sites participating.  Additional complexity is required to enable
   media streams from each participant to show up on the displays of
   the other participants.

   Clearly, there are a great number of topologies that can be used to
   display the streams from multiple sites participating in a
   conference.

   One major objective for telepresence is to be able to preserve the
   "being there" user experience.  However, in multi-site conferences
   it is often (in fact usually) not possible to simultaneously provide
   full size video, eye contact, and common perception of gestures and
   gaze by all participants.  Several policies can be used for stream
   distribution and display: all provide good results, but they all
   make different compromises.

   One common policy is called site switching.  Let's say the speaker
   is at site A and everyone else is at a "remote" site.  When the room
   at site A is shown, all the camera images from site A are forwarded
   to the remote sites.  Therefore at each receiving remote site, all
   the screens display camera images from site A.  This can be used to
   preserve full size image display, and also to provide full visual
   context of the displayed far end, site A.  In site switching, there
   is a fixed relation between the cameras in each room and the
   displays in remote rooms.  The room or participants being shown is
   switched from time to time based on who is speaking or by manual
   control, e.g., from site A to site B.

   Segment switching is another policy choice.  Still using site A as
   where the speaker is, and "remote" to refer to all the other sites,
   in segment switching, rather than sending all the images from site
   A, only the speaker at site A is shown.  The camera images of the
   current speaker and previous speakers (if any) are forwarded to the
   other sites in the conference.  Therefore the screens at each site
   are usually displaying images from different remote sites - the
   current speaker at site A and the previous ones.  This strategy can
   be used to preserve full size image display, and also to capture the
   non-verbal communication between the speakers.  In segment
   switching, the display depends on the activity in the remote rooms
   (generally, but not necessarily, based on audio/speech detection).

   A third possibility is to reduce the image size so that multiple
   camera views can be composited onto one or more screens.  This does
   not preserve full size image display, but provides the most visual
   context (since more sites or segments can be seen).  Typically in
   this case the display mapping is static, i.e., each part of each
   room is shown in the same location on the display screens throughout
   the conference.

   Other policies and combinations are also possible.  For example,
   there can be a static display of all screens from all remote rooms,
   with part or all of one screen being used to show the current
   speaker at full size.
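   The difference between the first two policies can be made concrete
   with a short sketch.  The following Python fragment is illustrative
   only; the data shapes (a map from site to its list of camera
   streams, and a most-recent-first speaker history of (site, segment)
   pairs) are assumptions for the example, not protocol structures.

      def site_switch(streams: dict, active_site: str) -> list:
          # Site switching: forward every camera image from the active
          # speaker's site, preserving full visual context of that room.
          return streams[active_site]

      def segment_switch(streams: dict, speaker_history: list,
                         num_screens: int) -> list:
          # Segment switching: forward only the camera segments showing
          # the current and most recent speakers, which may come from
          # different sites.
          return [streams[site][segment]
                  for site, segment in speaker_history[:num_screens]]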
3.4.  Presentation

   In addition to the video and audio streams showing the participants,
   additional streams are used for presentations.

   In systems available today, generally only one additional video
   stream is available for presentations.  Often this presentation
   stream is half-duplex in nature, with presenters taking turns.  The
   presentation video may be captured from a PC screen, or it may come
   from a multimedia source such as a document camera, camcorder or a
   DVD.  In a multipoint meeting, the presentation streams for the
   currently active presentation are always distributed to all sites in
   the meeting, so that the presentations are viewed by all.

   Some systems display the presentation video on a screen that is
   mounted either above or below the three participant screens.  Other
   systems provide monitors on the conference table for observing
   presentations.  If multiple presentation monitors are used, they
   generally display identical content.  There is considerable
   variation in the placement, number, and size of presentation
   displays.

   In some systems presentation audio is pre-mixed with the room audio.
   In others, a separate presentation audio stream is provided (if the
   presentation includes audio).

   In H.323 systems, H.239 is typically used to control the video
   presentation stream.  In SIP systems, similar control mechanisms can
   be provided using BFCP [RFC4582] for the presentation token.  These
   mechanisms are suitable for managing a single presentation stream.

   Although today's systems remain limited to a single video
   presentation stream, there are obvious uses for multiple
   presentation streams:

   1.  Frequently the meeting convener is following a meeting agenda,
       and it is useful for her to be able to show that agenda to all
       participants during the meeting.  Other participants at various
       remote sites are able to make presentations during the meeting,
       with the presenters taking turns.  The presentations and the
       agenda are both shown, either on separate displays, or perhaps
       re-scaled and shown on a single display.

   2.  A single multimedia presentation can itself include multiple
       video streams that should be shown together.  For instance, a
       presenter may be discussing the fairness of media coverage.  In
       addition to slides that support the presenter's conclusions, she
       also has video excerpts from various news programs which she
       shows to illustrate her findings.  She uses a DVD player for the
       video excerpts so that she can pause and reposition the video as
       needed.  Another example is an educator who is presenting a
       multi-screen slide show.  This show requires that the placement
       of the images on the multiple displays at each site be
       consistent.

   There are many other examples where multiple presentation streams
   are useful.
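   Reusing the hypothetical StreamDescription and SiteDescription
   classes from the sketch in Section 2, describing the two
   simultaneous presentation sources of the first example might look as
   follows.  The role strings are illustrative only, not defined
   protocol tokens.

      site = SiteDescription()
      agenda = StreamDescription(media_type="content", spatial_index=0,
                                 field_of_view_deg=0.0,
                                 role="presentation.agenda")
      slides = StreamDescription(media_type="content", spatial_index=1,
                                 field_of_view_deg=0.0,
                                 role="presentation.slides")
      site.content.extend([agenda, slides])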
3.5.  Heterogeneous Systems

   It is common in meeting scenarios for people to join the conference
   from a variety of environments, using different types of endpoint
   devices.  A multi-screen immersive telepresence conference may
   include someone on a PC-based video conferencing system, a
   participant calling in by phone, and (soon) someone on a handheld
   device.

   What experience/view will each of these devices have?

   Some may be able to handle multiple streams, and others can handle
   only a single stream.  (We are not talking here about legacy
   systems, but rather systems built to participate in such a
   conference, although they are single stream only.)  A single video
   stream may contain one or more compositions, depending on the
   available screen space on the device.  In most cases a transcoding
   intermediate device will be relied upon to produce a single stream,
   perhaps with some kind of continuous presence.

   Bit rates will vary - the handheld and phone having lower bit rates
   than PC and multi-screen systems.

   Layout is accomplished according to different policies.  For
   example, a handheld and a PC may receive the active speaker stream.
   The decision can be made explicitly by the receiver, or by the
   sender if it can receive some kind of rendering hint.  The same is
   true for audio - i.e., the device receives a mixed stream, or a
   number of the loudest speakers if mixing is not available in the
   network.

   For the software conferencing participant, the user's experience
   depends on the application.  It could be single stream, similar to a
   handheld but with a bigger screen.  Or, it could be multiple
   streams, similar to an immersive system but with a smaller screen.
   Control for manipulation of streams can be local in the software
   application, or in another location and sent to the application over
   the network.

   The handheld device is the most extreme case.  How will that
   participant be viewed and heard?  It should be an equal participant,
   though the bandwidth will be significantly less than for an
   immersive system.  A receiver may choose to display output coming
   from a handheld differently based on the resolution, but that would
   be the case with any low resolution video stream, e.g., from a
   powerful PC on a bad network.

   The handheld will send and receive a single video stream, which
   could be a composite or a subset of the conference.  The handheld
   could say what it wants, or could accept whatever the sender
   (conference server or sending endpoint) thinks is best.  The
   handheld will have to signal any actions it wants to take the same
   way that an immersive system signals them.

3.6.  Multipoint Education Usage

   The importance of this example is that the multiple video streams
   are not used to create an immersive conferencing experience with
   panoramic views at all the sites.  Instead, the multiple streams are
   dynamically used to enable full participation of remote students in
   a university class.  In some instances the same video stream is
   displayed on multiple displays in the room; in other instances an
   available stream is not displayed at all.

   The main site is a university auditorium which is equipped with
   three cameras.  One camera is focused on the professor at the
   podium.  A second camera is mounted on the wall behind the professor
   and captures the class in its entirety.  The third camera is co-
   located with the second, and is designed to capture a close up view
   of a questioner in the audience.  It automatically zooms in on that
   student using sound localization.

   Although the auditorium is equipped with three cameras, it is only
   equipped with two screens.  One is a large screen located at the
   front so that the class can see it.  The other is located at the
   rear so the professor can see it.  When someone asks a question, the
   front screen shows the questioner.  Otherwise it shows the professor
   (ensuring everyone can easily see her).

   The remote sites are typical immersive telepresence rooms with three
   camera/screen pairs.

   All remote sites display the professor on the center screen at full
   size.  A second screen shows the entire classroom view when the
   professor is speaking.  However, when a student asks a question, the
   second screen shows the close up view of the student at full size.
   Sometimes the student is in the auditorium; sometimes the speaking
   student is at another remote site.  The remote systems never display
   the students that are actually in that room.

   If someone at a remote site asks a question, then the screen in the
   auditorium will show the remote student at full size (as if they
   were present in the auditorium itself).  The display in the rear
   also shows this questioner, allowing the professor to see and
   respond to the student without needing to turn her back on the main
   class.

   When no one is asking a question, the screen in the rear briefly
   shows a full-room view of each remote site in turn, allowing the
   professor to monitor the entire class (remote and local students).
   The professor can also use a control on the podium to see a
   particular site - she can choose either a full-room view or a single
   camera view.

   Realization of this use case does not require any negotiation
   between the participating sites.  Endpoint devices (and an MCU, if
   present) need to know who is speaking and which video stream
   includes the view of that speaker.  The remote systems need some
   knowledge of which stream should be placed in the center.  The
   ability of the professor to see specific sites (or for the system to
   show all the sites in turn) would also require the auditorium system
   to know what sites are available, and to be able to request a
   particular view of any site.  Bandwidth is optimized if video that
   is not being shown at a particular site is not distributed to that
   site.
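   The "request a particular view" capability lends itself to a short
   sketch.  The following Python fragment shows the kind of request the
   podium control might generate; the message fields and view names are
   hypothetical, not defined protocol elements.

      from typing import Optional

      def request_view(site_id: str, view: str = "full-room",
                       camera_index: Optional[int] = None) -> dict:
          # Build a view request: either the full-room view of a site,
          # or one particular camera at that site.
          msg = {"type": "view-request", "site": site_id, "view": view}
          if view == "single-camera":
              msg["camera"] = camera_index
          return msg

      # The professor's podium control:
      request_view("remote-3")                      # full-room view
      request_view("remote-3", "single-camera", 1)  # one specific camera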
3.7.  Multipoint Multiview (Virtual space)

   This use case describes a virtual space multipoint meeting with good
   eye contact and spatial layout of participants.  The use case was
   proposed very early in the development of video conferencing
   systems, as described in 1983 by Allardyce and Randall
   [virtualspace], whose report introduced the term "virtual space".
   The use case is illustrated in figure 2-5 of their report.  The
   virtual space expands the point to point case by having all
   multipoint conference participants "seated" in a virtual room.  In
   this case each participant has a fixed "seat" in the virtual room,
   so each participant expects to see a different view, having a
   different participant on his left and right side.  Today, the use
   case is implemented in multiple telepresence-type video conferencing
   systems on the market.  The main difference between the results
   obtained with modern systems and those from 1983 is the larger
   display sizes.

   Virtual space multipoint as defined here assumes endpoints with
   multiple cameras and displays.  Usually there are the same number of
   cameras and displays at a given endpoint.  A camera is positioned
   above each display.  A key aspect of virtual space multipoint is the
   details of how the cameras are aimed.  The cameras are each aimed at
   the same area of view of the participants at the site.  Thus each
   camera takes a picture of the same set of people, but from a
   different angle.  Each endpoint sender in the virtual space
   multipoint meeting therefore offers a choice of video streams to
   remote receivers, each stream representing a different viewpoint.
   For example, a camera positioned above a display to a participant's
   left may take video pictures of the participant's left ear, while at
   the same time a camera positioned above a display to the
   participant's right may take video pictures of the participant's
   right ear.

   Since a sending endpoint has a camera associated with each display,
   an association is made between the receiving stream output on a
   particular display and the corresponding sending stream from the
   camera associated with that display.  These associations are
   repeated for each display/camera pair in a meeting.  The result of
   this system is a horizontal arrangement of video images from remote
   sites, one per display.  The image on each display is paired with
   the camera output from the camera above that display, resulting in
   excellent eye contact.
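   A minimal sketch of this view selection follows.  It assumes a
   hypothetical convention in which seats in the virtual room are
   numbered left to right and each room shows the other seats in seat
   order; neither the convention nor the function is a defined
   mechanism, just one consistent realization of the pairing described
   above.

      def select_view(my_seat: int, remote_seat: int,
                      num_seats: int) -> int:
          # Index of the remote site's camera view that preserves eye
          # contact: the camera above the display on which my seat
          # appears in the remote room's horizontal arrangement.
          others = [s for s in range(num_seats) if s != remote_seat]
          return others.index(my_seat)

      # In a 4-seat virtual room, the site at seat 0 picks, from the
      # site at seat 2, the view captured by the camera above the
      # display on which seat 0 is shown:
      assert select_view(0, 2, 4) == 0
      assert select_view(3, 2, 4) == 2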
4.  Acknowledgements

   The draft has benefitted from input from a number of people,
   including Alex Eleftheriadis, Tommy Andre Nyquist, Mark Gorzynski,
   Charles Eckel, Nermeen Ismail, Mary Barnes, Pascal Buhler, and Jim
   Cole.

5.  IANA Considerations

   This document contains no IANA considerations.

6.  Security Considerations

   While there are likely to be security considerations for any
   solution for telepresence interoperability, this document has no
   security considerations.

7.  Informative References

   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston,
              A., Peterson, J., Sparks, R., Handley, M., and E.
              Schooler, "SIP: Session Initiation Protocol", RFC 3261,
              June 2002.

   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
              Jacobson, "RTP: A Transport Protocol for Real-Time
              Applications", STD 64, RFC 3550, July 2003.

   [RFC4582]  Camarillo, G., Ott, J., and K. Drage, "The Binary Floor
              Control Protocol (BFCP)", RFC 4582, November 2006.

   [virtualspace]
              Allardyce, L. and L. Randall, "Development of
              Teleconferencing Methodologies With Emphasis on Virtual
              Space Video and Interactive Graphics", 1983.

Authors' Addresses

   Allyn Romanow
   Cisco
   San Jose, CA  95134
   US

   Email: allyn@cisco.com

   Stephen Botzko
   Polycom
   Andover, MA  01810
   US

   Email: stephen.botzko@polycom.com

   Mark Duckworth
   Polycom
   Andover, MA  01810
   US

   Email: mark.duckworth@polycom.com

   Roni Even (editor)
   Huawei Technologies
   Tel Aviv
   Israel

   Email: even.roni@huawei.com

   Marshall Eubanks
   Iformata Communications
   Dayton, Ohio  45402
   US

   Email: marshall.eubanks@iformata.com