CLUE                                                           R. Hansen
Internet-Draft                                             Cisco Systems
Intended status: Standards Track                            A. Pepperell
Expires: December 2, 2012                                    Silverflare
                                                              A. Romanow
                                                              B. Baldino
                                                           Cisco Systems
                                                            M. Duckworth
                                                                 Polycom
                                                            May 31, 2012


           The need for consumer spatial information in CLUE
                  draft-hansen-clue-consumer-layout-00

Abstract

   This draft is for discussion in the CLUE working group. It proposes
   adding the ability for the consumer to provide specific information
   to the provider.

   This document proposes allowing consumers to include spatial
   parameters in their consumer requests to providers in order to
   improve the provider's ability to assign media to streams in a way
   that is helpful for rendering.
   The solution proposed here is in partial response to CLUE Task #10,
   "Does Framework provide sufficient info for receiver?"

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 2, 2012.

Copyright Notice

   Copyright (c) 2012 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Terminology
   3. Motivation - Conferencing in CLUE
   4. Issues associated with subscribing to multiple switched captures
     4.1. Provider advertising spatially-related switched captures
   5. Consumer includes optional spatial information
     5.1. Applicability of consumer spatial information to audio
   6. Implications and conclusions
   7. Security Considerations
   8. References
     8.1. Normative References
     8.2. Informative References
   Authors' Addresses

1. Introduction

   This draft notes some limitations of CLUE when it comes to correctly
   rendering video under certain conditions, and proposes the optional
   addition of spatial information by the consumer to resolve these
   issues. This does not imply that the authors believe that the
   proposed solution is the only option available; rather, this draft is
   meant as a starting point for discussion.

2. Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119] and
   indicate requirement levels for compliant implementations.

3. Motivation - Conferencing in CLUE

   The current methodology of the CLUE framework
   [I-D.ietf-clue-framework] is well suited to the case of systems with
   a relatively static set of capture devices. However, scenarios in
   which a much more dynamic set of capture devices is presented to
   consumers, such as voice-switched conferencing where multiple
   endpoints connect to a middle box such as an MCU, present additional
   challenges.
   An example of such a scenario is shown below, with four endpoints A,
   B, C and D in a conference:

                         +-----+
              +---+     /       \     +---+
              | A |----/         \----| B |
              +---+   /           \   +---+
                     +     MCU     +
              +---+   \           /   +---+
              | C |----\         /----| D |
              +---+     \       /     +---+
                         +-----+

   In this scenario endpoint A is not directly connected to any of the
   other endpoints and so will not have the capture information
   associated with their media streams directly available.

   One approach is for the MCU to advertise B, C and D's captures as
   separate capture scenes to A - A can then subscribe to any capture
   from any of the other endpoints.

   However, as the size of the conference increases the number of
   captures that must be advertised will quickly become impractical.
   Further, in many conferencing scenarios, endpoints do not wish to
   specify the endpoints they want to see - instead they wish to see the
   video and audio from the 'most relevant' endpoints as determined by
   the MCU (where relevance is usually determined by audio activity
   level). Finally, advertising all available captures in this fashion
   can be problematic in the case of captures that cannot be sent
   simultaneously, as one consumer may ask for one capture and a second
   consumer for its mutually exclusive partner.

   As such, the MCU has the ability in CLUE to advertise switched
   captures; these don't directly represent specific real video or audio
   captures. Instead, subscribing to one of these captures means that
   the provider will switch the stream it sends to the consumer based on
   its internal logic. In the example above, the MCU might advertise a
   single, switched video capture to A; if A subscribed to this then the
   MCU would forward the video stream from B, C or D based on which it
   judged most relevant (often calculated based on the loudness of an
   associated audio stream).

4. Issues associated with subscribing to multiple switched captures

   The consumer A from the previous example can therefore subscribe to
   one or more of these switched captures and will receive that many
   streams from the MCU, switched from their originating source.
   However, A does not receive the spatial capture information from the
   originating source associated with these streams alongside the RTP
   packets. As a result things become more complicated when A
   subscribes to multiple video captures, and when the other endpoints
   provide multiple video streams with correlated spatial information.
   For example, suppose A is a three-screen system and hence requests
   three streams. If all the streams it receives are independent, it can
   render them as it wishes, as shown below where it receives one stream
   from each of B, C and D:

              +------+  +------+  +------+
              |      |  |      |  |      |
              |  B   |  |  C   |  |  D   |
              |      |  |      |  |      |
              +------+  +------+  +------+

   However, if A receives more than one stream from a particular
   endpoint and these streams have related spatial relationships then it
   is possible for A to lay them out erroneously. This is illustrated
   below, where A is receiving three streams of video that originated at
   B, which should correctly be ordered (L)eft, (C)enter, (R)ight:

              +------+  +------+  +------+
              |      |  |      |  |      |
              | B(L) |  | B(C) |  | B(R) |   Correct
              |      |  |      |  |      |
              +------+  +------+  +------+

              +------+  +------+  +------+
              |      |  |      |  |      |
              | B(C) |  | B(R) |  | B(L) |   Incorrect
              |      |  |      |  |      |
              +------+  +------+  +------+

   When laid out incorrectly this leads to objects (such as a person
   being viewed) being split into sections displayed in disparate,
   non-contiguous locations.

   This problem could be solved if A had the spatial capture information
   from B.
   In a small conference it may be possible for the middle box to
   pre-send all the capture information from all other endpoints to A
   (and to every other endpoint), but as the number of captures per
   endpoint and the number of endpoints in a conference rise, caching
   all the data becomes impractical.

   An alternative would be for A to request the originating capture
   information for streams it is receiving, or for the MCU to send it
   whenever it switches streams. However, because the RTP packets and
   the CLUE capture information will be sent in separate channels, this
   will lead to cases where A is receiving RTP packets but has not yet
   received the corresponding capture data, and the same problem occurs.
   The endpoint must then choose between displaying nothing and risking
   incorrect layout choices.

4.1. Provider advertising spatially-related switched captures

   One tool that already exists within the CLUE framework and can be
   used to partially solve this problem is for the MCU to include
   spatial information for the switched captures it advertises. For
   example, it would advertise three captures with area of capture
   information for each that portrays them as the left, center and right
   captures of a single hypothetical room. When the MCU has unrelated
   one-screen streams to send to A it can associate them with whichever
   switched capture it chooses. But when sending a two- or three-screen
   set of streams it can ensure that they are correctly laid out
   adjacent to each other and in the correct order. A could then
   request these three captures and render the streams appropriately on
   its left, center and right screens, needing to take no further action
   to ensure that the streams are correctly laid out.

   However, this solution is not sufficient for all use-cases.
   The issue is that the MCU will need to advertise a suitable separate
   group of switched captures for each endpoint configuration that could
   connect to it. If the possible endpoint configurations are limited,
   this may still represent a plausible number; for instance, an MCU
   that wanted to support endpoints with one, two, three or four screens
   laid out contiguously left-to-right could advertise a capture set
   with the following entries:

      {
         [VC0]
         [VC1, VC2]
         [VC3, VC4, VC5]
         [VC6, VC7, VC8, VC9]
      }

   where VC0 is a single switched capture, VC1 and VC2 are two switched
   captures each representing half the scene, and so on.

   But this means that the MCU is only able to support certain
   pre-defined layouts - supporting additional configurations of screens
   (such as a 2x2 array) requires a new entry for each, and designing a
   new endpoint configuration means updating all the MCUs it
   interoperates with. This problem becomes particularly acute if the
   endpoint has many screens, or wants to perform local composition
   (subscribing to multiple streams per screen and rendering them
   locally for display) - this both substantially increases the number
   of streams that the endpoint would wish to subscribe to, and
   increases the complexity of layouts possible. For instance, an
   endpoint with two screens that wanted to show a 2x2 grid of
   participants on each would need to subscribe to eight captures with
   appropriate spatial information.

5. Consumer includes optional spatial information

   We can address these issues and allow an endpoint more complex stream
   rendering configurations, while substantially reducing the number and
   complexity of switched captures the MCU must advertise. The approach
   is for the consumer to optionally include some information on the
   spatial relationships of its rendering as part of its request.
   This allows the MCU to advertise a single collection of switched
   captures with no spatial information for the consumer to subscribe
   to, rather than attempting to anticipate every layout an endpoint
   might desire, and having to advertise an entry for each with suitable
   spatial information.

   There are a number of forms this consumer information could take.

   However, the form most consistent with the existing CLUE data model,
   and offering the most flexibility for the future, is for the consumer
   to be able to describe the spatial relationship of its screens in the
   same fashion and using the same system as in the provider's capture
   attributes. 'Area of Display' would be an optional attribute of a
   consumer request, and would have the same properties as the
   provider's 'Area of Capture' (i.e. four co-planar {X,Y,Z}
   coordinates). If the consumer includes information on the area of
   display the provider may then choose to use that information to
   inform its choice when switching video. Alternatively, in cases
   where there are no spatial constraints on the video the provider is
   switching, or where fixed streams are being sent, the area of
   display information can be ignored.

   A straightforward example of this would be where consumer A is a
   three-screen system wishing to join a large conference including
   one-, two- and three-screen systems. The MCU offers a capture scene
   including three switched captures, to which A wishes to subscribe. A
   then sends a choice for each of those captures, and for each choice
   includes an area of display attribute giving the position of the
   corresponding screen. The MCU can then use that information to
   ensure that, when switching in the video streams from multi-screen
   systems, it does so in a way that they will be rendered correctly
   on A.
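
   The three-screen example can be sketched in code. This is only an
   illustration under stated assumptions, not a proposed encoding: the
   names AreaOfDisplay, Choice, screen and assign_streams are
   hypothetical, the capture identifiers and coordinates are made up,
   and the sketch assumes that X increases from the viewer's left to
   right. It shows one way a provider might use the area of display to
   keep a multi-screen source in the correct order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float, float]  # (X, Y, Z); axis meaning is an assumption


@dataclass(frozen=True)
class AreaOfDisplay:
    """Hypothetical 'Area of Display': four co-planar points."""
    bottom_left: Point
    bottom_right: Point
    top_left: Point
    top_right: Point

    def center_x(self) -> float:
        # Assumption: X grows from the viewer's left to right.
        corners = (self.bottom_left, self.bottom_right,
                   self.top_left, self.top_right)
        return sum(p[0] for p in corners) / len(corners)


@dataclass(frozen=True)
class Choice:
    """One consumer choice of a switched capture (illustrative only)."""
    capture_id: str
    area_of_display: Optional[AreaOfDisplay] = None


def screen(x_left: float, x_right: float) -> AreaOfDisplay:
    """Build a 1-unit-high screen on the plane Y = 0 (illustrative helper)."""
    return AreaOfDisplay((x_left, 0.0, 0.0), (x_right, 0.0, 0.0),
                         (x_left, 0.0, 1.0), (x_right, 0.0, 1.0))


def assign_streams(choices: List[Choice],
                   source_streams: List[str]) -> Dict[str, str]:
    """Map a source's streams, given left to right, onto the chosen captures."""
    if all(c.area_of_display is not None for c in choices):
        ordered = sorted(choices, key=lambda c: c.area_of_display.center_x())
    else:
        # No usable spatial hints: fall back to the order of the choices.
        ordered = list(choices)
    return {c.capture_id: s for c, s in zip(ordered, source_streams)}


# Consumer A lists its choices in an order unrelated to screen position.
choices_a = [
    Choice("VC0", screen(1.0, 2.0)),    # right screen
    Choice("VC1", screen(-2.0, -1.0)),  # left screen
    Choice("VC2", screen(-0.5, 0.5)),   # center screen
]
layout = assign_streams(choices_a, ["B(L)", "B(C)", "B(R)"])
print(layout)
```

   In this sketch the left stream from B lands on the capture A
   associated with its left screen, regardless of the order in which A
   listed its choices. When any choice omits the area of display, the
   provider falls back to its own ordering, matching the optional nature
   of the attribute.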

   A more complicated example is where A is still a three-screen system
   wishing to join a large conference including one-, two- and
   three-screen systems, but now wishes to receive more than one video
   stream per screen, composing them locally. The layout A wishes to
   achieve is (three large screens, each with one main video displayed
   full-screen and three picture-in-picture views):

      +-------------+  +-------------+  +-------------+
      |             |  |             |  |             |
      |             |  |             |  |             |
      |             |  |             |  |             |
      |             |  |             |  |             |
      | +-+ +-+ +-+ |  | +-+ +-+ +-+ |  | +-+ +-+ +-+ |
      | +-+ +-+ +-+ |  | +-+ +-+ +-+ |  | +-+ +-+ +-+ |
      +-------------+  +-------------+  +-------------+

   The MCU advertises that it can send at least 12 switched video
   streams to A simultaneously. A makes 12 choices, including a
   suitable area of display for each one. This information allows the
   MCU not just to ensure that multi-screen systems are not laid out
   incorrectly, but potentially also to optimize other choices, such as
   not splitting multi-screen systems being rendered in the smaller PiP
   panes across bezels, showing presentation and full-motion video
   received from the same participant on the same screen, and so on.

5.1. Applicability of consumer spatial information to audio

   The text above is primarily concerned with resolving issues for
   video, but it may still be relevant for audio; the consumer may wish
   to provide spatial information about the locations at which they will
   be playing out their audio. However, for the most part we believe
   this is less relevant: audio does not have the same rigid
   requirements for playout that were described above for video, and in
   most cases the problem can be solved with the provider-specified
   spatial coordinates already defined in the specification.

6. Implications and conclusions

   CLUE has been designed as a provider-oriented protocol, with the
   provider giving a list of the resources it can supply and the
   consumer selecting from these. This proposal fits into that pattern;
   spatial information included in a consumer request forms part of that
   request, insofar as it does not limit the provider but instead gives
   additional information for the provider to use as it sees fit.
   Consumers that have no need for the spatial information need not
   include it, and providers can choose to ignore the spatial
   information if it is not relevant to their selection process.

   Allowing the optional reuse of spatial information that is currently
   sent only by the provider in the consumer request increases the range
   of problems for which CLUE can provide a solution, while placing no
   additional burden on systems which do not have these concerns, as
   they can safely ignore the information.

7. Security Considerations

   The proposal herein has no security implications; the new information
   from the consumer is optional and sent at their discretion, and
   reveals nothing that can compromise their system.

8. References

8.1. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

8.2. Informative References

   [I-D.ietf-clue-framework]
              Romanow, A., Duckworth, M., Pepperell, A., and B. Baldino,
              "Framework for Telepresence Multi-Streams",
              draft-ietf-clue-framework-05 (work in progress), May 2012.

Authors' Addresses

   Robert Hansen
   Cisco Systems
   San Jose, CA 95134
   USA

   Email: rohanse2@cisco.com


   Andy Pepperell
   Silverflare

   Email: andy.pepperell@silverflare.com


   Allyn Romanow
   Cisco Systems
   San Jose, CA 95134
   USA

   Email: allyn@cisco.com


   Brian Baldino
   Cisco Systems
   San Jose, CA 95134
   USA

   Email: bbaldino@cisco.com


   Mark Duckworth
   Polycom
   Andover, MA 01810
   USA

   Email: mark.duckworth@polycom.com