idnits 2.17.1 

draft-hansen-clue-consumer-layout-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == The document doesn't use any RFC 2119 keywords, yet seems to have RFC
     2119 boilerplate text.

  -- The document date (May 31, 2012) is 4345 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'VC0' is mentioned on line 230, but not defined

  == Missing Reference: 'VC1' is mentioned on line 231, but not defined

  == Missing Reference: 'VC2' is mentioned on line 231, but not defined

  == Missing Reference: 'VC3' is mentioned on line 232, but not defined

  == Missing Reference: 'VC4' is mentioned on line 232, but not defined

  == Missing Reference: 'VC5' is mentioned on line 232, but not defined

  == Missing Reference: 'VC6' is mentioned on line 233, but not defined

  == Missing Reference: 'VC7' is mentioned on line 233, but not defined

  == Missing Reference: 'VC8' is mentioned on line 233, but not defined

  == Missing Reference: 'VC9' is mentioned on line 233, but not defined

  == Outdated reference: A later version (-25) exists of
     draft-ietf-clue-framework-05


     Summary: 1 error (**), 0 flaws (~~), 13 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	CLUE                                                           R. Hansen
3	Internet-Draft                                             Cisco Systems
4	Intended status: Standards Track                            A. Pepperell
5	Expires: December 2, 2012                                    Silverflare
6	                                                              A. Romanow
7	                                                              B. Baldino
8	                                                           Cisco Systems
9	                                                            M. Duckworth
10	                                                                 Polycom
11	                                                            May 31, 2012

13	           The need for consumer spatial information in CLUE
14	                  draft-hansen-clue-consumer-layout-00

16	Abstract

18	   This draft is for discussion in the CLUE working group.  It proposes
19	   adding the ability for the consumer to provide specific information
20	   to the provider.

22	   This document proposes allowing consumers to include spatial
23	   parameters in their consumer requests to providers in order to
24	   improve the provider's ability to assign media to streams in a way
25	   that is helpful for rendering.  The solution proposed here is in
26	   partial response to CLUE Task #10, Does Framework provide sufficient
27	   info for receiver?

29	Status of this Memo

31	   This Internet-Draft is submitted in full conformance with the
32	   provisions of BCP 78 and BCP 79.

34	   Internet-Drafts are working documents of the Internet Engineering
35	   Task Force (IETF).  Note that other groups may also distribute
36	   working documents as Internet-Drafts.  The list of current Internet-
37	   Drafts is at http://datatracker.ietf.org/drafts/current/.

39	   Internet-Drafts are draft documents valid for a maximum of six months
40	   and may be updated, replaced, or obsoleted by other documents at any
41	   time.  It is inappropriate to use Internet-Drafts as reference
42	   material or to cite them other than as "work in progress."

44	   This Internet-Draft will expire on December 2, 2012.

46	Copyright Notice

48	   Copyright (c) 2012 IETF Trust and the persons identified as the
49	   document authors.  All rights reserved.

51	   This document is subject to BCP 78 and the IETF Trust's Legal
52	   Provisions Relating to IETF Documents
53	   (http://trustee.ietf.org/license-info) in effect on the date of
54	   publication of this document.  Please review these documents
55	   carefully, as they describe your rights and restrictions with respect
56	   to this document.  Code Components extracted from this document must
57	   include Simplified BSD License text as described in Section 4.e of
58	   the Trust Legal Provisions and are provided without warranty as
59	   described in the Simplified BSD License.

61	Table of Contents

63	   1.  Introduction
64	   2.  Terminology
65	   3.  Motivation - Conferencing in CLUE
66	   4.  Issues associated with subscribing to multiple switched
67	       captures
68	     4.1.  Provider advertising spatially-related switched
69	           captures
70	   5.  Consumer includes optional spatial information
71	     5.1.  Applicability of consumer spatial information to audio
72	   6.  Implications and conclusions
73	   7.  Security Considerations
74	   8.  References
75	     8.1.  Normative References
76	     8.2.  Informative References
77	   Authors' Addresses

79	1.  Introduction

81	   This draft notes some limitations of CLUE when it comes to correctly
82	   rendering video under certain conditions, and proposes the optional
83	   addition of spatial information by the consumer to resolve these
84	   issues.  This does not imply that the authors believe that the
85	   proposed solution is the only option available; rather, this draft is
86	   meant as a starting point for discussion.

88	2.  Terminology

90	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
91	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
92	   document are to be interpreted as described in RFC 2119 [RFC2119] and
93	   indicate requirement levels for compliant implementations.

95	3.  Motivation - Conferencing in CLUE

97	   The current methodology of the CLUE framework
98	   [I-D.ietf-clue-framework] is well suited to the case of systems with
99	   a relatively static set of capture devices.  However, scenarios in
100	   which a much more dynamic set of capture devices is presented to
101	   consumers, such as voice-switched conferencing where multiple
102	   endpoints connect to a middle box such as an MCU, present additional
103	   challenges.  An example of such a scenario is shown below, with four
104	   endpoints A, B, C and D in a conference:

106	                   +-----+
107	        +---+     /       \     +---+
108	        | A |----/         \----| B |
109	        +---+   /           \   +---+
110	               +     MCU     +
111	        +---+   \           /   +---+
112	        | C |----\         /----| D |
113	        +---+     \       /     +---+
114	                   +-----+

116	   In this scenario endpoint A is not directly connected to any of the
117	   other endpoints and so will not have the capture information
118	   associated with their media streams directly available.

120	   One approach is for the MCU to advertise B, C and D's captures as
121	   separate capture scenes to A - A can then subscribe to any capture
122	   from any of the other endpoints.

124	   However, as the size of the conference increases, the number of
125	   captures that must be advertised will quickly become impractical.
126	   Further, in many conferencing scenarios, endpoints do not wish to
127	   specify the endpoints they want to see - instead they wish to see the
128	   video and audio from the 'most relevant' endpoints as determined by
129	   the MCU (where relevance is usually determined by audio activity
130	   level).  Finally, advertising all available captures in this fashion
131	   can be problematic in the case of captures that are mutually
132	   exclusive, as one consumer may ask for one capture and a second
133	   consumer for its mutually exclusive partner.

135	   As such, the MCU has the ability in CLUE to advertise switched
136	   captures; these don't directly represent specific real video or audio
137	   captures.  Instead, subscribing to one of these captures means that
138	   the provider will switch the stream it sends to the consumer based on
139	   its internal logic.  In the example above, the MCU might advertise a
140	   single, switched video capture to A; if A subscribed to this then the
141	   MCU would forward the video stream from B, C or D based on which it
142	   felt was most relevant (often calculated based on the loudness of an
143	   associated audio stream).
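
   The switching decision described above can be sketched as follows.
   This is only an illustration, assuming that relevance is simply the
   most recent audio level reported for each endpoint; the function
   name and the numeric levels are hypothetical and are not part of
   CLUE.

```python
from typing import Dict, Optional


def pick_switched_source(audio_levels: Dict[str, float],
                         consumer: str) -> Optional[str]:
    """Pick the endpoint to forward on a single switched capture.

    Assumption: relevance equals the most recent audio level, and a
    consumer is never switched to its own video.
    """
    candidates = {ep: level for ep, level in audio_levels.items()
                  if ep != consumer}
    if not candidates:
        return None  # nobody else is in the conference
    return max(candidates, key=candidates.get)


# Hypothetical audio levels for the four-endpoint example.
levels = {"A": 0.9, "B": 0.2, "C": 0.7, "D": 0.4}
print(pick_switched_source(levels, "A"))  # C
```

   A real MCU would likely smooth the levels over time to avoid rapid
   switching; that is omitted here.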

145	4.  Issues associated with subscribing to multiple switched captures

147	   As such, the consumer A from the previous example can subscribe to
148	   one or more of these switched captures and will receive that many
149	   streams from the MCU, switched from their originating source.
150	   However, A does not receive the spatial capture information from the
151	   originating source associated with these streams alongside the RTP
152	   packets.  As a result things become more complicated when A
153	   subscribes to multiple video captures, and when the other endpoints
154	   provide multiple video streams with correlated spatial information.
155	   For example, suppose A is a three-screen system and hence requests
156	   three streams.  If all the streams it receives are independent, it
157	   can render them as it wishes, as shown below where it receives one
158	   stream from each of B, C and D:

160	      +------+ +------+ +------+
161	      |      | |      | |      |
162	      |  B   | |  C   | |  D   |
163	      |      | |      | |      |
164	      +------+ +------+ +------+

166	   However, if A receives more than one stream from a particular
167	   endpoint and these streams are spatially related then it is
168	   possible for A to lay them out erroneously.  This is illustrated
169	   below, where A is receiving three streams of video that originated at
170	   B, which should correctly be ordered (L)eft, (C)enter, (R)ight:

172	      +------+ +------+ +------+
173	      |      | |      | |      |
174	      | B(L) | | B(C) | | B(R) |  Correct
175	      |      | |      | |      |
176	      +------+ +------+ +------+

178	      +------+ +------+ +------+
179	      |      | |      | |      |
180	      | B(C) | | B(R) | | B(L) |  Incorrect
181	      |      | |      | |      |
182	      +------+ +------+ +------+

184	   When streams are laid out incorrectly, objects (such as a person
185	   being viewed) are split into sections displayed in disparate,
186	   non-contiguous locations.

188	   This problem could be solved if A had the spatial capture information
189	   from B. In a small conference it may be possible for the middle box
190	   to pre-send all the capture information from all other endpoints to A
191	   (and to every other endpoint), but as the number of captures per
192	   endpoint and the number of endpoints in a conference rise, caching
193	   all the data becomes impractical.

195	   An alternative would be for A to request the originating capture
196	   information for streams it is receiving, or for the MCU to send it
197	   whenever it switches streams.  However, because the RTP packets and
198	   the CLUE capture information will be sent in separate channels, this
199	   will lead to cases where A is receiving RTP packets but has not yet
200	   received the corresponding capture data, and the same problem occurs.
201	   The endpoint must then choose between displaying nothing and risking
202	   incorrect layout choices.

204	4.1.  Provider advertising spatially-related switched captures

206	   One tool that already exists within the CLUE framework and can
207	   partially solve this problem is for the MCU to include spatial
208	   information with the switched captures it advertises.  For example,
209	   it would advertise three captures with area of capture information
210	   for each that portrays them as the left, center and right captures
211	   of a single hypothetical room.  In this case, when the MCU
212	   has unrelated one-screen streams to send to A it can associate them
213	   with whichever switched capture it chooses.  But when sending a two-
214	   or three-screen set of streams it can ensure that they are correctly
215	   laid out adjacent to each other and in the correct order.  A could
216	   then request these three captures and render the streams
217	   appropriately on its left, center and right screen, needing to take
218	   no action to ensure that the streams are correctly laid out.
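
   One way the MCU might map source streams onto its left, center and
   right switched captures is sketched below.  This is a simple greedy
   illustration, not a method defined by the framework; the capture
   names and the policy of skipping a source that does not fit in the
   remaining adjacent slots are assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def fill_slots(slots: Sequence[str],
               sources: Sequence[Tuple[str, List[str]]]) -> Dict[str, str]:
    """Map source streams onto switched captures advertised left to right.

    slots:   advertised switched captures, ordered left to right.
    sources: (endpoint, streams ordered left to right), most relevant first.
    """
    assignment: Dict[str, str] = {}
    next_slot = 0
    for _endpoint, streams in sources:
        if next_slot + len(streams) > len(slots):
            continue  # would not fit contiguously; try the next source
        for offset, stream in enumerate(streams):
            # Streams from one source stay adjacent and in order.
            assignment[slots[next_slot + offset]] = stream
        next_slot += len(streams)
        if next_slot == len(slots):
            break
    return assignment


slots = ["VC-left", "VC-center", "VC-right"]  # hypothetical capture names
print(fill_slots(slots, [("B", ["B(L)", "B(C)", "B(R)"])]))
print(fill_slots(slots, [("B", ["B(L)", "B(C)"]), ("C", ["C"]), ("D", ["D"])]))
```

   In the second call the two streams from B occupy the left and center
   captures and the single stream from C takes the right one, so D is
   not shown.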

220	   However, this solution is not sufficient for all use-cases.  The
221	   issue is that the MCU will need to advertise a suitable separate
222	   group of switched captures for each endpoint configuration that could
223	   connect to it.  If the possible endpoint configurations are limited,
224	   this may still represent a plausible number; for instance, an MCU
225	   that wanted to support endpoints with one, two, three or four screens
226	   laid out contiguously left-to-right could advertise a capture set
227	   with the following entries:

229	   {
230	     [VC0]
231	     [VC1, VC2]
232	     [VC3, VC4, VC5]
233	     [VC6, VC7, VC8, VC9]
234	   }

236	   where VC0 is a single switched capture, VC1 and VC2 are two
237	   switched captures each representing half the scene, and so on.
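
   The capture set above follows a regular pattern, which the sketch
   below reproduces.  The function name is hypothetical; the numbering
   simply mirrors the VC0 to VC9 labels used in the example.

```python
from typing import List


def build_capture_set(max_screens: int) -> List[List[str]]:
    """One entry per supported screen count, numbered consecutively."""
    entries = []
    next_id = 0
    for screens in range(1, max_screens + 1):
        entries.append([f"VC{next_id + i}" for i in range(screens)])
        next_id += screens
    return entries


for entry in build_capture_set(4):
    print(entry)
print(sum(len(e) for e in build_capture_set(4)))  # 10
```

   Supporting up to n contiguous screens in this way needs n(n+1)/2
   switched captures, so the advertisement grows quadratically even
   before other layouts are considered.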

239	   But this means that the MCU is only able to support certain pre-
240	   defined layouts - supporting additional configurations of screens
241	   (such as a 2x2 array) requires a new entry for each, and designing a
242	   new endpoint configuration means updating all the MCUs it
243	   interoperates with.  This problem becomes particularly acute if the
244	   endpoint has many screens, or wants to perform local composition
245	   (subscribing to multiple streams per screen and rendering them
246	   locally for display) - this both substantially increases the number
247	   of streams that the endpoint would wish to subscribe to, and
248	   increases the complexity of layouts possible.  For instance, an
249	   endpoint with two screens that wanted to show a 2x2 grid of
250	   participants on each would need to subscribe to eight captures with
251	   appropriate spatial information.

253	5.  Consumer includes optional spatial information

255	   We can address these issues and allow an endpoint more complex stream
256	   rendering configurations, while substantially reducing the number and
257	   complexity of switched captures the MCU must advertise.  The approach
258	   is for the consumer to optionally include some information on the
259	   spatial relationships within its rendering as part of its request.
260	   This allows the MCU to advertise a single collection of switched
261	   captures with no spatial information for the consumer to subscribe
262	   to, rather than attempting to anticipate every layout an endpoint
263	   might desire, and having to advertise an entry for each with suitable
264	   spatial information.

266	   There are a number of forms this consumer information could take.

268	   However, the form most consistent with the existing CLUE data model,
269	   and offering most flexibility for the future, is for the consumer to
270	   be able to describe the spatial relationship of its screens in the
271	   same fashion and using the same system as in the provider's capture
272	   attributes.  'Area of Display' would be an optional attribute of a
273	   consumer request, and would have the same properties as the
274	   provider's 'Area of Capture' (i.e. four co-planar {X,Y,Z}
275	   coordinates).  If the consumer includes information on the area of
276	   display the provider may then choose to use that information to
277	   inform its choice when switching video.  Alternatively, in cases
278	   where there are no spatial constraints on the video the provider is
279	   switching, or where fixed streams are being sent, the area of
280	   display information can be ignored.
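
   As an illustration of what an 'Area of Display' value might look
   like, the sketch below models it as four {X,Y,Z} points and checks
   that they are co-planar using a scalar triple product.  The class
   name, the corner names and the tolerance are assumptions made for
   this example; this draft does not define an encoding.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float, float]


def _sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Point, b: Point) -> Point:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


@dataclass(frozen=True)
class AreaOfDisplay:
    """Hypothetical consumer-side counterpart of 'Area of Capture'."""
    bottom_left: Point
    bottom_right: Point
    top_left: Point
    top_right: Point

    def is_coplanar(self, tol: float = 1e-9) -> bool:
        # Four points are co-planar when the scalar triple product of
        # the three edge vectors leaving one corner is zero.
        u = _sub(self.bottom_right, self.bottom_left)
        v = _sub(self.top_left, self.bottom_left)
        w = _sub(self.top_right, self.bottom_left)
        return abs(_dot(_cross(u, v), w)) <= tol


flat = AreaOfDisplay((0, 0, 0), (1, 0, 0), (0, 0, 1), (1, 0, 1))
bent = AreaOfDisplay((0, 0, 0), (1, 0, 0), (0, 0, 1), (1, 0.5, 1))
print(flat.is_coplanar(), bent.is_coplanar())  # True False
```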

282	   A straightforward example of this would be where consumer A is a
283	   three-screen system wishing to join a large conference including
284	   one-, two- and three-screen systems.  The MCU offers a capture scene
285	   including three switched captures, to which A wishes to subscribe.  A
286	   then sends a choice for each of those captures, and for each choice
287	   includes an area of display attribute giving the position of each of
288	   its screens.  The MCU can then use that information to ensure that,
289	   when switching in the video streams from multi-screen systems, it
290	   does so in a way that they will be rendered correctly on A.
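
   The provider-side use of this information in the three-screen
   example could look like the sketch below.  It assumes, purely for
   illustration, that X increases from left to right as the viewer sees
   the screens, that each area of display is a list of four {X,Y,Z}
   points, and that the source's streams are already known in
   left-to-right order; the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def centre_x(area: Sequence[Point]) -> float:
    """Mean X coordinate of the corners of an area of display."""
    return sum(p[0] for p in area) / len(area)


def assign_by_display(choices: Dict[str, Sequence[Point]],
                      source_streams: List[str]) -> Dict[str, str]:
    """Pair left-to-right source streams with left-to-right displays."""
    ordered = sorted(choices, key=lambda cid: centre_x(choices[cid]))
    return dict(zip(ordered, source_streams))


def screen(x_left: float, width: float = 1.0,
           height: float = 0.6) -> List[Point]:
    """Corners of an upright screen in the y = 0 plane (example only)."""
    return [(x_left, 0.0, 0.0), (x_left + width, 0.0, 0.0),
            (x_left, 0.0, height), (x_left + width, 0.0, height)]


# A's three choices, deliberately listed out of spatial order.
choices = {"VC0": screen(2.2), "VC1": screen(0.0), "VC2": screen(1.1)}
print(assign_by_display(choices, ["B(L)", "B(C)", "B(R)"]))
```

   Here VC1 is A's leftmost screen, so it receives B(L) regardless of
   the order in which the choices were sent.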

292	   A more complicated example is where A is still a three-screen system
293	   wishing to join a large conference including one-, two- and three-
294	   screen systems, but now wishes to receive more than one video stream
295	   per screen, composing them locally.  The layout A wishes to achieve
296	   is (three large screens, each with one main video displayed full-
297	   screen and three picture-in-picture views):

299	      +-------------+  +-------------+  +-------------+
300	      |             |  |             |  |             |
301	      |             |  |             |  |             |
302	      |             |  |             |  |             |
303	      |             |  |             |  |             |
304	      | +-+ +-+ +-+ |  | +-+ +-+ +-+ |  | +-+ +-+ +-+ |
305	      | +-+ +-+ +-+ |  | +-+ +-+ +-+ |  | +-+ +-+ +-+ |
306	      +-------------+  +-------------+  +-------------+

308	   The MCU advertises that it can send at least 12 switched video
309	   streams to A simultaneously.  A makes 12 choices, including a
310	   suitable area of display for each one.  This information allows the
311	   MCU not just to ensure that multi-screen systems are laid out
312	   correctly, but potentially also to optimize other choices, such as
313	   not splitting multi-screen systems rendered in the smaller PiP
314	   panes across bezels, showing presentation and full-motion video
315	   received from the same participant on the same screen, and so on.
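
   To make the 12 choices concrete, the sketch below generates one
   possible set of areas of display for the layout shown above: a main
   area per screen plus three picture-in-picture panes along its lower
   edge.  All dimensions, the bezel width, the dictionary keys and the
   choice of the y = 0 plane are arbitrary assumptions for the example.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def rect(x: float, z: float, w: float, h: float) -> List[Point]:
    """Four corners of an upright rectangle in the y = 0 plane."""
    return [(x, 0.0, z), (x + w, 0.0, z), (x, 0.0, z + h), (x + w, 0.0, z + h)]


def build_choices(screens: int = 3, pips: int = 3, width: float = 1.0,
                  height: float = 0.6, bezel: float = 0.1) -> List[Dict]:
    """One 'main' area per screen plus `pips` smaller panes inside it."""
    choices = []
    pane_w = width / (2 * pips + 1)  # panes and the gaps between them are equal
    pane_h = height / 5
    for s in range(screens):
        x0 = s * (width + bezel)  # left edge of this screen
        choices.append({"screen": s, "role": "main",
                        "area": rect(x0, 0.0, width, height)})
        for i in range(pips):
            choices.append({"screen": s, "role": "pip",
                            "area": rect(x0 + pane_w * (2 * i + 1),
                                         pane_h / 2, pane_w, pane_h)})
    return choices


choices = build_choices()
print(len(choices))  # 12
```

   Because every pane's coordinates fall inside one screen's main area,
   a provider could in principle infer which panes share a screen and
   so avoid splitting a multi-screen source across a bezel.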

317	5.1.  Applicability of consumer spatial information to audio

319	   The text above is primarily concerned with resolving issues for
320	   video, but it may still be relevant for audio; the consumer may wish
321	   to provide spatial information about the locations at which they will
322	   be playing out their audio.  However, for the most part we believe
323	   this is less relevant: audio does not have the same rigid
324	   requirements for playout that were described above for video, and
325	   the problem can largely be solved with the provider-specified
326	   spatial coordinates already defined in the specification.

328	6.  Implications and conclusions

330	   CLUE has been designed as a provider-oriented protocol, with the
331	   provider giving a list of the resources it can supply and the
332	   consumer selecting from these.  This proposal fits into that pattern;
333	   spatial information included in a consumer request forms part of that
334	   request, insofar as it does not limit the provider but instead gives
335	   additional information for the provider to use as it sees fit.
336	   Consumers that have no need for the spatial information need not
337	   include it, and providers can choose to ignore the spatial
338	   information if it is not relevant to their selection process.

340	   Allowing the optional reuse of spatial information that is currently
341	   sent only by the provider in the consumer request increases the range
342	   of problems for which CLUE can provide a solution, while placing no
343	   additional burden on systems which do not have these concerns, as
344	   they can safely ignore the information.

346	7.  Security Considerations

348	   The proposal herein has no security implications; the new information
349	   from the consumer is optional and sent at their discretion, and
350	   reveals nothing that can compromise their system.

352	8.  References

354	8.1.  Normative References

356	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
357	              Requirement Levels", BCP 14, RFC 2119, March 1997.

359	8.2.  Informative References

361	   [I-D.ietf-clue-framework]
362	              Romanow, A., Duckworth, M., Pepperell, A., and B. Baldino,
363	              "Framework for Telepresence Multi-Streams",
364	              draft-ietf-clue-framework-05 (work in progress), May 2012.

366	Authors' Addresses

368	   Robert Hansen
369	   Cisco Systems
370	   San Jose, CA  95134
371	   USA

373	   Email: rohanse2@cisco.com

375	   Andy Pepperell
376	   Silverflare

378	   Email: andy.pepperell@silverflare.com

380	   Allyn Romanow
381	   Cisco Systems
382	   San Jose, CA  95134
383	   USA

385	   Email: allyn@cisco.com

387	   Brian Baldino
388	   Cisco Systems
389	   San Jose, CA  95134
390	   USA

392	   Email: bbaldino@cisco.com

394	   Mark Duckworth
395	   Polycom
396	   Andover, MA  01810
397	   USA

399	   Email: mark.duckworth@polycom.com