idnits 2.17.1

draft-ietf-clue-framework-15.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses
     in the document.  If these are example addresses, they should be changed.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 1088 has weird spacing: '... switch betwe...'

  == Line 1923 has weird spacing: '...om left bot...'

  == Line 1973 has weird spacing: '...om left bot...'

  == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD',
     or 'RECOMMENDED' is not an accepted usage according to RFC 2119.  Please
     use uppercase 'NOT' together with RFC 2119 keywords (if that is what you
     mean).

     Found 'SHOULD not' in this paragraph:

     A separate data channel is established to transport the CLUE protocol
     messages. The contents of the CLUE protocol messages are based on
     information introduced in this document, which is represented by an
     XML schema for this information defined in the CLUE data model [ref].
     Some of the information which could possibly introduce privacy
     concerns is the xCard information as described in section x. In
     addition, the (text) description field in the Media Capture attribute
     (section 7.1.1.7) could possibly reveal sensitive information or
     specific identities.  The same would be true for the descriptions in
     the Capture Scene (section 7.3.1) and Capture Scene Entry (7.3.2)
     attributes. One other important consideration for the information in
     the xCard as well as the description field in the Media Capture and
     Capture Scene Entry attributes is that while the endpoints involved in
     the session have been authenticated, there is no assurance that the
     information in the xCard or description fields is authentic.  Thus,
     this information SHOULD not be used to make any authorization
     decisions and the participants in the sessions SHOULD be made aware of
     this.

  -- The document date (May 15, 2014) is 3634 days in the past.  Is this
     intentional?

  -- Found something which looks like a code comment -- if you have code
     sections in the document, please surround them with '<CODE BEGINS>' and
     '<CODE ENDS>' lines.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC6351' is mentioned on line 840, but not defined

  == Missing Reference: 'RFC6350' is mentioned on line 851, but not defined

  == Missing Reference: 'RFC4566' is mentioned on line 1533, but not defined

  ** Obsolete undefined reference: RFC 4566 (Obsoleted by RFC 8866)

  == Missing Reference: 'RFC 6503' is mentioned on line 2945, but not defined

  == Missing Reference: 'RFC 3261' is mentioned on line 2967, but not defined

  == Unused Reference: 'RFC4579' is defined on line 3277, but no explicit
     reference was found in the text

  -- Obsolete informational reference (is this intentional?): RFC 5117
     (Obsoleted by RFC 7667)


     Summary: 1 error (**), 0 flaws (~~), 12 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	CLUE WG                                              M. Duckworth, Ed.
2	Internet Draft                                                  Polycom
3	Intended status: Standards Track                           A. Pepperell
4	Expires: November 15, 2014                                        Acano
5	                                                              S. Wenger
6	                                                                  Vidyo
7	                                                           May 15, 2014

9	                Framework for Telepresence Multi-Streams
10	                    draft-ietf-clue-framework-15.txt

12	Abstract

14	   This document defines a framework for a protocol to enable devices
15	   in a telepresence conference to interoperate.  The protocol enables
16	   communication of information about multiple media streams so a
17	   sending system and receiving system can make reasonable decisions
18	   about transmitting, selecting and rendering the media streams.
19	   This protocol is used in addition to SIP signaling for setting up a
20	   telepresence session.

22	Status of this Memo

24	   This Internet-Draft is submitted in full conformance with the
25	   provisions of BCP 78 and BCP 79.

27	   Internet-Drafts are working documents of the Internet Engineering
28	   Task Force (IETF).  Note that other groups may also distribute
29	   working documents as Internet-Drafts.  The list of current
30	   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

32	   Internet-Drafts are draft documents valid for a maximum of six
33	   months and may be updated, replaced, or obsoleted by other
34	   documents at any time.  It is inappropriate to use Internet-Drafts
35	   as reference material or to cite them other than as "work in
36	   progress."

38	   This Internet-Draft will expire on November 15, 2014.

40	Copyright Notice

42	   Copyright (c) 2013 IETF Trust and the persons identified as the
43	   document authors.  All rights reserved.

45	   This document is subject to BCP 78 and the IETF Trust's Legal
46	   Provisions Relating to IETF Documents
47	   (http://trustee.ietf.org/license-info) in effect on the date of
48	   publication of this document.  Please review these documents
49	   carefully, as they describe your rights and restrictions with
50	   respect to this document.  Code Components extracted from this
51	   document must include Simplified BSD License text as described in
52	   Section 4.e of the Trust Legal Provisions and are provided without
53	   warranty as described in the Simplified BSD License.

55	Table of Contents

57	   1. Introduction
58	   2. Terminology
59	   3. Definitions
60	   4. Overview & Motivation
61	   5. Overview of the Framework/Model
62	   6. Spatial Relationships
63	   7. Media Captures and Capture Scenes
64	      7.1. Media Captures
65	         7.1.1. Media Capture Attributes
66	      7.2. Multiple Content Capture
67	         7.2.1. MCC Attributes
68	      7.3. Capture Scene
69	         7.3.1. Capture Scene attributes
70	         7.3.2. Capture Scene Entry attributes
71	   8. Simultaneous Transmission Set Constraints
72	   9. Encodings
73	      9.1. Individual Encodings
74	      9.2. Encoding Group
75	      9.3. Associating Captures with Encoding Groups
76	   10. Consumer's Choice of Streams to Receive from the Provider
77	      10.1. Local preference
78	      10.2. Physical simultaneity restrictions
79	      10.3. Encoding and encoding group limits
80	   11. Extensibility
81	   12. Examples - Using the Framework (Informative)
82	      12.1. Provider Behavior
83	         12.1.1. Three screen Endpoint Provider
84	         12.1.2. Encoding Group Example
85	         12.1.3. The MCU Case
86	      12.2. Media Consumer Behavior
87	         12.2.1. One screen Media Consumer
88	         12.2.2. Two screen Media Consumer configuring the example
89	         12.2.3. Three screen Media Consumer configuring the example
90	      12.3. Multipoint Conference utilizing Multiple Content Captures
91	         12.3.1. Single Media Captures and MCC in the same
92	         Advertisement
93	         12.3.2. Several MCCs in the same Advertisement
94	         12.3.3. Heterogeneous conference with switching and
95	         composition
96	   13. Acknowledgements
97	   14. IANA Considerations
98	   15. Security Considerations
99	   16. Changes Since Last Version
100	   17. Authors' Addresses

102	1. Introduction

104	   Current telepresence systems, though based on open standards such
105	   as RTP [RFC3550] and SIP [RFC3261], cannot easily interoperate with
106	   each other.  A major factor limiting the interoperability of
107	   telepresence systems is the lack of a standardized way to describe
108	   and negotiate the use of the multiple streams of audio and video
109	   comprising the media flows.  This document provides a framework for
110	   protocols to enable interoperability by handling multiple streams
111	   in a standardized way.  The framework is intended to support the
112	   use cases described in draft-ietf-clue-telepresence-use-cases and
113	   to meet the requirements in draft-ietf-clue-telepresence-
114	   requirements.

116	   The basic session setup for the use cases is based on SIP [RFC3261]
117	   and SDP offer/answer [RFC3264].  In addition to basic SIP & SDP
118	   offer/answer, CLUE specific signaling is required to exchange the
119	   information describing the multiple media streams.  The motivation
120	   for this framework, an overview of the signaling, and information
121	   required to be exchanged is described in subsequent sections of
122	   this document.  The signaling details and data model are provided
123	   in subsequent documents.

125	2. Terminology

127	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
128	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
129	   this document are to be interpreted as described in RFC 2119
130	   [RFC2119].

132	3. Definitions

134	   The terms defined below are used throughout this document and
135	   companion documents and they are normative.  In order to easily
136	   identify the use of a defined term, those terms are capitalized.

138	   Advertisement: a CLUE message a Media Provider sends to a Media
139	   Consumer describing specific aspects of the content of the media,
140	   the formatting of the media streams it can send, and any
141	   restrictions it has in terms of being able to provide certain
142	   Streams simultaneously.

144	   Audio Capture: Media Capture for audio.  Denoted as ACn in the
145	   example cases in this document.

147	   Camera-Left and Right: For Media Captures, camera-left and camera-
148	   right are from the point of view of a person observing the rendered
149	   media.  They are the opposite of Stage-Left and Stage-Right.

151	   Capture: Same as Media Capture.

153	   Capture Device: A device that converts audio and video input into
154	   an electrical signal, in most cases to be fed into a media encoder.

156	   Capture Encoding: A specific encoding of a Media Capture, to be
157	   sent by a Media Provider to a Media Consumer via RTP.

159	   Capture Scene: a structure representing a spatial region containing
160	   one or more Capture Devices, each capturing media representing a
161	   portion of the region. The spatial region represented by a Capture
162	   Scene may or may not correspond to a real region in physical space,
163	   such as a room.  A Capture Scene includes attributes and one or
164	   more Capture Scene Entries, with each entry including one or more
165	   Media Captures.

167	   Capture Scene Entry (CSE): a list of Media Captures of the same
168	   media type that together form one way to represent the entire
169	   Capture Scene.

171	   Conference: used as defined in [RFC4353], A Framework for
172	   Conferencing within the Session Initiation Protocol (SIP).

174	   Configure Message: A CLUE message a Media Consumer sends to a Media
175	   Provider specifying which content and media streams it wants to
176	   receive, based on the information in a corresponding Advertisement
177	   message.

179	   Consumer: short for Media Consumer.

181	   Encoding or Individual Encoding: a set of parameters representing a
182	   way to encode a Media Capture to become a Capture Encoding.

184	   Encoding Group: A set of encoding parameters representing a total
185	   media encoding capability to be sub-divided across potentially
186	   multiple Individual Encodings.

188	   Endpoint: The logical point of final termination through receiving,
189	   decoding and rendering, and/or initiation through capturing,
190	   encoding, and sending of media streams.  An endpoint consists of
191	   one or more physical devices which source and sink media streams,
192	   and exactly one [RFC4353] Participant (which, in turn, includes
193	   exactly one SIP User Agent).  Endpoints can be anything from
194	   multiscreen/multicamera rooms to handheld devices.

196	   Front: the portion of the room closest to the cameras.  Moving
197	   toward the back means moving away from the cameras.

199	   MCU: Multipoint Control Unit (MCU) - a device that connects two or
200	   more endpoints together into one single multimedia conference
201	   [RFC5117].  An MCU includes an [RFC4353]-like Mixer, without the
202	   [RFC4353] requirement to send media to each participant.

204	   Media: Any data that, after suitable encoding, can be conveyed over
205	   RTP, including audio, video or timed text.

207	   Media Capture: a source of Media, such as from one or more Capture
208	   Devices or constructed from other Media streams.

210	   Media Consumer: an Endpoint or middlebox that receives Media
211	   streams.

213	   Media Provider: an Endpoint or middlebox that sends Media streams.

214	   Model: a set of assumptions a telepresence system of a given vendor
215	   adheres to and expects the remote telepresence system(s) also to
216	   adhere to.

218	   Multiple Content Capture: A Capture for audio or video that
219	   indicates that the Capture contains multiple audio or video
220	   Captures. Single Media Captures may or may not be present in the
221	   resultant Capture Encoding depending on time or space.  Denoted as
222	   MCCn in the example cases in this document.

224	   Plane of Interest: The spatial plane containing the most relevant
225	   subject matter.

227	   Provider: Same as Media Provider.

229	   Render: the process of generating a representation from media,
230	   such as displayed motion video or sound emitted from loudspeakers.

232	   Simultaneous Transmission Set: a set of Media Captures that can be
233	   transmitted simultaneously from a Media Provider.

235	   Single Media Capture: A capture which contains media from a single
236	   source capture device, e.g., an audio capture or a video capture.

238	   Spatial Relation: The arrangement in space of two objects, in
239	   contrast to relation in time or other relationships.  See also
240	   Camera-Left and Right.

242	   Stage-Left and Right: For Media Captures, Stage-left and Stage-
243	   right are the opposite of Camera-left and Camera-right.  For the
244	   case of a person facing (and captured by) a camera, Stage-left and
245	   Stage-right are from the point of view of that person.

247	   Stream: a Capture Encoding sent from a Media Provider to a Media
248	   Consumer via RTP [RFC3550].

250	   Stream Characteristics: the media stream attributes commonly used
251	   in non-CLUE SIP/SDP environments (such as: media codec, bit rate,
252	   resolution, profile/level etc.) as well as CLUE specific
253	   attributes, such as the Capture ID or a spatial location.

255	   Video Capture: Media Capture for video.  Denoted as VCn in the
256	   example cases in this document.

258	   Video Composite: A single image that is formed, normally by an RTP
259	   mixer inside an MCU, by combining visual elements from separate
260	   sources.

262	4. Overview & Motivation

264	   This section provides an overview of the functional elements
265	   defined in this document to represent a telepresence system.  The
266	   motivations for the framework described in this document are also
267	   provided.

269	   Two key concepts introduced in this document are the terms "Media
270	   Provider" and "Media Consumer". A Media Provider represents the
271	   entity that is sending the media and a Media Consumer represents
272	   the entity that is receiving the media. A Media Provider provides
273	   Media in the form of RTP packets; a Media Consumer consumes those
274	   RTP packets.  Media Providers and Media Consumers can reside in
275	   Endpoints or in middleboxes such as Multipoint Control Units
276	   (MCUs).  A Media Provider in an Endpoint is usually associated
277	   with the generation of media for Media Captures; these Media
278	   Captures are typically sourced from cameras, microphones, and the
279	   like.  Similarly, the Media Consumer in an Endpoint is usually
280	   associated with renderers, such as screens and loudspeakers.  In
281	   middleboxes, Media Providers and Consumers can have the form of
282	   outputs and inputs, respectively, of RTP mixers, RTP translators,
283	   and similar devices.  Typically, telepresence devices such as
284	   Endpoints and middleboxes would perform as both Media Providers
285	   and Media Consumers, the former being concerned with those
286	   devices' transmitted media and the latter with those devices'
287	   received media.  In a few circumstances, a CLUE Endpoint or
288	   middlebox includes only Consumer or Provider functionality, such as
289	   recorder-type Consumers or webcam-type Providers.

291	   The motivations for the framework outlined in this document
292	   include the following:

294	   (1) Endpoints in telepresence systems typically have multiple Media
295	   Capture and Media Render devices, e.g., multiple cameras and
296	   screens. While previous system designs were able to set up calls
297	   that would capture media using all cameras and display media on all
298	   screens, for example, there is no mechanism that can associate
299	   these Media Captures with each other in space and time.

301	   (2) The mere fact that there are multiple capture and rendering
302	   devices, each of which may be configurable in aspects such as zoom,
303	   leads to the difficulty that a variable number of such devices can
304	   be used to capture different aspects of a region.  The Capture
305	   Scene concept allows for the description of multiple setups for
306	   those multiple capture devices that could represent sensible
307	   operation points of the physical capture devices in a room, chosen
308	   by the operator.  A Consumer can pick and choose from those
309	   configurations based on its rendering abilities and inform the
310	   Provider about its choices.  Details are provided in section 7.

312	   (3) In some cases, physical limitations or other reasons disallow
313	   the concurrent use of a device in more than one setup.  For
314	   example, the center camera in a typical three-camera conference
315	   room can set its zoom objective either to capture only the middle
316	   few seats, or all seats of a room, but not both concurrently.  The
317	   Simultaneous Transmission Set concept allows a Provider to signal
318	   such limitations.  Simultaneous Transmission Sets are part of the
319	   Capture Scene description, and discussed in section 8.

321	   (4) Often, the devices in a room do not have the computational
322	   complexity or connectivity to deal with multiple encoding options
323	   simultaneously, even if each of these options is sensible in
324	   certain scenarios, and even if the simultaneous transmission is
325	   also sensible (i.e. in case of multicast media distribution to
326	   multiple endpoints).   Such constraints can be expressed by the
327	   Provider using the Encoding Group concept, described in section 9.

329	   (5) Due to the potentially large number of RTP flows required for a
330	   Multimedia Conference involving potentially many Endpoints, each of
331	   which can have many Media Captures and media renderers, it has
332	   become common to multiplex multiple RTP media flows onto the same
333	   transport address, so as to avoid using the port number as a
334	   multiplexing point and the associated shortcomings such as
335	   NAT/firewall traversal.  While the actual mapping of those RTP
336	   flows to the header fields of the RTP packets is not the subject of
337	   this specification, the large number of possible permutations of
338	   sensible options a Media Provider can make available to a Media
339	   Consumer makes it desirable to have a mechanism that narrows down the
340	   number of possible options that a SIP offer-answer exchange has to
341	   consider.  Such information is made available using protocol
342	   mechanisms specified in this document and companion documents,
343	   although it should be stressed that its use in an implementation is
344	   OPTIONAL.  Also, there are aspects of the control of both Endpoints
345	   and middleboxes/MCUs that dynamically change during the progress of
346	   a call, such as audio-level based screen switching, layout changes,
347	   and so on, which need to be conveyed.  Note that these control
348	   aspects are complementary to those specified in traditional SIP
349	   based conference management such as BFCP.  An example call flow
350	   can be found in section 5.

352	   Finally, all this information needs to be conveyed, and the notion
353	   of support for it needs to be established.  This is done by the
354	   negotiation of a "CLUE channel", a data channel negotiated early
355	   during the initiation of a call.  An Endpoint or MCU that rejects
356	   the establishment of this data channel, by definition, is not
357	   supporting CLUE based mechanisms, whereas an Endpoint or MCU that
358	   accepts it is REQUIRED to use it to the extent specified in this
359	   document and its companion documents.

361	5. Overview of the Framework/Model

363	   The CLUE framework specifies how multiple media streams are to be
364	   handled in a telepresence conference.

366	   A Media Provider (transmitting Endpoint or MCU) describes specific
367	   aspects of the content of the media and the formatting of the media
368	   streams it can send in an Advertisement; and the Media Consumer
369	   responds to the Media Provider by specifying which content and
370	   media streams it wants to receive in a Configure message.  The
371	   Provider then transmits the asked-for content in the specified
372	   streams.

374	   This Advertisement and Configure MUST occur during call initiation
375	   but MAY also happen at any time throughout the call, whenever there
376	   is a change in what the Consumer wants to receive or (perhaps less
377	   common) the Provider can send.

379	   An Endpoint or MCU typically acts as both Provider and Consumer at
380	   the same time, sending Advertisements and sending Configurations in
381	   response to receiving Advertisements.  (It is possible to be just
382	   one or the other.)

384	   The data model is based around two main concepts: a Capture and an
385	   Encoding.  A Media Capture (MC), such as audio or video, describes
386	   the content a Provider can send.  Media Captures are described in
387	   terms of CLUE-defined attributes, such as spatial relationships and
388	   purpose of the capture.  Providers tell Consumers which Media
389	   Captures they can provide, described in terms of the Media Capture
390	   attributes.

392	   A Provider organizes its Media Captures into one or more Capture
393	   Scenes, each representing a spatial region, such as a room.  A
394	   Consumer chooses which Media Captures it wants to receive from each
395	   Capture Scene.

397	   In addition, the Provider can send the Consumer a description of
398	   the Individual Encodings it can send in terms of the media
399	   attributes of the Encodings, in particular, audio and video
400	   parameters such as bandwidth, frame rate, macroblocks per second.
401	   Note that this is OPTIONAL, and intended to minimize the number of
402	   options a later SDP offer-answer would have to include in the SDP
403	   in case of complex setups, as should become clearer shortly when
404	   discussing an outline of the call flow.

406	   The Provider can also specify constraints on its ability to provide
407	   Media, and a sensible design choice for a Consumer is to take these
408	   into account when choosing the content and Capture Encodings it
409	   requests in the later offer-answer exchange.  Some constraints are
410	   due to the physical limitations of devices--for example, a camera
411	   may not be able to provide zoom and non-zoom views simultaneously.
412	   Other constraints are system based, such as maximum bandwidth and
413	   maximum video coding performance measured in macroblocks/second.
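
   The two kinds of constraint described above can be illustrated with
   a minimal Python sketch.  It is not part of the framework: the
   function names, the capture IDs (VC0..VC3), and the bandwidth figures
   are hypothetical, and the idea that one camera offers either a
   zoomed-in view (VC1) or a zoomed-out view (VC3) is assumed purely for
   illustration.

```python
from typing import Dict, Iterable, List, Set


def fits_simultaneous_set(chosen: Iterable[str],
                          simultaneous_sets: List[Set[str]]) -> bool:
    """True if every chosen capture belongs to one common allowed set."""
    chosen_set = set(chosen)
    return any(chosen_set <= allowed for allowed in simultaneous_sets)


def fits_bandwidth(requested: Dict[str, int], max_bandwidth: int) -> bool:
    """True if the summed per-capture bandwidth stays within the limit."""
    return sum(requested.values()) <= max_bandwidth


# Hypothetical Provider: VC1 and VC3 come from the same physical camera,
# so they never appear together in one allowed set.
allowed_sets = [{"VC0", "VC1", "VC2"}, {"VC0", "VC3", "VC2"}]

ok_choice = ["VC0", "VC1", "VC2"]
bad_choice = ["VC1", "VC3"]

physical_ok = fits_simultaneous_set(ok_choice, allowed_sets)
physical_bad = fits_simultaneous_set(bad_choice, allowed_sets)

# Hypothetical system limit of 6 Mbit/s shared by the chosen captures.
system_ok = fits_bandwidth({"VC0": 2_000_000, "VC1": 2_000_000,
                            "VC2": 2_000_000}, 6_000_000)
```

   A Consumer that runs checks of this kind before sending its request
   avoids asking for combinations the Provider has said it cannot
   deliver.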

415	   The following diagram illustrates the information contained in an
416	   Advertisement.

418	   ...................................................................
419	   .  Provider Advertisement                                         .
420	   .                                                                 .
421	   .        +------------------------+   +--------------------+      .
422	   .        |       Capture Scene N  |   | Simultaneous       |      .
423	   .      +-+----------------------+ |   +--------------------+      .
424	   .      |       Capture Scene 2  | |                               .
425	   .    +-+----------------------+ | |      +----------------------+ .
426	   .    |  Capture Scene 1       | | |      |  Encoding Group N    | .
427	   .    |    +---------------+   | | |    +-+--------------------+ | .
428	   .    |    | Attributes    |   | | |    |   Encoding Group 2   | | .
429	   .    |    +---------------+   | | |  +-+--------------------+ | | .
430	   .    |                        | | |  |   Encoding Group 1   | | | .
431	   .    |    +----------------+  | | |  |     parameters       | | | .
432	   .    |    | E n t r i e s  |  | | |  |                      | | | .
433	   .    |    |  +---------+   |  | | |  | +-------------------+| | | .
434	   .    |    |  |Attribute|   |  | | |  | | V i d e o         || | | .
435	   .    |    |  +---------+   |  | | |  | | E n c o d i n g s || | | .
436	   .    |    |                |  | | |  | | Encoding 1        || | | .
437	   .    |    | Entry 1        |  | | |  | | (parameters)      || | | .
438	   .    |    |  (list of MCs) |  | |-+  | +-------------------+| | | .
439	   .    |    +----|-|--|------+  |-+    |                      | | | .
440	   .    +---------|-|--|---------+      | +-------------------+| | | .
441	   .              | |  |                | | A u d i o         || | | .
442	   .              | |  |                | | E n c o d i n g s || | | .
443	   .              v |  |                | | Encoding 1        || | | .
444	   .      +---------|--|--------+       | | (ID,maxBandwidth) || | | .
445	   .      | Media Capture N     |------>| +-------------------+| | | .
446	   .    +-+---------v--|------+ |       |                      | | | .
447	   .    | Media Capture 2     | |       |                      | |-+ .
448	   .  +-+--------------v----+ |-------->|                      | |   .
449	   .  | Media Capture  1    | | |       |                      |-+   .
450	   .  |  +----------------+ |---------->|                      |     .
451	   .  |  | Attributes     | | |_+       +----------------------+     .
452	   .  |  +----------------+ |_+                                      .
453	   .  +---------------------+                                        .
454	   .                                                                 .
455	   ...................................................................
456	                    Figure 1: Advertisement Structure
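
   As a reading aid, the nesting shown in Figure 1 can be sketched as
   plain Python data classes.  This is only an illustration: the actual
   representation is the XML schema defined in the CLUE data model
   document, and every field name below (for example encoding_group_id
   and max_group_bandwidth) is an assumption made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MediaCapture:
    capture_id: str                  # e.g. "VC0" or "AC0"
    media_type: str                  # "audio" or "video"
    attributes: Dict[str, str] = field(default_factory=dict)
    encoding_group_id: str = ""      # assumed link to an Encoding Group


@dataclass
class CaptureSceneEntry:
    capture_ids: List[str]           # Media Captures of one media type
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class CaptureScene:
    attributes: Dict[str, str]
    entries: List[CaptureSceneEntry]


@dataclass
class IndividualEncoding:
    encoding_id: str
    max_bandwidth: int               # unit (bits per second) is assumed


@dataclass
class EncodingGroup:
    group_id: str
    max_group_bandwidth: int         # assumed group-level parameter
    video_encodings: List[IndividualEncoding] = field(default_factory=list)
    audio_encodings: List[IndividualEncoding] = field(default_factory=list)


@dataclass
class Advertisement:
    captures: Dict[str, MediaCapture]
    scenes: List[CaptureScene]
    simultaneous_sets: List[List[str]]
    encoding_groups: Dict[str, EncodingGroup]

    def dangling_capture_ids(self) -> List[str]:
        """Capture IDs referenced by entries but never declared."""
        used = {cid for scene in self.scenes
                for entry in scene.entries
                for cid in entry.capture_ids}
        return sorted(used - set(self.captures))


# A tiny hypothetical Advertisement: one scene, one video and one audio capture.
adv = Advertisement(
    captures={
        "VC0": MediaCapture("VC0", "video", encoding_group_id="EG0"),
        "AC0": MediaCapture("AC0", "audio", encoding_group_id="EG0"),
    },
    scenes=[CaptureScene({"description": "room"},
                         [CaptureSceneEntry(["VC0"]),
                          CaptureSceneEntry(["AC0"])])],
    simultaneous_sets=[["VC0", "AC0"]],
    encoding_groups={"EG0": EncodingGroup(
        "EG0", 6_000_000,
        video_encodings=[IndividualEncoding("ENC0", 4_000_000)],
        audio_encodings=[IndividualEncoding("ENC1", 96_000)])},
)
```

   Keeping captures in one table and referring to them by ID from the
   entries mirrors the arrows in the figure, where several structures
   point at the same Media Capture.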

458	   A very brief outline of the call flow used by a simple system (two
459	   Endpoints) in compliance with this document can be described as
460	   follows, and as shown in the following figure.

462	         +-----------+                     +-----------+
463	         | Endpoint1 |                     | Endpoint2 |
464	         +----+------+                     +-----+-----+
465	              | INVITE (BASIC SDP+CLUECHANNEL)   |
466	              |--------------------------------->|
467	              |    200 OK (BASIC SDP+CLUECHANNEL)|
468	              |<---------------------------------|
469	              | ACK                              |
470	              |--------------------------------->|
471	              |                                  |
472	              |<################################>|
473	              |     BASIC SDP MEDIA SESSION      |
474	              |<################################>|
475	              |                                  |
476	              |    CONNECT (CLUE CTRL CHANNEL)   |
477	              |=================================>|
478	              |            ...                   |
479	              |<================================>|
480	              |   CLUE CTRL CHANNEL ESTABLISHED  |
481	              |<================================>|
482	              |                                  |
483	              | ADVERTISEMENT 1                  |
484	              |*********************************>|
485	              |                  ADVERTISEMENT 2 |
486	              |<*********************************|
487	              |                                  |
488	              |                      CONFIGURE 1 |
489	              |<*********************************|
490	              | CONFIGURE 2                      |
491	              |*********************************>|
492	              |                                  |
493	              | REINVITE (UPDATED SDP)           |
494	              |--------------------------------->|
495	              |              200 OK (UPDATED SDP)|
496	              |<---------------------------------|
497	              | ACK                              |
498	              |--------------------------------->|
499	              |                                  |
500	              |<################################>|
501	              |   UPDATED SDP MEDIA SESSION      |
502	              |<################################>|
503	              |                                  |
504	              v                                  v

506	                    Figure 2: Basic Information Flow

508	   An initial offer/answer exchange establishes a basic media session,
509	   for example audio-only, and a CLUE channel between two Endpoints.
510	   With the establishment of that channel, the endpoints have
511	   consented to use the CLUE protocol mechanisms and, therefore, MUST
512	   adhere to the CLUE protocol suite as outlined herein.

514	   Over this CLUE channel, the Provider in each Endpoint conveys its
515	   characteristics and capabilities by sending an Advertisement as
516	   specified herein.  The Advertisement is typically not sufficient to
517	   set up all media.  The Consumer in the Endpoint receives the
518	   information provided by the Provider, and can use it for two
519	   purposes.  First, it MUST construct and send a CLUE Configure
520	   message to tell the Provider what the Consumer wishes to receive.
521	   Second, it MAY, but is not necessarily REQUIRED to, use the
522	   information provided to tailor the SDP it is going to send during
523	   the following SIP offer/answer exchange, and its reaction to SDP it
524	   receives in that step.  It is often a sensible implementation
525	   choice to do so, as the representation of the media information
526	   conveyed over the CLUE channel can dramatically cut down on the
527	   size of SDP messages used in the O/A exchange that follows.
528	   Spatial relationships associated with the Media can be included in
529	   the Advertisement, and it is often sensible for the Media Consumer
530	   to take those spatial relationships into account when tailoring the
531	   SDP.

533	   This CLUE exchange MUST be followed by an SDP offer/answer exchange
534	   that not only establishes those aspects of the media that have not
535	   been "negotiated" over CLUE, but also has the side effect of
536	   setting up the media transmission itself, potentially involving
537	   security exchanges, ICE, and so on.  This step is plain vanilla
538	   SIP, with the exception that the SDP used herein, in most (but not
539	   necessarily all) cases can be considerably smaller than the SDP a
540	   system would typically need to exchange if there were no pre-
541	   established knowledge about the Provider and Consumer
542	   characteristics.  (The need for cutting down SDP size is not quite
543	   obvious for a point-to-point call involving simple endpoints;
544	   however, when considering a large multipoint conference involving
545	   many multi-screen/multi-camera endpoints, each of which can operate
546	   using multiple codecs for each camera and microphone, it becomes
547	   perhaps somewhat more intuitive.)

549	   During the lifetime of a call, further exchanges MAY occur over the
550	   CLUE channel.  In some cases, those further exchanges lead to a
551	   modified system behavior of Provider or Consumer (or both) without
552	   any other protocol activity such as further offer/answer exchanges.
553	   For example, voice-activated screen switching, signaled over the
554	   CLUE channel, ought not to lead to heavy-handed mechanisms like SIP
555	   re-invites.  However, in other cases, after the CLUE negotiation an
556	   additional offer/answer exchange becomes necessary.  For example,
557	   if both sides decide to upgrade the call from a single screen to a
558	   multi-screen call and more bandwidth is required for the additional
559	   video channels compared to what was previously negotiated using
560	   offer/answer, a new O/A exchange is REQUIRED.

562	   One aspect of the protocol outlined herein and specified in more
563	   detail in companion documents is that it makes available
564	   information regarding the Provider's capabilities to deliver Media,
565	   and attributes related to that Media such as their spatial
566	   relationship, to the Consumer.  The operation of the renderer
567	   inside the Consumer is unspecified in that it can choose to ignore
568	   some information provided by the Provider, and/or not render media
569	   streams available from the Provider (although it MUST follow the
570	   CLUE protocol and, therefore, MUST gracefully receive and respond
571	   (through a Configure) to the Provider's information).  All CLUE
572	   protocol mechanisms are OPTIONAL in the Consumer in the sense that,
573	   while the Consumer MUST be able to receive (and, potentially,
574	   gracefully acknowledge) CLUE messages, it is free to ignore the
575	   information provided therein.  Obviously, this is not a
576	   particularly sensible design choice in almost all conceivable
577	   cases.

579	   A CLUE-implementing device interoperates with a device that does
580	   not support CLUE, because the non-CLUE device does, by definition,
581	   not understand the offer of a CLUE channel in the initial
582	   offer/answer exchange and, therefore, will reject it. This
583	   rejection MUST be used as the indication to the CLUE-implementing
584	   device that the other side of the communication is not compliant
585	   with CLUE, and to fall back to behavior that does not require CLUE.

587	   As for the media, Provider and Consumer have an end-to-end
588	   communication relationship with respect to (RTP transported) media;
589	   and the mechanisms described herein and in companion documents do
590	   not change the aspects of setting up those RTP flows and sessions.
591	   In other words, the RTP media sessions conform to the negotiated
592	   SDP whether or not CLUE is used.

594	6. Spatial Relationships

596	   In order for a Consumer to perform a proper rendering, it is often
597	   necessary or at least helpful for the Consumer to have received
598	   spatial information about the streams it is receiving.  CLUE
599	   defines a coordinate system that allows Media Providers to describe
600	   the spatial relationships of their Media Captures to enable proper
601	   scaling and spatially sensible rendering of their streams.  The
602	   coordinate system is based on a few principles:

604	   o  Simple systems which do not have multiple Media Captures to
605	      associate spatially need not use the coordinate model.

607	   o  Coordinates can be either in real, physical units (millimeters),
608	      have an unknown scale or have no physical scale.  Systems which
609	      know their physical dimensions (for example professionally
610	      installed Telepresence room systems) MUST always provide those
611	      real-world measurements.  Systems which don't know specific
612	      physical dimensions but still know relative distances MUST use
613	      'unknown scale'.  'No scale' is intended to be used where Media
614	      Captures from different devices (with potentially different
615	      scales) will be forwarded alongside one another (e.g. in the
616	      case of a middle box).

618	      *  "Millimeters" means the scale is in millimeters.

620	      *  "Unknown" means the scale is not necessarily millimeters, but
621	         the scale is the same for every Capture in the Capture Scene.

623	      *  "No Scale" means the scale could be different for each
624	         capture.  An MCU provider that advertises two adjacent
625	         captures and picks sources (which can change quickly) from
626	         different endpoints might use this value; the scale could be
627	         different and changing for each capture.  But the areas of
628	         capture still represent a spatial relation between captures.

630	   o  The coordinate system is Cartesian X, Y, Z with the origin at a
631	      spatial location of the provider's choosing.  The Provider MUST
632	      use the same coordinate system with the same scale and origin
633	      for all coordinates within the same Capture Scene.

635	   The direction of increasing coordinate values is:
636	   X increases from Camera-Left to Camera-Right
637	   Y increases from front to back
638	   Z increases from low to high (i.e. floor to ceiling)
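
As an illustration only, the scale types and axis conventions above could be modelled as in the following sketch. The names Scale, Point, sizes_comparable and width are hypothetical and are not taken from the CLUE data model. The sketch assumes only what this section states: "millimeters" and "unknown" use one scale for every Capture in a Capture Scene, while "no scale" does not.

```python
from dataclasses import dataclass
from enum import Enum


class Scale(Enum):
    """Scale types described in this section (names are illustrative)."""
    MILLIMETERS = "millimeters"
    UNKNOWN = "unknown"
    NO_SCALE = "no scale"


@dataclass(frozen=True)
class Point:
    """Cartesian point: X camera-left to camera-right, Y front to back,
    Z floor to ceiling."""
    x: float
    y: float
    z: float


def sizes_comparable(scale: Scale) -> bool:
    """True when every Capture in the Capture Scene shares one scale,
    so extents of different Captures can be compared with each other."""
    return scale in (Scale.MILLIMETERS, Scale.UNKNOWN)


def width(left: Point, right: Point) -> float:
    """Extent along the X axis, in whatever units the scale provides."""
    return right.x - left.x
```

With "no scale", a Consumer could still use the coordinates to order Captures from left to right, but this sketch would not compare their sizes.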

640	7. Media Captures and Capture Scenes

642	   This section describes how Providers can describe the content of
643	   media to Consumers.

645	7.1. Media Captures

647	   Media Captures are the fundamental representations of streams that
648	   a device can transmit.  What a Media Capture actually represents is
649	   flexible:

651	   o  It can represent the immediate output of a physical source (e.g.
652	      camera, microphone) or 'synthetic' source (e.g. laptop computer,
653	      DVD player).

655	   o  It can represent the output of an audio mixer or video composer

657	   o  It can represent a concept such as 'the loudest speaker'

659	   o  It can represent a conceptual position such as 'the leftmost
660	      stream'

662	   To identify and distinguish between multiple Capture instances,
663	   Captures have a unique identity.  For instance: VC1, VC2 and AC1,
664	   AC2, where VC1 and VC2 refer to two different video captures and
665	   AC1 and AC2 refer to two different audio captures.

667	   Some key points about Media Captures:

669	     . A Media Capture is of a single media type (e.g. audio or
670	        video)
671	     . A Media Capture is defined in a Capture Scene and is given an
672	        advertisement unique identity.  The identity may be referenced
673	        outside the Capture Scene that defines it through a Multiple
674	        Content Capture (MCC)
675	     . A Media Capture is associated with one or more Capture Scene
676	        Entries
677	     . A Media Capture has exactly one set of spatial information
678	     . A Media Capture can be the source of one or more Capture
679	        Encodings

681	   Each Media Capture can be associated with attributes to describe
682	   what it represents.

684	7.1.1. Media Capture Attributes

686	   Media Capture Attributes describe information about the Captures.
687	   A Provider can use the Media Capture Attributes to describe the
688	   Captures for the benefit of the Consumer in the Advertisement
689	   message.  Media Capture Attributes include:

691	     . Spatial information, such as point of capture, point on line
692	        of capture, and area of capture, all of which, in combination
693	        define the capture field of, for example, a camera;
694	     . Capture multiplexing information (mono/stereo audio, maximum
695	        number of simultaneous encodings per Capture and so on); and
696	     . Other descriptive information to help the Consumer choose
697	        between captures (description, presentation, view, priority,
698	        language, person information and type).
699	     . Control information for use inside the CLUE protocol suite.

701	   The sub-sections below define the Capture attributes.

703	7.1.1.1. Point of Capture

705	   The Point of Capture attribute is a field with a single Cartesian
706	   (X, Y, Z) point value which describes the spatial location of the
707	   capturing device (such as camera).

709	7.1.1.2. Point on Line of Capture

711	   The Point on Line of Capture attribute is a field with a single
712	   Cartesian (X, Y, Z) point value which describes a position in space
713	   of a second point on the axis of the capturing device; the first
714	   point being the Point of Capture (see above).

716	   Together, the Point of Capture and Point on Line of Capture define
717	   an axis of the capturing device, for example the optical axis of a
718	   camera.  The Media Consumer can use this information to adjust how
719	   it renders the received media if it so chooses.
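
A minimal sketch of how a Consumer might derive the axis direction from the two points follows. The function name capture_axis and the use of plain (X, Y, Z) tuples are assumptions made for illustration; this document does not prescribe any particular computation.

```python
import math


def capture_axis(point_of_capture, point_on_line):
    """Return the unit vector pointing from the Point of Capture
    towards the Point on Line of Capture."""
    dx, dy, dz = (b - a for a, b in zip(point_of_capture, point_on_line))
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0:
        # Two identical points do not define an axis.
        raise ValueError("points must be distinct")
    return (dx / length, dy / length, dz / length)
```

For example, a camera at the origin whose second point lies further along the Y axis has an axis pointing from front to back.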

721	7.1.1.3. Area of Capture

723	   The Area of Capture is a field with a set of four (X, Y, Z) points
724	   as a value which describes the spatial location of what is being
725	   "captured".  By comparing the Area of Capture for different Media
726	   Captures within the same Capture Scene a consumer can determine the
727	   spatial relationships between them and render them correctly.

729	   The four points MUST be co-planar, forming a quadrilateral, which
730	   defines the Plane of Interest for the particular media capture.

732	   If the Area of Capture is not specified, it means the Media Capture
733	   is not spatially related to any other Media Capture.

735	   For a switched capture that switches between different sections
736	   within a larger area, the area of capture MUST use coordinates for
737	   the larger potential area.
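
The co-planarity requirement on the four Area of Capture points can be checked with a scalar triple product, as in the sketch below. The function name is_coplanar and the tolerance parameter are assumptions made for illustration; a receiver is not required by this document to perform such a check.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def is_coplanar(p0, p1, p2, p3, tol=1e-9):
    """True if the four (X, Y, Z) points lie in one plane.

    The scalar triple product of the three edge vectors leaving p0
    is zero exactly when the points are co-planar.
    """
    a, b, c = _sub(p1, p0), _sub(p2, p0), _sub(p3, p0)
    return abs(_dot(a, _cross(b, c))) <= tol
```

The tolerance would need to be chosen relative to the coordinate scale in use; the default here is arbitrary.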

739	7.1.1.4. Mobility of Capture

741	   The Mobility of Capture attribute indicates whether or not the
742	   point of capture, point on line of capture, and area of capture
743	   values stay the same over time, or are expected to change
744	   (potentially frequently).  Possible values are static, dynamic, and
745	   highly dynamic.

747	   An example for "dynamic" is a camera mounted on a stand which is
748	   occasionally hand-carried and placed at different positions in
749	   order to provide the best angle to capture a work task.  A camera
750	   worn by a person who moves around the room is an example for
751	   "highly dynamic". In either case, the effect is that the capture
752	   point, capture axis and area of capture change with time.

754	   The capture point of a static capture MUST NOT move for the life of
755	   the conference. The capture point of dynamic captures is
756	   characterized by a change in position followed by a reasonable
757	   period of stability, on the order of minutes. Highly dynamic
758	   captures are characterized by a capture point that is constantly
759	   moving.  If the "area of capture", "capture point" and "line of
760	   capture" attributes are included with dynamic or highly dynamic
761	   captures they indicate spatial information at the time of the
762	   Advertisement.

764	7.1.1.5. Audio Channel Format

766	   The Audio Channel Format attribute is a field with enumerated
767	   values which describes the method of encoding used for audio. A
768	   value of 'mono' means the Audio Capture has one channel.  'stereo'
769	   means the Audio Capture has two audio channels, left and right.

771	   This attribute applies only to Audio Captures.  A single stereo
772	   capture is different from two mono captures that have a left-right
773	   spatial relationship.  A stereo capture maps to a single Capture
774	   Encoding, while each mono audio capture maps to a separate Capture
775	   Encoding.

777	7.1.1.6. Max Capture Encodings

779	   The Max Capture Encodings attribute is an optional attribute
780	   indicating the maximum number of Capture Encodings that can be
781	   simultaneously active for the Media Capture.  The number of
782	   simultaneous Capture Encodings is also limited by the restrictions
783	   of the Encoding Group for the Media Capture.

785	7.1.1.7. Description

787	   The Description attribute is a human-readable description of the
788	   Capture, which could be in multiple languages.

790	7.1.1.8. Presentation

792	   The Presentation attribute indicates that the capture originates
793	   from a presentation device, that is one that provides supplementary
794	   information to a conference through slides, video, still images,
795	   data etc.  Where more information is known about the capture it MAY
796	   be expanded hierarchically to indicate the different types of
797	   presentation media, e.g. presentation.slides, presentation.image
798	   etc.

800	   Note: It is expected that a number of keywords will be defined that
801	   provide more detail on the type of presentation.

803	7.1.1.9. View

805	   The View attribute is a field with enumerated values, indicating
806	   what type of view the Capture relates to.  The Consumer can use
807	   this information to help choose which Media Captures it wishes to
808	   receive.  The value MUST be one of:

810	   Room - Captures the entire scene

812	   Table - Captures the conference table with seated people

814	   Individual - Captures an individual person

816	   Lectern - Captures the region of the lectern including the
817	   presenter, for example in a classroom style conference room
818	   Audience - Captures a region showing the audience in a classroom
819	   style conference room

821	7.1.1.10. Language

823	   The language attribute indicates one or more languages used in the
824	   content of the Media Capture.  Captures MAY be offered in different
825	   languages in case of multilingual and/or accessible conferences.  A
826	   Consumer can use this attribute to differentiate between them and
827	   pick the appropriate one.

829	   Note that the Language attribute is defined and meaningful both for
830	   audio and video captures.  In case of audio captures, the meaning
831	   is obvious.  For a video capture, "Language" could, for example, be
832	   sign interpretation or text.

834	7.1.1.11. Person Information

836	   The person information attribute allows a Provider to provide
837	   specific information regarding the people in a Capture (regardless
838	   of whether or not the capture has a Presentation attribute). The
839	   Provider may gather the information automatically or manually from
840	   a variety of sources; however, the xCard [RFC6351] format is used to
841	   convey the information. This allows various information such as
842	   Identification information (section 6.2/[RFC6350]), Communication
843	   Information (section 6.4/[RFC6350]) and Organizational information
844	   (section 6.6/[RFC6350]) to be communicated. A Consumer may then
845	   automatically (i.e. via a policy) or manually select Captures
846	   based on information about who is in a Capture. It also allows a
847	   Consumer to render information regarding the people participating
848	   in the conference or to use it for further processing.

850	   The Provider may supply a minimal set of information or a larger
851	   set of information. However it MUST be compliant with [RFC6350] and
852	   supply a "VERSION" and "FN" property. A Provider may supply
853	   multiple xCards per Capture of any KIND (section 6.1.4/[RFC6350]).

855	   In order to keep CLUE messages compact the Provider SHOULD use a
856	   URI to point to any LOGO, PHOTO or SOUND contained in the xCARD
857	   rather than transmitting the LOGO, PHOTO or SOUND data in a CLUE
858	   message.

860	7.1.1.12. Person Type

862	   The person type attribute indicates the type of people contained in
863	   the capture in the conference with respect to the meeting agenda
864	   (regardless of whether or not the capture has a Presentation
865	   attribute). As a capture may include multiple people the attribute
866	   may contain multiple values. However values shall not be repeated
867	   within the attribute.

869	   An Advertiser associates the person type with an individual capture
870	   when it knows that a particular type is in the capture. If an
871	   Advertiser cannot link a particular type with some certainty to a
872	   capture then it is not included. A Consumer on reception of a
873	   capture with a person type attribute knows with some certainty that
874	   the capture contains that person type. The capture may contain
875	   other person types but the Advertiser has not been able to
876	   determine that this is the case.

878	   The types of Captured people include:

880	     . Chairman - the person responsible for running the conference
881	        according to the agenda.
882	     . Vice-Chairman - the person responsible for assisting the
883	        chairman in running the meeting.
884	     . Minute Taker - the person responsible for recording the
885	        minutes of the conference
886	     . Member - the person has no particular responsibilities with
887	        respect to running the meeting.
888	     . Presenter - the person is scheduled on the agenda to make a
889	        presentation in the meeting. Note: This is not related to any
890	        "active speaker" functionality.
891	     . Translator - the person is providing some form of translation
892	        or commentary in the meeting.
893	     . Timekeeper - the person is responsible for maintaining the
894	        meeting schedule.

896	   Furthermore the person type attribute may contain one or more
897	   strings allowing the Provider to indicate custom meeting specific
898	   roles.

900	7.1.1.13. Priority

902	   The priority attribute indicates a relative priority between
903	   different Media Captures.  The Provider sets this priority, and the
904	   Consumer MAY use the priority to help decide which captures it
905	   wishes to receive.

907	   The "priority" attribute is an integer which indicates a relative
908	   priority between Captures. For example it is possible to assign a
909	   priority between two presentation Captures that would allow a
910	   remote endpoint to determine which presentation is more important.
911	   Priority is assigned at the individual capture level. It represents
912	   the Provider's view of the relative priority between Captures with
913	   a priority. The same priority number MAY be used across multiple
914	   Captures. It indicates they are equally important. If no priority
915	   is assigned, no assumptions can be made regarding the relative
916	   importance of the Capture.
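
The sketch below shows one way a Consumer might organise Captures by priority. It is illustrative only: the function name group_by_priority and the dictionary input are hypothetical. Because this section does not say whether a larger or a smaller number means "more important", the sketch only groups Captures and does not rank the groups.

```python
from collections import defaultdict


def group_by_priority(priorities):
    """Group capture identities by their priority value.

    `priorities` maps a capture identity to an integer priority, or to
    None when the Provider assigned no priority.  Captures sharing a
    value are equally important; Captures without a priority are
    returned separately because nothing can be assumed about them.
    """
    ranked = defaultdict(list)
    unranked = []
    for capture_id, priority in priorities.items():
        if priority is None:
            unranked.append(capture_id)
        else:
            ranked[priority].append(capture_id)
    return dict(ranked), unranked
```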

918	7.1.1.14. Embedded Text

920	   The Embedded Text attribute indicates that a Capture provides
921	   embedded textual information. For example the video Capture MAY
922	   contain speech to text information composed with the video image.
923	   This attribute is only applicable to video Captures and
924	   presentation streams with visual information.

926	7.1.1.15. Related To

928	   The Related To attribute indicates the Capture contains additional
929	   complementary information related to another Capture.  The value
930	   indicates the identity of the other Capture to which this Capture
931	   is providing additional information.

933	   For example, a conference can utilize translators or facilitators
934	   that provide an additional audio stream (i.e. a translation or
935	   description or commentary of the conference).  Where multiple
936	   captures are available, it may be advantageous for a Consumer to
937	   select a complementary Capture instead of or in addition to a
938	   Capture it relates to.

940	7.2. Multiple Content Capture

942	   The MCC indicates that one or more Single Media Captures are
943	   contained in one Media Capture.  Only one Capture type (i.e. audio,
944	   video, etc.) is allowed in each MCC instance.  The MCC may contain
945	   a reference to the Single Media Captures (which may have their own
946	   attributes) as well as attributes associated with the MCC itself.
947	   A MCC may also contain other MCCs.  The MCC MAY reference Captures
948	   from within the Capture Scene that defines it or from other Capture
949	   Scenes.  No ordering is implied by the order that Captures appear
950	   within a MCC. A MCC MAY contain no references to other Captures to
951	   indicate that the MCC contains content from multiple sources but no
952	   information regarding those sources is given.

954	   One or more MCCs may also be specified in a CSE.  This allows an
955	   Advertiser to indicate that several MCC captures are used to
956	   represent a capture scene.  Table 14 provides an example of this
957	   case.

959	   As outlined in section 7.1, each instance of the MCC has its own
960	   Capture identity, e.g. MCC1.  It allows all the individual captures
961	   contained in the MCC to be referenced by a single MCC identity.

963	   The example below shows the use of a Multiple Content Capture:

965	        +-----------------------+---------------------------------+
966	        | Capture Scene #1      |                                 |
967	        +-----------------------+---------------------------------+
968	        | VC1                   | {attributes}                    |
969	        | VC2                   | {attributes}                    |
970	        | VCn                   | {attributes}                    |
971	        | MCC1(VC1,VC2,...VCn)  | {attributes}                    |
972	        | CSE(MCC1)             |                                 |
973	        +---------------------------------------------------------+

975	                Table 1: Multiple Content Capture concept

977	   This indicates that MCC1 is a single capture that contains the
978	   Captures VC1, VC2 through VCn according to any MCC1 attributes.

980	7.2.1. MCC Attributes

982	   Attributes may be associated with the MCC instance and the Single
983	   Media Captures that the MCC references.  A provider should avoid
984	   providing conflicting attribute values between the MCC and Single
985	   Media Captures. Where there is conflict the attributes of the MCC
986	   override any that may be present in the individual captures.
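
The override rule can be pictured as a dictionary merge in which the MCC's values win. This is a sketch under assumptions: the function name effective_attributes and the attribute keys in the usage note are hypothetical, and real attributes are carried in the CLUE data model rather than in Python dictionaries.

```python
def effective_attributes(capture_attrs, mcc_attrs):
    """Merge attributes of a Single Media Capture with those of the
    MCC that references it.  On conflict the MCC value overrides."""
    merged = dict(capture_attrs)
    merged.update(mcc_attrs)
    return merged
```

For instance, merging {"view": "individual", "language": "en"} with an MCC carrying {"view": "room"} yields a view of "room" while the language is left unchanged.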

988	   A Provider MAY include as much or as little of the original source
989	   Capture information as it requires.

991	   There are MCC specific attributes that MUST only be used with
992	   Multiple Content Captures. These are described in the sections
993	   below. The attributes described in section 7.1.1. MAY also be used
994	   with MCCs.

996	   The spatial related attributes of an MCC indicate its area of
997	   capture and point of capture within the scene, just like any other
998	   media capture.  The spatial information does not imply anything
999	   about how other captures are composed within an MCC.

1001	   For example:  A virtual scene could be constructed for the MCC
1002	   capture with two Video Captures with a "MaxCaptures" attribute set
1003	   to 2 and an "Area of Capture" attribute provided with an overall
1004	   area.  Each of the individual Captures could then also include an
1005	   "Area of Capture" attribute with a sub-set of the overall area.
1006	   The Consumer would then know how each capture is related to others
1007	   within the scene, but not the relative position of the individual
1008	   captures within the composed capture.

1010	        +-----------------------+---------------------------------+
1011	        | Capture Scene #1      |                                 |
1012	        +-----------------------+---------------------------------+
1013	        | VC1                   | AreaofCapture=(0,0,0)(9,0,0)    |
1014	        |                       |               (0,0,9)(9,0,9)    |
1015	        | VC2                   | AreaofCapture=(10,0,0)(19,0,0)  |
1016	        |                       |               (10,0,9)(19,0,9)  |
1017	        | MCC1(VC1,VC2)         | MaxCaptures=2                   |
1018	        |                       | AreaofCapture=(0,0,0)(19,0,0)   |
1019	        |                       |               (0,0,9)(19,0,9)   |
1020	        | CSE(MCC1)             |                                 |
1021	        +---------------------------------------------------------+

1023	        Table 2: Example of MCC and Single Media Capture attributes

1025	   The sections below describe the MCC only attributes.

1027	7.2.1.1. Maximum Number of Captures within a MCC

1029	   The Maximum Number of Captures MCC attribute indicates the maximum
1030	   number of individual captures that may appear in a Capture Encoding
1031	   at a time.  The actual number at any given time can be less than
1032	   this maximum.  It may be used to derive how the Single Media
1033	   Captures within the MCC are composed / switched with regards to
1034	   space and time.

1036	   A Provider can indicate that the number of captures in a MCC
1037	   capture encoding is equal "=" to the MaxCaptures value or that
1038	   there may be any number of captures up to and including "<=" the
1039	   MaxCaptures value. This allows a Provider to distinguish between a
1040	   MCC that purely represents a composition of sources versus a MCC
1041	   that represents switched or switched and composed sources.

1043	   MaxCaptures MAY be set to one so that only content related to one
1044	   of the sources is shown in the MCC Capture Encoding at a time or
1045	   it may be set to any value up to the total number of Source Media
1046	   Captures in the MCC.

1048	   The bullets below describe how the setting of MaxCapture versus the
1049	   number of captures in the MCC affects how sources appear in a
1050	   capture encoding:

1052	     . When MaxCaptures is set to <= 1 and the number of captures in
1053	        the MCC is greater than 1 (or not specified) in the MCC this
1054	        is a switched case. Zero or 1 captures may be switched into
1055	        the capture encoding. Note: zero is allowed because of the
1056	        "<=".
1057	     . When MaxCaptures is set to = 1 and the number of captures in
1058	        the MCC is greater than 1 (or not specified) in the MCC this
1059	        is a switched case. Only one capture source is contained in a
1060	        capture encoding at a time.
1061	     . When MaxCaptures is set to <= N (with N > 1) and the number of
1062	        captures in the MCC is greater than N (or not specified) this
1063	        is a switched and composed case. The capture encoding may
1064	        contain purely switched sources (i.e. <=2 allows for 1 source
1065	        on its own), or may contain composed and switched sources
1066	        (i.e. a composition of 2 sources switched between the
1067	        sources).
1068	     . When MaxCaptures is set to = N (with N > 1) and the number of
1069	        captures in the MCC is greater than N (or not specified) this
1070	        is a switched and composed case. The capture encoding contains
1071	        composed and switched sources (i.e. a composition of N sources
1072	        switched between the sources). It is not possible to have a
1073	        single source.
1074	     . When MaxCaptures is set to <= to the number of captures in the
1075	        MCC this is a switched and composed case. The capture encoding
1076	        may contain media switched between any number (up to the
1077	        MaxCaptures) of composed sources.
1078	     . When MaxCaptures is set to = to the number of captures in the
1079	        MCC this is a composed case. All the sources are composed into
1080	        a single capture encoding.

1082	   If this attribute is not set then as default it is assumed that all
1083	   source content can appear concurrently in the Capture Encoding
1084	   associated with the MCC.

1086	   For example: The use of MaxCaptures equal to 1 on a MCC with three
1087	   Video Captures VC1, VC2 and VC3 would indicate that the Advertiser
1088	   in the capture encoding would switch between VC1, VC2 and VC3 as
1089	   there may be only a maximum of one capture at a time.
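
The cases listed in the bullets above can be summarised in a small classification routine. The sketch is illustrative: the function name classify_mcc, its parameters and the returned strings are hypothetical, and a MaxCaptures value larger than the number of captures is rejected only because the bullets do not describe that case.

```python
def classify_mcc(op, max_captures, num_captures=None):
    """Classify an MCC as 'switched', 'composed' or
    'switched and composed'.

    op           -- "=" or "<=", the qualifier on MaxCaptures
    max_captures -- the MaxCaptures value (1 or greater)
    num_captures -- number of captures listed in the MCC, or None
                    when the MCC does not specify its sources
    """
    if op not in ("=", "<="):
        raise ValueError("op must be '=' or '<='")
    if max_captures < 1:
        raise ValueError("MaxCaptures must be at least 1")
    if num_captures is not None and max_captures > num_captures:
        raise ValueError("case not described by the bullets above")

    if num_captures is not None and max_captures == num_captures:
        # All sources fit: fixed composition for "=", otherwise any
        # number of sources up to MaxCaptures may appear.
        return "composed" if op == "=" else "switched and composed"
    if max_captures == 1:
        # More sources than slots and a single slot: pure switching.
        return "switched"
    return "switched and composed"
```

Applied to the example above, classify_mcc("=", 1, 3) reports a switched capture.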

1091	7.2.1.2. Policy

1093	   The Policy MCC Attribute indicates the criteria that the Provider
1094	   uses to determine when and/or where media content appears in the
1095	   Capture Encoding related to the MCC.

1097	   The attribute is in the form of a token that indicates the policy
1098	   and an index representing an instance of the policy.

1100	   The tokens are:

1102	   SoundLevel - This indicates that the content of the MCC is
1103	   determined by a sound level detection algorithm. For example: the
1104	   loudest (active) speaker is contained in the MCC.

1106	   RoundRobin - This indicates that the content of the MCC is
1107	   determined by a time based algorithm. For example: the Provider
1108	   provides content from a particular source for a period of time and
1109	   then provides content from another source and so on.

1111	   An index is used to represent an instance in the policy setting. An
1112	   index of 0 represents the most current instance of the policy, i.e.
1113	   the active speaker, 1 represents the previous instance, i.e. the
1114	   previous active speaker and so on.

1116	   The following example shows a case where the Provider provides two
1117	   media streams, one showing the active speaker and a second stream
1118	   showing the previous speaker.

1120	        +-----------------------+---------------------------------+
1121	        | Capture Scene #1      |                                 |
1122	        +-----------------------|---------------------------------+
1123	        | VC1                   |                                 |
1124	        | VC2                   |                                 |
1125	        | MCC1(VC1,VC2)         | Policy=SoundLevel:0             |
1126	        |                       | MaxCaptures=1                   |
1127	        | MCC2(VC1,VC2)         | Policy=SoundLevel:1             |
1128	        |                       | MaxCaptures=1                   |
1129	        | CSE(MCC1,MCC2)        |                                 |
1130	        +---------------------------------------------------------+

1132	                Table 3: Example Policy MCC attribute usage
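
   As a rough illustration, the token-and-index form shown in Table 3
   could be parsed as sketched below.  This is Python written for this
   example only: the Policy class, the parse_policy function and the
   "Token:index" textual form are assumptions taken from the table
   notation, not part of the CLUE data model.

```python
from dataclasses import dataclass

# Policy tokens listed in section 7.2.1.2.
KNOWN_TOKENS = {"SoundLevel", "RoundRobin"}


@dataclass(frozen=True)
class Policy:
    token: str  # which policy is in use
    index: int  # 0 = most current instance, 1 = previous instance, ...


def parse_policy(text: str) -> Policy:
    """Parse the 'Token:index' notation used in Table 3 (hypothetical form)."""
    token, sep, index = text.partition(":")
    if not sep or token not in KNOWN_TOKENS:
        raise ValueError(f"unrecognised policy: {text!r}")
    value = int(index)  # raises ValueError for a non-numeric index
    if value < 0:
        raise ValueError("policy index must not be negative")
    return Policy(token, value)
```

   With this sketch, "SoundLevel:0" would denote the active speaker
   and "SoundLevel:1" the previous active speaker.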

1134	7.2.1.3. Synchronisation Identity

1136	   The Synchronisation Identity MCC attribute indicates how the
1137	   individual captures in multiple MCC captures are synchronised.  To
1138	   indicate that the Capture Encodings associated with MCCs contain
1139	   captures from the same source at the same time, a Provider should
1140	   set the same Synchronisation Identity on each of the concerned
1141	   MCCs.  It is the Provider that determines what the source for the
1142	   Captures is, so a Provider can choose how to group together Single
1143	   Media Captures for the purpose of keeping them synchronised
1144	   according to the SynchronisationID attribute.  For example, when
1145	   the Provider is in an MCU it may determine that each separate CLUE
1146	   endpoint is a remote source of media.  The Synchronisation Identity
1147	   may be used across media types, i.e. to synchronise audio- and
1148	   video-related MCCs.

1150	   Without this attribute it is assumed that multiple MCCs may provide
1151	   content from different sources at any particular point in time.

1153	   For example:

1155	        +=======================+=================================+
1156	        | Capture Scene #1      |                                 |
1157	        +-----------------------|---------------------------------+
1158	        | VC1                   | Description=Left                |
1159	        | VC2                   | Description=Centre              |
1160	        | VC3                   | Description=Right               |
1161	        | AC1                   | Description=room                |
1162	        | CSE(VC1,VC2,VC3)      |                                 |
1163	        | CSE(AC1)              |                                 |
1164	        +=======================+=================================+
1165	        | Capture Scene #2      |                                 |
1166	        +-----------------------|---------------------------------+
1167	        | VC4                   | Description=Left                |
1168	        | VC5                   | Description=Centre              |
1169	        | VC6                   | Description=Right               |
1170	        | AC2                   | Description=room                |
1171	        | CSE(VC4,VC5,VC6)      |                                 |
1172	        | CSE(AC2)              |                                 |
1173	        +=======================+=================================+
1174	        | Capture Scene #3      |                                 |
1175	        +-----------------------|---------------------------------+
1176	        | VC7                   |                                 |
1177	        | AC3                   |                                 |
1178	        +=======================+=================================+
1179	        | Capture Scene #4      |                                 |
1180	        +-----------------------|---------------------------------+
1181	        | VC8                   |                                 |
1182	        | AC4                   |                                 |
1183	        +=======================+=================================+
1184	        | Capture Scene #5      |                                 |
1185	        +-----------------------|---------------------------------+
1186	        | MCC1(VC1,VC4,VC7)     | SynchronisationID=1             |
1187	        |                       | MaxCaptures=1                   |
1188	        | MCC2(VC2,VC5,VC8)     | SynchronisationID=1             |
1189	        |                       | MaxCaptures=1                   |
1190	        | MCC3(VC3,VC6)         | MaxCaptures=1                   |
1191	        | MCC4(AC1,AC2,AC3,AC4) | SynchronisationID=1             |
1192	        |                       | MaxCaptures=1                   |
1193	        | CSE(MCC1,MCC2,MCC3)   |                                 |
1194	        | CSE(MCC4)             |                                 |
1195	        +=======================+=================================+

1197	       Table 4: Example Synchronisation Identity MCC attribute usage

1199	   The above Advertisement would indicate that MCC1, MCC2, MCC3 and
1200	   MCC4 make up a Capture Scene.  There would be four capture
1201	   encodings (one for each MCC).  Because MCC1 and MCC2 have the same
1202	   SynchronisationID, each encoding from MCC1 and MCC2 respectively
1203	   would together have content from only Capture Scene 1 or only
1204	   Capture Scene 2 or the combination of VC7 and VC8 at a particular
1205	   point in time.  In this case the provider has decided the sources
1206	   to be synchronized are Scene #1, Scene #2, and Scene #3 and #4
1207	   together. The encoding from MCC3 would not be synchronised with
1208	   MCC1 or MCC2. As MCC4 also has the same Synchronisation Identity
1209	   as MCC1 and MCC2 the content of the audio encoding will be
1210	   synchronised with the video content.
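
   The grouping described above can be sketched in Python as follows.
   The dictionary layout and the synchronised_groups function are
   assumptions made for this illustration; only the MCC names and
   SynchronisationID values come from Table 4.

```python
from collections import defaultdict

# MCC name -> SynchronisationID (None when the attribute is absent),
# transcribed from Table 4.
MCC_SYNC_IDS = {"MCC1": 1, "MCC2": 1, "MCC3": None, "MCC4": 1}


def synchronised_groups(sync_ids):
    """Return {SynchronisationID: sorted MCC names} for MCCs that carry one."""
    groups = defaultdict(list)
    for mcc, sync_id in sync_ids.items():
        if sync_id is not None:
            groups[sync_id].append(mcc)
    return {sync_id: sorted(names) for sync_id, names in groups.items()}
```

   Applied to Table 4, MCC1, MCC2 and MCC4 fall into one group, while
   MCC3 carries no SynchronisationID and so belongs to none.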

1212	7.3. Capture Scene

1214	   In order for a Provider's individual Captures to be used
1215	   effectively by a Consumer, the provider organizes the Captures into
1216	   one or more Capture Scenes, with the structure and contents of
1217	   these Capture Scenes being sent from the Provider to the Consumer
1218	   in the Advertisement.

1220	   A Capture Scene is a structure representing a spatial region
1221	   containing one or more Capture Devices, each capturing media
1222	   representing a portion of the region.  A Capture Scene includes one
1223	   or more Capture Scene entries, with each entry including one or
1224	   more Media Captures.  A Capture Scene represents, for example, the
1225	   video image of a group of people seated next to each other, along
1226	   with the sound of their voices, which could be represented by some
1227	   number of VCs and ACs in the Capture Scene Entries.  A middle box
1228	   can also describe in Capture Scenes what it constructs from media
1229	   Streams it receives.

1231	   A Provider MAY advertise one or more Capture Scenes.  What
1232	   constitutes an entire Capture Scene is up to the Provider.  A
1233	   simple Provider might typically use one Capture Scene for
1234	   participant media (live video from the room cameras) and another
1235	   Capture Scene for a computer generated presentation.  In more
1236	   complex systems, the use of additional Capture Scenes is also
1237	   sensible.  For example, a classroom may advertise two Capture
1238	   Scenes involving live video, one including only the camera
1239	   capturing the instructor (and associated audio), the other
1240	   including camera(s) capturing students (and associated audio).

1242	   A Capture Scene MAY (and typically will) include more than one type
1243	   of media.  For example, a Capture Scene can include several Capture
1244	   Scene Entries for Video Captures, and several Capture Scene Entries
1245	   for Audio Captures.  A particular Capture MAY be included in more
1246	   than one Capture Scene Entry.

1248	   A provider MAY express spatial relationships between Captures that
1249	   are included in the same Capture Scene.  However, there is not
1250	   necessarily the same spatial relationship between Media Captures
1251	   that are in different Capture Scenes.  In other words, Capture
1252	   Scenes can use their own spatial measurement system as outlined
1253	   above in section 6.

1255	   A Provider arranges Captures in a Capture Scene to help the
1256	   Consumer choose which captures it wants to render.  The Capture
1257	   Scene Entries in a Capture Scene are different alternatives the
1258	   Provider is suggesting for representing the Capture Scene.  Each
1259	   Capture Scene Entry is given an Advertisement-unique identity.  The
1260	   order of Capture Scene Entries within a Capture Scene has no
1261	   significance.  The Media Consumer can choose to receive all Media
1262	   Captures from one Capture Scene Entry for each media type (e.g.
1263	   audio and video), or it can pick and choose Media Captures
1264	   regardless of how the Provider arranges them in Capture Scene
1265	   Entries.  Different Capture Scene Entries of the same media type
1266	   are not necessarily mutually exclusive alternatives.  Also note
1267	   that the presence of multiple Capture Scene Entries (with
1268	   potentially multiple encoding options in each entry) in a given
1269	   Capture Scene does not necessarily imply that a Provider is able to
1270	   serve all the associated media simultaneously (although the
1271	   construction of such an over-rich Capture Scene is probably not
1272	   sensible in many cases).  What a Provider can send simultaneously
1273	   is determined through the Simultaneous Transmission Set mechanism,
1274	   described in section 8.

1276	   Captures within the same Capture Scene Entry MUST be of the same
1277	   media type - it is not possible to mix audio and video captures in
1278	   the same Capture Scene Entry, for instance.  The Provider MUST be
1279	   capable of encoding and sending all Captures (that have an encoding
1280	   group) in a single Capture Scene Entry simultaneously.  The order
1281	   of Captures within a Capture Scene Entry has no significance.  A
1282	   Consumer can decide to receive all the Captures in a single Capture
1283	   Scene Entry, but a Consumer could also decide to receive just a
1284	   subset of those captures.  A Consumer can also decide to receive
1285	   Captures from different Capture Scene Entries, all subject to the
1286	   constraints set by Simultaneous Transmission Sets, as discussed in
1287	   section 8.
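
   The same-media-type rule could be checked with a sketch such as the
   one below.  The function name, the mapping from Capture identity to
   media type, and the use of ValueError are assumptions chosen for
   this example; CLUE itself does not define such an API.

```python
def cse_media_type(entry, media_type_of):
    """Return the single media type shared by every Capture in the entry.

    entry         -- iterable of Capture identities in one Capture Scene Entry
    media_type_of -- mapping from Capture identity to its media type
    Raises ValueError when the entry is empty or mixes media types.
    """
    types = {media_type_of[capture] for capture in entry}
    if len(types) != 1:
        raise ValueError(
            f"entry must hold exactly one media type, got {sorted(types)}"
        )
    return next(iter(types))
```

   An entry such as (VC0, VC1, VC2) would pass this check, whereas an
   entry mixing VC0 and AC0 would be rejected.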

1289	   When a Provider advertises a Capture Scene with multiple entries,
1290	   it is essentially signaling that there are multiple representations
1291	   of the same Capture Scene available.  In some cases, these multiple
1292	   representations would typically be used simultaneously (for
1293	   instance a "video entry" and an "audio entry").  In some cases the
1294	   entries would conceptually be alternatives (for instance an entry
1295	   consisting of three Video Captures covering the whole room versus
1296	   an entry consisting of just a single Video Capture covering only
1297	   the center of a room).  In this latter example, one sensible choice
1298	   for a Consumer would be to indicate (through its Configure and
1299	   possibly through an additional offer/answer exchange) the Captures
1300	   of that Capture Scene Entry that most closely matched the
1301	   Consumer's number of display devices or screen layout.

1303	   The following is an example of 4 potential Capture Scene Entries
1304	   for an endpoint-style Provider:

1306	   1.  (VC0, VC1, VC2) - left, center and right camera Video Captures
1307	   2.  (VC3) - Video Capture associated with loudest room segment

1309	   3.  (VC4) - Video Capture zoomed out view of all people in the room

1311	   4.  (AC0) - main audio

1313	   The first entry in this Capture Scene example is a list of Video
1314	   Captures which have a spatial relationship to each other.
1315	   Determination of the order of these captures (VC0, VC1 and VC2) for
1316	   rendering purposes is accomplished through use of their Area of
1317	   Capture attributes.  The second entry (VC3) and the third entry
1318	   (VC4) are alternative representations of the same room's video,
1319	   which might be better suited to some Consumers' rendering
1320	   capabilities.  The inclusion of the Audio Capture in the same
1321	   Capture Scene indicates that AC0 is associated with all of those
1322	   Video Captures, meaning it comes from the same spatial region.
1323	   Therefore, if audio were to be rendered at all, this audio would be
1324	   the correct choice irrespective of which Video Captures were
1325	   chosen.

1327	7.3.1. Capture Scene attributes

1329	   Capture Scene Attributes can be applied to Capture Scenes as well
1330	   as to individual media captures.  Attributes specified at this
1331	   level apply to all constituent Captures.  Capture Scene attributes
1332	   include:

1334	     . Human-readable description of the Capture Scene, which could
1335	        be in multiple languages;
1336	     . xCard scene information;
1337	     . Scale information (millimeters, unknown, no scale), as
1338	        described in Section 6.

1340	7.3.1.1. Scene Information

1342	   The Scene information attribute provides information regarding the
1343	   Capture Scene rather than individual participants. The Provider
1344	   may gather the information automatically or manually from a
1345	   variety of sources. The scene information attribute allows a
1346	   Provider to indicate information such as: organizational or
1347	   geographic information allowing a Consumer to determine which
1348	   Capture Scenes are of interest in order to then perform Capture
1349	   selection. It also allows a Consumer to render information
1350	   regarding the Scene or to use it for further processing.

1352	   As per section 7.1.1.11, the xCard format is used to convey this
1353	   information and the Provider may supply a minimal set of
1354	   information or a larger set of information.

1356	   In order to keep CLUE messages compact the Provider SHOULD use a
1357	   URI to point to any LOGO, PHOTO or SOUND contained in the xCard
1358	   rather than transmitting the LOGO, PHOTO or SOUND data in a CLUE
1359	   message.

1361	7.3.2. Capture Scene Entry attributes

1363	   A Capture Scene can include one or more Capture Scene Entries in
1364	   addition to the Capture Scene wide attributes described above.
1365	   Capture Scene Entry attributes apply to the Capture Scene Entry as
1366	   a whole, i.e. to all Captures that are part of the Capture Scene
1367	   Entry.

1369	   Capture Scene Entry attributes include:

1371	     . Human-readable description of the Capture Scene Entry, which
1372	        could be in multiple languages.

1374	7.3.3. Global Capture Scene Entry List

1376	   An Advertisement can include an optional global Capture Scene
1377	   Entry list.  Each item in this list is a set of one or more
1378	   Capture Scene Entries of the same media type.  Each set of CSEs in
1379	   the list is a suggestion from the Provider to the Consumer for
1380	   which CSEs provide a complete representation of the simultaneous
1381	   captures provided by the provider, across multiple scenes.  The
1382	   Provider can include multiple sets, to allow a consumer to choose
1383	   sets of captures appropriate to its capabilities or application.
1384	   The choice of how to make these suggestions in the Global CSE list
1385	   for what represents all the scenes for which the provider can send
1386	   media is up to the provider.  This is very similar to how each CSE
1387	   represents a particular scene.

1389	   As an example, suppose an advertisement has three scenes, and each
1390	   scene has three CSEs, ranging from one to three video captures in
1391	   each CSE.  The provider is advertising a total of nine video
1392	   captures across three scenes.  The provider can use the Global CSE
1393	   list to suggest alternatives for consumers that can't receive all
1394	   nine video captures as separate media streams.  For accommodating
1395	   a consumer that wants to receive three video captures, a provider
1396	   might suggest a single CSE with three captures and nothing from
1397	   the other two scenes.  Or a provider might suggest three different
1398	   CSEs, one from each scene, with a single video capture in each.

1400	   Some additional rules:

1402	     . The ordering of items (sets of CSEs) in the global CSE list
1403	        is not important.
1404	     . The ordering of CSEs within each set is not important.
1405	     . A particular CSE may be used in multiple sets.
1406	     . The Provider must be capable of encoding and sending all
1407	        Captures within the CSEs of a given set simultaneously.

1409	8. Simultaneous Transmission Set Constraints

1411	   In many practical cases, a Provider has constraints or limitations
1412	   on its ability to send Captures simultaneously.  One type of
1413	   limitation is caused by the physical limitations of capture
1414	   mechanisms; these constraints are represented by a simultaneous
1415	   transmission set.  The second type of limitation reflects the
1416	   encoding resources available, such as bandwidth or video encoding
1417	   throughput (macroblocks/second).  This type of constraint is
1418	   captured by encoding groups, discussed below.

1420	   Some Endpoints or MCUs can send multiple Captures simultaneously;
1421	   however sometimes there are constraints that limit which Captures
1422	   can be sent simultaneously with other Captures.  A device may not
1423	   be able to be used in different ways at the same time.  Provider
1424	   Advertisements are made so that the Consumer can choose one of
1425	   several possible mutually exclusive usages of the device.  This
1426	   type of constraint is expressed in a Simultaneous Transmission Set,
1427	   which lists all the Captures of a particular media type (e.g.
1428	   audio, video, text) that can be sent at the same time.  There are
1429	   different Simultaneous Transmission Sets for each media type in the
1430	   Advertisement.  This is easier to show in an example.

1432	   Consider the example of a room system where there are three cameras
1433	   each of which can send a separate capture covering two persons
1434	   each: VC0, VC1, VC2.  The middle camera can also zoom out (using an
1435	   optical zoom lens) and show all six persons, VC3.  But the middle
1436	   camera cannot be used in both modes at the same time - it has to
1437	   either show the space where two participants sit or the whole six
1438	   seats, but not both at the same time.  As a result, VC1 and VC3
1439	   cannot be sent simultaneously.

1441	   Simultaneous Transmission Sets are expressed as sets of the Media
1442	   Captures that the Provider could transmit at the same time (though,
1443	   in some cases, it is not intuitive to do so).  If a Multiple
1444	   Content Capture is included in a Simultaneous Transmission Set it
1445	   indicates that the Capture Encoding associated with it could be
1446	   transmitted at the same time as the other Captures within the
1447	   Simultaneous Transmission Set. It does not imply that the Single
1448	   Media Captures contained in the Multiple Content Capture could all
1449	   be transmitted at the same time.

1451	   In this example the two simultaneous sets are shown in Table 5.  If
1452	   a Provider advertises one or more mutually exclusive Simultaneous
1453	   Transmission Sets, then for each media type the Consumer MUST
1454	   ensure that it chooses Media Captures that lie wholly within one of
1455	   those Simultaneous Transmission Sets.

1457	                           +-------------------+
1458	                           | Simultaneous Sets |
1459	                           +-------------------+
1460	                           | {VC0, VC1, VC2}   |
1461	                           | {VC0, VC3, VC2}   |
1462	                           +-------------------+

1464	                Table 5: Two Simultaneous Transmission Sets
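
   The Consumer-side rule (chosen Media Captures of one media type
   must lie wholly within one Simultaneous Transmission Set) can be
   sketched in Python as follows.  The set contents come from the
   example above; the choice_is_allowed function is a hypothetical
   helper, and the check would be run once per media type.

```python
# The two Simultaneous Transmission Sets from the room-system example.
SIMULTANEOUS_SETS = [
    frozenset({"VC0", "VC1", "VC2"}),
    frozenset({"VC0", "VC3", "VC2"}),
]


def choice_is_allowed(chosen, simultaneous_sets):
    """True when every chosen Capture lies within a single advertised set."""
    chosen = frozenset(chosen)
    return any(chosen <= allowed for allowed in simultaneous_sets)
```

   Choosing VC0 and VC3 is allowed because both lie in the second
   set; choosing VC1 and VC3 is not, since no single set holds both.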

1466	   A Provider OPTIONALLY can include the simultaneous sets in its
1467	   provider Advertisement.  These simultaneous set constraints apply
1468	   across all the Capture Scenes in the Advertisement.  It is a syntax
1469	   conformance requirement that the simultaneous transmission sets
1470	   MUST allow all the media captures in any particular Capture Scene
1471	   Entry to be used simultaneously.  Similarly, the simultaneous
1472	   transmission sets MUST reflect the simultaneity expressed by any
1473	   global CSE sets.

1475	   For shorthand convenience, a Provider MAY describe a Simultaneous
1476	   Transmission Set in terms of Capture Scene Entries and Capture
1477	   Scenes.  If a Capture Scene Entry is included in a Simultaneous
1478	   Transmission Set, then all Media Captures in the Capture Scene
1479	   Entry are included in the Simultaneous Transmission Set.  If a
1480	   Capture Scene is included in a Simultaneous Transmission Set, then
1481	   all its Capture Scene Entries (of the corresponding media type) are
1482	   included in the Simultaneous Transmission Set.  The end result
1483	   reduces to a set of Media Captures in either case.
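
   The shorthand expansion described above might look like the sketch
   below.  The nested-dictionary layout for Capture Scenes and the
   expand_set function are assumptions made for this illustration and
   do not reflect the CLUE data model's actual encoding.

```python
def expand_set(items, media_type, scenes):
    """Reduce a shorthand Simultaneous Transmission Set to Media Captures.

    items      -- Capture, Capture Scene Entry or Capture Scene identities
    media_type -- media type of the set being expanded, e.g. "video"
    scenes     -- {scene_id: {cse_id: (media_type, [capture_id, ...])}}
    """
    # Flatten all Capture Scene Entries so they can be looked up by identity.
    entries = {
        cse_id: entry
        for scene in scenes.values()
        for cse_id, entry in scene.items()
    }
    captures = set()
    for item in items:
        if item in scenes:
            # A Capture Scene contributes its entries of the matching type.
            for entry_type, entry_captures in scenes[item].values():
                if entry_type == media_type:
                    captures.update(entry_captures)
        elif item in entries:
            # A Capture Scene Entry contributes all of its Media Captures.
            captures.update(entries[item][1])
        else:
            # Anything else is taken to be a Media Capture identity.
            captures.add(item)
    return captures
```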

1485	   If an Advertisement does not include Simultaneous Transmission
1486	   Sets, then the Provider MUST be able to provide all Capture Scenes
1487	   simultaneously.  If multiple Capture Scene Entries are in a Capture
1488	   Scene then the Consumer chooses at most one Capture Scene Entry per
1489	   Capture Scene for each media type.  Likewise, if there are no
1490	   Simultaneous Transmission Sets and there is a global CSE list, then
1491	   the Consumer chooses at most one set of CSEs of each media type,
1492	   from the global CSE list.

1494	   If an Advertisement includes multiple Capture Scene Entries in a
1495	   Capture Scene then the Consumer MAY choose one Capture Scene Entry
1496	   for each media type, or MAY choose individual Captures based on the
1497	   Simultaneous Transmission Sets.

1499	9. Encodings

1501	   Individual encodings and encoding groups are CLUE's mechanisms
1502	   allowing a Provider to signal its limitations for sending Captures,
1503	   or combinations of Captures, to a Consumer.  Consumers can map the
1504	   Captures they want to receive onto the Encodings, with encoding
1505	   parameters they want.  As for the relationship between the CLUE-
1506	   specified mechanisms based on Encodings and the SIP Offer-Answer
1507	   exchange, please refer to section 4.

1509	9.1. Individual Encodings

1511	   An Individual Encoding represents a way to encode a Media Capture
1512	   to become a Capture Encoding, to be sent as an encoded media stream
1513	   from the Provider to the Consumer.  An Individual Encoding has a
1514	   set of parameters characterizing how the media is encoded.

1516	   Different media types have different parameters, and different
1517	   encoding algorithms may have different parameters.  An Individual
1518	   Encoding can be assigned to at most one Capture Encoding at any
1519	   given time.

1521	   The parameters of an Individual Encoding represent the maximum
1522	   values for certain aspects of the encoding.  A particular
1523	   instantiation into a Capture Encoding MAY use lower values than
1524	   these maximums if that is applicable for the media in question.
1525	   For example, most video codec specifications require a conformant
1526	   decoder to decode resolutions and frame rates smaller than what has
1527	   been negotiated as a maximum, so downgrading the CLUE maximum
1528	   values for macroblocks/second is appropriate.  On the other hand,
1529	   downgrading the sample rate of G.711 audio below 8kHz is not
1530	   specified in G.711 and therefore not applicable in the sense
1531	   described here.

1533	   Individual Encoding parameters are represented in SDP [RFC4566],
1534	   not in CLUE messages.  For example, for a video encoding using
1535	   H.26x compression technologies, this can include parameters such
1536	   as:

1538	     . Maximum bandwidth;
1539	     . Maximum picture size in pixels;
1540	     . Maximum number of pixels to be processed per second.

1542	   The bandwidth parameter is the only one that specifically relates
1543	   to a CLUE Advertisement, as it can be further constrained by the
1544	   maximum group bandwidth in an Encoding Group.

1546	9.2. Encoding Group

1548	   An Encoding Group includes a set of one or more Individual
1549	   Encodings, and parameters that apply to the group as a whole.  By
1550	   grouping multiple individual Encodings together, an Encoding Group
1551	   describes additional constraints on bandwidth for the group.

1553	   The Encoding Group data structure contains:

1555	     . Maximum bitrate for all encodings in the group combined;
1556	     . A list of identifiers for audio and video encodings,
1557	        respectively, belonging to the group.

1559	   When the Individual Encodings in a group are instantiated into
1560	   Capture Encodings, each Capture Encoding has a bitrate that MUST be
1561	   less than or equal to the max bitrate for the particular individual
1562	   encoding.  The "maximum bitrate for all encodings in the group"
1563	   parameter gives the additional restriction that the sum of all the
1564	   individual capture encoding bitrates MUST be less than or equal to
1565	   this group value.
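
   The two bitrate conditions can be expressed as a short check.  The
   following Python sketch assumes bitrates are plain numbers in bits
   per second keyed by a hypothetical encoding identifier; the
   function name is likewise invented for this example.

```python
def bitrates_are_valid(capture_bitrates, max_bitrates, group_max):
    """Check instantiated Capture Encodings against an Encoding Group.

    capture_bitrates -- {encoding_id: bitrate of the Capture Encoding}
    max_bitrates     -- {encoding_id: max bitrate of the Individual Encoding}
    group_max        -- maximum bitrate for all encodings in the group
    """
    # Each Capture Encoding must stay within its Individual Encoding's limit.
    within_individual = all(
        rate <= max_bitrates[encoding_id]
        for encoding_id, rate in capture_bitrates.items()
    )
    # The sum over the group must stay within the group maximum.
    within_group = sum(capture_bitrates.values()) <= group_max
    return within_individual and within_group
```

   For instance, two encodings each limited to 4 Mbit/s in a group
   limited to 6 Mbit/s may both run at 3 Mbit/s, but not at 4 Mbit/s
   and 3 Mbit/s together.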

1567	   The following diagram illustrates one example of the structure of a
1568	   media provider's Encoding Groups and their contents.

1570	   ,-------------------------------------------------.
1571	   |             Media Provider                      |
1572	   |                                                 |
1573	   |  ,--------------------------------------.       |
1574	   |  | ,--------------------------------------.     |
1575	   |  | | ,--------------------------------------.   |
1576	   |  | | |          Encoding Group              |   |
1577	   |  | | | ,-----------.                        |   |
1578	   |  | | | |           | ,---------.            |   |
1579	   |  | | | |           | |         | ,---------.|   |
1580	   |  | | | | Encoding1 | |Encoding2| |Encoding3||   |
1581	   |  `.| | |           | |         | `---------'|   |
1582	   |    `.| `-----------' `---------'            |   |
1583	   |      `--------------------------------------'   |
1584	   `-------------------------------------------------'

1586	                    Figure 3: Encoding Group Structure

1588	   A Provider advertises one or more Encoding Groups.  Each Encoding
1589	   Group includes one or more Individual Encodings.  Each Individual
1590	   Encoding can represent a different way of encoding media.  For
1591	   example one Individual Encoding may be 1080p60 video, another could
1592	   be 720p30, with a third being CIF, all in, for example, H.264
1593	   format.
1594	   While a typical three codec/display system might have one Encoding
1595	   Group per "codec box" (physical codec, connected to one camera and
1596	   one screen), there are many possibilities for the number of
1597	   Encoding Groups a Provider may be able to offer and for the
1598	   encoding values in each Encoding Group.

1600	   There is no requirement for all Encodings within an Encoding Group
1601	   to be instantiated at the same time.

1603	9.3. Associating Captures with Encoding Groups

1605	   Each Media Capture MAY be associated with at least one Encoding
1606	   Group, which is used to instantiate that Capture into one or more
1607	   Capture Encodings.  Typically MCCs are assigned an Encoding Group
1608	   and thus become a Capture Encoding.  The Captures (including other
1609	   MCCs) referenced by the MCC do not need to be assigned to an
1610	   Encoding Group. This means that all the Media Captures referenced
1611	   by the MCC will appear in the Capture Encoding according to any MCC
1612	   attributes. This allows an Advertiser to specify Capture attributes
1613	   associated with the Media Captures without the need to provide an
1614	   individual Capture Encoding for each of the inputs.

1616	   If an Encoding Group is assigned to a Media Capture referenced by
1617	   the MCC it indicates that this Capture may also have an individual
1618	   Capture Encoding.

1620	   For example:

1622	        +--------------------+------------------------------------+
1623	        | Capture Scene #1   |                                    |
1624	        +--------------------+------------------------------------+
1625	        | VC1                | EncodeGroupID=1                    |
1626	        | VC2                |                                    |
1627	        | MCC1(VC1,VC2)      | EncodeGroupID=2                    |
1628	        | CSE(VC1)           |                                    |
1629	        | CSE(MCC1)          |                                    |
1630	        +--------------------+------------------------------------+

1632	     Table 6: Example usage of Encoding with MCC and source Captures

1634	   This would indicate that VC1 may be sent as its own Capture
1635	   Encoding from EncodeGroupID=1 or that it may be sent as part of a
1636	   Capture Encoding from EncodeGroupID=2 along with VC2.

1638	   More than one Capture MAY use the same Encoding Group.

1640	   The maximum number of streams that can result from a particular
1641	   Encoding Group constraint is equal to the number of individual
1642	   Encodings in the group.  The actual number of Capture Encodings
1643	   used at any time MAY be less than this maximum.  Any of the
1644	   Captures that use a particular Encoding Group can be encoded
1645	   according to any of the Individual Encodings in the group.  If
1646	   there are multiple Individual Encodings in the group, then the
1647	   Consumer can configure the Provider, via a Configure message, to
1648	   encode a single Media Capture into multiple different Capture
1649	   Encodings at the same time, subject to the Max Capture Encodings
1650	   constraint, with each capture encoding following the constraints of
1651	   a different Individual Encoding.

1653	   It is a protocol conformance requirement that the Encoding Groups
1654	   MUST allow all the Captures in a particular Capture Scene Entry to
1655	   be used simultaneously.

1657	10. Consumer's Choice of Streams to Receive from the Provider

1659	   After receiving the Provider's Advertisement message (that includes
1660	   media captures and associated constraints), the Consumer composes
1661	   its reply to the Provider in the form of a Configure message.  The
1662	   Consumer is free to use the information in the Advertisement as it
1663	   chooses, but there are a few obviously sensible design choices,
1664	   which are outlined below.

1666	   If multiple Providers connect to the same Consumer (i.e. in an
1667	   MCU-less multiparty call), it is the responsibility of the Consumer
1668	   to compose Configures for each Provider that fulfill both that
1669	   Provider's constraints as expressed in the Advertisement and the
1670	   Consumer's own capabilities.

1672	   In an MCU-based multiparty call, the MCU can logically terminate
1673	   the Advertisement/Configure negotiation in that it can hide the
1674	   characteristics of the receiving endpoint and rely on its own
1675	   capabilities (transcoding/transrating/...) to create Media Streams
1676	   that can be decoded at the Endpoint Consumers.  The timing of an
1677	   MCU's sending of Advertisements (for its outgoing ports) and
1678	   Configures (for its incoming ports, in response to Advertisements
1679	   received there) is up to the MCU and implementation dependent.

1681	   As a general outline, a Consumer can choose, based on the
1682	   Advertisement it has received, which Captures it wishes to receive,
1683	   and which Individual Encodings it wants the Provider to use to
1684	   encode the Captures.

1686	   On receipt of an Advertisement with an MCC the Consumer treats the
1687	   MCC as per other non-MCC Captures with the following differences:

1689	   - The Consumer would understand that the MCC is a Capture that
1690	   includes the referenced individual Captures and that these
1691	   individual Captures are delivered as part of the MCC's Capture
1692	   Encoding.

1694	   - The Consumer may utilise any of the attributes associated with
1695	   the referenced individual Captures and any Capture Scene attributes
1696	   from where the individual Captures were defined to choose Captures
1697	   and for rendering decisions.

1699	   - The Consumer may or may not choose to receive all the indicated
1700	   Captures.  Therefore it can choose to receive a subset of the
1701	   Captures indicated by the MCC.

1703	   For example if the Consumer receives:

1705	           MCC1(VC1,VC2,VC3){attributes}

1707	   A Consumer could choose all the Captures within an MCC; however, if
1708	   the Consumer determines that it doesn't want VC3 it can return
1709	   MCC1(VC1,VC2).  If it wants all the individual Captures then it
1710	   returns only the MCC identity (i.e., MCC1).  If the MCC in the
1711	   Advertisement does not reference any individual Captures, then the
1712	   Consumer cannot choose what is included in the MCC; it is up to the
1713	   Provider to decide.
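
   The selection rule above can be sketched as follows.  This is an
   illustrative sketch only: the function name and the string form
   "MCC1(VC1,VC2)" mirror the notation used in this example and are
   not a wire format defined by the CLUE protocol.

```python
def configure_mcc(mcc_id, advertised, wanted):
    """Return the MCC reference a Consumer might place in a Configure.

    mcc_id     -- identity of the MCC, e.g. "MCC1"
    advertised -- individual Captures the MCC references (may be empty)
    wanted     -- set of individual Captures the Consumer wants
    """
    if not advertised:
        # The MCC references no individual Captures, so the Provider
        # decides what it contains; only the identity can be returned.
        return mcc_id
    chosen = [c for c in advertised if c in wanted]
    if not chosen:
        raise ValueError("no Capture of this MCC was selected")
    if len(chosen) == len(advertised):
        # All individual Captures wanted: return the MCC identity only.
        return mcc_id
    return f"{mcc_id}({','.join(chosen)})"
```

   With the Advertisement MCC1(VC1,VC2,VC3), a Consumer that does not
   want VC3 would obtain "MCC1(VC1,VC2)" from this sketch.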

1715	   A Configure Message includes a list of Capture Encodings.  These
1716	   are the Capture Encodings the Consumer wishes to receive from the
1717	   Provider.  Each Capture Encoding refers to one Media Capture, one
1718	   Individual Encoding, and includes the encoding parameter values.  A
1719	   Configure Message does not include references to Capture Scenes or
1720	   Capture Scene Entries.

1722	   For each Capture the Consumer wants to receive, it configures one
1723	   or more of the encodings in that capture's encoding group.  The
1724	   Consumer does this by telling the Provider, in its Configure
1725	   Message, parameters such as the resolution, frame rate, bandwidth,
1726	   etc. for each Capture Encoding of its chosen Captures.  Upon
1727	   receipt of this Configure from the Consumer, common knowledge is
1728	   established between Provider and Consumer regarding sensible
1729	   choices for the media streams and their parameters.  The setup of
1730	   the actual media channels, at least in the simplest case, is left
1731	   to a following offer-answer exchange.  Optimized implementations
1732	   MAY speed up the reaction to the offer-answer exchange by reserving
1733	   the resources at the time of finalization of the CLUE handshake.

1735	   CLUE advertisements and configure messages don't necessarily
1736	   require a new SDP offer-answer for every CLUE message
1737	   exchange.  But the resulting encodings sent via RTP must conform to
1738	   the most recent SDP offer-answer result.

1740	   In order to meaningfully create and send an initial Configure, the
1741	   Consumer needs to have received at least one Advertisement from the
1742	   Provider.

1744	   In addition, the Consumer can send a Configure at any time during
1745	   the call.  The Configure MUST be valid according to the most
1746	   recently received Advertisement.  The Consumer can send a Configure
1747	   either in response to a new Advertisement from the Provider or on
1748	   its own, for example because of a local change in conditions
1749	   (people leaving the room, connectivity changes, multipoint related
1750	   considerations).

1752	   When choosing which Media Streams to receive from the Provider, and
1753	   the encoding characteristics of those Media Streams, the Consumer
1754	   advantageously takes several things into account: its local
1755	   preference, simultaneity restrictions, and encoding limits.

1757	10.1. Local preference

1759	   A variety of local factors influence the Consumer's choice of
1760	   Media Streams to be received from the Provider:

1762	   o  if the Consumer is an Endpoint, it is likely that it would
1763	      choose, where possible, to receive video and audio Captures that
1764	      match the number of display devices and audio system it has

1766	   o  if the Consumer is a middle box such as an MCU, it MAY choose to
1767	      receive loudest speaker streams (in order to perform its own
1768	      media composition) and avoid pre-composed video Captures

1770	   o  user choice (for instance, selection of a new layout) MAY result
1771	      in a different set of Captures, or different encoding
1772	      characteristics, being required by the Consumer

1774	10.2. Physical simultaneity restrictions

1776	   Often there are physical simultaneity constraints of the Provider
1777	   that affect the Provider's ability to simultaneously send all of
1778	   the captures the Consumer would wish to receive.  For instance, a
1779	   middle box such as an MCU, when connected to a multi-camera room
1780	   system, might prefer to receive both individual video streams of
1781	   the people present in the room and an overall view of the room
1782	   from a single camera.  Some Endpoint systems might be able to
1783	   provide both of these sets of streams simultaneously, whereas
1784	   others might not (if the overall room view were produced by
1785	   changing the optical zoom level on the center camera, for
1786	   instance).

1788	10.3. Encoding and encoding group limits

1790	   Each of the Provider's encoding groups has limits on bandwidth and
1791	   computational complexity, and the constituent potential encodings
1792	   have limits on the bandwidth, computational complexity, video
1793	   frame rate, and resolution that can be provided.  When choosing
1794	   the Captures to be received from a Provider, a Consumer device
1795	   MUST ensure that the encoding characteristics requested for each
1796	   individual Capture fit within the capability of the encoding it
1797	   is being configured to use, as well as ensuring that the combined
1798	   encoding characteristics for Captures fit within the capabilities
1799	   of their associated encoding groups.  In some cases, this could
1800	   cause an otherwise "preferred" choice of capture encodings to be
1801	   passed over in favor of different Capture Encodings--for instance,
1802	   if a set of three Captures could only be provided at a low
1803	   resolution then a three screen device could switch to favoring a
1804	   single, higher quality, Capture Encoding.
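
   A Consumer-side check of these limits might look like the sketch
   below.  The class and field names are hypothetical and chosen for
   illustration; they are not taken from the CLUE data model.  The
   pixel-rate test assumes that maxPps bounds width x height x frame
   rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodingLimit:
    group: str            # encoding group this encoding belongs to
    max_width: int
    max_height: int
    max_frame_rate: int
    max_pps: int          # pixels per second
    max_bandwidth: int    # bits per second


@dataclass(frozen=True)
class CaptureEncoding:
    capture: str
    encode_id: str
    width: int
    height: int
    frame_rate: int
    bandwidth: int


def violations(requests, limits, group_bandwidth):
    """List every limit the requested Capture Encodings would exceed."""
    problems = []
    used = {}  # bandwidth requested so far, per encoding group
    for r in requests:
        lim = limits[r.encode_id]
        if r.width > lim.max_width or r.height > lim.max_height:
            problems.append(f"{r.encode_id}: resolution")
        if r.frame_rate > lim.max_frame_rate:
            problems.append(f"{r.encode_id}: frame rate")
        if r.width * r.height * r.frame_rate > lim.max_pps:
            problems.append(f"{r.encode_id}: pixel rate")
        if r.bandwidth > lim.max_bandwidth:
            problems.append(f"{r.encode_id}: bandwidth")
        used[lim.group] = used.get(lim.group, 0) + r.bandwidth
    for group, total in used.items():
        if total > group_bandwidth[group]:
            problems.append(f"{group}: group bandwidth")
    return problems
```

   An empty result means the requested set fits within both the
   individual encoding limits and the encoding group limits.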

1806	11. Extensibility

1808	   One important characteristic of the Framework is its
1809	   extensibility.  Telepresence is a relatively new industry and
1810	   while we can foresee certain directions, we also do not know
1811	   everything about how it will develop.  The standard for
1812	   interoperability and handling multiple streams must be
1813	   future-proof.  The framework itself is inherently extensible
1814	   through expanding the data model types.  For example:

1816	   o  Adding more types of media, such as telemetry, can be done by
1817	      defining additional types of Captures in addition to audio and
1818	      video.

1820	   o  Adding new functionalities, such as 3-D, say, may require
1821	      additional attributes describing the Captures.

1823	   o  Adding new codecs, such as H.265, can be accomplished by
1824	      defining new encoding variables.

1826	   The infrastructure is designed to be extended rather than
1827	   requiring new infrastructure elements.  Extension comes through
1828	   adding to defined types.

1830	12. Examples - Using the Framework (Informative)

1832	   This section gives some examples, first from the point of view of
1833	   the Provider, then the Consumer, then some multipoint scenarios.

1835	12.1. Provider Behavior

1837	   This section shows some examples in more detail of how a Provider
1838	   can use the framework to represent a typical case for telepresence
1839	   rooms.  First an endpoint is illustrated, then an MCU case is
1840	   shown.

1842	12.1.1. Three screen Endpoint Provider

1844	   Consider an Endpoint with the following description:

1846	   3 cameras, 3 displays, a 6 person table

1848	   o  Each camera can provide one Capture for each 1/3 section of the
1849	      table

1851	   o  A single Capture representing the active speaker can be provided
1852	      (voice activity based camera selection to a given encoder input
1853	      port implemented locally in the Endpoint)

1855	   o  A single Capture representing the active speaker with the other
1856	      2 Captures shown picture in picture within the stream can be
1857	      provided (again, implemented inside the endpoint)

1859	   o  A Capture showing a zoomed out view of all 6 seats in the room
1860	      can be provided

1862	   The audio and video Captures for this Endpoint can be described as
1863	   follows.

1865	   Video Captures:

1867	   o  VC0- (the camera-left camera stream), encoding group=EG0,
1868	      view=table

1870	   o  VC1- (the center camera stream), encoding group=EG1, view=table

1872	   o  VC2- (the camera-right camera stream), encoding group=EG2,
1873	      view=table

1875	   o  MCC3- (the loudest panel stream), encoding group=EG1,
1876	      view=table, MaxCaptures=1

1878	   o  MCC4- (the loudest panel stream with PiPs), encoding group=EG1,
1879	      view=room, MaxCaptures=3

1881	   o  VC5- (the zoomed out view of all people in the room), encoding
1882	      group=EG1, view=room

1884	   o  VC6- (presentation stream), encoding group=EG1, presentation

1885	   The following diagram is a top view of the room with 3 cameras, 3
1886	   displays, and 6 seats.  Each camera is capturing 2 people.  The
1887	   six seats are not all in a straight line.

1889	      ,-. d
1890	     (   )`--.__        +---+
1891	      `-' /     `--.__  |   |
1892	    ,-.  |            `-.._ |_-+Camera 2 (VC2)
1893	   (   ).'        ___..-+-''`+-+
1894	    `-' |_...---''      |   |
1895	    ,-.c+-..__          +---+
1896	   (   )|     ``--..__  |   |
1897	    `-' |             ``+-..|_-+Camera 1 (VC1)
1898	    ,-. |            __..--'|+-+
1899	   (   )|     __..--'   |   |
1900	    `-'b|..--'          +---+
1901	    ,-. |``---..___     |   |
1902	   (   )\          ```--..._|_-+Camera 0 (VC0)
1903	    `-'  \             _..-''`-+
1904	     ,-. \      __.--'' |   |
1905	    (   ) |..-''        +---+
1906	     `-' a
1907	                    Figure 4: Room Layout

1909	   The two points labeled b and c are intended to be at the midpoint
1910	   between the seating positions, and where the fields of view of the
1911	   cameras intersect.

1913	   The plane of interest for VC0 is a vertical plane that intersects
1914	   points 'a' and 'b'.

1916	   The plane of interest for VC1 intersects points 'b' and 'c'. The
1917	   plane of interest for VC2 intersects points 'c' and 'd'.

1919	   This example uses an area scale of millimeters.

1921	   Areas of capture:

1923	       bottom left    bottom right  top left         top right
1924	   VC0 (-2011,2850,0) (-673,3000,0) (-2011,2850,757) (-673,3000,757)
1925	   VC1 ( -673,3000,0) ( 673,3000,0) ( -673,3000,757) ( 673,3000,757)
1926	   VC2 (  673,3000,0) (2011,2850,0) (  673,3000,757) (2011,2850,757)
1927	   MCC3(-2011,2850,0) (2011,2850,0) (-2011,2850,757) (2011,2850,757)
1928	   MCC4(-2011,2850,0) (2011,2850,0) (-2011,2850,757) (2011,2850,757)
1929	   VC5 (-2011,2850,0) (2011,2850,0) (-2011,2850,757) (2011,2850,757)
1930	   VC6 none

1932	   Points of capture:
1933	   VC0 (-1678,0,800)
1934	   VC1 (0,0,800)
1935	   VC2 (1678,0,800)
1936	   MCC3 none
1937	   MCC4 none
1938	   VC5 (0,0,800)
1939	   VC6 none

1941	   In this example, the right edge of the VC0 area lines up with the
1942	   left edge of the VC1 area.  It doesn't have to be this way.  There
1943	   could be a gap or an overlap.  One additional thing to note for
1944	   this example is the distance from a to b is equal to the distance
1945	   from b to c and the distance from c to d.  All these distances are
1946	   1346 mm. This is the planar width of each area of capture for VC0,
1947	   VC1, and VC2.
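
   The stated width can be checked from the bottom corners of the
   areas of capture.  The sketch below uses only the x and y
   coordinates of points a, b, c and d; the variable and function
   names are illustrative.  The three distances agree once rounded to
   the nearest millimeter.

```python
import math

# (x, y) of the points labeled a, b, c, d, taken from the bottom
# corners of the areas of capture for VC0, VC1 and VC2 (millimeters).
a = (-2011, 2850)
b = (-673, 3000)
c = (673, 3000)
d = (2011, 2850)


def planar_width(p, q):
    """Horizontal distance between two points on the floor plane."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


widths = [round(planar_width(p, q)) for p, q in ((a, b), (b, c), (c, d))]
print(widths)
```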

1949	   Note that the text in parentheses (e.g. "the camera-left camera
1950	   stream") is explanatory text for this example only; it is not
1951	   part of the model and is not included with the media captures and
1952	   attributes.  Also, MCC4 doesn't say anything about how a capture
1953	   is composed, so the media consumer can't tell based on this
1954	   capture that MCC4 is composed of a "loudest panel with PiPs".

1957	   Audio Captures:

1959	   o  AC0 (camera-left), encoding group=EG3, channel format=mono

1961	   o  AC1 (camera-right), encoding group=EG3, channel format=mono

1963	   o  AC2 (center) encoding group=EG3, channel format=mono

1965	   o  AC3 being a simple pre-mixed audio stream from the room (mono),
1966	      encoding group=EG3, channel format=mono

1968	   o  AC4 audio stream associated with the presentation video (mono)
1969	      encoding group=EG3, presentation, channel format=mono

1971	   Areas of capture:

1973	       bottom left    bottom right  top left         top right

1975	   AC0 (-2011,2850,0) (-673,3000,0) (-2011,2850,757) (-673,3000,757)
1976	   AC1 (  673,3000,0) (2011,2850,0) (  673,3000,757) (2011,2850,757)
1977	   AC2 ( -673,3000,0) ( 673,3000,0) ( -673,3000,757) ( 673,3000,757)
1978	   AC3 (-2011,2850,0) (2011,2850,0) (-2011,2850,757) (2011,2850,757)
1979	   AC4 none

1981	   The physical simultaneity information is:

1983	      Simultaneous transmission set #1 {VC0, VC1, VC2, MCC3, MCC4,
1984	      VC6}

1986	      Simultaneous transmission set #2 {VC0, VC2, VC5, VC6}

1988	   This constraint indicates it is not possible to use all the VCs at
1989	   the same time.  VC5 cannot be used at the same time as VC1 or MCC3
1990	   or MCC4.  Also, using every member in the set simultaneously may
1991	   not make sense - for example MCC3 (loudest) and MCC4 (loudest with
1992	   PiP).  (In addition, there are encoding constraints that make
1993	   choosing all of the VCs in a set impossible.  VC1, MCC3, MCC4,
1994	   VC5, VC6 all use EG1 and EG1 has only 3 ENCs.  This constraint
1995	   shows up in the encoding groups, not in the simultaneous
1996	   transmission sets.)

1997	   In this example there are no restrictions on which audio captures
1998	   can be sent simultaneously.
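
   One way a Consumer could test a candidate selection against the
   two video simultaneous transmission sets above is sketched below.
   The function name is illustrative.  The check covers physical
   simultaneity only; encoding group limits are a separate test.

```python
# The two simultaneous transmission sets advertised in this example.
SIMULTANEOUS_SETS = [
    {"VC0", "VC1", "VC2", "MCC3", "MCC4", "VC6"},
    {"VC0", "VC2", "VC5", "VC6"},
]


def can_send_together(captures, sets=SIMULTANEOUS_SETS):
    """True if all the captures appear together in at least one set."""
    wanted = set(captures)
    return any(wanted <= s for s in sets)
```

   For instance, {VC0, VC1, VC2} is allowed by set #1, whereas any
   selection containing both VC1 and VC5 is allowed by neither set.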

2000	   Encoding Groups:

2002	   This example has three encoding groups associated with the video
2003	   captures.  Each group can have 3 encodings, but with each
2004	   potential encoding having a progressively lower specification.  In
2005	   this example, 1080p60 transmission is possible (as ENC0 has a
2006	   maxPps value compatible with that).  Significantly, as up to 3
2007	   encodings are available per group, it is possible to transmit some
2008	   video captures simultaneously that are not in the same entry in
2009	   the capture scene.  For example VC1 and MCC3 at the same time.

2011	   It is also possible to transmit multiple capture encodings of a
2012	   single video capture.  For example VC0 can be encoded using ENC0
2013	   and ENC1 at the same time, as long as the encoding parameters
2014	   satisfy the constraints of ENC0, ENC1, and EG0, such as one at
2015	   4000000 bps and one at 2000000 bps.

2017	   encodeGroupID=EG0, maxGroupBandwidth=6000000
2018	       encodeID=ENC0, maxWidth=1920, maxHeight=1088, maxFrameRate=60,
2019	                      maxPps=124416000, maxBandwidth=4000000
2020	       encodeID=ENC1, maxWidth=1280, maxHeight=720, maxFrameRate=30,
2021	                      maxPps=27648000, maxBandwidth=4000000
2022	       encodeID=ENC2, maxWidth=960, maxHeight=544, maxFrameRate=30,
2023	                      maxPps=15552000, maxBandwidth=4000000
2024	   encodeGroupID=EG1, maxGroupBandwidth=6000000
2025	       encodeID=ENC3, maxWidth=1920, maxHeight=1088, maxFrameRate=60,
2026	                      maxPps=124416000, maxBandwidth=4000000
2027	       encodeID=ENC4, maxWidth=1280, maxHeight=720, maxFrameRate=30,
2028	                      maxPps=27648000, maxBandwidth=4000000
2029	       encodeID=ENC5, maxWidth=960, maxHeight=544, maxFrameRate=30,
2030	                      maxPps=15552000, maxBandwidth=4000000
2031	   encodeGroupID=EG2, maxGroupBandwidth=6000000
2032	       encodeID=ENC6, maxWidth=1920, maxHeight=1088, maxFrameRate=60,
2033	                      maxPps=124416000, maxBandwidth=4000000
2034	       encodeID=ENC7, maxWidth=1280, maxHeight=720, maxFrameRate=30,
2035	                      maxPps=27648000, maxBandwidth=4000000
2036	       encodeID=ENC8, maxWidth=960, maxHeight=544, maxFrameRate=30,
2037	                      maxPps=15552000, maxBandwidth=4000000

2039	                Figure 5: Example Encoding Groups for Video
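
   The maxPps values in Figure 5 coincide with width x height x frame
   rate for pictures of 1920x1080 at 60 fps, 1280x720 at 30 fps and
   960x540 at 30 fps.  The sketch below reproduces that arithmetic.
   Reading the maxHeight values 1088 and 544 as coded heights rounded
   up to a multiple of 16 is an assumption made here, not a statement
   of this document.

```python
def pixel_rate(width, height, frame_rate):
    """Pixels per second for a given picture size and frame rate."""
    return width * height * frame_rate


# (width, height, frame rate) -> maxPps value listed in Figure 5
FIGURE_5 = {
    (1920, 1080, 60): 124416000,
    (1280, 720, 30): 27648000,
    (960, 540, 30): 15552000,
}

matches = all(pixel_rate(*size) == pps for size, pps in FIGURE_5.items())

# Requesting the full maxWidth x maxHeight at maxFrameRate on ENC0
# would exceed its maxPps, so the limits have to be checked jointly.
full_size_rate = pixel_rate(1920, 1088, 60)
```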

2041	   For audio, there are five potential encodings available, so all
2042	   five audio captures can be encoded at the same time.

2044	   encodeGroupID=EG3, maxGroupBandwidth=320000
2045	       encodeID=ENC9, maxBandwidth=64000
2046	       encodeID=ENC10, maxBandwidth=64000
2047	       encodeID=ENC11, maxBandwidth=64000
2048	       encodeID=ENC12, maxBandwidth=64000
2049	       encodeID=ENC13, maxBandwidth=64000

2051	                Figure 6: Example Encoding Group for Audio

2053	   Capture Scenes:

2055	   The following table represents the capture scenes for this
2056	   provider. Recall that a capture scene is composed of alternative
2057	   capture scene entries covering the same spatial region.  Capture
2058	   Scene #1 is for the main people captures, and Capture Scene #2 is
2059	   for presentation.

2061	   Each row in the table is a separate Capture Scene Entry.

2063	                           +------------------+
2064	                           | Capture Scene #1 |
2065	                           +------------------+
2066	                           | VC0, VC1, VC2    |
2067	                           | MCC3             |
2068	                           | MCC4             |
2069	                           | VC5              |
2070	                           | AC0, AC1, AC2    |
2071	                           | AC3              |
2072	                           +------------------+

2074	                           +------------------+
2075	                           | Capture Scene #2 |
2076	                           +------------------+
2077	                           | VC6              |
2078	                           | AC4              |
2079	                           +------------------+

2081	                Table 7: Example Capture Scene Entries

2083	   Different capture scenes are distinct from each other and
2084	   non-overlapping.  A consumer can choose an entry from each capture
2085	   scene.  In this case the three captures VC0, VC1, and VC2 are one
2086	   way of representing the video from the endpoint.  These three
2087	   captures should appear adjacent to each other.
2088	   Alternatively, another way of representing the Capture Scene is
2089	   with the capture MCC3, which automatically shows the person who is
2090	   talking.  Similarly for the MCC4 and VC5 alternatives.

2092	   As in the video case, the different entries of audio in Capture
2093	   Scene #1 represent the "same thing", in that one way to receive
2094	   the audio is with the 3 audio captures (AC0, AC1, AC2), and
2095	   another way is with the mixed AC3.  The Media Consumer can choose
2096	   an audio capture entry it is capable of receiving.

2098	   The spatial ordering is understood by the media capture attributes
2099	   Area of Capture and Point of Capture.

2101	   A Media Consumer would likely want to choose a capture scene entry
2102	   to receive based in part on how many streams it can simultaneously
2103	   receive.  A consumer that can receive three people streams would
2104	   probably prefer to receive the first entry of Capture Scene #1
2105	   (VC0, VC1, VC2) and not receive the other entries.  A consumer
2106	   that can receive only one people stream would probably choose one
2107	   of the other entries.

2109	   If the consumer can receive a presentation stream too, it would
2110	   also choose to receive the video entry from Capture Scene #2 (VC6).

2112	12.1.2. Encoding Group Example

2114	   This is an example of an encoding group to illustrate how it can
2115	   express dependencies between encodings.

2117	   encodeGroupID=EG0 maxGroupBandwidth=6000000
2118	       encodeID=VIDENC0, maxWidth=1920, maxHeight=1088,
2119	         maxFrameRate=60, maxPps=62208000, maxBandwidth=4000000
2120	       encodeID=VIDENC1, maxWidth=1920, maxHeight=1088,
2121	         maxFrameRate=60, maxPps=62208000, maxBandwidth=4000000
2122	       encodeID=AUDENC0, maxBandwidth=96000
2123	       encodeID=AUDENC1, maxBandwidth=96000
2124	       encodeID=AUDENC2, maxBandwidth=96000

2126	   Here, the encoding group is EG0.  Although the encoding group is
2127	   capable of transmitting up to 6Mbit/s, no individual video
2128	   encoding can exceed 4Mbit/s.

2130	   This encoding group also allows up to 3 audio encodings,
2131	   AUDENC<0-2>.  It is not required that audio and video encodings
2132	   reside within the same encoding group, but if so then the group's
2133	   maxGroupBandwidth value is a limit on the sum of all audio and
2134	   video encodings configured by the consumer.  A system that does
2135	   not wish or need to combine bandwidth limitations in this way
2136	   should instead use separate encoding groups for audio and video so
2137	   that the bandwidth limitations on audio and video do not interact.

2139	   Audio and video can be expressed in separate encoding groups, as
2140	   in this illustration.

2142	   encodeGroupID=EG0 maxGroupBandwidth=6000000
2143	       encodeID=VIDENC0, maxWidth=1920, maxHeight=1088,
2144	         maxFrameRate=60, maxPps=62208000, maxBandwidth=4000000
2145	       encodeID=VIDENC1, maxWidth=1920, maxHeight=1088,
2146	         maxFrameRate=60, maxPps=62208000, maxBandwidth=4000000
2147	   encodeGroupID=EG1 maxGroupBandwidth=500000
2148	       encodeID=AUDENC0, maxBandwidth=96000
2149	       encodeID=AUDENC1, maxBandwidth=96000
2150	       encodeID=AUDENC2, maxBandwidth=96000

2152	12.1.3. The MCU Case

2154	   This section shows how an MCU might express its Capture Scenes,
2155	   intending to offer different choices for consumers that can handle
2156	   different numbers of streams.  A single audio capture stream is
2157	   provided for all single and multi-screen configurations that can
2158	   be associated (e.g. lip-synced) with any combination of video
2159	   captures at the consumer.

2161	        +-----------------------+---------------------------------+
2162	        | Capture Scene #1      |                                 |
2163	        +-----------------------|---------------------------------+
2164	        | VC0                   | VC for a single screen consumer |
2165	        | VC1, VC2              | VCs for a two screen consumer   |
2166	        | VC3, VC4, VC5         | VCs for a three screen consumer |
2167	        | VC6, VC7, VC8, VC9    | VCs for a four screen consumer  |
2168	        | AC0                   | AC representing all participants|
2169	        | CSE(VC0)              |                                 |
2170	        | CSE(VC1,VC2)          |                                 |
2171	        | CSE(VC3,VC4,VC5)      |                                 |
2172	        | CSE(VC6,VC7,VC8,VC9)  |                                 |
2173	        | CSE(AC0)              |                                 |
2174	        +-----------------------+---------------------------------+
2175	                Table 8: MCU main Capture Scenes

2177	   If / when a presentation stream becomes active within the
2178	   conference the MCU might re-advertise the available media as:

2180	        +------------------+--------------------------------------+
2181	        | Capture Scene #2 | note                                 |
2182	        +------------------+--------------------------------------+
2183	        | VC10             | video capture for presentation       |
2184	        | AC1              | presentation audio to accompany VC10 |
2185	        | CSE(VC10)        |                                      |
2186	        | CSE(AC1)         |                                      |
2187	        +------------------+--------------------------------------+

2189	                Table 9: MCU presentation Capture Scene

2191	12.2. Media Consumer Behavior

2193	   This section gives an example of how a Media Consumer might behave
2194	   when deciding how to request streams from the three screen
2195	   endpoint described in the previous section.

2197	   The receive side of a call needs to balance its requirements,
2198	   based on number of screens and speakers, its decoding capabilities
2199	   and available bandwidth, and the provider's capabilities in order
2200	   to optimally configure the provider's streams.  Typically it would
2201	   want to receive and decode media from each Capture Scene
2202	   advertised by the Provider.

2204	   A sane, basic algorithm might be for the consumer to go through
2205	   each Capture Scene in turn and find the collection of Video
2206	   Captures that best matches the number of screens it has (this
2207	   might include consideration of screens dedicated to presentation
2208	   video display rather than "people" video) and then decide between
2209	   alternative entries in the video Capture Scenes based either on
2210	   hard-coded preferences or user choice.  Once this choice has been
2211	   made, the consumer would then decide how to configure the
2212	   provider's encoding groups in order to make best use of the
2213	   available network bandwidth and its own decoding capabilities.
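
   The first step of such an algorithm might be sketched as follows.
   The function name is illustrative.  Choosing among the returned
   candidates is left to hard-coded preference or user choice, as
   described above.

```python
def candidate_video_entries(entries, screens):
    """Return the video entries that best match the screen count.

    entries -- the video Capture Scene Entries of one Capture Scene,
               each given as a list of capture identities
    screens -- number of screens available for this scene
    """
    fitting = [e for e in entries if len(e) <= screens]
    if not fitting:
        return []
    # Prefer the entries that use as many screens as possible.
    best = max(len(e) for e in fitting)
    return [e for e in fitting if len(e) == best]


# Video entries of Capture Scene #1 from the three screen endpoint.
SCENE_1_VIDEO = [["VC0", "VC1", "VC2"], ["MCC3"], ["MCC4"], ["VC5"]]
```

   For a three screen consumer this yields only (VC0, VC1, VC2); for
   a one screen consumer it yields MCC3, MCC4 and VC5 as alternatives.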

2215	12.2.1. One screen Media Consumer

2217	   MCC3, MCC4 and VC5 are all different entries by themselves, not
2218	   grouped together in a single entry, so the receiving device should
2219	   choose one of those.  The choice would come down to
2220	   whether to see the greatest number of participants simultaneously
2221	   at roughly equal precedence (VC5), a switched view of just the
2222	   loudest region (MCC3) or a switched view with PiPs (MCC4).  An
2223	   endpoint device with a small amount of knowledge of these
2224	   differences could offer a dynamic choice of these options, in-
2225	   call, to the user.

2227	12.2.2. Two screen Media Consumer configuring the example

2229	   Mixing systems with an even number of screens, "2n", and those
2230	   with "2n+1" cameras (and vice versa) is always likely to be the
2231	   problematic case.  In this instance, the behavior is likely to be
2232	   determined by whether a "2 screen" system is really a "2 decoder"
2233	   system, i.e., whether only one received stream can be displayed
2234	   per screen or whether more than 2 streams can be received and
2235	   spread across the available screen area.  There are 3 possible
2236	   behaviors here for the 2 screen system when it learns that the far
2237	   end is "ideally" expressed via 3 capture streams:

2239	   1. Fall back to receiving just a single stream (MCC3, MCC4 or VC5
2240	      as per the 1 screen consumer case above) and either leave one
2241	      screen blank or use it for presentation if / when a
2242	      presentation becomes active.

2244	   2. Receive 3 streams (VC0, VC1 and VC2) and display across 2
2245	      screens, either with each capture being scaled to 2/3 of a
2246	      screen and the center capture being split across 2 screens or,
2247	      as would be necessary if there were large bezels on the
2248	      screens, with each stream being scaled to 1/2 the screen width
2249	      and height and there being a 4th "blank" panel.  This 4th panel
2250	      could potentially be used for any presentation that became
2251	      active during the call.

2253	   3. Receive 3 streams, decode all 3, and use control information
2254	      indicating which was the most active to switch between showing
2255	      the left and center streams (one per screen) and the center and
2256	      right streams.

2258	   For an endpoint capable of all 3 methods of working described
2259	   above, again it might be appropriate to offer the user the choice
2260	   of display mode.

2262	12.2.3. Three screen Media Consumer configuring the example

2264	   This is the most straightforward case - the Media Consumer would
2265	   look to identify a set of streams to receive that best matched its
2266	   available screens and so the VC0 plus VC1 plus VC2 should match
2267	   optimally.  The spatial ordering would give sufficient information
2268	   for the correct video capture to be shown on the correct screen,
2269	   and the consumer would either need to divide a single encoding
2270	   group's capability by 3 to determine what resolution and frame
2271	   rate to configure the provider with or to configure the individual
2272	   video captures' encoding groups with what makes most sense (taking
2273	   into account the receive side decode capabilities, overall call
2274	   bandwidth, the resolution of the screens plus any user preferences
2275	   such as motion vs sharpness).

2277	12.3. Multipoint Conference utilizing Multiple Content Captures

2279	   The use of MCCs allows the MCU to construct outgoing Advertisements
2280	   describing complex media switching and composition scenarios.
2281	   The following sections provide several examples.

2283	   Note: In the examples the identities of the CLUE elements (e.g.
2284	   Captures, Capture Scene) in the incoming Advertisements overlap.
2285	   This is because there is no co-ordination between the endpoints.
2286	   The MCU is responsible for making these unique in the outgoing
2287	   advertisement.
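
   One simple way for an MCU to make overlapping identities unique is
   to renumber the incoming Captures in order of arrival.  The sketch
   below is illustrative only: the function name and the sequential
   numbering scheme are assumptions, and an MCU is free to use any
   scheme that yields unique identities.

```python
def renumber_captures(advertisements):
    """Map (endpoint, incoming capture id) to a unique outgoing id.

    advertisements -- list of (endpoint name, [capture ids]) pairs,
                      in the order the MCU chooses to present them
    """
    mapping = {}
    counter = 0
    for endpoint, captures in advertisements:
        for capture in captures:
            counter += 1
            mapping[(endpoint, capture)] = f"VC{counter}"
    return mapping


# Incoming identities as used in Section 12.3.1: every endpoint
# starts its own numbering at VC1.
incoming = [("A", ["VC1"]), ("B", ["VC1", "VC2"]), ("C", ["VC1"])]
outgoing = renumber_captures(incoming)
```

   With the three incoming Advertisements of Section 12.3.1 this
   yields VC1 for Endpoint A, VC2 and VC3 for Endpoint B, and VC4 for
   Endpoint C.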

2289	12.3.1. Single Media Captures and MCC in the same Advertisement

2291	   Four endpoints are involved in a Conference where CLUE is used. An
2292	   MCU acts as a middlebox between the endpoints with a CLUE channel
2293	   between each endpoint and the MCU. The MCU receives the following
2294	   Advertisements.

2296	        +-----------------------+---------------------------------+
2297	        | Capture Scene #1      | Description=AustralianConfRoom  |
2298	        +-----------------------|---------------------------------+
2299	        | VC1                   | Description=Audience            |
2300	        |                       | EncodeGroupID=1                 |
2301	        | CSE(VC1)              |                                 |
2302	        +---------------------------------------------------------+

2304	            Table 10: Advertisement received from Endpoint A

2306	        +-----------------------+---------------------------------+
2307	        | Capture Scene #1      | Description=ChinaConfRoom       |
2308	        +-----------------------|---------------------------------+
2309	        | VC1                   | Description=Speaker             |
2310	        |                       | EncodeGroupID=1                 |
2311	        | VC2                   | Description=Audience            |
2312	        |                       | EncodeGroupID=1                 |
2313	        | CSE(VC1, VC2)         |                                 |
2314	        +---------------------------------------------------------+

2316	            Table 11: Advertisement received from Endpoint B

2318	        +-----------------------+---------------------------------+
2319	        | Capture Scene #1      | Description=USAConfRoom         |
2320	        +-----------------------|---------------------------------+
2321	        | VC1                   | Description=Audience            |
2322	        |                       | EncodeGroupID=1                 |
2323	        | CSE(VC1)              |                                 |
2324	        +---------------------------------------------------------+

2326	            Table 12: Advertisement received from Endpoint C

2328	   Note: Endpoint B above indicates that it sends two streams.

2330	   If the MCU wanted to provide a Multiple Content Capture containing
2331	   a round robin switched view of the audience from the 3 endpoints
2332	   and the speaker, it could construct the following advertisement:

2334	   Advertisement sent to Endpoint F

2336	        +=======================+=================================+
2337	        | Capture Scene #1      | Description=AustralianConfRoom  |
2338	        +-----------------------|---------------------------------+
2339	        | VC1                   | Description=Audience            |
2340	        | CSE(VC1)              |                                 |
2341	        +=======================+=================================+
2342	        | Capture Scene #2      | Description=ChinaConfRoom       |
2343	        +-----------------------|---------------------------------+
2344	        | VC2                   | Description=Speaker             |
2345	        | VC3                   | Description=Audience            |
2346	        | CSE(VC2, VC3)         |                                 |
2347	        +=======================+=================================+
2348	        | Capture Scene #3      | Description=USAConfRoom         |
2349	        +-----------------------|---------------------------------+
2350	        | VC4                   | Description=Audience            |
2351	        | CSE(VC4)              |                                 |
2352	        +=======================+=================================+
2353	        | Capture Scene #4      |                                 |
2354	        +-----------------------|---------------------------------+
2355	        | MCC1(VC1,VC2,VC3,VC4) | Policy=RoundRobin:1             |
2356	        |                       | MaxCaptures=1                   |
2357	        |                       | EncodingGroup=1                 |
2358	        | CSE(MCC1)             |                                 |
2359	        +=======================+=================================+

2361	         Table 13: Advertisement sent to Endpoint F - One Encoding
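
   As an illustration of the switching behaviour that Table 13
   implies, the following Python sketch cycles through the constituent
   Captures of MCC1 one at a time.  It is a minimal sketch, not part
   of CLUE: the function name round_robin is hypothetical, and
   treating Policy=RoundRobin:1 as a plain cycle over the listed
   Captures in order is an assumption made for illustration.

```python
from itertools import cycle, islice

# Constituent Captures of MCC1 as listed in Table 13.
MCC1_SOURCES = ["VC1", "VC2", "VC3", "VC4"]


def round_robin(sources, switches):
    """Return the Capture carried by the MCC for each of `switches` turns.

    With MaxCaptures=1 exactly one constituent is present at a time.
    (Hypothetical helper; a plain in-order cycle is assumed.)
    """
    return list(islice(cycle(sources), switches))


print(round_robin(MCC1_SOURCES, 6))
# ['VC1', 'VC2', 'VC3', 'VC4', 'VC1', 'VC2']
```

   Because the MCC has a single Encoding Group, the Consumer receives
   one Capture Encoding whose content changes over time.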

2363	   Alternatively, if the MCU wanted to provide the speaker as one media
2364	   stream and the audiences as another, it could assign an encoding
2365	   group to VC2 in Capture Scene #2 and provide a CSE in Capture Scene
2366	   #4 as per the example below.

2368	   Advertisement sent to Endpoint F

2370	        +=======================+=================================+
2371	        | Capture Scene #1      | Description=AustralianConfRoom  |
2372	        +-----------------------|---------------------------------+
2373	        | VC1                   | Description=Audience            |
2374	        | CSE(VC1)              |                                 |
2375	        +=======================+=================================+
2376	        | Capture Scene #2      | Description=ChinaConfRoom       |
2377	        +-----------------------|---------------------------------+
2378	        | VC2                   | Description=Speaker             |
2379	        |                       | EncodingGroup=1                 |
2380	        | VC3                   | Description=Audience            |
2381	        | CSE(VC2, VC3)         |                                 |
2382	        +=======================+=================================+
2383	        | Capture Scene #3      | Description=USAConfRoom         |
2384	        +-----------------------|---------------------------------+
2385	        | VC4                   | Description=Audience            |
2386	        | CSE(VC4)              |                                 |
2387	        +=======================+=================================+
2388	        | Capture Scene #4      |                                 |
2389	        +-----------------------|---------------------------------+
2390	        | MCC1(VC1,VC3,VC4)     | Policy=RoundRobin:1             |
2391	        |                       | MaxCaptures=1                   |
2392	        |                       | EncodingGroup=1                 |
2393	        | MCC2(VC2)             | MaxCaptures=1                   |
2394	        |                       | EncodingGroup=1                 |
2395	        | CSE2(MCC1,MCC2)       |                                 |
2396	        +=======================+=================================+

2398	        Table 14: Advertisement sent to Endpoint F - Two Encodings

2400	   Therefore a Consumer could choose whether or not to have a separate
2401	   speaker related stream and could choose which endpoints to see.  If
2402	   it wanted the second stream but not the Australian conference room
2403	   it could indicate the following captures in the Configure message:

2405	        +-----------------------+---------------------------------+
2406	        | MCC1(VC3,VC4)         | Encoding                        |
2407	        | VC2                   | Encoding                        |
2408	        +-----------------------|---------------------------------+
2409	                      Table 15: MCU case: Consumer Response
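
   The selection in Table 15 can be checked against the Advertisement
   in Table 14 with a sketch such as the one below.  The dictionary
   layout and the function validate_configure are hypothetical and do
   not reflect the CLUE message syntax; the only rule assumed is that
   a Consumer may ask for a subset of the constituents that the
   Provider advertised for an MCC.

```python
# Advertised MCCs from Table 14 (identifier -> constituent Captures).
ADVERTISED_MCCS = {
    "MCC1": {"VC1", "VC3", "VC4"},
    "MCC2": {"VC2"},
}
# Individual Captures advertised in Capture Scenes #1 to #3.
ADVERTISED_CAPTURES = {"VC1", "VC2", "VC3", "VC4"}


def validate_configure(request):
    """Check a Configure-like request against the Advertisement.

    `request` maps a Capture identifier to the list of constituents
    wanted (an empty list for a plain Capture).  Hypothetical format.
    """
    for capture_id, wanted in request.items():
        if capture_id in ADVERTISED_MCCS:
            # Requested constituents must all have been advertised.
            if not set(wanted) <= ADVERTISED_MCCS[capture_id]:
                return False
        elif capture_id not in ADVERTISED_CAPTURES or wanted:
            # Unknown Capture, or constituents given for a plain Capture.
            return False
    return True


table_15 = {"MCC1": ["VC3", "VC4"], "VC2": []}
print(validate_configure(table_15))            # True
print(validate_configure({"MCC1": ["VC2"]}))   # False
```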

2411	12.3.2. Several MCCs in the same Advertisement

2413	   Multiple MCCs can be used where multiple streams are used to carry
2414	   media from multiple endpoints.  For example:

2416	   A conference has three endpoints D, E and F. Each endpoint has
2417	   three video captures covering the left, middle and right regions of
2418	   each conference room.  The MCU receives the following
2419	   advertisements from D and E.

2421	        +-----------------------+---------------------------------+
2422	        | Capture Scene #1      | Description=AustralianConfRoom  |
2423	        +-----------------------|---------------------------------+
2424	        | VC1                   | CaptureArea=Left                |
2425	        |                       | EncodingGroup=1                 |
2426	        | VC2                   | CaptureArea=Centre              |
2427	        |                       | EncodingGroup=1                 |
2428	        | VC3                   | CaptureArea=Right               |
2429	        |                       | EncodingGroup=1                 |
2430	        | CSE(VC1,VC2,VC3)      |                                 |
2431	        +---------------------------------------------------------+

2433	            Table 16: Advertisement received from Endpoint D

2435	        +-----------------------+---------------------------------+
2436	        | Capture Scene #1      | Description=ChinaConfRoom       |
2437	        +-----------------------|---------------------------------+
2438	        | VC1                   | CaptureArea=Left                |
2439	        |                       | EncodingGroup=1                 |
2440	        | VC2                   | CaptureArea=Centre              |
2441	        |                       | EncodingGroup=1                 |
2442	        | VC3                   | CaptureArea=Right               |
2443	        |                       | EncodingGroup=1                 |
2444	        | CSE(VC1,VC2,VC3)      |                                 |
2445	        +---------------------------------------------------------+

2447	            Table 17: Advertisement received from Endpoint E

2449	   The MCU wants to offer Endpoint F three Capture Encodings.  Each
2450	   Capture Encoding would contain all the Captures from either
2451	   Endpoint D or Endpoint E depending on the active speaker.
2452	   The MCU sends the following Advertisement:

2454	        +=======================+=================================+
2455	        | Capture Scene #1      | Description=AustralianConfRoom  |
2456	        +-----------------------|---------------------------------+
2457	        | VC1                   |                                 |
2458	        | VC2                   |                                 |
2459	        | VC3                   |                                 |
2460	        | CSE(VC1,VC2,VC3)      |                                 |
2461	        +=======================+=================================+
2462	        | Capture Scene #2      | Description=ChinaConfRoom       |
2463	        +-----------------------|---------------------------------+
2464	        | VC4                   |                                 |
2465	        | VC5                   |                                 |
2466	        | VC6                   |                                 |
2467	        | CSE(VC4,VC5,VC6)      |                                 |
2468	        +=======================+=================================+
2469	        | Capture Scene #3      |                                 |
2470	        +-----------------------|---------------------------------+
2471	        | MCC1(VC1,VC4)         | CaptureArea=Left                |
2472	        |                       | MaxCaptures=1                   |
2473	        |                       | SynchronisationID=1             |
2474	        |                       | EncodingGroup=1                 |
2475	        | MCC2(VC2,VC5)         | CaptureArea=Centre              |
2476	        |                       | MaxCaptures=1                   |
2477	        |                       | SynchronisationID=1             |
2478	        |                       | EncodingGroup=1                 |
2479	        | MCC3(VC3,VC6)         | CaptureArea=Right               |
2480	        |                       | MaxCaptures=1                   |
2481	        |                       | SynchronisationID=1             |
2482	        |                       | EncodingGroup=1                 |
2483	        | CSE(MCC1,MCC2,MCC3)   |                                 |
2484	        +=======================+=================================+
2485	            Table 18: Advertisement sent to Endpoint F
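
   The effect of the shared SynchronisationID in the MCU's
   Advertisement above can be sketched as follows.  This is an
   illustrative sketch only: the names SCENE_OF, MCCS and
   select_sources are hypothetical, and it assumes that MCCs sharing
   a SynchronisationID always take their content from the same
   Capture Scene at any given time.

```python
# Capture Scene that each source Capture belongs to in the MCU's
# Advertisement (1 = AustralianConfRoom, 2 = ChinaConfRoom).
SCENE_OF = {"VC1": 1, "VC2": 1, "VC3": 1, "VC4": 2, "VC5": 2, "VC6": 2}

# MCCs in Capture Scene #3, all with SynchronisationID=1.
MCCS = {
    "MCC1": ["VC1", "VC4"],  # CaptureArea=Left
    "MCC2": ["VC2", "VC5"],  # CaptureArea=Centre
    "MCC3": ["VC3", "VC6"],  # CaptureArea=Right
}


def select_sources(active_scene):
    """Pick, for every MCC, the constituent from the active scene."""
    return {
        mcc: next(vc for vc in sources if SCENE_OF[vc] == active_scene)
        for mcc, sources in MCCS.items()
    }


print(select_sources(2))
# {'MCC1': 'VC4', 'MCC2': 'VC5', 'MCC3': 'VC6'}
```

   When the active speaker moves from one room to the other, all
   three MCCs change source together, so Endpoint F never shows a mix
   of the two rooms.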

2487	12.3.3. Heterogeneous conference with switching and composition

2489	   Consider a conference between endpoints with the following
2490	   characteristics:

2492	      Endpoint A - 4 screens, 3 cameras

2494	      Endpoint B - 3 screens, 3 cameras

2496	      Endpoint C - 3 screens, 3 cameras

2498	      Endpoint D - 3 screens, 3 cameras

2500	      Endpoint E - 1 screen, 1 camera

2502	      Endpoint F - 2 screens, 1 camera

2504	      Endpoint G - 1 screen, 1 camera

2506	   This example focuses on what the user in one of the 3-camera multi-
2507	   screen endpoints sees.  Call this person User A, at Endpoint A.
2508	   There are 4 large display screens at Endpoint A. Whenever somebody
2509	   at another site is speaking, all the video captures from that
2510	   endpoint are shown on the large screens.  If the talker is at a 3-
2511	   camera site, then the video from those 3 cameras fills 3 of the
2512	   screens.  If the talker is at a single-camera site, then video from
2513	   that camera fills one of the screens, while the other screens show
2514	   video from other single-camera endpoints.

2516	   User A hears audio from the 4 loudest talkers.

2518	   User A can also see video from other endpoints, in addition to the
2519	   current talker, although much smaller in size.  Endpoint A has 4
2520	   screens, so one of those screens shows up to 9 other Media Captures
2521	   in a tiled fashion.  When video from a 3 camera endpoint appears in
2522	   the tiled area, video from all 3 cameras appears together across
2523	   the screen with correct spatial relationship among those 3 images.

2525	      +---+---+---+ +-------------+ +-------------+ +-------------+
2526	      |   |   |   | |             | |             | |             |
2527	      +---+---+---+ |             | |             | |             |
2528	      |   |   |   | |             | |             | |             |
2529	      +---+---+---+ |             | |             | |             |
2530	      |   |   |   | |             | |             | |             |
2531	      +---+---+---+ +-------------+ +-------------+ +-------------+
2532	                     Figure 7: Endpoint A - 4 Screen Display

2534	   User B at Endpoint B sees a similar arrangement, except there are
2535	   only 3 screens, so the 9 other Media Captures are spread out across
2536	   the bottom of the 3 displays, in a picture-in-picture (PIP) format.
2537	   When video from a 3 camera endpoint appears in the PIP area, video
2538	   from all 3 cameras appears together across a single screen with
2539	   correct spatial relationship.

2541	              +-------------+ +-------------+ +-------------+
2542	              |             | |             | |             |
2543	              |             | |             | |             |
2544	              |             | |             | |             |
2545	              | +-+ +-+ +-+ | | +-+ +-+ +-+ | | +-+ +-+ +-+ |
2546	              | +-+ +-+ +-+ | | +-+ +-+ +-+ | | +-+ +-+ +-+ |
2547	              +-------------+ +-------------+ +-------------+
2548	                Figure 8: Endpoint B - 3 Screen Display with PiPs

2550	   When somebody at a different endpoint becomes the current talker,
2551	   then User A and User B both see the video from the new talker
2552	   appear on their large screen area, while the previous talker takes
2553	   one of the smaller tiled or PIP areas.  The person who is the
2554	   current talker doesn't see themselves; they see the previous talker
2555	   in their large screen area.

2557	   One of the points of this example is that endpoints A and B each
2558	   want to receive 3 capture encodings for their large display areas,
2559	   and 9 encodings for their smaller areas.  A and B are able to
2560	   each send the same Configure message to the MCU, and each receive
2561	   the same conceptual Media Captures from the MCU.  The differences
2562	   are in how they are rendered and are purely a local matter at A and
2563	   B.

2565	   The Advertisements for such a scenario are described below.

2567	        +-----------------------+---------------------------------+
2568	        | Capture Scene #1      | Description=Endpoint x          |
2569	        +-----------------------|---------------------------------+
2570	        | VC1                   | EncodingGroup=1                 |
2571	        | VC2                   | EncodingGroup=1                 |
2572	        | VC3                   | EncodingGroup=1                 |
2573	        | AC1                   | EncodingGroup=2                 |
2574	        | CSE1(VC1, VC2, VC3)   |                                 |
2575	        | CSE2(AC1)             |                                 |
2576	        +---------------------------------------------------------+

2578	   Table 19: Advertisement received at the MCU from Endpoints A to D

2580	        +-----------------------+---------------------------------+
2581	        | Capture Scene #1      | Description=Endpoint y          |
2582	        +-----------------------|---------------------------------+
2583	        | VC1                   | EncodingGroup=1                 |
2584	        | AC1                   | EncodingGroup=2                 |
2585	        | CSE1(VC1)             |                                 |
2586	        | CSE2(AC1)             |                                 |
2587	        +---------------------------------------------------------+

2589	   Table 20: Advertisement received at the MCU from Endpoints E to G

2591	   Rather than considering what is displayed, CLUE concentrates
2592	   more on what the MCU sends. The MCU doesn't know anything about
2593	   the number of screens an endpoint has.

2595	   As Endpoints A to D each advertise that three Captures make up a
2596	   Capture Scene, the MCU offers these in a "site" switching mode.
2597	   That is, there are three Multiple Content Captures (and
2598	   Capture Encodings) each switching between Endpoints. The MCU
2599	   switches in the applicable media into the stream based on voice
2600	   activity. Endpoint A will not see a capture from itself.

2602	   Using the MCC concept the MCU would send the following
2603	   Advertisement to endpoint A:

2605	        +=======================+=================================+
2606	        | Capture Scene #1      | Description=Endpoint B          |
2607	        +-----------------------|---------------------------------+
2608	        | VC4                   | Left                            |
2609	        | VC5                   | Center                          |
2610	        | VC6                   | Right                           |
2611	        | AC1                   |                                 |
2612	        | CSE(VC4,VC5,VC6)      |                                 |
2613	        | CSE(AC1)              |                                 |
2614	        +=======================+=================================+
2615	        | Capture Scene #2      | Description=Endpoint C          |
2616	        +-----------------------|---------------------------------+
2617	        | VC7                   | Left                            |
2618	        | VC8                   | Center                          |
2619	        | VC9                   | Right                           |
2620	        | AC2                   |                                 |
2621	        | CSE(VC7,VC8,VC9)      |                                 |
2622	        | CSE(AC2)              |                                 |
2623	        +=======================+=================================+
2624	        | Capture Scene #3      | Description=Endpoint D          |
2625	        +-----------------------|---------------------------------+
2626	        | VC10                  | Left                            |
2627	        | VC11                  | Center                          |
2628	        | VC12                  | Right                           |
2629	        | AC3                   |                                 |
2630	        | CSE(VC10,VC11,VC12)   |                                 |
2631	        | CSE(AC3)              |                                 |
2632	        +=======================+=================================+
2633	        | Capture Scene #4      | Description=Endpoint E          |
2634	        +-----------------------|---------------------------------+
2635	        | VC13                  |                                 |
2636	        | AC4                   |                                 |
2637	        | CSE(VC13)             |                                 |
2638	        | CSE(AC4)              |                                 |
2639	        +=======================+=================================+
2640	        | Capture Scene #5      | Description=Endpoint F          |
2641	        +-----------------------|---------------------------------+
2642	        | VC14                  |                                 |
2643	        | AC5                   |                                 |
2644	        | CSE(VC14)             |                                 |
2645	        | CSE(AC5)              |                                 |
2646	        +=======================+=================================+
2647	        | Capture Scene #6      | Description=Endpoint G          |
2648	        +-----------------------|---------------------------------+
2649	        | VC15                  |                                 |
2650	        | AC6                   |                                 |
2651	        | CSE(VC15)             |                                 |
2652	        | CSE(AC6)              |                                 |
2653	        +=======================+=================================+

2655	         Table 21: Advertisement sent to endpoint A - Source Part

2657	   The above part of the Advertisement presents information about the
2658	   sources to the MCC. The information is effectively the same as the
2659	   received Advertisements except that there are no Capture Encodings
2660	   associated with them and the identities have been re-numbered.

2662	   In addition to the source Capture information the MCU advertises
2663	   "site" switching of Endpoints B to G in three streams.

2665	        +=======================+=================================+
2666	        | Capture Scene #7      | Description=Output3streammix    |
2667	        +-----------------------|---------------------------------+
2668	        | MCC1(VC4,VC7,VC10,    | CaptureArea=Left                |
2669	        |      VC13)            | MaxCaptures=1                   |
2670	        |                       | SynchronisationID=1             |
2671	        |                       | Policy=SoundLevel:0             |
2672	        |                       | EncodingGroup=1                 |
2673	        |                       |                                 |
2674	        | MCC2(VC5,VC8,VC11,    | CaptureArea=Center              |
2675	        |      VC14)            | MaxCaptures=1                   |
2676	        |                       | SynchronisationID=1             |
2677	        |                       | Policy=SoundLevel:0             |
2678	        |                       | EncodingGroup=1                 |
2679	        |                       |                                 |
2680	        | MCC3(VC6,VC9,VC12,    | CaptureArea=Right               |
2681	        |      VC15)            | MaxCaptures=1                   |
2682	        |                       | SynchronisationID=1             |
2683	        |                       | Policy=SoundLevel:0             |
2684	        |                       | EncodingGroup=1                 |
2685	        |                       |                                 |
2686	        | MCC4() (for audio)    | CaptureArea=whole scene         |
2687	        |                       | MaxCaptures=1                   |
2688	        |                       | Policy=SoundLevel:0             |
2689	        |                       | EncodingGroup=2                 |
2690	        |                       |                                 |
2691	        | MCC5() (for audio)    | CaptureArea=whole scene         |
2692	        |                       | MaxCaptures=1                   |
2693	        |                       | Policy=SoundLevel:1             |
2694	        |                       | EncodingGroup=2                 |
2695	        |                       |                                 |
2696	        | MCC6() (for audio)    | CaptureArea=whole scene         |
2697	        |                       | MaxCaptures=1                   |
2698	        |                       | Policy=SoundLevel:2             |
2699	        |                       | EncodingGroup=2                 |
2700	        |                       |                                 |
2701	        | MCC7() (for audio)    | CaptureArea=whole scene         |
2702	        |                       | MaxCaptures=1                   |
2703	        |                       | Policy=SoundLevel:3             |
2704	        |                       | EncodingGroup=2                 |
2705	        |                       |                                 |
2706	        | CSE(MCC1,MCC2,MCC3)   |                                 |
2707	        | CSE(MCC4,MCC5,MCC6,   |                                 |
2708	        |     MCC7)             |                                 |
2709	        +=======================+=================================+

2711	       Table 22: Advertisement sent to endpoint A - switching part

2713	   The above part describes the switched 3 main streams that relate to
2714	   site switching. MaxCaptures=1 indicates that only one Capture from
2715	   the MCC is sent at a particular time. SynchronisationID=1 indicates
2716	   that the source sending is synchronised. The provider can choose to
2717	   group together VC13, VC14, and VC15 for the purpose of switching
2718	   according to the SynchronisationID.  Therefore when the provider
2719	   switches one of them into an MCC, it can also switch the others
2720	   even though they are not part of the same Capture Scene.

2722	   All the audio for the conference is included in this Scene #7.
2723	   There isn't necessarily a one to one relation between any audio
2724	   capture and video capture in this scene.  Typically a change in
2725	   loudest talker will cause the MCU to switch the audio streams more
2726	   quickly than switching video streams.

2728	   The MCU can also supply nine media streams showing the active and
2729	   previous eight speakers. It includes the following in the
2730	   Advertisement:

2732	        +=======================+=================================+
2733	        | Capture Scene #8      | Description=Output9stream       |
2734	        +-----------------------|---------------------------------+
2735	        | MCC8(VC4,VC5,VC6,VC7, | MaxCaptures=1                   |
2736	        |   VC8,VC9,VC10,VC11,  | Policy=SoundLevel:0             |
2737	        |   VC12,VC13,VC14,VC15)| EncodingGroup=1                 |
2738	        |                       |                                 |
2739	        | MCC9(VC4,VC5,VC6,VC7, | MaxCaptures=1                   |
2740	        |   VC8,VC9,VC10,VC11,  | Policy=SoundLevel:1             |
2741	        |   VC12,VC13,VC14,VC15)| EncodingGroup=1                 |
2742	        |                       |                                 |
2743	        |          to           |               to                |
2744	        |                       |                                 |
2745	        | MCC16(VC4,VC5,VC6,VC7,| MaxCaptures=1                   |
2746	        |   VC8,VC9,VC10,VC11,  | Policy=SoundLevel:8             |
2747	        |   VC12,VC13,VC14,VC15)| EncodingGroup=1                 |
2748	        |                       |                                 |
2749	        | CSE(MCC8,MCC9,MCC10,  |                                 |
2750	        |     MCC11,MCC12,MCC13,|                                 |
2751	        |     MCC14,MCC15,MCC16)|                                 |
2752	        +=======================+=================================+

2754	       Table 23: Advertisement sent to endpoint A - 9 switched part

2756	   The above part indicates that there are 9 capture encodings. Each
2757	   of the Capture Encodings may contain any captures from any source
2758	   site with a maximum of one Capture at a time. Which Capture is
2759	   present is determined by the policy.  The MCCs in this scene do not
2760	   have any spatial attributes.

2762	   Note: The Provider alternatively could provide each of the MCCs
2763	   above in its own Capture Scene.
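
   One way to picture how the nine MCCs are filled is a most-recent-
   first history of active speakers, where the MCC with policy
   SoundLevel:n carries entry n.  The sketch below is illustrative
   only: the helper names are hypothetical, and reading the index as
   "the n-th previous active speaker" is an assumption drawn from the
   description of the nine streams as "the active and previous eight
   speakers".

```python
def update_history(history, talker):
    """Return a new most-recent-first list with `talker` at the front."""
    return [talker] + [t for t in history if t != talker]


def source_for_index(history, index):
    """Source carried by an MCC whose policy is SoundLevel:<index>.

    Returns None when fewer than index + 1 sources have spoken so far.
    """
    return history[index] if index < len(history) else None


history = []
for talker in ["VC4", "VC13", "VC7", "VC13"]:
    history = update_history(history, talker)

print(history)                       # ['VC13', 'VC7', 'VC4']
print(source_for_index(history, 0))  # VC13
print(source_for_index(history, 8))  # None
```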

2765	   If the MCU wanted to provide a composed Capture Encoding containing
2766	   all of the 9 captures it could Advertise in addition:

2768	        +=======================+=================================+
2769	        | Capture Scene #9      | Description=NineTiles           |
2770	        +-----------------------|---------------------------------+
2771	        | MCC17(MCC8,MCC9,MCC10,| MaxCaptures=9                   |
2772	        |     MCC11,MCC12,MCC13,| EncodingGroup=1                 |
2773	        |     MCC14,MCC15,MCC16)|                                 |
2774	        |                       |                                 |
2775	        | CSE(MCC17)            |                                 |
2776	        +=======================+=================================+

2778	      Table 24: Advertisement sent to endpoint A - 9 composed part

2780	   As MaxCaptures is 9, the Capture Encoding contains information
2781	   from 9 sources at a time.

2783	   The Advertisement to Endpoint B is identical to the above, except
2784	   that the captures from Endpoint A would be added and the captures
2785	   from Endpoint B would be removed. Whether the Captures are rendered
2786	   on a four screen display or a three screen display is up to the
2787	   Consumer to determine.  The Consumer wants to place video captures
2788	   from the same original source endpoint together, in the correct
2789	   spatial order, but the MCCs do not have spatial attributes.  So the
2790	   Consumer needs to associate incoming media packets with the
2791	   original individual captures in the advertisement (such as VC4,
2792	   VC5, and VC6) in order to know the spatial information it needs for
2793	   correct placement on the screens.
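
   The placement step can be sketched as a lookup from each original
   capture to its source endpoint and position, taken from the source
   part of the Advertisement sent to Endpoint A.  The sketch assumes
   that the Consumer already knows which original capture each
   received stream carries; how that association is made is not
   covered here.  The names SOURCE_INFO, POSITION_ORDER and
   group_for_layout are hypothetical.

```python
# Source part of the Advertisement sent to Endpoint A:
# capture -> (source endpoint, horizontal position or None).
SOURCE_INFO = {
    "VC4": ("B", "Left"), "VC5": ("B", "Center"), "VC6": ("B", "Right"),
    "VC7": ("C", "Left"), "VC8": ("C", "Center"), "VC9": ("C", "Right"),
    "VC10": ("D", "Left"), "VC11": ("D", "Center"), "VC12": ("D", "Right"),
    "VC13": ("E", None), "VC14": ("F", None), "VC15": ("G", None),
}
POSITION_ORDER = {"Left": 0, "Center": 1, "Right": 2, None: 0}


def group_for_layout(received):
    """Group received captures by source endpoint, left to right."""
    groups = {}
    for vc in received:
        endpoint, position = SOURCE_INFO[vc]
        groups.setdefault(endpoint, []).append((POSITION_ORDER[position], vc))
    return {ep: [vc for _, vc in sorted(items)] for ep, items in groups.items()}


print(group_for_layout(["VC6", "VC13", "VC4", "VC5"]))
# {'B': ['VC4', 'VC5', 'VC6'], 'E': ['VC13']}
```

   The Consumer can then render each group side by side, whether in a
   tiled area or in a picture-in-picture strip.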

2795	   Editor's note: this is an open issue, about how to associate
2796	   incoming packets with the original capture that is a constituent of
2797	   an MCC.  This document probably should mention it in an earlier
2798	   section, after the solution is worked out in the other CLUE
2799	   documents.

2801	12.3.4. Heterogeneous conference with voice activated switching

2803	   This example illustrates how multipoint "voice activated switching"
2804	   behavior can be realized, with an endpoint making its own decision
2805	   about which of its outgoing video streams is considered the "active
2806	   talker" from that endpoint.  Then an MCU can decide which is the
2807	   active talker among the whole conference.

2809	   Consider a conference between endpoints with the following
2810	   characteristics:

2812	      Endpoint A - 3 screens, 3 cameras

2814	      Endpoint B - 3 screens, 3 cameras

2816	      Endpoint C - 1 screen, 1 camera

2818	   This example focuses on what the user at endpoint C sees.  The
2819	   user would like to see the video capture of the current talker,
2820	   without composing it with any other video capture.  In this
2821	   example endpoint C is capable of receiving only a single video
2822	   stream.  The following tables describe advertisements from A and B
2823	   to the MCU, and from the MCU to C, that can be used to accomplish
2824	   this.

2826	        +-----------------------+---------------------------------+
2827	        | Capture Scene #1      | Description=Endpoint x          |
2828	        +-----------------------|---------------------------------+
2829	        | VC1                   | CaptureArea=Left                |
2830	        |                       | EncodingGroup=1                 |
2831	        | VC2                   | CaptureArea=Center              |
2832	        |                       | EncodingGroup=1                 |
2833	        | VC3                   | CaptureArea=Right               |
2834	        |                       | EncodingGroup=1                 |
2835	        | MCC1(VC1,VC2,VC3)     | MaxCaptures=1                   |
2836	        |                       | CaptureArea=whole scene         |
2837	        |                       | Policy=SoundLevel:0             |
2838	        |                       | EncodingGroup=1                 |
2839	        | AC1                   | CaptureArea=whole scene         |
2840	        |                       | EncodingGroup=2                 |
2841	        | CSE1(VC1, VC2, VC3)   |                                 |
2842	        | CSE2(MCC1)            |                                 |
2843	        | CSE3(AC1)             |                                 |
2844	        +---------------------------------------------------------+

2846	   Table 25: Advertisement received at the MCU from Endpoints A and B

2848	   Endpoints A and B each advertise their three individual video
2849	   captures, and also a switched capture MCC1 that switches between
2850	   those three based on who is the active talker.  These endpoints do
2851	   not advertise a distinct audio capture for each individual video
2852	   capture, so the MCU (as a Media Consumer) cannot determine for
2853	   itself, from the audio streams alone, which video capture shows
2854	   the active talker.
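
   As a rough illustration only (this is not the CLUE data model),
   the advertisement in Table 25 could be held in memory as in the
   following Python sketch.  The class name Capture, its field names,
   and the helper has_audio_per_video_capture are assumptions made
   for this example.  The helper applies a deliberately simple test,
   comparing the number of audio captures with the number of
   individual video captures, to show why the MCU cannot work out the
   active talker on its own.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Capture:
    """One row of Table 25 (hypothetical in-memory form)."""
    name: str
    media: str                       # "video" or "audio"
    capture_area: str
    encoding_group: int
    sources: Tuple[str, ...] = ()    # non-empty only for an MCC
    max_captures: Optional[int] = None
    policy: Optional[str] = None


# Captures advertised by Endpoint A (B advertises the same set).
ADVERTISED = [
    Capture("VC1", "video", "Left", 1),
    Capture("VC2", "video", "Center", 1),
    Capture("VC3", "video", "Right", 1),
    Capture("MCC1", "video", "whole scene", 1,
            sources=("VC1", "VC2", "VC3"),
            max_captures=1, policy="SoundLevel:0"),
    Capture("AC1", "audio", "whole scene", 2),
]

# Capture Scene Entries from Table 25.
ENTRIES = {
    "CSE1": ("VC1", "VC2", "VC3"),
    "CSE2": ("MCC1",),
    "CSE3": ("AC1",),
}


def has_audio_per_video_capture(captures):
    """Simplified check: is there at least one audio capture per
    individual (non-MCC) video capture?"""
    video = [c for c in captures if c.media == "video" and not c.sources]
    audio = [c for c in captures if c.media == "audio"]
    return len(audio) >= len(video)


print(has_audio_per_video_capture(ADVERTISED))  # False: 1 audio, 3 video
```

   With a single audio capture for the whole scene, the check fails,
   which is why the MCU relies on the endpoint's own MCC1.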

2856	        +-----------------------+---------------------------------+
2857	        | Capture Scene #1      | Description=conference          |
2858	        +-----------------------+---------------------------------+
2859	        | MCC1()                | CaptureArea=Left                |
2860	        |                       | MaxCaptures=1                   |
2861	        |                       | SynchronisationID=1             |
2862	        |                       | Policy=SoundLevel:0             |
2863	        |                       | EncodingGroup=1                 |
2864	        |                       |                                 |
2865	        | MCC2()                | CaptureArea=Center              |
2866	        |                       | MaxCaptures=1                   |
2867	        |                       | SynchronisationID=1             |
2868	        |                       | Policy=SoundLevel:0             |
2869	        |                       | EncodingGroup=1                 |
2870	        |                       |                                 |
2871	        | MCC3()                | CaptureArea=Right               |
2872	        |                       | MaxCaptures=1                   |
2873	        |                       | SynchronisationID=1             |
2874	        |                       | Policy=SoundLevel:0             |
2875	        |                       | EncodingGroup=1                 |
2876	        |                       |                                 |
2877	        | MCC4()                | CaptureArea=whole scene         |
2878	        |                       | MaxCaptures=1                   |
2879	        |                       | Policy=SoundLevel:0             |
2880	        |                       | EncodingGroup=1                 |
2881	        |                       |                                 |
2882	        | MCC5() (for audio)    | CaptureArea=whole scene         |
2883	        |                       | MaxCaptures=1                   |
2884	        |                       | Policy=SoundLevel:0             |
2885	        |                       | EncodingGroup=2                 |
2886	        |                       |                                 |
2887	        | MCC6() (for audio)    | CaptureArea=whole scene         |
2888	        |                       | MaxCaptures=1                   |
2889	        |                       | Policy=SoundLevel:1             |
2890	        |                       | EncodingGroup=2                 |
2891	        | CSE1(MCC1,MCC2,MCC3)  |                                 |
2892	        | CSE2(MCC4)            |                                 |
2893	        | CSE3(MCC5,MCC6)       |                                 |
2894	        +---------------------------------------------------------+

2896	            Table 26: Advertisement sent from the MCU to C

2898	   The MCU advertises one scene, with four video MCCs.  Three of them,
2899	   in CSE1, give a left, center, and right view of the conference,
2900	   with "site switching".  MCC4 provides a single video capture
2901	   representing a view of the whole conference.  The MCU intends for
2902	   MCC4 to be switched between all the other original source
2903	   captures.  In this example advertisement the MCU does not give all
2904	   the information about the other endpoints' scenes, or about which
2905	   of their captures are included in the MCCs.  The MCU could include
2906	   that information if it wants to give the consumers more detail,
2907	   but it is not necessary for this example scenario.

2909	   The MCU, as Provider, advertises MCC5 and MCC6 for audio.  Both
2910	   are switched captures, with different SoundLevel policies
2911	   indicating that they carry the top two dominant talkers.  The MCU
2912	   advertises CSE3 with both MCCs, suggesting that the Consumer
2913	   should use both if it can.
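
   One way to picture the two audio MCCs is the Python sketch below.
   It assumes, purely for illustration, that dominance is ranked by a
   single measured audio level per source; the exact meaning of the
   SoundLevel index is given by the Policy attribute elsewhere in
   this document.  The function name assign_talkers and the level
   values are hypothetical.

```python
def assign_talkers(levels, policies):
    """Map each MCC to the source at its SoundLevel rank.

    levels:   dict of source name -> measured audio level
    policies: dict of MCC name -> SoundLevel index (0 = most dominant)
    """
    # Loudest first; ties broken by name so the result is deterministic.
    ranked = sorted(levels, key=lambda src: (-levels[src], src))
    return {
        mcc: ranked[index] if index < len(ranked) else None
        for mcc, index in policies.items()
    }


# MCC5 has Policy=SoundLevel:0 and MCC6 has Policy=SoundLevel:1.
policies = {"MCC5": 0, "MCC6": 1}
levels = {"A": 0.30, "B": 0.85}      # made-up measurements
print(assign_talkers(levels, policies))
```

   An MCC whose index exceeds the number of available sources maps to
   None in this sketch, that is, it has nothing to carry.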

2915	   Endpoint C, in its Configure message to the MCU, requests to
2916	   receive MCC4 for video, and MCC5 and MCC6 for audio.  To get the
2917	   information it needs to construct MCC4, the MCU has to send
2918	   Configure messages to A and B asking to receive MCC1 from each of
2919	   them, along with their AC1 audio.  The MCU can then use audio
2920	   energy information from the two incoming audio streams from A and
2921	   B to determine which of the two endpoints has the current talker.
2922	   Based on that, the MCU uses either MCC1 from A or MCC1 from B as
2923	   the source of MCC4 to send to C.
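
   The selection step at the MCU can be sketched as follows.  This is
   a minimal Python illustration under stated assumptions, not a
   prescribed algorithm: the function name select_mcc4_source, the
   per-endpoint energy values, and the tie-breaking rule are all
   hypothetical, and a real MCU would likely smooth the energy
   measurements over time before switching.

```python
def select_mcc4_source(audio_energy, current=None):
    """Pick the endpoint whose MCC1 should feed MCC4.

    audio_energy: dict of endpoint name -> energy of its AC1 stream
    current:      endpoint currently feeding MCC4, if any
    """
    if not audio_energy:
        return current                      # nothing measured, no change
    loudest = max(audio_energy.values())
    if audio_energy.get(current) == loudest:
        return current                      # avoid switching on a tie
    # Lowest name wins among equally loud endpoints, for determinism.
    return min(ep for ep, e in audio_energy.items() if e == loudest)


source = select_mcc4_source({"A": 0.2, "B": 0.7}, current="A")
print("MCC4 is sourced from MCC1 of endpoint", source)
```

   Keeping the current source on a tie is a design choice made here
   to avoid needless switching between A and B.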

2925	13. Acknowledgements

2927	   Allyn Romanow and Brian Baldino were authors of early versions.
2928	   Mark Gorzynski contributed much to the approach.  We want to
2929	   thank Stephen Botzko for helpful discussions on audio.

2931	14. IANA Considerations

2933	   None.

2935	15. Security Considerations

2937	   Telepresence conferencing sessions, and specifically the protocols
2938	   used by CLUE, are exposed to several potential attacks because
2939	   they naturally involve multiple endpoints and because the systems
2940	   provide many capabilities, often user-invoked.

2943	   A middle box involved in a CLUE session can experience many of the
2944	   same attacks as a conferencing system such as that enabled by the
2945	   XCON framework [RFC5239].  Examples of attacks include the
2946	   following: an endpoint attempting to listen to sessions in which
2947	   it is not authorized to participate, an endpoint attempting to
2948	   disconnect or mute other users, and theft of service by an
2949	   endpoint attempting to create telepresence sessions it is not
2950	   allowed to create.  Thus, it is RECOMMENDED that a middle box
2951	   implementing the protocols necessary to support CLUE follow the
2952	   security recommendations specified in the conference control
2953	   protocol documents.  In the case of CLUE, SIP is the default
2954	   conferencing protocol, thus the security considerations in
2955	   [RFC4579] MUST be followed.

2957	   One primary security concern surrounding the CLUE framework
2958	   introduced in this document involves securing the actual
2959	   protocols and the associated authorization mechanisms.  These
2960	   concerns apply to endpoint-to-endpoint sessions, as well as to
2961	   sessions involving multiple endpoints and middle boxes.  Figure 2
2962	   in section 5 provides a basic flow of information exchange for
2963	   CLUE and the protocols involved.

2965	   As described in section 5, CLUE uses SIP/SDP to establish the
2966	   session prior to exchanging any CLUE-specific information.  Thus
2967	   the security mechanisms recommended for SIP [RFC3261], including
2968	   user authentication and authorization, SHOULD be followed.  In
2969	   addition, the media is based on RTP, and thus existing RTP
2970	   security mechanisms, such as DTLS/SRTP, MUST be supported.

2972	   A separate data channel is established to transport the CLUE
2973	   protocol messages.  The contents of the CLUE protocol messages are
2974	   based on information introduced in this document, which is
2975	   represented by an XML schema defined in the CLUE data model [ref].
2976	   Some of this information could introduce privacy concerns, in
2977	   particular the xCard information as described in section x.  In
2978	   addition, the (text) description field in the Media Capture
2979	   attribute (section 7.1.1.7) could reveal sensitive information or
2980	   specific identities.  The same is true for the descriptions in the
2981	   Capture Scene (section 7.3.1) and Capture Scene Entry (section
2982	   7.3.2) attributes.

2984	   Another important consideration for the xCard information and the
2985	   description fields in the Media Capture and Capture Scene Entry
2986	   attributes is that, while the endpoints involved in the session
2987	   have been authenticated, there is no assurance that the information
2988	   in the xCard or description fields is authentic.  Thus, this
2989	   information SHOULD NOT be used to make any authorization decisions,
2990	   and the participants in the sessions SHOULD be made aware of this.
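
   The rule about authorization can be illustrated with the short
   Python sketch below.  It is an assumption-laden example, not a
   mechanism defined by CLUE: the function name is_authorized, the
   allow-list, and the example SIP URIs are hypothetical.  The point
   shown is only that the decision depends on the identity
   authenticated at session setup and never on advertised xCard or
   description text.

```python
def is_authorized(authenticated_id, allowed_ids, advertised_info=None):
    """Authorize using only the identity authenticated at session setup.

    advertised_info (xCard or description text) is accepted so callers
    can pass it along for display, but it is never consulted here
    because its authenticity is not assured.
    """
    return authenticated_id in allowed_ids


allowed = {"sip:alice@example.com"}
# Unverified, self-asserted text from an advertisement.
claimed = {"description": "Alice's office", "xcard_fn": "Alice"}
print(is_authorized("sip:mallory@example.com", allowed, claimed))  # False
```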

2992	   While other information in the CLUE protocol messages does not
2993	   reveal specific identities, it can reveal characteristics and
2994	   capabilities of the endpoints.  That information could possibly
2995	   uniquely identify specific endpoints.  It might also be possible
2996	   for an attacker to manipulate the information and disrupt the CLUE
2997	   sessions.  It would also be possible to mount a DoS attack on the
2998	   CLUE endpoints if a malicious agent has access to the data
2999	   channel.  Thus, it MUST be possible for the endpoints to establish
3000	   a channel which is secure against both message recovery and
3001	   message modification. Further details on this are provided in the
3002	   CLUE data channel solution document.

3004	   There are also security issues associated with the authorization
3005	   to perform actions at the CLUE endpoints to invoke specific
3006	   capabilities (e.g., re-arranging screens or sharing content).
3007	   However, the policies and security associated with these actions
3008	   are outside the scope of this document and the overall CLUE
3009	   solution.

3011	16. Changes Since Last Version

3013	   NOTE TO THE RFC-Editor: Please remove this section prior to
3014	   publication as an RFC.

3016	   Changes from 14 to 15:

3018	     1. Add "=" and "<=" qualifiers to MaxCaptures attribute, and
3019	        clarify the meaning regarding switched and composed MCC.

3021	     2. Add section 7.3.3 Global Capture Scene Entry List, and a few
3022	        other sentences elsewhere that refer to global CSE sets.

3024	     3. Clarify: The Provider MUST be capable of encoding and sending
3025	        all Captures (*that have an encoding group*) in a single
3026	        Capture Scene Entry simultaneously.

3028	     4. Add voice activated switching example in section 12.

3030	     5. Change name of attributes Participant Info/Type to Person
3031	        Info/Type.

3033	     6. Clarify the Person Info/Type attributes have the same meaning
3034	        regardless of whether or not the capture has a Presentation
3035	        attribute.

3037	     7. Update example section 12.1 to be consistent with the rest of
3038	        the document, regarding MCC and capture attributes.

3040	     8. State explicitly each CSE has a unique ID.

3042	   Changes from 13 to 14:

3044	     1. Fill in section for Security Considerations.

3046	     2. Replace Role placeholder with Participant Information,
3047	        Participant Type, and Scene Information attributes.

3049	     3. Spatial information implies nothing about how constituent
3050	        media captures are combined into a composed MCC.

3052	     4. Clean up MCC example in Section 12.3.3.  Clarify behavior of
3053	        tiled and PIP display windows.  Add audio.  Add new open
3054	        issue about associating incoming packets to original source
3055	        capture.

3057	     5. Remove editor's note and associated statement about RTP
3058	        multiplexing at end of section 5.

3060	     6. Remove editor's note and associated paragraph about
3061	        overloading media channel with both CLUE and non-CLUE usage,
3062	        in section 5.

3064	     7. In section 10, clarify intent of media encodings conforming
3065	        to SDP, even with multiple CLUE message exchanges.  Remove
3066	        associated editor's note.

3068	   Changes from 12 to 13:

3070	     1. Added the MCC concept including updates to existing sections
3071	        to incorporate the MCC concept. New MCC attributes:
3072	        MaxCaptures, SynchronisationID and Policy.

3074	     2. Removed the "composed" and "switched" Capture attributes due
3075	        to overlap with the MCC concept.

3077	     3. Removed the "Scene-switch-policy" CSE attribute, replaced by
3078	        MCC and SynchronisationID.

3080	     4. Editorial enhancements including numbering of the Capture
3081	        attribute sections, tables, figures etc.

3083	   Changes from 11 to 12:

3085	     1. Ticket #44. Remove note questioning whether to require a
3086	        Consumer to send a Configure after receiving Advertisement.

3088	     2. Ticket #43. Remove ability for consumer to choose value of
3089	        attribute for scene-switch-policy.

3091	     3. Ticket #36. Remove computational complexity parameter,
3092	        MaxGroupPps, from Encoding Groups.

3094	     4. Reword the Abstract and parts of sections 1 and 4 (now 5)
3095	        based on Mary's suggestions as discussed on the list.  Move
3096	        part of the Introduction into a new section Overview &
3097	        Motivation.

3099	     5. Add diagram of an Advertisement, in the Overview of the
3100	        Framework/Model section.

3102	     6. Change Intended Status to Standards Track.

3104	     7. Clean up RFC2119 keyword language.

3106	   Changes from 10 to 11:

3108	     1. Add description attribute to Media Capture and Capture Scene
3109	        Entry.

3111	     2. Remove contradiction and change the note about open issue
3112	        regarding always responding to Advertisement with a Configure
3113	        message.

3115	     3. Update example section, to cleanup formatting and make the
3116	        media capture attributes and encoding parameters consistent
3117	        with the rest of the document.

3119	   Changes from 09 to 10:

3121	     1. Several minor clarifications such as about SDP usage, Media
3122	        Captures, Configure message.

3124	     2. Simultaneous Set can be expressed in terms of Capture Scene
3125	        and Capture Scene Entry.

3127	     3. Removed Area of Scene attribute.

3129	     4. Add attributes from draft-groves-clue-capture-attr-01.

3131	     5. Move some of the Media Capture attribute descriptions back
3132	        into this document, but try to leave detailed syntax to the
3133	        data model.  Remove the OUTSOURCE sections, which are already
3134	        incorporated into the data model document.

3136	   Changes from 08 to 09:

3138	     1. Use "document" instead of "memo".

3140	     2. Add basic call flow sequence diagram to introduction.

3142	     3. Add definitions for Advertisement and Configure messages.

3144	     4. Add definitions for Capture and Provider.

3146	     5. Update definition of Capture Scene.

3148	     6. Update definition of Individual Encoding.

3150	     7. Shorten definition of Media Capture and add key points in the
3151	        Media Captures section.

3153	     8. Reword a bit about capture scenes in overview.

3155	     9. Reword about labeling Media Captures.

3157	     10. Remove the Consumer Capability message.

3159	     11. New example section heading for media provider behavior.

3161	     12. Clarifications in the Capture Scene section.

3163	     13. Clarifications in the Simultaneous Transmission Set section.

3165	     14. Capitalize defined terms.

3167	     15. Move call flow example from introduction to overview section.

3169	     16. General editorial cleanup.

3171	     17. Add some editors' notes requesting input on issues.

3173	     18. Summarize some sections, and propose details be outsourced
3174	        to other documents.

3176	   Changes from 06 to 07:

3178	     1. Ticket #9.  Rename Axis of Capture Point attribute to Point
3179	        on Line of Capture.  Clarify the description of this
3180	        attribute.

3182	     2. Ticket #17.  Add "capture encoding" definition.  Use this new
3183	        term throughout document as appropriate, replacing some usage
3184	        of the terms "stream" and "encoding".

3186	     3. Ticket #18.  Add Max Capture Encodings media capture
3187	        attribute.

3189	     4. Add clarification that different capture scene entries are
3190	        not necessarily mutually exclusive.

3192	   Changes from 05 to 06:

3194	   1. Capture scene description attribute is a list of text strings,
3195	      each in a different language, rather than just a single string.

3197	   2. Add new Axis of Capture Point attribute.

3199	   3. Remove appendices A.1 through A.6.

3201	   4. Clarify that the provider must use the same coordinate system
3202	      with same scale and origin for all coordinates within the same
3203	      capture scene.

3205	   Changes from 04 to 05:

3207	   1. Clarify limitations of "composed" attribute.

3209	   2. Add new section "capture scene entry attributes" and add the
3210	      attribute "scene-switch-policy".

3212	   3. Add capture scene description attribute and description
3213	      language attribute.

3215	   4. Editorial changes to examples section for consistency with the
3216	      rest of the document.

3218	   Changes from 03 to 04:

3220	   1. Remove sentence from overview - "This constitutes a significant
3221	      change ..."

3223	   2. Clarify a consumer can choose a subset of captures from a
3224	      capture scene entry or a simultaneous set (in section "capture
3225	      scene" and "consumer's choice...").

3227	   3. Reword first paragraph of Media Capture Attributes section.

3229	   4. Clarify a stereo audio capture is different from two mono audio
3230	      captures (description of audio channel format attribute).

3232	   5. Clarify what it means when coordinate information is not
3233	      specified for area of capture, point of capture, area of scene.

3235	   6. Change the term "producer" to "provider" to be consistent (it
3236	      was just in two places).

3238	   7. Change name of "purpose" attribute to "content" and refer to
3239	      RFC4796 for values.

3241	   8. Clarify simultaneous sets are part of a provider advertisement,
3242	      and apply across all capture scenes in the advertisement.

3244	   9. Remove sentence about lip-sync between all media captures in a
3245	      capture scene.

3247	   10. Combine the concepts of "capture scene" and "capture set"
3248	      into a single concept, using the term "capture scene" to
3249	      replace the previous term "capture set", and eliminating the
3250	      original separate capture scene concept.

3252	   Informative References

3254	   Edt. Note: Decide which of these really are Normative References.

3256	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
3257	              Requirement Levels", BCP 14, RFC 2119, March 1997.

3259	   [RFC3261]  Rosenberg, J., Schulzrinne, H., Camarillo, G.,
3260	              Johnston, A., Peterson, J., Sparks, R., Handley, M.,
3261	              and E. Schooler, "SIP: Session Initiation Protocol",
3262	              RFC 3261, June 2002.

3265	   [RFC3264]  Rosenberg, J. and H. Schulzrinne, "An Offer/Answer
3266	              Model with the Session Description Protocol (SDP)",
3267	              RFC 3264, June 2002.

3269	   [RFC3550]  Schulzrinne, H., Casner, S., Frederick, R., and V.
3270	              Jacobson, "RTP: A Transport Protocol for Real-Time
3271	              Applications", STD 64, RFC 3550, July 2003.

3273	   [RFC4353]  Rosenberg, J., "A Framework for Conferencing with the
3274	              Session Initiation Protocol (SIP)", RFC 4353,
3275	              February 2006.

3277	   [RFC4579]  Johnston, A. and O. Levin, "SIP Call Control -
3278	              Conferencing for User Agents", RFC 4579, August 2006.

3280	   [RFC5117]  Westerlund, M. and S. Wenger, "RTP Topologies",
3281	              RFC 5117, January 2008.

3284	17. Authors' Addresses

3286	   Mark Duckworth (editor)
3287	   Polycom
3288	   Andover, MA  01810
3289	   USA
3290	   Email: mark.duckworth@polycom.com

3292	   Andrew Pepperell
3293	   Acano
3294	   Uxbridge, England
3295	   UK
3296	   Email: apeppere@gmail.com

3299	   Stephan Wenger
3300	   Vidyo, Inc.
3301	   433 Hackensack Ave.
3302	   Hackensack, N.J. 07601
3303	   USA
3304	   Email: stewe@stewe.org